This package contains utility functions that are widely used by functions in all other sub-packages of histomicstk.

histomicstk.utils.compute_tile_foreground_fraction(slide_path, im_fgnd_mask_lres, fgnd_seg_scale, it_kwargs, tile_position=None)[source]

Computes the fraction of foreground of a single tile or all tiles in a whole slide image given the binary foreground mask computed from a low resolution version of the slide.

  • slide_path (str) – path to an image or slide
  • im_fgnd_mask_lres (array_like) – A binary foreground mask computed at low resolution
  • fgnd_seg_scale (double) – The scale/magnification at which the foreground mask im_fgnd_mask_lres was computed
  • it_kwargs (dict) –
    A dictionary of any key:value parameters (e.g. defining the scale,
    tile_size, region etc) in addition to tile_position that need to be passed to large_image.TileSource.getSingleTile to get the tile.
  • tile_position (int or None) – A linear 0-based index of a tile for which the foreground needs to be computed. If set to None, the foreground fraction of all tiles will be computed.

tile_fgnd_frac – A value between 0 and 1 indicating the fraction of foreground pixels present in the tile indicated by tile_position. If tile_position is set to None, then a 1D array containing the foreground fraction of all tiles will be returned.

Return type:

double or array_like
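
The core idea for a single tile can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name tile_fgnd_fraction and its arguments (tile bounds plus a ratio between mask scale and tile scale) are hypothetical, and the tile bounds are assumed to be known already rather than obtained from large_image.

```python
import numpy as np


def tile_fgnd_fraction(im_fgnd_mask_lres, left, top, width, height,
                       lres_per_tile_px):
    """Hypothetical helper: foreground fraction of one tile.

    left, top, width, height are the tile bounds in tile-scale pixels.
    lres_per_tile_px is the number of mask pixels per tile-scale pixel
    (for example 1.25 / 20 for a 1.25x mask and 20x tiles).
    """
    mask = np.asarray(im_fgnd_mask_lres, dtype=bool)

    # Map the tile bounds into low-resolution mask coordinates.
    x0 = int(np.floor(left * lres_per_tile_px))
    x1 = int(np.ceil((left + width) * lres_per_tile_px))
    y0 = int(np.floor(top * lres_per_tile_px))
    y1 = int(np.ceil((top + height) * lres_per_tile_px))

    # Clip to the mask extent so edge tiles do not index out of range.
    x0, x1 = np.clip([x0, x1], 0, mask.shape[1])
    y0, y1 = np.clip([y0, y1], 0, mask.shape[0])

    region = mask[y0:y1, x0:x1]
    # An empty region (tile entirely outside the mask) counts as background.
    return float(region.mean()) if region.size else 0.0
```

Calling this once per tile and collecting the results would give the 1D array described for tile_position=None.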

histomicstk.utils.convert_matrix_to_image(m, shape)[source]

Convert a column matrix of pixels to a 3D image given by shape. The number of channels is taken from m, not shape. If shape has length 2, the matrix is returned unchanged. This is the inverse of convert_image_to_matrix:

im == convert_matrix_to_image(convert_image_to_matrix(im), im.shape)


Convert an image (MxNx3 array) to a column matrix of pixels (3x(M*N)). It will pass through a 2D array unchanged.
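
The round trip between the two layouts can be illustrated with plain NumPy. The functions below are a stand-alone sketch under the conventions described above (pixels as columns, 2D input passed through); the names image_to_matrix and matrix_to_image are hypothetical and are not the library functions themselves.

```python
import numpy as np


def image_to_matrix(im):
    """Sketch: M x N x C image -> C x (M*N) column matrix of pixels."""
    im = np.asarray(im)
    if im.ndim == 2:
        # Already a matrix; pass through unchanged.
        return im
    return im.reshape(-1, im.shape[-1]).T


def matrix_to_image(m, shape):
    """Sketch: C x (M*N) matrix -> image with spatial size from shape."""
    m = np.asarray(m)
    if len(shape) == 2:
        # A 2D target shape means the matrix is returned unchanged.
        return m
    # The channel count comes from m, not from shape.
    return m.T.reshape(tuple(shape[:-1]) + (m.shape[0],))


im = np.arange(24).reshape(2, 4, 3)
m = image_to_matrix(im)            # shape (3, 8), one pixel per column
restored = matrix_to_image(m, im.shape)
```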


Discrete Laplacian with edge-value extrapolation.

Calculates the discrete Laplacian of an input image. Edge values are calculated by using linear extrapolation of second differences. This is consistent with the way that Matlab calculates the discrete Laplacian.

Parameters: im_input (array_like) – A floating-point intensity image.
Returns: im_lap – The discrete Laplacian of im_input.
Return type: array_like
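
The edge handling described above can be sketched in NumPy. This is an illustration only: the names are hypothetical, the division by 4 follows the convention of Matlab's del2 for 2D input and is assumed here rather than confirmed for this function, and each axis is assumed to have at least 4 samples.

```python
import numpy as np


def second_diff_extrapolated(a, axis):
    """Second difference along one axis with linearly extrapolated edges."""
    a = np.moveaxis(np.asarray(a, dtype=float), axis, 0)
    if a.shape[0] < 4:
        raise ValueError("this sketch needs at least 4 samples per axis")
    d2 = np.zeros_like(a)
    # Interior points: standard central second difference.
    d2[1:-1] = a[2:] - 2.0 * a[1:-1] + a[:-2]
    # Edge points: linear extrapolation of the interior second differences.
    d2[0] = 2.0 * d2[1] - d2[2]
    d2[-1] = 2.0 * d2[-2] - d2[-3]
    return np.moveaxis(d2, 0, axis)


def laplacian_sketch(im_input):
    """Sketch of a 2D discrete Laplacian, scaled by 1/4 (assumed convention)."""
    return (second_diff_extrapolated(im_input, 0)
            + second_diff_extrapolated(im_input, 1)) / 4.0
```

For a quadratic surface such as x^2 + y^2, the second difference is 2 along each axis, so this sketch returns 1 everywhere, including at the edges.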

Calculates the eigenvalues and eigenvectors of the Hessian matrices ‘H’ generated by hessian.

Parameters: im_hess (array_like) – M x N x 4 Hessian matrix, where H[:,:,0] = dxx, H[:,:,1] = H[:,:,2] = dxy, H[:,:,3] = dyy.
Returns:
  • lamda (array_like) – M x N x 2 image of eigenvalues.
  • v1 (array_like) – M x N x 2 eigenvector for lamda[:,:,0].
  • v2 (array_like) – M x N x 2 eigenvector for lamda[:,:,1].
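
A per-pixel eigendecomposition with this input and output layout can be sketched with numpy.linalg.eigh, which accepts stacks of symmetric matrices. The function name is hypothetical, and the ascending eigenvalue order used here is a property of eigh, not necessarily the ordering this library function returns.

```python
import numpy as np


def hessian_eigen_sketch(im_hess):
    """Sketch: eigenvalues and eigenvectors of per-pixel 2 x 2 Hessians.

    im_hess is M x N x 4 with channels (dxx, dxy, dxy, dyy).
    """
    im_hess = np.asarray(im_hess, dtype=float)
    m, n = im_hess.shape[:2]
    # Channels (dxx, dxy, dxy, dyy) reshape to [[dxx, dxy], [dxy, dyy]].
    mats = im_hess.reshape(m, n, 2, 2)
    # eigh handles stacked symmetric matrices; eigenvalues come back in
    # ascending order and eigenvectors are the columns of the last axes.
    lamda, vecs = np.linalg.eigh(mats)
    v1 = vecs[..., :, 0]  # eigenvector paired with lamda[..., 0]
    v2 = vecs[..., :, 1]  # eigenvector paired with lamda[..., 1]
    return lamda, v1, v2
```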

Exclude columns from m that have infinities or nans. In the context of color deconvolution, these occur in conversion from RGB to SDA when the source has 0 in a channel.
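
The column filtering can be expressed in one line of NumPy. The sketch below uses a hypothetical function name and assumes m is a 2D array with one pixel per column.

```python
import numpy as np


def drop_nonfinite_columns(m):
    """Sketch: keep only the columns of m whose entries are all finite."""
    m = np.asarray(m, dtype=float)
    # isfinite is False for nan, +inf and -inf; a column survives only if
    # every one of its entries is finite.
    return m[:, np.isfinite(m).all(axis=0)]


m = np.array([[1.0, np.inf, 3.0, 4.0],
              [5.0, 6.0, np.nan, 8.0],
              [9.0, 10.0, 11.0, 12.0]])
cleaned = drop_nonfinite_columns(m)  # keeps the first and last columns
```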

histomicstk.utils.gradient_diffusion(im_dx, im_dy, im_fgnd_mask, mu=5, lamda=5, iterations=10, dt=0.05)[source]

Diffusion of gradient field using Navier-Stokes equation. Used for smoothing/denoising a gradient field.

Takes as input a gradient field image (dX, dY), and a mask of the foreground region, and then iteratively solves the Navier-Stokes equation to diffuse the vector field and align noisy gradient vectors with their surrounding signals.

  • im_dx (array_like) – Horizontal component of gradient image.
  • im_dy (array_like) – Vertical component of gradient image.
  • im_fgnd_mask (array_like) – Binary mask where foreground objects have value 1, and background objects have value 0. Used to restrict influence of background vectors on diffusion process.
  • mu (float) – Weight parameter from Navier-Stokes equation - weights divergence and Laplacian terms. Default value = 5.
  • lamda (float) – Weight parameter from Navier-Stokes equation - used to weight divergence. Default value = 5.
  • iterations (int) – Number of time-steps to use in solving Navier-Stokes. Default value = 10.
  • dt (float) – Timestep to be used in solving Navier-Stokes. Default value = 0.05.

  • im_vx (array_like) – Horizontal component of diffused gradient.
  • im_vy (array_like) – Vertical component of diffused gradient.


[1] G. Li et al., “3D cell nuclei segmentation based on gradient flow tracking,” BMC Cell Biology, vol. 40, no. 8, 2007.

histomicstk.utils.hessian(im_input, sigma)[source]

Calculates the Hessian of the image im_input convolved with a Gaussian kernel with covariance C = [sigma^2 0; 0 sigma^2].

  • im_input (array_like) – M x N grayscale image.
  • sigma (double) – standard deviation of gaussian kernel.

im_hess – M x N x 4 hessian matrix - im_hess[:,:,0] = dxx, im_hess[:,:,1] = im_hess[:,:,2] = dxy, im_hess[:,:,3] = dyy.

Return type:

array_like

histomicstk.utils.merge_colinear(x, y)[source]

Processes boundary coordinates in polyline with vertices X, Y to remove redundant colinear points. Polyline is not assumed to be open or closed.

  • x (array_like) – One dimensional array of horizontal boundary coordinates.
  • y (array_like) – One dimensional array of vertical boundary coordinates.

  • xout (array_like) – X with colinear boundary points removed.
  • yout (array_like) – Y with colinear boundary points removed.
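
One way to detect redundant vertices is to test whether the two segments meeting at a vertex are parallel and point the same way. The sketch below is an assumption-laden illustration, not the library's algorithm: the function name and the tol parameter are hypothetical, the first and last vertices are always kept, and no wrap-around check is made for closed polylines.

```python
import numpy as np


def merge_colinear_sketch(x, y, tol=1e-12):
    """Sketch: drop interior vertices that lie on a straight run."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.size < 3:
        return x.copy(), y.copy()

    # Incoming and outgoing segment at each interior vertex.
    dx1, dy1 = x[1:-1] - x[:-2], y[1:-1] - y[:-2]
    dx2, dy2 = x[2:] - x[1:-1], y[2:] - y[1:-1]

    # Zero cross product means the segments are parallel; a positive dot
    # product means the path continues forward instead of doubling back.
    cross = dx1 * dy2 - dy1 * dx2
    dot = dx1 * dx2 + dy1 * dy2

    keep = np.ones(x.size, dtype=bool)
    keep[1:-1] = ~((np.abs(cross) <= tol) & (dot > 0))
    return x[keep], y[keep]


# An L-shaped path with one redundant point on each leg.
xout, yout = merge_colinear_sketch([0, 1, 2, 2, 2], [0, 0, 0, 1, 2])
```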

histomicstk.utils.fit_poisson_mixture(im_input, mu=None, tol=0.1)[source]

Generates a Poisson mixture model to fit pixel intensities for foreground/background masking.

Takes as input an array or intensity image and optimizes a two-component Poisson mixture model describing the foreground and background intensity distributions. This model can be used to estimate the probability that a pixel belongs to the foreground versus the background. The Poisson distribution is defined over discrete values and so is suitable for integer-valued intensity images. Assumes that foreground intensities are lower (darker) than background.

  • im_input (array_like) – A hematoxylin intensity image obtained from ColorDeconvolution.
  • mu (double) – Optional mean value of signal to optimize. Calculated from input if defined as ‘None’. Default value = None.

  • thresh (double) – Optimal threshold for distinguishing foreground and background.
  • im_fgnd (array_like) – An intensity image with values in the range [0, 1] representing foreground probabilities for each pixel.
  • im_bgnd (array_like) – An intensity image with values in the range [0, 1] representing background probabilities for each pixel.
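
How the two probability images relate to a threshold can be illustrated with a small NumPy sketch. It does not reproduce the optimization performed by this function: the threshold is supplied by the caller, the function name is hypothetical, and the component means and priors are simply estimated from the two sides of that threshold.

```python
import numpy as np


def poisson_posteriors_sketch(im_input, thresh):
    """Sketch: foreground/background posteriors for a given threshold."""
    k = np.asarray(im_input, dtype=float)
    fg = k <= thresh  # foreground is assumed darker than background
    if fg.all() or not fg.any():
        raise ValueError("threshold must split the pixels into two groups")

    p_f = fg.mean()
    p_b = 1.0 - p_f
    mu_f = k[fg].mean()
    mu_b = k[~fg].mean()
    if mu_f <= 0:
        raise ValueError("this sketch needs a positive foreground mean")

    # Log of prior * Poisson pmf. The -log(k!) term is shared by both
    # components and cancels in the posterior, so it is omitted.
    log_f = np.log(p_f) + k * np.log(mu_f) - mu_f
    log_b = np.log(p_b) + k * np.log(mu_b) - mu_b

    # Subtract the per-pixel maximum before exponentiating for stability.
    top = np.maximum(log_f, log_b)
    w_f = np.exp(log_f - top)
    w_b = np.exp(log_b - top)
    im_fgnd = w_f / (w_f + w_b)
    return im_fgnd, 1.0 - im_fgnd
```

A threshold search could then score each candidate threshold by the mixture likelihood and keep the best one.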


[2] Y. Al-Kofahi et al., “Improved Automatic Detection and Segmentation of Cell Nuclei in Histopathology Images,” IEEE Transactions on Biomedical Engineering, vol. 57, no. 4, pp. 847-52, 2010.

histomicstk.utils.sample_pixels(slide_path, sample_fraction=None, magnification=None, tissue_seg_mag=1.25, min_coverage=0.1, background=False, sample_approximate_total=None, tile_grouping=256)[source]

Generates a sampling of pixels from a whole-slide image.

Useful for generating statistics for Reinhard color normalization or adaptive deconvolution. Uses a mixture modeling approach to focus sampling in tissue regions.

  • slide_path (str) – path and filename of slide.
  • sample_fraction (double) – Fraction of pixels to sample. Must be in the range [0, 1].
  • magnification (double) – Desired magnification for sampling. Default value : None (for native scan magnification).
  • tissue_seg_mag (double, optional) – low resolution magnification at which foreground will be segmented. Default value = 1.25.
  • min_coverage (double, optional) – minimum fraction of tile covered by tissue for it to be included in sampling. Must be in the range [0, 1). Default value = 0.1.
  • background (bool, optional) – sample the background instead of the foreground if True. min_coverage then refers to the amount of background. Default value = False
  • sample_approximate_total (int, optional) – use instead of sample_fraction to specify roughly how many pixels to sample. The fewer tiles are excluded, the more accurate this will be.
  • tile_grouping (int, optional) – Number of tiles to process as part of a single task.

pixels – A Nx3 matrix of RGB pixel values sampled from the whole-slide.

Return type:

array_like


If Dask is configured, it is used to distribute the computation.

histomicstk.utils.simple_mask(im_rgb, bandwidth=2, bgnd_std=2.5, tissue_std=30, min_peak_width=10, max_peak_width=25, fraction=0.1, min_tissue_prob=0.05)[source]

Performs segmentation of the foreground (tissue). Uses a simple two-component Gaussian mixture model to mask tissue areas from background in brightfield H&E images. Kernel-density estimation is used to create a smoothed image histogram, and this histogram is then analyzed to identify modes corresponding to tissue and background. The mode peaks are analyzed to estimate their widths, and a constrained optimization is performed to fit Gaussians directly to the histogram (instead of running expectation-maximization directly on the data, which is more prone to local minima). A maximum-likelihood threshold is then derived and used to mask the tissue area in a binarized image.

  • im_rgb (array_like) – An RGB image of type unsigned char.
  • bandwidth (double, optional) – Bandwidth for kernel density estimation - used for smoothing the grayscale histogram. Default value = 2.
  • bgnd_std (double, optional) – Standard deviation of background gaussian to be used if estimation fails. Default value = 2.5.
  • tissue_std (double, optional) – Standard deviation of tissue gaussian to be used if estimation fails. Default value = 30.
  • min_peak_width (double, optional) – Minimum peak width for finding peaks in KDE histogram. Used to initialize curve fitting process. Default value = 10.
  • max_peak_width (double, optional) – Maximum peak width for finding peaks in KDE histogram. Used to initialize curve fitting process. Default value = 25.
  • fraction (double, optional) – Fraction of pixels to sample for building foreground/background model. Default value = 0.10.
  • min_tissue_prob (double, optional) – Minimum probability to qualify as tissue pixel. Default value = 0.05.

im_mask – A binarized version of im_rgb where foreground (tissue) has value ‘1’.

Return type:

array_like