# histomicstk.features

This package contains functions for computing a variety of image-based features that quantify the appearance and/or morphology of objects/regions in the image. These features are needed for classifying objects (e.g. nuclei) and regions (e.g. tissues) found in histopathology images.

histomicstk.features.compute_fsd_features(im_label, K=128, Fs=6, Delta=8, rprops=None)

Calculates Fourier shape descriptors for each object.

Parameters
• im_label (array_like) – A labeled mask image wherein intensity of a pixel is the ID of the object it belongs to. Non-zero values are considered to be foreground objects.

• K (int, optional) – Number of points for boundary resampling to calculate Fourier descriptors. Default value = 128.

• Fs (int, optional) – Number of frequency bins for calculating FSDs. Default value = 6.

• Delta (int, optional) – Used to dilate nuclei and define the cytoplasm region. Default value = 8.

• rprops (output of skimage.measure.regionprops, optional) – rprops = skimage.measure.regionprops(im_label). If rprops is not passed, it is computed internally, which increases the computation time.

Returns

fdata – A pandas DataFrame containing the FSD features for each object/label.

Return type

pandas.DataFrame

References

[1] D. Zhang et al. “A comparative study on shape retrieval using Fourier descriptors with different shape signatures,” In Proc. ICIMADE01, 2001.
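As a rough illustration of what compute_fsd_features measures, the sketch below computes binned Fourier descriptors for one closed boundary that has already been resampled to K points. It uses the centroid-distance signature, one of the shape signatures compared by Zhang et al., and power-of-two bin edges over harmonics 1 to K/2. Both choices, and the function name fourier_shape_descriptors, are assumptions for illustration and may not match HistomicsTK's implementation.

```python
import cmath
import math


def fourier_shape_descriptors(xs, ys, num_bins=6):
    """Hypothetical sketch: binned Fourier descriptors of one closed boundary.

    xs, ys hold the boundary already resampled to K points, K a power of two.
    """
    k = len(xs)
    cx, cy = sum(xs) / k, sum(ys) / k
    # Centroid-distance signature: unchanged by translation.
    r = [math.hypot(x - cx, y - cy) for x, y in zip(xs, ys)]
    # Naive DFT magnitudes for harmonics 0 .. K/2; magnitudes ignore a circular
    # shift of the signal, i.e. the choice of starting point on the boundary.
    mags = [abs(sum(r[t] * cmath.exp(-2j * math.pi * f * t / k) for t in range(k)))
            for f in range(k // 2 + 1)]
    # Divide by the DC term (K times the mean radius) for scale invariance.
    mags = [m / mags[0] for m in mags]
    # Power-of-two bin edges over harmonics 1 .. K/2 (assumed binning scheme).
    top = math.log2(k) - 1
    edges = [round(2 ** (j * top / num_bins)) for j in range(num_bins + 1)]
    edges[-1] += 1  # let the last bin include harmonic K/2
    return [sum(mags[edges[j]:edges[j + 1]]) for j in range(num_bins)]
```

A circle has a constant centroid distance, so every bin is close to 0, while an ellipse concentrates its energy in the second harmonic because its centroid distance repeats twice per revolution.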

histomicstk.features.compute_global_cell_graph_features(centroids, neighbor_distances=array([10., 20., 30., 40., 50.]), neighbor_counts=(3, 5, 7))

Compute global (i.e., not per-nucleus) features of the nuclei with the given centroids based on the partitioning of the space into Voronoi cells and on the induced graph structure.

Parameters
• centroids (array_like) – Nx2 numpy array of nuclear centroids

• neighbor_distances (array_like) – Radii within which to count neighbors

• neighbor_counts (sequence) – Sequence of numbers of neighbors, each of which is used to compute statistics relating to the distance required to reach that many neighbors.

Returns

props – A single-row DataFrame with the following columns:

• voronoi_…: Voronoi diagram features
    • area_…: Polygon area features
    • peri_…: Polygon perimeter features
    • max_dist_…: Maximum distance in polygon features

• delaunay_…: Delaunay triangulation features
    • sides_…: Triangle side length features
    • area_…: Triangle area features

• mst_branches_…: Minimum spanning tree branch features

• density_…: Density features
    • neighbors_in_distance_…
        • 0, 1, …, len(neighbor_distances) - 1: Neighbor count within given radius features.
    • distance_for_neighbors_…
        • 0, 1, …, len(neighbor_counts) - 1: Minimum distance to enclose count neighbors features

Each “…” marks the start of a column name, and nested prefixes are joined to form the full name. Each column name ends with one of ‘mean’, ‘stddev’, ‘min_max_ratio’, or ‘disorder’: ‘min_max_ratio’ is the minimum-to-maximum ratio, and ‘disorder’ is stddev / (mean + stddev).

Return type

pandas.DataFrame
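The four statistics appended to each column name can be sketched in a few lines of Python. This follows the definitions given above (min_max_ratio = min / max, disorder = stddev / (mean + stddev)); the use of the population standard deviation and the helper name summarize are assumptions.

```python
import statistics


def summarize(values):
    """Hypothetical helper: the four statistics that end each column name."""
    mean = statistics.fmean(values)
    stddev = statistics.pstdev(values)  # population standard deviation (assumed)
    return {
        "mean": mean,
        "stddev": stddev,
        "min_max_ratio": min(values) / max(values),
        "disorder": stddev / (mean + stddev),
    }
```

For example, summarize([2, 4, 4, 4, 5, 5, 7, 9]) gives a mean of 5, a standard deviation of 2 and therefore a disorder of 2 / 7.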

Note

The indices for the density features are with respect to the sorted values of the corresponding argument sequence.
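The per-nucleus quantities behind the density features can be sketched with a brute-force search: for each nucleus, count the other nuclei within each radius in neighbor_distances, and record the distance to its n-th nearest neighbor for each n in neighbor_counts. Both argument sequences are sorted first, in line with the note above. Excluding the nucleus itself, counting a neighbor lying exactly on the radius, and the helper name density_features are assumptions; these per-nucleus values would then be summarized with the statistics listed under Returns.

```python
import math


def density_features(centroids, neighbor_distances, neighbor_counts):
    """Hypothetical sketch of per-nucleus density measurements (brute force).

    Returns two nested lists with one row per nucleus: neighbor counts for each
    radius, and the distance needed to reach each neighbor count.
    """
    radii = sorted(neighbor_distances)
    counts = sorted(neighbor_counts)
    counts_in_radius, dist_for_counts = [], []
    for i, p in enumerate(centroids):
        # Sorted distances from nucleus i to every other nucleus (itself excluded).
        dists = sorted(math.dist(p, q) for j, q in enumerate(centroids) if j != i)
        # A neighbor lying exactly on the radius is counted (assumption).
        counts_in_radius.append([sum(d <= r for d in dists) for r in radii])
        # The n-th smallest distance is the smallest radius enclosing n neighbors.
        dist_for_counts.append([dists[n - 1] for n in counts])
    return counts_in_radius, dist_for_counts
```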

References

[2] Doyle, S., Agner, S., Madabhushi, A., Feldman, M., & Tomaszewski, J. (2008, May). Automated grading of breast cancer histopathology using spectral clustering with textural and architectural image features. In Biomedical Imaging: From Nano to Macro, 2008. ISBI 2008. 5th IEEE International Symposium on (pp. 496-499). IEEE.

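histomicstk.features.compute_gradient_features(im_label, im_intensity, num_hist_bins=10, rprops=None)

Calculates gradient features from an intensity image.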

Parameters
• im_label (array_like) – A labeled mask image wherein intensity of a pixel is the ID of the object it belongs to. Non-zero values are considered to be foreground objects.

• im_intensity (array_like) – Intensity image

• num_hist_bins (int, optional) – Number of bins used to compute the gradient histogram of an object. The histogram is used to compute the energy and entropy features. Default is 10.

• rprops (output of skimage.measure.regionprops, optional) – rprops = skimage.measure.regionprops(im_label). If rprops is not passed, it is computed internally, which increases the computation time.

Returns

fdata – A pandas dataframe containing the gradient features listed below for each object/label.

Return type

pandas.DataFrame

Notes

List of gradient features computed by this function:

Skewness of the gradient magnitudes of object pixels. Value is 0 when all values are equal.

Kurtosis of the gradient magnitudes of object pixels. Value is -3 when all values are equal.

Energy of the gradient magnitude histogram of object pixels.

Entropy of the gradient magnitude histogram of object pixels.

Sum of the Canny-filtered gradient data of object pixels.

Mean of the Canny-filtered gradient data of object pixels.
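The gradient magnitude statistics above can be illustrated with a small pure-Python sketch: central differences in the interior and one-sided differences at the image border (the scheme numpy.gradient uses by default), followed by skewness and excess kurtosis computed from population moments. Returning 0 and -3 for constant data mirrors the convention stated above. The helper names are hypothetical, and the exact gradient operator used by HistomicsTK is an assumption here.

```python
import math


def _diff(values):
    """1-D derivative: central differences inside, one-sided at both ends."""
    n = len(values)
    if n < 2:
        return [0.0] * n
    out = [values[1] - values[0]]
    out += [(values[i + 1] - values[i - 1]) / 2 for i in range(1, n - 1)]
    out.append(values[-1] - values[-2])
    return out


def gradient_magnitude(image):
    """Gradient magnitude of a 2-D image given as a list of rows."""
    gx = [_diff(row) for row in image]                 # along columns (x)
    cols = [_diff(list(col)) for col in zip(*image)]   # along rows (y)
    gy = [list(row) for row in zip(*cols)]             # transpose back to rows
    return [[math.hypot(a, b) for a, b in zip(rx, ry)] for rx, ry in zip(gx, gy)]


def skew_kurtosis(values):
    """Skewness and excess (Fisher) kurtosis from population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        return 0.0, -3.0  # convention stated in the docs for constant data
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0
```

In practice only the magnitudes of pixels inside one object's mask would be passed to skew_kurtosis.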

References

[3] Daniel Zwillinger and Stephen Kokoska. “CRC standard probability and statistics tables and formulae,” CRC Press, 1999.

histomicstk.features.compute_haralick_features(im_label, im_intensity, offsets=None, num_levels=None, gray_limits=None, rprops=None)

Calculates 26 Haralick texture features for each object in the given label mask.

These features are derived from the gray-level co-occurrence matrix (GLCM), a two-dimensional histogram containing the counts/probabilities of co-occurring intensity values at a given neighborhood offset within the region occupied by an object in the image.

Parameters
• im_label (array_like) – An ND labeled mask image wherein intensity of a pixel is the ID of the object it belongs to. Non-zero values are considered to be foreground objects.

• im_intensity (array_like) – An ND single channel intensity image

• offsets (array_like, optional) –

A (num_offsets, num_image_dims) array of offset vectors specifying the distance between the pixel-of-interest and its neighbor. Note that the first dimension corresponds to the rows.

See histomicstk.features.graycomatrixext for more details.

• num_levels (unsigned int, optional) –

An integer specifying the number of gray levels. For example, if num_levels is 8, the intensity values of the input image are scaled so they are integers between 0 and 7. The number of gray levels determines the size of the gray-level co-occurrence matrix.

Default: 2 for binary/logical image, 32 for numeric image

• gray_limits (array_like, optional) –

A two-element array specifying the desired input intensity range. Intensity values in the input image will be clipped into this range.

Default: [0, 1] for a boolean-valued image, [0, 255] for an integer-valued image, and [0.0, 1.0] for a real-valued image

Returns

fdata – A pandas dataframe containing the Haralick features.

Return type

pandas.DataFrame

Notes

This function computes the following Haralick features from the normalized GLCMs ($p$) of the given list of neighborhood offsets:

Haralick.ASM.Mean, Haralick.ASM.Range: float

Mean and range of the angular second moment (ASM) feature for GLCMs of all offsets. It is a measure of image homogeneity and is computed as follows:

$ASM = \sum_{i,j=0}^{levels-1} p(i,j)^2$

Haralick.Contrast.Mean, Haralick.Contrast.Range: float

Mean and range of the Contrast feature for GLCMs of all offsets. It is a measure of the amount of variation between intensities of neighboring pixels. It is equal to zero for a constant image and increases as the amount of variation increases. It is computed as follows:

$Contrast = \sum_{i,j=0}^{levels-1} (i-j)^2 p(i,j)$

Haralick.Correlation.Mean, Haralick.Correlation.Range: float

Mean and range of the Correlation feature for GLCMs of all offsets. It is a measure of correlation between the intensity values of neighboring pixels. It is computed as follows, where $\mu_i, \sigma_i$ and $\mu_j, \sigma_j$ are the means and standard deviations of the marginal distributions $p_x$ and $p_y$:

$Correlation = \sum_{i,j=0}^{levels-1} p(i,j)\left[\frac{(i-\mu_i) (j-\mu_j)}{\sigma_i \sigma_j}\right]$

Haralick.SumOfSquares.Mean, Haralick.SumOfSquares.Range: float

Mean and range of the SumOfSquares feature for GLCMs of all offsets. It is a measure of variance and is computed as follows:

$SumOfSquares = \sum_{i,j=0}^{levels-1} (i - \mu)^2 p(i,j)$

Haralick.IDM.Mean, Haralick.IDM.Range: float

Mean and range of the inverse difference moment (IDM) feature for GLCMs of all offsets. It is a measure of homogeneity and is computed as follows:

$IDM = \sum_{i,j=0}^{levels-1} \frac{1}{1 + (i - j)^2} p(i,j)$

Haralick.SumAverage.Mean, Haralick.SumAverage.Range: float

Mean and range of the sum average feature for GLCMs of all offsets. It is computed as follows:

$$\begin{aligned} SumAverage &= \sum_{k=0}^{2(levels-1)} k \, p_{x+y}(k), \qquad \text{where} \\ p_{x+y}(k) &= \sum_{i,j=0}^{levels-1} \delta_{i+j,k} \, p(i,j), \\ \delta_{m,n} &= \begin{cases} 1 & \text{when } m = n \\ 0 & \text{when } m \ne n \end{cases} \end{aligned}$$

Haralick.SumVariance.Mean, Haralick.SumVariance.Range: float

Mean and range of the sum variance feature for GLCMs of all offsets. It is computed as follows:

$SumVariance = \sum_{k=0}^{2(levels-1)} (k - SumEntropy)^2 \, p_{x+y}(k)$

Haralick.SumEntropy.Mean, Haralick.SumEntropy.Range: float

Mean and range of the sum entropy feature for GLCMs of all offsets. It is computed as follows:

$SumEntropy = - \sum_{k=0}^{2(levels-1)} p_{x+y}(k) \log(p_{x+y}(k))$

Haralick.Entropy.Mean, Haralick.Entropy.Range: float

Mean and range of the entropy feature for GLCMs of all offsets. It is computed as follows:

$Entropy = - \sum_{i,j=0}^{levels-1} p(i,j) \log(p(i,j))$

Haralick.DifferenceVariance.Mean, Haralick.DifferenceVariance.Range: float

Mean and range of the difference variance feature for GLCMs of all offsets. It is computed as follows:

$$\begin{aligned} DifferenceVariance &= \text{variance of } p_{x-y}, \qquad \text{where} \\ p_{x-y}(k) &= \sum_{i,j=0}^{levels-1} \delta_{|i-j|,k} \, p(i,j) \end{aligned}$$

Haralick.DifferenceEntropy.Mean, Haralick.DifferenceEntropy.Range: float

Mean and range of the difference entropy feature for GLCMs of all offsets. It is computed as follows:

$DifferenceEntropy = \text{entropy of } p_{x-y}$

Haralick.IMC1.Mean, Haralick.IMC1.Range: float

Mean and range of the first information measure of correlation feature for GLCMs of all offsets. It is computed as follows:

$$\begin{aligned} IMC1 &= \frac{HXY - HXY1}{\max(HX, HY)}, \qquad \text{where} \\ HXY &= -\sum_{i,j=0}^{levels-1} p(i,j) \log(p(i,j)) \\ HXY1 &= -\sum_{i,j=0}^{levels-1} p(i,j) \log(p_x(i) p_y(j)) \\ HX &= -\sum_{i=0}^{levels-1} p_x(i) \log(p_x(i)) \\ HY &= -\sum_{j=0}^{levels-1} p_y(j) \log(p_y(j)) \\ p_x(i) &= \sum_{j=0}^{levels-1} p(i,j) \\ p_y(j) &= \sum_{i=0}^{levels-1} p(i,j) \end{aligned}$$

Haralick.IMC2.Mean, Haralick.IMC2.Range: float

Mean and range of the second information measure of correlation feature for GLCMs of all offsets. It is computed as follows:

$$\begin{aligned} IMC2 &= [1 - \exp(-2(HXY2 - HXY))]^{1/2}, \qquad \text{where} \\ HXY2 &= -\sum_{i,j=0}^{levels-1} p_x(i) p_y(j) \log(p_x(i) p_y(j)) \end{aligned}$$
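To make the formulas above concrete, the sketch below evaluates a few of them (ASM, Contrast, IDM, Entropy and Correlation) on a single normalized GLCM given as nested lists, with 0-based gray levels as in the formulas. It is plain Python for illustration, not HistomicsTK's implementation; the natural logarithm and the function name haralick_subset are assumptions. The library reports the mean and range of each value over all offsets.

```python
import math


def haralick_subset(p):
    """Hypothetical sketch: selected Haralick features of one normalized GLCM p[i][j]."""
    levels = len(p)
    cells = [(i, j, p[i][j]) for i in range(levels) for j in range(levels)]
    px = [sum(row) for row in p]                                        # p_x(i)
    py = [sum(p[i][j] for i in range(levels)) for j in range(levels)]   # p_y(j)
    mu_i = sum(i * v for i, v in enumerate(px))
    mu_j = sum(j * v for j, v in enumerate(py))
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, v in enumerate(px)))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for j, v in enumerate(py)))
    return {
        "ASM": sum(v ** 2 for _, _, v in cells),
        "Contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "IDM": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        # Empty cells are skipped, i.e. 0 * log(0) is treated as 0.
        "Entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
        # Undefined (division by zero) if either marginal has zero variance.
        "Correlation": sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
                       / (sd_i * sd_j),
    }
```

A purely diagonal GLCM (neighbors always share a gray level) gives Contrast 0, IDM 1 and Correlation 1, matching the descriptions of these features.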

References

[4] Haralick, et al. “Textural features for image classification,” IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-3, no. 6, pp. 610-621, 1973.

[5] Luis Pedro Coelho. “Mahotas: Open source software for scriptable computer vision,” Journal of Open Research Software, vol. 1, 2013.
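The sum and difference distributions $p_{x+y}$ and $p_{x-y}$, and the entropies HX, HY, HXY and HXY1 used in the Notes of compute_haralick_features, can be built from a normalized GLCM as sketched below. The sketch assumes 0-based gray levels and natural logarithms, and the helper names are hypothetical. With 0-based levels, SumAverage comes out exactly 2 smaller than with 1-based levels, since every index sum i + j shifts by 2.

```python
import math


def _entropy(probs):
    """Shannon entropy (natural log); zero-probability terms contribute 0."""
    return -sum(v * math.log(v) for v in probs if v > 0)


def sum_diff_features(p):
    """Hypothetical sketch of features built on p_{x+y}, p_{x-y} and the marginals."""
    levels = len(p)
    p_sum = [0.0] * (2 * levels - 1)   # p_{x+y}(k), k = 0 .. 2(levels - 1)
    p_diff = [0.0] * levels            # p_{x-y}(k), k = 0 .. levels - 1
    for i in range(levels):
        for j in range(levels):
            p_sum[i + j] += p[i][j]
            p_diff[abs(i - j)] += p[i][j]
    px = [sum(row) for row in p]
    py = [sum(p[i][j] for i in range(levels)) for j in range(levels)]
    hxy = _entropy([v for row in p for v in row])
    # If p(i,j) > 0 then p_x(i) and p_y(j) are positive, so the log is defined.
    hxy1 = -sum(p[i][j] * math.log(px[i] * py[j])
                for i in range(levels) for j in range(levels) if p[i][j] > 0)
    return {
        "SumAverage": sum(k * v for k, v in enumerate(p_sum)),
        "SumEntropy": _entropy(p_sum),
        "DifferenceEntropy": _entropy(p_diff),
        "IMC1": (hxy - hxy1) / max(_entropy(px), _entropy(py)),
    }
```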

histomicstk.features.compute_intensity_features(im_label, im_intensity, num_hist_bins=10, rprops=None, feature_list=None)

Calculate intensity features from an intensity image.

Parameters
• im_label (array_like) – A labeled mask image wherein intensity of a pixel is the ID of the object it belongs to. Non-zero values are considered to be foreground objects.

• im_intensity (array_like) – Intensity image.

• num_hist_bins (int, optional) – Number of bins used to compute the intensity histogram of an object. The histogram is used to compute the energy and entropy features. Default is 10.

• rprops (output of skimage.measure.regionprops, optional) – rprops = skimage.measure.regionprops(im_label). If rprops is not passed, it is computed internally, which increases the computation time.

• feature_list (list, default is None) – List of intensity features to return. If None, all intensity features are returned.
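num_hist_bins controls the histogram behind the HistEnergy and HistEntropy features. The sketch below assumes equal-width bins spanning the object's minimum to maximum intensity with the last bin closed on the right (as numpy.histogram does by default), energy defined as the sum of squared bin probabilities, and entropy with the natural logarithm. These conventions and the helper name are assumptions and may differ from the library.

```python
import math


def histogram_energy_entropy(values, num_hist_bins=10):
    """Hypothetical sketch: energy and entropy of an equal-width histogram."""
    lo, hi = min(values), max(values)
    counts = [0] * num_hist_bins
    for v in values:
        if hi == lo:
            idx = 0  # constant data: one occupied bin; which one does not matter here
        else:
            # Clamp so that the maximum value falls into the last bin.
            idx = min(int((v - lo) / (hi - lo) * num_hist_bins), num_hist_bins - 1)
        counts[idx] += 1
    probs = [c / len(values) for c in counts]
    energy = sum(q * q for q in probs)
    entropy = -sum(q * math.log(q) for q in probs if q > 0)
    return energy, entropy
```

Energy is 1 and entropy 0 when all pixels share one bin; spreading the pixels evenly over more bins lowers the energy and raises the entropy.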

Returns

fdata – A pandas dataframe containing the intensity features listed below for each object/label.

Return type

pandas.DataFrame

Notes

List of intensity features computed by this function:

Intensity.Min: float

Minimum intensity of object pixels.

Intensity.Max: float

Maximum intensity of object pixels.

Intensity.Mean: float

Mean intensity of object pixels.

Intensity.Median: float

Median intensity of object pixels.

Intensity.MeanMedianDiff: float

Difference between mean and median intensities of object pixels.

Intensity.Std: float

Standard deviation of the intensities of object pixels.

Intensity.IQR: float

Inter-quartile range of the intensities of object pixels.

Intensity.MAD: float

Median absolute deviation of the intensities of object pixels.

Intensity.Skewness: float

Skewness of the intensities of object pixels. Value is 0 when all intensity values are equal.

Intensity.Kurtosis: float

Kurtosis of the intensities of object pixels. Value is -3 when all values are equal.

Intensity.HistEnergy: float

Energy of the intensity histogram of object pixels.

Intensity.HistEntropy: float

Entropy of the intensity histogram of object pixels.
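Several of the order statistics listed above can be reproduced with Python's statistics module. This sketch assumes IQR is the difference of the 75th and 25th percentiles with linear interpolation (statistics.quantiles with method="inclusive", which corresponds to numpy.percentile's default), and MAD is the median of absolute deviations from the median; the helper name is hypothetical.

```python
import statistics


def robust_intensity_stats(values):
    """Hypothetical sketch of order-statistic intensity features for one object."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mean = statistics.fmean(values)
    return {
        "Median": median,
        "MeanMedianDiff": mean - median,
        "IQR": q3 - q1,
        # Median absolute deviation from the median (no consistency scaling).
        "MAD": statistics.median([abs(v - median) for v in values]),
    }
```

A single bright outlier shows why these statistics are useful: for [1, 2, 3, 4, 100] the mean is pulled to 22 (MeanMedianDiff = 19), while the IQR stays at 2 and the MAD at 1.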

References

[6] Daniel Zwillinger and Stephen Kokoska. “CRC standard probability and statistics tables and formulae,” CRC Press, 1999.

histomicstk.features.compute_morphometry_features(im_label, rprops=None)

Calculate morphometry features for each object

Parameters
• im_label (array_like) – A labeled mask image wherein intensity of a pixel is the ID of the object it belongs to. Non-zero values are considered to be foreground objects.

• rprops (output of skimage.measure.regionprops, optional) – rprops = skimage.measure.regionprops(im_label). If rprops is not passed, it is computed internally, which increases the computation time.

Returns

fdata – A pandas dataframe containing the morphometry features for each object/label listed below.

Return type

pandas.DataFrame

Notes

List of morphometry features computed by this function:

Orientation.Orientation: float

Angle between the horizontal axis and the major axis of the ellipse that has the same second moments as the region, ranging from -pi/2 to pi/2 counter-clockwise.

Size.Area: int

Number of pixels the object occupies.

Size.ConvexHullArea: int

Number of pixels of the convex hull image, which is the smallest convex polygon that encloses the region.

Size.MajorAxisLength: float

The length of the major axis of the ellipse that has the same normalized second central moments as the object.

Size.MinorAxisLength: float

The length of the minor axis of the ellipse that has the same normalized second central moments as the object.

Size.Perimeter: float

Perimeter of the object, which approximates the contour as a line through the centers of border pixels using 4-connectivity.

Shape.Circularity: float

A measure of how similar the shape of an object is to a circle.

Shape.Eccentricity: float

A measure of aspect ratio computed as the eccentricity of the ellipse that has the same second moments as the object region. The eccentricity of an ellipse is the ratio of the focal distance (distance between the focal points) to the major axis length. The value is in the interval [0, 1). When it is 0, the ellipse becomes a circle.

Shape.EquivalentDiameter: float

The diameter of a circle with the same area as the object.

Shape.Extent: float

Ratio of the area of the object to that of its axis-aligned bounding box.

Shape.FractalDimension: float

Minkowski–Bouligand dimension, also known as the box-counting dimension. It is a measure of boundary complexity. See https://en.wikipedia.org/wiki/Minkowski%E2%80%93Bouligand_dimension

Shape.MinorMajorAxisRatio: float

A measure of aspect ratio: the ratio of the minor to the major axis of the ellipse that has the same second moments as the object region.

Shape.Solidity: float

A measure of convexity computed as the ratio of the number of pixels in the object to that of its convex hull.

Shape.HuMoments-k: float

The seven Hu moment features, with k ranging from 1 to 7. The first six moments are translation, scale, and rotation invariant, while the seventh moment flips its sign if the shape is a mirror image. See https://learnopencv.com/shape-matching-using-hu-moments-c-python/

Shape.WeightedHuMoments-k: float

Same as the Hu moments, but computed from the intensity image instead of the binary mask.
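Several of the features above derive from the ellipse with the same second central moments as the object. The sketch below computes them from a list of (row, col) pixel coordinates, assuming the convention used by skimage.measure.regionprops (axis length = 4 * sqrt(eigenvalue) of the covariance of the pixel coordinates). Circularity is computed as 4 * pi * area / perimeter^2, a common definition that is assumed here rather than confirmed for HistomicsTK; the perimeter is passed in rather than estimated, and the helper name is hypothetical.

```python
import math


def ellipse_features(pixels, perimeter):
    """Hypothetical sketch: second-moment ellipse features of one object."""
    n = len(pixels)  # area in pixels
    r0 = sum(r for r, _ in pixels) / n
    c0 = sum(c for _, c in pixels) / n
    # Central second moments normalized by area (a 2x2 covariance matrix).
    mu_rr = sum((r - r0) ** 2 for r, _ in pixels) / n
    mu_cc = sum((c - c0) ** 2 for _, c in pixels) / n
    mu_rc = sum((r - r0) * (c - c0) for r, c in pixels) / n
    # Closed-form eigenvalues of [[mu_rr, mu_rc], [mu_rc, mu_cc]], l1 >= l2.
    half_trace = (mu_rr + mu_cc) / 2
    root = math.sqrt(((mu_rr - mu_cc) / 2) ** 2 + mu_rc ** 2)
    l1, l2 = half_trace + root, half_trace - root
    major, minor = 4 * math.sqrt(l1), 4 * math.sqrt(max(l2, 0.0))
    return {
        "MajorAxisLength": major,
        "MinorAxisLength": minor,
        "Eccentricity": math.sqrt(1 - l2 / l1) if l1 > 0 else 0.0,
        "MinorMajorAxisRatio": minor / major if major > 0 else 0.0,
        "Circularity": 4 * math.pi * n / perimeter ** 2,
    }
```

For a filled 3 x 7 rectangle the column variance is 4 and the row variance 2/3, so the major axis length is 8 and the eccentricity is sqrt(5/6).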

histomicstk.features.compute_nuclei_features(im_label, im_nuclei=None, im_cytoplasm=None, fsd_bnd_pts=128, fsd_freq_bins=6, cyto_width=8, num_glcm_levels=32, morphometry_features_flag=True, fsd_features_flag=True, intensity_features_flag=True, gradient_features_flag=True, haralick_features_flag=True)

Calculates features for nuclei classification

Parameters
• im_label (array_like) – A labeled mask image wherein intensity of a pixel is the ID of the object it belongs to. Non-zero values are considered to be foreground objects.

• im_nuclei (array_like) – Nucleus channel intensity image.

• im_cytoplasm (array_like) – Cytoplasm channel intensity image.

• fsd_bnd_pts (int, optional) – Number of points for boundary resampling to calculate Fourier descriptors. Default value = 128.

• fsd_freq_bins (int, optional) – Number of frequency bins for calculating FSDs. Default value = 6.

• cyto_width (float, optional) – Estimated width of the ring-like neighborhood region around each nucleus to be considered as its cytoplasm. Default value = 8.

• num_glcm_levels (int, optional) –

An integer specifying the number of gray levels. For example, if num_glcm_levels is 32, the intensity values of the input image are scaled so they are integers between 0 and 31. The number of gray levels determines the size of the gray-level co-occurrence matrix.

Default: 32

• morphometry_features_flag (bool, optional) – A flag that can be used to specify whether or not to compute morphometry (size and shape) features. See histomicstk.features.compute_morphometry_features for more details.

• fsd_features_flag (bool, optional) – A flag that can be used to specify whether or not to compute Fourier shape descriptor (FSD) features. See histomicstk.features.compute_fsd_features for more details.

• intensity_features_flag (bool, optional) – A flag that can be used to specify whether or not to compute intensity features from the nucleus and cytoplasm channels. See histomicstk.features.compute_intensity_features for more details.

• gradient_features_flag (bool, optional) – A flag that can be used to specify whether or not to compute gradient/edge features from the nucleus and cytoplasm channels. See histomicstk.features.compute_gradient_features for more details.

• haralick_features_flag (bool, optional) – A flag that can be used to specify whether or not to compute Haralick features from the nucleus and cytoplasm channels. See histomicstk.features.compute_haralick_features for more details.
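cyto_width sets the width of the ring-like cytoplasm region around each nucleus. One way to build such a ring, sketched below in plain Python, is to give each background pixel the label of the nearest nuclear pixel lying within cyto_width (Euclidean distance). The brute-force search, the tie-breaking rule and the helper name cytoplasm_ring are assumptions for illustration, not necessarily how HistomicsTK constructs the ring.

```python
import math


def cytoplasm_ring(im_label, cyto_width):
    """Hypothetical sketch: label background pixels within cyto_width of a nucleus."""
    rows, cols = len(im_label), len(im_label[0])
    nuclei = [(r, c, im_label[r][c])
              for r in range(rows) for c in range(cols) if im_label[r][c]]
    ring = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if im_label[r][c]:
                continue  # nuclear pixels are never part of the cytoplasm ring
            best = None
            for nr, nc, lab in nuclei:
                d = math.hypot(r - nr, c - nc)
                # Strict "<" keeps the first nucleus found (row-major) on ties.
                if d <= cyto_width and (best is None or d < best[0]):
                    best = (d, lab)
            if best is not None:
                ring[r][c] = best[1]
    return ring
```

Cytoplasm intensity features would then be computed over the pixels carrying each nucleus's label in the returned ring.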

Returns

fdata – A pandas data frame containing the features listed below for each object/label

Return type

pandas.DataFrame

Notes

List of features computed by this function:

Identifier

Location of the nucleus and its label in the input labeled mask. Columns are prefixed by Identifier (for example, Identifier.Label) and include:

Identifier.Label (int) - nucleus label in the input labeled mask

Identifier.Xmin (int) - Left bound

Identifier.Ymin (int) - Upper bound

Identifier.Xmax (int) - Right bound

Identifier.Ymax (int) - Lower bound

Identifier.CentroidX (float) - X centroid (columns)

Identifier.CentroidY (float) - Y centroid (rows)

Identifier.WeightedCentroidX (float) - intensity-weighted X centroid

Identifier.WeightedCentroidY (float) - intensity-weighted Y centroid

Morphometry (size, shape, and orientation) features of the nuclei

See histomicstk.features.compute_morphometry_features for more details. Feature names are prefixed by Size., Shape., or Orientation.

Fourier shape descriptor features

See histomicstk.features.compute_fsd_features for more details. Feature names are prefixed by FSD.

Intensity features for the nucleus and cytoplasm channels

See histomicstk.features.compute_intensity_features for more details. Feature names are prefixed by Nucleus.Intensity. for nucleus features and Cytoplasm.Intensity. for cytoplasm features.

Gradient/edge features for the nucleus and cytoplasm channels

See histomicstk.features.compute_gradient_features for more details. Feature names are prefixed by Nucleus.Gradient. for nucleus features and Cytoplasm.Gradient. for cytoplasm features.

Haralick features for the nucleus and cytoplasm channels

See histomicstk.features.compute_haralick_features for more details. Feature names are prefixed by Nucleus.Haralick. for nucleus features and Cytoplasm.Haralick. for cytoplasm features.
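The Identifier columns can be illustrated with a pure-Python sketch over a label image and an intensity image given as nested lists (x = column, y = row). It assumes inclusive pixel bounds and weights the centroid by the intensity of each object pixel; whether the library reports inclusive or exclusive maxima, and which channel it uses for weighting, are assumptions here, as is the helper name.

```python
def identifier_columns(im_label, im_intensity):
    """Hypothetical sketch of the Identifier.* columns for every nonzero label."""
    acc = {}
    for y, (lab_row, int_row) in enumerate(zip(im_label, im_intensity)):
        for x, (lab, val) in enumerate(zip(lab_row, int_row)):
            if lab == 0:
                continue  # background
            a = acc.setdefault(lab, {"xs": [], "ys": [], "ws": []})
            a["xs"].append(x)
            a["ys"].append(y)
            a["ws"].append(val)
    out = {}
    for lab, a in sorted(acc.items()):
        n, wsum = len(a["xs"]), sum(a["ws"])  # assumes wsum > 0
        out[lab] = {
            "Identifier.Label": lab,
            "Identifier.Xmin": min(a["xs"]), "Identifier.Ymin": min(a["ys"]),
            "Identifier.Xmax": max(a["xs"]), "Identifier.Ymax": max(a["ys"]),
            "Identifier.CentroidX": sum(a["xs"]) / n,
            "Identifier.CentroidY": sum(a["ys"]) / n,
            "Identifier.WeightedCentroidX":
                sum(x * w for x, w in zip(a["xs"], a["ws"])) / wsum,
            "Identifier.WeightedCentroidY":
                sum(y * w for y, w in zip(a["ys"], a["ws"])) / wsum,
        }
    return out
```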

histomicstk.features.graycomatrixext(im_input, im_roi_mask=None, offsets=None, num_levels=None, gray_limits=None, symmetric=False, normed=False, exclude_boundary=False)

Computes the gray-level co-occurrence matrix (GLCM) within a region of interest (ROI) of an image. The GLCM is a 2D histogram/matrix containing the counts/probabilities of co-occurring intensity values at a given offset within an ROI of an image.

The default values of the optional parameters depend on the input (for example, its dimensionality and data type); each parameter's description below lists them.

Parameters
• im_input (array_like) – Input single channel intensity image

• im_roi_mask (array_like, optional) –

A binary mask specifying the region of interest within which to compute the GLCM. If not specified, the GLCM is computed for the entire image.

Default: None

• offsets (array_like, optional) –

A (num_offsets, num_image_dims) array of offset vectors specifying the distance between the pixel-of-interest and its neighbor. Note that the first dimension corresponds to the rows.

Because this offset is often expressed as an angle, the following table lists the offset values that specify common angles for a 2D image, given the pixel distance D.

| Angle (deg) | offset [y, x] |
|---|---|
| 0 | [0 D] |
| 45 | [-D D] |
| 90 | [-D 0] |
| 135 | [-D -D] |

Default:
- 1D: numpy.array([1])
- 2D: numpy.array([[1, 0], [0, 1], [1, 1], [1, -1]])
- 3D and higher: numpy.identity(num_image_dims)

• num_levels (unsigned int, optional) –

An integer specifying the number of gray levels. For example, if num_levels is 8, the intensity values of the input image are scaled so they are integers between 0 and 7. The number of gray levels determines the size of the gray-level co-occurrence matrix.

Default: 2 for binary/logical image, 32 for numeric image

• gray_limits (array_like, optional) –

A two-element array specifying the desired input intensity range. Intensity values in the input image will be clipped into this range.

Default: [0, 1] for a boolean-valued image, [0, 255] for an integer-valued image, and [0.0, 1.0] for a real-valued image

• symmetric (bool, optional) –

A boolean value that specifies whether to ignore the ordering of values in pixel pairs when creating the GLCM.

For example, if symmetric is True, then when counting the number of times the value 1 is adjacent to the value 2, both the (1, 2) and (2, 1) pairings are counted. A GLCM created in this way is symmetric across its diagonal.

Default: False

• normed (bool, optional) –

A boolean value specifying whether or not to normalize the GLCM.

Default: False

• exclude_boundary (bool, optional) –

Specifies whether or not to exclude a pixel-pair if the neighboring pixel in the pair is outside im_roi_mask. Has an effect only when im_roi_mask is specified.

Default: False

Returns

glcm – num_levels x num_levels x num_offsets array containing the GLCM for each offset.

Return type

array_like
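The counting performed by graycomatrixext can be sketched for the 2-D case in plain Python. The sketch assumes the image is already quantized to integer levels 0 .. num_levels - 1, treats each offset as [dy, dx] as in the angle table, counts a pair only when the first pixel lies in the ROI and its neighbor lies inside the image, and, with exclude_boundary, also requires the neighbor to lie in the ROI. The function name glcm_2d is hypothetical, and the library's intensity scaling and N-D support are not reproduced.

```python
def glcm_2d(image, offsets, num_levels, roi=None, symmetric=False,
            normed=False, exclude_boundary=False):
    """Hypothetical sketch of a 2-D GLCM: returns glcm[i][j][k] for offset k.

    image must already hold integer gray levels 0 .. num_levels - 1.
    """
    rows, cols = len(image), len(image[0])

    def inside(r, c):
        # Without an ROI mask every pixel counts as inside.
        return roi is None or bool(roi[r][c])

    glcm = [[[0.0] * len(offsets) for _ in range(num_levels)]
            for _ in range(num_levels)]
    for k, (dy, dx) in enumerate(offsets):
        for r in range(rows):
            for c in range(cols):
                nr, nc = r + dy, c + dx
                if not (0 <= nr < rows and 0 <= nc < cols) or not inside(r, c):
                    continue  # neighbor is off the image, or the pixel is outside the ROI
                if exclude_boundary and not inside(nr, nc):
                    continue  # drop pairs whose neighbor lies outside the ROI
                i, j = image[r][c], image[nr][nc]
                glcm[i][j][k] += 1
                if symmetric:
                    glcm[j][i][k] += 1  # also count the reversed pairing
        if normed:
            total = sum(glcm[i][j][k] for i in range(num_levels)
                        for j in range(num_levels))
            if total:
                for i in range(num_levels):
                    for j in range(num_levels):
                        glcm[i][j][k] /= total
    return glcm
```

For example, offsets=[(0, 1), (-1, 1)] requests the 0° and 45° co-occurrences at distance 1 from the angle table.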

References

[7] Haralick, R.M., K. Shanmugam, and I. Dinstein, “Textural Features for Image Classification”, IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-3, 1973, pp. 610-621.

[8] Haralick, R.M., and L.G. Shapiro. Computer and Robot Vision: Vol. 1, Addison-Wesley, 1992, p. 459.