histomicstk.features

This package contains functions for computing a variety of image-based features that quantify the appearance and/or morphology of objects/regions in an image. These features are needed for classifying objects (e.g. nuclei) and regions (e.g. tissues) found in histopathology images.

histomicstk.features.compute_fsd_features(im_label, K=128, Fs=6, Delta=8, rprops=None)

Calculates Fourier shape descriptors for each object.

Parameters:

im_label (array_like) – A labeled mask image wherein intensity of a pixel is the ID of the object it belongs to. Non-zero values are considered to be foreground objects.
K (int, optional) – Number of points for boundary resampling to calculate Fourier descriptors. Default value = 128.
Fs (int, optional) – Number of frequency bins for calculating FSDs. Default value = 6.
Delta (int, optional) – Used to dilate nuclei and define cytoplasm region. Default value = 8.
rprops (output of skimage.measure.regionprops, optional) – Output of skimage.measure.regionprops(im_label). If rprops is not passed, it is computed internally, which increases computation time.

Returns:

fdata – Pandas data frame containing the FSD features for each object/label.

Return type:

pandas.DataFrame
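As a rough illustration of the idea behind Fourier shape descriptors (not HistomicsTK's implementation), the sketch below treats K boundary points as complex numbers, takes their discrete Fourier transform, and normalizes the magnitudes by the first harmonic. The helper name fourier_magnitudes and the normalization are assumptions made for this example, and the grouping of frequencies into Fs bins is not reproduced.

```python
import cmath
import math


def fourier_magnitudes(points):
    """Return scale-normalized DFT magnitudes of a closed boundary.

    points: list of (x, y) boundary samples, assumed evenly spaced and
    traversed counter-clockwise. Hypothetical helper for illustration only.
    """
    n = len(points)
    z = [complex(x, y) for x, y in points]
    # Plain O(n^2) discrete Fourier transform of the complex boundary signal.
    coeffs = [
        sum(z[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n)) / n
        for f in range(n)
    ]
    # Dropping coeffs[0] removes the dependence on position (translation);
    # dividing by |coeffs[1]| removes the dependence on scale.
    scale = abs(coeffs[1])
    return [abs(c) / scale for c in coeffs[1:]]


# A circle sampled at K = 16 points has all its energy in the first harmonic.
K = 16
circle = [(5 + 3 * math.cos(2 * math.pi * k / K),
           7 + 3 * math.sin(2 * math.pi * k / K)) for k in range(K)]
mags = fourier_magnitudes(circle)
```

For less regular shapes, energy spreads into the higher harmonics, which is what makes these magnitudes useful as shape features.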

histomicstk.features.compute_global_cell_graph_features(centroids, neighbor_distances=None, neighbor_counts=(3, 5, 7))

Compute global (i.e., not per-nucleus) features of the nuclei with the given centroids based on the partitioning of the space into Voronoi cells and on the induced graph structure.

Parameters:

centroids (array_like) – Nx2 numpy array of nuclear centroids
neighbor_distances (array_like) – Radii within which to count neighbors.
neighbor_counts (sequence) – Sequence of numbers of neighbors, each of which is used to compute statistics relating to the distance required to reach that many neighbors.
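To make the two density arguments concrete, here is a minimal brute-force sketch in plain Python. The function names neighbors_within and distance_to_kth_neighbor are hypothetical, and treating the radius as inclusive is an assumption; the library's own implementation may differ.

```python
import math


def neighbors_within(centroids, radius):
    """For each point, count the other points no farther than radius."""
    return [
        sum(1 for j, q in enumerate(centroids)
            if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(centroids)
    ]


def distance_to_kth_neighbor(centroids, k):
    """For each point, the smallest radius that encloses k other points."""
    result = []
    for i, p in enumerate(centroids):
        dists = sorted(math.dist(p, q)
                       for j, q in enumerate(centroids) if j != i)
        result.append(dists[k - 1])
    return result


# Four centroids on the corners of a 3 x 4 rectangle.
centroids = [(0, 0), (3, 0), (0, 4), (3, 4)]
counts = neighbors_within(centroids, radius=4.5)
reach = distance_to_kth_neighbor(centroids, k=3)
```

Each corner sees its two adjacent corners within a radius of 4.5, and needs the full diagonal to reach all three neighbors.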

Returns:

props – A single-row DataFrame with the following columns:

voronoi_…: Voronoi diagram features
- area_…: Polygon area features
- peri_…: Polygon perimeter features
- max_dist_…: Maximum distance in polygon features
delaunay_…: Delaunay triangulation features
- sides_…: Triangle side length features
- area_…: Triangle area features
mst_branches_…: Minimum spanning tree branch features
density_…: Density features
- neighbors_in_distance_…
  - 0, 1, …, len(neighbor_distances) - 1: Neighbor count within given radius features.
- distance_for_neighbors_…
  - 0, 1, …, len(neighbor_counts) - 1: Minimum distance to enclose count neighbors features

The “…”s are meant to signify that what precedes is the start of a column name. At the end of each column name is one of ‘mean’, ‘stddev’, ‘min_max_ratio’, and ‘disorder’. ‘min_max_ratio’ is the minimum-to-maximum ratio, and disorder is stddev / (mean + stddev).

Return type:

pandas.DataFrame

Note

The indices for the density features are with respect to the sorted values of the corresponding argument sequence.
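The four column-name suffixes can be sketched as follows for any list of per-cell measurements (polygon areas, triangle side lengths, and so on). The function name summarize is hypothetical, and the use of the population standard deviation is an assumption.

```python
import statistics


def summarize(values):
    """Summaries named after the column suffixes described above."""
    mean = statistics.fmean(values)
    # Population standard deviation; whether the library uses the population
    # or the sample form is an assumption here.
    stddev = statistics.pstdev(values)
    return {
        'mean': mean,
        'stddev': stddev,
        'min_max_ratio': min(values) / max(values),
        'disorder': stddev / (mean + stddev),
    }


stats = summarize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

With this definition, disorder is 0 for perfectly uniform values and approaches 1 as the spread grows relative to the mean.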

histomicstk.features.compute_gradient_features(im_label, im_intensity, num_hist_bins=10, rprops=None)

Calculates gradient features from an intensity image.

Parameters:

im_label (array_like) – A labeled mask image wherein intensity of a pixel is the ID of the object it belongs to. Non-zero values are considered to be foreground objects.
im_intensity (array_like) – Intensity image
num_hist_bins (int, optional) – Number of bins used to compute the gradient histogram of an object. The histogram is used to compute the energy and entropy features. Default is 10.
rprops (output of skimage.measure.regionprops, optional) – Output of skimage.measure.regionprops(im_label). If rprops is not passed, it is computed internally, which increases computation time.

Returns:

fdata – A pandas dataframe containing the gradient features listed below for each object/label.

Return type:

pandas.DataFrame

Notes

List of gradient features computed by this function:

Gradient.Mag.Mean (float): Mean of gradient data.
Gradient.Mag.Std (float): Standard deviation of gradient data.
Gradient.Mag.Skewness (float): Skewness of gradient data. Value is 0 when all values are equal.
Gradient.Mag.Kurtosis (float): Kurtosis of gradient data. Value is -3 when all values are equal.
Gradient.Mag.HistEnergy (float): Energy of the gradient magnitude histogram of object pixels.
Gradient.Mag.HistEntropy (float): Entropy of the gradient magnitude histogram of object pixels.
Gradient.Canny.Sum (float): Sum of Canny-filtered gradient data.
Gradient.Canny.Mean (float): Mean of Canny-filtered gradient data.
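The histogram energy and entropy features can be illustrated with a small standard-library sketch. The function name histogram_energy_entropy, the equal-width binning between the minimum and maximum value, and the base-2 logarithm are all assumptions made for this example.

```python
import math


def histogram_energy_entropy(values, num_bins=10):
    """Energy and entropy of an equal-width histogram of values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1.0  # avoid zero width for constant data
    counts = [0] * num_bins
    for v in values:
        # The maximum value is placed in the last bin.
        counts[min(int((v - lo) / width), num_bins - 1)] += 1
    probs = [c / len(values) for c in counts]
    energy = sum(p * p for p in probs)
    # Base-2 logarithm is an assumption; empty bins contribute nothing.
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return energy, entropy


# Four gradient magnitudes split evenly between two bins.
energy, entropy = histogram_energy_entropy([0.0, 0.1, 0.9, 1.0], num_bins=2)
```

Energy is highest when all pixels fall in one bin, while entropy is highest when the pixels are spread evenly across bins.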

histomicstk.features.compute_haralick_features(im_label, im_intensity, offsets=None, num_levels=None, gray_limits=None, rprops=None)

Calculates 26 Haralick texture features for each object in the given label mask.

These features are derived from the gray-level co-occurrence matrix (GLCM), a two-dimensional histogram containing the counts/probabilities of co-occurring intensity values at a given neighborhood offset in the region occupied by an object in the image.

Parameters:

im_label (array_like) – An ND labeled mask image wherein intensity of a pixel is the ID of the object it belongs to. Non-zero values are considered to be foreground objects.
im_intensity (array_like) – An ND single channel intensity image.
offsets (array_like, optional) –
A (num_offsets, num_image_dims) array of offset vectors specifying the distance between the pixel-of-interest and its neighbor. Note that the first dimension corresponds to the rows.

See histomicstk.features.graycomatrixext for more details.
num_levels (unsigned int, optional) –
An integer specifying the number of gray levels. For example, if num_levels is 8, the intensity values of the input image are scaled so they are integers between 1 and 8. The number of gray levels determines the size of the gray-level co-occurrence matrix.

Default: 2 for binary/logical image, 32 for numeric image
gray_limits (array_like, optional) –
A two-element array specifying the desired input intensity range. Intensity values in the input image will be clipped into this range.

Default: [0, 1] for boolean-valued image, [0, 255] for integer-valued image, and [0.0, 1.0] for real-valued image
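The interplay of gray_limits and num_levels can be sketched as a clip followed by a rescale. This is only an illustration: the function name quantize is hypothetical, the equal-width mapping is an assumption, and levels are numbered from zero here, so the library's exact scaling may differ.

```python
def quantize(values, num_levels=32, gray_limits=(0, 255)):
    """Clip values to gray_limits and map them to levels 0..num_levels-1."""
    lo, hi = gray_limits
    levels = []
    for v in values:
        clipped = min(max(v, lo), hi)
        # Scale to [0, num_levels) and keep the top value in the last level.
        level = int((clipped - lo) / (hi - lo) * num_levels)
        levels.append(min(level, num_levels - 1))
    return levels


levels = quantize([-5, 0, 31, 32, 128, 255, 300], num_levels=8)
```

Values outside gray_limits end up in the first or last level, and the resulting GLCM has num_levels rows and columns.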

Returns:

fdata – A pandas dataframe containing the Haralick features.

Return type:

pandas.DataFrame

Notes

This function computes the following Haralick features derived from the normalized GLCMs (P) of the given list of neighborhood offsets:

Haralick.ASM.Mean, Haralick.ASM.Range (float): Mean and range of the angular second moment (ASM) feature for GLCMs of all offsets. It is a measure of image homogeneity and is computed as follows:

\[ASM = \sum_{i,j=0}^{levels-1} p(i,j)^2\]
Haralick.Contrast.Mean, Haralick.Contrast.Range (float): Mean and range of the Contrast feature for GLCMs of all offsets. It is a measure of the amount of variation between intensities of neighboring pixels. It is equal to zero for a constant image and increases as the amount of variation increases. It is computed as follows:

\[Contrast = \sum_{i,j=0}^{levels-1} (i-j)^2 p(i,j)\]
Haralick.Correlation.Mean, Haralick.Correlation.Range (float): Mean and range of the Correlation feature for GLCMs of all offsets. It is a measure of correlation between the intensity values of neighboring pixels. It is computed as follows:

\[Correlation = \sum_{i,j=0}^{levels-1} p(i,j)\left[\frac{(i-\mu_i) (j-\mu_j)}{\sigma_i \sigma_j}\right]\]
Haralick.SumOfSquares.Mean, Haralick.SumOfSquares.Range (float): Mean and range of the SumOfSquares feature for GLCMs of all offsets. It is a measure of variance and is computed as follows:

\[SumofSquare = \sum_{i,j=0}^{levels-1} (i - \mu)^2 p(i,j)\]
Haralick.IDM.Mean, Haralick.IDM.Range (float): Mean and range of the inverse difference moment (IDM) feature for GLCMs of all offsets. It is a measure of homogeneity and is computed as follows:

\[IDM = \sum_{i,j=0}^{levels-1} \frac{1}{1 + (i - j)^2} p(i,j)\]
Haralick.SumAverage.Mean, Haralick.SumAverage.Range (float): Mean and range of the sum average feature for GLCMs of all offsets. It is computed as follows:

\[ \begin{aligned} SumAverage &= \sum_{k=2}^{2 levels} k \, p_{x+y}(k), \qquad \text{where} \\ p_{x+y}(k) &= \sum_{i,j=0}^{levels-1} \delta_{i+j, k} \, p(i,j), \\ \delta_{m,n} &= \begin{cases} 1 & \text{when } m=n \\ 0 & \text{when } m \ne n \end{cases} \end{aligned} \]
Haralick.SumVariance.Mean, Haralick.SumVariance.Range (float): Mean and range of the sum variance feature for GLCMs of all offsets. It is computed as follows:

\[SumVariance = \sum_{k=2}^{2 levels} (k - SumEntropy)^2 p_{x+y}(k)\]
Haralick.SumEntropy.Mean, Haralick.SumEntropy.Range (float): Mean and range of the sum entropy feature for GLCMs of all offsets. It is computed as follows:

\[SumEntropy = - \sum_{k=2}^{2 levels} p_{x+y}(k) \log(p_{x+y}(k))\]
Haralick.Entropy.Mean, Haralick.Entropy.Range (float): Mean and range of the entropy feature for GLCMs of all offsets. It is computed as follows:

\[Entropy = - \sum_{i,j=0}^{levels-1} p(i,j) \log(p(i,j))\]
Haralick.DifferenceVariance.Mean, Haralick.DifferenceVariance.Range (float): Mean and range of the difference variance feature for GLCMs of all offsets. It is computed as follows:

\[ \begin{aligned} DifferenceVariance &= \text{variance of } p_{x-y}, \qquad \text{where} \\ p_{x-y}(k) &= \sum_{i,j=0}^{levels-1} \delta_{|i-j|, k} \, p(i,j) \end{aligned} \]
Haralick.DifferenceEntropy.Mean, Haralick.DifferenceEntropy.Range (float): Mean and range of the difference entropy feature for GLCMs of all offsets. It is computed as follows:

\[DifferenceEntropy = {\rm entropy \ of ~} p_{x-y}\]
Haralick.IMC1.Mean, Haralick.IMC1.Range (float): Mean and range of the first information measure of correlation feature for GLCMs of all offsets. It is computed as follows:

\[ \begin{aligned} IMC1 &= \frac{HXY - HXY1}{\max(HX,HY)}, \qquad \text{where} \\ HXY &= -\sum_{i,j=0}^{levels-1} p(i,j) \log(p(i,j)) \\ HXY1 &= -\sum_{i,j=0}^{levels-1} p(i,j) \log(p_x(i) p_y(j)) \\ HX &= -\sum_{i=0}^{levels-1} p_x(i) \log(p_x(i)) \\ HY &= -\sum_{j=0}^{levels-1} p_y(j) \log(p_y(j)) \\ p_x(i) &= \sum_{j=0}^{levels-1} p(i,j) \\ p_y(j) &= \sum_{i=0}^{levels-1} p(i,j) \end{aligned} \]
Haralick.IMC2.Mean, Haralick.IMC2.Range (float): Mean and range of the second information measure of correlation feature for GLCMs of all offsets. It is computed as follows:

\[ \begin{aligned} IMC2 &= [1 - \exp(-2(HXY2 - HXY))]^{1/2}, \qquad \text{where} \\ HXY2 &= -\sum_{i,j=0}^{levels-1} p_x(i) p_y(j) \log(p_x(i) p_y(j)) \end{aligned} \]
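A few of the formulas above can be checked by hand on a tiny image. The sketch below builds a normalized GLCM for one offset over a whole image (not per object), evaluates ASM, Contrast, IDM and Entropy, and then takes the mean and range across offsets. The function names are hypothetical, the counts are not symmetrized, the natural logarithm is used, and "range" is taken to mean maximum minus minimum; all of these are assumptions rather than statements about the library.

```python
import math


def normalized_glcm(image, offset, num_levels):
    """Normalized co-occurrence matrix p(i, j) for one [row, col] offset."""
    dy, dx = offset
    rows, cols = len(image), len(image[0])
    counts = [[0] * num_levels for _ in range(num_levels)]
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                counts[image[y][x]][image[ny][nx]] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]


def haralick_subset(p):
    """ASM, Contrast, IDM and Entropy of a normalized GLCM."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    return {
        'ASM': sum(v * v for _, _, v in cells),
        'Contrast': sum((i - j) ** 2 * v for i, j, v in cells),
        'IDM': sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        # Natural logarithm is an assumption; zero entries are skipped.
        'Entropy': -sum(v * math.log(v) for _, _, v in cells if v > 0),
    }


# A 3x3 image already quantized to 2 gray levels (values 0 and 1).
image = [[0, 0, 1],
         [0, 1, 1],
         [1, 1, 1]]
offsets = [(0, 1), (-1, 0)]  # 0 and 90 degrees at distance 1
per_offset = [haralick_subset(normalized_glcm(image, o, 2)) for o in offsets]

# Mean and range across offsets, as in the *.Mean / *.Range columns.
contrast = [f['Contrast'] for f in per_offset]
contrast_mean = sum(contrast) / len(contrast)
contrast_range = max(contrast) - min(contrast)
```

In this example both offsets see two 0/1 transitions out of six pixel pairs, so the contrast is the same in both directions and its range is zero.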

histomicstk.features.compute_intensity_features(im_label, im_intensity, num_hist_bins=10, rprops=None, feature_list=None)

Calculate intensity features from an intensity image.

Parameters:

im_label (array_like) – A labeled mask image wherein intensity of a pixel is the ID of the object it belongs to. Non-zero values are considered to be foreground objects.
im_intensity (array_like) – Intensity image.
num_hist_bins (int, optional) – Number of bins used to compute the intensity histogram of an object. The histogram is used to compute the energy and entropy features. Default is 10.
rprops (output of skimage.measure.regionprops, optional) – Output of skimage.measure.regionprops(im_label). If rprops is not passed, it is computed internally, which increases computation time.
feature_list (list, optional) – List of intensity features to return. If None, all intensity features are returned.

Returns:

fdata – A pandas dataframe containing the intensity features listed below for each object/label.

Return type:

pandas.DataFrame

Notes

List of intensity features computed by this function:

Intensity.Min (float): Minimum intensity of object pixels.
Intensity.Max (float): Maximum intensity of object pixels.
Intensity.Mean (float): Mean intensity of object pixels.
Intensity.Median (float): Median intensity of object pixels.
Intensity.MeanMedianDiff (float): Difference between mean and median intensities of object pixels.
Intensity.Std (float): Standard deviation of the intensities of object pixels.
Intensity.IQR (float): Inter-quartile range of the intensities of object pixels.
Intensity.MAD (float): Median absolute deviation of the intensities of object pixels.
Intensity.Skewness (float): Skewness of the intensities of object pixels. Value is 0 when all intensity values are equal.
Intensity.Kurtosis (float): Kurtosis of the intensities of object pixels. Value is -3 when all values are equal.
Intensity.HistEnergy (float): Energy of the intensity histogram of object pixels.
Intensity.HistEntropy (float): Entropy of the intensity histogram of object pixels.
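Several of these statistics are simple to reproduce for one object's pixel values. The following standard-library sketch is illustrative only: the function name intensity_summary is hypothetical, and the quartile convention and the unscaled MAD are assumptions that may not match the library.

```python
import statistics


def intensity_summary(pixels):
    """A subset of the intensity statistics for one object's pixels."""
    mean = statistics.fmean(pixels)
    median = statistics.median(pixels)
    # Quartile conventions differ between libraries; 'inclusive' is an
    # assumption made for this sketch.
    q1, _, q3 = statistics.quantiles(pixels, n=4, method='inclusive')
    return {
        'Intensity.Min': min(pixels),
        'Intensity.Max': max(pixels),
        'Intensity.Mean': mean,
        'Intensity.Median': median,
        'Intensity.MeanMedianDiff': mean - median,
        'Intensity.IQR': q3 - q1,
        # Median absolute deviation, without any scaling constant.
        'Intensity.MAD': statistics.median(abs(v - median) for v in pixels),
    }


features = intensity_summary([10, 12, 14, 16, 48])
```

The single bright pixel pulls the mean well above the median, while the IQR and MAD stay small, which is why the robust statistics are reported alongside the mean and standard deviation.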

histomicstk.features.compute_morphometry_features(im_label, rprops=None)

Calculates morphometry features for each object.

Parameters:

im_label (array_like) – A labeled mask image wherein intensity of a pixel is the ID of the object it belongs to. Non-zero values are considered to be foreground objects.
rprops (output of skimage.measure.regionprops, optional) – Output of skimage.measure.regionprops(im_label). If rprops is not passed, it is computed internally, which increases computation time.

Returns:

fdata – A pandas dataframe containing the morphometry features listed below for each object/label.

Return type:

pandas.DataFrame

Notes

List of morphometry features computed by this function:

Orientation.Orientation (float): Angle between the horizontal axis and the major axis of the ellipse that has the same second moments as the region, ranging from -pi/2 to pi/2 counter-clockwise.
Size.Area (int): Number of pixels the object occupies.
Size.ConvexHullArea (int): Number of pixels of the convex hull image, which is the smallest convex polygon that encloses the region.
Size.MajorAxisLength (float): The length of the major axis of the ellipse that has the same normalized second central moments as the object.
Size.MinorAxisLength (float): The length of the minor axis of the ellipse that has the same normalized second central moments as the region.
Size.Perimeter (float): Perimeter of the object, which approximates the contour as a line through the centers of border pixels using 4-connectivity.
Shape.Circularity (float): A measure of how similar the shape of an object is to a circle.
Shape.Eccentricity (float): A measure of aspect ratio, computed as the eccentricity of the ellipse that has the same second moments as the object region. The eccentricity of an ellipse is the ratio of the focal distance (distance between focal points) to the major axis length. The value is in the interval [0, 1). When it is 0, the ellipse is a circle.
Shape.EquivalentDiameter (float): The diameter of a circle with the same area as the object.
Shape.Extent (float): Ratio of the area of the object to the area of its axis-aligned bounding box.
Shape.FractalDimension (float): Minkowski–Bouligand dimension, also known as the box-counting dimension. It is a measure of boundary complexity. See https://en.wikipedia.org/wiki/Minkowski%E2%80%93Bouligand_dimension
Shape.MinorMajorAxisRatio (float): A measure of aspect ratio. Ratio of the minor to the major axis of the ellipse that has the same second moments as the object region.
Shape.Solidity (float): A measure of convexity computed as the ratio of the number of pixels in the object to that of its convex hull.
Shape.HuMoments-k (float): The seven Hu moment features, where k ranges from 1 to 7. The first six moments are translation, scale and rotation invariant, while the seventh moment flips its sign if the shape is a mirror image. See https://learnopencv.com/shape-matching-using-hu-moments-c-python/
Shape.WeightedHuMoments-k (float): Same as the Hu moments, but computed from the intensity image instead of the binary mask.
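Several of the shape features are ratios of the size features and can be written down directly. The sketch below is an illustration under stated assumptions: the function name shape_measures is hypothetical, and the circularity formula 4*pi*A / P^2 is a common definition that the library may or may not use.

```python
import math


def shape_measures(area, perimeter, major_axis, minor_axis,
                   bbox_area, convex_area):
    """Shape ratios from basic size measurements of one object."""
    return {
        # 4*pi*A / P^2 is a common definition of circularity (1 for a
        # perfect circle); that the library uses this form is an assumption.
        'Circularity': 4 * math.pi * area / perimeter ** 2,
        # Focal distance over major axis length.
        'Eccentricity': math.sqrt(1 - (minor_axis / major_axis) ** 2),
        'EquivalentDiameter': math.sqrt(4 * area / math.pi),
        'Extent': area / bbox_area,
        'MinorMajorAxisRatio': minor_axis / major_axis,
        'Solidity': area / convex_area,
    }


# An ideal disc of radius 10 (continuous geometry, not pixel counts).
r = 10.0
disc = shape_measures(area=math.pi * r ** 2, perimeter=2 * math.pi * r,
                      major_axis=2 * r, minor_axis=2 * r,
                      bbox_area=(2 * r) ** 2, convex_area=math.pi * r ** 2)
```

For the ideal disc, circularity and solidity are 1, eccentricity is 0, and the extent is pi/4; values measured on pixel masks will deviate slightly because of discretization.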

histomicstk.features.compute_nuclei_features(im_label, im_nuclei=None, im_cytoplasm=None, fsd_bnd_pts=128, fsd_freq_bins=6, cyto_width=8, num_glcm_levels=32, morphometry_features_flag=True, fsd_features_flag=True, intensity_features_flag=True, gradient_features_flag=True, haralick_features_flag=True, tile_info=None, im_nuclei_seg_mask=None, format=None, return_nuclei_annotation=False)

Calculates features for nuclei classification.

Parameters:

im_label (array_like) – A labeled mask image wherein intensity of a pixel is the ID of the object it belongs to. Non-zero values are considered to be foreground objects.
im_nuclei (array_like) – Nucleus channel intensity image.
im_cytoplasm (array_like) – Cytoplasm channel intensity image.
fsd_bnd_pts (int, optional) – Number of points for boundary resampling to calculate Fourier descriptors. Default value = 128.
fsd_freq_bins (int, optional) – Number of frequency bins for calculating FSDs. Default value = 6.
cyto_width (float, optional) – Estimated width of the ring-like neighborhood region around each nucleus to be considered as its cytoplasm. Default value = 8.
num_glcm_levels (int, optional) –
An integer specifying the number of gray levels. For example, if num_glcm_levels is 32, the intensity values of the input image are scaled so they are integers between 0 and 31. The number of gray levels determines the size of the gray-level co-occurrence matrix.

Default: 32
morphometry_features_flag (bool, optional) – A flag that can be used to specify whether or not to compute morphometry (size and shape) features. See histomicstk.features.compute_morphometry_features for more details.
fsd_features_flag (bool, optional) – A flag that can be used to specify whether or not to compute Fourier shape descriptor (FSD) features. See histomicstk.features.compute_fsd_features for more details.
intensity_features_flag (bool, optional) – A flag that can be used to specify whether or not to compute intensity features from the nucleus and cytoplasm channels. See histomicstk.features.compute_intensity_features for more details.
gradient_features_flag (bool, optional) – A flag that can be used to specify whether or not to compute gradient/edge features from the nucleus and cytoplasm channels. See histomicstk.features.compute_gradient_features for more details.
haralick_features_flag (bool, optional) – A flag that can be used to specify whether or not to compute Haralick features from the nucleus and cytoplasm channels. See histomicstk.features.compute_haralick_features for more details.
return_nuclei_annotation (bool, optional) – If True, the nuclei annotation is also returned.

Returns:

fdata (pandas.DataFrame) – A pandas data frame containing the features listed below for each object/label.
nuclei_annot_list (List) – List containing the boundaries of segmented nuclei in the input image.

Notes

List of features computed by this function:

Identifier

Location of the nucleus and its code in the input labeled mask. Columns are prefixed by Identifier.. These include …

Identifier.Label (int) - nucleus label in the input labeled mask

Identifier.Xmin (int) - Left bound

Identifier.Ymin (int) - Upper bound

Identifier.Xmax (int) - Right bound

Identifier.Ymax (int) - Lower bound

Identifier.CentroidX (float) - X centroid (columns)

Identifier.CentroidY (float) - Y centroid (rows)

Identifier.WeightedCentroidX (float) - intensity-weighted X centroid

Identifier.WeightedCentroidY (float) - intensity-weighted Y centroid
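The unweighted identifier columns can be sketched for a single label in a small mask. The function name locate is hypothetical, and reporting the upper bounds inclusively is an assumption; the library may use a different convention.

```python
def locate(im_label, label):
    """Bounding box and centroid of one label in a 2D list-of-lists mask."""
    ys = [y for y, row in enumerate(im_label) for v in row if v == label]
    xs = [x for row in im_label for x, v in enumerate(row) if v == label]
    return {
        'Identifier.Label': label,
        # Inclusive pixel bounds; whether the library reports the upper
        # bounds inclusively or exclusively is an assumption here.
        'Identifier.Xmin': min(xs), 'Identifier.Ymin': min(ys),
        'Identifier.Xmax': max(xs), 'Identifier.Ymax': max(ys),
        'Identifier.CentroidX': sum(xs) / len(xs),  # columns
        'Identifier.CentroidY': sum(ys) / len(ys),  # rows
    }


im_label = [[0, 0, 0, 0],
            [0, 2, 2, 0],
            [0, 2, 2, 2],
            [0, 0, 0, 0]]
info = locate(im_label, label=2)
```

Note that X refers to columns and Y to rows, matching the CentroidX and CentroidY descriptions above.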

Morphometry (size, shape, and orientation) features of the nuclei

See histomicstk.features.compute_morphometry_features for more details. Feature names prefixed by Size., Shape., or Orientation..

Fourier shape descriptor features

See histomicstk.features.compute_fsd_features for more details. Feature names are prefixed by FSD.

Intensity features for the nucleus and cytoplasm channels

See histomicstk.features.compute_intensity_features for more details. Feature names are prefixed by Nucleus.Intensity. for nucleus features and Cytoplasm.Intensity. for cytoplasm features.

Gradient/edge features for the nucleus and cytoplasm channels

See histomicstk.features.compute_gradient_features for more details. Feature names are prefixed by Nucleus.Gradient. for nucleus features and Cytoplasm.Gradient. for cytoplasm features.

Haralick features for the nucleus and cytoplasm channels

See histomicstk.features.compute_haralick_features for more details. Feature names are prefixed by Nucleus.Haralick. for nucleus features and Cytoplasm.Haralick. for cytoplasm features.
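Because every feature family shares a prefix, columns can be selected by name. The sketch below groups a handful of column names by prefix; the function name group_by_prefix is hypothetical and the full column names are illustrative guesses built from the prefixes above, not a verified list.

```python
def group_by_prefix(columns, prefixes):
    """Map each prefix to the column names that start with it."""
    return {p: [c for c in columns if c.startswith(p)] for p in prefixes}


# A few column names following the naming scheme described above
# (illustrative; the exact full names are an assumption).
columns = [
    'Identifier.Label', 'Size.Area', 'Shape.Solidity',
    'Nucleus.Intensity.Mean', 'Cytoplasm.Intensity.Mean',
    'Nucleus.Gradient.Mag.Mean', 'Nucleus.Haralick.ASM.Mean',
]
groups = group_by_prefix(columns, ['Identifier.', 'Nucleus.', 'Cytoplasm.'])
```

With a pandas DataFrame, the same function could be applied to the list of column names, for example to keep only the nucleus-channel features before training a classifier.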

Angle (deg)    Offset [y, x]
0              [0, D]
45             [-D, D]
90             [-D, 0]
135            [-D, -D]
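The angle-to-offset mapping in the table can be generated for any pixel distance D. The function name glcm_offsets is hypothetical and is used here only for illustration.

```python
def glcm_offsets(distance):
    """[y, x] offsets for 0, 45, 90 and 135 degrees at a pixel distance."""
    d = distance
    return {0: [0, d], 45: [-d, d], 90: [-d, 0], 135: [-d, -d]}


offsets = glcm_offsets(2)
```

Negative y values point upward in the image because the first dimension corresponds to rows.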