Finding cellular regions with superpixel analysis
Overview:
Whole-slide images often contain artifacts like marker or acellular regions that need to be avoided during analysis. In this example we show how HistomicsTK can be used to develop saliency detection algorithms that segment the slide at low magnification to generate a map to guide higher magnification analyses. Here we show how superpixel analysis can be used to locate hypercellular regions that correspond to tumor-rich content.
This approach uses Simple Linear Iterative Clustering (SLIC) to compute superpixels at a low slide magnification and detect cellular regions. The first step of the pipeline detects tissue regions (i.e. individual tissue pieces) using the get_tissue_mask method of the histomicstk.saliency module. Then, each tissue piece is processed separately for accuracy and disk space efficiency. Keep in mind that this does NOT rely on a tile iterator. Instead, it loads the entire tissue region (but NOT the whole slide) into memory and passes it to the skimage.segmentation.slic function. Not using a tile iterator helps keep the superpixels large enough to correspond to tissue boundaries.
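As a rough illustration of processing each tissue piece on its own, the sketch below crops every labeled piece out of a toy mask using its bounding box. It assumes a labeled mask in which 0 is background and each tissue piece has its own positive integer label. The helper name tissue_piece_bounds is hypothetical and not part of HistomicsTK.

```python
import numpy as np


def tissue_piece_bounds(labeled_mask):
    """Yield (label, ymin, xmin, ymax, xmax) for each positive label.

    ymax and xmax are exclusive, so they can be used directly in slices.
    """
    for label in np.unique(labeled_mask):
        if label == 0:  # 0 is assumed to be background
            continue
        ys, xs = np.nonzero(labeled_mask == label)
        yield (int(label), int(ys.min()), int(xs.min()),
               int(ys.max()) + 1, int(xs.max()) + 1)


# Toy low-resolution mask containing two separate tissue pieces.
labeled = np.zeros((8, 10), dtype=int)
labeled[1:4, 1:5] = 1
labeled[5:8, 6:9] = 2

bounds = list(tissue_piece_bounds(labeled))

# Only the bounding box of each piece is cropped, never the whole mask.
crops = [labeled[y0:y1, x0:x1] == lab for lab, y0, x0, y1, x1 in bounds]
```

Each crop can then be handed to a segmentation routine independently, which keeps memory use proportional to the size of one tissue piece.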
Once superpixels are segmented, the image is deconvolved and features are extracted from the hematoxylin channel. Features include intensity and, optionally, texture features. Then, a multi-component Gaussian mixture model is fit to the features, and median intensity is used to rank superpixel clusters by ‘cellularity’ (since we are working with the hematoxylin channel).
Note that the decision to fit a Gaussian mixture model instead of using K-means clustering is a design choice. If you’d like to experiment, feel free to try other approaches for grouping superpixels into clusters.
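The ranking step can be sketched independently of the clustering method. The example below assumes that cluster labels already exist (from a Gaussian mixture model, K-means, or anything else), that intensities lie in [0, 255], and that higher values mean more hematoxylin. The function name score_clusters and the scaling to a 0-100 score are illustrative assumptions, not the library's documented formula.

```python
import numpy as np


def score_clusters(median_intensity, cluster_ids, max_value=255):
    """Return {cluster_id: score in [0, 100]} from per-superpixel medians."""
    median_intensity = np.asarray(median_intensity, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    scores = {}
    for cid in np.unique(cluster_ids):
        # Median of the superpixel medians belonging to this cluster.
        cluster_median = np.median(median_intensity[cluster_ids == cid])
        scores[int(cid)] = int(100 * cluster_median / max_value)
    return scores


# Hypothetical per-superpixel median intensities and cluster labels.
medians = [40.0, 50.0, 60.0, 120.0, 130.0, 200.0]
clusters = [1, 1, 1, 2, 2, 3]

scores = score_clusters(medians, clusters)
# Cluster ids ordered from most to least cellular.
ranking = sorted(scores, key=scores.get, reverse=True)
```

Because the score depends only on a per-cluster summary statistic, swapping the clustering method does not require changing the ranking code.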
Additional functionality includes contour extraction to get the final segmentation boundaries of cellular regions and to visualize them in HistomicsUI using one’s preferred colormap.
Here are some sample results:
From left to right: Slide thumbnail, superpixel classifications, contiguous cellular/acellular regions
Where to look?
|_ histomicstk/
|_saliency/
|_cellularity_detection_superpixels.py
|_tests/
|_test_saliency.py
[1]:
import tempfile
import girder_client
import numpy as np
from histomicstk.annotations_and_masks.annotation_and_mask_utils import (
    delete_annotations_in_slide)
from histomicstk.saliency.cellularity_detection_superpixels import (
    Cellularity_detector_superpixels)
import matplotlib.pylab as plt
from matplotlib.colors import ListedColormap
%matplotlib inline
# color map
vals = np.random.rand(256,3)
vals[0, ...] = [0.9, 0.9, 0.9]
cMap = ListedColormap(1 - vals)
Prepwork
[2]:
APIURL = 'http://candygram.neurology.emory.edu:8080/api/v1/'
SAMPLE_SLIDE_ID = '5d586d76bd4404c6b1f286ae'
# SAMPLE_SLIDE_ID = "5d8c296cbd4404c6b1fa5572"
gc = girder_client.GirderClient(apiUrl=APIURL)
gc.authenticate(apiKey='kri19nTIGOkWH01TbzRqfohaaDWb6kPecRqGmemb')
# This is where the run logs will be saved
logging_savepath = tempfile.mkdtemp()
# color normalization values from TCGA-A2-A3XS-DX1
cnorm_thumbnail = {
    'mu': np.array([9.24496373, -0.00966569, 0.01757247]),
    'sigma': np.array([0.35686209, 0.02566772, 0.02500282]),
}
# from the ROI in Amgad et al, 2019
cnorm_main = {
    'mu': np.array([8.74108109, -0.12440419, 0.0444982]),
    'sigma': np.array([0.6135447, 0.10989545, 0.0286032]),
}
[3]:
# deleting existing annotations in target slide (if any)
delete_annotations_in_slide(gc, SAMPLE_SLIDE_ID)
Initialize the cellularity detector
[4]:
print(Cellularity_detector_superpixels.__init__.__doc__)
Init Cellularity_Detector_Superpixels object.
Arguments:
-----------
gc : object
girder client object
slide_id : str
girder ID of slide
verbose : int
0 - Do not print to screen
1 - Print only key messages
2 - Print everything to screen
3 - print everything including from inner functions
monitorPrefix : str
text to prepend to printed statements
logging_savepath : str or None
where to save run logs
suppress_warnings : bool
whether to suppress warnings
cnorm_params : dict
Reinhard color normalization parameters. Accepted keys: thumbnail
and main (since thumbnail normalization is different from color
normalization of tissue at target magnification). Each entry is a
dict containing values for mu and sigma. This is either given
here or can be set using self.set_color_normalization_values().
May be left unset if you do not want to normalize.
get_tissue_mask_kwargs : dict
kwargs for the get_tissue_mask() method.
MAG : float
magnification at which to detect cellularity
spixel_size_baseMag : int
approximate superpixel size at base (scan) magnification
compactness : float
compactness parameter for the SLIC method. Higher values result
in more regular superpixels while smaller values are more likely
to respect tissue boundaries.
deconvolve : bool
Whether to deconvolve and use hematoxylin channel for feature
extraction. Must be True to rank spixel clusters by cellularity.
use_grayscale : bool
If True, grayscale image is used with SLIC. May be more robust to
color variations from slide to slide and more efficient.
use_intensity : bool
Whether to extract intensity features from the hematoxylin channel.
This must be True to rank superpixel clusters by cellularity.
use_texture : bool
Whether to extract Haralick texture features from Htx channel. May
not necessarily improve results when used in conjunction with
intensity features.
keep_feats : list
Name of intensity features to use. See
histomicstk.features.compute_intensity_features.
Using fewer informative features may result in better
gaussian mixture modeling results.
n_gaussian_components : int
no of gaussian mixture model components
max_cellularity : int
Range [0, 100] or None. If None, normalize visualization RGB values
for each tissue piece separately, else normalize by given number.
opacity : float
opacity of superpixel polygons when posted to DSA.
0 (no opacity) is more efficient to render.
opacity_contig : float
opacity of contiguous region polygons when posted to DSA.
0 (no opacity) is more efficient to render.
lineWidth : float
width of line when displaying superpixel boundaries.
cMap : object
matplotlib color map to use when visualizing cellularity
visualize_tissue_boundary : bool
whether to visualize result from tissue detection component
visualize_spixels : bool
whether to visualize superpixels, color-coded by cellularity
visualize_contiguous : bool
whether to visualize contiguous cellular regions
In this example, and as the default behavior, we use a handful of informative intensity features extracted from the hematoxylin channel after color deconvolution to fit a Gaussian mixture model. Empirically (on a few test slides), this seems to give better results than using the full suite of intensity and texture features available. Feel free to experiment with this and find the best combination of features for your application.
[5]:
# init cellularity detector
cds = Cellularity_detector_superpixels(
    gc, slide_id=SAMPLE_SLIDE_ID,
    MAG=3.0, compactness=0.1, spixel_size_baseMag=256 * 256,
    max_cellularity=40,
    visualize_spixels=True, visualize_contiguous=True,
    get_tissue_mask_kwargs={
        'deconvolve_first': False,
        'n_thresholding_steps': 2,
        'sigma': 1.5,
        'min_size': 500},
    verbose=2, monitorPrefix='test',
    logging_savepath=logging_savepath)
Saving logs to: /tmp/tmpt7dygwhf/2019-09-29_18-04.log
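To get a feel for what spixel_size_baseMag=256 * 256 means at MAG=3.0, the arithmetic below converts a superpixel area given at base magnification into an approximate area and superpixel count at the analysis magnification. This is only an illustration: the base magnification of 40, the image shape, and the helper approx_n_superpixels are assumptions, and the library may compute its superpixel count differently.

```python
def approx_n_superpixels(tissue_shape, spixel_size_base, mag, base_mag):
    """Estimate superpixel area (in pixels) and count at magnification `mag`."""
    scale = mag / base_mag                        # linear downsampling factor
    spixel_area = spixel_size_base * scale ** 2   # area shrinks with scale squared
    height, width = tissue_shape
    n_spixels = max(1, round(height * width / spixel_area))
    return n_spixels, spixel_area


# Hypothetical tissue piece of 1500 x 2000 pixels at 3x on a 40x slide.
n_spixels, area = approx_n_superpixels(
    (1500, 2000), 256 * 256, mag=3.0, base_mag=40.0)
```

Under these assumptions a 256 x 256 superpixel at 40x covers roughly a 19 x 19 pixel patch at 3x, which is why the analysis remains cheap even for large tissue pieces.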
Set the color normalization values
You can choose to Reinhard color normalize the slide thumbnail and/or the tissue image at target magnification. You can either provide the mu and sigma values directly or provide the path to an image from which to infer these values. Please refer to the color_normalization module for Reinhard normalization implementation details. In this example, we use a “high-sensitivity, low-specificity” strategy to detect tissue, followed by the more specific cellularity detection module. In other words, the tissue_detection module is used to detect all tissue, excluding only whitespace and marker. Here we do NOT perform color normalization before tissue detection (empirically, this gives worse results), but we do normalize when detecting the cellular regions within the tissue.
[6]:
# set color normalization for thumbnail
# cds.set_color_normalization_values(
# mu=cnorm_thumbnail['mu'],
# sigma=cnorm_thumbnail['sigma'], what='thumbnail')
# set color normalization values for main tissue
cds.set_color_normalization_values(
    mu=cnorm_main['mu'], sigma=cnorm_main['sigma'], what='main')
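For intuition about what the mu and sigma values do, Reinhard normalization shifts and scales each channel of an image in LAB color space so that its mean and standard deviation match the targets. The sketch below shows only that statistics-matching step on a random stand-in array that is assumed to already be in LAB space. The RGB-to-LAB conversions are omitted, and match_channel_stats is a hypothetical helper, not the HistomicsTK implementation.

```python
import numpy as np


def match_channel_stats(image, target_mu, target_sigma):
    """Shift and scale each channel so its mean/std match the targets."""
    pixels = image.reshape(-1, image.shape[-1]).astype(float)
    src_mu = pixels.mean(axis=0)
    src_sigma = pixels.std(axis=0)
    normalized = (pixels - src_mu) / src_sigma * target_sigma + target_mu
    return normalized.reshape(image.shape)


rng = np.random.default_rng(0)
# Stand-in for an image that has ALREADY been converted to LAB space.
lab_image = rng.normal(
    loc=[9.0, 0.0, 0.0], scale=[0.5, 0.05, 0.05], size=(32, 32, 3))

# Target statistics taken from cnorm_main in the Prepwork cell.
target_mu = np.array([8.74108109, -0.12440419, 0.0444982])
target_sigma = np.array([0.6135447, 0.10989545, 0.0286032])

normalized = match_channel_stats(lab_image, target_mu, target_sigma)
```

Inferring mu and sigma from a reference image amounts to computing the same per-channel mean and standard deviation on that image instead of supplying them by hand.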
Run the detector
[7]:
print(cds.run.__doc__)
Run cellularity detection and optionally visualize result.
This runs the cellularity detection +/- visualization pipeline and
returns a list of CD_single_tissue_piece objects. Each object has
the following attributes
tissue_mask : np array
mask of where tissue is at target magnification
ymin : int
min y coordinate at base (scan) magnification
xmin : int
min x coordinate at base (scan) magnification
ymax : int
max y coordinate at base (scan) magnification
xmax : int
max x coordinate at base (scan) magnification
spixel_mask : np array
np array where each unique value represents one superpixel
fdata : pandas DataFrame
features extracted for each superpixel. Index corresponds to
values in the spixel_mask. This includes a 'cluster' column
indicating which cluster this superpixel belongs to.
cluster_props : dict
properties of each superpixel cluster, including its assigned
cellularity score.
[8]:
tissue_pieces = cds.run()
test: set_slide_info_and_get_tissue_mask()
test: Tissue piece 1 of 2
test: Tissue piece 1 of 2: set_tissue_rgb()
test: Tissue piece 1 of 2: set_superpixel_mask()
test: Tissue piece 1 of 2: set_superpixel_features()
test: Tissue piece 1 of 2: set_superpixel_assignment()
test: Tissue piece 1 of 2: assign_cellularity_scores()
test: Tissue piece 1 of 2: visualize_individual_superpixels()
test: Tissue piece 1 of 2: Posting doc 1 of 5
test: Tissue piece 1 of 2: Posting doc 2 of 5
test: Tissue piece 1 of 2: Posting doc 3 of 5
test: Tissue piece 1 of 2: Posting doc 4 of 5
test: Tissue piece 1 of 2: Posting doc 5 of 5
test: Tissue piece 1 of 2: visualize_contiguous_superpixels()
test: Tissue piece 1 of 2: Posting doc 1 of 5
test: Tissue piece 1 of 2: Posting doc 2 of 5
test: Tissue piece 1 of 2: Posting doc 3 of 5
test: Tissue piece 1 of 2: Posting doc 4 of 5
test: Tissue piece 1 of 2: Posting doc 5 of 5
test: Tissue piece 2 of 2
test: Tissue piece 2 of 2: set_tissue_rgb()
test: Tissue piece 2 of 2: set_superpixel_mask()
test: Tissue piece 2 of 2: set_superpixel_features()
test: Tissue piece 2 of 2: set_superpixel_assignment()
test: Tissue piece 2 of 2: assign_cellularity_scores()
test: Tissue piece 2 of 2: visualize_individual_superpixels()
test: Tissue piece 2 of 2: Posting doc 1 of 5
test: Tissue piece 2 of 2: Posting doc 2 of 5
test: Tissue piece 2 of 2: Posting doc 3 of 5
test: Tissue piece 2 of 2: Posting doc 4 of 5
test: Tissue piece 2 of 2: Posting doc 5 of 5
test: Tissue piece 2 of 2: visualize_contiguous_superpixels()
test: Tissue piece 2 of 2: Posting doc 1 of 5
test: Tissue piece 2 of 2: Posting doc 2 of 5
test: Tissue piece 2 of 2: Posting doc 3 of 5
test: Tissue piece 2 of 2: Posting doc 4 of 5
test: Tissue piece 2 of 2: Posting doc 5 of 5
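As the docstring notes, tissue_mask is stored at the target magnification while xmin and ymin are given at base (scan) magnification. A minimal sketch of mapping a mask pixel back to slide coordinates follows. The base magnification of 40, the offsets, and the helper mask_to_slide_xy are hypothetical values chosen for illustration.

```python
def mask_to_slide_xy(x_mask, y_mask, xmin, ymin, mag, base_mag):
    """Map a pixel in a tissue-piece mask to base-magnification coordinates."""
    scale = base_mag / mag  # base-mag pixels spanned by one mask pixel
    return xmin + x_mask * scale, ymin + y_mask * scale


# Hypothetical: a 40x slide analysed at 3x, tissue piece offset (12000, 8000).
x_slide, y_slide = mask_to_slide_xy(
    30, 60, xmin=12000, ymin=8000, mag=3.0, base_mag=40.0)
```

With real results, xmin and ymin would come from the attributes of a returned tissue piece object and mag from the MAG value passed to the detector.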
Check the results
Each object in the resulting list holds the results for one “tissue piece” detected in the slide. You may explore various attributes like the offset coordinates, tissue mask, superpixel labeled mask, superpixel feature data, and superpixel cluster properties.
[9]:
plt.imshow(tissue_pieces[0].tissue_mask, cmap=cMap)
[9]:
<matplotlib.image.AxesImage at 0x7f7d1a3c0c50>
[10]:
plt.imshow(tissue_pieces[0].spixel_mask, cmap=cMap)
[10]:
<matplotlib.image.AxesImage at 0x7f7d1c35ad10>
[11]:
tissue_pieces[0].fdata.head()
[11]:
| | Intensity.Mean | Intensity.Median | Intensity.Std | Intensity.IQR | Intensity.HistEntropy | cluster |
|---|---|---|---|---|---|---|
| 5 | 86.639486 | 74.0 | 50.329524 | 34.0 | 1.775608 | 4 |
| 6 | 73.010711 | 68.0 | 26.269239 | 20.0 | 1.341687 | 4 |
| 67 | 88.820514 | 73.0 | 56.899073 | 49.0 | 1.942993 | 5 |
| 68 | 72.959455 | 67.0 | 31.681785 | 21.0 | 1.349226 | 4 |
| 71 | 100.068075 | 79.0 | 60.721122 | 67.0 | 1.989196 | 5 |
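The feature columns in this table are simple summary statistics of the pixel intensities inside each superpixel. A numpy-only sketch of how such values could be computed is shown below. The number of histogram bins (10) and the use of the natural logarithm for the entropy are assumptions, so the numbers need not match histomicstk.features.compute_intensity_features exactly.

```python
import numpy as np


def intensity_features(pixels, n_bins=10):
    """Summary statistics of the intensities inside one superpixel."""
    pixels = np.asarray(pixels, dtype=float).ravel()
    q25, q75 = np.percentile(pixels, [25, 75])
    counts, _ = np.histogram(pixels, bins=n_bins)
    # Drop empty bins so that log(0) is never evaluated.
    probs = counts[counts > 0] / counts.sum()
    return {
        'Intensity.Mean': pixels.mean(),
        'Intensity.Median': np.median(pixels),
        'Intensity.Std': pixels.std(),
        'Intensity.IQR': q75 - q25,
        'Intensity.HistEntropy': float(-(probs * np.log(probs)).sum()),
    }


# Toy superpixel with five pixel intensities.
feats = intensity_features([10, 20, 30, 40, 50])
```

In practice the pixels argument would be the hematoxylin-channel values selected by one label of the superpixel mask.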
[12]:
tissue_pieces[0].cluster_props
[12]:
{1: {'cellularity': 20, 'color': 'rgb(253,253,255)'},
 2: {'cellularity': 37, 'color': 'rgb(167,0,0)'},
 3: {'cellularity': 47, 'color': 'rgb(127,0,0)'},
 4: {'cellularity': 26, 'color': 'rgb(255,105,105)'},
 5: {'cellularity': 29, 'color': 'rgb(255,29,29)'}}
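These pieces can be combined into a per-pixel cellularity image by mapping each superpixel label to its cluster and each cluster to its score. The sketch below uses small hypothetical stand-ins for spixel_mask, the 'cluster' column of fdata, and cluster_props. It also assumes that label 0 marks pixels outside any superpixel.

```python
import numpy as np

# Hypothetical stand-ins for the attributes of one tissue piece.
spixel_mask = np.array([[0, 5, 5],
                        [6, 6, 67],
                        [67, 67, 0]])
spixel_to_cluster = {5: 4, 6: 4, 67: 5}   # like the fdata 'cluster' column
cluster_props = {4: {'cellularity': 26}, 5: {'cellularity': 29}}

# Lookup table indexed by superpixel label; label 0 keeps a score of 0.
lut = np.zeros(spixel_mask.max() + 1, dtype=int)
for spixel, cluster in spixel_to_cluster.items():
    lut[spixel] = cluster_props[cluster]['cellularity']

# Fancy indexing replaces every label with its cellularity score.
cellularity_map = lut[spixel_mask]

# Example threshold (arbitrary) to keep only the most cellular pixels.
highly_cellular = cellularity_map >= 29
```

With a real result, the label-to-cluster mapping could likely be obtained with something like fdata['cluster'].to_dict(), since the index of fdata corresponds to the values in spixel_mask.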
Check the visualization on HistomicsUI
Now you may go to the slide on the Digital Slide Archive and check the posted annotations.