{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Color thresholding semantic segmentation\n", "\n", "Whole-slide images often contain artifacts, such as marker ink or acellular regions, that\n", "need to be avoided during analysis. In this example, we show how HistomicsTK can\n", "be used to develop saliency detection algorithms that segment the slide at low\n", "magnification to generate a map that guides higher-magnification analyses. Here we\n", "show how colorspace analysis can detect elements such as inking or blood,\n", "as well as dense cellular regions, to improve the quality of\n", "subsequent image analysis tasks.\n", "\n", "This example uses a pipeline based on thresholding and stain unmixing to detect\n", "highly cellular regions in a slide. The `run()` method of the\n", "`CDT_single_tissue_piece` class contains the key steps of the pipeline.\n", "\n", "Additional functionality includes contour extraction, which yields the final segmentation boundaries and lets you visualize them in the Digital Slide Archive (DSA) using your preferred styles.\n", "\n", "**Here are some sample results:**\n", "\n", "![saliency_results](https://user-images.githubusercontent.com/22067552/80079317-1bcaa580-851e-11ea-9353-a435a2afc6eb.jpg)\n", "\n", "**Where to look?**\n", "\n", "```\n", " |_ histomicstk/\n", " |_saliency/\n", " |_cellularity_detection_thresholding.py \n", " |_tests/\n", " |_test_saliency.py\n", "```" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import tempfile\n", "import girder_client\n", "import numpy as np\n", "from pandas import read_csv\n", "from histomicstk.annotations_and_masks.annotation_and_mask_utils import (\n", " delete_annotations_in_slide)\n", "from histomicstk.saliency.cellularity_detection_thresholding import (\n", " Cellularity_detector_thresholding)\n", "\n", "import matplotlib.pyplot as plt\n", "from matplotlib.colors import ListedColormap\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"## Prepwork" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "APIURL = 'http://candygram.neurology.emory.edu:8080/api/v1/'\n", "SAMPLE_SLIDE_ID = '5d8c296cbd4404c6b1fa5572'\n", "\n", "gc = girder_client.GirderClient(apiUrl=APIURL)\n", "gc.authenticate(apiKey='kri19nTIGOkWH01TbzRqfohaaDWb6kPecRqGmemb')\n", "\n", "# This is where the run logs will be saved\n", "logging_savepath = tempfile.mkdtemp()\n", "\n", "# read GT codes dataframe\n", "GTcodes = read_csv('../../histomicstk/saliency/tests/saliency_GTcodes.csv')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# deleting existing annotations in target slide (if any)\n", "delete_annotations_in_slide(gc, SAMPLE_SLIDE_ID)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Let's explore the GTcodes dataframe" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th><th>group</th><th>overlay_order</th><th>GT_code</th><th>is_roi</th><th>is_background_class</th><th>color</th><th>comments</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>outside_tissue</td><td>-1</td><td>255</td><td>0</td><td>0</td><td>rgb(40,40,40)</td><td>NaN</td></tr>\n",
"    <tr><th>1</th><td>roi</td><td>0</td><td>254</td><td>0</td><td>0</td><td>rgb(0,0,0)</td><td>NaN</td></tr>\n",
"    <tr><th>2</th><td>not_specified</td><td>0</td><td>253</td><td>0</td><td>1</td><td>rgb(255,50,255)</td><td>NaN</td></tr>\n",
"    <tr><th>3</th><td>blue_sharpie</td><td>1</td><td>6</td><td>0</td><td>0</td><td>rgb(0,224,255)</td><td>NaN</td></tr>\n",
"    <tr><th>4</th><td>blood</td><td>2</td><td>7</td><td>0</td><td>0</td><td>rgb(255,255,0)</td><td>NaN</td></tr>\n",
"    <tr><th>5</th><td>whitespace</td><td>3</td><td>8</td><td>0</td><td>0</td><td>rgb(70,70,70)</td><td>NaN</td></tr>\n",
"    <tr><th>6</th><td>maybe_cellular</td><td>4</td><td>9</td><td>0</td><td>0</td><td>rgb(145,109,189)</td><td>NaN</td></tr>\n",
"    <tr><th>7</th><td>top_cellular</td><td>5</td><td>10</td><td>0</td><td>0</td><td>rgb(50,250,20)</td><td>NaN</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " group overlay_order GT_code is_roi is_background_class \\\n", "0 outside_tissue -1 255 0 0 \n", "1 roi 0 254 0 0 \n", "2 not_specified 0 253 0 1 \n", "3 blue_sharpie 1 6 0 0 \n", "4 blood 2 7 0 0 \n", "5 whitespace 3 8 0 0 \n", "6 maybe_cellular 4 9 0 0 \n", "7 top_cellular 5 10 0 0 \n", "\n", " color comments \n", "0 rgb(40,40,40) NaN \n", "1 rgb(0,0,0) NaN \n", "2 rgb(255,50,255) NaN \n", "3 rgb(0,224,255) NaN \n", "4 rgb(255,255,0) NaN \n", "5 rgb(70,70,70) NaN \n", "6 rgb(145,109,189) NaN \n", "7 rgb(50,250,20) NaN " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "GTcodes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initialize the cellularity detector" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Explore the docs\n", "\n", "Get some idea about the implementation details and default behavior. " ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Detect cellular regions in a slide using thresholding.\n", "\n", " This uses a thresholding and stain unmixing based pipeline\n", " to detect highly-cellular regions in a slide. The run()\n", " method of the CDT_single_tissue_piece() class has the key\n", " steps of the pipeline. In summary, here are the steps\n", " involved...\n", "\n", " 1. Detect tissue from background using the RGB slide\n", " thumbnail. Each \"tissue piece\" is analysed independently\n", " from here onwards. The tissue_detection modeule is used\n", " for this step. A high sensitivity, low specificity setting\n", " is used here.\n", "\n", " 2. Fetch the RGB image of tissue at target magnification. A\n", " low magnification (default is 3.0) is used and is sufficient.\n", "\n", " 3. The image is converted to HSI and LAB spaces. 
Thresholding\n", " is performed to detect various non-salient components that\n", " often throw-off the color normalization and deconvolution\n", " algorithms. Thresholding includes both minimum and maximum\n", " values. The user can set whichever thresholds of components\n", " they would like. The development of this workflow was focused\n", " on breast cancer so the thresholded components by default\n", " are whote space (or adipose tissue), dark blue/green blotches\n", " (sharpie, inking at margin, etc), and blood. Whitespace\n", " is obtained by thresholding the saturation and intensity,\n", " while other components are obtained by thresholding LAB.\n", "\n", " 4. Now that we know where \"actual\" tissue is, we do a MASKED\n", " color normalization to a prespecified standard. The masking\n", " ensures the normalization routine is not thrown off by non-\n", " tissue components.\n", "\n", " 5. Perform masked stain unmixing/deconvolution to obtain the\n", " hematoxylin stain channel.\n", "\n", " 6. Smooth and threshold the hematoxylin channel. Then\n", " perform connected component analysis to find contiguous\n", " potentially-cellular regions.\n", "\n", " 7. Keep the n largest potentially-cellular regions. Then\n", " from those large regions, keep the m brightest regions\n", " (using hematoxylin channel brightness) as the final\n", " salient/cellular regions.\n", "\n", " \n" ] } ], "source": [ "print(Cellularity_detector_thresholding.__doc__)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The only required arguments to initialize are `gc`, `slide_id`, and `GTcodes`.\n", "Everything else is optional and assigned defaults, but you may want to read up on\n", "what each argument does to adjust to your specific needs. The default behavior\n", "is defined at the beginning of the `__init__()` method." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Init Cellularity_Detector_Superpixels object.\n", "\n", " Arguments:\n", " -----------\n", " gc : object\n", " girder client object\n", "\n", " slide_id : str\n", " girder ID of slide\n", "\n", " GTcodes : pandas Dataframe\n", " the ground truth codes and information dataframe.\n", " WARNING: Modified inside this method so pass a copy.\n", " This is a dataframe that is indexed by the annotation group name\n", " and has the following columns...\n", "\n", " group: str\n", " group name of annotation, eg. mostly_tumor\n", " overlay_order: int\n", " how early to place the annotation in the\n", " mask. Larger values means this annotation group is overlayed\n", " last and overwrites whatever overlaps it.\n", " GT_code: int\n", " desired ground truth code (in the mask).\n", " Pixels of this value belong to corresponding group (class)\n", " is_roi: bool\n", " whether this group encodes an ROI\n", " is_background_class: bool\n", " whether this group is the default fill value inside the ROI.\n", " For example, you may decide that any pixel inside the ROI\n", " is considered stroma.\n", " color: str\n", " rgb format. eg. 
rgb(255,0,0)\n", "\n", " The following indexes must be present...\n", " outside_tissue, not_specified, maybe_cellular, top_cellular\n", "\n", " verbose : int\n", " 0 - Do not print to screen\n", " 1 - Print only key messages\n", " 2 - Print everything to screen\n", " 3 - print everything including from inner functions\n", "\n", " monitorPrefix : str\n", " text to prepend to printed statements\n", "\n", " logging_savepath : str or None\n", " where to save run logs\n", "\n", " suppress_warnings : bool\n", " whether to suppress warnings\n", "\n", " MAG : float\n", " magnification at which to detect cellularity\n", "\n", " color_normalization_method : str\n", " Must be in ['reinhard', 'macenko_pca', 'none']\n", "\n", " target_W_macenko : np array\n", " 3 by 3 stain matrix for macenko normalization\n", " obtained using rgb_separate_stains_macenko_pca()\n", " and reordered such that hematoxylin and eosin are\n", " the first and second channels, respectively.\n", "\n", " target_stats_reinhard : dict\n", " must contains the keys mu and sigma. Mean and sigma\n", " of target image in LAB space for reinhard normalization.\n", "\n", " get_tissue_mask_kwargs : dict\n", " kwargs for the get_tissue_mask() method. This is used\n", " to detect tissue from the slide thumbnail.\n", "\n", " keep_components : list\n", " list of strings. Names of components to exclude by\n", " HSI thresholding. These much be present in the index\n", " of the GTcodes dataframe\n", "\n", " get_tissue_mask_kwargs2 : dict\n", " kwargs for get_tissue_mask() used for iterative smoothing\n", " and thresholding the component masks after initial\n", " thresholding using the user-defined HSI/LAB thresholds.\n", "\n", " hsi_thresholds : dict\n", " each entry is a dict containing the keys hue, saturation\n", " and intensity. Each of these is in turn also a dict\n", " containing the keys min and max. 
See default value below\n", " for an example.\n", "\n", " lab_thresholds : dict\n", " each entry is a dict containing the keys l, a, and b.\n", " Each of these is in turn also a dict containing the keys\n", " min and max. See default value below for an example.\n", "\n", " stain_unmixing_routine_params : dict\n", " kwargs passed as the stain_unmixing_routine_params\n", " argument to the deconvolution_based_normalization method\n", "\n", " cellular_step1_sigma : float\n", " sigma of gaussian smoothing for first cellularity step\n", "\n", " cellular_step1_min_size : int\n", " minimum contiguous size for first cellularity step\n", "\n", " cellular_step2_sigma : float\n", " sigma of gaussian smoothing for second cellularity step\n", "\n", " cellular_largest_n : int\n", " Number of large contiguous cellular regions to keep\n", "\n", " cellular_top_n : int\n", " Number of final \"top\" cellular regions to keep\n", "\n", " visualize : bool\n", " whether to visualize results in DSA\n", "\n", " opacity : float\n", " opacity of superpixel polygons when posted to DSA.\n", " 0 (no opacity) is more efficient to render.\n", "\n", " lineWidth : float\n", " width of line when displaying region boundaries.\n", "\n", " \n" ] } ], "source": [ "print(Cellularity_detector_thresholding.__init__.__doc__)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Saving logs to: /tmp/tmpclyolr1y/2019-10-27_17-51.log\n" ] } ], "source": [ "# init cellularity detector\n", "# (pass a copy of GTcodes, since the constructor modifies it in place)\n", "cdt = Cellularity_detector_thresholding(\n", " gc, slide_id=SAMPLE_SLIDE_ID, GTcodes=GTcodes.copy(),\n", " verbose=2, monitorPrefix='test',\n", " logging_savepath=logging_savepath)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Set the color normalization values (optional)\n", "\n", "By default, color normalization uses the Macenko method, standardizing to\n", "a hematoxylin and eosin standard obtained from the target image\n", 
"TCGA-A2-A3XS-DX1_xmin21421_ymin37486 from Amgad et al., 2019.\n", "\n", "If you would prefer to use your own target image or a different color\n", "normalization method, use the `set_color_normalization_target()` method, documented below." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Set color normalization values to use from target image.\n", "\n", " Arguments:\n", " -----------\n", " ref_image_path : str\n", " path to target (reference) image\n", "\n", " color_normalization_method : str\n", " color normalization method to use. Currently, only\n", " 'reinhard' and 'macenko_pca' are accepted.\n", "\n", " \n" ] } ], "source": [ "print(cdt.set_color_normalization_target.__doc__)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Run the detector" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "test: set_slide_info_and_get_tissue_mask()\n", "test: Tissue piece 1 of 1\n", "test: Tissue piece 1 of 1: set_tissue_rgb()\n", "test: Tissue piece 1 of 1: initialize_labeled_mask()\n", "test: Tissue piece 1 of 1: assign_components_by_thresholding()\n", "test: Tissue piece 1 of 1: -- get HSI and LAB images ...\n", "test: Tissue piece 1 of 1: -- thresholding blue_sharpie ...\n", "test: Tissue piece 1 of 1: -- thresholding blood ...\n", "test: Tissue piece 1 of 1: -- thresholding whitespace ...\n", "test: Tissue piece 1 of 1: color_normalize_unspecified_components()\n", "test: Tissue piece 1 of 1: -- macenko normalization ...\n", "test: Tissue piece 1 of 1: find_potentially_cellular_regions()\n", "test: Tissue piece 1 of 1: find_top_cellular_regions()\n", "test: Tissue piece 1 of 1: visualize_results()\n" ] } ], "source": [ "tissue_pieces = cdt.run()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Check the results\n", "\n", "The returned list contains one result object per \"tissue piece\" detected in the slide. You can explore its attributes, such as the tissue piece's coordinates in the slide (`xmin`, `xmax`, `ymin`, `ymax`) and its labeled mask (`labeled`)." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Tissue piece 0: xmin 30455 xmax 113472 ymin 5403 ymax 67297\n" ] } ], "source": [ "print(\n", " 'Tissue piece 0: ',\n", " 'xmin', tissue_pieces[0].xmin,\n", " 'xmax', tissue_pieces[0].xmax,\n", " 'ymin', tissue_pieces[0].ymin,\n", " 'ymax', tissue_pieces[0].ymax,\n", ")" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ], "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# color map: one display color per GT code (0-255); unlisted codes are black\n", "vals = ['black'] * 256\n", "vals[6] = 'cyan'  # sharpie / ink\n", "vals[7] = 'yellow'  # blood\n", "vals[8] = 'grey'  # whitespace\n", "vals[9] = 'indigo'  # maybe cellular\n", "vals[10] = 'green'  # salient / top cellular\n", "cMap = ListedColormap(vals)\n", "\n", "plt.figure(figsize=(10, 10))\n", "# fix the color limits to 0-255 so that each integer GT code maps to its own color\n", "plt.imshow(tissue_pieces[0].labeled, cmap=cMap, vmin=0, vmax=255)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Check the visualization on HistomicsUI\n", "\n", "You can now open the slide in the Digital Slide Archive (DSA) and inspect the posted annotations." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 2 }