{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Merging annotations from tiled arrays\n", "\n", "**Overview:**\n", "\n", "This notebook describes how to merge annotations generated by tiled analysis of a whole-slide image. Since tiled analysis is carried out on small tiles, the annotations produced by image segmentation algorithms will be disjoint at the tile boundaries, prohibiting analysis of large structures that span multiple tiles.\n", "\n", "The example presented below addresses the case where the annotations are stored in an array format that preserves the spatial organization of tiles. This scenario arises when iterating through the columns and rows of a tiled representation of a whole-slide image. Analysis of the organized array format is faster and preferred since the interfaces where annotations need to be merged are known. In cases where where the annotations to be merged do not come from tiled analysis, or where the tile results are not organized, an alternative method based on R-trees provides a slightly slower solution.\n", "\n", "This extends on some of the work described in Amgad et al, 2019:\n", "\n", "_Mohamed Amgad, Habiba Elfandy, Hagar Hussein, ..., Jonathan Beezley, Deepak R Chittajallu, David Manthey, David A Gutman, Lee A D Cooper, Structured crowdsourcing enables convolutional segmentation of histology images, Bioinformatics, 2019, btz083_" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**This is a sample result:**\n", "\n", "![polygon_merger](https://user-images.githubusercontent.com/22067552/80076675-84178800-851a-11ea-8f5d-552bca8402ed.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Implementation summary**\n", "\n", "In the tiled array approach the tiles must be rectangular and unrotated. The algorithm used merges polygons in coordinate space so that almost-arbitrarily large structures can be handled without encountering memory issues. The algorithm works as follows:\n", "\n", "- Extract contours from the given masks using functionality from the ``masks_to_annotations_handler.py``, making sure to account for contour offset so that all coordinates are relative to whole-slide image frame.\n", "\n", "- Identify contours that touch tile interfaces.\n", "\n", "- Identify shared edges between tiles.\n", "\n", "- For each shared edge, find contours that neighbor each other (using bounding box location) and verify that they should be paired using shapely.\n", "\n", "- Using 4-connectivity link all pairs of contours that are to be merged.\n", "\n", "- Use morphologic processing to dilate and fill gaps in the linked pairs and then erode to generate the final merged contour.\n", "\n", "This initial steps ensures that the number of comparisons made is ``<< n^2``. This is important since algorithm complexity plays a key role as whole slide images may contain tens of thousands of annotated structures.\n", "\n", "**Where to look?**\n", "\n", "```\n", "histomicstk/\n", " |_annotations_and_masks/\n", " |_polygon_merger.py\n", " |_tests/\n", " |_ test_polygon_merger.py\n", " |_ test_annotations_to_masks_handler.py\n", "```" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import os\n", "CWD = os.getcwd()\n", "import os\n", "import girder_client\n", "from pandas import read_csv\n", "\n", "from histomicstk.annotations_and_masks.polygon_merger import Polygon_merger\n", "from histomicstk.annotations_and_masks.masks_to_annotations_handler import (\n", " get_annotation_documents_from_contours )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Connect girder client and set parameters" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "APIURL = 'http://candygram.neurology.emory.edu:8080/api/v1/'\n", "SAMPLE_SLIDE_ID = '5d586d76bd4404c6b1f286ae'\n", "\n", "gc = girder_client.GirderClient(apiUrl=APIURL)\n", "gc.authenticate(interactive=True)\n", "# gc.authenticate(apiKey='kri19nTIGOkWH01TbzRqfohaaDWb6kPecRqGmemb')\n", "\n", "# read GTCodes dataframe\n", "PTESTS_PATH = os.path.join(CWD, '..', '..', 'tests')\n", "GTCODE_PATH = os.path.join(PTESTS_PATH, 'test_files', 'sample_GTcodes.csv')\n", "GTCodes_df = read_csv(GTCODE_PATH)\n", "GTCodes_df.index = GTCodes_df.loc[:, 'group']\n", "\n", "# This is where masks for adjacent rois are saved\n", "MASK_LOADPATH = os.path.join(\n", " PTESTS_PATH,'test_files', 'annotations_and_masks', 'polygon_merger_roi_masks')\n", "maskpaths = [\n", " os.path.join(MASK_LOADPATH, j) for j in os.listdir(MASK_LOADPATH)\n", " if j.endswith('.png')]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Polygon merger\n", "\n", "The ``Polygon_merger()`` is the top level function for performing the merging." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Methods to merge polygons in tiled masks.\n" ] } ], "source": [ "print(Polygon_merger.__doc__)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Init Polygon_merger object.\n", "\n", " Arguments:\n", " -----------\n", " maskpaths : list\n", " list of strings representing pathos to masks\n", " GTCodes_df : pandas DataFrame\n", " the ground truth codes and information dataframe.\n", " This is a dataframe that MUST BE indexed by annotation group name\n", " and has the following columns.\n", "\n", " group: str\n", " group name of annotation, eg. mostly_tumor.\n", " GT_code: int\n", " desired ground truth code (in the mask). Pixels of this value\n", " belong to corresponding group (class).\n", " color: str\n", " rgb format. eg. rgb(255,0,0).\n", " merge_thresh : int\n", " how close do the polygons need to be (in pixels) to be merged\n", " contkwargs : dict\n", " dictionary of kwargs to pass to get_contours_from_mask()\n", " discard_nonenclosed_background : bool\n", " If a background group contour is NOT fully enclosed, discard it.\n", " This is a purely aesthetic method, makes sure that the background\n", " contours (eg stroma) are discarded by default to avoid cluttering\n", " the field when posted to DSA for viewing online. The only exception\n", " is if they are enclosed within something else (eg tumor), in which\n", " case they are kept since they represent holes. This is related to\n", " https://github.com/DigitalSlideArchive/HistomicsTK/issues/675\n", " WARNING - This is a bit slower since the contours will have to be\n", " converted to shapely polygons. It is not noticeable for hundreds of\n", " contours, but you will notice the speed difference if you are\n", " parsing thousands of contours. Default, for this reason, is False.\n", " background_group : str\n", " name of bckgrd group in the GT_codes dataframe (eg mostly_stroma)\n", " roi_group : str\n", " name of roi group in the GT_Codes dataframe (eg roi)\n", " verbose : int\n", " 0 - Do not print to screen\n", " 1 - Print only key messages\n", " 2 - Print everything to screen\n", " monitorPrefix : str\n", " text to prepend to printed statements\n", "\n", " \n" ] } ], "source": [ "print(Polygon_merger.__init__.__doc__)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Run full pipeline to get merged contours.\n", "\n", " Returns:\n", " - pandas DataFrame: has the same structure as output from\n", " get_contours_from_mask().\n", "\n", " \n" ] } ], "source": [ "print(Polygon_merger.run.__doc__)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Required arguments for initialization" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Ground truth codes file" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This contains the ground truth codes and information dataframe. This is a dataframe that is indexed by the annotation group name and has the following columns:\n", "\n", "- ``group``: group name of annotation (string), eg. \"mostly_tumor\"\n", "- ``GT_code``: int, desired ground truth code (in the mask) Pixels of this value belong to corresponding group (class)\n", "- ``color``: str, rgb format. eg. rgb(255,0,0).\n", "\n", "**NOTE:**\n", "\n", "Zero pixels have special meaning and do NOT encode specific ground truth class. Instead, they simply mean 'Outside ROI' and should be IGNORED during model training or evaluation." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
groupoverlay_orderGT_codeis_roiis_background_classcolorcomments
group
roiroi025410rgb(200,0,150)NaN
evaluation_roievaluation_roi025310rgb(255,0,0)NaN
mostly_tumormostly_tumor1100rgb(255,0,0)core class
mostly_stromamostly_stroma2201rgb(255,125,0)core class
mostly_lymphocytic_infiltratemostly_lymphocytic_infiltrate1300rgb(0,0,255)core class
\n
", "text/plain": " group overlay_order \\\ngroup \nroi roi 0 \nevaluation_roi evaluation_roi 0 \nmostly_tumor mostly_tumor 1 \nmostly_stroma mostly_stroma 2 \nmostly_lymphocytic_infiltrate mostly_lymphocytic_infiltrate 1 \n\n GT_code is_roi is_background_class \\\ngroup \nroi 254 1 0 \nevaluation_roi 253 1 0 \nmostly_tumor 1 0 0 \nmostly_stroma 2 0 1 \nmostly_lymphocytic_infiltrate 3 0 0 \n\n color comments \ngroup \nroi rgb(200,0,150) NaN \nevaluation_roi rgb(255,0,0) NaN \nmostly_tumor rgb(255,0,0) core class \nmostly_stroma rgb(255,125,0) core class \nmostly_lymphocytic_infiltrate rgb(0,0,255) core class " }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "GTCodes_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### maskpaths\n", "\n", "These are absolute paths for the masks generated by tiled analysis to be used." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": "['TCGA-A2-A0YE-01Z-00-DX1.8A2E3094-5755-42BC-969D-7F0A2ECA0F39_left-45886_top-43750_mag-BASE.png',\n 'TCGA-A2-A0YE-01Z-00-DX1.8A2E3094-5755-42BC-969D-7F0A2ECA0F39_left-44862_top-45286_mag-BASE.png',\n 'TCGA-A2-A0YE-01Z-00-DX1.8A2E3094-5755-42BC-969D-7F0A2ECA0F39_left-44862_top-44262_mag-BASE.png',\n 'TCGA-A2-A0YE-01Z-00-DX1.8A2E3094-5755-42BC-969D-7F0A2ECA0F39_left-45374_top-44262_mag-BASE.png',\n 'TCGA-A2-A0YE-01Z-00-DX1.8A2E3094-5755-42BC-969D-7F0A2ECA0F39_left-44350_top-44262_mag-BASE.png']" }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[os.path.split(j)[1] for j in maskpaths[:5]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the pattern ```_left-123_``` and ```_top-123_``` is assumed to encode the x and y offset\n", "of the mask at base magnification. If you prefer some other convention, you will need to manually provide the\n", "parameter ``roi_offsets`` to the method ``Polygon_merger.set_roi_bboxes``." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Get dictionary of roi bounding boxes.\n", "\n", " Arguments:\n", " - roi_offsets: dict (default, None): dict indexed by maskname,\n", " each entry is a dict with keys top and left each is an integer.\n", " If None, then the x and y offset is inferred from mask name.\n", "\n", " Sets:\n", " - self.roiinfos: dict: dict indexed by maskname, each entry is a\n", " dict with keys top, left, bottom, right, all of which are integers.\n", "\n", " \n" ] } ], "source": [ "print(Polygon_merger.set_roi_bboxes.__doc__)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Initialize and run the merger\n", "\n", "To keep things clean we discard background contours (in this case, stroma) that\n", "are now enclosed with another contour. See docs for ``masks_to_annotations_handler.py``\n", "if this is confusing. This is purely aesthetic." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "test: Set contours from all masks\n", "\n", "test: Set ROI bounding boxes\n", "\n", "test: Set shard ROI edges\n", "\n", "test: Set merged contours\n", "\n", "test: Get concatenated contours\n", "test: _discard_nonenclosed_background_group: discarded 4 contours\n" ] } ], "source": [ "pm = Polygon_merger(\n", " maskpaths=maskpaths, GTCodes_df=GTCodes_df,\n", " discard_nonenclosed_background=True, verbose=1,\n", " monitorPrefix='test')\n", "contours_df = pm.run()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### This is the result" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
groupcoloryminymaxxminxmaxhas_holestouches_edge-toptouches_edge-lefttouches_edge-bottomtouches_edge-rightcoords_xcoords_y
0mostly_tumorrgb(255,0,0)NaNNaNNaNNaN0NaNNaNNaNNaN46312,46312,46315,46316,46316,46316,46317,4631...44252,44253,44256,44256,44257,44257,44257,4425...
1mostly_tumorrgb(255,0,0)NaNNaNNaNNaN0NaNNaNNaNNaN45822,45822,45822,45823,45823,45823,45823,4582...43915,43916,43916,43916,43917,43917,43917,4391...
2mostly_tumorrgb(255,0,0)NaNNaNNaNNaN0NaNNaNNaNNaN44350,44350,44384,44384,44385,44385,44386,4438...44445,44446,44446,44445,44445,44445,44445,4444...
3mostly_tumorrgb(255,0,0)NaNNaNNaNNaN0NaNNaNNaNNaN44350,44350,44350,44350,44350,44350,44350,4435...44615,44615,44615,44771,44771,44772,44772,4477...
4mostly_tumorrgb(255,0,0)NaNNaNNaNNaN0NaNNaNNaNNaN44350,44350,44350,44350,44350,44350,44350,4435...45129,45129,45283,45283,45284,45284,45285,4528...
\n
", "text/plain": " group color ymin ymax xmin xmax has_holes touches_edge-top \\\n0 mostly_tumor rgb(255,0,0) NaN NaN NaN NaN 0 NaN \n1 mostly_tumor rgb(255,0,0) NaN NaN NaN NaN 0 NaN \n2 mostly_tumor rgb(255,0,0) NaN NaN NaN NaN 0 NaN \n3 mostly_tumor rgb(255,0,0) NaN NaN NaN NaN 0 NaN \n4 mostly_tumor rgb(255,0,0) NaN NaN NaN NaN 0 NaN \n\n touches_edge-left touches_edge-bottom touches_edge-right \\\n0 NaN NaN NaN \n1 NaN NaN NaN \n2 NaN NaN NaN \n3 NaN NaN NaN \n4 NaN NaN NaN \n\n coords_x \\\n0 46312,46312,46315,46316,46316,46316,46317,4631... \n1 45822,45822,45822,45823,45823,45823,45823,4582... \n2 44350,44350,44384,44384,44385,44385,44386,4438... \n3 44350,44350,44350,44350,44350,44350,44350,4435... \n4 44350,44350,44350,44350,44350,44350,44350,4435... \n\n coords_y \n0 44252,44253,44256,44256,44257,44257,44257,4425... \n1 43915,43916,43916,43916,43917,43917,43917,4391... \n2 44445,44446,44446,44445,44445,44445,44445,4444... \n3 44615,44615,44615,44771,44771,44772,44772,4477... \n4 45129,45129,45283,45283,45284,45284,45285,4528... " }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "contours_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Visualize results on HistomicsTK" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# deleting existing annotations in target slide (if any)\n", "existing_annotations = gc.get('/annotation/item/' + SAMPLE_SLIDE_ID)\n", "for ann in existing_annotations:\n", " gc.delete('/annotation/%s' % ann['_id'])\n", "\n", "# get list of annotation documents\n", "annotation_docs = get_annotation_documents_from_contours(\n", " contours_df.copy(), separate_docs_by_group=True,\n", " docnamePrefix='test',\n", " verbose=False, monitorPrefix=SAMPLE_SLIDE_ID + ': annotation docs')\n", "\n", "# post annotations to slide -- make sure it posts without errors\n", "for annotation_doc in annotation_docs:\n", " resp = gc.post(\n", " '/annotation?itemId=' + SAMPLE_SLIDE_ID, json=annotation_doc)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now you can go to HistomicsUI and confirm that the posted annotations make sense\n", "and correspond to tissue boundaries and expected labels." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" } }, "nbformat": 4, "nbformat_minor": 2 }