{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Merging polygons (general purpose)\n", "\n", "**Overview:**\n", "\n", "This notebook describes how to merge annotations that are generated piecewise when annotating a large structure, or that arise in an annotation study when one user adds annotations to another user's work as corrections. In these cases there is a collection of annotations that overlap and need to be merged without any regular or predictable interfaces.\n", "\n", "The example presented below addresses this case using an R-tree algorithm that identifies merging candidates without exhaustive search. While this approach can also merge annotations generated by tiled analysis, it is slower than the alternative.\n", "\n", "This extends some of the work described in Amgad et al., 2019:\n", "\n", "_Mohamed Amgad, Habiba Elfandy, Hagar Hussein, ..., Jonathan Beezley, Deepak R Chittajallu, David Manthey, David A Gutman, Lee A D Cooper, Structured crowdsourcing enables convolutional segmentation of histology images, Bioinformatics, 2019, btz083_\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Here is a sample result:**\n", "\n", "![polygon_merger](https://user-images.githubusercontent.com/22067552/80076675-84178800-851a-11ea-8f5d-552bca8402ed.png)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "**Implementation summary**\n", "\n", "This algorithm merges annotations in coordinate space, which means it can merge very large structures without encountering memory issues. The algorithm works as follows:\n", "\n", "- Identify contours that have the same label (e.g. tumor)\n", "\n", "- Add bounding boxes from these contours to an [R-tree](https://en.wikipedia.org/wiki/R-tree). 
The R-tree implementation used here is modified from [pyrtree](https://code.google.com/archive/p/pyrtree/) and uses k-means clustering to balance the tree.\n", "\n", "- Starting from the bottom of the tree, merge all contours from leaves that belong to the same node.\n", "\n", "- Move one level up the hierarchy, each time incorporating merged contours from nodes that share a common parent. This is repeated until there is one merged contour at the root node. The contours are first dilated slightly to make sure any small gaps are filled in the merged result, then are eroded by the same factor after merging.\n", "\n", "- Save the coordinates from each merged polygon in a new pandas DataFrame.\n", "\n", "This process ensures that the number of comparisons is ``<< n^2``. This matters because whole slide images may contain tens of thousands of annotated structures, so an exhaustive pairwise search quickly becomes impractical.\n", "\n", "**Where to look?**\n", "\n", "```\n", "|_ histomicstk/\n", " |_annotations_and_masks/\n", " |_polygon_merger_v2.py\n", " |_tests/\n", " |_ test_polygon_merger.py\n", "```" ] },
{ "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "import os\n", "import sys\n", "CWD = os.getcwd()\n", "sys.path.append(os.path.join(CWD, '..', '..', 'histomicstk', 'annotations_and_masks'))\n", "import girder_client\n", "from histomicstk.annotations_and_masks.polygon_merger_v2 import Polygon_merger_v2\n", "from histomicstk.annotations_and_masks.masks_to_annotations_handler import (\n", " get_annotation_documents_from_contours, _discard_nonenclosed_background_group)\n", "from histomicstk.annotations_and_masks.annotation_and_mask_utils import parse_slide_annotations_into_tables" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 1. 
Connect girder client and set parameters" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "APIURL = 'http://candygram.neurology.emory.edu:8080/api/v1/'\n", "SOURCE_SLIDE_ID = '5d5d6910bd4404c6b1f3d893'\n", "POST_SLIDE_ID = '5d586d76bd4404c6b1f286ae'\n", "\n", "gc = girder_client.GirderClient(apiUrl=APIURL)\n", "# gc.authenticate(interactive=True)\n", "gc.authenticate(apiKey='kri19nTIGOkWH01TbzRqfohaaDWb6kPecRqGmemb')\n", "\n", "# get and parse slide annotations into dataframe\n", "slide_annotations = gc.get('/annotation/item/' + SOURCE_SLIDE_ID)\n", "_, contours_df = parse_slide_annotations_into_tables(slide_annotations)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Polygon merger\n", "\n", "``Polygon_merger_v2`` is the top-level class for performing the merging." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Init Polygon_merger object.\n", "\n", " Arguments:\n", " -----------\n", " contours_df : pandas DataFrame\n", " The following columns are needed.\n", "\n", " group : str\n", " annotation group (ground truth label).\n", " ymin : int\n", " minimun y coordinate\n", " ymax : int\n", " maximum y coordinate\n", " xmin : int\n", " minimum x coordinate\n", " xmax : int\n", " maximum x coordinate\n", " coords_x : str\n", " vertix x coordinates comma-separated values\n", " coords_y\n", " vertix y coordinated comma-separated values\n", " merge_thresh : int\n", " how close do the polygons need to be (in pixels) to be merged\n", " verbose : int\n", " 0 - Do not print to screen\n", " 1 - Print only key messages\n", " 2 - Print everything to screen\n", " monitorPrefix : str\n", " text to prepend to printed statements\n", "\n", " \n" ] } ], "source": [ "print(Polygon_merger_v2.__doc__)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", 
"text": [ "Init Polygon_merger object.\n", "\n", " Arguments:\n", " -----------\n", " contours_df : pandas DataFrame\n", " The following columns are needed.\n", "\n", " group : str\n", " annotation group (ground truth label).\n", " ymin : int\n", " minimun y coordinate\n", " ymax : int\n", " maximum y coordinate\n", " xmin : int\n", " minimum x coordinate\n", " xmax : int\n", " maximum x coordinate\n", " coords_x : str\n", " vertix x coordinates comma-separated values\n", " coords_y\n", " vertix y coordinated comma-separated values\n", " merge_thresh : int\n", " how close do the polygons need to be (in pixels) to be merged\n", " verbose : int\n", " 0 - Do not print to screen\n", " 1 - Print only key messages\n", " 2 - Print everything to screen\n", " monitorPrefix : str\n", " text to prepend to printed statements\n", "\n", " \n" ] } ], "source": [ "print(Polygon_merger_v2.__init__.__doc__)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Required arguments for initialization\n", "\n", "The only required argument is a dataframe of contours to merge." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr><th></th><th>annidx</th><th>elementidx</th><th>type</th><th>group</th><th>color</th><th>xmin</th><th>xmax</th><th>ymin</th><th>ymax</th><th>bbox_area</th><th>coords_x</th><th>coords_y</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>0</th><td>0</td><td>0</td><td>polyline</td><td>mostly_tumor</td><td>rgb(255,0,0)</td><td>44457</td><td>44703</td><td>44191</td><td>44260</td><td>16974</td><td>44614,44613,44608,44607,44602,44601,44596,4459...</td><td>44191,44192,44192,44193,44193,44194,44194,4419...</td></tr>\n",
" <tr><th>1</th><td>0</td><td>1</td><td>polyline</td><td>mostly_tumor</td><td>rgb(255,0,0)</td><td>44350</td><td>44682</td><td>43750</td><td>44154</td><td>134128</td><td>44350,44350,44353,44354,44359,44360,44364,4436...</td><td>43750,44154,44151,44151,44146,44146,44142,4414...</td></tr>\n",
" <tr><th>2</th><td>1</td><td>0</td><td>polyline</td><td>roi</td><td>rgb(200,0,150)</td><td>44350</td><td>44860</td><td>43750</td><td>44260</td><td>260100</td><td>44350,44350,44860,44860,44350</td><td>43750,44260,44260,43750,43750</td></tr>\n",
" <tr><th>3</th><td>2</td><td>0</td><td>polyline</td><td>mostly_lymphocytic_infiltrate</td><td>rgb(0,0,255)</td><td>44856</td><td>44860</td><td>43999</td><td>44034</td><td>140</td><td>44860,44858,44858,44857,44857,44856,44856,4485...</td><td>43999,44001,44002,44003,44006,44007,44018,4401...</td></tr>\n",
" <tr><th>4</th><td>2</td><td>1</td><td>polyline</td><td>mostly_lymphocytic_infiltrate</td><td>rgb(0,0,255)</td><td>44788</td><td>44860</td><td>43912</td><td>43997</td><td>6120</td><td>44823,44822,44819,44818,44817,44813,44812,4480...</td><td>43912,43913,43913,43914,43914,43918,43918,4392...</td></tr>\n",
" </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " annidx elementidx type group color \\\n", "0 0 0 polyline mostly_tumor rgb(255,0,0) \n", "1 0 1 polyline mostly_tumor rgb(255,0,0) \n", "2 1 0 polyline roi rgb(200,0,150) \n", "3 2 0 polyline mostly_lymphocytic_infiltrate rgb(0,0,255) \n", "4 2 1 polyline mostly_lymphocytic_infiltrate rgb(0,0,255) \n", "\n", " xmin xmax ymin ymax bbox_area \\\n", "0 44457 44703 44191 44260 16974 \n", "1 44350 44682 43750 44154 134128 \n", "2 44350 44860 43750 44260 260100 \n", "3 44856 44860 43999 44034 140 \n", "4 44788 44860 43912 43997 6120 \n", "\n", " coords_x \\\n", "0 44614,44613,44608,44607,44602,44601,44596,4459... \n", "1 44350,44350,44353,44354,44359,44360,44364,4436... \n", "2 44350,44350,44860,44860,44350 \n", "3 44860,44858,44858,44857,44857,44856,44856,4485... \n", "4 44823,44822,44819,44818,44817,44813,44812,4480... \n", "\n", " coords_y \n", "0 44191,44192,44192,44193,44193,44194,44194,4419... \n", "1 43750,44154,44151,44151,44146,44146,44142,4414... \n", "2 43750,44260,44260,43750,43750 \n", "3 43999,44001,44002,44003,44006,44007,44018,4401... \n", "4 43912,43913,43913,43914,43914,43918,43918,4392... " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "contours_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. 
Initialize and run the merger" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ": mostly_lymphocytic_infiltrate: set_contours_slice\n", ": mostly_lymphocytic_infiltrate: create_rtree\n", ": mostly_lymphocytic_infiltrate: set_tree_dict\n", ": mostly_lymphocytic_infiltrate: set_hierarchy\n", ": mostly_lymphocytic_infiltrate: get_merged_multipolygon\n", ": mostly_lymphocytic_infiltrate: _add_merged_multipolygon_contours\n", ": mostly_tumor: set_contours_slice\n", ": mostly_tumor: create_rtree\n", ": mostly_tumor: set_tree_dict\n", ": mostly_tumor: set_hierarchy\n", ": mostly_tumor: get_merged_multipolygon\n", ": mostly_tumor: _add_merged_multipolygon_contours\n", ": mostly_stroma: set_contours_slice\n", ": mostly_stroma: create_rtree\n", ": mostly_stroma: set_tree_dict\n", ": mostly_stroma: set_hierarchy\n", ": mostly_stroma: get_merged_multipolygon\n", ": mostly_stroma: _add_merged_multipolygon_contours\n" ] } ], "source": [ "# init & run polygon merger\n", "pm = Polygon_merger_v2(contours_df, verbose=1)\n", "pm.unique_groups.remove('roi')\n", "pm.run()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**NOTE:**\n", "\n", "The following steps are only \"aesthetic\", and just ensure the contours look nice when posted to Digital Slide Archive for viewing with GeoJS. 
" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# add colors (aesthetic)\n", "for group in pm.unique_groups:\n", " cs = contours_df.loc[contours_df.loc[:, 'group'] == group, 'color']\n", " pm.new_contours.loc[\n", " pm.new_contours.loc[:, 'group'] == group, 'color'] = cs.iloc[0]\n", "\n", "# get rid of nonenclosed stroma (aesthetic)\n", "pm.new_contours = _discard_nonenclosed_background_group(\n", " pm.new_contours, background_group='mostly_stroma')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### This is the result" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr><th></th><th>annidx</th><th>elementidx</th><th>type</th><th>group</th><th>color</th><th>xmin</th><th>xmax</th><th>ymin</th><th>ymax</th><th>bbox_area</th><th>coords_x</th><th>coords_y</th><th>has_holes</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>0</th><td>NaN</td><td>NaN</td><td>polyline</td><td>mostly_lymphocytic_infiltrate</td><td>rgb(0,0,255)</td><td>44670.0</td><td>45472.0</td><td>43750.0</td><td>44200.0</td><td>360900.0</td><td>44670,44670,44670,44670,44670,44670,44670,4467...</td><td>43901,43901,43907,43907,43908,43908,43909,4391...</td><td>0.0</td></tr>\n",
" <tr><th>1</th><td>NaN</td><td>NaN</td><td>polyline</td><td>mostly_tumor</td><td>rgb(255,0,0)</td><td>46181.0</td><td>46396.0</td><td>43750.0</td><td>43917.0</td><td>35905.0</td><td>46181,46181,46181,46181,46181,46181,46181,4618...</td><td>43777,43777,43777,43778,43778,43778,43778,4377...</td><td>0.0</td></tr>\n",
" <tr><th>2</th><td>NaN</td><td>NaN</td><td>polyline</td><td>mostly_tumor</td><td>rgb(255,0,0)</td><td>46312.0</td><td>46396.0</td><td>44167.0</td><td>44350.0</td><td>15372.0</td><td>46312,46312,46312,46312,46312,46312,46315,4631...</td><td>44252,44252,44252,44253,44253,44253,44256,4425...</td><td>0.0</td></tr>\n",
" <tr><th>3</th><td>NaN</td><td>NaN</td><td>polyline</td><td>mostly_tumor</td><td>rgb(255,0,0)</td><td>44907.0</td><td>46396.0</td><td>44609.0</td><td>46308.0</td><td>2529811.0</td><td>44907,44907,44907,44907,44907,44907,44907,4490...</td><td>46230,46230,46230,46234,46234,46234,46234,4623...</td><td>0.0</td></tr>\n",
" <tr><th>4</th><td>NaN</td><td>NaN</td><td>polyline</td><td>mostly_tumor</td><td>rgb(255,0,0)</td><td>45822.0</td><td>46086.0</td><td>43824.0</td><td>43953.0</td><td>34056.0</td><td>45822,45822,45822,45822,45822,45822,45822,4582...</td><td>43914,43914,43915,43915,43915,43915,43915,4391...</td><td>0.0</td></tr>\n",
" </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " annidx elementidx type group color \\\n", "0 NaN NaN polyline mostly_lymphocytic_infiltrate rgb(0,0,255) \n", "1 NaN NaN polyline mostly_tumor rgb(255,0,0) \n", "2 NaN NaN polyline mostly_tumor rgb(255,0,0) \n", "3 NaN NaN polyline mostly_tumor rgb(255,0,0) \n", "4 NaN NaN polyline mostly_tumor rgb(255,0,0) \n", "\n", " xmin xmax ymin ymax bbox_area \\\n", "0 44670.0 45472.0 43750.0 44200.0 360900.0 \n", "1 46181.0 46396.0 43750.0 43917.0 35905.0 \n", "2 46312.0 46396.0 44167.0 44350.0 15372.0 \n", "3 44907.0 46396.0 44609.0 46308.0 2529811.0 \n", "4 45822.0 46086.0 43824.0 43953.0 34056.0 \n", "\n", " coords_x \\\n", "0 44670,44670,44670,44670,44670,44670,44670,4467... \n", "1 46181,46181,46181,46181,46181,46181,46181,4618... \n", "2 46312,46312,46312,46312,46312,46312,46315,4631... \n", "3 44907,44907,44907,44907,44907,44907,44907,4490... \n", "4 45822,45822,45822,45822,45822,45822,45822,4582... \n", "\n", " coords_y has_holes \n", "0 43901,43901,43907,43907,43908,43908,43909,4391... 0.0 \n", "1 43777,43777,43777,43778,43778,43778,43778,4377... 0.0 \n", "2 44252,44252,44252,44253,44253,44253,44256,4425... 0.0 \n", "3 46230,46230,46230,46234,46234,46234,46234,4623... 0.0 \n", "4 43914,43914,43915,43915,43915,43915,43915,4391... 0.0 " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pm.new_contours.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. 
Visualize results on HistomicsTK" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# deleting existing annotations in target slide (if any)\n", "existing_annotations = gc.get('/annotation/item/' + POST_SLIDE_ID)\n", "for ann in existing_annotations:\n", " gc.delete('/annotation/%s' % ann['_id'])\n", "\n", "# get list of annotation documents\n", "annotation_docs = get_annotation_documents_from_contours(\n", " pm.new_contours.copy(), separate_docs_by_group=True,\n", " docnamePrefix='test',\n", " verbose=False, monitorPrefix=POST_SLIDE_ID + ': annotation docs')\n", "\n", "# post annotations to slide -- make sure it posts without errors\n", "for annotation_doc in annotation_docs:\n", " resp = gc.post(\n", " '/annotation?itemId=' + POST_SLIDE_ID, json=annotation_doc)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now you can go to HistomicsUI and confirm that the posted annotations make sense\n", "and correspond to tissue boundaries and expected labels." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" } }, "nbformat": 4, "nbformat_minor": 2 }