Converting masks back to annotations

Created on Mon Aug 12 18:33:48 2019.

@author: tageldim

histomicstk.annotations_and_masks.masks_to_annotations_handler.get_annotation_documents_from_contours(contours_df, separate_docs_by_group=True, annots_per_doc=200, annprops=None, docnamePrefix='', verbose=True, monitorPrefix='')

Given dataframe of contours, get list of annotation documents.

This method parses a dataframe of contours into a list of dictionaries, each of which represents a large_image-style annotation. It is a wrapper that extends get_single_annotation_document_from_contours(); refer to that function's docstring for implementation details and further explanation.

Parameters:
  • contours_df (pandas DataFrame) –

    WARNING - This is modified inside the function, so pass a copy. This dataframe holds the contours extracted from the input mask using get_contours_from_mask(). If you obtained contours by some other method, make sure the dataframe follows the same schema as the output of get_contours_from_mask(). A sample dataframe is available in the repo at ./tests/test_files/annotations_and_masks/sample_contours_df.tsv. The following columns are relevant for this method.

    group: str

    annotation group (ground truth label).

    color: str

    annotation color if it were to be posted to DSA.

    coords_x: str

    vertex x coordinates, as comma-separated values

    coords_y: str

    vertex y coordinates, as comma-separated values

  • separate_docs_by_group (bool) – if set to True, you get one or more annotation documents (dicts) for each group (e.g. tumor) independently.

  • annots_per_doc (int) – maximum number of annotation elements (polygons) per dict. The smaller this number, the more numerous the annotation documents, but the more seamless it is to post this data to the DSA server or to view using the HistomicsTK interface since you will be loading smaller chunks of data at a time.

  • annprops (dict) – properties of annotation elements. Contains the following keys: F, X_OFFSET, Y_OFFSET, opacity, lineWidth. Refer to get_single_annotation_document_from_contours() for details.

  • docnamePrefix (str) – text to prepend to annotation document name

  • verbose (bool) – Print progress to screen?

  • monitorPrefix (str) – text to prepend to printed statements

Returns:

DSA-style annotation documents.

Return type:

list of dicts
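
To make the effect of separate_docs_by_group and annots_per_doc concrete, here is a minimal sketch of the batching idea only. It is not the library's implementation: the helper chunk_rows_into_documents, the "name"/"rows" keys, and the document naming scheme are hypothetical, and plain dicts stand in for dataframe rows (e.g. what contours_df.to_dict('records') would give).

```python
from itertools import groupby


def chunk_rows_into_documents(rows, annots_per_doc=200,
                              separate_docs_by_group=True,
                              docname_prefix=""):
    """Split contour rows into batches, one batch per annotation document.

    Hypothetical helper for illustration; names and output layout are
    assumptions, not HistomicsTK's API.
    """
    if separate_docs_by_group:
        # groupby needs contiguous keys, so sort by group first.
        # sorted() is stable, so row order within a group is preserved.
        keyed = sorted(rows, key=lambda r: r["group"])
        batches = [(g, list(members))
                   for g, members in groupby(keyed, key=lambda r: r["group"])]
    else:
        batches = [("all", list(rows))]

    docs = []
    for name, members in batches:
        # Cut each batch into slices of at most annots_per_doc rows.
        for i, start in enumerate(range(0, len(members), annots_per_doc)):
            docs.append({
                "name": f"{docname_prefix}{name}-{i}",
                "rows": members[start:start + annots_per_doc],
            })
    return docs


# Five tumor contours and two stroma contours, at most two per document.
rows = [{"group": "tumor"}] * 5 + [{"group": "stroma"}] * 2
docs = chunk_rows_into_documents(rows, annots_per_doc=2)
doc_sizes = [(d["name"], len(d["rows"])) for d in docs]
print(doc_sizes)
```

A smaller annots_per_doc yields more, lighter documents, which is the trade-off described for that parameter.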

histomicstk.annotations_and_masks.masks_to_annotations_handler.get_contours_from_bin_mask(bin_mask)

Given a binary mask, get opencv contours.

Parameters:

bin_mask (nd array) – ground truth mask (m,n) - int32 with [0, 1] values.

Returns:

a dictionary with the following keys:

- contour group: the actual contour x,y coordinates.
- hierarchy: contour hierarchy. This contains information about how contours relate to each other, in the form: [Next, Previous, First_Child, Parent, index_relative_to_contour_group]. The last column is added for convenience and is not part of the original opencv output.
- outer_contours: indices of contours that do not have a parent and are therefore the outermost contours. These may, however, have children (holes).

See docs.opencv.org/3.1.0/d9/d8b/tutorial_py_contours_hierarchy.html for more information.

Return type:

dict
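
The hierarchy layout described above can be read as in the following sketch. The hierarchy rows here are hand-written for illustration (they are not output from this function), use only the four OpenCV columns [Next, Previous, First_Child, Parent], and rely on OpenCV's convention that -1 means "none".

```python
# Hypothetical hierarchy for three contours; -1 means "no such contour".
hierarchy = [
    [1, -1, 2, -1],    # contour 0: outermost, first child is contour 2
    [-1, 0, -1, -1],   # contour 1: outermost, no children
    [-1, -1, -1, 0],   # contour 2: hole whose parent is contour 0
]

# Column positions within each hierarchy row.
FIRST_CHILD, PARENT = 2, 3

# Outer contours are those without a parent.
outer_contours = [i for i, row in enumerate(hierarchy) if row[PARENT] == -1]

# An outer contour has holes if it has at least one child.
has_holes = {i: hierarchy[i][FIRST_CHILD] != -1 for i in outer_contours}

print(outer_contours)
print(has_holes)
```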

histomicstk.annotations_and_masks.masks_to_annotations_handler.get_contours_from_mask(MASK, GTCodes_df, groups_to_get=None, MIN_SIZE=30, MAX_SIZE=None, get_roi_contour=True, roi_group='roi', discard_nonenclosed_background=False, background_group='mostly_stroma', verbose=False, monitorPrefix='')

Parse ground truth mask and get contours for annotations.

Parameters:
  • MASK (nd array) – ground truth mask (m,n) where pixel values encode group membership.

  • GTCodes_df (pandas DataFrame) –

    the ground truth codes and information dataframe. This is a dataframe that is indexed by the annotation group name and has the following columns.

    group: str

    group name of annotation, e.g. mostly_tumor.

    GT_code: int

    desired ground truth code (in the mask). Pixels of this value belong to corresponding group (class).

    color: str

    rgb format, e.g. rgb(255,0,0).

  • groups_to_get (None) – if None (default) then all groups (ground truth labels) will be extracted. Otherwise pass a list of strings like ['mostly_tumor'].

  • MIN_SIZE (int) – minimum bounding box size of contour

  • MAX_SIZE (None) – if not None, an int giving the maximum bounding box size of a contour. Very large contours sometimes cause segmentation faults that originate in opencv and are not caught by python, causing the python process to halt unexpectedly. If you would like to set a maximum size to defend against this, a suggested maximum is 15000.

  • get_roi_contour (bool) – whether to get contour for boundary of region of interest (ROI). This is most relevant when dealing with multiple ROIs per slide and with rotated rectangular or polygonal ROIs.

  • roi_group (str) – name of the roi group in the GTCodes_df dataframe (e.g. roi)

  • discard_nonenclosed_background (bool) – If a background group contour is NOT fully enclosed, discard it. This is a purely aesthetic option: it discards background group contours (e.g. stroma) to avoid cluttering the field when posted to DSA for viewing online. The only exception is background contours that are enclosed within something else (e.g. tumor), which are kept since they represent holes. This is related to https://github.com/DigitalSlideArchive/HistomicsTK/issues/675. WARNING - This is a bit slower, since the contours have to be converted to shapely polygons. The difference is not noticeable for hundreds of contours, but it is when parsing thousands of contours. For this reason, the default is False.

  • background_group (str) – name of the background group in the GTCodes_df dataframe (e.g. mostly_stroma)

  • verbose (bool) – Print progress to screen?

  • monitorPrefix (str) – text to prepend to printed statements

Returns:

contours extracted from input mask. The following columns are output.

group: str

annotation group (ground truth label).

color: str

annotation color if it were to be posted to DSA.

is_roi: bool

whether this annotation is a region of interest boundary

ymin: int

minimum y coordinate

ymax: int

maximum y coordinate

xmin: int

minimum x coordinate

xmax: int

maximum x coordinate

has_holes: bool

whether this contour has holes

touches_edge-top: bool

whether this contour touches top mask edge

touches_edge-bottom: bool

whether this contour touches bottom mask edge

touches_edge-left: bool

whether this contour touches left mask edge

touches_edge-right: bool

whether this contour touches right mask edge

coords_x: str

vertex x coordinates, as comma-separated values

coords_y: str

vertex y coordinates, as comma-separated values

Return type:

pandas DataFrame
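
As a rough illustration of how MASK pixel values encode group membership through GT_code, the sketch below derives a binary mask for one group. It is an assumption-laden toy: the group names, codes, and colors are made up, nested lists stand in for the (m,n) array, and a dict stands in for GTCodes_df. With a numpy array, the equivalent test would be MASK == gt_code.

```python
# Hypothetical stand-in for GTCodes_df, keyed by group name.
gt_codes = {
    "roi": {"GT_code": 1, "color": "rgb(200,0,150)"},
    "mostly_tumor": {"GT_code": 2, "color": "rgb(255,0,0)"},
    "mostly_stroma": {"GT_code": 3, "color": "rgb(255,125,0)"},
}

# Toy 3x4 ground truth mask; each pixel holds the GT_code of its group.
mask = [
    [1, 1, 2, 2],
    [1, 3, 2, 2],
    [1, 3, 3, 1],
]


def binary_mask_for_group(mask, gt_code):
    """Return a [0, 1] mask marking pixels whose value equals gt_code."""
    return [[1 if px == gt_code else 0 for px in row] for row in mask]


tumor_mask = binary_mask_for_group(mask, gt_codes["mostly_tumor"]["GT_code"])
for row in tumor_mask:
    print(row)
```

A binary mask of this kind has the [0, 1] form that get_contours_from_bin_mask() expects as input.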

histomicstk.annotations_and_masks.masks_to_annotations_handler.get_single_annotation_document_from_contours(contours_df_slice, docname='default', F=1.0, X_OFFSET=0, Y_OFFSET=0, opacity=0.3, lineWidth=4.0, verbose=True, monitorPrefix='')

Given dataframe of contours, get annotation document.

This uses the large_image annotation schema to create an annotation document that may be posted to DSA for viewing using something like: resp = gc.post("/annotation?itemId=" + slide_id, json=annotation_doc). The annotation schema can be found at: github.com/girder/large_image/blob/master/docs/annotations.md.

Parameters:
  • contours_df_slice (pandas DataFrame) –

    The following columns are relevant and must be present.

    group: str

    annotation group (ground truth label).

    color: str

    annotation color if it were to be posted to DSA.

    coords_x: str

    vertex x coordinates, as comma-separated values

    coords_y: str

    vertex y coordinates, as comma-separated values

  • docname (str) – annotation document name

  • F (float) – the factor by which the mask the contours come from is smaller than the slide scan magnification. For example, if the mask is at 10x whereas the slide scan magnification is 20x, then F would be 2.0.

  • X_OFFSET (int) – x offset to add to contours at BASE (SCAN) magnification

  • Y_OFFSET (int) – y offset to add to contours at BASE (SCAN) magnification

  • opacity (float) – opacity of annotation elements (in the range [0, 1])

  • lineWidth (float) – width of borders of annotation elements

  • verbose (bool) – Print progress to screen?

  • monitorPrefix (str) – text to prepend to printed statements

Returns:

DSA-style annotation document ready to be posted for viewing.

Return type:

dict
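
The sketch below shows one plausible way a single contour row could be turned into a polyline element, to illustrate how F, X_OFFSET, and Y_OFFSET interact. It assumes, based on the parameter descriptions, that mask coordinates are multiplied by F and then shifted by the offsets; the helper contour_to_element is hypothetical, and the element fields follow the large_image annotation schema as best understood, so check them against that schema before relying on them.

```python
def contour_to_element(row, F=1.0, X_OFFSET=0, Y_OFFSET=0,
                       opacity=0.3, lineWidth=4.0):
    """Build a polyline element from one contour row (hypothetical helper)."""
    # Decode the comma-separated vertex strings, scale mask coordinates
    # up to base (scan) magnification, then shift by the offsets.
    xs = [int(v) * F + X_OFFSET for v in row["coords_x"].split(",")]
    ys = [int(v) * F + Y_OFFSET for v in row["coords_y"].split(",")]

    # Turn "rgb(r,g,b)" into "rgba(r,g,b,opacity)" for the fill.
    fill = row["color"].replace("rgb", "rgba").replace(")", f",{opacity})")

    return {
        "type": "polyline",
        "closed": True,
        "group": row["group"],
        "lineColor": row["color"],
        "lineWidth": lineWidth,
        "fillColor": fill,
        "points": [[x, y, 0.0] for x, y in zip(xs, ys)],
    }


# A rectangle traced on a 10x mask, placed on a 20x slide (F = 2.0).
row = {
    "group": "mostly_tumor",
    "color": "rgb(255,0,0)",
    "coords_x": "0,10,10,0",
    "coords_y": "0,0,5,5",
}
element = contour_to_element(row, F=2.0, X_OFFSET=100, Y_OFFSET=200)
annotation_doc = {"name": "default", "description": "", "elements": [element]}
print(element["points"])
```

With F=2.0, a vertex at (10, 5) in the mask lands at (120.0, 210.0) in slide coordinates after the offsets are added.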