Converting masks back to annotations

Created on Mon Aug 12 18:33:48 2019.

@author: tageldim

histomicstk.annotations_and_masks.masks_to_annotations_handler.get_annotation_documents_from_contours(contours_df, separate_docs_by_group=True, annots_per_doc=200, annprops=None, docnamePrefix='', verbose=True, monitorPrefix='')

Given dataframe of contours, get list of annotation documents.

This method parses a dataframe of contours into a list of dictionaries, each of which represents a large_image-style annotation. It is a wrapper that extends the functionality of get_single_annotation_document_from_contours(); refer to that method's docstring for implementation details and further explanation.

Parameters:

contours_df (pandas DataFrame) –
WARNING - This is modified inside the function, so pass a copy. This dataframe includes data on contours extracted from the input mask using get_contours_from_mask(). If you obtained contours using some other method, just make sure the dataframe follows the same schema as the output from get_contours_from_mask(). You may find a sample dataframe in the repo at ./tests/test_files/annotations_and_masks/sample_contours_df.tsv. The following columns are relevant for this method.

group: str
annotation group (ground truth label).

color: str
annotation color if it were to be posted to DSA.

coords_x: str
vertex x coordinates, as comma-separated values.

coords_y: str
vertex y coordinates, as comma-separated values.
separate_docs_by_group (bool) – if set to True, you get one or more annotation documents (dicts) for each group (e.g. tumor) independently.
annots_per_doc (int) – maximum number of annotation elements (polygons) per dict. The smaller this number, the more annotation documents you get, but the easier it is to post this data to the DSA server or to view it in the HistomicsTK interface, since you will be loading smaller chunks of data at a time.
annprops (dict) – properties of annotation elements. Contains the following keys: F, X_OFFSET, Y_OFFSET, opacity, lineWidth. Refer to get_single_annotation_document_from_contours() for details.
docnamePrefix (str) – text to prepend to annotation document name
verbose (bool) – Print progress to screen?
monitorPrefix (str) – text to prepend to printed statements

Returns:

DSA-style annotation documents.

Return type:

list of dicts
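
The interaction between separate_docs_by_group and annots_per_doc can be illustrated with a minimal pure-Python sketch. This is not the library's implementation: plan_documents is a hypothetical helper that only mirrors the documented behaviour (contours are optionally split by group, then chunked so that no document holds more than annots_per_doc elements).

```python
from collections import defaultdict


def plan_documents(groups, separate_docs_by_group=True, annots_per_doc=200):
    """Return a list of (label, row_indices) chunks, one per planned document.

    `groups` stands in for the `group` column of the contours dataframe,
    given as a plain list. Hypothetical helper, for illustration only.
    """
    if separate_docs_by_group:
        # One bucket of row indices per group, in order of first appearance.
        buckets = defaultdict(list)
        for idx, group in enumerate(groups):
            buckets[group].append(idx)
    else:
        # A single bucket holding every contour regardless of group.
        buckets = {"all": list(range(len(groups)))}

    chunks = []
    for label, indices in buckets.items():
        # Slice each bucket into pieces of at most `annots_per_doc` rows.
        for start in range(0, len(indices), annots_per_doc):
            chunks.append((label, indices[start:start + annots_per_doc]))
    return chunks


demo_groups = ["tumor"] * 5 + ["stroma"] * 2
demo_chunks = plan_documents(demo_groups, annots_per_doc=2)
for label, indices in demo_chunks:
    print(label, indices)
```

With five tumor contours, two stroma contours and annots_per_doc=2, the sketch plans three tumor documents and one stroma document.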

histomicstk.annotations_and_masks.masks_to_annotations_handler.get_contours_from_bin_mask(bin_mask)

Given a binary mask, get opencv contours.

Parameters:

bin_mask (nd array) – ground truth mask (m,n) - int32 with [0, 1] values.

Returns:

a dictionary with the following keys:
- contour group: the actual contour x,y coordinates.
- hierarchy: contour hierarchy. This contains information about how contours relate to each other, in the form [Next, Previous, First_Child, Parent, index_relative_to_contour_group]. The last column is added for convenience and is not part of the original opencv output.
- outer_contours: indices of contours that do not have a parent and are therefore the outermost contours. These may have children (holes), however.

See docs.opencv.org/3.1.0/d9/d8b/tutorial_py_contours_hierarchy.html for more information.

Return type:

dict
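
As an illustration of how the hierarchy rows identify outer contours and their holes, here is a small sketch over a hand-written hierarchy table. The table values and the two helper functions are made up for this example; the only convention relied on is OpenCV's use of -1 to mean "none" in the [Next, Previous, First_Child, Parent] columns.

```python
# Column positions within one hierarchy row.
NEXT, PREVIOUS, FIRST_CHILD, PARENT = 0, 1, 2, 3

# Hand-written example: two outer contours, the first of which has one hole.
example_hierarchy = [
    [1, -1, 2, -1],    # contour 0: outer; next sibling is 1; first child is 2
    [-1, 0, -1, -1],   # contour 1: outer; no children
    [-1, -1, -1, 0],   # contour 2: hole whose parent is contour 0
]


def outer_contour_indices(hierarchy):
    """Indices of contours with no parent (Parent == -1)."""
    return [i for i, row in enumerate(hierarchy) if row[PARENT] == -1]


def hole_indices(hierarchy, outer_idx):
    """Indices of the direct children (holes) of one outer contour."""
    holes = []
    child = hierarchy[outer_idx][FIRST_CHILD]
    while child != -1:
        holes.append(child)
        # Siblings at the same level are chained through the Next column.
        child = hierarchy[child][NEXT]
    return holes


outer = outer_contour_indices(example_hierarchy)
holes_of_first = hole_indices(example_hierarchy, 0)
print(outer, holes_of_first)
```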

histomicstk.annotations_and_masks.masks_to_annotations_handler.get_contours_from_mask(MASK, GTCodes_df, groups_to_get=None, MIN_SIZE=30, MAX_SIZE=None, get_roi_contour=True, roi_group='roi', discard_nonenclosed_background=False, background_group='mostly_stroma', verbose=False, monitorPrefix='')

Parse ground truth mask and get contours for annotations.

Parameters:

MASK (nd array) – ground truth mask (m,n) where pixel values encode group membership.
GTCodes_df (pandas DataFrame) –
the ground truth codes and information dataframe. It is indexed by the annotation group name and has the following columns.

group: str
group name of annotation, e.g. mostly_tumor.

GT_code: int
desired ground truth code (in the mask). Pixels of this value belong to the corresponding group (class).

color: str
rgb format, e.g. rgb(255,0,0).
groups_to_get (list or None) – if None (default), all groups (ground truth labels) are extracted. Otherwise pass a list of strings like ['mostly_tumor'].
MIN_SIZE (int) – minimum bounding box size of contour
MAX_SIZE (int or None) – if not None, the maximum bounding box size of contour. Very large contours sometimes cause segmentation faults that originate in opencv and are not caught by Python, causing the Python process to halt unexpectedly. If you would like to set a maximum size to defend against this, a suggested maximum is 15000.
get_roi_contour (bool) – whether to get contour for boundary of region of interest (ROI). This is most relevant when dealing with multiple ROIs per slide and with rotated rectangular or polygonal ROIs.
roi_group (str) – name of the roi group in the GTCodes_df dataframe (e.g. roi)
discard_nonenclosed_background (bool) – if True, discard any background group contour that is NOT fully enclosed. This option is purely aesthetic: dropping background group contours (e.g. stroma) avoids cluttering the field when posted to DSA for viewing online. The only exception is background contours enclosed within something else (e.g. tumor), which are kept since they represent holes. This is related to https://github.com/DigitalSlideArchive/HistomicsTK/issues/675. WARNING - This is a bit slower since the contours have to be converted to shapely polygons. The difference is not noticeable for hundreds of contours, but it is when parsing thousands. For this reason, the default is False.
background_group (str) – name of the background group in the GTCodes_df dataframe (e.g. mostly_stroma)
verbose (bool) – Print progress to screen?
monitorPrefix (str) – text to prepend to printed statements

Returns:

contours extracted from input mask. The following columns are output.

group (str): annotation group (ground truth label).
color (str): annotation color if it were to be posted to DSA.
is_roi (bool): whether this annotation is a region of interest boundary
ymin (int): minimum y coordinate
ymax (int): maximum y coordinate
xmin (int): minimum x coordinate
xmax (int): maximum x coordinate
has_holes (bool): whether this contour has holes
touches_edge-top (bool): whether this contour touches the top mask edge
touches_edge-bottom (bool): whether this contour touches the bottom mask edge
touches_edge-left (bool): whether this contour touches the left mask edge
touches_edge-right (bool): whether this contour touches the right mask edge
coords_x (str): vertex x coordinates, as comma-separated values
coords_y (str): vertex y coordinates, as comma-separated values

Return type:

pandas DataFrame
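
To make the column layout concrete, the sketch below decodes the comma-separated coords_x and coords_y strings of one row, recomputes its bounding box, and applies a size filter. The row is a plain dict with made-up values standing in for one dataframe row, and the helpers are hypothetical. In particular, treating MIN_SIZE and MAX_SIZE as limits on both bounding box width and height is an assumption about how "bounding box size" is measured, not a statement of the library's exact rule.

```python
def parse_vertices(row):
    """Decode the comma-separated coordinate strings into (x, y) tuples."""
    xs = [int(v) for v in row["coords_x"].split(",")]
    ys = [int(v) for v in row["coords_y"].split(",")]
    if len(xs) != len(ys):
        raise ValueError("coords_x and coords_y must have the same length")
    return list(zip(xs, ys))


def bounding_box(vertices):
    """Recompute the xmin/xmax/ymin/ymax columns from the vertices."""
    xs, ys = zip(*vertices)
    return {"xmin": min(xs), "xmax": max(xs), "ymin": min(ys), "ymax": max(ys)}


def within_size_limits(bbox, min_size=30, max_size=None):
    """Assumed rule: both box sides must lie within [min_size, max_size]."""
    width = bbox["xmax"] - bbox["xmin"]
    height = bbox["ymax"] - bbox["ymin"]
    if width < min_size or height < min_size:
        return False
    if max_size is not None and (width > max_size or height > max_size):
        return False
    return True


# Made-up row for illustration: a 40 x 35 pixel rectangle.
sample_row = {
    "group": "mostly_tumor",
    "color": "rgb(255,0,0)",
    "coords_x": "10,50,50,10",
    "coords_y": "5,5,40,40",
}
sample_vertices = parse_vertices(sample_row)
sample_bbox = bounding_box(sample_vertices)
print(sample_bbox, within_size_limits(sample_bbox))
```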

histomicstk.annotations_and_masks.masks_to_annotations_handler.get_single_annotation_document_from_contours(contours_df_slice, docname='default', F=1.0, X_OFFSET=0, Y_OFFSET=0, opacity=0.3, lineWidth=4.0, verbose=True, monitorPrefix='')

Given dataframe of contours, get annotation document.

This uses the large_image annotation schema to create an annotation document that may be posted to DSA for viewing using something like: resp = gc.post("/annotation?itemId=" + slide_id, json=annotation_doc). The annotation schema can be found at: github.com/girder/large_image/blob/master/docs/annotations.md.

Parameters:

contours_df_slice (pandas DataFrame) –
The following columns are relevant and must be present.

group: str
annotation group (ground truth label).

color: str
annotation color if it were to be posted to DSA.

coords_x: str
vertex x coordinates, as comma-separated values.

coords_y: str
vertex y coordinates, as comma-separated values.
docname (str) – annotation document name
F (float) – how much smaller the mask that the contours come from is relative to the slide scan magnification. For example, if the mask is at 10x whereas the slide scan magnification is 20x, then F would be 2.0.
X_OFFSET (int) – x offset to add to contours at BASE (SCAN) magnification
Y_OFFSET (int) – y offset to add to contours at BASE (SCAN) magnification
opacity (float) – opacity of annotation elements (in the range [0, 1])
lineWidth (float) – width of borders of annotation elements
verbose (bool) – Print progress to screen?
monitorPrefix (str) – text to prepend to printed statements

Returns:

DSA-style annotation document ready to be posted for viewing.

Return type:

dict
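
The following is a rough sketch of how contour rows might map onto a large_image-style annotation document. It is not the library's code: contour_to_element and build_annotation_document are hypothetical helpers, the row is made up, and the coordinate transform (scale by F, then add the offset) is an assumption based on the parameter descriptions above. The element fields used (type, closed, points, group, label, lineColor, fillColor, lineWidth) follow the polyline element of the large_image annotation schema.

```python
def contour_to_element(row, F=1.0, X_OFFSET=0, Y_OFFSET=0,
                       opacity=0.3, lineWidth=4.0):
    """Turn one contour row (a dict here) into a polyline element."""
    xs = [int(v) for v in row["coords_x"].split(",")]
    ys = [int(v) for v in row["coords_y"].split(",")]
    # Assumed transform: scale mask coordinates up to scan magnification,
    # then shift by the offsets, which are given at scan magnification.
    points = [[int(F * x) + X_OFFSET, int(F * y) + Y_OFFSET, 0]
              for x, y in zip(xs, ys)]
    # "rgb(r,g,b)" -> "rgba(r,g,b,opacity)" for a semi-transparent fill.
    fill = row["color"].replace("rgb", "rgba").replace(")", f",{opacity})")
    return {
        "type": "polyline",
        "closed": True,
        "points": points,
        "group": row["group"],
        "label": {"value": row["group"]},
        "lineColor": row["color"],
        "fillColor": fill,
        "lineWidth": lineWidth,
    }


def build_annotation_document(rows, docname="default", **annprops):
    """Wrap the elements of all rows in a single annotation document."""
    return {
        "name": docname,
        "description": "",
        "elements": [contour_to_element(row, **annprops) for row in rows],
    }


# Made-up row: a triangle traced on a mask at half the scan magnification.
demo_row = {
    "group": "mostly_tumor",
    "color": "rgb(255,0,0)",
    "coords_x": "10,20,20",
    "coords_y": "5,5,15",
}
demo_doc = build_annotation_document(
    [demo_row], docname="demo", F=2.0, X_OFFSET=100, Y_OFFSET=200)
print(demo_doc["elements"][0]["points"])
```

A document built this way is a plain dict, so it can be serialized with json.dumps for inspection before it is posted anywhere.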