Converting annotations to semantic segmentation mask images

Created on Mon Aug 12 18:33:48 2019.

@author: tageldim

histomicstk.annotations_and_masks.annotations_to_masks_handler.get_all_rois_from_slide(gc, slide_id, GTCodes_dict, save_directories, get_image_and_mask_from_slide_kwargs=None, max_roiside=None, slide_name=None, verbose=True, monitorPrefix='')

Parse annotations and save ground truth masks for ALL ROIs.

Get all ROIs in a single slide. This mainly uses get_image_and_mask_from_slide(), which should be referred to for implementation details.

Parameters:
  • gc (object) – girder client object used to make requests, for example:
      gc = girder_client.GirderClient(apiUrl=APIURL)
      gc.authenticate(interactive=True)

  • slide_id (str) – girder id for item (slide)

  • GTCodes_dict (dict) – the ground truth codes and information dict. This is a dict indexed by the annotation group name; each entry is in turn a dict with the following keys:
      - group: group name of the annotation (string), e.g. mostly_tumor
      - overlay_order: int, how early to place the annotation in the mask. Groups with larger values are overlaid later and overwrite whatever they overlap.
      - GT_code: int, desired ground truth code in the mask. Pixels with this value belong to the corresponding group (class).
      - is_roi: flag for whether this group encodes an ROI.
      - is_background_class: flag for whether this group is the default fill value inside the ROI. For example, you may decide that any pixel inside the ROI is considered stroma.

  • save_directories (dict) – paths to directories in which to save data. Each entry is a string; the following keys are allowed:
      - ROI: path to save masks (labeled images)
      - rgb: path to save RGB images
      - contours: path to save annotation contours
      - visualization: path to save RGB visualization overlays

  • get_image_and_mask_from_slide_kwargs (dict) – kwargs to pass to get_image_and_mask_from_slide(). Default values are assigned if specific parameters are not given.

  • max_roiside (int or None) – If int, this is the maximum allowed side length of a downloaded region. If a region of interest is larger than this, it is tiled into non-overlapping regions whose maximal side is max_roiside. If None, the ROI is downloaded as-is, even if it is extremely large. If you know your slides have very large ROI annotations, the safer option is to set max_roiside; a good value may be 5000-8000 pixels.

  • slide_name (str or None) – If not given, it is inferred from a server request made with the girder client.

  • verbose (bool) – Print progress to screen?

  • monitorPrefix (str) – text to prepend to printed statements
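To make the GTCodes_dict structure concrete, the sketch below builds a small hypothetical dict with three groups and checks it. The group names and GT_code values are made up for illustration, and check_gtcodes() is a hypothetical helper, not part of HistomicsTK. Its rule that at most one group is the background class is an assumption drawn from the description of is_background_class as "the default fill value inside the ROI"; its rule that GT_code must not be 0 follows from zero pixels meaning "outside the mask".

```python
# Hypothetical GTCodes_dict: names and codes are illustrative only.
REQUIRED_KEYS = {"group", "overlay_order", "GT_code", "is_roi", "is_background_class"}

GTCodes_dict = {
    "roi": {"group": "roi", "overlay_order": 0, "GT_code": 254,
            "is_roi": 1, "is_background_class": 0},
    "mostly_stroma": {"group": "mostly_stroma", "overlay_order": 1, "GT_code": 2,
                      "is_roi": 0, "is_background_class": 1},
    "mostly_tumor": {"group": "mostly_tumor", "overlay_order": 2, "GT_code": 1,
                     "is_roi": 0, "is_background_class": 0},
}


def check_gtcodes(gtcodes):
    """Sanity-check a GTCodes dict (hypothetical helper, not a HistomicsTK API)."""
    for name, entry in gtcodes.items():
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            raise KeyError(f"{name!r} is missing keys: {sorted(missing)}")
        if entry["group"] != name:
            raise ValueError(f"key {name!r} does not match group {entry['group']!r}")
        if entry["GT_code"] == 0:
            # Zero is reserved for pixels outside the mask / ROI.
            raise ValueError(f"{name!r} uses GT_code 0, which means 'outside mask'")
    # Assumption: a single default fill value inside the ROI.
    n_background = sum(bool(e["is_background_class"]) for e in gtcodes.values())
    if n_background > 1:
        raise ValueError("at most one group should be the background class")
    return True


# Painting order implied by overlay_order: lowest value first, highest last.
paint_order = sorted(GTCodes_dict, key=lambda g: GTCodes_dict[g]["overlay_order"])
```

Here mostly_tumor ends up last in paint_order, so where it overlaps stroma the tumor code would win.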

Returns:

Each entry contains the following keys:
  - ROI: path to saved mask (labeled image)
  - rgb: path to saved RGB image
  - contours: path to saved annotation contours
  - visualization: path to saved RGB visualization overlay

Return type:

list of dicts
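The max_roiside behaviour (splitting an oversized ROI into non-overlapping regions whose maximal side is max_roiside) can be sketched as below. tile_bounds() is a hypothetical helper, not HistomicsTK's implementation; it assumes the ROI bounds and max_roiside are in the same pixel coordinate system, and it lets the last row and column of tiles be smaller than max_roiside. The library's own edge handling may differ.

```python
def tile_bounds(bounds, max_roiside):
    """Split XMIN/XMAX/YMIN/YMAX bounds into non-overlapping tiles.

    Hypothetical helper illustrating the idea behind max_roiside.
    """
    if max_roiside is None:
        return [dict(bounds)]  # download the ROI as-is
    tiles = []
    for ymin in range(bounds["YMIN"], bounds["YMAX"], max_roiside):
        for xmin in range(bounds["XMIN"], bounds["XMAX"], max_roiside):
            tiles.append({
                "XMIN": xmin,
                # Clip the last tile in each direction to the ROI edge.
                "XMAX": min(xmin + max_roiside, bounds["XMAX"]),
                "YMIN": ymin,
                "YMAX": min(ymin + max_roiside, bounds["YMAX"]),
            })
    return tiles


# A 12000 x 5000 ROI with max_roiside=5000 gives three tiles;
# the last one is only 2000 pixels wide.
tiles = tile_bounds({"XMIN": 0, "XMAX": 12000, "YMIN": 0, "YMAX": 5000}, 5000)
```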

histomicstk.annotations_and_masks.annotations_to_masks_handler.get_image_and_mask_from_slide(gc, slide_id, GTCodes_dict, MPP=5.0, MAG=None, mode='min_bounding_box', bounds=None, idx_for_roi=None, slide_annotations=None, element_infos=None, get_roi_mask_kwargs=None, get_contours_kwargs=None, linewidth=0.2, get_rgb=True, get_contours=True, get_visualization=True, tau=10)

Parse a region from the slide and get its corresponding labeled mask.

This is a wrapper around get_roi_mask() which should be referred to for implementation details.

Parameters:
  • gc (object) – girder client object used to make requests, for example:
      gc = girder_client.GirderClient(apiUrl=APIURL)
      gc.authenticate(interactive=True)

  • slide_id (str) – girder id for item (slide)

  • GTCodes_dict (dict) – the ground truth codes and information dict. This is a dict indexed by the annotation group name; each entry is in turn a dict with the following keys:
      - group: group name of the annotation (string), e.g. mostly_tumor
      - overlay_order: int, how early to place the annotation in the mask. Groups with larger values are overlaid later and overwrite whatever they overlap.
      - GT_code: int, desired ground truth code in the mask. Pixels with this value belong to the corresponding group (class).
      - is_roi: flag for whether this group encodes an ROI.
      - is_background_class: flag for whether this group is the default fill value inside the ROI. For example, you may decide that any pixel inside the ROI is considered stroma.

  • MPP (float or None) – microns per pixel. Prefer this over MAG: MPP is better defined than magnification, which is more scanner/manufacturer specific. An MPP of 0.25 often roughly corresponds to 40x.

  • MAG (float or None) – use this if you prefer whatever magnification is reported in the slide. If neither MPP nor MAG is provided, everything is retrieved without scaling at base (scan) magnification.

  • mode (str) – specifies which part of the slide to get the mask from. Allowed modes:
      - wsi: get a scaled up/down version of the mask of the whole slide
      - min_bounding_box: get the minimum box enclosing all annotations in the slide
      - manual_bounds: use the ROI bounds provided by the ‘bounds’ param
      - polygonal_bounds: use the idx_for_roi param to get coordinates

  • bounds (dict or None) – if not None, has keys ‘XMIN’, ‘XMAX’, ‘YMIN’, ‘YMAX’ for slide region coordinates (AT BASE MAGNIFICATION) to get labeled image (mask) for. Use this with the ‘manual_bounds’ run mode.

  • idx_for_roi (int) – index of ROI within the element_infos dataframe. Use this with the ‘polygonal_bounds’ run mode.

  • slide_annotations (list or None) – Pass this to avoid re-fetching slide annotations. If you do provide the annotations, make sure you have used scale_slide_annotations() to scale them up/down by sf BEFOREHAND.

  • element_infos (pandas DataFrame) – The columns annidx and elementidx encode the dict index of the annotation document and element, respectively, in the original slide_annotations list of dictionaries. This can be obtained with get_bboxes_from_slide_annotations(). Make sure you have used scale_slide_annotations().

  • get_roi_mask_kwargs (dict) – extra kwargs for get_roi_mask()

  • get_contours_kwargs (dict) – extra kwargs for get_contours_from_mask()

  • linewidth (float) – visualization line width

  • get_rgb (bool) – get rgb image?

  • get_contours (bool) – get annotation contours? (relative to final mask)

  • get_visualization (bool) – get overlaid annotation bounds over RGB for visualization

  • tau (int) – maximum difference (in pixels) allowed between the fetched image and mask. Above this threshold, an error is raised indicating that there may be a problem in your parameters or elsewhere. If the difference is less than tau, the RGB image and mask are resized to match each other before being returned.
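The MPP and MAG parameters ultimately determine a scale factor between base (scan) coordinates and the output image. The sketch below shows one plausible way to compute it; scale_factor() and scale_bounds() are hypothetical helpers, and the base MPP / magnification are assumed to be known from slide metadata. Giving MPP precedence over MAG is an assumption consistent with the advice above to prefer MPP.

```python
def scale_factor(base_mpp=None, base_mag=None, MPP=None, MAG=None):
    """Factor mapping base (scan) coordinates to output coordinates.

    Hypothetical helper. MPP takes precedence over MAG (assumption);
    with neither given, the output stays at base magnification (sf = 1).
    """
    if MPP is not None:
        # Larger MPP = more microns per pixel = smaller output image.
        return base_mpp / MPP
    if MAG is not None:
        return MAG / base_mag
    return 1.0


def scale_bounds(bounds, sf):
    """Scale base-magnification XMIN/XMAX/YMIN/YMAX bounds by sf."""
    return {key: int(round(value * sf)) for key, value in bounds.items()}


# A slide scanned at 0.25 MPP (roughly 40x) requested at MPP=5.0:
sf = scale_factor(base_mpp=0.25, MPP=5.0)
small = scale_bounds({"XMIN": 1000, "XMAX": 3000, "YMIN": 2000, "YMAX": 4000}, sf)
```

With these numbers a 2000 x 2000 pixel region at scan magnification becomes 100 x 100 pixels in the output.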

Returns:

Results dict containing one or more of the following keys:
  - bounds: dict of bounds at scan magnification
  - ROI: (m x n) labeled image (mask)
  - rgb: (m x n x 3) np array, the corresponding RGB image
  - contours: list; each entry is a dict version of a row from the output of masks_to_annotations_handler.get_contours_from_mask()
  - visualization: (m x n x 3) np array, the visualization overlay

Return type:

dict
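The tau check can be pictured as follows. reconcile_shapes() is a hypothetical helper that raises when height or width differ by more than tau pixels and otherwise resizes the RGB image to the mask's shape. Resizing only the RGB side and using nearest-neighbour indexing (so the sketch needs only NumPy) are choices made for the example; the library may resize differently.

```python
import numpy as np


def reconcile_shapes(rgb, mask, tau=10):
    """Match an RGB image to a labeled mask whose shape differs slightly.

    Hypothetical helper mirroring the documented tau behaviour.
    """
    dh = abs(rgb.shape[0] - mask.shape[0])
    dw = abs(rgb.shape[1] - mask.shape[1])
    if max(dh, dw) > tau:
        raise ValueError(f"image/mask mismatch ({dh}, {dw}) exceeds tau={tau}")
    # Nearest-neighbour row/column indices into the RGB image.
    rows = np.linspace(0, rgb.shape[0] - 1, mask.shape[0]).round().astype(int)
    cols = np.linspace(0, rgb.shape[1] - 1, mask.shape[1]).round().astype(int)
    return rgb[rows][:, cols], mask


rgb = np.zeros((105, 98, 3), dtype=np.uint8)
mask = np.ones((100, 100), dtype=np.uint8)
rgb_resized, mask_out = reconcile_shapes(rgb, mask, tau=10)
```

Resizing the RGB image rather than the mask avoids interpolating class labels, which would otherwise need nearest-neighbour handling on the mask side.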

histomicstk.annotations_and_masks.annotations_to_masks_handler.get_mask_from_slide(GTCodes_dict, roiinfo, slide_annotations, element_infos, sf=1.0, get_roi_mask_kwargs=None)

Parse a region from the slide and get its corresponding labeled mask.

This is a wrapper around get_roi_mask() which should be referred to for implementation details. If roiinfo is None, all annotations in the slide are parsed into labeled image (mask) form. Otherwise, the bounding box coordinates in roiinfo are used.

Parameters:
  • GTCodes_dict (dict) – the ground truth codes and information dict. This is a dict indexed by the annotation group name; each entry is in turn a dict with the following keys:
      - group: group name of the annotation (string), e.g. mostly_tumor
      - overlay_order: int, how early to place the annotation in the mask. Groups with larger values are overlaid later and overwrite whatever they overlap.
      - GT_code: int, desired ground truth code in the mask. Pixels with this value belong to the corresponding group (class).
      - is_roi: flag for whether this group encodes an ROI.
      - is_background_class: flag for whether this group is the default fill value inside the ROI. For example, you may decide that any pixel inside the ROI is considered stroma.

  • roiinfo (dict or None) – if not None, has keys ‘XMIN’, ‘XMAX’, ‘YMIN’, ‘YMAX’ for slide region coordinates (AT BASE MAGNIFICATION) to get labeled image (mask) for.

  • sf (float) – scale factor by which to multiply coordinates (e.g. 0.5 halves the size)

  • slide_annotations (list) – Make sure you have used scale_slide_annotations() to scale them up/down by sf BEFOREHAND.

  • element_infos (pandas DataFrame) – The columns annidx and elementidx encode the dict index of the annotation document and element, respectively, in the original slide_annotations list of dictionaries. This can be obtained with get_bboxes_from_slide_annotations(). Make sure you have used scale_slide_annotations().

  • get_roi_mask_kwargs (dict) – extra kwargs for get_roi_mask()
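Because slide_annotations must be scaled by sf beforehand, it may help to see what "scaling annotations" means. The sketch below is a hypothetical stand-in for scale_slide_annotations(), not its actual implementation. It assumes the large_image annotation schema, where each document holds an "annotation" dict with an "elements" list, polyline elements carry "points" as [x, y, z] triples, and rectangle elements carry "center", "width" and "height". It returns a scaled copy and leaves the input untouched.

```python
import copy


def scale_annotations(slide_annotations, sf):
    """Return a copy of annotation documents with x/y coordinates times sf.

    Hypothetical stand-in for scale_slide_annotations(); schema is assumed.
    """
    scaled = copy.deepcopy(slide_annotations)
    for doc in scaled:
        for element in doc["annotation"]["elements"]:
            if "points" in element:  # polylines / polygons
                element["points"] = [[x * sf, y * sf, z] for x, y, z in element["points"]]
            if "center" in element:  # rectangles, circles, ...
                x, y, z = element["center"]
                element["center"] = [x * sf, y * sf, z]
            for key in ("width", "height"):
                if key in element:
                    element[key] *= sf
    return scaled


annotations = [{"annotation": {"elements": [
    {"type": "polyline", "points": [[100, 200, 0], [300, 400, 0]]},
    {"type": "rectangle", "center": [100, 100, 0], "width": 50, "height": 20},
]}}]
half = scale_annotations(annotations, 0.5)
```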

Returns:

  • Np array – (m x n) labeled image, where pixel values encode class membership. IMPORTANT NOTE: zero pixels have special meaning and do NOT encode a specific ground truth class. Instead, they simply mean "outside the mask" and should be IGNORED during model training or evaluation.

  • Dict – information about mask
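Since zero pixels carry no class, any metric computed on these masks should exclude them. A minimal NumPy sketch of a pixel accuracy that ignores zeros is shown below; masked_pixel_accuracy() and the prediction array are hypothetical and only illustrate the rule stated above.

```python
import numpy as np


def masked_pixel_accuracy(pred, mask):
    """Pixel accuracy ignoring zero-valued ("outside mask") pixels.

    Hypothetical helper; pred is a model prediction with the mask's shape.
    """
    valid = mask != 0  # zero pixels have no ground-truth class
    if not valid.any():
        return float("nan")
    return float((pred[valid] == mask[valid]).mean())


mask = np.array([[0, 1], [2, 2]])
pred = np.array([[5, 1], [2, 1]])
acc = masked_pixel_accuracy(pred, mask)  # 2 of 3 labeled pixels correct
```

The top-left prediction (5) is never penalized because its ground truth pixel is 0. The same boolean mask can be used as per-pixel weights in a training loss.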

histomicstk.annotations_and_masks.annotations_to_masks_handler.get_roi_mask(slide_annotations, element_infos, GTCodes_df, idx_for_roi, iou_thresh=0.0, roiinfo=None, crop_to_roi=True, use_shapely=True, verbose=False, monitorPrefix='')

Parse annotations and get a ground truth mask for a single ROI.

This looks at all slide annotations, gets the ones that overlap with the region of interest (ROI), and assigns them to the mask.

Parameters:
  • slide_annotations (list of dicts) – response from server request

  • element_infos (pandas DataFrame) – The columns annidx and elementidx encode the dict index of the annotation document and element, respectively, in the original slide_annotations list of dictionaries. This can be obtained with get_bboxes_from_slide_annotations().

  • GTCodes_df (pandas DataFrame) – the ground truth codes and information dataframe. WARNING: it is modified inside this method, so pass a copy. This is a dataframe indexed by the annotation group name, with the following columns:
      - group: group name of the annotation (string), e.g. mostly_tumor
      - overlay_order: int, how early to place the annotation in the mask. Groups with larger values are overlaid later and overwrite whatever they overlap.
      - GT_code: int, desired ground truth code in the mask. Pixels with this value belong to the corresponding group (class).
      - is_roi: flag for whether this group encodes an ROI.
      - is_background_class: flag for whether this group is the default fill value inside the ROI. For example, you may decide that any pixel inside the ROI is considered stroma.

  • idx_for_roi (int) – index of ROI within the element_infos dataframe.

  • iou_thresh (float) – how much bounding box overlap is required for an annotation to be considered part of the region of interest

  • roiinfo (pandas Series or dict) – contains information about the ROI. Keys describing the ROI, such as its bounding box location and size, will be added to it.

  • crop_to_roi (bool) – flag of whether to crop polygons to roi (prevent overflow beyond roi edge)

  • use_shapely (bool) – flag for whether to precisely determine whether an element belongs to an ROI using shapely polygons (slightly slower). If set to False, overlapping bounding boxes are used as a cheap but less precise indicator of inclusion.

  • verbose (bool) – Print progress to screen?

  • monitorPrefix (str) – text to prepend to printed statements
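The interplay of is_background_class, overlay_order and crop_to_roi can be illustrated with a simplified rasterizer. This is a hypothetical sketch, not HistomicsTK's implementation: polygon_mask() fills pixels whose centers fall inside a polygon using even-odd ray casting, and paint_roi() fills the ROI with the background code, then paints groups in increasing overlay_order while clipping to the ROI. Group names, codes and coordinates are made up for the example.

```python
import numpy as np


def polygon_mask(shape, polygon):
    """Boolean mask of pixel centers inside an (x, y) polygon (even-odd rule)."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    px, py = xs + 0.5, ys + 0.5
    inside = np.zeros(shape, dtype=bool)
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        crosses = (y1 > py) != (y2 > py)  # edge straddles the horizontal ray
        with np.errstate(divide="ignore", invalid="ignore"):
            # x where the edge meets the ray; horizontal edges never "cross"
            x_int = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            inside ^= crosses & (px < x_int)
    return inside


def paint_roi(shape, roi_polygon, annotations, gtcodes):
    """Paint (group, polygon) annotations into a labeled mask (hypothetical)."""
    mask = np.zeros(shape, dtype=np.uint8)  # 0 = outside ROI
    roi = polygon_mask(shape, roi_polygon)
    for info in gtcodes.values():
        if info["is_background_class"]:
            mask[roi] = info["GT_code"]  # default fill inside the ROI
    ordered = sorted(annotations, key=lambda a: gtcodes[a[0]]["overlay_order"])
    for name, polygon in ordered:
        # Clip to the ROI, similar in spirit to crop_to_roi=True.
        mask[polygon_mask(shape, polygon) & roi] = gtcodes[name]["GT_code"]
    return mask


GTCODES = {  # illustrative values only
    "mostly_stroma": {"GT_code": 2, "overlay_order": 0, "is_background_class": 1},
    "mostly_tumor": {"GT_code": 1, "overlay_order": 1, "is_background_class": 0},
    "necrosis": {"GT_code": 3, "overlay_order": 2, "is_background_class": 0},
}
roi_polygon = [(0, 0), (6, 0), (6, 10), (0, 10)]
annotations = [
    ("necrosis", [(3, 3), (8, 3), (8, 8), (3, 8)]),
    ("mostly_tumor", [(0, 0), (5, 0), (5, 5), (0, 5)]),
]
mask = paint_roi((10, 10), roi_polygon, annotations, GTCODES)
```

Where tumor and necrosis overlap, necrosis wins because its overlay_order is larger, and the part of the necrosis polygon that extends past the ROI's right edge stays 0.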

Returns:

  • Np array – (m x n) labeled image, where pixel values encode class membership. IMPORTANT NOTE: zero pixels have special meaning and do NOT encode a specific ground truth class. Instead, they simply mean "outside the ROI" and should be IGNORED during model training or evaluation.

  • Dict – information about ROI
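The iou_thresh parameter and the cheap bounding-box path used when use_shapely=False both rest on comparing an element's bounding box with the ROI's. The sketch assumes plain intersection-over-union and boxes given as XMIN/XMAX/YMIN/YMAX dicts. bbox_iou() and overlapping_elements() are hypothetical helpers, and the library's exact overlap measure may differ.

```python
def bbox_iou(a, b):
    """Intersection-over-union of two XMIN/XMAX/YMIN/YMAX boxes."""
    ix = max(0, min(a["XMAX"], b["XMAX"]) - max(a["XMIN"], b["XMIN"]))
    iy = max(0, min(a["YMAX"], b["YMAX"]) - max(a["YMIN"], b["YMIN"]))
    inter = ix * iy

    def area(r):
        return (r["XMAX"] - r["XMIN"]) * (r["YMAX"] - r["YMIN"])

    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def overlapping_elements(roi_box, element_boxes, iou_thresh=0.0):
    """Indices of elements whose box IoU with the ROI exceeds iou_thresh."""
    return [i for i, box in enumerate(element_boxes)
            if bbox_iou(roi_box, box) > iou_thresh]


roi = {"XMIN": 0, "XMAX": 100, "YMIN": 0, "YMAX": 100}
boxes = [
    {"XMIN": 50, "XMAX": 150, "YMIN": 0, "YMAX": 100},   # half overlapping
    {"XMIN": 200, "XMAX": 300, "YMIN": 0, "YMAX": 100},  # disjoint
    {"XMIN": 10, "XMAX": 20, "YMIN": 10, "YMAX": 20},    # small, fully inside
]
```

Under plain IoU the small box fully inside the ROI scores only 0.01, so a threshold of 0.0 (any overlap) keeps it while 0.1 drops it. That is one reason a strict IoU threshold can be too aggressive for small annotations inside large ROIs.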