{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Local backup and SQL querying of annotation data\n", "\n", "**Overview:**\n", "\n", "![sqlite](https://user-images.githubusercontent.com/22067552/80071301-5e868080-8512-11ea-8115-47fcefdb568a.png)\n", "\n", "Annotations represent a significant time investment for the users who generate\n", "them and they should be backed up frequently. The simplest way to backup the\n", "annotations in a DSA database is to perform a\n", "[mongodump](https://docs.mongodb.com/manual/reference/program/mongodump/)\n", "operation. While frequent `mongodump` operations are always important to guard\n", "against failures they have the following disadvantages:\n", "- You need to have access on the server where the annotations are hosted.\n", "- The entire Mongo database is backed up, not just the folder you care about.\n", "- You cannot query the database using SQL queries.\n", "HistomicsTK has utility functions that allow the recursive backup of a girder\n", "database locally as a combination of `.json` files (most similar to the raw format),\n", "tabular files (`.csv`), and/or an SQLite database.\n", "\n", "The SQLite database can easily\n", "be viewed using, for example, an [offline](https://sqlitebrowser.org/dl/)\n", "sqlite viewer or even an [online](https://sqliteonline.com/) sqlite\n", "viewer.\n", "\n", "**Where to look:**\n", "\n", "```\n", "|_histomicstk/\n", " |_annotations_and_masks/\n", " |_annotation_database_parser.py\n", " |_annotation_and_mask_utils.py -> parse_slide_annotations_into_tables()\n", " |_tests/\n", " |_test_annotation_database_parser.py\n", " |_test_annotation_and_mask_utils.py -> test_parse_slide_annotations_into_table()\n", " \n", "```" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import os\n", "import pandas as pd\n", "import sqlalchemy as db\n", "\n", "from histomicstk.utils.girder_convenience_utils import connect_to_api\n", "from histomicstk.annotations_and_masks.annotation_database_parser import (\n", " dump_annotations_locally, parse_annotations_to_local_tables)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Connect and set parameters\n", "\n", "We use an api key to connect to the remote server, set the girder ID of the folder we want to backup, and set the local path where the backup will be stored." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "gc = connect_to_api(\n", " apiurl='http://candygram.neurology.emory.edu:8080/api/v1/',\n", " apikey='kri19nTIGOkWH01TbzRqfohaaDWb6kPecRqGmemb')\n", "\n", "# This is the girder ID of the folder we would like to backup and parse locally\n", "SAMPLE_FOLDER_ID = '5e24c20dddda5f8398695671'\n", "\n", "# This is where the annotations and sqlite database will be dumped locally\n", "savepath = '/home/mtageld/Desktop/tmp/concordance/'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Examine functions for pulling annotation data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the main function you will be using to walk the folder and pull the annotations from the remote server" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dump annotations of folder and subfolders locally recursively.\n", "\n", " This reproduces this tiered structure locally and (possibly) dumps\n", " annotations there. Adapted from Lee A.D. Cooper\n", "\n", " Parameters\n", " -----------\n", " gc : girder_client.GirderClient\n", " authenticated girder client instance\n", "\n", " folderid : str\n", " girder id of source (base) folder\n", "\n", " local : str\n", " local path to dump annotations\n", "\n", " save_json : bool\n", " whether to dump annotations as json file\n", "\n", " save_sqlite : bool\n", " whether to save the backup into an sqlite database\n", "\n", " dbcon : sqlalchemy.create_engine.connect() object\n", " IGNORE THIS PARAMETER!! This is used internally.\n", "\n", " callback : function\n", " function to call that CAN accept AT LEAST the following params\n", " - item: girder response with item information\n", " - annotations: loaded annotations\n", " - local: local directory\n", " - monitorPrefix: string\n", " - dbcon: sqlalchemy.create_engine.connect() object\n", " You can just add kwargs at the end of your callback definition\n", " for simplicity.\n", "\n", " callback_kwargs : dict\n", " kwargs to pass along to callback. DO NOT pass any of the parameters\n", " item, annotations, local, monitorPrefix, or dbcon as these will be\n", " internally passed. Just include any specific paremeters for the\n", " callback. See parse_annotations_to_local_tables() above for\n", " an example of a callback and the unir test of this function.\n", "\n", " \n" ] } ], "source": [ "print(dump_annotations_locally.__doc__)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This optionally calls the following function to parse annotations into tables that are added to an sqlite database." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Parse loaded annotations for slide into tables.\n", "\n", " Parameters\n", " ----------\n", " item : dict\n", " girder response with item information\n", "\n", " annotations : dict\n", " loaded annotations\n", "\n", " local : str\n", " local directory\n", "\n", " save_csv : bool\n", " whether to use histomicstk.annotations_and_masks.annotation_and_mask.\n", " parse_slide_annotations_into_tables() to get a tabular representation\n", " (including some simple calculations like bounding box) and save\n", " the output as two csv files, one representing the annotation documents\n", " and the other representing the actual annotation elements (polygons).\n", "\n", " save_sqlite : bool\n", " whether to save the backup into an sqlite database\n", "\n", " dbcon : sqlalchemy.create_engine.connect() object\n", " IGNORE THIS PARAMETER!! This is used internally.\n", "\n", " monitorPrefix : str\n", " text to prepend to printed statements\n", "\n", " \n" ] } ], "source": [ "print(parse_annotations_to_local_tables.__doc__)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Case 1: Simple backup" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The simplest case is to backup the information about the girder folders, items, and annotations as `.json` files, with a folder structure replicated locally as it is in the girder database. The user may also elect to save the folder and item/slide information (but not the annotations) as the following tables in a SQLite database:\n", "\n", "- **folders**: all girder folders contained within the folder that the user wants to backup. This includes an 'absolute girder path' convenience column. The column '\\_id' is the unique girder ID. \n", "\n", "- **items**: all items (slide). The column '\\_id' is the unique girder ID, and is linked to the folders table by the 'folderId' column. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Here is the syntax:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ": save folder info\n", "Participant_1: save folder info\n", "Participant_1: slide 1 of 5 (TCGA-A1-A0SK-01Z-00-DX1.A44D70FA-4D96-43F4-9DD7-A61535786297.svs): save item info\n", "Participant_1: slide 1 of 5 (TCGA-A1-A0SK-01Z-00-DX1.A44D70FA-4D96-43F4-9DD7-A61535786297.svs): load annotations\n", "Participant_1: slide 1 of 5 (TCGA-A1-A0SK-01Z-00-DX1.A44D70FA-4D96-43F4-9DD7-A61535786297.svs): save annotations\n", "Participant_1: slide 2 of 5 (TCGA-A2-A0YM-01Z-00-DX1.A48B4C96-2CC5-464C-98B7-F0F92AE56533.svs): save item info\n", "Participant_1: slide 2 of 5 (TCGA-A2-A0YM-01Z-00-DX1.A48B4C96-2CC5-464C-98B7-F0F92AE56533.svs): load annotations\n", "Participant_1: slide 2 of 5 (TCGA-A2-A0YM-01Z-00-DX1.A48B4C96-2CC5-464C-98B7-F0F92AE56533.svs): save annotations\n", "Participant_1: slide 3 of 5 (TCGA-A7-A0DA-01Z-00-DX1.5F087009-16E9-4A07-BA24-62340E108B17.svs): save item info\n", "Participant_1: slide 3 of 5 (TCGA-A7-A0DA-01Z-00-DX1.5F087009-16E9-4A07-BA24-62340E108B17.svs): load annotations\n", "Participant_1: slide 3 of 5 (TCGA-A7-A0DA-01Z-00-DX1.5F087009-16E9-4A07-BA24-62340E108B17.svs): save annotations\n", "Participant_1: slide 4 of 5 (TCGA-AR-A1AY-01Z-00-DX1.6AC0BE3B-FFC5-4EDA-9E40-B18CAAC52B81.svs): save item info\n", "Participant_1: slide 4 of 5 (TCGA-AR-A1AY-01Z-00-DX1.6AC0BE3B-FFC5-4EDA-9E40-B18CAAC52B81.svs): load annotations\n", "Participant_1: slide 4 of 5 (TCGA-AR-A1AY-01Z-00-DX1.6AC0BE3B-FFC5-4EDA-9E40-B18CAAC52B81.svs): save annotations\n", "Participant_1: slide 5 of 5 (TCGA-BH-A0BG-01Z-00-DX1.0838FB7F-8C85-4687-9F70-D136A1063383.svs): save item info\n", "Participant_1: slide 5 of 5 (TCGA-BH-A0BG-01Z-00-DX1.0838FB7F-8C85-4687-9F70-D136A1063383.svs): load annotations\n", "Participant_1: slide 5 of 5 (TCGA-BH-A0BG-01Z-00-DX1.0838FB7F-8C85-4687-9F70-D136A1063383.svs): save annotations\n", "Participant_2: save folder info\n", "Participant_2: slide 1 of 5 (TCGA-A1-A0SK-01Z-00-DX1.A44D70FA-4D96-43F4-9DD7-A61535786297.svs): save item info\n", "Participant_2: slide 1 of 5 (TCGA-A1-A0SK-01Z-00-DX1.A44D70FA-4D96-43F4-9DD7-A61535786297.svs): load annotations\n", "Participant_2: slide 1 of 5 (TCGA-A1-A0SK-01Z-00-DX1.A44D70FA-4D96-43F4-9DD7-A61535786297.svs): save annotations\n", "Participant_2: slide 2 of 5 (TCGA-A2-A0YM-01Z-00-DX1.A48B4C96-2CC5-464C-98B7-F0F92AE56533.svs): save item info\n", "Participant_2: slide 2 of 5 (TCGA-A2-A0YM-01Z-00-DX1.A48B4C96-2CC5-464C-98B7-F0F92AE56533.svs): load annotations\n", "Participant_2: slide 2 of 5 (TCGA-A2-A0YM-01Z-00-DX1.A48B4C96-2CC5-464C-98B7-F0F92AE56533.svs): save annotations\n", "Participant_2: slide 3 of 5 (TCGA-A7-A0DA-01Z-00-DX1.5F087009-16E9-4A07-BA24-62340E108B17.svs): save item info\n", "Participant_2: slide 3 of 5 (TCGA-A7-A0DA-01Z-00-DX1.5F087009-16E9-4A07-BA24-62340E108B17.svs): load annotations\n", "Participant_2: slide 3 of 5 (TCGA-A7-A0DA-01Z-00-DX1.5F087009-16E9-4A07-BA24-62340E108B17.svs): save annotations\n", "Participant_2: slide 4 of 5 (TCGA-AR-A1AY-01Z-00-DX1.6AC0BE3B-FFC5-4EDA-9E40-B18CAAC52B81.svs): save item info\n", "Participant_2: slide 4 of 5 (TCGA-AR-A1AY-01Z-00-DX1.6AC0BE3B-FFC5-4EDA-9E40-B18CAAC52B81.svs): load annotations\n", "Participant_2: slide 4 of 5 (TCGA-AR-A1AY-01Z-00-DX1.6AC0BE3B-FFC5-4EDA-9E40-B18CAAC52B81.svs): save annotations\n", "Participant_2: slide 5 of 5 (TCGA-BH-A0BG-01Z-00-DX1.0838FB7F-8C85-4687-9F70-D136A1063383.svs): save item info\n", "Participant_2: slide 5 of 5 (TCGA-BH-A0BG-01Z-00-DX1.0838FB7F-8C85-4687-9F70-D136A1063383.svs): load annotations\n", "Participant_2: slide 5 of 5 (TCGA-BH-A0BG-01Z-00-DX1.0838FB7F-8C85-4687-9F70-D136A1063383.svs): save annotations\n" ] } ], "source": [ "# recursively save annotations -- JSONs + sqlite for folders/items\n", "dump_annotations_locally(\n", " gc, folderid=SAMPLE_FOLDER_ID, local=savepath,\n", " save_json=True, save_sqlite=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Check the results" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/home/mtageld/Desktop/tmp/concordance/\r\n", "├── Concordance.json\r\n", "├── Concordance.sqlite\r\n", "├── Participant_1\r\n", "│   ├── Participant_1.json\r\n", "│   ├── TCGA-A1-A0SK-01Z-00-DX1.A44D70FA-4D96-43F4-9DD7-A61535786297.svs_annotations.json\r\n", "│   ├── TCGA-A1-A0SK-01Z-00-DX1.A44D70FA-4D96-43F4-9DD7-A61535786297.svs.json\r\n", "│   ├── TCGA-A2-A0YM-01Z-00-DX1.A48B4C96-2CC5-464C-98B7-F0F92AE56533.svs_annotations.json\r\n", "│   ├── TCGA-A2-A0YM-01Z-00-DX1.A48B4C96-2CC5-464C-98B7-F0F92AE56533.svs.json\r\n", "│   ├── TCGA-A7-A0DA-01Z-00-DX1.5F087009-16E9-4A07-BA24-62340E108B17.svs_annotations.json\r\n", "│   ├── TCGA-A7-A0DA-01Z-00-DX1.5F087009-16E9-4A07-BA24-62340E108B17.svs.json\r\n", "│   ├── TCGA-AR-A1AY-01Z-00-DX1.6AC0BE3B-FFC5-4EDA-9E40-B18CAAC52B81.svs_annotations.json\r\n", "│   ├── TCGA-AR-A1AY-01Z-00-DX1.6AC0BE3B-FFC5-4EDA-9E40-B18CAAC52B81.svs.json\r\n", "│   ├── TCGA-BH-A0BG-01Z-00-DX1.0838FB7F-8C85-4687-9F70-D136A1063383.svs_annotations.json\r\n", "│   └── TCGA-BH-A0BG-01Z-00-DX1.0838FB7F-8C85-4687-9F70-D136A1063383.svs.json\r\n", "└── Participant_2\r\n", " ├── Participant_2.json\r\n", " ├── TCGA-A1-A0SK-01Z-00-DX1.A44D70FA-4D96-43F4-9DD7-A61535786297.svs_annotations.json\r\n", " ├── TCGA-A1-A0SK-01Z-00-DX1.A44D70FA-4D96-43F4-9DD7-A61535786297.svs.json\r\n", " ├── TCGA-A2-A0YM-01Z-00-DX1.A48B4C96-2CC5-464C-98B7-F0F92AE56533.svs_annotations.json\r\n", " ├── TCGA-A2-A0YM-01Z-00-DX1.A48B4C96-2CC5-464C-98B7-F0F92AE56533.svs.json\r\n", " ├── TCGA-A7-A0DA-01Z-00-DX1.5F087009-16E9-4A07-BA24-62340E108B17.svs_annotations.json\r\n", " ├── TCGA-A7-A0DA-01Z-00-DX1.5F087009-16E9-4A07-BA24-62340E108B17.svs.json\r\n", " ├── TCGA-AR-A1AY-01Z-00-DX1.6AC0BE3B-FFC5-4EDA-9E40-B18CAAC52B81.svs_annotations.json\r\n", " ├── TCGA-AR-A1AY-01Z-00-DX1.6AC0BE3B-FFC5-4EDA-9E40-B18CAAC52B81.svs.json\r\n", " ├── TCGA-BH-A0BG-01Z-00-DX1.0838FB7F-8C85-4687-9F70-D136A1063383.svs_annotations.json\r\n", " └── TCGA-BH-A0BG-01Z-00-DX1.0838FB7F-8C85-4687-9F70-D136A1063383.svs.json\r\n", "\r\n", "2 directories, 24 files\r\n" ] } ], "source": [ "!tree '/home/mtageld/Desktop/tmp/concordance/'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Query the database" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# Connect to the database\n", "sql_engine = db.create_engine(\n", " 'sqlite:///%s/Concordance.sqlite' % savepath)\n", "dbcon = sql_engine.connect()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
_idnamefolder_path
05e24c20dddda5f8398695671ConcordanceUncrossPolygonTest/Concordance/
15e24c0dfddda5f839869556cParticipant_1UncrossPolygonTest/Concordance/Participant_1/
25e24c0d3ddda5f8398694f06Participant_2UncrossPolygonTest/Concordance/Participant_2/
\n
", "text/plain": " _id name \\\n0 5e24c20dddda5f8398695671 Concordance \n1 5e24c0dfddda5f839869556c Participant_1 \n2 5e24c0d3ddda5f8398694f06 Participant_2 \n\n folder_path \n0 UncrossPolygonTest/Concordance/ \n1 UncrossPolygonTest/Concordance/Participant_1/ \n2 UncrossPolygonTest/Concordance/Participant_2/ " }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# folders table\n", "folders_df = pd.read_sql_query(\n", " \"\"\"\n", " SELECT \"_id\", \"name\", \"folder_path\"\n", " FROM \"folders\"\n", " ;\"\"\", dbcon)\n", "\n", "folders_df" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
_idnamefolderId
05e24c0dfddda5f8398695571TCGA-A1-A0SK-01Z-00-DX1.A44D70FA-4D96-43F4-9DD...5e24c0dfddda5f839869556c
15e24c0dfddda5f8398695586TCGA-A2-A0YM-01Z-00-DX1.A48B4C96-2CC5-464C-98B...5e24c0dfddda5f839869556c
25e24c0dfddda5f83986955b1TCGA-A7-A0DA-01Z-00-DX1.5F087009-16E9-4A07-BA2...5e24c0dfddda5f839869556c
35e24c0dfddda5f83986955c1TCGA-AR-A1AY-01Z-00-DX1.6AC0BE3B-FFC5-4EDA-9E4...5e24c0dfddda5f839869556c
45e24c0e0ddda5f83986955d8TCGA-BH-A0BG-01Z-00-DX1.0838FB7F-8C85-4687-9F7...5e24c0dfddda5f839869556c
55e24c0dbddda5f839869531aTCGA-A1-A0SK-01Z-00-DX1.A44D70FA-4D96-43F4-9DD...5e24c0d3ddda5f8398694f06
65e24c0dbddda5f8398695342TCGA-A2-A0YM-01Z-00-DX1.A48B4C96-2CC5-464C-98B...5e24c0d3ddda5f8398694f06
75e24c0dbddda5f8398695372TCGA-A7-A0DA-01Z-00-DX1.5F087009-16E9-4A07-BA2...5e24c0d3ddda5f8398694f06
85e24c0dcddda5f8398695387TCGA-AR-A1AY-01Z-00-DX1.6AC0BE3B-FFC5-4EDA-9E4...5e24c0d3ddda5f8398694f06
95e24c0dcddda5f83986953aaTCGA-BH-A0BG-01Z-00-DX1.0838FB7F-8C85-4687-9F7...5e24c0d3ddda5f8398694f06
\n
", "text/plain": " _id \\\n0 5e24c0dfddda5f8398695571 \n1 5e24c0dfddda5f8398695586 \n2 5e24c0dfddda5f83986955b1 \n3 5e24c0dfddda5f83986955c1 \n4 5e24c0e0ddda5f83986955d8 \n5 5e24c0dbddda5f839869531a \n6 5e24c0dbddda5f8398695342 \n7 5e24c0dbddda5f8398695372 \n8 5e24c0dcddda5f8398695387 \n9 5e24c0dcddda5f83986953aa \n\n name folderId \n0 TCGA-A1-A0SK-01Z-00-DX1.A44D70FA-4D96-43F4-9DD... 5e24c0dfddda5f839869556c \n1 TCGA-A2-A0YM-01Z-00-DX1.A48B4C96-2CC5-464C-98B... 5e24c0dfddda5f839869556c \n2 TCGA-A7-A0DA-01Z-00-DX1.5F087009-16E9-4A07-BA2... 5e24c0dfddda5f839869556c \n3 TCGA-AR-A1AY-01Z-00-DX1.6AC0BE3B-FFC5-4EDA-9E4... 5e24c0dfddda5f839869556c \n4 TCGA-BH-A0BG-01Z-00-DX1.0838FB7F-8C85-4687-9F7... 5e24c0dfddda5f839869556c \n5 TCGA-A1-A0SK-01Z-00-DX1.A44D70FA-4D96-43F4-9DD... 5e24c0d3ddda5f8398694f06 \n6 TCGA-A2-A0YM-01Z-00-DX1.A48B4C96-2CC5-464C-98B... 5e24c0d3ddda5f8398694f06 \n7 TCGA-A7-A0DA-01Z-00-DX1.5F087009-16E9-4A07-BA2... 5e24c0d3ddda5f8398694f06 \n8 TCGA-AR-A1AY-01Z-00-DX1.6AC0BE3B-FFC5-4EDA-9E4... 5e24c0d3ddda5f8398694f06 \n9 TCGA-BH-A0BG-01Z-00-DX1.0838FB7F-8C85-4687-9F7... 5e24c0d3ddda5f8398694f06 " }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# items table\n", "items_df = pd.read_sql_query(\n", " \"\"\"\n", " SELECT \"_id\", \"name\", \"folderid\"\n", " FROM \"items\"\n", " ;\"\"\", dbcon)\n", "\n", "items_df" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# cleanup\n", "import shutil\n", "shutil.rmtree(os.path.join(savepath))\n", "os.mkdir(savepath)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Case 2: Parse annotations to tables" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Besides everything outlined above, we could also parse the annotations into tables in the SQLite database and not just save the raw JSON files. This is a little slower because loops through each annotation element. Beside the tables above, the following extra tables are saved into the SQLite database:\n", "\n", "- **annotation_docs**: Information about all the annotation documents (one document is a collection of elements like polygons, rectangles etc). The column 'annotation_girder_id' is the unique girder ID, and is linked to the 'items' table by the 'itemid' column. \n", "\n", "- **annotation_elements**: Information about the annotation elements (polygons, rectangles, points, etc). The column 'element_girder_id' is the unique girder ID, and is linked to the 'annotation_docs' table by the 'annotation_girder_id' column. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Here's the syntax:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Participant_1: slide 1 of 5 (TCGA-A1-A0SK-01Z-00-DX1.A44D70FA-4D96-43F4-9DD7-A61535786297.svs): load annotations\n", "Participant_1: slide 1 of 5 (TCGA-A1-A0SK-01Z-00-DX1.A44D70FA-4D96-43F4-9DD7-A61535786297.svs): run callback\n", "Participant_1: slide 1 of 5 (TCGA-A1-A0SK-01Z-00-DX1.A44D70FA-4D96-43F4-9DD7-A61535786297.svs): parse to tables\n", "Participant_1: slide 2 of 5 (TCGA-A2-A0YM-01Z-00-DX1.A48B4C96-2CC5-464C-98B7-F0F92AE56533.svs): load annotations\n", "Participant_1: slide 2 of 5 (TCGA-A2-A0YM-01Z-00-DX1.A48B4C96-2CC5-464C-98B7-F0F92AE56533.svs): run callback\n", "Participant_1: slide 2 of 5 (TCGA-A2-A0YM-01Z-00-DX1.A48B4C96-2CC5-464C-98B7-F0F92AE56533.svs): parse to tables\n", "Participant_1: slide 3 of 5 (TCGA-A7-A0DA-01Z-00-DX1.5F087009-16E9-4A07-BA24-62340E108B17.svs): load annotations\n", "Participant_1: slide 3 of 5 (TCGA-A7-A0DA-01Z-00-DX1.5F087009-16E9-4A07-BA24-62340E108B17.svs): run callback\n", "Participant_1: slide 3 of 5 (TCGA-A7-A0DA-01Z-00-DX1.5F087009-16E9-4A07-BA24-62340E108B17.svs): parse to tables\n", "Participant_1: slide 4 of 5 (TCGA-AR-A1AY-01Z-00-DX1.6AC0BE3B-FFC5-4EDA-9E40-B18CAAC52B81.svs): load annotations\n", "Participant_1: slide 4 of 5 (TCGA-AR-A1AY-01Z-00-DX1.6AC0BE3B-FFC5-4EDA-9E40-B18CAAC52B81.svs): run callback\n", "Participant_1: slide 4 of 5 (TCGA-AR-A1AY-01Z-00-DX1.6AC0BE3B-FFC5-4EDA-9E40-B18CAAC52B81.svs): parse to tables\n", "Participant_1: slide 5 of 5 (TCGA-BH-A0BG-01Z-00-DX1.0838FB7F-8C85-4687-9F70-D136A1063383.svs): load annotations\n", "Participant_1: slide 5 of 5 (TCGA-BH-A0BG-01Z-00-DX1.0838FB7F-8C85-4687-9F70-D136A1063383.svs): run callback\n", "Participant_1: slide 5 of 5 (TCGA-BH-A0BG-01Z-00-DX1.0838FB7F-8C85-4687-9F70-D136A1063383.svs): parse to tables\n", "Participant_2: slide 1 of 5 (TCGA-A1-A0SK-01Z-00-DX1.A44D70FA-4D96-43F4-9DD7-A61535786297.svs): load annotations\n", "Participant_2: slide 1 of 5 (TCGA-A1-A0SK-01Z-00-DX1.A44D70FA-4D96-43F4-9DD7-A61535786297.svs): run callback\n", "Participant_2: slide 1 of 5 (TCGA-A1-A0SK-01Z-00-DX1.A44D70FA-4D96-43F4-9DD7-A61535786297.svs): parse to tables\n", "Participant_2: slide 2 of 5 (TCGA-A2-A0YM-01Z-00-DX1.A48B4C96-2CC5-464C-98B7-F0F92AE56533.svs): load annotations\n", "Participant_2: slide 2 of 5 (TCGA-A2-A0YM-01Z-00-DX1.A48B4C96-2CC5-464C-98B7-F0F92AE56533.svs): run callback\n", "Participant_2: slide 2 of 5 (TCGA-A2-A0YM-01Z-00-DX1.A48B4C96-2CC5-464C-98B7-F0F92AE56533.svs): parse to tables\n", "Participant_2: slide 3 of 5 (TCGA-A7-A0DA-01Z-00-DX1.5F087009-16E9-4A07-BA24-62340E108B17.svs): load annotations\n", "Participant_2: slide 3 of 5 (TCGA-A7-A0DA-01Z-00-DX1.5F087009-16E9-4A07-BA24-62340E108B17.svs): run callback\n", "Participant_2: slide 3 of 5 (TCGA-A7-A0DA-01Z-00-DX1.5F087009-16E9-4A07-BA24-62340E108B17.svs): parse to tables\n", "Participant_2: slide 4 of 5 (TCGA-AR-A1AY-01Z-00-DX1.6AC0BE3B-FFC5-4EDA-9E40-B18CAAC52B81.svs): load annotations\n", "Participant_2: slide 4 of 5 (TCGA-AR-A1AY-01Z-00-DX1.6AC0BE3B-FFC5-4EDA-9E40-B18CAAC52B81.svs): run callback\n", "Participant_2: slide 4 of 5 (TCGA-AR-A1AY-01Z-00-DX1.6AC0BE3B-FFC5-4EDA-9E40-B18CAAC52B81.svs): parse to tables\n", "Participant_2: slide 5 of 5 (TCGA-BH-A0BG-01Z-00-DX1.0838FB7F-8C85-4687-9F70-D136A1063383.svs): load annotations\n", "Participant_2: slide 5 of 5 (TCGA-BH-A0BG-01Z-00-DX1.0838FB7F-8C85-4687-9F70-D136A1063383.svs): run callback\n", "Participant_2: slide 5 of 5 (TCGA-BH-A0BG-01Z-00-DX1.0838FB7F-8C85-4687-9F70-D136A1063383.svs): parse to tables\n" ] } ], "source": [ "# recursively save annotations -- parse sqlite\n", "dump_annotations_locally(\n", " gc, folderid=SAMPLE_FOLDER_ID, local=savepath,\n", " save_json=False, save_sqlite=True,\n", " callback=parse_annotations_to_local_tables,\n", " callback_kwargs={\n", " 'save_csv': False,\n", " 'save_sqlite': True,\n", " },\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Check the result" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/home/mtageld/Desktop/tmp/concordance/\r\n", "├── Concordance.sqlite\r\n", "├── Participant_1\r\n", "└── Participant_2\r\n", "\r\n", "2 directories, 1 file\r\n" ] } ], "source": [ "!tree '/home/mtageld/Desktop/tmp/concordance/'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Query the database" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# Connect to the database\n", "sql_engine = db.create_engine(\n", " 'sqlite:///%s/Concordance.sqlite' % savepath)\n", "dbcon = sql_engine.connect()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
annotation_girder_iditemIditem_nameelement_count
05e24c0dfddda5f83986955735e24c0dfddda5f8398695571TCGA-A1-A0SK-01Z-00-DX1.A44D70FA-4D96-43F4-9DD...1
15e24c0dfddda5f83986955755e24c0dfddda5f8398695571TCGA-A1-A0SK-01Z-00-DX1.A44D70FA-4D96-43F4-9DD...4
25e24c0dfddda5f839869557a5e24c0dfddda5f8398695571TCGA-A1-A0SK-01Z-00-DX1.A44D70FA-4D96-43F4-9DD...5
35e24c0dfddda5f83986955805e24c0dfddda5f8398695571TCGA-A1-A0SK-01Z-00-DX1.A44D70FA-4D96-43F4-9DD...1
45e24c0dfddda5f83986955825e24c0dfddda5f8398695571TCGA-A1-A0SK-01Z-00-DX1.A44D70FA-4D96-43F4-9DD...1
\n
", "text/plain": " annotation_girder_id itemId \\\n0 5e24c0dfddda5f8398695573 5e24c0dfddda5f8398695571 \n1 5e24c0dfddda5f8398695575 5e24c0dfddda5f8398695571 \n2 5e24c0dfddda5f839869557a 5e24c0dfddda5f8398695571 \n3 5e24c0dfddda5f8398695580 5e24c0dfddda5f8398695571 \n4 5e24c0dfddda5f8398695582 5e24c0dfddda5f8398695571 \n\n item_name element_count \n0 TCGA-A1-A0SK-01Z-00-DX1.A44D70FA-4D96-43F4-9DD... 1 \n1 TCGA-A1-A0SK-01Z-00-DX1.A44D70FA-4D96-43F4-9DD... 4 \n2 TCGA-A1-A0SK-01Z-00-DX1.A44D70FA-4D96-43F4-9DD... 5 \n3 TCGA-A1-A0SK-01Z-00-DX1.A44D70FA-4D96-43F4-9DD... 1 \n4 TCGA-A1-A0SK-01Z-00-DX1.A44D70FA-4D96-43F4-9DD... 1 " }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# annotation documents\n", "docs_df = pd.read_sql_query(\n", " \"\"\"\n", " SELECT \"annotation_girder_id\", \"itemId\", \"item_name\", \"element_count\"\n", " FROM 'annotation_docs'\n", " ;\"\"\", dbcon)\n", "docs_df.head()" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
groupcount(*)
0Necrosis_or_Debris6
1Mostly_Blood3
2Mostly_Tumor10
3Arteriole_or_Veinule6
4Evaluation10
5Exclude20
6Exclude23
7Mostly_Blood3
8Mostly_Fat9
9Mostly_Lymph2
10Mostly_Lymphocytic_Infiltrate36
11Mostly_PlasmaCells9
12Mostly_Tumor83
13Necrosis_or_Debris10
\n
", "text/plain": " group count(*)\n0 Necrosis_or_Debris 6\n1 Mostly_Blood 3\n2 Mostly_Tumor 10\n3 Arteriole_or_Veinule 6\n4 Evaluation 10\n5 Exclude 20\n6 Exclude 23\n7 Mostly_Blood 3\n8 Mostly_Fat 9\n9 Mostly_Lymph 2\n10 Mostly_Lymphocytic_Infiltrate 36\n11 Mostly_PlasmaCells 9\n12 Mostly_Tumor 83\n13 Necrosis_or_Debris 10" }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# annotation elements\n", "elements_summary = pd.read_sql_query(\n", " \"\"\"\n", " SELECT \"group\", count(*)\n", " FROM 'annotation_elements'\n", " GROUP BY \"group\"\n", " ;\"\"\", dbcon)\n", "elements_summary" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sample screenshots" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![image1](https://user-images.githubusercontent.com/22067552/72703220-09a62900-3b23-11ea-8968-709f938b1eb9.png)\n", "\n", "![image2](https://user-images.githubusercontent.com/22067552/72703277-29d5e800-3b23-11ea-80fe-86d82a4e86b3.png)\n", "\n", "![image3](https://user-images.githubusercontent.com/22067552/72703918-001dc080-3b25-11ea-8ca2-6aa5454536db.png)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" } }, "nbformat": 4, "nbformat_minor": 4 }