TL;DR: This work recruited 25 participants, ranging in experience from senior pathologists to medical students, to delineate tissue regions in 151 breast cancer slides using the Digital Slide Archive, and found the scale of annotation data provided notable improvements in image classification accuracy.
Abstract: Motivation: While deep-learning algorithms have demonstrated outstanding performance in semantic image segmentation tasks, large annotation datasets are needed to create accurate models. Annotation of histology images is challenging due to the effort and experience required to carefully delineate tissue structures, and difficulties related to sharing and markup of whole-slide images. Results: We recruited 25 participants, ranging in experience from senior pathologists to medical students, to delineate tissue regions in 151 breast cancer slides using the Digital Slide Archive. Inter-participant discordance was systematically evaluated, revealing low discordance for tumor and stroma, and higher discordance for more subjectively defined or rare tissue classes. Feedback provided by senior participants enabled the generation and curation of 20 000+ annotated tissue regions. Fully convolutional networks trained using these annotations were highly accurate (mean AUC=0.945), and the scale of annotation data provided notable improvements in image classification accuracy. Availability and implementation: Dataset is freely available at: https://goo.gl/cNM4EL. Supplementary information: Supplementary data are available at Bioinformatics online.
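The abstract reports a mean AUC without saying how it was averaged. As a minimal sketch, assuming the mean is taken over one-vs-rest AUCs per tissue class (an assumption, not stated in the source), the computation could look like the following. The function names and the toy scores are hypothetical and are not taken from the paper.

```python
from statistics import mean


def binary_auc(scores, labels):
    """AUC as the probability that a random positive example scores
    higher than a random negative one (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_one_vs_rest_auc(class_scores, true_classes):
    """Average the one-vs-rest AUC over all classes.

    class_scores maps a class name to the per-sample scores for that class.
    Averaging over classes is an assumption made for this illustration.
    """
    aucs = []
    for cls, scores in class_scores.items():
        labels = [1 if t == cls else 0 for t in true_classes]
        aucs.append(binary_auc(scores, labels))
    return mean(aucs)


# Toy data (hypothetical): four regions, two tissue classes.
true_classes = ["tumor", "stroma", "tumor", "stroma"]
class_scores = {
    "tumor": [0.9, 0.2, 0.4, 0.6],
    "stroma": [0.1, 0.8, 0.6, 0.4],
}
example_mean_auc = mean_one_vs_rest_auc(class_scores, true_classes)
```

The pairwise formulation is quadratic in the number of samples, which is fine for illustration; a rank-based implementation would be preferable for pixel-level evaluation.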
TL;DR: ProCan is a program aiming to generate high‐quality tissue proteomic data across a broad spectrum of cancer types based on data‐independent acquisition–MS proteomic analysis of annotated tissue samples sourced through collaboration with expert clinical and cancer research groups.
Abstract: The cancer tissue proteome has enormous potential as a source of novel predictive biomarkers in oncology. Progress in the development of mass spectrometry (MS)-based tissue proteomics now presents an opportunity to exploit this by applying the strategies of comprehensive molecular profiling and big-data analytics that are refined in other fields of 'omics research. ProCan (ProCan is a registered trademark) is a program aiming to generate high-quality tissue proteomic data across a broad spectrum of cancer types. It is based on data-independent acquisition-MS proteomic analysis of annotated tissue samples sourced through collaboration with expert clinical and cancer research groups. The practical requirements of a high-throughput translational research program have shaped the approach that ProCan is taking to address challenges in study design, sample preparation, raw data acquisition, and data analysis. The ultimate goal is to establish a large proteomics knowledge-base that, in combination with other cancer 'omics data, will accelerate cancer research.
TL;DR: This study determined that high-quality tissue samples are imperative for genomic and proteomic molecular research in both the clinical and basic research arenas.
Abstract: The success of molecular research and its applications in both the clinical and basic research arenas is strongly dependent on the collection, handling, storage, and quality control of fresh human tissue samples. This tissue bank was set up to bank fresh surgically obtained human tissue using a Clinical Annotated Tissue Database (CATD) in order to capture the associated patient clinical data and demographics using a one-way patient encryption scheme to protect patient identification. In this study, we determined that high quality of tissue samples is imperative for both genomic and proteomic molecular research. This paper also contains a brief compilation of the literature on patient ethics, patient informed consent, patient de-identification, tissue collection, processing, and storage, as well as basic molecular research generated from the tissue bank using good clinical practices. The current applicable rules, regulations, and guidelines for handling human tissues are briefly discussed. More than 6,610 cancer patients have been consented (97% of those contacted by the consenter) and 16,800 tissue specimens have been banked from these patients in 9 years. All samples collected in the bank were QC'd by a pathologist. Approximately 1,550 tissue samples have been requested for use in basic, clinical, and/or biomarker cancer research studies. Each tissue aliquot removed from the bank for a research study was evaluated by a second H&E; samples that passed the QC were submitted for genomic and proteomic molecular analysis. Approximately 75% of samples evaluated were of high histologic quality and used for research studies. Since 2003, we have changed the patient informed consent to allow the tissue bank to gather more patient clinical follow-up information. Ninety-two percent of the patients (1,865 patients) signed the new informed consent form and agreed to be re-contacted for follow-up information on their disease state. In addition, eighty-five percent of patients (1,584) agreed to be re-contacted to provide a biological fluid sample to be used for biomarker research.
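The abstract mentions a one-way patient encryption scheme but does not describe it. As a hedged illustration only, one common way to get a one-way, repeatable patient token is a keyed hash (HMAC-SHA256). The function name, the normalization step, and the sample identifiers below are hypothetical and do not describe the CATD implementation.

```python
import hashlib
import hmac


def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Map a patient identifier to a fixed-length token.

    The same identifier and key always give the same token, so records
    can be linked, but the token cannot be reversed without the key.
    HMAC-SHA256 is an assumption for this sketch, not the source's method.
    """
    normalized = patient_id.strip().upper().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


# Hypothetical key and identifiers, for illustration only.
key = b"replace-with-a-securely-stored-secret"
token_a = pseudonymize("MRN-000123", key)
token_a_again = pseudonymize(" mrn-000123 ", key)
token_b = pseudonymize("MRN-000124", key)
```

Using a keyed hash rather than a plain hash matters here because patient identifiers come from a small, guessable space; without a secret key, tokens could be reversed by hashing every candidate identifier.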
TL;DR: Working at Eli Lilly was an amazing experience that gave me a very realistic view of what a research scientist's daily job would be.
Abstract: An internship at Eli Lilly is an incredible opportunity to experience. There are thousands of different research projects being conducted at Lilly and I was extremely lucky to get to help with two projects. A clinically annotated tissue databank (CATD) was developed about 10 years ago at Lilly, but it is still being perfected. CATD is of great importance to the researchers at Lilly because it enables them to find and obtain the tissues they need for the studies they are conducting. During this study, nearly 4,400 patient specimens were entered into CATD, making thousands of new tissues available for research purposes. Microscopic sections from over 650 of these tissues were analyzed for cancer types and imaged for inclusion in the database. Also, the specimens from these patients were organized so they could easily be located once requested in CATD. Another project of great significance researched the amount of data loss when an image file is compressed for computer storage. Storage space on hard drives is limited in many medical research companies, which unfortunately limits the number of images researchers can hold on their hard drives. This could potentially even halt some research from occurring. The compression study evaluated compressing images by creating JPEG and JPEG2000 files in color and black and white. Image analysis included calculating how much data was lost at each compression level. There was a very delicate line as to how far an image can be compressed to save hard drive space without losing important data from the image. Overall, JPEG2000 was more efficient at compressing color images without data loss than JPEG. Using liver tissues stained with H&E, a 10% data loss at a 150 level of compression was acceptable for continued data analysis. Working at Eli Lilly was an amazing experience that gave me a very realistic view of what a research scientist's daily job would be.
TL;DR: The Stanford Tissue Microarray Database (TMAD) is a public resource for disseminating annotated tissue images and associated expression data and incorporates the NCI Thesaurus ontology for searching tissues in the cancer domain.
Abstract: The Stanford Tissue Microarray Database (TMAD; http://tma.stanford.edu) is a public resource for disseminating annotated tissue images and associated expression data. Stanford University pathologists, researchers and their collaborators worldwide use TMAD for designing, viewing, scoring and analyzing their tissue microarrays. The use of tissue microarrays allows hundreds of human tissue cores to be simultaneously probed by antibodies to detect protein abundance (Immunohistochemistry; IHC), or by labeled nucleic acids (in situ hybridization; ISH) to detect transcript abundance. TMAD archives multi-wavelength fluorescence and bright-field images of tissue microarrays for scoring and analysis. As of July 2007, TMAD contained 205 161 images archiving 349 distinct probes on 1488 tissue microarray slides. Of these, 31 306 images for 68 probes on 125 slides have been released to the public. To date, 12 publications have been based on these raw public data. TMAD incorporates the NCI Thesaurus ontology for searching tissues in the cancer domain. Image processing researchers can extract images and scores for training and testing classification algorithms. The production server uses the Apache HTTP Server, Oracle Database and Perl application code. Source code is available to interested researchers under a no-cost license.