Multiple Instance Captioning: Learning Representations from Histopathology Textbooks and Articles
Jevgenij Gamper, Nasir M. Rajpoot
- 01 Jun 2021
- pp 16549-16559
TL;DR: Using intrinsic dimensionality estimation, the authors show that ARCH is the only CP dataset to (ARCH-)rival its computer vision analog MS-COCO Captions, and they conjecture that an encoder pre-trained on dense image captions learns transferable representations for most CP tasks.
Abstract: We present ARCH, a computational pathology (CP) multiple instance captioning dataset to facilitate dense supervision of CP tasks. Existing CP datasets focus on narrow tasks; ARCH, on the other hand, contains dense diagnostic and morphological descriptions for a range of stains, tissue types and pathologies. Using intrinsic dimensionality estimation, we show that ARCH is the only CP dataset to (ARCH-)rival its computer vision analog MS-COCO Captions. We conjecture that an encoder pre-trained on dense image captions learns transferable representations for most CP tasks. We support the conjecture with evidence that ARCH representation transfers to a variety of pathology sub-tasks better than ImageNet features or representations obtained via self-supervised or multi-task learning on pathology images alone. We release our best model and invite other researchers to test it on their CP tasks.
Citations
Multimodal Learning with Transformers: A Survey
TL;DR: A comprehensive survey of Transformer techniques oriented at multimodal data and a discussion of open problems and potential research directions for the community are presented.
337
A visual–language foundation model for pathology image analysis using medical Twitter
Zhi Huang, F. Bianchi, Byron Rogers, Thomas J. Montine, James Zou
TL;DR: This work develops pathology language–image pretraining (PLIP), a multimodal artificial intelligence with both image and text understanding, which is trained on OpenPath and enables users to retrieve similar cases by either image or natural language search, greatly facilitating knowledge sharing.
308
Multimodal Learning With Transformers: A Survey
TL;DR: Transformer is a promising neural network learner and has achieved great success in various machine learning tasks. Thanks to the recent prevalence of multimodal applications and big data, Transformer-based multimodal learning has become a hot topic in AI research.
Self-supervised learning for medical image classification: a systematic review and implementation guidelines
TL;DR: In this paper, the authors provide consistent descriptions of different self-supervised learning strategies and compose a systematic review of papers published between 2012 and 2022 on PubMed, Scopus, and ArXiv.
A visual-language foundation model for computational pathology.
Ming Y. Lu, Bowen Chen, Drew F. K. Williamson, Richard J Chen, Ivy Liang, Tong Ding, Guillaume Jaume, Igor Odintsov, Long Le, Georg Gerber, Anil V. Parwani, Andrew Zhang, Faisal Mahmood
154
References
CIDEr: Consensus-based image description evaluation
Ramakrishna Vedantam, C. Lawrence Zitnick, Devi Parikh
- 07 Jun 2015
TL;DR: A novel paradigm for evaluating image descriptions that uses human consensus is proposed and a new automated metric that captures human judgment of consensus better than existing metrics across sentences generated by various sources is evaluated.
•Posted Content
Scaling Laws for Neural Language Models
Jared Kaplan, Samuel McCandlish, Thomas Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei
TL;DR: Larger models are significantly more sample-efficient, such that optimally compute-efficient training involves training very large models on a relatively modest amount of data and stopping significantly before convergence.
3.3K
Clinical-grade computational pathology using weakly supervised deep learning on whole slide images.
Gabriele Campanella, Matthew G. Hanna, Luke Geneslaw, Allen P. Miraflor, Vitor Werneck Krauss Silva, Klaus J. Busam, Edi Brogi, Victor E. Reuter, David S. Klimstra, Thomas J. Fuchs
TL;DR: A multiple instance learning-based deep learning system that uses only the reported diagnoses as labels for training, thereby avoiding expensive and time-consuming pixel-wise manual annotations, and has the ability to train accurate classification models at unprecedented scale.
2.2K
Digging Into Self-Supervised Monocular Depth Estimation
Clément Godard, Oisin Mac Aodha, Michael Firman, Gabriel J. Brostow
- 01 Oct 2019
TL;DR: In this paper, the authors propose a set of improvements that together yield both quantitatively and qualitatively improved depth maps compared to competing self-supervised methods. They demonstrate the effectiveness of each component in isolation and show high-quality, state-of-the-art results on the KITTI benchmark.
Panoptic Segmentation
Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, Piotr Dollár
- 01 Jun 2019
TL;DR: A novel panoptic quality (PQ) metric is proposed that captures performance for all classes (stuff and things) in an interpretable and unified manner, and a rigorous study of both human and machine performance for PS on three existing datasets is performed, revealing interesting insights about the task.
1.8K