Multiple Instance Captioning: Learning Representations from Histopathology Textbooks and Articles

doi:10.1109/CVPR46437.2021.01628

Open Access • Proceedings Article • 10.1109/CVPR46437.2021.01628


Jevgenij Gamper, +1 more

- 01 Jun 2021

- pp. 16549–16559

Cited by 79

TL;DR: The authors show that ARCH is the only computational pathology (CP) dataset to rival its computer vision analog, MS-COCO Captions, and conjecture that an encoder pre-trained on dense image captions learns transferable representations for most CP tasks.


Citations

Preprint • 10.48550/arXiv.2206.06488

Multimodal Learning with Transformers: A Survey

Peng Xu, +2 more

- 13 Jun 2022

- IEEE Transactions on Pattern Analysis an...

TL;DR: A comprehensive survey of Transformer techniques oriented at multimodal data and a discussion of open problems and potential research directions for the community are presented.


Cited by 337

Journal Article • 10.1038/s41591-023-02504-3

A visual–language foundation model for pathology image analysis using medical Twitter

Zhi Huang, +4 more

- 17 Aug 2023

- Nature Medicine

TL;DR: This work develops pathology language–image pretraining (PLIP), a multimodal artificial intelligence with both image and text understanding, which is trained on OpenPath and enables users to retrieve similar cases by either image or natural language search, greatly facilitating knowledge sharing.


Cited by 308

Journal Article • 10.1109/tpami.2023.3275156

Multimodal Learning With Transformers: A Survey

- 01 Jan 2023

- IEEE Transactions on Pattern Analysis an...

TL;DR: Transformer is a promising neural network learner and has achieved great success in various machine learning tasks. Thanks to the recent prevalence of multimodal applications and Big Data, Transformer-based multimodal learning has become a hot topic in AI research.


Cited by 278

Journal Article • 10.1038/s41746-023-00811-0

Self-supervised learning for medical image classification: a systematic review and implementation guidelines

Shih-Cheng Huang, +5 more

- 26 Apr 2023

- npj Digital Medicine

TL;DR: The authors provide consistent descriptions of different self-supervised learning strategies and compose a systematic review of papers published between 2012 and 2022 on PubMed, Scopus, and arXiv.


Cited by 174

Journal Article • 10.1038/s41591-024-02856-4

A visual-language foundation model for computational pathology

Ming Y. Lu, +12 more

- 01 Mar 2024

- Nature Medicine

Cited by 154

...


References

Proceedings Article • 10.1109/CVPR.2016.90

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

- 27 Jun 2016

TL;DR: The authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; an ensemble of these residual networks won 1st place on the ILSVRC 2015 classification task.


Cited by 198.7K

Posted Content

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

- 10 Dec 2015

- arXiv: Computer Vision and Pattern Recog...

TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.


Cited by 117.9K

Proceedings Article

Attention Is All You Need

Ashish Vaswani, +7 more

- 12 Jun 2017

TL;DR: This paper proposes a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.


Cited by 94.2K

Preprint • 10.48550/arxiv.1706.03762

Attention Is All You Need

Ashish Vaswani, +7 more

- 01 Jan 2017

Abstract: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.


Cited by 51.8K

Posted Content

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Alexey Dosovitskiy, +11 more

- 22 Oct 2020

- arXiv: Computer Vision and Pattern Recog...

TL;DR: Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.


Cited by 36.9K