MDNet: A Semantically and Visually Interpretable Medical Image Diagnosis Network
Zizhao Zhang, Yuanpu Xie, Fuyong Xing, Mason McGough, Lin Yang
- 08 Jul 2017
- pp 3549-3557
TL;DR: This paper proposes MDNet to establish a direct multimodal mapping between medical images and diagnostic reports that can read images, generate diagnostic reports, retrieve images by symptom descriptions, and visualize attention, to provide justifications of the network diagnosis process.
Abstract: The inability to interpret model predictions in semantically and visually meaningful ways is a well-known shortcoming of most existing computer-aided diagnosis methods. In this paper, we propose MDNet to establish a direct multimodal mapping between medical images and diagnostic reports that can read images, generate diagnostic reports, retrieve images by symptom descriptions, and visualize attention, to provide justifications of the network diagnosis process. MDNet includes an image model and a language model. The image model is proposed to enhance multi-scale feature ensembles and utilization efficiency. The language model, integrated with our improved attention mechanism, aims to read and explore discriminative image feature descriptions from reports to learn a direct mapping from sentence words to image pixels. The overall network is trained end-to-end by using our developed optimization strategy. Using a dataset of bladder cancer pathology images and their diagnostic reports (BCIDR), we conduct extensive experiments to demonstrate that MDNet outperforms comparative baselines. The proposed image model obtains state-of-the-art performance on two CIFAR datasets as well.
Citations
Towards a safe and efficient clinical implementation of machine learning in radiation oncology by exploring model interpretability, explainability and data-model dependency
Ana M. Barragan-Montero, Adrien Bibal, M. Huet Dastarac, C. Draguet, Gilmer Valdes, Dan Nguyen, S. Willems, Liesbeth Vandewinckele, M. Holmström, Fredrik Löfman, Kevin Souris, Edmond Sterpin, John Aldo Lee
TL;DR: The main risks and current solutions when applying machine learning to radiation oncology workflows are reviewed, and the core concepts of interpretability, explainability, and data-model dependency are formally defined and illustrated with examples.
OralCam: Enabling Self-Examination and Awareness of Oral Health Using a Smartphone Camera
Yuan Liang, Hsuan Wei Fan, Zhujun Fang, Leiying Miao, Wen Li, Xuan Zhang, Weibin Sun, Kun Wang, Lei He, Xiang 'Anthony' Chen
- 21 Apr 2020
TL;DR: OralCam is presented: the first interactive app, built on a deep-learning-based framework, that enables end users to self-examine five common oral conditions (diseases or early disease signals) from smartphone photos of their oral cavity.
Automatic medical image interpretation: State of the art and future directions
Hareem Ayesha, Sajid Iqbal, Mehreen Tariq, Muhammad Abrar, Muhammad Sanaullah, Ishaq Abbas, Amjad Rehman, Muhammad Farooq Khan Niazi, Shafiq Hussain
TL;DR: This article presents a comprehensive review of recent research on medical image captioning published in international conferences and journals. Common parameters are extracted from the surveyed works to compare their methods, performance, strengths, and limitations, and their recommendations are discussed.
•Posted Content
Interpretable Spatio-temporal Attention for Video Action Recognition
TL;DR: This work proposes an interpretable, easy-to-plug-in spatial-temporal attention mechanism for video action recognition. It employs a convolutional LSTM based attention mechanism to identify the most relevant frames in an input video, and a set of regularizers to ensure that the attention mechanism attends to coherent regions in space and time.
Medical image captioning via generative pretrained transformers
Alexander Selivanov, Oleg Y. Rogov, Daniil Chesakov, Artem Shelmanov, Irina Fedulova, Dmitry V. Dylov
TL;DR: In this paper, a model for automatic clinical image caption generation combines the analysis of radiological scans with structured patient information from the textual records using two language models, the Show-Attend-Tell and the GPT-3, to generate comprehensive and descriptive radiology records.
References
Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
- 27 Jun 2016
TL;DR: This paper proposes a residual learning framework that eases the training of networks substantially deeper than those used previously; it won 1st place on the ILSVRC 2015 classification task.
•Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman
- 04 Sep 2014
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Long short-term memory
TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Going deeper with convolutions
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich
- 07 Jun 2015
TL;DR: Inception is a deep convolutional neural network architecture that achieved a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
•Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman
- 01 Jan 2015
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.