Grader Variability and the Importance of Reference Standards for Evaluating Machine Learning Models for Diabetic Retinopathy

doi:10.1016/J.OPHTHA.2018.01.034

Open Access•Journal Article•10.1016/J.OPHTHA.2018.01.034

Jonathan Krause, +7 more

- 12 Mar 2018

- Ophthalmology

- Vol. 125, Iss: 8, pp 1264-1272

502

TL;DR: Adjudication reduces errors in DR grading. Using a small number of adjudicated consensus grades as a tuning dataset, together with higher-resolution images as input, makes it possible to train an improved automated algorithm for DR grading.

Figures

Table 4. Comparison of ophthalmologist grades versus adjudicated grades from retina specialists on the validation dataset. Confusion matrix for diabetic retinopathy and DME between the grade determined by majority decision of the ophthalmologists and the adjudicated consensus of retinal specialists.

Table 5. Agreement between ophthalmologists’ grades with the adjudicated reference standard on the validation dataset. Sensitivity and specificity metrics are for moderate or worse DR and referable DME for each grader. Agreement between the adjudicated grade and the 5-point scale is also measured by the quadratic-weighted kappa.
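The sensitivity and specificity described in this caption can be illustrated with a minimal sketch. It assumes grades are encoded as integers 0 to 4 on the 5-point scale (0 = no DR, 2 = moderate), so "moderate or worse" becomes `grade >= 2`; the encoding, the function name, and the toy data are illustrative assumptions, not taken from the paper.

```python
def sensitivity_specificity(grader, reference, threshold=2):
    """Binarize 5-point grades at `threshold` and score against the reference.

    Assumption: grades are ints 0..4 and `threshold=2` means moderate or worse.
    """
    tp = fn = tn = fp = 0
    for g, r in zip(grader, reference):
        predicted, truth = g >= threshold, r >= threshold
        if truth and predicted:
            tp += 1
        elif truth:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Toy data (hypothetical): adjudicated reference vs. one grader.
reference = [0, 1, 2, 3, 4, 2, 0, 1]
grader = [0, 2, 2, 3, 3, 1, 0, 1]
print(sensitivity_specificity(grader, reference))
```

Note that a grader can disagree on the exact grade (for example 3 versus 4) without affecting these binary metrics, which is why the caption reports the weighted kappa separately.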

Table 3. Agreement between each retina specialist and the adjudicated reference standard on the validation dataset. Retina specialists correspond to those who contributed to the final adjudicated reference standard. Sensitivity and specificity metrics reported are for moderate or worse DR. Agreement between the preadjudication 5-point DR grade and the final adjudicated grade is also measured by the quadratic-weighted kappa.
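As a hedged illustration of the quadratic-weighted kappa mentioned in this caption, the sketch below uses the standard definition with weights $w_{ij} = (i-j)^2/(k-1)^2$ over a $k$-class ordinal scale. The integer encoding 0 to 4 and the function name are assumptions made for the example; the paper's own computation may differ in details.

```python
def quadratic_weighted_kappa(a, b, num_classes=5):
    """Cohen's kappa with quadratic disagreement weights for ordinal grades.

    Assumption: grades are ints in range(num_classes).
    """
    n = len(a)
    k = num_classes
    # Observed joint counts of (grade from a, grade from b).
    observed = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x][y] += 1
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_observed = weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2
            weighted_observed += w * observed[i][j]
            # Expected count under independence of the two graders.
            weighted_expected += w * row[i] * col[j] / n
    if weighted_expected == 0:
        return 1.0  # both graders used a single identical grade throughout
    return 1.0 - weighted_observed / weighted_expected


print(quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4]))
```

Because the weights grow with the square of the distance between grades, confusing "none" with "proliferative" is penalized far more than confusing adjacent grades.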

Table 2. Comparison of retinal specialist grades before and after adjudication on the validation dataset. Confusion matrix for diabetic retinopathy between the grade determined by majority decision and adjudicated consensus.
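A confusion matrix of majority decision versus adjudicated consensus could be tabulated roughly as follows. This is a sketch under stated assumptions: grades are ints 0 to 4, and when no grade has a strict majority the code falls back to the median grade. That tie-breaking rule is a choice made for this example, not one confirmed by the source.

```python
from collections import Counter
from statistics import median_low


def majority_grade(grades):
    """Return the grade given by more than half the graders.

    Assumption: with no strict majority, fall back to the (low) median.
    """
    top, count = Counter(grades).most_common(1)[0]
    return top if count > len(grades) / 2 else median_low(grades)


def confusion_matrix(majority, adjudicated, num_classes=5):
    """Rows index the majority grade, columns the adjudicated grade."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for m, a in zip(majority, adjudicated):
        matrix[m][a] += 1
    return matrix


# Toy data (hypothetical): three independent grades per image.
per_image = [(0, 0, 1), (2, 2, 3), (1, 2, 3), (4, 4, 4)]
adjudicated = [1, 2, 2, 4]
majority = [majority_grade(g) for g in per_image]
matrix = confusion_matrix(majority, adjudicated)
print(majority)
```

Off-diagonal cells are the cases where independent voting and face-to-face adjudication reached different grades.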

Fig. 1. Grader agreement based on the adjudicated consensus grade for referable diabetic retinopathy (DR) and diabetic macular edema (DME). Independent grades from all 3 retinal specialists and all 3 ophthalmologists are included in this analysis.

Fig. 2. Image resolution input to model versus area under the curve (AUC) for mild and above DR. Left: Using majority decision of retinal specialists as the reference standard. Right: Using the adjudicated consensus grade of retinal specialists as a reference standard. Shaded areas represent a 95% confidence interval as measured via bootstrapping.
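The caption states only that the 95% confidence interval was measured via bootstrapping. One common way to do this, shown here as an assumption rather than the authors' exact procedure, is the percentile bootstrap: resample images with replacement, recompute the AUC each time, and take the 2.5th and 97.5th percentiles. Labels are assumed to be 1 for mild and above DR and 0 otherwise; all names and data are illustrative.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval; the input must contain both classes."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled = [labels[i] for i in idx]
        if 0 < sum(resampled) < n:  # skip resamples missing a class
            stats.append(auc(resampled, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data (hypothetical): 1 = mild and above DR, with model scores.
labels = [0, 0, 0, 1, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.3, 0.2, 0.9]
point = auc(labels, scores)
lower, upper = bootstrap_auc_ci(labels, scores)
print(point, lower, upper)
```

Running the same procedure against two different reference standards (majority decision versus adjudicated consensus) only requires swapping the `labels` list, which is the comparison the two panels of the figure make.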

Citations

Journal Article•10.1111/aos.16781

Grading of diabetic retinopathy using a pre‐segmenting deep learning classification model: Validation of an automated algorithm

Dyllan Edson Similié, +4 more

- 19 Oct 2024

- Acta Ophthalmologica

TL;DR: A deep learning algorithm for diabetic retinopathy grading achieved comparable performance to human graders in a high-risk population, with 92% negative predictive value, suggesting its potential for autonomous identification of non-diabetic retinopathy patients in real-world settings.

Journal Article•10.1016/J.MEDIA.2020.101724

Expert-validated estimation of diagnostic uncertainty for deep neural networks in diabetic retinopathy detection.

Murat Seckin Ayhan, +5 more

- 01 Aug 2020

- Medical Image Analysis

TL;DR: This work describes an intuitive framework based on test-time data augmentation for quantifying the diagnostic uncertainty of a state-of-the-art DNN for diagnosing diabetic retinopathy and shows that the derived measure of uncertainty is well-calibrated and paves the way for an integrated treatment of uncertainty in DNN-based diagnostic systems.

Journal Article•10.1038/S41433-021-01715-7

Automated detection of retinal exudates and drusen in ultra-widefield fundus images based on deep learning.

Zhongwen Li, +16 more

- 03 Aug 2021

- Eye

TL;DR: Li et al. developed and assessed a deep learning system for automated detection of retinal exudates and drusen (RED) using ultra-widefield fundus (UWF) images, which achieved areas under the receiver operating characteristic curve of 0.994 (95% confidence interval [CI]: 0.991-0.996), 0.972, and ...

Journal Article•10.1007/978-3-031-08637-3_1

Explainable Artificial Intelligence (XAI) with IoHT for Smart Healthcare: A Review

Li Chen

- 01 Jan 2023

- Internet of things

TL;DR: This article discusses the use of artificial intelligence (AI) in healthcare, where explainability is a highly contentious topic: because the majority of existing AI systems are incomprehensible and opaque, it is unlikely that AI technologies will be properly exploited and incorporated into standard clinical practice.

10.48550/arxiv.1711.11279

Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV).

Been Kim, +6 more

TL;DR: Researchers introduce Concept Activation Vectors (CAVs) to interpret deep learning models, enabling quantitative testing of concept importance through directional derivatives, and demonstrate its application in image classification and medical domains for hypothesis exploration and insight generation.

References

Journal Article•10.1038/NATURE14539

Deep learning

Yann LeCun, +4 more

- 28 May 2015

- Nature

TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.

67K

Journal Article•10.1109/5.726791

Gradient-based learning applied to document recognition

Yann LeCun, +6 more

- 01 Jan 1998

TL;DR: In this article, a graph transformer network (GTN) is proposed for handwritten character recognition, which can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters.

53.5K

Journal Article•10.1007/S11263-015-0816-Y

ImageNet Large Scale Visual Recognition Challenge

Olga Russakovsky, +11 more

- 01 Dec 2015

- International Journal of Computer Vision

TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection on hundreds of object categories and millions of images; it has been run annually from 2010 to present, attracting participation from more than fifty institutions.

41.6K

Journal Article•10.1177/001316446002000104

A Coefficient of Agreement for Nominal Scales

Jacob Cohen

- 01 Apr 1960

- Educational and Psychological Measuremen...

TL;DR: This article presents a procedure for having two or more judges independently categorize a sample of units and for determining the degree, significance, and sampling stability of their agreement, that is, the extent to which these judgments are reproducible (reliable).

41.1K

Gradient-based learning applied to document recognition

Yann LeCun, +7 more

- 01 Jan 2001

TL;DR: This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task, and Convolutional neural networks are shown to outperform all other techniques.

32.7K

Related Papers (5)

Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs

Varun Gulshan, +14 more

- 13 Dec 2016

- JAMA

Development and Validation of a Deep Learning System for Diabetic Retinopathy and Related Eye Diseases Using Retinal Images From Multiethnic Populations With Diabetes.

Daniel Shu Wei Ting, +42 more

- 12 Dec 2017

- JAMA

Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning

Ryan Poplin, +7 more

- 19 Feb 2018

- Nature Biomedical Engineering

Deep learning

Yann LeCun, +4 more

- 28 May 2015

- Nature

Clinically applicable deep learning for diagnosis and referral in retinal disease
