Open Access•Posted Content•10.1101/078634

A Machine Learning-based Framework to Identify Type 2 Diabetes through Electronic Health Records

- 30 Sep 2016

- pp 078634

185

TL;DR: A semi-automated, machine learning-based framework, developed as a pilot study, that relaxes filtering criteria to improve recall while keeping the false positive rate low when identifying T2DM subjects from EHR.

Abstract: Objective: Discovering diverse genotype-phenotype associations for Type 2 Diabetes Mellitus (T2DM) via genome-wide association studies (GWAS) and phenome-wide association studies (PheWAS) requires identifying more cases (T2DM subjects) and controls (subjects without T2DM), e.g., from Electronic Health Records (EHR). However, existing expert-based identification algorithms often suffer from a low recall rate and, under conservative filtering standards, can miss a large number of valuable samples. The goal of this pilot study is to develop a semi-automated, machine learning-based framework that relaxes the filtering criteria to improve recall while keeping the false positive rate low.

Materials and Methods: We propose a data-informed framework for identifying subjects with and without T2DM from EHR via feature engineering and machine learning. We evaluate and contrast the identification performance of widely used machine learning models within our framework, including k-Nearest-Neighbors, Naive Bayes, Decision Tree, Random Forest, Support Vector Machine and Logistic Regression. The framework was evaluated on 300 patient samples (161 cases, 60 controls and 79 unconfirmed subjects), randomly selected from a diabetes-related cohort of 23,281 patients retrieved from a regional distributed EHR repository covering 2012 to 2014.

Results: We apply the top-performing machine learning algorithms to the engineered features. We benchmark the accuracy, precision, AUC, sensitivity and specificity of the classification models against the state-of-the-art expert algorithm for identifying T2DM subjects. Our results indicate that the framework achieves high identification performance (average AUC of ~0.98), well above that of the state-of-the-art expert algorithm (AUC of 0.71).

Discussion: Expert algorithm-based identification of T2DM subjects from EHR is often hampered by high miss rates caused by conservative selection criteria. Our framework leverages machine learning and feature engineering to loosen such selection criteria and achieve a high identification rate for both cases and controls.

Conclusions: Our proposed framework demonstrates a more accurate and efficient approach for identifying subjects with and without T2DM from EHR.
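The abstract compares classifiers on accuracy, precision, sensitivity and specificity. As a minimal sketch of how these four metrics follow from a confusion matrix, the Python below uses only the standard library. The function name `binary_metrics`, the `BinaryMetrics` container and the toy labels are assumptions made for illustration; they are not taken from the paper, and this is not the authors' evaluation code.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float    # positive predictive value
    sensitivity: float  # recall, true positive rate
    specificity: float  # true negative rate


def binary_metrics(y_true, y_pred):
    """Compute metrics from 0/1 labels (1 = T2DM case, 0 = control)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    return BinaryMetrics(
        accuracy=ratio(tp + tn, tp + tn + fp + fn),
        precision=ratio(tp, tp + fp),
        sensitivity=ratio(tp, tp + fn),
        specificity=ratio(tn, tn + fp),
    )


# Toy labels invented for illustration only (not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
m = binary_metrics(y_true, y_pred)
print(m)
```

In this framing, a conservative expert rule would typically show up as high precision with low sensitivity, since missed cases count as false negatives. That is the trade-off the abstract describes wanting to relax.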

Table 1. First-level features constructed from the sources “demographic information”, “communication reports”, “outpatients diagnosis reports”, “inpatients diagnosis reports”, “inpatients discharge summaries”, “prescription reports” and “laboratory test reports”.

Table 2. Comparison of different classifiers and the expert algorithm (baseline), measured by their average performance (and standard deviation) in cross-validation.

Figure 4. Prediction sensitivity [True positive rate] (y-axis) with different feature sets (x-axis), categorized by different classifiers (different lines plotted).

Figure 5. Prediction specificity [True negative rate] (y-axis) with different feature sets (x-axis), categorized by different classifiers (different lines plotted).

Figure 6. Prediction precision [Positive predictive value] (y-axis) with different feature sets (x-axis), categorized by different classifiers (different lines plotted).

Figure 7. Prediction AUC (y-axis) with different feature sets (x-axis), categorized by different classifiers (different lines plotted).
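The table and figure captions refer to AUC and to averages with standard deviations across cross-validation folds. The sketch below shows one common way to compute both, assuming AUC is taken as the probability that a randomly chosen case is scored above a randomly chosen control, with ties counting one half. The function name `auc_from_scores` and all numbers are invented for illustration. Using the sample standard deviation is also an assumption, since the page does not say which form the authors report.

```python
import statistics


def auc_from_scores(y_true, scores):
    """AUC as the fraction of (case, control) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Toy scores invented for illustration only (not data from the paper).
auc = auc_from_scores([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])

# Hypothetical per-fold AUC values, summarized as mean and sample std.
fold_aucs = [0.90, 0.80, 0.85, 0.95, 0.75]
mean_auc = statistics.mean(fold_aucs)
std_auc = statistics.stdev(fold_aucs)
print(auc, mean_auc, std_auc)
```

The pairwise loop is quadratic in the number of subjects, which is fine for a sample of a few hundred records. A rank-based formulation would be the usual choice for larger cohorts.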

Citations

•Posted Content•10.20944/PREPRINTS202103.0216.V1

Machine Learning: Algorithms, Real-World Applications and Research Directions

Iqbal H. Sarker

- 08 Mar 2021

TL;DR: This study explains the principles of different machine learning techniques and their applicability in various real-world application domains, such as cybersecurity systems, smart cities, healthcare, e-commerce, agriculture, and many more.

1.8K

•Journal Article•10.1007/S42979-021-00765-8

Data Science and Analytics: An Overview from Data-Driven Smart Computing, Decision-Making and Applications Perspective

Iqbal H. Sarker, +1 more

- 01 Jan 2021

TL;DR: This paper presents a comprehensive view on “Data Science” including various types of advanced analytics methods that can be applied to enhance the intelligence and capabilities of an application through smart decision-making in different scenarios.

332

•Proceedings Article•10.18653/V1/P18-2033

Task-oriented Dialogue System for Automatic Diagnosis

Zhongyu Wei, +7 more

- 01 Jul 2018

TL;DR: Experimental results on this dialogue system show that additional symptoms extracted from conversation can greatly improve the accuracy for disease identification and the dialogue system is able to collect these symptoms automatically and make a better diagnosis.

256

•Journal Article•10.1186/s12874-022-01768-6

Real-world data: a brief review of the methods, applications, challenges and opportunities

Fang Liu, +1 more

- 05 Nov 2022

- BMC Medical Research Methodology

TL;DR: This paper gives a brief overview of the types and sources of real-world data used for evidence-based decision making, and of the common models and approaches used to analyze such data.

207

•Journal Article•10.1056/nejme2206291

Artificial Intelligence in Medicine

- 30 Mar 2023

- The New England Journal of Medicine

TL;DR: Artificial Intelligence in Medicine is looking for novelty in the methodological and/or theoretical content of submitted papers and must show the novel expected effects of the proposed solution in some medical or healthcare field.

207

Related Papers (5)

A machine learning-based framework to identify type 2 diabetes through electronic health records

A patient network-based machine learning model for disease prediction: The case of type 2 diabetes mellitus

Automated disease cohort selection using word embeddings from Electronic Health Records.

Development and Validation of Various Phenotyping Algorithms for Diabetes Mellitus Using Data from Electronic Health Records.

Naïve Electronic Health Record phenotype identification for Rheumatoid arthritis.