Clinical knowledge extraction via sparse embedding regression (KESER) with multi-center large scale electronic health record data.

doi:10.1038/S41746-021-00519-Z

Open Access • Journal Article • 10.1038/S41746-021-00519-Z

Chuan Hong, +32 more

- 27 Oct 2021

- Vol. 4, Iss: 1, pp 151-151

35

TL;DR: In this article, a large-scale code embedding for a wide range of codified concepts from EHRs from two large medical centers was constructed and knowledge extraction via sparse embedding regression (KESER) was performed for feature selection and integrative network analysis.

Abstract: The increasing availability of electronic health record (EHR) systems has created enormous potential for translational research. However, it is difficult to know all the relevant codes related to a phenotype due to the large number of codes available. Traditional data mining approaches often require the use of patient-level data, which hinders the ability to share data across institutions. In this project, we demonstrate that multi-center large-scale code embeddings can be used to efficiently identify relevant features related to a disease of interest. We constructed large-scale code embeddings for a wide range of codified concepts from EHRs from two large medical centers. We developed knowledge extraction via sparse embedding regression (KESER) for feature selection and integrative network analysis. We evaluated the quality of the code embeddings and assessed the performance of KESER in feature selection for eight diseases. We also developed an integrated clinical knowledge map combining embedding data from both institutions. The features selected by KESER were comprehensive compared to lists of codified data generated by domain experts. Features identified via KESER resulted in comparable performance to those built upon features selected manually or with patient-level data. The knowledge map created using an integrative analysis identified disease-disease and disease-drug pairs more accurately than those identified using single-institution data. Analysis of code embeddings via KESER can effectively reveal clinical knowledge and infer relatedness among codified concepts. KESER bypasses the need for patient-level data in individual analyses, providing a significant advance in enabling multi-center studies using EHR data.
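
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, under stated assumptions: code embeddings are taken from a truncated SVD of a shifted positive PMI matrix built from a summary-level co-occurrence matrix, and the "sparse embedding regression" step is approximated by a plain lasso that regresses the target code's embedding on the embeddings of all other codes. The function names (`sppmi_embeddings`, `lasso_cd`, `select_features`), the toy co-occurrence counts, and the penalty value are hypothetical and are not the authors' code.

```python
import numpy as np


def sppmi_embeddings(cooc, dim, shift=1.0):
    """Embed codes via truncated SVD of a shifted positive PMI matrix (assumed construction)."""
    cooc = np.asarray(cooc, dtype=float)
    total = cooc.sum()
    row = cooc.sum(axis=1, keepdims=True)
    col = cooc.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(cooc * total / (row * col))
    # Zero co-occurrence gives -inf or NaN; both are mapped to 0.
    sppmi = np.where(np.isfinite(pmi), np.maximum(pmi - np.log(shift), 0.0), 0.0)
    u, s, _ = np.linalg.svd(sppmi)
    return u[:, :dim] * np.sqrt(s[:dim])


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = np.asarray(y, dtype=float).copy()  # residual y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (beta[j] excluded).
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def select_features(emb, target, lam=0.05):
    """Return indices of codes with a nonzero lasso coefficient for the target code."""
    others = [i for i in range(emb.shape[0]) if i != target]
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.where(norms > 0, norms, 1.0)  # guard against all-zero rows
    # Each embedding dimension is one "observation"; each other code is one predictor.
    beta = lasso_cd(unit[others].T, unit[target], lam)
    return [others[k] for k in np.flatnonzero(beta)]


# Toy summary-level co-occurrence counts: codes 0-1 and codes 2-3 form two groups.
toy_cooc = np.array([
    [10, 8, 0, 0],
    [8, 10, 0, 0],
    [0, 0, 10, 5],
    [0, 0, 5, 10],
])
toy_emb = sppmi_embeddings(toy_cooc, dim=2)
selected_for_code_0 = select_features(toy_emb, target=0)
```

Only the aggregated co-occurrence matrix enters this sketch, which mirrors the abstract's point that no patient-level data is needed for the feature selection step. A real analysis would need a principled choice of embedding dimension and penalty, which this sketch does not attempt.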

Citations

•Journal Article•10.1038/s41597-023-01960-3

Building a knowledge graph to enable precision medicine

Payal Chandak, +2 more

- 09 May 2022

- Scientific Data

TL;DR: PrimeKG integrates 20 high-quality resources to describe 17,080 diseases with 4,050,249 relationships representing ten major biological scales, including disease-associated protein perturbations, biological processes and pathways, anatomical and phenotypic scales, and the entire range of approved drugs with their therapeutic action, considerably expanding previous efforts in disease-rooted knowledge graphs.

232

Journal Article•10.1038/s41551-022-00942-x

Graph representation learning in biomedicine and healthcare

Michelle Li, +2 more

- 31 Oct 2022

- Nature Biomedical Engineering

TL;DR: It is argued that graph representation learning will keep pushing forward machine learning for biomedicine and healthcare applications, including the identification of genetic variants underlying complex traits, the disentanglement of single-cell behaviours and their effects on health, the assistance of patients in diagnosis and treatment, and the development of safe and effective medicines.

196

•Journal Article•10.1136/ard-2022-222626

From real-world electronic health record data to real-world results using artificial intelligence

Rachel Knevel, +1 more

- 23 Sep 2022

- Annals of the Rheumatic Diseases

TL;DR: Transforming real-world data (RWD) from EHRs for research and for real-world evidence using ML requires knowledge of EHR systems and their differences from existing observational data to ensure that studies incorporate rigorous methods that acknowledge or address factors such as access to care, noise in the data, missingness and indication bias.

79

•Journal Article•10.1093/JAMIA/OCAB264

The Mass General Brigham Biobank Portal: an i2b2-based data repository linking disparate and high-dimensional patient data to support multimodal analytics.

Victor M. Castro, +13 more

- 28 Nov 2021

- Journal of the American Medical Informat...

TL;DR: The Mass General Brigham (MGB) Biobank Portal data repository integrates data from primary and curated data sources and is updated weekly to enable researchers to conduct analysis efficiently and effectively.

44

Journal Article•10.1016/j.jbi.2023.104403

Towards electronic health record-based medical knowledge graph construction, completion, and applications: A literature study.

Lino Murali, +3 more

- 01 May 2023

- Journal of Biomedical Informatics

TL;DR: This paper reviews existing work on medical knowledge graphs that use EHR data as the data source, at the representation, extraction, and completion levels, and concludes that future research should focus on the challenges of knowledge graph integration and knowledge graph completion.

37

References

Proceedings Article•10.3115/V1/D14-1162

GloVe: Global Vectors for Word Representation

Jeffrey Pennington, +2 more

- 01 Oct 2014

TL;DR: A new global log-bilinear regression model is proposed that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.

41.6K

•Posted Content

Distributed Representations of Words and Phrases and their Compositionality

Tomas Mikolov, +4 more

- 16 Oct 2013

- arXiv: Computation and Language

TL;DR: In this paper, the Skip-gram model is used to learn high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships and improve both the quality of the vectors and the training speed.

22.9K

•Journal Article•10.1111/J.1467-9868.2005.00503.X

Regularization and variable selection via the elastic net

Hui Zou, +1 more

- 01 Apr 2005

- Journal of The Royal Statistical Society...

TL;DR: It is shown that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation, and an algorithm called LARS‐EN is proposed for computing elastic net regularization paths efficiently, much like algorithm LARS does for the lasso.

20.2K

•Journal Article•10.1215/23289252-2399740

International Statistical Classification of Diseases and Related Health Problems

Justus Eisfeld

- 01 May 2014

TL;DR: There is substantial global variation in the relative burden of stroke compared with IHD, and the disproportionate burden from stroke for many lower-income countries suggests that distinct interventions may be required.

8.2K

Journal Article•10.1075/LI.30.1.03NAD

A survey of named entity recognition and classification

David Nadeau, +1 more

- 01 Jan 2007

- Lingvisticae Investigationes

TL;DR: Observations about languages, named entity types, domains and textual genres studied in the literature, along with other critical aspects of NERC such as features and evaluation methods, are reported.

3K