Clinical knowledge extraction via sparse embedding regression (KESER) with multi-center large scale electronic health record data.
Chuan Hong,Everett Rush,Molei Liu,Doudou Zhou,Jiehuan Sun,Aaron Sonabend,Victor M. Castro,Petra Schubert,Vidul A. Panickan,Tianrun Cai,Lauren Costa,Zeling He,Nicholas Link,Ronald G. Hauser,J. Michael Gaziano,Shawn N. Murphy,George Ostrouchov,Yuk-Lam Ho,Edmon Begoli,Junwei Lu,Kelly Cho,Katherine P. Liao,Tianxi Cai,VA Million Veteran Program +32 more
- 27 Oct 2021
- Vol. 4, Iss: 1, pp 151-151
TL;DR: The authors constructed large-scale code embeddings for a wide range of codified concepts in EHRs from two large medical centers and developed knowledge extraction via sparse embedding regression (KESER) for feature selection and integrative network analysis.
Abstract: The increasing availability of electronic health record (EHR) systems has created enormous potential for translational research. However, it is difficult to know all the relevant codes related to a phenotype due to the large number of codes available. Traditional data mining approaches often require the use of patient-level data, which hinders the ability to share data across institutions. In this project, we demonstrate that multi-center large-scale code embeddings can be used to efficiently identify relevant features related to a disease of interest. We constructed large-scale code embeddings for a wide range of codified concepts in EHRs from two large medical centers. We developed knowledge extraction via sparse embedding regression (KESER) for feature selection and integrative network analysis. We evaluated the quality of the code embeddings and assessed the performance of KESER in feature selection for eight diseases. In addition, we developed an integrated clinical knowledge map combining embedding data from both institutions. The features selected by KESER were comprehensive compared to lists of codified data generated by domain experts. Features identified via KESER resulted in comparable performance to those built upon features selected manually or with patient-level data. The knowledge map created using an integrative analysis identified disease-disease and disease-drug pairs more accurately than those identified using single-institution data. Analysis of code embeddings via KESER can effectively reveal clinical knowledge and infer relatedness among codified concepts. KESER bypasses the need for patient-level data in individual analyses, providing a significant advance in enabling multi-center studies using EHR data.
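The abstract does not spell out the regression itself, so the following is only a minimal sketch of the general idea of sparse embedding regression, not the authors' implementation. It assumes that each code already has an embedding vector, regresses the embedding of a target code on the embeddings of candidate codes under a lasso penalty, and keeps the candidates with nonzero coefficients. The embeddings are synthetic, and the code names, the penalty value `lam`, and the helper `lasso_cd` are hypothetical choices made for illustration; an elastic net penalty could be swapped in for the lasso.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1 / 2n) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]      # drop code j's current contribution
            rho = X[:, j] @ resid / n       # correlation of code j with the residual
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]      # add the updated contribution back
    return beta


# Synthetic stand-in for code embeddings (assumption: 200 dimensions, 6 candidate codes).
rng = np.random.default_rng(0)
dim, n_codes = 200, 6
names = [f"code_{j}" for j in range(n_codes)]  # hypothetical code identifiers
V = rng.normal(size=(dim, n_codes))            # one column per candidate code

# The target embedding is built from code_0 and code_2 plus a little noise.
target = 0.8 * V[:, 0] + 0.5 * V[:, 2] + 0.05 * rng.normal(size=dim)

beta = lasso_cd(V, target, lam=0.1)
selected = [names[j] for j in range(n_codes) if beta[j] != 0.0]
print(selected)
```

Only embedding vectors enter the regression, which is consistent with the abstract's point that patient-level data is not needed for the individual analysis.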
Citations
Building a knowledge graph to enable precision medicine
TL;DR: PrimeKG integrates 20 high-quality resources to describe 17,080 diseases with 4,050,249 relationships representing ten major biological scales, including disease-associated protein perturbations, biological processes and pathways, anatomical and phenotypic scales, and the entire range of approved drugs with their therapeutic action, considerably expanding previous efforts in disease-rooted knowledge graphs.
Graph representation learning in biomedicine and healthcare
TL;DR: It is argued that graph representation learning will keep pushing forward machine learning for biomedicine and healthcare applications, including the identification of genetic variants underlying complex traits, the disentanglement of single-cell behaviours and their effects on health, the assistance of patients in diagnosis and treatment, and the development of safe and effective medicines.
From real-world electronic health record data to real-world results using artificial intelligence
Rachel Knevel,Katherine P. Liao +1 more
TL;DR: Transforming RWD EHR data for research and for real-world evidence using ML requires knowledge of EHR systems and their differences from existing observational data to ensure that studies incorporate rigorous methods that acknowledge or address factors such as access to care, noise in the data, missingness and indication bias.
The Mass General Brigham Biobank Portal: an i2b2-based data repository linking disparate and high-dimensional patient data to support multimodal analytics.
Victor M. Castro,Vivian S. Gainer,Nich Wattanasin,Barbara Benoit,Andrew Cagan,Bhaswati Ghosh,Sergey Goryachev,Reeta Metta,Heekyong Park,David Wang,Michael Mendis,Martin Rees,Christopher Herrick,Shawn N. Murphy +13 more
TL;DR: The Mass General Brigham (MGB) Biobank Portal data repository integrates data from primary and curated data sources and is updated weekly to enable researchers to conduct analysis efficiently and effectively.
Towards electronic health record-based medical knowledge graph construction, completion, and applications: A literature study.
TL;DR: This paper reviews existing work on medical knowledge graphs that use EHR data as the data source at the representation, extraction, and completion levels, and the authors conclude that future research should focus on knowledge graph integration and knowledge graph completion challenges.
References
Glove: Global Vectors for Word Representation
Jeffrey Pennington,Richard Socher,Christopher D. Manning +2 more
- 01 Oct 2014
TL;DR: A new global log-bilinear regression model combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
Distributed Representations of Words and Phrases and their Compositionality
TL;DR: In this paper, the Skip-gram model is used to learn high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships and improve both the quality of the vectors and the training speed.
Regularization and variable selection via the elastic net
Hui Zou,Trevor Hastie +1 more
TL;DR: It is shown that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation, and an algorithm called LARS-EN is proposed for computing elastic net regularization paths efficiently, much like algorithm LARS does for the lasso.
International Statistical Classification of Diseases and Related Health Problems
Justus Eisfeld
- 01 May 2014
TL;DR: There is substantial global variation in the relative burden of stroke compared with IHD, and the disproportionate burden from stroke for many lower-income countries suggests that distinct interventions may be required.
A survey of named entity recognition and classification
David Nadeau,Satoshi Sekine +1 more
TL;DR: Observations about languages, named entity types, domains and textual genres studied in the literature, along with other critical aspects of NERC such as features and evaluation methods, are reported.