Knowledge-based biomedical Data Science.
Lawrence Hunter
- 01 Jan 2017
- Vol. 1, pp 19-25
44
TL;DR: This position paper argues that knowledge-based data science research is ripe for expansion, and expanded application.
Abstract: Computational manipulation of knowledge is an important, and often under-appreciated, aspect of biomedical Data Science. The first Data Science initiative from the US National Institutes of Health was entitled "Big Data to Knowledge (BD2K)." The main emphasis of the more than $200M allocated to that program has been on "Big Data;" the "Knowledge" component has largely been the implicit assumption that the work will lead to new biomedical knowledge. However, there is long-standing and highly productive work in computational knowledge representation and reasoning, and computational processing of knowledge has a role in the world of Data Science. Knowledge-based biomedical Data Science involves the design and implementation of computer systems that act as if they knew about biomedicine. There are many ways in which a computational approach might act as if it knew something: for example, it might be able to answer a natural language question about a biomedical topic, or pass an exam; it might be able to use existing biomedical knowledge to rank or evaluate hypotheses; it might explain or interpret data in light of prior knowledge, either in a Bayesian or other sort of framework. These are all examples of automated reasoning that acts on computational representations of knowledge. After a brief survey of existing approaches to knowledge-based data science, this position paper argues that such research is ripe for expansion, and expanded application.
Citations
Domain-specific knowledge graphs: A survey
TL;DR: This survey is the first to provide an inclusive definition of the notion of a domain KG, together with a comprehensive review of state-of-the-art approaches drawn from academic works across seven dissimilar domains of knowledge.
308
Graph representation learning in biomedicine and healthcare
TL;DR: It is argued that graph representation learning will keep pushing forward machine learning for biomedicine and healthcare applications, including the identification of genetic variants underlying complex traits, the disentanglement of single-cell behaviours and their effects on health, the assistance of patients in diagnosis and treatment, and the development of safe and effective medicines.
196
Semantic similarity and machine learning with ontologies
TL;DR: This overview covers methods that use ontologies to compute similarity and incorporate them in machine learning methods, outlining how semantic similarity measures and ontology embeddings can exploit the background knowledge in ontologies and how ontologies can provide constraints that improve machine learning models.
Reflect: Augmented Browsing for the Life Scientist
Evangelos Pafilis,Seán I. O'Donoghue,Lars Juhl Jensen,Heiko Horn,Michael Kuhn,Nigel P. Brown,Reinhard Schneider +6 more
TL;DR: Reflect tags gene, protein, and small molecule names in any web page, typically within a few seconds and without affecting document layout, and shows a concise summary that includes synonyms, database identifiers, sequence, domains, 3D structure, interaction partners, subcellular location, and related literature.
69
Erratum: Link prediction in drug-target interactions network using similarity indices
TL;DR: This paper proposes an alternative method for in silico drug-target interaction (DTI) prediction that uses only network topology information, and shows that, when applied to the MATADOR database, the approach based on node neighborhoods yields higher precision for high-ranking predictions than RBM when no information regarding DTI types is available.
59