Data curation + process curation = data integration + science
Abstract: In bioinformatics, we are familiar with the idea of curated data as a prerequisite for data integration. We neglect, often to our cost, the curation and cataloguing of the processes that we use to integrate and analyse our data. Programmatic access to services, for both data and processes, means that services can be composed to represent the in silico experiments or processes that bioinformaticians perform. Data integration through workflows depends on knowing what services exist and where to find them. However, the large number of services and the operations they perform, their arbitrary naming and their lack of documentation mean that they can be difficult to use. The workflows themselves are composite processes that could be pooled and reused, but only if they too can be found and understood. Appropriate curation, including semantic mark-up, would therefore enable processes to be found, maintained and consequently used more easily. This broader view of semantic annotation is vital for the full data integration that modern scientific analyses in biology require. This article will brief the community on the current state of the art and the current challenges for process curation, both within and outside the Life Sciences.