Journal Article, DOI: 10.1089/big.2020.0383
A Survey of Biological Data in a Big Data Perspective
TL;DR: Some fundamental concepts of information technology, including storage resources, analysis, and data sharing, are described along with their relation to biological data.
Abstract: The amount of available data is continuously growing, a phenomenon that has given rise to the concept of big data. The key technologies associated with big data are cloud computing (infrastructure) and Not Only SQL (NoSQL; data storage). For data analysis, machine learning algorithms such as decision trees, support vector machines, artificial neural networks, and clustering techniques show promising results. In a biological context, big data has many applications because of the large number of biological databases available. Some limitations of biological big data stem from inherent features of the data, such as high degrees of complexity and heterogeneity, since biological systems provide information ranging from the atomic level to interactions between organisms and their environment. These characteristics make most bioinformatics-based applications difficult to build, configure, and maintain. Although the rise of big data is relatively recent, it has contributed to a better understanding of the underlying mechanisms of life. The main goal of this article is to provide a concise and reliable survey of the application of big data-related technologies in biology. To that end, some fundamental concepts of information technology, including storage resources, analysis, and data sharing, are described along with their relation to biological data.
Citations
A High Performance Computing Platform for Big Biological Data Analysis
Chieh-Wei Huang,Chia Lee Yang,Yu Tai Wang +2 more
- 21 Apr 2023
TL;DR: The authors present a large-scale biological data analysis platform, enabled by an HPC infrastructure, for genomic sequencing data and protein structure analysis. The platform addresses the challenges of big biological data analysis to help researchers gain insight into biological functions.
Challenges and Future Research Directions on Data Computation
- 01 Jan 2023
TL;DR: The authors discuss applications, current challenges, and expected future research directions in data computation with respect to big data, IoT, cloud computing, quantum computing, and biological computing frameworks.
Y chromosome sequence and epigenomic reconstruction across human populations
TL;DR: The authors analyze and compare the chrY enrichment of sequencing data obtained with two selective sequencing approaches, adaptive sampling and flow cytometry chromosome sorting. They provide a framework for studying complex genomic regions with a simple, fast, and affordable methodology that could be applied to larger population genomics datasets.
Evaluating the Efficiency of the Tube Turbulent Apparatus Influence on Kinetics of Polymer Production Processes
Eldar Miftakhov,Sofya I. Mustafina,Nikolay D. Morozkin,Ildus Sh. Nasyrov,S. A. Mustafina +4 more
TL;DR: This study evaluates the efficiency of a tube turbulent apparatus on polymer production kinetics, using simulation modeling and cloud computing to determine the influence of external factors on catalyst heterogeneity and kinetic activity.
CREDO: a friendly Customizable, REproducible, DOcker file generator for bioinformatics applications
Simone Alessandri,M. Ratto,Sergio Rabellino,Gabriele Piacenti,S. G. Contaldo,Simone Pernice,Marco Beccuti,Raffaele Calogero,Luca Alessandrì +8 more
TL;DR: CREDO, a customizable Docker file generator, addresses reproducibility issues in bioinformatics by creating modular Docker images with embedded tools, simplifying the process and promoting open science practices through user-friendly GUI and Github-compatible format.
References
Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks
Paul Shannon,Andrew Markiel,Owen Ozier,Nitin S. Baliga,Jonathan T. Wang,Daniel Ramage,Nada Amin,Benno Schwikowski,Trey Ideker +8 more
TL;DR: Several case studies of Cytoscape plug-ins are surveyed, including a search for interaction pathways correlating with changes in gene expression, a study of protein complexes involved in cellular recovery to DNA damage, inference of a combined physical/functional interaction network for Halobacterium, and an interface to detailed stochastic/kinetic gene regulatory models.
The Protein Data Bank
Helen M. Berman,John D. Westbrook,Zukang Feng,Gary L. Gilliland,Talapady N. Bhat,Helge Weissig,Ilya N. Shindyalov,Philip E. Bourne +7 more
TL;DR: The goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and plans for the future development of the resource are described.
Highly accurate protein structure prediction with AlphaFold
John M. Jumper,Richard O. Evans,Alexander Pritzel,Tim Green,Michael Figurnov,Olaf Ronneberger,Kathryn Tunyasuvunakool,Russell Bates,Augustin Žídek,Anna Potapenko,Alex Bridgland,Clemens Meyer,Simon A. A. Kohl,Andrew J. Ballard,Andrew Cowie,Bernardino Romera-Paredes,Stanislav Nikolov,R. D. Jain,Jonas Adler,Trevor Back,Stig Petersen,David Reiman,Ellen Clancy,Michal Zielinski,Martin Steinegger,Michalina Pacholska,Tamas Berghammer,Sebastian Bodenstein,David L. Silver,Oriol Vinyals,Andrew W. Senior,Koray Kavukcuoglu,Pushmeet Kohli,Demis Hassabis +33 more
TL;DR: AlphaFold uses a novel deep learning architecture to predict protein structures with accuracy competitive with experimental structures in the majority of cases, including cases in which no similar structure is known.
MapReduce: simplified data processing on large clusters
Jeffrey Dean,Sanjay Ghemawat +1 more
- 06 Dec 2004
TL;DR: This paper presents MapReduce, a programming model and an associated implementation for processing and generating large data sets, which runs on large clusters of commodity machines and is highly scalable.
MapReduce: simplified data processing on large clusters
Jeffrey Dean,Sanjay Ghemawat +1 more
TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.