A Survey of Biological Data in a Big Data Perspective

doi:10.1089/big.2020.0383

Journal Article

Gabriel Dall'Alba, +4 more

- 07 Apr 2022

- Big Data

- Vol. 10, Iss: 4, pp 279-297

Cited by 10

TL;DR: Some fundamental concepts of information technology, including storage resources, analysis, and data sharing, are described along with their relation to biological data.

Abstract: The amount of available data is growing continuously, a phenomenon that has given rise to a new concept named big data. The key technologies associated with big data are cloud computing (infrastructure) and Not Only SQL (NoSQL; data storage). For data analysis, machine learning algorithms such as decision trees, support vector machines, artificial neural networks, and clustering techniques have shown promising results. In a biological context, big data has many applications because of the large number of biological databases available. Some limitations of biological big data stem from the inherent features of these data, such as their high complexity and heterogeneity, since biological systems provide information ranging from the atomic level to interactions between organisms or with their environment. These characteristics make most bioinformatics applications difficult to build, configure, and maintain. Although the rise of big data is relatively recent, it has already contributed to a better understanding of the underlying mechanisms of life. The main goal of this article is to provide a concise and reliable survey of the application of big data-related technologies in biology. To that end, it describes fundamental information technology concepts, including storage resources, analysis, and data sharing, along with their relation to biological data.
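
The abstract names clustering among the machine learning techniques applied to biological big data. The following is a minimal sketch of one such technique, plain k-means, written for this summary rather than taken from the article: the gene names, expression values, and the `kmeans` helper are hypothetical, and a real analysis would normally rely on an established library and properly normalized data.

```python
import math
import random

# Hypothetical toy data (not from the article): each entry is an expression
# profile for one gene measured under three conditions.
PROFILES = {
    "geneA": [1.0, 1.2, 0.9],
    "geneB": [1.1, 1.0, 1.0],
    "geneC": [5.0, 5.2, 4.8],
    "geneD": [4.9, 5.1, 5.0],
}


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(data, k, iterations=20, seed=0):
    """Hypothetical helper implementing plain k-means: assign each point to
    its nearest centroid, then move each centroid to the mean of its points."""
    rng = random.Random(seed)
    names = list(data)
    # Initialize centroids with k distinct, randomly chosen profiles.
    centroids = [data[n] for n in rng.sample(names, k)]
    labels = {}
    for _ in range(iterations):
        # Assignment step: nearest centroid for every gene.
        labels = {
            n: min(range(k), key=lambda c: euclidean(data[n], centroids[c]))
            for n in names
        }
        # Update step: recompute each centroid from its members.
        for c in range(k):
            members = [data[n] for n in names if labels[n] == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = mean(members)
    return labels


labels = kmeans(PROFILES, k=2)
print(labels)
```

With this toy input, geneA and geneB fall into one cluster and geneC and geneD into the other; whether a given group is labeled 0 or 1 depends on the random initialization.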

Citations

Proceedings Article•10.1109/ICASI57738.2023.10179527

A High Performance Computing Platform for Big Biological Data Analysis

Chieh-Wei Huang, +2 more

- 21 Apr 2023

TL;DR: In this paper, the authors presented a large-scale biological data analysis platform, built on an HPC infrastructure, for genomic sequencing data and protein structure analysis. By addressing the challenges of big biological data analysis, the platform aims to help researchers gain deeper insight into biological function.

Cited by 2

Book Chapter•10.1007/978-981-19-8004-6_9

Challenges and Future Research Directions on Data Computation

- 01 Jan 2023

TL;DR: In this chapter, the authors discussed the applications, current challenges, and expected future research directions of data computation in big data, IoT, cloud computing, quantum computing, and biological computing frameworks.

Cited by 1

Journal Article•10.1038/s42003-023-05004-9

Y chromosome sequence and epigenomic reconstruction across human populations

Tomas Marques-Bonet, +1 more

- 09 Jun 2023

- Communications biology

TL;DR: In this paper, the authors analyzed and compared the chrY enrichment of sequencing data obtained with two selective sequencing approaches, adaptive sampling and flow cytometry chromosome sorting. They also provided a framework for studying complex genomic regions with a simple, fast, and affordable methodology that could be applied to larger population genomics datasets.

Cited by 1

Journal Article•10.15507/2658-4123.033.202303.388-402

Evaluating the Efficiency of the Tube Turbulent Apparatus Influence on Kinetics of Polymer Production Processes

Eldar Miftakhov, +4 more

- 29 Sep 2023

- Engineering Technologies and Systems (Инженерные технологии и системы)

TL;DR: This study evaluates the efficiency of a tube turbulent apparatus and its influence on polymer production kinetics, using simulation modeling and cloud computing to determine how external factors affect catalyst heterogeneity and kinetic activity.

Cited by 1

Journal Article•10.1186/s12859-024-05695-9

CREDO: a friendly Customizable, REproducible, DOcker file generator for bioinformatics applications

Simone Alessandri, +8 more

- 12 Mar 2024

- BMC Bioinformatics

TL;DR: CREDO, a customizable Docker file generator, addresses reproducibility issues in bioinformatics by creating modular Docker images with embedded tools. It simplifies this process and promotes open science practices through a user-friendly GUI and a GitHub-compatible format.

Cited by 1

References

Journal Article•10.1101/GR.1239303

Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks

Paul Shannon, +8 more

- 01 Nov 2003

- Genome Research

TL;DR: Several case studies of Cytoscape plug-ins are surveyed, including a search for interaction pathways correlating with changes in gene expression, a study of protein complexes involved in cellular recovery to DNA damage, inference of a combined physical/functional interaction network for Halobacterium, and an interface to detailed stochastic/kinetic gene regulatory models.

Cited by 46.4K

Journal Article•10.1093/NAR/28.1.235

The Protein Data Bank

Helen M. Berman, +7 more

- 01 Jan 2000

- Nucleic Acids Research

TL;DR: This paper describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and plans for the future development of the resource.

Cited by 39.5K

Journal Article•10.1038/S41586-021-03819-2

Highly accurate protein structure prediction with AlphaFold

John M. Jumper, +33 more

- 15 Jul 2021

- Nature

TL;DR: AlphaFold predicts protein structures with an accuracy competitive with experimental structures in the majority of cases, even when no homologous structure is available, using a novel deep learning architecture.

Cited by 28.2K

Journal Article•10.21276/IJRE.2018.5.5.4

MapReduce: simplified data processing on large clusters

Jeffrey Dean, +1 more

- 06 Dec 2004

TL;DR: This paper presents MapReduce, a programming model and an associated implementation for processing and generating large data sets. The implementation runs on large clusters of commodity machines and is highly scalable.

Cited by 22.7K

Journal Article•10.1145/1327452.1327492

MapReduce: simplified data processing on large clusters

Jeffrey Dean, +1 more

- 01 Jan 2008

- Communications of The ACM

TL;DR: This article explains how the underlying runtime system automatically parallelizes the computation across large clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.

Cited by 18.6K

...
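
The two MapReduce entries above summarize a programming model in which a map function emits intermediate key/value pairs, the runtime groups those pairs by key, and a reduce function merges the values for each key. A minimal single-process sketch of that model, applied to counting k-mers in sequencing reads, might look as follows; the reads, the value of `K`, and all function names are hypothetical, and the sketch omits everything the distributed runtime adds (parallel workers, fault tolerance, network-aware scheduling).

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input: on a real cluster each read (or block of reads) would be
# handled by a separate map worker; here everything runs in one process.
READS = ["ACGTAC", "GTACGT", "ACGT"]
K = 3  # hypothetical k-mer length


def map_phase(read, k=K):
    """Map: emit an intermediate (k-mer, 1) pair for every k-mer in a read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle: group all intermediate values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: merge all values for one key into a single count."""
    return key, sum(values)


def kmer_count(reads):
    """Run map, shuffle, and reduce sequentially over all reads."""
    intermediate = chain.from_iterable(map_phase(r) for r in reads)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(kmer_count(READS))
```

For these three reads the sketch counts ACG and CGT three times each and GTA and TAC twice each. Keeping map and reduce as pure functions of their inputs is what allows a real MapReduce runtime to run them on separate machines and re-execute them after a failure.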

Related Papers (5)

Trends in IT Innovation to Build a Next Generation Bioinformatics Solution to Manage and Analyse Biological Big Data Produced by NGS Technologies

Introduction of Big Data With Analytics of Big Data

A Study of Big Data and Classification of NoSQL Databases

Challenges and Opportunities in Big Data Processing

A Survey of Biological Data in a Big Data Perspective