An innovative approach for testing bioinformatics programs using metamorphic testing
TL;DR: This work proposes to use a novel software testing technique, metamorphic testing (MT), to test a range of bioinformatics programs, and shows that MT is simple to implement, and is effective in detecting faults in a real-life program and some artificially fault-seeded programs.
Abstract: Recent advances in experimental and computational technologies have fueled the development of many sophisticated bioinformatics programs. The correctness of such programs is crucial, as incorrectly computed results may lead to wrong biological conclusions or misguide downstream experimentation. Common software testing procedures involve executing the target program with a set of test inputs and then verifying the correctness of the test outputs. However, due to the complexity of many bioinformatics programs, it is often difficult to verify the correctness of the test outputs. Therefore, our ability to perform systematic software testing is greatly hindered. We propose to use a novel software testing technique, metamorphic testing (MT), to test a range of bioinformatics programs. Instead of requiring a mechanism to verify whether an individual test output is correct, the MT technique verifies whether a pair of test outputs conforms to a set of domain-specific properties, called metamorphic relations (MRs), which greatly increases the number and variety of test cases that can be applied. To demonstrate how MT is used in practice, we applied MT to test two open-source bioinformatics programs, namely GNLab and SeqMap. In particular, we show that MT is simple to implement, and is effective in detecting faults in a real-life program and some artificially fault-seeded programs. Further, we discuss how MT can be applied to test programs from various domains of bioinformatics. This paper describes the application of a simple, effective and automated technique to systematically test a range of bioinformatics programs. We show how MT can be implemented in practice through two real-life case studies. Since many bioinformatics programs, particularly those for large-scale simulation and data analysis, are hard to test systematically, their developers may benefit from using MT as part of the testing strategy. Our work therefore represents a significant step towards software reliability in bioinformatics.
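The idea of checking a pair of outputs against a metamorphic relation, rather than checking one output against a known answer, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `map_reads` is a toy exact-match mapper and not SeqMap, the relation "reordering the input reads must not change the set of reported hits" is one plausible MR for a read mapper and is not claimed to be one used in the paper, and `faulty_map_reads` is a hypothetical seeded fault.

```python
def map_reads(reads, reference):
    """Toy stand-in for a read mapper (hypothetical, not SeqMap):
    return the set of (read, position) pairs for every exact match."""
    hits = set()
    for read in reads:
        start = reference.find(read)
        while start != -1:
            hits.add((read, start))
            start = reference.find(read, start + 1)
    return hits


def faulty_map_reads(reads, reference):
    """Hypothetical seeded fault: the first read supplied is silently skipped."""
    return map_reads(reads[1:], reference)


def check_order_mr(program, reads, reference):
    """Assumed metamorphic relation: reversing the order of the input reads
    must leave the set of reported hits unchanged.

    No oracle for the correct hits is needed; only the source output and
    the follow-up output are compared with each other."""
    source_output = program(reads, reference)
    follow_up_output = program(list(reversed(reads)), reference)
    return source_output == follow_up_output


# Illustrative data only.
REFERENCE = "ACGTTAGGCACG"
READS = ["ACG", "TTA", "GGC"]

if __name__ == "__main__":
    print("correct mapper satisfies MR:", check_order_mr(map_reads, READS, REFERENCE))
    print("faulty mapper satisfies MR:", check_order_mr(faulty_map_reads, READS, REFERENCE))
```

In this sketch the faulty mapper is exposed without anyone knowing the true hit positions: the two runs skip different reads, so their outputs disagree and the relation is violated. A satisfied relation does not prove the program correct; it only means this particular necessary property was not violated for this input.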
Citations
Verification and Validation of Bioinformatics Software
Joshua W. K. Ho, Michael A. Charleston
- 01 Jan 2009
TL;DR: MT alleviates the oracle problem by testing necessary properties of the program, instead of testing for the correctness of specific test cases, and enables a diverse range of test cases to be generated and verified.
Identification Algorithm Framework and Structural model on Input Pattern of Metamorphic Relations
Shiyu Yan, Xiaohua Yang, Zhongjiang Lu, Meng Li, Helin Gong, Jie-Sheng Liu
- 01 Aug 2022
TL;DR: The core contribution of the method is to decompose the input pattern identification problem in multi-dimensional space into a one-dimensional problem, and to prove that any input pattern can be expressed as a combination of basic input patterns in scientific computing programs.
Workshop Summary: 2019 IEEE / ACM Fourth International Workshop on Metamorphic Testing (MET 2019)
TL;DR: The aims of the workshop are outlined, followed by a discussion of its keynote speech and technical program.
Identifying the Failure-Revealing Test Cases in Metamorphic Testing: A Statistical Approach
Zheng Zheng, Daixu Ren, Huai Liu, Tsong Yueh Chen, T. T. Li
TL;DR: This paper proposes FAILTIM, a statistical approach to identify failure-revealing test cases in metamorphic testing, leveraging spectrum-based techniques and risk formulas to estimate suspiciousness and achieve high accuracy in identifying actual failure-causing test cases.
Scalability and Validation of Big Data Bioinformatics Software
TL;DR: It is argued that not only are the issues of scalability and validation common to all big data bioinformatics analyses, they can be tackled by conceptually related methodological approaches, namely divide-and-conquer (scalability) and multiple executions (validation).
References
Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles
Aravind Subramanian, Pablo Tamayo, Vamsi K. Mootha, Sayan Mukherjee, Benjamin L. Ebert, Michael A. Gillette, Amanda G. Paulovich, Scott L. Pomeroy, Todd R. Golub, Eric S. Lander, Jill P. Mesirov
TL;DR: The Gene Set Enrichment Analysis (GSEA) method focuses on gene sets, that is, groups of genes that share a common biological function, chromosomal location, or regulation.
KEGG: Kyoto Encyclopedia of Genes and Genomes
Minoru Kanehisa, Susumu Goto
TL;DR: The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a knowledge base for systematic analysis of gene functions in terms of the networks of genes and molecules.
Cluster analysis and display of genome-wide expression patterns
TL;DR: A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression, finding in the budding yeast Saccharomyces cerevisiae that clustering gene expression data groups together efficiently genes of known similar function.
Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments
TL;DR: The hierarchical model of Lonnstedt and Speed (2002) is developed into a practical approach for general microarray experiments with arbitrary numbers of treatments and RNA samples and the moderated t-statistic is shown to follow a t-distribution with augmented degrees of freedom.
Using Bayesian networks to analyze expression data
TL;DR: A new framework for discovering interactions between genes based on multiple expression measurements is proposed and a method for recovering gene interactions from microarray data is described using tools for learning Bayesian networks.