Software framework

Topic Tools

Papers published on a yearly basis


Papers

The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data


Aaron McKenna¹, Matthew Hanna, Eric Banks, Andrey Sivachenko, Kristian Cibulskis, Andrew Kernytsky, Kiran V. Garimella, David Altshuler, Stacey Gabriel, Mark J. Daly, Mark A. DePristo

Broad Institute¹

01 Sep 2010 • Genome Research

TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.


Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS (the 1000 Genomes pilot alone includes nearly five terabases) make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.


27,202 citations
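The MapReduce split described in the GATK abstract (the framework owns data traversal and hands each locus to a user-written map step, whose results are folded by a reduce step) can be illustrated with a toy coverage calculator. This is a minimal Python sketch under stated assumptions, not the GATK API (GATK itself is a Java framework): `Read`, `pileup`, `map_locus`, `reduce_depth`, `combine`, `coverage`, and `sharded_coverage` are hypothetical names, and reads are reduced to a 0-based start and an aligned length.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterator, List, Sequence, Tuple


@dataclass(frozen=True)
class Read:
    """Hypothetical aligned read: 0-based start position and aligned length."""
    start: int
    length: int


def pileup(reads: Sequence[Read], start: int, end: int) -> Iterator[Tuple[int, List[Read]]]:
    """Traversal (framework side): yield each locus in [start, end) with its overlapping reads."""
    for locus in range(start, end):
        yield locus, [r for r in reads if r.start <= locus < r.start + r.length]


def map_locus(locus: int, covering: List[Read]) -> int:
    """Map (tool side): the per-locus calculation, here the read depth."""
    return len(covering)


def reduce_depth(acc: Tuple[int, int], depth: int) -> Tuple[int, int]:
    """Reduce (tool side): fold one mapped value into (total depth, loci seen)."""
    return acc[0] + depth, acc[1] + 1


def combine(a: Tuple[int, int], b: Tuple[int, int]) -> Tuple[int, int]:
    """Merge two partial reductions, e.g. from independently processed shards."""
    return a[0] + b[0], a[1] + b[1]


def coverage(reads: Sequence[Read], start: int, end: int) -> Tuple[int, int]:
    """Run map then reduce over one interval; returns (total depth, number of loci)."""
    mapped = (map_locus(locus, cov) for locus, cov in pileup(reads, start, end))
    return reduce(reduce_depth, mapped, (0, 0))


def sharded_coverage(reads: Sequence[Read], start: int, end: int, shard_size: int) -> Tuple[int, int]:
    """Split the interval into shards, reduce each separately, then combine the partials."""
    partials = [coverage(reads, s, min(s + shard_size, end)) for s in range(start, end, shard_size)]
    return reduce(combine, partials, (0, 0))


reads = [Read(0, 5), Read(3, 4), Read(10, 2)]
total, loci = coverage(reads, 0, 12)
print(total, loci, total / loci)
```

Because the partial results merge by simple addition, each shard could in principle run on a separate worker; this is one way a separation of analysis code from traversal code can open the door to the parallelization the abstract mentions.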

Book

Component Software: Beyond Object-Oriented Programming


Clemens Szyperski

23 Nov 2002

TL;DR: Anyone responsible for developing software strategy, evaluating new technologies, buying or building software will find Clemens Szyperski's objective and market-aware perspective of this new area invaluable.


Abstract: From the Publisher: Component Software: Beyond Object-Oriented Programming explains the technical foundations of this evolving technology and its importance in the software marketplace. It provides in-depth discussion of both the technical and the business issues to be considered, then moves on to suggest approaches for implementing component-oriented software production and the organizational requirements for success. The author draws on his own experience to offer tried-and-tested solutions to common problems and novel approaches to potential pitfalls. Anyone responsible for developing software strategy, evaluating new technologies, buying or building software will find Clemens Szyperski's objective and market-aware perspective of this new area invaluable.


5,556 citations

Software Framework for Topic Modelling with Large Corpora


Radim Řehůřek¹, Petr Sojka¹•Institutions (1)

Masaryk University¹

22 May 2010

TL;DR: This work describes a Natural Language Processing software framework based on document streaming, i.e. processing a corpus one document at a time in a memory-independent fashion, and implements several popular algorithms for topical inference, including Latent Semantic Analysis and Latent Dirichlet Allocation, in a way that makes them completely independent of the training corpus size.


Abstract: Large corpora are ubiquitous in today's world, and memory quickly becomes the limiting factor in practical applications of the Vector Space Model (VSM). We identify a gap in existing VSM implementations: their scalability and ease of use. We describe a Natural Language Processing software framework which is based on the idea of document streaming, i.e. processing corpora document after document, in a memory-independent fashion. In this framework, we implement several popular algorithms for topical inference, including Latent Semantic Analysis and Latent Dirichlet Allocation, in a way that makes them completely independent of the training corpus size. Particular emphasis is placed on straightforward and intuitive framework design, so that modifications and extensions of the methods and/or their application by interested practitioners are effortless. We demonstrate the usefulness of our approach on a real-world scenario of computing document similarities within an existing digital library, DML-CZ.


4,796 citations
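The document-streaming idea in the abstract above (iterate over a corpus one document at a time so memory does not grow with corpus size) can be sketched in plain Python. This is an illustrative sketch only, not gensim's API: `build_vocab`, `StreamedCorpus`, and `document_frequencies` are hypothetical names, and it assumes one whitespace-tokenised document per line of input. Each document is represented as a sparse bag of words, a list of `(token_id, count)` pairs.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

BagOfWords = List[Tuple[int, int]]  # sparse document: (token id, count) pairs


def build_vocab(source: Callable[[], Iterable[str]]) -> Dict[str, int]:
    """One streaming pass over the corpus: assign an integer id to each new token."""
    vocab: Dict[str, int] = {}
    for line in source():
        for token in line.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


class StreamedCorpus:
    """Re-iterable corpus: each pass re-opens the source and yields one document at a time."""

    def __init__(self, source: Callable[[], Iterable[str]], vocab: Dict[str, int]) -> None:
        self.source = source  # callable returning a fresh iterator of lines
        self.vocab = vocab

    def __iter__(self) -> Iterator[BagOfWords]:
        for line in self.source():
            counts = Counter(self.vocab[t] for t in line.lower().split() if t in self.vocab)
            yield sorted(counts.items())


def document_frequencies(corpus: Iterable[BagOfWords]) -> Counter:
    """Count documents containing each token; memory scales with vocabulary, not corpus size."""
    df: Counter = Counter()
    for doc in corpus:
        df.update(token_id for token_id, _ in doc)
    return df


docs = ["The cat sat", "the dog sat", "a cat"]
# In practice the source might be e.g. lambda: open("corpus.txt", encoding="utf-8") (hypothetical path).
source = lambda: iter(docs)
vocab = build_vocab(source)
corpus = StreamedCorpus(source, vocab)
print(list(corpus))
print(document_frequencies(corpus))
```

Taking a callable rather than an iterator is a deliberate choice here: iterative algorithms usually need several passes over the data, and a fresh iterator per pass keeps the corpus re-iterable without ever holding it in memory.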

Journal Article • DOI: 10.1371/journal.pcbi.1006650

BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis.


Remco R. Bouckaert¹,², Timothy G. Vaughan³,⁴, Joëlle Barido-Sottani³,⁴, Sebastián Duchêne⁵, Mathieu Fourment⁶, Alexandra Gavryushkina⁷, Joseph Heled, Graham Jones⁸, Denise Kühnert¹, Nicola De Maio⁹, Michael Matschiner¹⁰, Fábio K. Mendes², Nicola F. Müller³,⁴, Huw A. Ogilvie¹¹, Louis du Plessis¹², Alex Popinga², Andrew Rambaut¹³, David A. Rasmussen¹⁴, Igor Siveroni¹⁵, Marc A. Suchard¹⁶, Chieh-Hsi Wu¹², Dong Xie², Chi Zhang¹⁷, Tanja Stadler³,⁴, Alexei J. Drummond²

Max Planck Society¹, University of Auckland², Swiss Institute of Bioinformatics³, ETH Zurich⁴, University of Melbourne⁵, University of Technology, Sydney⁶, University of Otago⁷, University of Gothenburg⁸, European Bioinformatics Institute⁹, University of Basel¹⁰, Rice University¹¹, University of Oxford¹², University of Edinburgh¹³, North Carolina State University¹⁴, Imperial College London¹⁵, University of California, Los Angeles¹⁶, Chinese Academy of Sciences¹⁷

08 Apr 2019 • PLOS Computational Biology

TL;DR: This paper describes a series of major new developments in the BEAST 2 core platform and model hierarchy that have occurred since the first release of the software, culminating in the recent 2.5 release.


Abstract: Elaboration of Bayesian phylogenetic inference methods has continued at pace in recent years with major new advances in nearly all aspects of the joint modelling of evolutionary data. It is increasingly appreciated that some evolutionary questions can only be adequately answered by combining evidence from multiple independent sources of data, including genome sequences, sampling dates, phenotypic data, radiocarbon dates, fossil occurrences, and biogeographic range information among others. Including all relevant data into a single joint model is very challenging both conceptually and computationally. Advanced computational software packages that allow robust development of compatible (sub-)models which can be composed into a full model hierarchy have played a key role in these developments. Developing such software frameworks is increasingly a major scientific activity in its own right, and comes with specific challenges, from practical software design, development and engineering challenges to statistical and conceptual modelling challenges. BEAST 2 is one such computational software platform, and was first announced over 4 years ago. Here we describe a series of major new developments in the BEAST 2 core platform and model hierarchy that have occurred since the first release of the software, culminating in the recent 2.5 release.


3,600 citations

Journal Article • DOI: 10.1007/s12532-018-0139-4

CasADi: a software framework for nonlinear optimization and optimal control


Joel Andersson¹, Joris Gillis², Greg Horn, James B. Rawlings¹, Moritz Diehl³

University of Wisconsin-Madison¹, Katholieke Universiteit Leuven², University of Freiburg³

20 Mar 2019 • Mathematical Programming Computation

TL;DR: This article gives an up-to-date and accessible introduction to the CasADi framework, which has undergone numerous design improvements over the last 7 years.


Abstract: We present CasADi, an open-source software framework for numerical optimization. CasADi is a general-purpose tool that can be used to model and solve optimization problems with a large degree of flexibility, larger than what is associated with popular algebraic modeling languages such as AMPL, GAMS, JuMP or Pyomo. Of special interest are problems constrained by differential equations, i.e. optimal control problems. CasADi is written in self-contained C++, but is most conveniently used via full-featured interfaces to Python, MATLAB or Octave. Since its inception in late 2009, it has been used successfully for academic teaching as well as in applications from multiple fields, including process control, robotics and aerospace. This article gives an up-to-date and accessible introduction to the CasADi framework, which has undergone numerous design improvements over the last 7 years.


3,546 citations


Year	Papers
2025	2
2024	6
2023	3
2022	20
2021	194
2020	243
