About: Software framework is a research topic. Over its lifetime, 13,644 publications have been published on this topic, receiving 289,255 citations. The topic is also known as: framework & software platform.
TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS (the 1000 Genomes pilot alone includes nearly five terabases) make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
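The separation this abstract describes, between a shared traversal layer and a tool that only supplies map and reduce steps, can be illustrated with a minimal Python sketch of a coverage calculator. This is not the GATK API: the `Read` type, `pileup_depths`, `map_locus`, `reduce_loci`, and `mean_coverage` are hypothetical names invented for illustration, and the naive per-locus scan stands in for the framework's far more efficient data access.

```python
from functools import reduce
from typing import Iterable, Iterator, NamedTuple, Tuple


class Read(NamedTuple):
    """Hypothetical aligned read covering the half-open interval [start, end)."""
    start: int
    end: int


def pileup_depths(reads: Iterable[Read], region_start: int, region_end: int) -> Iterator[int]:
    """Stand-in for the framework's traversal: yield the read depth at each locus."""
    reads = list(reads)
    for pos in range(region_start, region_end):
        yield sum(1 for r in reads if r.start <= pos < r.end)


def map_locus(depth: int) -> Tuple[int, int]:
    """Map step: turn one locus into a (depth, locus_count) pair."""
    return (depth, 1)


def reduce_loci(acc: Tuple[int, int], item: Tuple[int, int]) -> Tuple[int, int]:
    """Reduce step: sum depths and locus counts; associative, so shards could be merged."""
    return (acc[0] + item[0], acc[1] + item[1])


def mean_coverage(reads: Iterable[Read], region_start: int, region_end: int) -> float:
    """The 'tool': it only wires map and reduce together over the traversal."""
    depths = pileup_depths(reads, region_start, region_end)
    total, n_loci = reduce(reduce_loci, map(map_locus, depths), (0, 0))
    return total / n_loci if n_loci else 0.0
```

Because `reduce_loci` is associative, partial results computed over separate regions could be combined afterwards, which is the property that makes this style of tool amenable to parallelization.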
TL;DR: Anyone responsible for developing software strategy, evaluating new technologies, buying or building software will find Clemens Szyperski's objective and market-aware perspective of this new area invaluable.
Abstract: From the publisher: Component Software: Beyond Object-Oriented Programming explains the technical foundations of this evolving technology and its importance in the software marketplace. It provides in-depth discussion of both the technical and the business issues to be considered, then moves on to suggest approaches for implementing component-oriented software production and the organizational requirements for success. The author draws on his own experience to offer tried-and-tested solutions to common problems and novel approaches to potential pitfalls. Anyone responsible for developing software strategy, evaluating new technologies, buying or building software will find Clemens Szyperski's objective and market-aware perspective of this new area invaluable.
TL;DR: This work describes a Natural Language Processing software framework which is based on the idea of document streaming, i.e. processing corpora document after document, in a memory independent fashion, and implements several popular algorithms for topical inference, including Latent Semantic Analysis and Latent Dirichlet Allocation in a way that makes them completely independent of the training corpus size.
Abstract: Large corpora are ubiquitous in today's world and memory quickly becomes the limiting factor in practical applications of the Vector Space Model (VSM). We identify a gap in existing VSM implementations, namely their scalability and ease of use. We describe a Natural Language Processing software framework which is based on the idea of document streaming, i.e. processing corpora document after document, in a memory independent fashion. In this framework, we implement several popular algorithms for topical inference, including Latent Semantic Analysis and Latent Dirichlet Allocation, in a way that makes them completely independent of the training corpus size. Particular emphasis is placed on straightforward and intuitive framework design, so that modifications and extensions of the methods and/or their application by interested practitioners are effortless. We demonstrate the usefulness of our approach on a real-world scenario of computing document similarities within an existing digital library, DML-CZ.
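The document-streaming idea in this abstract can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the framework's actual interface: `StreamedCorpus` and `document_frequencies` are hypothetical names, tokenization is plain whitespace splitting, and each input line is assumed to be one document.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

BagOfWords = List[Tuple[int, int]]  # sorted (term_id, count) pairs


class StreamedCorpus:
    """Hypothetical re-iterable corpus that yields one sparse document at a time."""

    def __init__(self, lines: Iterable[str]) -> None:
        # `lines` may be a list or any re-iterable source such as a file wrapper.
        self.lines = lines
        self.vocab: Dict[str, int] = {}

    def __iter__(self) -> Iterator[BagOfWords]:
        for line in self.lines:
            counts = Counter(line.lower().split())
            # Assign ids to unseen tokens on the fly; only one document is held at once.
            yield sorted(
                (self.vocab.setdefault(token, len(self.vocab)), n)
                for token, n in counts.items()
            )


def document_frequencies(corpus: Iterable[BagOfWords]) -> Dict[int, int]:
    """One pass over the stream: count in how many documents each term occurs."""
    df: Counter = Counter()
    for bow in corpus:
        df.update(term_id for term_id, _ in bow)
    return dict(df)
```

In this sketch, memory grows with the vocabulary rather than with the number of documents, since no more than one document vector is materialized at any point in the pass.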
TL;DR: A series of major new developments in the BEAST 2 core platform and model hierarchy that have occurred since the first release of the software, culminating in the recent 2.5 release are described.
Abstract: Elaboration of Bayesian phylogenetic inference methods has continued at pace in recent years with major new advances in nearly all aspects of the joint modelling of evolutionary data. It is increasingly appreciated that some evolutionary questions can only be adequately answered by combining evidence from multiple independent sources of data, including genome sequences, sampling dates, phenotypic data, radiocarbon dates, fossil occurrences, and biogeographic range information among others. Including all relevant data into a single joint model is very challenging both conceptually and computationally. Advanced computational software packages that allow robust development of compatible (sub-)models which can be composed into a full model hierarchy have played a key role in these developments. Developing such software frameworks is increasingly a major scientific activity in its own right, and comes with specific challenges, from practical software design, development and engineering challenges to statistical and conceptual modelling challenges. BEAST 2 is one such computational software platform, and was first announced over 4 years ago. Here we describe a series of major new developments in the BEAST 2 core platform and model hierarchy that have occurred since the first release of the software, culminating in the recent 2.5 release.
TL;DR: This article gives an up-to-date and accessible introduction to the CasADi framework, which has undergone numerous design improvements over the last 7 years.
Abstract: We present CasADi, an open-source software framework for numerical optimization. CasADi is a general-purpose tool that can be used to model and solve optimization problems with a large degree of flexibility, larger than what is associated with popular algebraic modeling languages such as AMPL, GAMS, JuMP or Pyomo. Of special interest are problems constrained by differential equations, i.e. optimal control problems. CasADi is written in self-contained C++, but is most conveniently used via full-featured interfaces to Python, MATLAB or Octave. Since its inception in late 2009, it has been used successfully for academic teaching as well as in applications from multiple fields, including process control, robotics and aerospace. This article gives an up-to-date and accessible introduction to the CasADi framework, which has undergone numerous design improvements over the last 7 years.