TL;DR: Key features of the ggplot2 package are summarized with examples from pharmacometrics and pointers to available resources for learning ggplot2.
Abstract: Visualization is a powerful mechanism for extracting information from data. ggplot2 is a contributed visualization package in the R programming language, which creates publication-quality statistical graphics in an efficient, elegant, and systematic manner. This article summarizes key features of the package with examples from pharmacometrics and pointers to available resources for learning ggplot2. CPT: Pharmacometrics & Systems Pharmacology (2013) 2, e79; doi:10.1038/psp.2013.56; advance online publication 16 October 2013.
TL;DR: The BlueSNP R package implements GWAS statistical tests in the R programming language and executes the calculations across computer clusters configured with Apache Hadoop, a de facto standard framework for distributed data processing using the MapReduce formalism.
Abstract: Summary: Computational workloads for genome-wide association studies (GWAS) are growing in scale and complexity, outpacing the capabilities of single-threaded software designed for personal computers. The BlueSNP R package implements GWAS statistical tests in the R programming language and executes the calculations across computer clusters configured with Apache Hadoop, a de facto standard framework for distributed data processing using the MapReduce formalism. BlueSNP makes computationally intensive analyses, such as estimating empirical p-values via data permutation and searching for expression quantitative trait loci over thousands of genes, feasible for large genotype–phenotype datasets. Availability and implementation: http://github.com/ibmbioinformatics/bluesnp
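To make the idea of an empirical p-value via data permutation concrete, the following is a minimal single-machine sketch in Python. It is not BlueSNP's implementation or API: the function name, the carrier/non-carrier test statistic, and the toy data are all assumptions chosen for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def empirical_p_value(genotypes, phenotypes, n_permutations=2000, seed=0):
    """Estimate a permutation p-value for one marker (illustrative only).

    Assumed test statistic: absolute difference in mean phenotype between
    carriers (genotype > 0) and non-carriers (genotype == 0).
    """
    rng = random.Random(seed)

    def statistic(phen):
        carriers = [p for g, p in zip(genotypes, phen) if g > 0]
        others = [p for g, p in zip(genotypes, phen) if g == 0]
        return abs(mean(carriers) - mean(others))

    observed = statistic(phenotypes)
    shuffled = list(phenotypes)
    exceed = 0
    for _ in range(n_permutations):
        # Shuffling phenotypes breaks any genotype-phenotype association.
        rng.shuffle(shuffled)
        if statistic(shuffled) >= observed:
            exceed += 1
    # Add-one correction so the estimate is never exactly zero.
    return (exceed + 1) / (n_permutations + 1)


# Hypothetical toy data: 10 non-carriers followed by 10 carriers.
genotypes = [0] * 10 + [1] * 10
phenotypes = [1.0 + 0.1 * i for i in range(10)] + [5.0 + 0.1 * i for i in range(10)]
p_value = empirical_p_value(genotypes, phenotypes)
```

Each permutation is independent of the others, which is why this kind of workload spreads naturally across a cluster: the loop body can run on many workers and only the exceedance counts need to be combined.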
TL;DR: Inlinedocs, an R package for generating documentation from comments, is presented, along with a new syntax for inline documentation of R code within comments adjacent to the relevant code, which allows for highly readable and maintainable code and documentation.
Abstract: This article presents inlinedocs, an R package for generating documentation from comments. The concept of structured, interwoven code and documentation has existed for many years, but existing systems that implement this for the R programming language do not tightly integrate with R code, leading to several drawbacks. This article attempts to address these issues and presents two contributions to documentation generation for the R community. First, we propose a new syntax for inline documentation of R code within comments adjacent to the relevant code, which allows for highly readable and maintainable code and documentation. Second, we propose an extensible system for parsing these comments, which allows the syntax to be easily augmented.
TL;DR: The basic concepts of Hadoop and MapReduce necessary for data analysts who are familiar with statistical programming are reviewed, through examples that combine the R programming language and Hadoop.
Abstract: As the need for large-scale data analysis rapidly increases, Hadoop, a platform for large-scale data processing, and MapReduce, the internal computational model of Hadoop, are receiving great attention. This paper reviews the basic concepts of Hadoop and MapReduce necessary for data analysts who are familiar with statistical programming, through examples that combine the R programming language and Hadoop.
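The MapReduce computational model mentioned above can be sketched with a word count, the customary introductory example. The sketch below is in Python rather than the R used in the paper, runs in a single process, and does not call any Hadoop API; all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (key, value) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def run_mapreduce(records):
    pairs = [pair for record in records for pair in map_phase(record)]
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


counts = run_mapreduce(["R and Hadoop", "Hadoop and MapReduce"])
```

In a real cluster the map calls run in parallel on separate blocks of the input and the framework performs the shuffle across machines. The analyst supplies only the map and reduce functions.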
TL;DR: Several packages are developed that provide a tight coupling of R with highly scalable libraries, enabling scalability to terabytes of data on tens of thousands of cores.
Abstract: The R programming language is known for its diversity and sophistication in data analysis; however, its scalability to big data has been lacking. Our project "Programming with Big Data in R" (pbdR) is adding scalability to its list of data virtues. We have developed several packages that provide a tight coupling of R with highly scalable libraries, enabling scalability to terabytes of data on tens of thousands of cores. We also added classes and methods to handle distributed data objects needed by the libraries so that the R language syntax is largely unchanged. Our philosophy is that the R developer is not asked to deal with the details of managing the data distribution and processor communication, but is asked to be aware of the data distribution and is provided with high-level functions to manage it if needed. Many R functions are already instrumented to handle the distributed data classes. We encourage developers of compute-intensive R packages to use pbdR methods for scalability to bigger data and to bigger computing platforms.
TL;DR: R is presented as an open source alternative to the existing commercial GIS software and proves especially useful when advanced quantitative methods on spatial data are needed (e.g. spatial modelling).
Abstract: R is a powerful and increasingly popular programming language with strong graphical and presentation features and great expandability. Although primarily intended for statistical computing, R has made its way into the field of GIS through the development of specialized extension packages. It offers a wide range of functions at all GIS levels: data acquisition, data manipulation, graphical representation and quantitative analysis. The paper presents R as an open source alternative to the existing commercial GIS software. It proves especially useful when advanced quantitative methods on spatial data are needed (e.g. spatial modelling). We demonstrate R's capabilities through a spatial analysis of the forest area in Snežnik (South Slovenia), covering data import, conversion and export into various GIS formats, as well as geostatistics, spatial modelling and spatial visualization.
TL;DR: This article proposes an algorithm to solve the quadratic programming problem of minimizing a quadratic objective with positive definite matrix Q, where the parameter vector is constrained to be in a closed polyhedral convex cone, and the m × n constraint matrix is not necessarily full row rank.
Abstract: Problems involving estimation and inference under linear inequality constraints arise often in statistical modeling. In this article, we propose an algorithm to solve the quadratic programming problem of minimizing a quadratic objective with positive definite matrix Q, where the parameter vector is constrained to be in a closed polyhedral convex cone, and the m × n constraint matrix is not necessarily full row rank. The three-step algorithm is intuitive and easy to code. Code is provided in the R programming language.
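As a small illustration of this problem class, the sketch below minimizes an objective of the assumed form theta' Q theta - 2 c' theta over the non-negative orthant, which is one simple closed polyhedral convex cone. It uses plain projected gradient descent, a standard substitute technique and not the three-step algorithm proposed in the article. The function name, objective form, and example values are assumptions.

```python
def projected_gradient_qp(Q, c, step, n_iter=200):
    """Minimize theta' Q theta - 2 c' theta subject to theta >= 0.

    Q is a symmetric positive definite matrix given as a list of rows.
    The step should be at most 1 / (2 * largest eigenvalue of Q).
    """
    n = len(c)
    theta = [0.0] * n
    for _ in range(n_iter):
        # Gradient of the objective: 2 Q theta - 2 c.
        grad = [
            2.0 * sum(Q[i][j] * theta[j] for j in range(n)) - 2.0 * c[i]
            for i in range(n)
        ]
        # Gradient step, then projection onto the cone {theta >= 0}.
        theta = [max(0.0, theta[i] - step * grad[i]) for i in range(n)]
    return theta


# Hypothetical example: the largest eigenvalue of Q is 3, so step = 1/6.
Q = [[2.0, 1.0], [1.0, 2.0]]
c = [1.0, -1.0]
solution = projected_gradient_qp(Q, c, step=1.0 / 6.0)
```

For these values the unconstrained minimizer is (1, -1), which violates the constraint, so the iteration settles on the boundary at approximately (0.5, 0).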
TL;DR: R is an open source, open development computing environment and language for statistical computing and graphics that is popular in biostatistics, bioinformatics, financial market analysis, social network analysis and geospatial modeling.
Abstract: R is an open source, open development computing environment and language for statistical computing and graphics [1]. R is popular in biostatistics, bioinformatics, financial market analysis, social network analysis and geospatial modeling. As a programming language, R is expressive and compact, with a large collection of powerful functions, tools and operators for data representation, analysis and display. On-line tutorials are available for learning both basic and advanced R programming [2].