Open Access
Large-Scale Parallel Statistical Forecasting Computations in R
Murray Stokely,Farzan Rohani,Eric Tassone +2 more
- 01 Jan 2011
14
TL;DR: This work applies MapReduce-style parallel computation in R to a forecasting application that fits a variety of models, which prohibits an analytical description of the statistical uncertainty of the overall forecast. It instead generates simulation-based uncertainty bands, which require a large number of computationally intensive realizations.
Abstract: We demonstrate the utility of massively parallel computational infrastructure for statistical computing using the MapReduce paradigm for R. This framework allows users to write computations in a high-level language that are then broken up and distributed to worker tasks in Google datacenters. Results are collected in a scalable, distributed data store and returned to the interactive user session. We apply our approach to a forecasting application that fits a variety of models, prohibiting an analytical description of the statistical uncertainty associated with the overall forecast. To overcome this, we generate simulation-based uncertainty bands, which necessitates a large number of computationally intensive realizations. Our technique cut total run time by a factor of 300. Distributing the computation across many machines permits analysts to focus on statistical issues while answering questions that would be intractable without significant parallel computational infrastructure. We present real-world performance characteristics from our application to allow practitioners to better understand the nature of massively parallel statistical simulations in R.
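The abstract's idea of simulation-based uncertainty bands computed in a map/reduce style can be sketched roughly as follows. This is only an illustrative sketch in Python, not the authors' R framework: the function names (`simulate_path`, `uncertainty_bands`, `forecast_bands`), the naive increment-resampling forecast model, and the local thread pool standing in for distributed worker tasks are all assumptions made for the example.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def simulate_path(seed, history, horizon):
    """Map step (hypothetical): one simulated realization of the future.

    Assumes a naive model that resamples historical one-step increments.
    """
    rng = random.Random(seed)  # per-task seed keeps realizations reproducible
    diffs = [b - a for a, b in zip(history, history[1:])]
    level = history[-1]
    path = []
    for _ in range(horizon):
        level += rng.choice(diffs)
        path.append(level)
    return path


def uncertainty_bands(paths, lower=0.05, upper=0.95):
    """Reduce step (hypothetical): empirical quantiles at each horizon."""
    bands = []
    for values in zip(*paths):  # all realizations at one forecast step
        ordered = sorted(values)
        lo = ordered[int(lower * (len(ordered) - 1))]
        hi = ordered[int(upper * (len(ordered) - 1))]
        bands.append((lo, statistics.median(ordered), hi))
    return bands


def forecast_bands(history, horizon, n_realizations=1000, workers=4):
    """Run the realizations in parallel, then collect them into bands.

    The thread pool is a local stand-in for remote worker tasks.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        paths = list(pool.map(
            lambda seed: simulate_path(seed, history, horizon),
            range(n_realizations),
        ))
    return uncertainty_bands(paths)
```

Each realization is independent of the others, which is why this kind of workload splits cleanly across many workers: only the seed and the shared history go out, and only the simulated path comes back.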
Citations
Supporting Very Large Models using Automatic Dataflow Graph Partitioning
Minjie Wang,Chien-Chin Huang,Jinyang Li +2 more
- 25 Mar 2019
TL;DR: Tofu uses a recursive search algorithm that minimizes the total communication cost to partition a dataflow graph of fine-grained tensor operators, so that it works transparently with general-purpose deep learning platforms like MXNet and TensorFlow.
71
Workload-Driven VM Consolidation in Cloud Data Centers
Hao Lin,Xin Qi,Shuo Yang,Samuel P. Midkiff +3 more
- 25 May 2015
TL;DR: This paper uses a multi-capacity bin packing technique that efficiently places VMs onto physical servers while guaranteeing, in a statistical sense, that the VM scheduling meets the Service Level Objectives (SLOs) with the desired success probability.
33
Rolling window time series prediction using MapReduce
Lei Li,Farzad Noorian,Duncan J. M. Moss,Philip H. W. Leong +3 more
- 30 Aug 2014
TL;DR: A novel framework is presented to facilitate retrieval and rolling-window prediction of irregularly sampled large-scale time series data. By introducing a new index pool data structure, processing of time series can be efficiently parallelised.
•Posted Content
Unifying Data, Model and Hybrid Parallelism in Deep Learning via Tensor Tiling.
TL;DR: This work proposes an algorithm that can find the best tiling to partition tensors with the least overall communication and builds the SoyBean system, which automatically transforms a serial dataflow graph captured by an existing deep learning system frontend into a parallel dataflow graph based on the optimal tiling it has found.
27
References
•Book
An introduction to the bootstrap
Bradley Efron,Robert Tibshirani +1 more
- 01 Jan 1993
TL;DR: This article presents bootstrap methods for estimation, using simple arguments, with Minitab macros for implementing these methods, as well as some examples of how these methods could be used for estimation purposes.
•Book
Neural Networks: A Comprehensive Foundation
Simon Haykin
- 16 Jul 1998
TL;DR: Thorough, well-organized, and completely up to date, this book examines all the important aspects of this emerging technology, including the learning process, back-propagation learning, radial-basis function networks, self-organizing systems, modular networks, temporal processing and neurodynamics, and VLSI implementation of neural networks.
MapReduce: simplified data processing on large clusters
Jeffrey Dean,Sanjay Ghemawat +1 more
- 06 Dec 2004
TL;DR: This paper presents MapReduce, a programming model and an associated implementation for processing and generating large data sets, which runs on a large cluster of commodity machines and is highly scalable.
MapReduce: simplified data processing on large clusters
Jeffrey Dean,Sanjay Ghemawat +1 more
TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
An Introduction to the Bootstrap.
Bradley Efron,Robert Tibshirani +1 more
TL;DR: In this article, the authors present a geometric representation for the Bootstrap and the Jackknife, as well as an overview of nonparametric and Parametric Inference methods for estimating the error in Bootstrap estimates.
15.3K