Scispace (Formerly Typeset)
Showing papers on "Bayesian inference" published in 2017
Journal Article•10.18637/JSS.V080.I01•
brms: An R Package for Bayesian Multilevel Models Using Stan

Paul-Christian Bürkner
29 Aug 2017-Journal of Statistical Software
TL;DR: The brms package implements Bayesian multilevel models in R using the probabilistic programming language Stan, allowing users to fit linear, robust linear, binomial, Poisson, survival, ordinal, zero-inflated, hurdle, and even non-linear models, all in a multilevel context.
Abstract: The brms package implements Bayesian multilevel models in R using the probabilistic programming language Stan. A wide range of distributions and link functions are supported, allowing users to fit - among others - linear, robust linear, binomial, Poisson, survival, ordinal, zero-inflated, hurdle, and even non-linear models all in a multilevel context. Further modeling options include autocorrelation of the response variable, user defined covariance structures, censored data, as well as meta-analytic standard errors. Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their beliefs. In addition, model fit can easily be assessed and compared with the Watanabe-Akaike information criterion and leave-one-out cross-validation.

7,445 citations

Journal Article•10.18637/JSS.V076.I01•
Stan: A Probabilistic Programming Language

Bob Carpenter, Andrew Gelman, Matthew D. Hoffman, Daniel D. Lee, Ben Goodrich, Michael Betancourt, Marcus A. Brubaker, Jiqiang Guo, Peter Li, Allen Riddell 
11 Jan 2017-Journal of Statistical Software
TL;DR: Stan is a probabilistic programming language for specifying statistical models, in which a program imperatively defines a log probability function over parameters conditioned on specified data and constants. It is also a platform for computing log densities and their gradients and Hessians, which can be used in alternative algorithms such as variational Bayes, expectation propagation, and marginal inference using approximate integration.
Abstract: Stan is a probabilistic programming language for specifying statistical models. A Stan program imperatively defines a log probability function over parameters conditioned on specified data and constants. As of version 2.14.0, Stan provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized maximum likelihood estimates are calculated using optimization methods such as the limited memory Broyden-Fletcher-Goldfarb-Shanno algorithm. Stan is also a platform for computing log densities and their gradients and Hessians, which can be used in alternative algorithms such as variational Bayes, expectation propagation, and marginal inference using approximate integration. To this end, Stan is set up so that the densities, gradients, and Hessians, along with intermediate quantities of the algorithm such as acceptance probabilities, are easily accessible. Stan can be called from the command line using the cmdstan package, through R using the rstan package, and through Python using the pystan package. All three interfaces support sampling and optimization-based inference with diagnostics and posterior analysis. rstan and pystan also provide access to log probabilities, gradients, Hessians, parameter transforms, and specialized plotting.

7,309 citations

Journal Article•10.1080/01621459.2017.1285773•
Variational Inference: A Review for Statisticians

David M. Blei1, Alp Kucukelbir1, Jon McAuliffe2•
Columbia University1, University of California, Berkeley2
27 Feb 2017-Journal of the American Statistical Association
TL;DR: This article reviews variational inference (VI), a method from machine learning that approximates probability densities through optimization; VI has been used in many applications and tends to be faster than classical methods such as Markov chain Monte Carlo sampling.
Abstract: One of the core problems of modern statistics is to approximate difficult-to-compute probability densities. This problem is especially important in Bayesian statistics, which frames all inference about unknown quantities as a calculation involving the posterior density. In this article, we review variational inference (VI), a method from machine learning that approximates probability densities through optimization. VI has been used in many applications and tends to be faster than classical methods, such as Markov chain Monte Carlo sampling. The idea behind VI is to first posit a family of densities and then to find a member of that family which is close to the target density. Closeness is measured by Kullback–Leibler divergence. We review the ideas behind mean-field variational inference, discuss the special case of VI applied to exponential family models, present a full example with a Bayesian mixture of Gaussians, and derive a variant that uses stochastic optimization to scale up to massive data...

4,463 citations
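
The link between the Kullback–Leibler objective described in the abstract above and the quantity usually optimized in practice can be written as a short identity. The notation is an assumption made for illustration and does not come from the listing: $x$ denotes the data, $z$ the latent variables, and $q$ a member of the posited family of densities.

```latex
\begin{align}
\mathrm{KL}\big(q(z)\,\|\,p(z \mid x)\big)
  &= \mathbb{E}_{q}[\log q(z)] - \mathbb{E}_{q}[\log p(z \mid x)] \\
  &= \mathbb{E}_{q}[\log q(z)] - \mathbb{E}_{q}[\log p(z, x)] + \log p(x), \\
\mathrm{ELBO}(q)
  &= \mathbb{E}_{q}[\log p(z, x)] - \mathbb{E}_{q}[\log q(z)], \\
\log p(x)
  &= \mathrm{ELBO}(q) + \mathrm{KL}\big(q(z)\,\|\,p(z \mid x)\big).
\end{align}
```

Since $\log p(x)$ does not depend on $q$, maximizing the ELBO over the family is equivalent to minimizing the KL divergence to the posterior, and the ELBO only requires the joint density $p(z, x)$ rather than the intractable posterior.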

Stan: A Probabilistic Programming Language.

Bob Carpenter, Andrew Gelman, Matthew D. Hoffman, Daniel D. Lee, Ben Goodrich, Michael Betancourt, Marcus A. Brubaker, Jiqiang Guo, Peter Li, Allen Riddell 
1 Jan 2017
TL;DR: Stan is a probabilistic programming language for specifying statistical models that provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling.
Abstract: Stan is a probabilistic programming language for specifying statistical models. A Stan program imperatively defines a log probability function over parameters conditioned on specified data and constants. As of version 2.14.0, Stan provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized maximum likelihood estimates are calculated using optimization methods such as the limited memory Broyden-Fletcher-Goldfarb-Shanno algorithm. Stan is also a platform for computing log densities and their gradients and Hessians, which can be used in alternative algorithms such as variational Bayes, expectation propagation, and marginal inference using approximate integration. To this end, Stan is set up so that the densities, gradients, and Hessians, along with intermediate quantities of the algorithm such as acceptance probabilities, are easily accessible. Stan can be called from the command line using the cmdstan package, through R using the rstan package, and through Python using the pystan package. All three interfaces support sampling and optimization-based inference with diagnostics and posterior analysis. rstan and pystan also provide access to log probabilities, gradients, Hessians, parameter transforms, and specialized plotting.

3,490 citations

Journal Article•10.1007/S11222-016-9696-4•
Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC

Aki Vehtari1, Andrew Gelman2, Jonah Gabry2•
Helsinki Institute for Information Technology1, Columbia University2
01 Sep 2017-Statistics and Computing
TL;DR: In this paper, leave-one-out cross-validation (LOO) and the widely applicable information criterion (WAIC) are used to estimate pointwise out-of-sample prediction accuracy from a fitted Bayesian model using the log-likelihood evaluated at the posterior simulations of the parameter values.
Abstract: Leave-one-out cross-validation (LOO) and the widely applicable information criterion (WAIC) are methods for estimating pointwise out-of-sample prediction accuracy from a fitted Bayesian model using the log-likelihood evaluated at the posterior simulations of the parameter values. LOO and WAIC have various advantages over simpler estimates of predictive error such as AIC and DIC but are less used in practice because they involve additional computational steps. Here we lay out fast and stable computations for LOO and WAIC that can be performed using existing simulation draws. We introduce an efficient computation of LOO using Pareto-smoothed importance sampling (PSIS), a new procedure for regularizing importance weights. Although WAIC is asymptotically equal to LOO, we demonstrate that PSIS-LOO is more robust in the finite case with weak priors or influential observations. As a byproduct of our calculations, we also obtain approximate standard errors for estimated predictive errors and for comparison of predictive errors between two models. We implement the computations in an R package called loo and demonstrate using models fit with the Bayesian inference package Stan.

1,533 citations
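
As a rough illustration of how WAIC can be computed "using existing simulation draws", the sketch below takes a matrix of pointwise log-likelihood values (one row per posterior draw, one column per observation) and returns the usual summaries. The function names `log_mean_exp` and `waic` and the dictionary keys are hypothetical choices made here; this is not the interface of the loo package, and it covers plain WAIC only, not the Pareto-smoothed importance sampling that the paper introduces for LOO.

```python
import numpy as np

def log_mean_exp(a, axis=0):
    """Numerically stable log(mean(exp(a))) along one axis."""
    a_max = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(a_max, axis=axis) + np.log(np.mean(np.exp(a - a_max), axis=axis))

def waic(log_lik):
    """WAIC summaries from an (S draws x N observations) log-likelihood matrix."""
    log_lik = np.asarray(log_lik, dtype=float)
    # Pointwise log predictive density: log of the posterior-mean likelihood.
    lppd_i = log_mean_exp(log_lik, axis=0)
    # Effective number of parameters: posterior variance of the log-likelihood.
    p_waic_i = np.var(log_lik, axis=0, ddof=1)
    elpd_i = lppd_i - p_waic_i
    n = log_lik.shape[1]
    return {
        "elpd_waic": float(elpd_i.sum()),
        "p_waic": float(p_waic_i.sum()),
        "waic": float(-2.0 * elpd_i.sum()),
        # Approximate standard error from the spread of pointwise terms (needs N >= 2).
        "se_elpd": float(np.sqrt(n * np.var(elpd_i, ddof=1))),
    }
```

Working with pointwise terms is what makes the approximate standard error available at no extra cost, which matches the "byproduct" mentioned in the abstract.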

Journal Article•10.1162/NECO_A_00912•
Active inference: A process theory

Karl J. Friston1, Thomas H. B. FitzGerald1, Francesco Rigoli1, Philipp Schwartenbeck1, Giovanni Pezzulo2 •
Wellcome Trust Centre for Neuroimaging1, National Research Council2
01 Jan 2017-Neural Computation
TL;DR: The fact that a gradient descent appears to be a valid description of neuronal activity means that variational free energy is a Lyapunov function for neuronal dynamics, which therefore conform to Hamilton’s principle of least action.
Abstract: This article describes a process theory based on active inference and belief propagation. Starting from the premise that all neuronal processing and action selection can be explained by maximizing Bayesian model evidence (or minimizing variational free energy), we ask whether neuronal responses can be described as a gradient descent on variational free energy. Using a standard Markov decision process generative model, we derive the neuronal dynamics implicit in this description and reproduce a remarkable range of well-characterized neuronal phenomena. These include repetition suppression, mismatch negativity, violation responses, place-cell activity, phase precession, theta sequences, theta-gamma coupling, evidence accumulation, race-to-bound dynamics, and transfer of dopamine responses. Furthermore, the approximately Bayes' optimal behavior prescribed by these dynamics has a degree of face validity, providing a formal explanation for reward seeking, context learning, and epistemic foraging. Technically, the fact that a gradient descent appears to be a valid description of neuronal activity means that variational free energy is a Lyapunov function for neuronal dynamics, which therefore conform to Hamilton's principle of least action.

1,016 citations

Journal Article•10.1146/ANNUREV-STATISTICS-060116-054045•
Bayesian Computing with INLA: A Review

Håvard Rue1, Andrea Riebler1, Sigrunn Holbek Sørbye2, Janine B. Illian3, Daniel Simpson4, Finn Lindgren5 •
Norwegian University of Science and Technology1, University of Tromsø2, University of St Andrews3, University of Bath4, University of Edinburgh5
10 Mar 2017-Social Science Research Network
TL;DR: This review discusses integrated nested Laplace approximations (INLA), a nested version of the classical Laplace approximation combined with modern numerical techniques for sparse matrices, which performs approximate Bayesian inference for latent Gaussian models.
Abstract: The key operation in Bayesian inference is to compute high-dimensional integrals. An old approximate technique is the Laplace method or approximation, which dates back to Pierre-Simon Laplace (1774). This simple idea approximates the integrand with a second-order Taylor expansion around the mode and computes the integral analytically. By developing a nested version of this classical idea, combined with modern numerical techniques for sparse matrices, we obtain the approach of integrated nested Laplace approximations (INLA) to do approximate Bayesian inference for latent Gaussian models (LGMs). LGMs represent an important model abstraction for Bayesian inference and include a large proportion of the statistical models used today. In this review, we discuss the reasons for the success of the INLA approach, the R-INLA package, why it is so accurate, why the approximations are very quick to compute, and why LGMs make such a useful concept for Bayesian computing.

678 citations
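
The classical Laplace method mentioned in the abstract above (a second-order Taylor expansion around the mode, followed by an analytic integral) can be sketched in one dimension as follows. This is only the basic building block, not the nested INLA scheme or the R-INLA package; the helper name `laplace_integral`, the finite-difference step `h`, and the test integrand $x^{a} e^{-x}$ are assumptions chosen for illustration.

```python
import math

def laplace_integral(log_f, mode, h=1e-3):
    """Approximate the integral of exp(log_f(x)) by a Gaussian fitted at `mode`."""
    # Central finite difference for the second derivative of log_f at the mode.
    d2 = (log_f(mode + h) - 2.0 * log_f(mode) + log_f(mode - h)) / (h * h)
    if d2 >= 0:
        raise ValueError("log_f must be concave at the mode")
    # Taylor expansion: log_f(x) ~ log_f(mode) + 0.5 * d2 * (x - mode)^2.
    # The exponential of this quadratic integrates to a Gaussian normalizer.
    return math.exp(log_f(mode)) * math.sqrt(2.0 * math.pi / -d2)

a = 10.0
log_f = lambda x: a * math.log(x) - x    # log of x^a * exp(-x), mode at x = a
approx = laplace_integral(log_f, mode=a)
exact = math.gamma(a + 1.0)              # exact value of the integral over (0, inf)
rel_error = abs(approx - exact) / exact  # below one percent for a = 10
```

For this integrand the approximation reduces to Stirling's formula for the factorial, which is a convenient sanity check on the method.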

Journal Article•10.5555/3122009.3208015•
Stochastic Gradient Descent as Approximate Bayesian Inference

Stephan Mandt, Matthew D. Hoffman, David M. Blei
01 Jan 2017-Journal of Machine Learning Research
TL;DR: It is demonstrated that constant SGD gives rise to a new variational EM algorithm that optimizes hyperparameters in complex probabilistic models, and a scalable approximate MCMC algorithm, the Averaged Stochastic Gradient Sampler, is proposed.
Abstract: Stochastic Gradient Descent with a constant learning rate (constant SGD) simulates a Markov chain with a stationary distribution. With this perspective, we derive several new results. (1) We show t...

670 citations

Journal Article•10.5555/3122009.3122023•
Automatic differentiation variational inference

Alp Kucukelbir1, Dustin Tran1, Rajesh Ranganath2, Andrew Gelman1, David M. Blei1 •
Columbia University1, Princeton University2
01 Jan 2017-Journal of Machine Learning Research
TL;DR: This article develops automatic differentiation variational inference (ADVI): the scientist provides only a probabilistic model and a dataset, and ADVI automatically derives an efficient variational inference algorithm, with no conjugacy assumptions required.
Abstract: Probabilistic modeling is iterative. A scientist posits a simple model, fits it to her data, refines it according to her analysis, and repeats. However, fitting complex models to large data is a bottleneck in this process. Deriving algorithms for new models can be both mathematically and computationally challenging, which makes it difficult to efficiently cycle through the steps. To this end, we develop automatic differentiation variational inference (ADVI). Using our method, the scientist only provides a probabilistic model and a dataset, nothing else. ADVI automatically derives an efficient variational inference algorithm, freeing the scientist to refine and explore many models. ADVI supports a broad class of models--no conjugacy assumptions are required. We study ADVI across ten modern probabilistic models and apply it to a dataset with millions of observations. We deploy ADVI as part of Stan, a probabilistic programming system.

549 citations

Journal Article•10.3390/E19100555•
The Prior Can Often Only Be Understood in the Context of the Likelihood

Andrew Gelman1, Daniel Simpson2, Michael Betancourt1•
Columbia University1, University of Toronto2
19 Oct 2017-Entropy
TL;DR: The authors address a conceptual tension in prior modeling: a prior encoding true prior information should be chosen without reference to the model of the measurement process, yet almost all common prior modeling techniques are implicitly motivated by a reference likelihood. They resolve it by placing the choice of prior into the context of the entire Bayesian analysis, from inference to prediction to model evaluation.
Abstract: A key sticking point of Bayesian analysis is the choice of prior distribution, and there is a vast literature on potential defaults including uniform priors, Jeffreys’ priors, reference priors, maximum entropy priors, and weakly informative priors. These methods, however, often manifest a key conceptual tension in prior modeling: a model encoding true prior information should be chosen without reference to the model of the measurement process, but almost all common prior modeling techniques are implicitly motivated by a reference likelihood. In this paper we resolve this apparent paradox by placing the choice of prior into the context of the entire Bayesian analysis, from inference to prediction to model evaluation.

411 citations

Journal Article•10.1162/NECO_A_00999•
Active Inference, Curiosity and Insight

Karl J. Friston1, Marco Lin1, Chris D. Frith1, Giovanni Pezzulo2, J. Allan Hobson1, Sasha Ondobaka1 •
Wellcome Trust Centre for Neuroimaging1, National Research Council2
11 Sep 2017-Neural Computation
TL;DR: This article uses simulations of abstract rule learning and approximate Bayesian inference to show that minimizing (expected) variational free energy leads to active sampling of novel contingencies and closes explanatory gaps in generative models of the world, thereby reducing uncertainty and satisfying curiosity.
Abstract: This article offers a formal account of curiosity and insight in terms of active (Bayesian) inference. It deals with the dual problem of inferring states of the world and learning its statistical structure. In contrast to current trends in machine learning (e.g., deep learning), we focus on how people attain insight and understanding using just a handful of observations, which are solicited through curious behavior. We use simulations of abstract rule learning and approximate Bayesian inference to show that minimizing (expected) variational free energy leads to active sampling of novel contingencies. This epistemic behavior closes explanatory gaps in generative models of the world, thereby reducing uncertainty and satisfying curiosity. We then move from epistemic learning to model selection or structure learning to show how abductive processes emerge when agents test plausible hypotheses about symmetries (i.e., invariances or rules) in their generative models. The ensuing Bayesian model reduction evinces ...
Journal Article•10.1037/MET0000065•
Improving transparency and replication in Bayesian statistics: The WAMBS-Checklist.

Sarah Depaoli1, Rens van de Schoot2•
University of California, Merced1, Utrecht University2
01 Jun 2017-Psychological Methods
TL;DR: A succinct checklist, the WAMBS-checklist (When to worry and how to Avoid the Misuse of Bayesian Statistics), is developed to describe 10 main points that should be thoroughly checked when applying Bayesian analysis.
Abstract: Bayesian statistical methods are slowly creeping into all fields of science and are becoming ever more popular in applied research. Although it is very attractive to use Bayesian statistics, our personal experience has led us to believe that naively applying Bayesian methods can be dangerous for at least 3 main reasons: the potential influence of priors, misinterpretation of Bayesian features and results, and improper reporting of Bayesian results. To deal with these 3 points of potential danger, we have developed a succinct checklist: the WAMBS-checklist (When to worry and how to Avoid the Misuse of Bayesian Statistics). The purpose of the questionnaire is to describe 10 main points that should be thoroughly checked when applying Bayesian analysis. We provide an account of "when to worry" for each of these issues related to: (a) issues to check before estimating the model, (b) issues to check after estimating the model but before interpreting results, (c) understanding the influence of priors, and (d) actions to take after interpreting results. To accompany these key points of concern, we will present diagnostic tools that can be used in conjunction with the development and assessment of a Bayesian model. We also include examples of how to interpret results when "problems" in estimation arise, as well as syntax and instructions for implementation. Our aim is to stress the importance of openness and transparency of all aspects of Bayesian estimation, and it is our hope that the WAMBS questionnaire can aid in this process.
Posted Content•
Deep Neural Networks as Gaussian Processes

Jaehoon Lee1, Yasaman Bahri2, Roman Novak2, Samuel S. Schoenholz2, Jeffrey Pennington2, Jascha Sohl-Dickstein2 •
University of British Columbia1, Google2
01 Nov 2017-arXiv: Machine Learning
TL;DR: In this article, the authors derive the exact equivalence between infinitely wide deep networks and Gaussian Processes (GP) and develop a computationally efficient pipeline to compute the covariance function for these GPs.
Abstract: It has long been known that a single-layer fully-connected neural network with an i.i.d. prior over its parameters is equivalent to a Gaussian process (GP), in the limit of infinite network width. This correspondence enables exact Bayesian inference for infinite width neural networks on regression tasks by means of evaluating the corresponding GP. Recently, kernel functions which mimic multi-layer random neural networks have been developed, but only outside of a Bayesian framework. As such, previous work has not identified that these kernels can be used as covariance functions for GPs and allow fully Bayesian prediction with a deep neural network. In this work, we derive the exact equivalence between infinitely wide deep networks and GPs. We further develop a computationally efficient pipeline to compute the covariance function for these GPs. We then use the resulting GPs to perform Bayesian inference for wide deep neural networks on MNIST and CIFAR-10. We observe that trained neural network accuracy approaches that of the corresponding GP with increasing layer width, and that the GP uncertainty is strongly correlated with trained network prediction error. We further find that test performance increases as finite-width trained networks are made wider and more similar to a GP, and thus that GP predictions typically outperform those of finite-width networks. Finally we connect the performance of these GPs to the recent theory of signal propagation in random neural networks.
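
To make the idea of "computing the covariance function for these GPs" concrete, here is a minimal sketch of a layer-by-layer kernel recursion. It assumes ReLU activations, for which the expectation over the previous layer has a known closed form (the arc-cosine kernel); the abstract does not state a formula, so the activation, the function name `nngp_relu_kernel`, and the default weight and bias variances `sigma_w2` and `sigma_b2` are all assumptions made here, and this is not the paper's pipeline.

```python
import numpy as np

def nngp_relu_kernel(X, depth, sigma_w2=2.0, sigma_b2=0.0):
    """GP covariance matrix for an infinitely wide ReLU network (illustrative).

    X has shape (n_points, d_in) with no all-zero rows; `depth` is the number
    of hidden layers.
    """
    X = np.asarray(X, dtype=float)
    d_in = X.shape[1]
    # Covariance of the first-layer pre-activations.
    K = sigma_b2 + sigma_w2 * (X @ X.T) / d_in
    for _ in range(depth):
        diag = np.sqrt(np.diag(K))
        norm = np.outer(diag, diag)
        cos_theta = np.clip(K / norm, -1.0, 1.0)
        theta = np.arccos(cos_theta)
        # Closed form of E[relu(u) * relu(v)] for jointly Gaussian (u, v).
        expect = norm * (np.sin(theta) + (np.pi - theta) * cos_theta) / (2.0 * np.pi)
        K = sigma_b2 + sigma_w2 * expect
    return K
```

The resulting matrix can be used like any other GP covariance, for example in standard GP regression; with `sigma_w2=2.0` and `sigma_b2=0.0` the diagonal is preserved from layer to layer, which is a convenient sanity check.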
Journal Article•10.1007/S11222-016-9649-Y•
Comparison of Bayesian predictive methods for model selection

Juho Piironen1, Aki Vehtari1•
Helsinki Institute for Information Technology1
01 May 2017-Statistics and Computing
TL;DR: The study demonstrates that the model selection can greatly benefit from using cross-validation outside the searching process both for guiding the model size selection and assessing the predictive performance of the finally selected model.
Abstract: The goal of this paper is to compare several widely used Bayesian model selection methods in practical model selection problems, highlight their differences and give recommendations about the preferred approaches. We focus on the variable subset selection for regression and classification and perform several numerical experiments using both simulated and real world data. The results show that the optimization of a utility estimate such as the cross-validation (CV) score is liable to finding overfitted models due to relatively high variance in the utility estimates when the data is scarce. This can also lead to substantial selection induced bias and optimism in the performance evaluation for the selected model. From a predictive viewpoint, best results are obtained by accounting for model uncertainty by forming the full encompassing model, such as the Bayesian model averaging solution over the candidate models. If the encompassing model is too complex, it can be robustly simplified by the projection method, in which the information of the full model is projected onto the submodels. This approach is substantially less prone to overfitting than selection based on CV-score. Overall, the projection method appears to outperform also the maximum a posteriori model and the selection of the most probable variables. The study also demonstrates that the model selection can greatly benefit from using cross-validation outside the searching process both for guiding the model size selection and assessing the predictive performance of the finally selected model.
Posted Content•
Stochastic Gradient Descent as Approximate Bayesian Inference

Stephan Mandt, Matthew D. Hoffman, David M. Blei
13 Apr 2017-arXiv: Machine Learning
TL;DR: In this paper, stochastic gradient descent with a constant learning rate (constant SGD) is used as an approximate Bayesian posterior inference algorithm, with its tuning parameters adjusted to best match the stationary distribution to a posterior by minimizing the Kullback-Leibler divergence.
Abstract: Stochastic Gradient Descent with a constant learning rate (constant SGD) simulates a Markov chain with a stationary distribution. With this perspective, we derive several new results. (1) We show that constant SGD can be used as an approximate Bayesian posterior inference algorithm. Specifically, we show how to adjust the tuning parameters of constant SGD to best match the stationary distribution to a posterior, minimizing the Kullback-Leibler divergence between these two distributions. (2) We demonstrate that constant SGD gives rise to a new variational EM algorithm that optimizes hyperparameters in complex probabilistic models. (3) We also propose SGD with momentum for sampling and show how to adjust the damping coefficient accordingly. (4) We analyze MCMC algorithms. For Langevin Dynamics and Stochastic Gradient Fisher Scoring, we quantify the approximation errors due to finite learning rates. Finally (5), we use the stochastic process perspective to give a short proof of why Polyak averaging is optimal. Based on this idea, we propose a scalable approximate MCMC algorithm, the Averaged Stochastic Gradient Sampler.
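
The opening claim of the abstract, that constant SGD "simulates a Markov chain with a stationary distribution", can be seen in a toy setting. The sketch below assumes a one-dimensional quadratic loss $0.5\,a\,\theta^2$ with additive Gaussian gradient noise; this setup, the function name `constant_sgd_chain`, and all parameter values are illustrative assumptions. The closed-form variance is the exact stationary variance of the resulting linear recursion, not a formula quoted from the paper.

```python
import numpy as np

def constant_sgd_chain(a, noise_sd, lr, n_steps, seed=0):
    """Run SGD with a fixed learning rate on 0.5 * a * theta^2 with noisy gradients."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, noise_sd, size=n_steps)
    theta = 0.0
    trace = np.empty(n_steps)
    for t in range(n_steps):
        grad = a * theta + noise[t]   # stochastic gradient
        theta -= lr * grad            # constant learning rate, never decayed
        trace[t] = theta
    return trace

a, noise_sd, lr = 1.0, 1.0, 0.1
trace = constant_sgd_chain(a, noise_sd, lr, n_steps=200_000)
empirical_var = trace[1_000:].var()   # discard a short warm-up
# Stationary variance v solves v = (1 - lr*a)^2 * v + lr^2 * noise_sd^2.
predicted_var = lr * noise_sd**2 / (a * (2.0 - lr * a))
```

The iterates never converge to a point; they keep fluctuating with a spread set by the learning rate and the gradient noise, and that spread is the knob the abstract describes tuning to match a posterior.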
Journal Article•10.3758/S13423-016-1199-Y•
The drift diffusion model as the choice rule in reinforcement learning.

Mads Lund Pedersen1, Michael J. Frank2, Guido Biele1•
University of Oslo1, Brown University2
01 Aug 2017-Psychonomic Bulletin & Review
TL;DR: The model approach described here, using simultaneous estimation of reinforcement-learning and drift diffusion model parameters, shows promise for revealing new insights into the cognitive and neural mechanisms of learning and decision making, as well as the alteration of such processes in clinical groups.
Abstract: Current reinforcement-learning models often assume simplified decision processes that do not fully reflect the dynamic complexities of choice processes. Conversely, sequential-sampling models of decision making account for both choice accuracy and response time, but assume that decisions are based on static decision values. To combine these two computational models of decision making and learning, we implemented reinforcement-learning models in which the drift diffusion model describes the choice process, thereby capturing both within- and across-trial dynamics. To exemplify the utility of this approach, we quantitatively fit data from a common reinforcement-learning paradigm using hierarchical Bayesian parameter estimation, and compared model variants to determine whether they could capture the effects of stimulant medication in adult patients with attention-deficit hyperactivity disorder (ADHD). The model with the best relative fit provided a good description of the learning process, choices, and response times. A parameter recovery experiment showed that the hierarchical Bayesian modeling approach enabled accurate estimation of the model parameters. The model approach described here, using simultaneous estimation of reinforcement-learning and drift diffusion model parameters, shows promise for revealing new insights into the cognitive and neural mechanisms of learning and decision making, as well as the alteration of such processes in clinical groups.
Journal Article•10.1037/BUL0000097•
Bayesian approaches to autism: Towards volatility, action, and behavior.

Colin J. Palmer1, Rebecca P. Lawson2, Jakob Hohwy1•
Monash University1, University College London2
23 Mar 2017-Psychological Bulletin
TL;DR: It is proposed that autism is characterized by a greater weighting of sensory information in updating probabilistic representations of the environment, and hypotheses regarding atypical sensory weighting in autism have direct implications for the regulation of action and behavior.
Abstract: Autism spectrum disorder currently lacks an explanation that bridges cognitive, computational, and neural domains. In the past 5 years, progress has been sought in this area by drawing on Bayesian probability theory to describe both social and nonsocial aspects of autism in terms of systematic differences in the processing of sensory information in the brain. The present article begins by synthesizing the existing literature in this regard, including an introduction to the topic for unfamiliar readers. The key proposal is that autism is characterized by a greater weighting of sensory information in updating probabilistic representations of the environment. Here, we unpack further how the hierarchical setting of Bayesian inference in the brain (i.e., predictive processing) adds significant depth to this approach. In particular, autism may relate to finer mechanisms involved in the context-sensitive adjustment of sensory weightings, such as in how neural representations of environmental volatility inform perception. Crucially, in light of recent sensorimotor treatments of predictive processing (i.e., active inference), hypotheses regarding atypical sensory weighting in autism have direct implications for the regulation of action and behavior. Given that core features of autism relate to how the individual interacts with and samples the world around them (e.g., reduced social responding, repetitive behaviors, motor impairments, and atypical visual sampling), the extension of Bayesian theories of autism to action will be critical for yielding insights into this condition.
Journal Article•
A Bayesian framework for learning rule sets for interpretable classification

Tong Wang1, Cynthia Rudin2, Finale Doshi-Velez3, Yimin Liu, Erica Klampfl4, Perry Robinson MacNeille4 •
University of Iowa1, Duke University2, Harvard University3, Ford Motor Company4
01 Jan 2017-Journal of Machine Learning Research
TL;DR: The method (Bayesian Rule Sets - BRS) is applied to characterize and predict user behavior with respect to in-vehicle context-aware personalized recommender systems, and has a major advantage over classical associative classification methods and decision trees in that it does not greedily grow the model.
Abstract: We present a machine learning algorithm for building classifiers that are comprised of a small number of short rules. These are restricted disjunctive normal form models. An example of a classifier of this form is as follows: If X satisfies (condition A AND condition B) OR (condition C) OR ..., then Y = 1. Models of this form have the advantage of being interpretable to human experts since they produce a set of rules that concisely describe a specific class. We present two probabilistic models with prior parameters that the user can set to encourage the model to have a desired size and shape, to conform with a domain-specific definition of interpretability. We provide a scalable MAP inference approach and develop theoretical bounds to reduce computation by iteratively pruning the search space. We apply our method (Bayesian Rule Sets - BRS) to characterize and predict user behavior with respect to in-vehicle context-aware personalized recommender systems. Our method has a major advantage over classical associative classification methods and decision trees in that it does not greedily grow the model.
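
The form of classifier described in the abstract above, "(condition A AND condition B) OR (condition C) OR ...", can be represented directly as a list of rules, each rule being a list of conditions. The sketch below only evaluates such a rule set; it does not implement the Bayesian learning or MAP search from the paper, and the feature names `a`, `b`, `c` and the thresholds are hypothetical.

```python
from typing import Callable, Dict, List

Condition = Callable[[Dict[str, float]], bool]
Rule = List[Condition]   # a rule is a conjunction (AND) of conditions

def predict(rule_set: List[Rule], x: Dict[str, float]) -> int:
    """Return 1 if any rule has all of its conditions satisfied, else 0."""
    return int(any(all(cond(x) for cond in rule) for rule in rule_set))

# Hypothetical rule set: (a > 0 AND b > 0) OR (c > 0).
rule_set: List[Rule] = [
    [lambda x: x["a"] > 0, lambda x: x["b"] > 0],
    [lambda x: x["c"] > 0],
]

label = predict(rule_set, {"a": 1.0, "b": 2.0, "c": -1.0})   # first rule fires
```

Each rule can be read aloud as a sentence, which is the sense of interpretability the abstract refers to.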
Journal Article•10.1016/J.INFFUS.2016.11.010•
A review of source term estimation methods for atmospheric dispersion events using static or mobile sensors

Michael Hutchinson1, Hyondong Oh2, Wen-Hua Chen1•
Loughborough University1, Ulsan National Institute of Science and Technology2
01 Jul 2017-Information Fusion
TL;DR: This paper presents a review of techniques used to gain information about atmospheric dispersion events using static or mobile sensors, and discusses the current limitations of the state of the art and recommendations for future research.
Journal Article•10.1111/2041-210X.12681•
Faster estimation of Bayesian models in ecology using Hamiltonian Monte Carlo

Cole C. Monnahan1, James T. Thorson2, Trevor A. Branch1•
University of Washington1, National Oceanic and Atmospheric Administration2
01 Mar 2017-Methods in Ecology and Evolution
TL;DR: Stan is a valuable tool for many ecologists utilizing Bayesian inference, particularly for problems where BUGS is prohibitively slow, and can extend the boundaries of feasible models for applied problems, leading to better understanding of ecological processes.
Abstract: Bayesian inference is a powerful tool to better understand ecological processes across varied subfields in ecology, and is often implemented in generic and flexible software packages such as the widely used BUGS family (BUGS, WinBUGS, OpenBUGS and JAGS). However, some models have prohibitively long run times when implemented in BUGS. A relatively new software platform called Stan uses Hamiltonian Monte Carlo (HMC), a family of Markov chain Monte Carlo (MCMC) algorithms which promise improved efficiency and faster inference relative to those used by BUGS. Stan is gaining traction in many fields as an alternative to BUGS, but adoption has been slow in ecology, likely due in part to the complex nature of HMC. Here, we provide an intuitive illustration of the principles of HMC on a set of simple models. We then compared the relative efficiency of BUGS and Stan using population ecology models that vary in size and complexity. For hierarchical models, we also investigated the effect of an alternative parameterization of random effects, known as non-centering. For small, simple models there is little practical difference between the two platforms, but Stan outperforms BUGS as model size and complexity grows. Stan also performs well for hierarchical models, but is more sensitive to model parameterization than BUGS. Stan may also be more robust to biased inference caused by pathologies, because it produces diagnostic warnings where BUGS provides none. Disadvantages of Stan include an inability to use discrete parameters, more complex diagnostics and a greater requirement for hands-on tuning. Given these results, Stan is a valuable tool for many ecologists utilizing Bayesian inference, particularly for problems where BUGS is prohibitively slow. As such, Stan can extend the boundaries of feasible models for applied problems, leading to better understanding of ecological processes. Fields that would likely benefit include estimation of individual and population growth rates, meta-analyses and cross-system comparisons and spatiotemporal models.
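The non-centering mentioned in this abstract can be illustrated with a minimal sketch. In the centered form a random effect is drawn directly as theta ~ Normal(mu, tau). In the non-centered form a standard normal variable z is drawn and theta is computed as mu + tau * z. Both give the same distribution for theta, but a sampler working on z no longer sees a scale that depends on tau. The function names and numeric values below are illustrative assumptions, and the code only checks the distributional equivalence; it is not the paper's model or a Stan program.

```python
import random
import statistics


def centered_draws(rng: random.Random, mu: float, tau: float, n: int) -> list:
    """Centered form: theta is drawn directly from Normal(mu, tau)."""
    return [rng.gauss(mu, tau) for _ in range(n)]


def non_centered_draws(rng: random.Random, mu: float, tau: float, n: int) -> list:
    """Non-centered form: draw z ~ Normal(0, 1), then set theta = mu + tau * z."""
    return [mu + tau * rng.gauss(0.0, 1.0) for _ in range(n)]


rng = random.Random(1)
c = centered_draws(rng, mu=1.5, tau=0.3, n=50_000)
nc = non_centered_draws(rng, mu=1.5, tau=0.3, n=50_000)

# The two sets of draws should agree in mean and spread up to Monte Carlo error.
mean_gap = abs(statistics.fmean(c) - statistics.fmean(nc))
sd_gap = abs(statistics.stdev(c) - statistics.stdev(nc))
```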
Proceedings Article•
Bayesian Learning and Inference in Recurrent Switching Linear Dynamical Systems

Scott W. Linderman1, Matthew J. Johnson, Andrew Miller2, Ryan P. Adams, David M. Blei1, Liam Paninski1 •
Columbia University1, Harvard University2
10 Apr 2017
TL;DR: This work develops a model class and Bayesian inference algorithms that not only discover these dynamical units but also, by learning how transition probabilities depend on observations or continuous latent states, explain their switching behavior.
Abstract: Many natural systems, such as neurons firing in the brain or basketball teams traversing a court, give rise to time series data with complex, nonlinear dynamics. We can gain insight into these systems by decomposing the data into segments that are each explained by simpler dynamic units. Building on switching linear dynamical systems (SLDS), we develop a model class and Bayesian inference algorithms that not only discover these dynamical units but also, by learning how transition probabilities depend on observations or continuous latent states, explain their switching behavior. Our key innovation is to design these recurrent SLDS models to enable recent Pólya-gamma auxiliary variable techniques and thus make approximate Bayesian learning and inference in these models easy, fast, and scalable.
Journal Article•10.1093/MOLBEV/MSW279•
RWTY (R We There Yet): An R Package for Examining Convergence of Bayesian Phylogenetic Analyses.

Dan L. Warren1, Anthony J. Geneva2, Robert Lanfear1,3•
Macquarie University1, Harvard University2, Australian National University3
01 Apr 2017-Molecular Biology and Evolution
TL;DR: RWTY is an R package that implements established and new methods for diagnosing phylogenetic MCMC convergence in a single convenient interface, which can be used for large data sets.
Abstract: Bayesian inference using Markov chain Monte Carlo (MCMC) has become one of the primary methods used to infer phylogenies from sequence data. Assessing convergence is a crucial component of these analyses, as it establishes the reliability of the posterior distribution estimates of the tree topology and model parameters sampled from the MCMC. Numerous tests and visualizations have been developed for this purpose, but many of the most popular methods are implemented in ways that make them inconvenient to use for large data sets. RWTY is an R package that implements established and new methods for diagnosing phylogenetic MCMC convergence in a single convenient interface.
Journal Article•10.1109/TAC.2017.2690401•
Fast Convergence Rates for Distributed Non-Bayesian Learning

Angelia Nedic1, Alex Olshevsky2, César A. Uribe3•
Arizona State University1, Boston University2, University of Illinois at Urbana–Champaign3
31 Mar 2017-IEEE Transactions on Automatic Control
TL;DR: In this article, the authors consider the problem of distributed learning where a network of agents collectively aim to agree on a hypothesis that best explains a set of distributed observations of conditionally independent random processes.
Abstract: We consider the problem of distributed learning, where a network of agents collectively aim to agree on a hypothesis that best explains a set of distributed observations of conditionally independent random processes. We propose a distributed algorithm and establish consistency, as well as a nonasymptotic, explicit, and geometric convergence rate for the concentration of the beliefs around the set of optimal hypotheses. Additionally, if the agents interact over static networks, we provide an improved learning protocol with better scalability with respect to the number of nodes in the network.
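The abstract does not spell out the update rule, so the sketch below uses one form common in this literature as an assumption: each agent takes a weighted geometric average of its neighbours' beliefs, multiplies by the likelihood of its own new observation, and renormalizes. The three-agent network, the mixing weights, and the two Bernoulli hypotheses are hypothetical choices made for illustration only.

```python
import math
import random


def normalize_log(log_b):
    """Shift log-beliefs so that the corresponding probabilities sum to 1."""
    m = max(log_b)
    lse = m + math.log(sum(math.exp(v - m) for v in log_b))
    return [v - lse for v in log_b]


def step(log_beliefs, weights, log_liks):
    """One round: geometric averaging over neighbours, then a local Bayes update."""
    n, h = len(log_beliefs), len(log_beliefs[0])
    updated = []
    for i in range(n):
        mixed = [
            sum(weights[i][j] * log_beliefs[j][t] for j in range(n)) + log_liks[i][t]
            for t in range(h)
        ]
        updated.append(normalize_log(mixed))
    return updated


def simulate(steps=200, seed=0):
    rng = random.Random(seed)
    thetas = [0.3, 0.7]      # candidate Bernoulli success probabilities (assumed)
    true_theta = 0.7         # data-generating value (assumed)
    weights = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
    log_beliefs = [[math.log(0.5)] * 2 for _ in range(3)]   # uniform priors
    for _ in range(steps):
        log_liks = []
        for _agent in range(3):
            s = 1 if rng.random() < true_theta else 0
            log_liks.append([math.log(t if s == 1 else 1.0 - t) for t in thetas])
        log_beliefs = step(log_beliefs, weights, log_liks)
    return [[math.exp(v) for v in b] for b in log_beliefs]


beliefs = simulate()
```

Working in log space keeps the repeated products numerically stable as the beliefs concentrate on one hypothesis.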
Posted Content•
Structured Bayesian Pruning via Log-Normal Multiplicative Noise

Kirill Neklyudov1, Dmitry Molchanov1, Arsenii Ashukha1, Dmitry Vetrov1•
National Research University – Higher School of Economics1
20 May 2017-arXiv: Machine Learning
TL;DR: A new Bayesian model is proposed that takes into account the computational structure of neural networks and provides structured sparsity, e.g. removes neurons and/or convolutional channels in CNNs and provides significant acceleration on a number of deep neural architectures.
Abstract: Dropout-based regularization methods can be regarded as injecting random noise with pre-defined magnitude to different parts of the neural network during training. It was recently shown that Bayesian dropout procedure not only improves generalization but also leads to extremely sparse neural architectures by automatically setting the individual noise magnitude per weight. However, this sparsity can hardly be used for acceleration since it is unstructured. In the paper, we propose a new Bayesian model that takes into account the computational structure of neural networks and provides structured sparsity, e.g. removes neurons and/or convolutional channels in CNNs. To do this we inject noise to the neurons outputs while keeping the weights unregularized. We establish the probabilistic model with a proper truncated log-uniform prior over the noise and truncated log-normal variational approximation that ensures that the KL-term in the evidence lower bound is computed in closed-form. The model leads to structured sparsity by removing elements with a low SNR from the computation graph and provides significant acceleration on a number of deep neural architectures. The model is easy to implement as it can be formulated as a separate dropout-like layer.
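The pruning criterion in this abstract (dropping elements whose noise has a low signal-to-noise ratio) can be sketched under a simplifying assumption. The paper uses a truncated log-normal approximation; the code below instead uses a plain, untruncated log-normal, for which the SNR (mean divided by standard deviation) has the closed form 1 / sqrt(exp(sigma^2) - 1) and depends only on sigma. The threshold of 1.0 and the function names are hypothetical.

```python
import math
from typing import List


def lognormal_snr(sigma: float) -> float:
    """SNR = mean / std of an untruncated LogNormal(mu, sigma^2); requires sigma > 0."""
    return 1.0 / math.sqrt(math.exp(sigma ** 2) - 1.0)


def keep_mask(sigmas: List[float], threshold: float = 1.0) -> List[bool]:
    """Mark which units (neurons or channels) to keep; low-SNR units are pruned."""
    return [lognormal_snr(s) >= threshold for s in sigmas]


# Hypothetical per-neuron noise scales: the first is nearly deterministic,
# the second is dominated by noise.
mask = keep_mask([0.1, 2.0])
```

Because whole units are removed rather than individual weights, the resulting sparsity is structured, which is the property the abstract says makes acceleration possible.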
Journal Article•10.1093/SYSBIO/SYW119•
Efficient Bayesian species tree inference under the multispecies coalescent.

Bruce Rannala1, Ziheng Yang2•
University of California, Davis1, University College London2
01 Sep 2017-Systematic Biology
TL;DR: In this paper, a Bayesian method for inferring the species phylogeny under the multispecies coalescent (MSC) model was developed, which integrates over gene trees, naturally taking account of the uncertainty of gene tree topology and branch lengths given the sequence data.
Abstract: We develop a Bayesian method for inferring the species phylogeny under the multispecies coalescent (MSC) model. To improve the mixing properties of the Markov chain Monte Carlo (MCMC) algorithm that traverses the space of species trees, we implement two efficient MCMC proposals: the first is based on the Subtree Pruning and Regrafting (SPR) algorithm and the second is based on a node-slider algorithm. Like the Nearest-Neighbor Interchange (NNI) algorithm we implemented previously, both new algorithms propose changes to the species tree, while simultaneously altering the gene trees at multiple genetic loci to automatically avoid conflicts with the newly proposed species tree. The method integrates over gene trees, naturally taking account of the uncertainty of gene tree topology and branch lengths given the sequence data. A simulation study was performed to examine the statistical properties of the new method. The method was found to show excellent statistical performance, inferring the correct species tree with near certainty when 10 loci were included in the dataset. The prior on species trees has some impact, particularly for small numbers of loci. We analyzed several previously published datasets (both real and simulated) for rattlesnakes and Philippine shrews, in comparison with alternative methods. The results suggest that the Bayesian coalescent-based method is statistically more efficient than heuristic methods based on summary statistics, and that our implementation is computationally more efficient than alternative full-likelihood methods under the MSC. Parameter estimates for the rattlesnake data suggest drastically different evolutionary dynamics between the nuclear and mitochondrial loci, even though they support largely consistent species trees. We discuss the different challenges facing the marginal likelihood calculation and transmodel MCMC as alternative strategies for estimating posterior probabilities for species trees. 
[Bayes factor; Bayesian inference; MCMC; multispecies coalescent; nodeslider; species tree; SPR.].
Posted Content•
Hierarchical Implicit Models and Likelihood-Free Variational Inference

Dustin Tran1, Rajesh Ranganath2, David M. Blei1•
Columbia University1, Princeton University2
28 Feb 2017-arXiv: Machine Learning
TL;DR: This article proposes hierarchical implicit models (HIMs), which combine the idea of implicit densities with hierarchical Bayesian modeling, thereby defining models via simulators of data with rich hidden structure.
Abstract: Implicit probabilistic models are a flexible class of models defined by a simulation process for data. They form the basis for theories which encompass our understanding of the physical world. Despite this fundamental nature, the use of implicit models remains limited due to challenges in specifying complex latent structure in them, and in performing inferences in such models with large data sets. In this paper, we first introduce hierarchical implicit models (HIMs). HIMs combine the idea of implicit densities with hierarchical Bayesian modeling, thereby defining models via simulators of data with rich hidden structure. Next, we develop likelihood-free variational inference (LFVI), a scalable variational inference algorithm for HIMs. Key to LFVI is specifying a variational family that is also implicit. This matches the model's flexibility and allows for accurate approximation of the posterior. We demonstrate diverse applications: a large-scale physical simulator for predator-prey populations in ecology; a Bayesian generative adversarial network for discrete data; and a deep implicit model for text generation.
Journal Article•10.1016/J.ELECTACTA.2017.07.050•
Bayesian and Hierarchical Bayesian Based Regularization for Deconvolving the Distribution of Relaxation Times from Electrochemical Impedance Spectroscopy Data

Mohammed B. Effat1, Francesco Ciucci1•
Hong Kong University of Science and Technology1
01 Sep 2017-Electrochimica Acta
TL;DR: By applying this framework to synthetic experiments and real EIS data, it is shown that one can obtain far more insight than with ridge regression (or Tikhonov regularization) and sample the DRT pdf given the data and the hypotheses on the regularization parameter and weights.
Book Chapter•10.1007/978-3-319-66182-7_70•
Bayesian Image Quality Transfer with CNNs: Exploring Uncertainty in dMRI Super-Resolution

Ryutaro Tanno1, Daniel E. Worrall1, Aurobrata Ghosh1, Enrico Kaden1, Stamatios N. Sotiropoulos2, Antonio Criminisi3, Daniel C. Alexander1 •
University College London1, University of Oxford2, Microsoft3
10 Sep 2017
TL;DR: For 3D super-resolution with convolutional neural networks, this work accounts for intrinsic uncertainty through a per-patch heteroscedastic noise model and for parameter uncertainty through approximate Bayesian inference in the form of variational dropout.
Abstract: In this work, we investigate the value of uncertainty modelling in 3D super-resolution with convolutional neural networks (CNNs). Deep learning has shown success in a plethora of medical image transformation problems, such as super-resolution (SR) and image synthesis. However, the highly ill-posed nature of such problems results in inevitable ambiguity in the learning of networks. We propose to account for intrinsic uncertainty through a per-patch heteroscedastic noise model and for parameter uncertainty through approximate Bayesian inference in the form of variational dropout. We show that the combined benefits of both lead to the state-of-the-art performance SR of diffusion MR brain images in terms of errors compared to ground truth. We further show that the reduced error scores produce tangible benefits in downstream tractography. In addition, the probabilistic nature of the methods naturally confers a mechanism to quantify uncertainty over the super-resolved output. We demonstrate through experiments on both healthy and pathological brains the potential utility of such an uncertainty measure in the risk assessment of the super-resolved images for subsequent clinical use.
Journal Article•10.1214/17-BA1091•
Using stacking to average Bayesian predictive distributions

Yuling Yao, Aki Vehtari, Daniel Simpson, Andrew Gelman
06 Apr 2017-arXiv: Methodology
TL;DR: This work takes the idea of stacking from the point estimation literature and generalizes to the combination of predictive distributions, extending the utility function to any proper scoring rule, using Pareto smoothed importance sampling to efficiently compute the required leave-one-out posterior distributions and regularization to get more stability.
Abstract: The widely recommended procedure of Bayesian model averaging is flawed in the M-open setting in which the true data-generating process is not one of the candidate models being fit. We take the idea of stacking from the point estimation literature and generalize to the combination of predictive distributions, extending the utility function to any proper scoring rule, using Pareto smoothed importance sampling to efficiently compute the required leave-one-out posterior distributions and regularization to get more stability. We compare stacking of predictive distributions to several alternatives: stacking of means, Bayesian model averaging (BMA), pseudo-BMA using AIC-type weighting, and a variant of pseudo-BMA that is stabilized using the Bayesian bootstrap. Based on simulations and real-data applications, we recommend stacking of predictive distributions, with BB-pseudo-BMA as an approximate alternative when computation cost is an issue.
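Stacking of predictive distributions under the log score, as described in this abstract, amounts to choosing simplex weights that maximize the summed log of the weighted leave-one-out predictive densities. The sketch below assumes those densities are already available as a matrix `loo_dens[i][k]` (data point i, model k), with all entries positive. It solves the problem with a simple EM-style fixed-point iteration, which is a swapped-in optimizer chosen for brevity and not necessarily the one used in the paper. It omits the Pareto smoothed importance sampling and regularization steps. All names and example values are hypothetical.

```python
import math
from typing import List


def log_score(loo_dens: List[List[float]], w: List[float]) -> float:
    """Summed log of the weighted mixture of leave-one-out densities."""
    return sum(
        math.log(sum(wj * pj for wj, pj in zip(w, row))) for row in loo_dens
    )


def stacking_weights(loo_dens: List[List[float]], iters: int = 500) -> List[float]:
    """Maximize the log score over the simplex with EM-style multiplicative updates."""
    n, k = len(loo_dens), len(loo_dens[0])
    w = [1.0 / k] * k
    for _ in range(iters):
        new_w = [0.0] * k
        for row in loo_dens:
            mix = sum(wj * pj for wj, pj in zip(w, row))
            for j in range(k):
                new_w[j] += w[j] * row[j] / mix
        w = [v / n for v in new_w]
    return w


# Hypothetical case 1: model 0 predicts every held-out point better.
dominated = [[0.5, 0.1], [0.4, 0.2], [0.6, 0.1]]
w_dom = stacking_weights(dominated)

# Hypothetical case 2: each model is good on a different half of the data.
complementary = [[0.9, 0.1], [0.1, 0.9]]
w_comp = stacking_weights(complementary)
```

In the second case the weights stay split between the two models because each one explains points the other misses, which is the situation where combining predictive distributions helps most.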
Journal Article•10.1080/23743603.2017.1326760•
A Bayesian model-averaged meta-analysis of the power pose effect with informed and default priors: the case of felt power

Quentin Frederik Gronau1, Sara van Erp2, Daniel W. Heck3, Joseph Cesario4, Kai J. Jonas5, Eric-Jan Wagenmakers1 •
University of Amsterdam1, Tilburg University2, University of Mannheim3, Michigan State University4, Maastricht University5
28 Jun 2017
TL;DR: This article presents a Bayesian model-averaged meta-analysis of six preregistered studies from this special issue, focusing on the effect of power posing on felt power. Earlier work had reported that participants who adopted expansive body postures felt more powerful, showed an increase in testosterone and a decrease in cortisol, and displayed an increased tolerance for risk.
Abstract: Earlier work found that – compared to participants who adopted constrictive body postures – participants who adopted expansive body postures reported feeling more powerful, showed an increase in testosterone and a decrease in cortisol, and displayed an increased tolerance for risk. However, these power pose effects have recently come under considerable scrutiny. Here, we present a Bayesian meta-analysis of six preregistered studies from this special issue, focusing on the effect of power posing on felt power. Our analysis improves on standard classical meta-analyses in several ways. First and foremost, we considered only preregistered studies, eliminating concerns about publication bias. Second, the Bayesian approach enables us to quantify evidence for both the alternative and the null hypothesis. Third, we use Bayesian model-averaging to account for the uncertainty with respect to the choice for a fixed-effect model or a random-effect model. Fourth, based on a literature review, we obtained an em...
...
