Abstract: | Scalable Bayes for big data is a rapidly growing research area, but most existing methods are either highly complex for practitioners to implement or lack theoretical justification for uncertainty quantification. Bayesian methods quantify uncertainty through posterior and predictive distributions. For massive datasets, it is difficult to efficiently estimate summaries of these distributions, such as posterior quantiles and credible intervals. In small-scale problems, posterior sampling algorithms such as Markov chain Monte Carlo (MCMC) remain the gold standard, but they face major problems in scaling up to big data. We propose a very simple and general Posterior Interval Estimation (PIE) algorithm for evaluating the posterior distributions of one-dimensional (1-d) functionals, which are typically the focus in many applications. The PIE algorithm consists of three steps. First, the full data are partitioned into computationally tractable subsets. Second, sampling algorithms such as MCMC are run in parallel on each subset. Finally, PIE approximates the full posterior by simply averaging the posterior quantiles estimated from each subset. This allows standard Bayesian algorithms such as MCMC to be trivially scaled up to big data. We provide strong theoretical guarantees for PIE on its posterior uncertainty quantification, and compare its empirical performance with variational Bayes and the recent WASP algorithm for mixed effects models and nonparametric Bayesian mixture models. |
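The three steps described in the abstract can be sketched as follows. This is an illustrative toy example, not the authors' implementation: it uses a conjugate normal-mean model so that "sampling" from each subset posterior is direct rather than via MCMC, and it follows the common embarrassingly-parallel convention of raising each subset likelihood to the power of the number of subsets so each subset posterior has roughly the spread of the full posterior. All variable names and the choice of model are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: normal observations with unknown mean and known sigma
# (illustrative assumption, not part of the original talk).
true_mean, sigma = 2.0, 1.0
n, k = 10_000, 10                      # full sample size, number of subsets
data = rng.normal(true_mean, sigma, size=n)

# Step 1: partition the full data into k computationally tractable subsets.
subsets = np.array_split(data, k)

probs = np.array([0.025, 0.5, 0.975])  # quantiles defining a 95% interval
subset_quantiles = []
for sub in subsets:
    m = len(sub)
    # Step 2: sample from the subset posterior, in parallel in practice.
    # With a flat prior and the subset likelihood raised to the power k,
    # the posterior for the mean is N(sub.mean(), sigma^2 / (k * m)),
    # so we draw from it directly instead of running MCMC.
    draws = rng.normal(sub.mean(), sigma / np.sqrt(k * m), size=5_000)
    subset_quantiles.append(np.quantile(draws, probs))

# Step 3 (PIE): average the estimated quantiles across subsets to
# approximate the quantiles of the full-data posterior.
pie_quantiles = np.mean(subset_quantiles, axis=0)
print(pie_quantiles)
```

In this conjugate setting the full-data posterior quantiles are available in closed form, so one can check that the averaged subset quantiles land close to them; in realistic models, step 2 would be replaced by a standard MCMC run on each subset.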
Date: | 22 February 2016 |
Time: | 1:45pm - 2:45pm |
Speaker: | Dr Cheng LI |
Venue: | Room 14-222, 14/F, Academic 3 |