Ricardo Silva, Bharath Sriperumbudur, Carlos Stein, JaeMo Sung, Dougal Sutherland, Zoltan Szabo, Emo Todorov, Sina Tootoonian, Srinivas Turaga, Carl van Vreeswijk, Chu Wei, Max Welling, Vincent Adam, Sanjeevan Ahilan, Misha Ahrens, Laurence Aitchison, Michael Arbel, Arik Azran, David Barrett, Matthew Beal, Tom Bennett, Charles Blundell, Kacper Chwialkowski, Lea Duncker, Lloyd Elliott, Jan Gasthaus, Arthur Guez, Phillipp Hehrmann, Katherine Heller, Quentin Huys, Sofia Jativa, Wittawat Jitkrittum, Alexander Korenberg, Kai Krueger, Balaji Lakshminarayanan, Kevin Wenliang Li, Maria Lomeli, Federico Mancinelli, Loic Matthey, Guy Mayraz

Although my group works on a variety of topics, ranging from the theoretical through to the methodological and applied, I am personally particularly interested in three overlapping themes: Bayesian nonparametrics and probabilistic learning, large-scale machine learning, and deep learning.
These themes are motivated by the phenomenal growth in the quantity, diversity and heterogeneity of data now available. The analysis of such data is crucial to opening doors to new scientific frontiers and future economic growth. In the longer term, the development of general methods that can deal with such data is an important testing ground for general-purpose intelligent computational systems and artificial general intelligence. Firstly, I am interested in developing Bayesian nonparametric models and methodologies.
These are highly flexible models that can accommodate the diversity and heterogeneity of data, and can sidestep model selection, one of the core difficulties of parametric machine learning approaches.
Both aspects of the learning process are derived by optimizing a joint objective function. We show that our approach supports efficient transfer on complex 3D environments, outperforming several related methods.
Moreover, the proposed learning process is more robust and more stable—attributes that are critical in deep reinforcement learning.
Yet, in some cases agents may wish to emphasize the low or high returns regardless of their probability. Borrowing from the economics and control literature, we review the risk-sensitive value function that arises from an exponential utility and illustrate its effects on an example.
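As a minimal, self-contained illustration (my own sketch, not the paper's formulation), the risk-sensitive value induced by an exponential utility, V_beta = -(1/beta) log E[exp(-beta R)], can be computed for an empirical return distribution; beta > 0 penalizes spread even when means are equal:

```python
import math

def risk_sensitive_value(returns, beta):
    """Exponential-utility value -(1/beta) * log E[exp(-beta * R)].
    beta > 0 is risk-averse, beta < 0 risk-seeking; beta -> 0 recovers the mean."""
    n = len(returns)
    return -(1.0 / beta) * math.log(sum(math.exp(-beta * r) for r in returns) / n)

# Two empirical return distributions with the same mean but different spread.
safe = [1.0, 1.0, 1.0, 1.0]
risky = [4.0, 0.0, 0.0, 0.0]

v_safe = risk_sensitive_value(safe, beta=1.0)    # equals the mean, 1.0
v_risky = risk_sensitive_value(risky, beta=1.0)  # strictly below the mean
```

A risk-averse agent (beta > 0) therefore prefers the safe option even though both options have expected return 1.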
We illustrate the benefit of the policy gradients of this objective in Cliffworld.

This is a large class of discrete random probability measures (RPMs), which encompasses most of the popular discrete RPMs used in Bayesian nonparametrics, such as the Dirichlet process, the Pitman-Yor process, the normalized inverse Gaussian process and the normalized generalized Gamma process. Specifically, we introduce a novel and efficient MCMC sampling scheme in an augmented space that has a fixed number of auxiliary variables per iteration.
We apply our sampling scheme to density estimation and clustering tasks with unidimensional and multidimensional datasets, and compare it against competing MCMC sampling schemes.

We also perform exploratory data analysis to understand which census variables are predictive of voting for Trump, voting for Clinton, or not voting for either.
All of our methods are implemented in Python and R and are available online for replication.

To each node of the network is associated a positive parameter, modeling the sociability of that node. Sociabilities are assumed to evolve over time, and are modeled via a dynamic point process model. The evolution of the sociabilities is described by a tractable time-varying gamma process.
We provide some theoretical insights into the model and apply it to three real-world datasets.

Driven by the idea of using the kernel to explicitly model user-item similarities, we formulate the GP in a way that allows the incorporation of low-rank matrix factorisation, arriving at our model, the Tucker Gaussian process (TGP).
Consequently, TGP generalises classical Bayesian matrix factorisation models, and goes beyond them to provide a natural and elegant method for incorporating side information, giving enhanced predictive performance for CF problems.
Moreover, we show that it is a novel model for regression, especially well-suited to grid-structured data and problems where the dependence on covariates is close to being separable.

Performing exact posterior inference in complex generative models is often difficult or impossible due to an expensive-to-evaluate or intractable likelihood function.
Approximate Bayesian computation (ABC) is an inference framework that constructs an approximation to the true likelihood based on the similarity between the observed and simulated data, as measured by a predefined set of summary statistics. Although the choice of appropriate problem-specific summary statistics crucially influences the quality of the likelihood approximation, and hence the quality of the posterior sample in ABC, there are only a few principled general-purpose approaches to the selection or construction of such summary statistics.
In this paper, we develop a novel framework for this task using kernel-based distribution regression, modeling the functional relationship between data distributions and the choice of summary statistics that is optimal with respect to a loss function.
We show that our approach can be implemented in a computationally and statistically efficient way using the random Fourier features framework for large-scale kernel learning. In addition, our framework shows superior performance compared with related methods on toy and real-world problems.
The model is centred on a parametric baseline hazard, and uses a Gaussian process to model variations away from it nonparametrically, as well as dependence on covariates. Unlike many other methods in survival analysis, our framework does not impose unnecessary constraints on the hazard rate or on the survival function. Furthermore, our model handles the left, right and interval censoring mechanisms common in survival analysis. We propose an MCMC algorithm to perform inference, and an approximation scheme based on random Fourier features to make computations faster.
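The random Fourier features idea used here for speed-ups can be sketched in its generic form (a standard construction, not the paper's specific scheme): draw random frequencies and phases so that the inner product of finite feature maps approximates an RBF kernel:

```python
import math
import random

def rff_features(x, W, b):
    """Random Fourier feature map approximating the RBF kernel exp(-||x - y||^2 / 2)."""
    D = len(W)
    return [math.sqrt(2.0 / D) * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + bi)
            for w, bi in zip(W, b)]

random.seed(0)
d, D = 2, 5000                      # input dimension, number of random features
W = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(D)]
b = [random.uniform(0.0, 2.0 * math.pi) for _ in range(D)]

x, y = [0.3, -0.1], [0.5, 0.4]
zx, zy = rff_features(x, W, b), rff_features(y, W, b)

approx = sum(a * c for a, c in zip(zx, zy))                      # z(x) . z(y)
exact = math.exp(-sum((a - c) ** 2 for a, c in zip(x, y)) / 2.0)
```

The approximation error decays as O(1/sqrt(D)), which is what makes kernel computations scale to large data.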
We report experimental results on synthetic and real data, showing that our model performs better than competing models such as Cox proportional hazards, ANOVA-DDP and random survival forests.

This is undesirable not only because the average size of data sets is growing fast, but also because there is potentially more information in bigger data, implying a greater need for more expressive models that can discover sophisticated structure.
We propose a scalable version of ABCD, to encompass big data within the boundaries of automated statistical modelling.

Genetic sequence data are well described by hidden Markov models (HMMs) in which latent states correspond to clusters of similar mutation patterns. Theory from statistical genetics suggests that these HMMs are nonhomogeneous (their transition probabilities vary along the chromosome) and have large support for self-transitions.
We develop a new nonparametric model of genetic sequence data, based on the hierarchical Dirichlet process, which supports these self-transitions and nonhomogeneity. Our model provides a parameterization of the genetic process that is more parsimonious than other, more general nonparametric models which have previously been applied to population genetics.
In a series of experiments on male X chromosome data from the Thousand Genomes Project, and on data simulated from a population bottleneck, we show the benefits of our model over the popular finite model fastPHASE, which can itself be seen as a parametric truncation of our model.
We find that the number of HMM states found by our model is correlated with the time to the most recent common ancestor in population bottlenecks. This work demonstrates the flexibility of Bayesian nonparametrics applied to large and complex genetic data.

Many of their distributional properties are known by now, but their stick-breaking representation is missing. Here we display such a representation, which features dependent stick-breaking weights, and then derive explicit versions for noteworthy special cases of hNRMI.
Posterior characterizations are also discussed. Finally, we devise an algorithm for slice sampling mixture models based on hNRMIs, which relies on the representation we have obtained, and implement it to analyze real data.

It is suitable for both batch and online learning, and admits a fast kernel-width-selection procedure, as the random features can be re-used efficiently for all kernel widths.
The features are constructed by sampling trees via a Mondrian process [Roy and Teh], and we highlight the connection to Mondrian forests [Lakshminarayanan et al.]. This link provides a new insight into the relationship between kernel methods and random forests.

Both the calculation of the acceptance probability and the creation of informed proposals usually require an iteration through the whole data set.
We provide in this article a rigorous mathematical framework for analysing this algorithm. The recently proposed stochastic gradient Langevin dynamics (SGLD) method circumvents this problem in three ways: it generates proposed moves using only a subset of the data, it skips the Metropolis-Hastings accept-reject step, and it uses sequences of decreasing step sizes. In Teh et al., the asymptotic behaviour of the SGLD with decreasing step sizes was analysed.
However, in practice the SGLD is run for a relatively small number of iterations, and its step size is not decreased to zero. The present article investigates the behaviour of the SGLD with fixed step size. In particular, we characterise the asymptotic bias explicitly, along with its dependence on the step size and the variance of the stochastic gradient. On that basis, we derive a modified SGLD which removes the asymptotic bias due to the variance of the stochastic gradients, up to first order in the step size.
Moreover, we are able to obtain bounds on the finite-time bias, variance and mean squared error (MSE). The theory is illustrated with a Gaussian toy model for which the bias and the MSE for the estimation of moments can be obtained explicitly.
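A minimal sketch of SGLD on a Gaussian toy model of this kind (my own illustration; the N(0, 10^2) prior, batch size and step size are arbitrary choices): each update adds a minibatch gradient step plus Gaussian noise with variance equal to the step size:

```python
import math
import random

random.seed(1)
theta_true = 2.0
N = 1000
data = [random.gauss(theta_true, 1.0) for _ in range(N)]

def stoch_grad_log_post(theta, batch):
    """Unbiased minibatch estimate of the gradient of the log posterior for
    x_i ~ N(theta, 1) with an assumed N(0, 10^2) prior."""
    prior_grad = -theta / 100.0
    lik_grad = (N / len(batch)) * sum(x - theta for x in batch)
    return prior_grad + lik_grad

eps, n_batch, n_iter, burn = 1e-4, 50, 5000, 1000   # fixed step size
theta, samples = 0.0, []
for t in range(n_iter):
    batch = random.sample(data, n_batch)
    theta += (0.5 * eps * stoch_grad_log_post(theta, batch)
              + random.gauss(0.0, math.sqrt(eps)))   # injected Gaussian noise
    if t >= burn:
        samples.append(theta)

posterior_mean_estimate = sum(samples) / len(samples)
sample_mean = sum(data) / N
```

With a fixed step size the chain never accepts or rejects; the price, as analysed above, is a bias that grows with the step size and the gradient variance.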
For this toy model we study the gain of the SGLD over the standard Euler method in the limit of large data sets.

Standard decision forests deliver efficient state-of-the-art predictive performance, but high-quality uncertainty estimates are lacking. Gaussian processes (GPs) deliver uncertainty estimates, but scaling GPs to large-scale data sets comes at the cost of approximating the uncertainty estimates.
We extend Mondrian forests, first proposed by Lakshminarayanan et al. for classification problems, to the regression setting. Using a novel hierarchical Gaussian prior that dovetails with the Mondrian forest framework, we obtain principled uncertainty estimates, while still retaining the computational advantages of decision forests. Through a combination of illustrative examples, real-world large-scale datasets and Bayesian optimization benchmarks, we demonstrate that Mondrian forests outperform approximate GPs on large-scale regression tasks and deliver better-calibrated uncertainty assessments than decision-forest-based methods.
The system relies on an interactive search paradigm in which, at each round, a user is presented with k images and selects the one closest to their ideal target. Two approaches, one based on the Dirichlet distribution and one based on the Beta distribution, are used to model the problem, motivating an algorithm that trades off exploration and exploitation in presenting the images in each round. Experimental results show that the new approach compares favourably with previous work.
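A generic Beta-posterior Thompson-sampling loop of the kind such exploration-exploitation trade-offs rely on can be sketched as follows (a hypothetical three-image setting with made-up "click" probabilities, not the paper's algorithm):

```python
import random

random.seed(0)

# Hypothetical "click" probabilities of k = 3 candidate images (unknown to the agent).
true_probs = [0.2, 0.5, 0.8]
alpha = [1.0, 1.0, 1.0]   # Beta posterior parameters, one pair per image
beta_ = [1.0, 1.0, 1.0]
chosen_counts = [0, 0, 0]

for t in range(2000):
    # Thompson step: sample a rate from each Beta posterior, show the best image.
    draws = [random.betavariate(alpha[i], beta_[i]) for i in range(3)]
    i = max(range(3), key=lambda j: draws[j])
    reward = 1 if random.random() < true_probs[i] else 0
    alpha[i] += reward
    beta_[i] += 1 - reward
    chosen_counts[i] += 1
```

Early rounds spread choices across images (exploration); as the posteriors concentrate, the best image dominates (exploitation).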
A priori, it is unknown which species are present in each region and what their corresponding frequencies are. Species are shared among populations, and each species can be present in more than one region, with its frequency varying across populations. In this paper we consider the problem of sequentially sampling these populations in order to observe the greatest number of different species.
We adopt a Bayesian nonparametric approach and endow the unknown population distributions with a hierarchical prior. As a consequence of the hierarchical structure, the J unknown discrete probability measures share the same support, that of their common random base measure. Given this prior choice, we propose a sequential rule that, at every time step, given the information available up to that point, selects the population from which to collect the next observation.
Rather than picking the population with the highest posterior estimate of producing a new value, the proposed rule includes a Thompson sampling step to better balance the exploration-exploitation trade-off. We also propose an extension of the algorithm to deal with incidence data, where multiple observations are collected in a time period.
The performance of the proposed algorithms is assessed through a simulation study and compared to three other strategies. Finally, we compare these algorithms using a dataset of species of trees, collected from different plots in South America.
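The Thompson-sampling step in such a sequential rule can be sketched generically (hypothetical populations, and a simple Beta posterior on each population's "new species" rate standing in for the paper's hierarchical model):

```python
import random

random.seed(3)

# Hypothetical populations: lists of species labels with different diversity.
populations = [
    ["a", "b", "c"] * 40,                  # low diversity
    [f"x{i}" for i in range(60)] * 2,      # high diversity
    [f"y{i}" for i in range(12)] * 10,
]
for p in populations:
    random.shuffle(p)

seen = set()
draws = [0, 0, 0]
new_obs = [0, 0, 0]   # draws that revealed a new species
old_obs = [0, 0, 0]

for t in range(90):
    # Thompson step: sample each population's "new species" rate from a
    # Beta posterior and pick the population with the largest sampled rate.
    rates = [random.betavariate(1 + new_obs[j], 1 + old_obs[j]) for j in range(3)]
    j = max(range(3), key=lambda k: rates[k])
    x = populations[j][draws[j]]
    draws[j] += 1
    if x not in seen:
        seen.add(x)
        new_obs[j] += 1
    else:
        old_obs[j] += 1

total_discovered = len(seen)
```

Sampling a rate, rather than using its posterior mean, keeps some probability on under-sampled populations, which is exactly the exploration-exploitation balance described above.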
The problem of estimating discovery probabilities originated in the context of statistical ecology, and in recent years it has become popular due to its frequent appearance in challenging applications arising in genetics, bioinformatics, linguistics, design of experiments, machine learning, etc. A full range of statistical approaches, parametric and nonparametric as well as frequentist and Bayesian, has been proposed for estimating discovery probabilities.
In this article, we investigate the relationships between the celebrated Good-Turing approach, a classical frequentist nonparametric approach, and a Bayesian nonparametric approach recently introduced in the literature. Specifically, under the assumption of a two-parameter Poisson-Dirichlet prior, we show that Bayesian nonparametric estimators of discovery probabilities are asymptotically equivalent, for a large sample size, to suitably smoothed Good-Turing estimators.
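The basic Good-Turing estimators being smoothed here take a very simple form; as a sketch, the estimated probability that the next observation is a species currently seen r times is (r+1) m_{r+1} / n, where m_j counts species observed exactly j times (r = 0 gives the discovery probability):

```python
from collections import Counter

def good_turing(sample, r):
    """Good-Turing estimate of the probability that the next observation is a
    species currently seen exactly r times: (r + 1) * m_{r+1} / n, where
    m_j is the number of species observed exactly j times."""
    counts = Counter(sample)
    m = Counter(counts.values())   # m[j] = number of species seen j times
    return (r + 1) * m[r + 1] / len(sample)

sample = ["a", "a", "a", "b", "b", "c", "d", "e", "f", "f"]
p_new = good_turing(sample, 0)     # discovery probability: 3 singletons / 10
p_twice = good_turing(sample, 2)   # one species ("a") seen 3 times: 3 * 1 / 10
```

The raw estimator is noisy for large r (where m_{r+1} is often zero), which is why the smoothed versions discussed above matter in practice.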
As a by-product of this result, we introduce and investigate a methodology for deriving exact and asymptotic credible intervals to be associated with the Bayesian nonparametric estimators of discovery probabilities. The proposed methodology is illustrated through a comprehensive simulation study and the analysis of Expressed Sequence Tags data generated by sequencing a benchmark complementary DNA library.

User annotations are often noisy, so methods to combine the annotations to produce reliable estimates of the ground truth are necessary.
We claim that considering the existence of clusters of users in this combination step can improve the performance. This is especially important in early stages of crowdsourcing implementations, where the number of annotations is low.
At this stage there is not enough information to accurately estimate the bias introduced by each annotator separately, so we have to resort to models that consider the statistical links among them. In addition, finding these clusters is interesting in itself as knowing the behavior of the pool of annotators allows implementing efficient active learning strategies.
Based on this, we propose two new fully unsupervised models, built on a Chinese restaurant process (CRP) prior and a hierarchical structure, that allow inferring these groups jointly with the ground truth and the properties of the users. Efficient inference algorithms based on Gibbs sampling with auxiliary variables are proposed. Finally, we perform experiments, both on synthetic and real databases, to show the advantages of our models over state-of-the-art algorithms.
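The Chinese restaurant process prior underlying these models can be sampled directly; a minimal sketch of the generic CRP (not the paper's full hierarchical model):

```python
import random

def crp_partition(n, alpha, rng):
    """Sample a partition of n items from a Chinese restaurant process: item t
    joins an existing cluster with probability proportional to its size, or
    starts a new cluster with probability proportional to alpha."""
    clusters = []
    for t in range(n):
        weights = [len(c) for c in clusters] + [alpha]
        total = sum(weights)
        u, acc, k = rng.random() * total, 0.0, 0
        for k, w in enumerate(weights):
            acc += w
            if u < acc:
                break
        if k == len(clusters):
            clusters.append([t])      # open a new cluster
        else:
            clusters[k].append(t)     # join an existing cluster
    return clusters

rng = random.Random(0)
part = crp_partition(100, alpha=2.0, rng=rng)
sizes = sorted((len(c) for c in part), reverse=True)
```

Because the number of clusters is not fixed in advance, the same prior adapts to few or many annotator groups as the data demand.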
We present a novel and compact way of representing the infinite-dimensional component of the model, such that while explicitly representing this infinite component, it has lower memory and storage requirements than previous MCMC schemes.
We describe comparative simulation results demonstrating the efficacy of the proposed MCMC algorithm against existing marginal and conditional MCMC samplers.
Making inference on population structure from genotype data requires identifying the actual subpopulations and assigning individuals to these populations. The source populations are assumed to be in Hardy-Weinberg equilibrium, but the allelic frequencies of these populations, and even the number of populations present in a sample, are unknown.
In this chapter we present a review of some Bayesian parametric and nonparametric models for making inference on population structure, with emphasis on model-based clustering methods. Our aim is to show how recent developments in Bayesian nonparametrics have been usefully exploited in order to introduce natural nonparametric counterparts of some of the most celebrated parametric approaches for inferring population structure.
The algorithm adaptively constructs efficient proposal distributions approximating the local beliefs at each node of the MRF.
This is achieved by considering proposal distributions in the exponential family whose parameters are updated iteratively in an Expectation Propagation (EP) framework. The proposed particle scheme provides consistent estimation of the LBP marginals as the number of particles increases. We demonstrate that it provides more accurate results than the Particle Belief Propagation (PBP) algorithm of Ihler and McAllester, at a fraction of the computational cost, and is additionally more robust empirically.
The computational complexity of our algorithm at each iteration is quadratic in the number of particles. We also propose an accelerated implementation with sub-quadratic computational complexity which still provides consistent estimates of the loopy BP marginal distributions and performs almost as well as the original procedure.

Our result fills this gap. The Mondrian process is a guillotine-partition-valued stochastic process that possesses an elegant self-consistency property.
The first part of the report uses simple concepts from applied probability to define the Mondrian process and explore its properties. The Mondrian process has been used as the main building block of a clever online random forest classification algorithm that turns out to be equivalent to its batch counterpart. We outline a slight adaptation of this algorithm to regression, as the remainder of the report uses regression as a case study of how Mondrian processes can be utilized in machine learning.
In particular, the Mondrian process will be used to construct a fast approximation to the computationally expensive kernel ridge regression problem with a Laplace kernel. The complexity of random guillotine partitions generated by a Mondrian process, and hence the complexity of the resulting regression models, is controlled by a lifetime hyperparameter.
It turns out that these models can be efficiently trained and evaluated for all lifetimes in a given range at once, without needing to retrain them from scratch for each lifetime value. This leads to an efficient procedure for determining the right model complexity for a dataset at hand. The limitation of having a single lifetime hyperparameter will motivate the final Mondrian grid model, in which each input dimension is endowed with its own lifetime parameter.
In this model we preserve the property that its hyperparameters can be tweaked without needing to retrain the modified model from scratch.
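A Mondrian process sample itself is easy to generate; a minimal sketch of the generative process on an axis-aligned box, with the single shared lifetime of the basic model above:

```python
import random

def sample_mondrian(box, budget, rng):
    """Recursively sample a Mondrian partition of an axis-aligned box.
    box: list of (low, high) per dimension; budget: remaining lifetime.
    Returns the list of leaf boxes."""
    lengths = [hi - lo for lo, hi in box]
    rate = sum(lengths)                 # linear dimension of the box
    cost = rng.expovariate(rate)        # waiting time until the next cut
    if cost > budget:
        return [box]                    # lifetime exhausted: this box is a leaf
    # Choose the dimension to cut with probability proportional to its length,
    # then a uniform cut location within it.
    u, acc, d = rng.random() * rate, 0.0, 0
    for d, ln in enumerate(lengths):
        acc += ln
        if u < acc:
            break
    lo, hi = box[d]
    cut = rng.uniform(lo, hi)
    left, right = list(box), list(box)
    left[d], right[d] = (lo, cut), (cut, hi)
    return (sample_mondrian(left, budget - cost, rng)
            + sample_mondrian(right, budget - cost, rng))

rng = random.Random(42)
leaves = sample_mondrian([(0.0, 1.0), (0.0, 1.0)], budget=3.0, rng=rng)
total_area = sum((b[0][1] - b[0][0]) * (b[1][1] - b[1][0]) for b in leaves)
```

Larger budgets (lifetimes) yield finer partitions, which is why the lifetime acts as the complexity hyperparameter discussed above; per-dimension budgets would give the Mondrian grid variant.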
One class of such objects, known as Indian buffet processes, has become a popular tool in machine learning. Based on an equivalence between Indian buffet and scale-invariant Poisson processes, we identify a random scaling variable whose role is similar to that played in exchangeable partition models by the total mass of a random measure.
Analogous to the construction of exchangeable partitions from normalized subordinators, random families of sets can be constructed from randomly scaled subordinators. Coupling to a heavy-tailed scaling variable induces a power law on the number of sets containing the first n elements. Several examples, with properties desirable in applications, are derived explicitly.
A relationship to exchangeable partitions is made precise as a correspondence between scaled subordinators and Poisson-Kingman measures, generalizing a result of Arratia, Barbour and Tavare on scale-invariant processes.

Given multilocus genotype data from a sample of individuals, the model allows classifying individuals as unadmixed or admixed, inferring the number of subpopulations ancestral to an admixed population, and identifying the population of origin of chromosomal regions.
Our model does not assume any specific mutation process and can be applied to most of the commonly used genetic markers. We present an MCMC algorithm to perform posterior inference from the model, and discuss methods to summarise the MCMC output for the analysis of population admixture. We demonstrate the performance of the proposed model in simulations and in a real application, using genetic data from the EDAR gene, which is considered to be ancestry-informative due to well-known variations in allele frequency, as well as phenotypic effects, across ancestry.
In application domains, such as bioinformatics, where there is also demand for probabilistic predictions with measures of uncertainty, the Bayesian additive regression trees (BART) model, introduced by Chipman et al., has proved popular. As data sets have grown in size, however, the standard Metropolis-Hastings algorithms used to perform inference in BART are proving inadequate. In particular, these Markov chains make local changes to the trees and suffer from slow mixing when the data are high-dimensional or the best-fitting trees are more than a few layers deep.
Rather than making local changes to individual trees, the particle Gibbs (PG) sampler proposes a complete tree to fit the residual. Experiments show that the PG sampler outperforms existing samplers in many settings.

We assume that the dataset is partitioned and stored across the nodes of a cluster. Our procedure involves an independent MCMC posterior sampler at each node, based on its local partition of the data.
Moment statistics of the local posteriors are collected from each sampler and propagated across the cluster using expectation propagation message passing with low communication costs. The moment sharing scheme improves posterior estimation quality by enforcing agreement among the samplers.
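In the special case where every local posterior is exactly Gaussian, the moment-sharing combination has a closed form; a sketch in natural parameters (my own toy example with a conjugate normal model, not the general EP message-passing scheme):

```python
# Gaussians in natural parameters: precision lam = 1/var, shift eta = mu/var.
def combine_subposteriors(locals_, prior, J):
    """Combine Gaussian approximations of J subposteriors, each fit on one data
    shard together with the full prior, into a global Gaussian approximation:
    p(theta | all data) prop. to prod_j p(theta | shard_j) / p(theta)^(J-1)."""
    lam_p, eta_p = prior
    lam = sum(l for l, _ in locals_) - (J - 1) * lam_p
    eta = sum(e for _, e in locals_) - (J - 1) * eta_p
    return lam, eta

# Synthetic check: exact local posteriors for x_i ~ N(mu, 1) with prior N(0, 1).
# Shard j with n_j points and sum s_j has posterior lam = 1 + n_j, eta = s_j.
shards = [(10, 21.0), (15, 29.0), (5, 12.0)]   # (n_j, s_j) per node
locals_ = [(1.0 + n, s) for n, s in shards]
lam_g, eta_g = combine_subposteriors(locals_, prior=(1.0, 0.0), J=3)
# Exact full posterior would be lam = 1 + 30 = 31, eta = 62, mean = 2.0.
```

In this conjugate case the combination is exact; with non-Gaussian subposteriors, iterating moment estimates through the samplers corrects the mismatch.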
We demonstrate the speed and inference quality of our method with empirical studies on Bayesian logistic regression and sparse linear regression with a spike-and-slab prior.

We investigate the use of a large class of discrete random probability measures, which is referred to as the class Q, in the context of Bayesian nonparametric mixture modeling.
The class Q encompasses both the two-parameter Poisson-Dirichlet process and the normalized generalized Gamma process, thus allowing us to comparatively study the inferential advantages of these two well-known nonparametric priors. Apart from a highly flexible parameterization, the distinguishing feature of the class Q is the availability of a tractable posterior distribution.
This feature, in turn, allows us to derive an efficient marginal MCMC algorithm for posterior sampling within the framework of mixture models. We demonstrate the efficacy of our modeling framework on both one-dimensional and multi-dimensional datasets.

Markov chain Monte Carlo methods based on Gibbs sampling and split-merge moves are widely used for inference in these models. However, both methods are restricted to limited types of transitions and suffer from torpid mixing and low acceptance rates even for problems of modest size.
We propose a method that considers a broader range of transitions that are close to equilibrium by exploiting multiple chains in parallel and using the past states adaptively to inform the proposal distribution. The method significantly improves on Gibbs and split-merge sampling as quantified using convergence diagnostics and acceptance rates.
Adaptive MCMC methods which use past states to inform the proposal distribution have given rise to many ingenious sampling schemes for continuous problems, and the present work can be seen as an important first step in bringing these benefits to partition-based problems.
This class includes as special cases most of the discrete priors commonly used in Bayesian nonparametrics, such as the two-parameter Poisson-Dirichlet process and the normalized generalized Gamma process.

The particle cascade is an asynchronous, anytime alternative to traditional sequential Monte Carlo algorithms that is amenable to parallel and distributed implementations.
It uses no barrier synchronizations, which leads to improved particle throughput and memory efficiency. It is an anytime algorithm in the sense that it can be run forever to emit an unbounded number of particles while keeping within a fixed memory budget.
We prove that the particle cascade provides an unbiased marginal likelihood estimator which can be straightforwardly plugged into existing pseudo-marginal methods.

We start by developing a Bayesian nonparametric extension of the popular Plackett-Luce choice model that can handle an infinite number of choice items. Our framework is based on the theory of random atomic measures, with the prior specified by a completely random measure. We characterise the posterior distribution given data, and derive a simple and effective Gibbs sampler for posterior simulation.
We then develop a Dirichlet process mixture extension of our model and apply it to investigate the clustering of preferences for college degree programmes amongst Irish secondary school graduates. The existence of clusters of applicants who have similar preferences for degree programmes is established and we determine that subject matter and geographical location of the third level institution characterise these clusters.
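The finite Plackett-Luce model at the core of this construction is simple to work with directly; a sketch with hypothetical weights (the nonparametric extension replaces the finite weight vector with a random measure):

```python
import math
import random

def plackett_luce_logprob(ranking, w):
    """Log-probability of a ranking (best to worst) under Plackett-Luce with
    positive item weights w: at each stage the next item is chosen with
    probability proportional to its weight among those remaining."""
    lp, remaining = 0.0, sum(w[i] for i in ranking)
    for i in ranking:
        lp += math.log(w[i] / remaining)
        remaining -= w[i]
    return lp

def sample_ranking(w, rng):
    """Draw a full ranking by sequential weighted sampling without replacement."""
    items, ranking = list(range(len(w))), []
    while items:
        total = sum(w[i] for i in items)
        u, acc = rng.random() * total, 0.0
        for i in items:
            acc += w[i]
            if u < acc:
                break
        ranking.append(i)
        items.remove(i)
    return ranking

w = [5.0, 2.0, 1.0]
lp_best = plackett_luce_logprob([0, 1, 2], w)    # most likely ranking
lp_worst = plackett_luce_logprob([2, 1, 0], w)
```

For these weights the probability of the ranking [0, 1, 2] is (5/8)(2/3)(1) = 10/24, the highest of all six permutations.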
Making use of some recent posterior characterizations for the class of normalized random measures, we propose novel Markov chain Monte Carlo methods of both marginal type and conditional type. For both the marginal and conditional methods, we consider as a running example a mixture model with an underlying normalized generalized Gamma process prior, and describe comparative simulation results demonstrating the efficacies of the proposed methods.
Decision tree learning is a popular approach for classification and regression in machine learning and statistics, and Bayesian formulations, which introduce a prior distribution over decision trees and formulate learning as posterior inference given data, have been shown to produce competitive performance. Unlike classic decision tree learning algorithms such as ID3 and C4.5, which build a tree in a single fast top-down pass, existing Bayesian approaches rely on iterative local Monte Carlo modifications of complete trees.
We present a sequential Monte Carlo SMC algorithm that instead works in a top-down manner, mimicking the behavior and speed of classic algorithms. We demonstrate empirically that our approach delivers accuracy comparable to the most popular MCMC method, but operates more than an order of magnitude faster, and thus represents a better computation-accuracy tradeoff.
Our model is a tree-structured mixture of potentially exponentially many stochastic blockmodels. We describe a family of greedy agglomerative model selection algorithms whose worst-case running time scales quadratically in the number of vertices of the network, but is independent of the number of communities. Our algorithms are two orders of magnitude faster than the infinite relational model, achieving comparable or better accuracy.
In this paper, we tackle the problem of simulating from the posterior distribution over paths in these models, given partial and noisy observations. Our approach is an auxiliary variable Gibbs sampler, and is based on the idea of uniformization. This sets up a Markov chain over paths by alternately sampling a finite set of virtual jump times given the current path, and then sampling a new path given the set of extant and virtual jump times.
The first step involves simulating a piecewise-constant inhomogeneous Poisson process, while for the second, we use a standard hidden Markov model forward filtering-backward sampling algorithm.
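The uniformization construction behind this sampler can be sketched for a two-state chain (my own toy rate matrix): pick Omega at least as large as every exit rate, form B = I + Q/Omega, and drive the chain with Poisson(Omega) candidate times, where self-transitions under B play the role of the virtual jumps:

```python
import random

# A 2-state Markov jump process with rate matrix Q (rows sum to zero).
Q = [[-1.0, 1.0],
     [2.0, -2.0]]

omega = 1.5 * max(-Q[i][i] for i in range(2))   # uniformization rate >= exit rates
# Discrete-time transition matrix B = I + Q / omega (a proper stochastic matrix).
B = [[(1.0 if i == j else 0.0) + Q[i][j] / omega for j in range(2)] for i in range(2)]
row_sums = [sum(row) for row in B]

def sample_path(T, state, rng):
    """Simulate the MJP on [0, T]: candidate jump times come from a Poisson(omega)
    process; at each, move according to B. Self-transitions are the 'virtual'
    jumps that the Gibbs sampler described above alternately resamples."""
    t, path = 0.0, [(0.0, state)]
    while True:
        t += rng.expovariate(omega)
        if t > T:
            return path
        state = 0 if rng.random() < B[state][0] else 1
        if state != path[-1][1]:
            path.append((t, state))

rng = random.Random(7)
path = sample_path(10.0, 0, rng)
```

Conditioning on both real and virtual jump times turns inference over the continuous-time path into a finite, discrete-time HMM problem, which is what makes forward filtering-backward sampling applicable.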
Our method is exact and does not involve approximations like time discretization. We demonstrate how our sampler extends naturally to MJP-based models like Markov-modulated Poisson processes and continuous-time Bayesian networks, and show significant computational benefits over state-of-the-art MCMC samplers for these models.

We apply this method to latent Dirichlet allocation in an online setting, and demonstrate that it achieves substantial performance improvements over state-of-the-art online variational Bayesian methods.
To exploit invariances, most existing methods resort to approximations that either lead to expensive optimization problems such as semi-definite programming, or rely on separation oracles to retain tractability. Some methods further limit the space of functions and settle for non-convex models.
In this paper, we propose a framework for learning in reproducing kernel Hilbert spaces (RKHS) using local invariances that explicitly characterize the behavior of the target function around data instances. Based on a representer theorem that we establish, our formulation can be efficiently optimized via a convex program.
For the representer theorem to hold, the linear functionals are required to be bounded in the RKHS, and we show that this is true for a variety of commonly used RKHS and invariances. Experiments on learning with unlabeled data and transform invariances show that the proposed method yields better or similar results compared with the state of the art.
A popular approach for large-scale data annotation tasks is crowdsourcing, wherein each data point is labeled by multiple noisy annotators. We consider the problem of inferring ground truth from noisy ordinal labels obtained from multiple annotators of varying and unknown expertise levels. We propose a new model for crowdsourced ordinal data that accounts for instance difficulty as well as annotator expertise, and derive a variational Bayesian inference algorithm for parameter estimation.
Our framework is based on the theory of random atomic measures, with the prior specified by a gamma process. We derive a posterior characterization and a simple and effective Gibbs sampler for posterior simulation. We then develop a time-varying extension of our model, and apply our model to the New York Times lists of weekly bestselling books.
Although simple and effective, the sliding-window approach is also wasteful, unnatural and rigidly hardwired. We propose strategies to search for objects which intelligently explore the space of windows by making sequential observations at locations decided based on previous observations. Our strategies adapt to the class being searched and to the content of a particular test image. Their driving force is exploiting context as the statistical relation between the appearance of a window and its location relative to the object, as observed in the training set.
In addition to being more elegant than sliding windows, we demonstrate experimentally on the PASCAL VOC dataset that our strategies evaluate two orders of magnitude fewer windows while at the same time achieving higher detection accuracy.

We construct an exact MCMC sampler for such systems by alternately sampling a random discretization of time given a trajectory of the system, and then a new trajectory given the discretization. The first step can be performed efficiently using properties of the Poisson process, while the second step can avail of discrete-time MCMC techniques based on the forward-backward algorithm.
We compare our approach to particle MCMC and a uniformization-based sampler, and show its advantages.

Unfortunately, the adoption of neural probabilistic language models (NPLMs) is held back by their notoriously long training times, which can be measured in weeks even for moderately sized datasets.
These are a consequence of the models being explicitly normalized, which leads to having to consider all words in the vocabulary when computing the log-likelihood gradients.
We propose a fast and simple algorithm for training NPLMs based on noise-contrastive estimation, a newly introduced procedure for estimating unnormalized continuous distributions.
We investigate the behaviour of the algorithm on the Penn Treebank corpus and show that it reduces the training times by more than an order of magnitude without affecting the quality of the resulting models. The algorithm is also more efficient and much more stable than importance sampling because it requires far fewer noise samples to perform well. We demonstrate the scalability of the proposed approach by training several neural language models on a 47M-word corpus with an 80K-word vocabulary, obtaining state-of-the-art results in the Microsoft Research Sentence Completion Challenge.
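The core of NCE is to sidestep the normalizing constant by fitting the model through logistic discrimination between observed words and k samples from a tractable noise distribution q, using the logit s(w) - log(k q(w)) where s is the unnormalized model log-score. A minimal sketch of the objective (the toy setup and all names are assumptions; the paper applies this to log-bilinear language models):

```python
import numpy as np

def nce_loss(model_logscore, data_words, noise_words, log_q, k):
    """Noise-contrastive estimation objective for an unnormalized model.

    model_logscore(w): unnormalized log-score s(w), vectorized over arrays.
    data_words: observed word ids; noise_words: (len(data), k) samples from
    the noise distribution with log-pmf table log_q.  Returns the NCE loss
    (negative of the objective to be maximized).
    """
    def log_sigmoid(z):
        return -np.logaddexp(0.0, -z)  # numerically stable log sigma(z)
    # data term: classify the true words as 'data'
    z_data = model_logscore(data_words) - (np.log(k) + log_q[data_words])
    # noise term: classify the k noise samples per datum as 'noise'
    z_noise = model_logscore(noise_words) - (np.log(k) + log_q[noise_words])
    return -(log_sigmoid(z_data).sum() + log_sigmoid(-z_noise).sum())
```

Because only the k noise words (rather than the whole vocabulary) enter each gradient, the per-example cost is independent of vocabulary size, which is the source of the speedup described above.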
We parametrize policies using energy-based models, particularly restricted Boltzmann machines, and train them using policy gradient learning. Our approach builds upon Sallans and Hinton, who parameterized value functions using energy-based models, trained using a non-linear variant of temporal-difference (TD) learning.
Unfortunately, non-linear TD is known to diverge in theory and practice. We introduce the first sound and efficient algorithm for training energy-based policies, based on an actor-critic architecture.
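For binary hidden units the RBM free energy has the closed form F(v) = -b.v - sum_j log(1 + exp(c_j + W_j.v)), and an energy-based policy can be read off as pi(a|s) proportional to exp(-F([s; a])). A small sketch of this parameterization (dimensions and names are illustrative assumptions; the actor-critic training itself is omitted):

```python
import numpy as np

def rbm_free_energy(v, W, b, c):
    """Free energy of an RBM with binary hidden units, visible vector v."""
    # F(v) = -b.v - sum_j log(1 + exp(c_j + W_j . v))
    return -b @ v - np.logaddexp(0.0, c + W @ v).sum()

def policy(state, actions, W, b, c):
    """Boltzmann policy pi(a|s) proportional to exp(-F([s; a]))."""
    energies = np.array([rbm_free_energy(np.concatenate([state, a]), W, b, c)
                         for a in actions])
    p = np.exp(-(energies - energies.min()))  # stabilized softmax over -F
    return p / p.sum()
```

Enumerating actions is only feasible for small action sets; part of the appeal of energy-based policies is that the hidden units let the model represent correlated, high-dimensional action preferences.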
Our algorithm is computationally efficient, converges close to a local optimum, and outperforms Sallans and Hinton in several high-dimensional domains.

The partitions at consecutive locations in the genome are related by their clusters first splitting and then merging. Our model can be thought of as a discrete time analogue of continuous time fragmentation-coagulation processes [Teh et al.], preserving the important properties of projectivity, exchangeability and reversibility, while being more scalable.
We apply this model to the problem of genotype imputation, showing improved computational efficiency while maintaining the same accuracies as in [Teh et al.].

Modulated renewal processes allow these distributions to vary with time, allowing the introduction of nonstationarity.
In this work, we take a nonparametric Bayesian approach, modeling this nonstationarity with a Gaussian process. Our approach is based on the idea of uniformization, allowing us to draw exact samples from an otherwise intractable distribution. We develop a novel and efficient MCMC sampler for posterior inference, and test it on a number of synthetic and real datasets.

In this paper, we tackle the problem of inferring unobserved paths in these models by introducing a fast auxiliary variable Gibbs sampler.
Our approach is based on the idea of uniformization, and sets up a Markov chain over paths by sampling a finite set of virtual jump times and then running a standard hidden Markov model forward filtering-backward sampling algorithm over states at the set of extant and virtual jump times.
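The forward filtering-backward sampling step over the extant and virtual jump times is the standard discrete-HMM routine: filter forwards with the transition matrix and per-time observation likelihoods, then sample states backwards from the filtered messages. A minimal sketch (the names and the toy transition matrix are illustrative assumptions, not the paper's code):

```python
import numpy as np

def ffbs(log_init, B, log_lik, rng):
    """Forward filtering-backward sampling for a discrete-state chain.

    log_init: (S,) log initial distribution; B: (S, S) transition matrix;
    log_lik: (T, S) per-time observation log-likelihoods.
    Returns one exact joint posterior sample of the state sequence.
    """
    T, S = log_lik.shape
    logB = np.log(B)
    log_alpha = np.zeros((T, S))
    log_alpha[0] = log_init + log_lik[0]
    for t in range(1, T):  # forward pass: filter in log space
        log_alpha[t] = log_lik[t] + \
            np.logaddexp.reduce(log_alpha[t - 1][:, None] + logB, axis=0)
    states = np.empty(T, dtype=int)
    w = np.exp(log_alpha[-1] - log_alpha[-1].max())
    states[-1] = rng.choice(S, p=w / w.sum())
    for t in range(T - 2, -1, -1):  # backward pass: sample
        lw = log_alpha[t] + logB[:, states[t + 1]]
        w = np.exp(lw - lw.max())
        states[t] = rng.choice(S, p=w / w.sum())
    return states
```

Because the backward pass conditions on the sampled future state, the returned sequence is a draw from the exact joint posterior rather than a concatenation of marginal modes.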
We demonstrate significant computational benefits over a state-of-the-art Gibbs sampler on a number of continuous time Bayesian networks.

We describe a method for generating independent samples from univariate density functions using adaptive rejection sampling without the log-concavity requirement.
The method makes use of the fact that many functions can be expressed as a sum of concave and convex functions. Using a concave-convex decomposition, we bound the log-density by separately bounding the concave and convex parts using piecewise linear functions. The upper bound can then be used as the proposal distribution in rejection sampling. We demonstrate the applicability of the concave-convex approach on a number of standard distributions and describe an application to the efficient construction of sequential Monte Carlo proposal distributions for inference over genealogical trees.
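The bounding step can be illustrated with a single segment on a finite interval: write log f = g + h with g concave and h convex, upper-bound g by its tangent at the segment midpoint and h by its chord; the sum is linear, so exponentiating it gives a truncated-exponential proposal that is trivial to sample by inverse CDF. A sketch under illustrative assumptions (the target density, interval and names are choices of this sketch, and a real sampler would use many segments refined adaptively):

```python
import numpy as np

def ccars_sample(n, rng):
    """One-segment concave-convex rejection sampling illustration.

    Target (unnormalized) on [0, 3]: f(x) = exp(-x**2/2 + x), split as
    g(x) = -x**2/2 (concave) and h(x) = x (convex).  Bound g by its tangent
    at the midpoint m and h by its chord (h is linear, so the chord is h
    itself); the sum u(x) = 1.125 - 0.5*x is a linear upper bound on log f,
    so exp(u) restricted to [0, 3] is a truncated-exponential proposal.
    """
    a, b, m = 0.0, 3.0, 1.5
    logf = lambda x: -x**2 / 2 + x
    u = lambda x: (-m**2 / 2 - m * (x - m)) + x  # tangent(g) + h
    out = []
    z = 1.0 - np.exp(-0.5 * (b - a))  # CDF mass of exp(-0.5 x) on [a, b]
    while len(out) < n:
        # inverse-CDF draw from density prop. to exp(-0.5 x) on [a, b]
        x = a - 2.0 * np.log(1.0 - rng.random() * z)
        if np.log(rng.random()) < logf(x) - u(x):  # accept w.p. f(x)/exp(u(x))
            out.append(x)
    return np.array(out)
```

Adding more segments tightens the piecewise-linear envelope and drives the rejection rate down, which is the adaptive part of the method described above.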