Conditional independence

Papers published on a yearly basis

Papers

Journal Article • DOI: 10.1162/089976602760128018

Training products of experts by minimizing contrastive divergence

Geoffrey E. Hinton¹

University College London¹

01 Aug 2002-Neural Computation

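TL;DR: A product of experts (PoE) combines latent-variable models by multiplying their distributions and renormalizing. This makes inference fast but maximum-likelihood training hard, because the derivatives of the renormalization term are difficult even to approximate; contrastive divergence is an alternative objective whose derivatives can be approximated accurately and efficiently.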

Abstract: It is possible to combine multiple latent-variable models of the same data by multiplying their probability distributions together and then renormalizing. This way of combining individual "expert" models makes it hard to generate samples from the combined model but easy to infer the values of the latent variables of each expert, because the combination rule ensures that the latent variables of different experts are conditionally independent when given the data. A product of experts (PoE) is therefore an interesting candidate for a perceptual system in which rapid inference is vital and generation is unnecessary. Training a PoE by maximizing the likelihood of the data is difficult because it is hard even to approximate the derivatives of the renormalization term in the combination rule. Fortunately, a PoE can be trained using a different objective function called "contrastive divergence" whose derivatives with regard to the parameters can be approximated accurately and efficiently. Examples are presented of contrastive divergence learning using several types of expert on several types of data.
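The conditional independence claim in this abstract can be checked numerically on a toy model. The sketch below assumes a tiny binary restricted Boltzmann machine as the product of experts, with one hidden unit per expert; the weights, biases, and all function names are hypothetical values chosen for illustration, not taken from the paper. It enumerates every configuration exactly and measures how far the joint distribution of the two hidden units is from the product of its marginals, first given the data and then marginally.

```python
import itertools
import math

# Hypothetical parameters for a 3-visible, 2-hidden binary RBM (illustrative only).
W = [[0.5, -1.0], [1.5, 0.3], [-0.7, 0.8]]  # W[i][j]: visible i <-> hidden j
B_VIS = [0.1, -0.2, 0.0]                    # visible biases
C_HID = [0.4, -0.6]                         # hidden biases

HIDDEN = list(itertools.product([0, 1], repeat=2))
VISIBLE = list(itertools.product([0, 1], repeat=3))


def weight(v, h):
    """Unnormalized probability exp(-energy) of a joint configuration (v, h)."""
    score = sum(B_VIS[i] * v[i] for i in range(3))
    score += sum(C_HID[j] * h[j] for j in range(2))
    score += sum(v[i] * W[i][j] * h[j] for i in range(3) for j in range(2))
    return math.exp(score)


def normalize(weights):
    """Turn a dict of nonnegative weights into a probability distribution."""
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items()}


def factorization_gap(p):
    """Largest |p(h1, h2) - p(h1) p(h2)| over the four hidden configurations."""
    p1 = p[(1, 0)] + p[(1, 1)]  # p(h1 = 1)
    p2 = p[(0, 1)] + p[(1, 1)]  # p(h2 = 1)
    gap = 0.0
    for (h1, h2), prob in p.items():
        m1 = p1 if h1 else 1.0 - p1
        m2 = p2 if h2 else 1.0 - p2
        gap = max(gap, abs(prob - m1 * m2))
    return gap


# Given the data v, the hidden units factorize (gap at floating-point level).
conditional_gap = max(
    factorization_gap(normalize({h: weight(v, h) for h in HIDDEN}))
    for v in VISIBLE
)

# With v summed out, the hidden units are dependent (gap clearly above zero).
marginal_gap = factorization_gap(
    normalize({h: sum(weight(v, h) for v in VISIBLE) for h in HIDDEN})
)

print(conditional_gap, marginal_gap)
```

The conditional gap comes out at rounding-error level for every visible vector, while the marginal gap does not, which is the sense in which inference in such a model reduces to independent per-expert computations once the data are observed.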

6,239 citations

Journal Article • DOI: 10.1080/01621459.1993.10594284

Approximate inference in generalized linear mixed models

Norman E. Breslow¹, D. G. Clayton

University of Washington¹

01 Mar 1993-Journal of the American Statistical Association

TL;DR: Within the generalized linear mixed model (GLMM) framework, a Laplace approximation to the marginal quasi-likelihood leads to estimating equations based on penalized quasi-likelihood (PQL) for the mean parameters and pseudo-likelihood for the variances. For time series, spatial aggregation, and smoothing problems, the dispersion may be specified through a rank-deficient inverse covariance matrix.

Abstract: Statistical approaches to overdispersion, correlated errors, shrinkage estimation, and smoothing of regression relationships may be encompassed within the framework of the generalized linear mixed model (GLMM). Given an unobserved vector of random effects, observations are assumed to be conditionally independent with means that depend on the linear predictor through a specified link function and conditional variances that are specified by a variance function, known prior weights and a scale factor. The random effects are assumed to be normally distributed with mean zero and dispersion matrix depending on unknown variance components. For problems involving time series, spatial aggregation and smoothing, the dispersion may be specified in terms of a rank deficient inverse covariance matrix. Approximation of the marginal quasi-likelihood using Laplace's method leads eventually to estimating equations based on penalized quasilikelihood or PQL for the mean parameters and pseudo-likelihood for the variances. Im...
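The model structure described in this abstract can be written compactly. The notation below is an assumption made for illustration (the abstract itself introduces no symbols): b is the random-effects vector, g the link function, v the variance function, a_i a known constant determined by the prior weight, phi the scale factor, and D(theta) the dispersion matrix depending on the variance components theta.

```latex
\begin{align*}
p(y_1, \dots, y_n \mid b) &= \prod_{i=1}^{n} p(y_i \mid b)
  && \text{(conditional independence given } b\text{)} \\
\mathbb{E}[y_i \mid b] &= \mu_i^{b},
  \qquad g(\mu_i^{b}) = x_i^{\top}\alpha + z_i^{\top} b
  && \text{(linear predictor through the link)} \\
\operatorname{Var}(y_i \mid b) &= \phi \, a_i \, v(\mu_i^{b})
  && \text{(variance function, prior weight, scale)} \\
b &\sim \mathcal{N}\bigl(0,\, D(\theta)\bigr)
  && \text{(normal random effects)}
\end{align*}
```

Marginally the observations are correlated, because they share b; the independence holds only after conditioning on the random effects.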

4,657 citations

Journal Article • DOI: 10.1023/A:1022623210503

Learning Bayesian Networks: The Combination of Knowledge and Statistical Data

David Heckerman¹, Dan Geiger¹, David Maxwell Chickering¹

Microsoft¹

15 Sep 1995-Machine Learning

TL;DR: This article presents a Bayesian approach for learning Bayesian networks from a combination of prior knowledge and statistical data. The approach rests on previously made assumptions together with likelihood equivalence, which says that data should not help to discriminate network structures that represent the same assertions of conditional independence.

Abstract: We describe a Bayesian approach for learning Bayesian networks from a combination of prior knowledge and statistical data. First and foremost, we develop a methodology for assessing informative priors needed for learning. Our approach is derived from a set of assumptions made previously as well as the assumption of likelihood equivalence, which says that data should not help to discriminate network structures that represent the same assertions of conditional independence. We show that likelihood equivalence when combined with previously made assumptions implies that the user's priors for network parameters can be encoded in a single Bayesian network for the next case to be seen (a prior network) and a single measure of confidence for that network. Second, using these priors, we show how to compute the relative posterior probabilities of network structures given data. Third, we describe search methods for identifying network structures with high posterior probabilities. We describe polynomial algorithms for finding the highest-scoring network structures in the special case where every node has at most k = 1 parent. For the general case (k > 1), which is NP-hard, we review heuristic search algorithms including local search, iterative local search, and simulated annealing. Finally, we describe a methodology for evaluating Bayesian-network learning algorithms, and apply this approach to a comparison of various approaches.
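Likelihood equivalence can be illustrated on the smallest possible case. The two-node structures X → Y and Y → X make the same (empty) set of conditional independence assertions, so an equivalent scoring rule should give them the same score. The sketch below is a minimal illustration, assuming a Dirichlet-multinomial marginal likelihood with uniform equivalent-sample-size hyperparameters (a choice commonly called BDeu, used here as an assumption rather than something the abstract specifies); the contingency table and all function names are hypothetical.

```python
import math


def family_score(counts_by_parent, ess):
    """Log marginal likelihood of one node under a uniform Dirichlet prior.

    counts_by_parent holds one list of state counts per parent configuration.
    The equivalent sample size `ess` is spread evenly over all cells.
    """
    q = len(counts_by_parent)      # number of parent configurations
    r = len(counts_by_parent[0])   # number of states of the node
    a_j = ess / q
    a_jk = ess / (q * r)
    total = 0.0
    for counts in counts_by_parent:
        total += math.lgamma(a_j) - math.lgamma(a_j + sum(counts))
        total += sum(math.lgamma(a_jk + n) - math.lgamma(a_jk) for n in counts)
    return total


def transpose(table):
    return [list(col) for col in zip(*table)]


def score_x_to_y(table, ess=1.0):
    """Score of X -> Y, where table[x][y] counts cases with X = x, Y = y."""
    x_counts = [sum(row) for row in table]
    return family_score([x_counts], ess) + family_score(table, ess)


def score_y_to_x(table, ess=1.0):
    """Score of Y -> X: the same computation on the transposed table."""
    return score_x_to_y(transpose(table), ess)


def score_independent(table, ess=1.0):
    """Score of the empty graph, which does assert that X and Y are independent."""
    x_counts = [sum(row) for row in table]
    y_counts = [sum(col) for col in zip(*table)]
    return family_score([x_counts], ess) + family_score([y_counts], ess)


# Hypothetical counts for two binary variables.
table = [[30, 10], [5, 25]]
print(score_x_to_y(table), score_y_to_x(table), score_independent(table))
```

The first two scores agree up to rounding, so the data cannot separate the two equivalent structures, while the empty graph encodes a different independence assertion and receives a different score.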

4,427 citations

Journal Article • DOI: 10.1214/009053606000000281

High-dimensional graphs and variable selection with the Lasso

Nicolai Meinshausen, Peter Bühlmann

01 Jun 2006-Annals of Statistics

TL;DR: Neighborhood selection with the Lasso is shown to be a computationally attractive alternative to standard covariance selection for sparse high-dimensional graphs. It estimates the conditional independence restrictions separately for each node and is therefore equivalent to variable selection for Gaussian linear models.

Abstract: The pattern of zero entries in the inverse covariance matrix of a multivariate normal distribution corresponds to conditional independence restrictions between variables. Covariance selection aims at estimating those structural zeros from data. We show that neighborhood selection with the Lasso is a computationally attractive alternative to standard covariance selection for sparse high-dimensional graphs. Neighborhood selection estimates the conditional independence restrictions separately for each node in the graph and is hence equivalent to variable selection for Gaussian linear models. We show that the proposed neighborhood selection scheme is consistent for sparse high-dimensional graphs. Consistency hinges on the choice of the penalty parameter. The oracle value for optimal prediction does not lead to a consistent neighborhood estimate. Controlling instead the probability of falsely joining some distinct connectivity components of the graph, consistent estimation for sparse graphs is achieved (with exponential rates), even when the number of variables grows as the number of observations raised to an arbitrary power.
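The link between zeros in the inverse covariance matrix and per-node regression can be seen in a three-variable example. The sketch below works at the population level with exact rational arithmetic; it does not run the Lasso or use any data, and the precision matrix and helper names are hypothetical. It assumes a chain graph 0 - 1 - 2, so variables 0 and 2 are conditionally independent given variable 1.

```python
from fractions import Fraction


def inverse(matrix):
    """Gauss-Jordan inverse using exact fractions (assumes an invertible matrix)."""
    n = len(matrix)
    aug = [
        [Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def regression_coefficients(sigma, target):
    """Population least-squares coefficients of one variable on all the others."""
    others = [j for j in range(len(sigma)) if j != target]
    s_oo_inv = inverse([[sigma[a][b] for b in others] for a in others])
    s_to = [sigma[target][a] for a in others]
    coeffs = [
        sum(s_to[a] * s_oo_inv[a][b] for a in range(len(others)))
        for b in range(len(others))
    ]
    return dict(zip(others, coeffs))


# Hypothetical precision matrix of a chain 0 - 1 - 2: entry (0, 2) is zero.
precision = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
sigma = inverse(precision)
beta = regression_coefficients(sigma, target=0)

print(sigma[0][2])  # covariance between variables 0 and 2
print(beta)         # regression of variable 0 on variables 1 and 2
```

Variables 0 and 2 are marginally correlated (their covariance is nonzero), yet the regression of variable 0 on the rest puts a zero coefficient on variable 2, matching the zero in the precision matrix. Recovering that zero pattern from data, one node at a time, is the variable selection problem the abstract refers to.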

4,258 citations

Journal Article • DOI: 10.1214/009053606000000281

High-dimensional graphs and variable selection with the Lasso

Nicolai Meinshausen, Peter Bühlmann

01 Aug 2006-arXiv: Statistics Theory

TL;DR: Neighborhood selection with the Lasso is proposed as a computationally attractive alternative to standard covariance selection for sparse high-dimensional graphs. Because it treats each node separately, it is equivalent to variable selection for Gaussian linear models.

3,063 citations

Year	Papers
2026	1
2025	29
2024	70
2023	184
2022	288
2021	176
