Latent tree models for multivariate density estimation: algorithms and applications

Open Access Dissertation

- 01 Jan 2009

TL;DR: This thesis investigates a model family called latent tree models for the task of density estimation and proposes two learning algorithms for it, HCL and Pyramid, which have distinct characteristics and are suitable for different applications.

Abstract: Multivariate density estimation is a fundamental problem in Applied Statistics and Machine Learning. Given a collection of data sampled from an unknown distribution, the task is to approximately reconstruct the generative distribution. There are two approaches to the problem: the parametric approach and the non-parametric approach. In the parametric approach, the approximate distribution is represented by a model from a predetermined family. In this thesis, we adopt the parametric approach and investigate the use of a model family called latent tree models for the task of density estimation. Latent tree models are tree-structured Bayesian networks in which leaf nodes represent observed variables, while internal nodes represent hidden variables. Such models can represent complex relationships among observed variables while still admitting efficient inference among them. Consequently, they are a desirable tool for density estimation.

While latent tree models are studied for the first time in this thesis for the purpose of density estimation, they have been investigated earlier for clustering and latent structure discovery. Several algorithms for learning latent tree models have been proposed. The state of the art is an algorithm called EAST. EAST determines model structures through principled and systematic search, and determines model parameters using the EM algorithm. It has been shown to achieve a good trade-off between fit to data and model complexity. It is also capable of discovering latent structures behind data. Unfortunately, it has a high computational complexity, which limits its applicability to density estimation problems.

In this thesis, we propose two latent tree model learning algorithms specifically for density estimation. The two algorithms have distinct characteristics and are suitable for different applications. The first algorithm is called HCL. HCL assumes a predetermined bound on model complexity and restricts itself to binary model structures. It first builds a binary tree structure based on mutual information and then runs the EM algorithm once on the resulting structure to determine the parameters. As such, it is efficient and can deal with large applications. The second algorithm is called Pyramid. Pyramid does not assume predetermined bounds on model complexity and does not restrict itself to binary tree structures. It builds model structures using heuristics based on mutual information and local search. It is slower than HCL. However, it is faster than EAST and is only slightly inferior to EAST in terms of the quality of the resulting models.

In this thesis, we also study two applications of the density estimation techniques that we develop. The first application is approximate probabilistic inference in Bayesian networks. A Bayesian network represents a joint distribution over a set of random variables. It often happens that the network structure is very complex and making inference directly on the network is computationally intractable. We propose to approximate the joint distribution using a latent tree model and exploit the latent tree model for faster inference. The idea is to sample data from the Bayesian network, learn a latent tree model from the data offline, and, when online, make inference with the latent tree model instead of the original Bayesian network. HCL is used here because the sample size needs to be large to produce an accurate approximation and it is possible to predetermine a bound on the online running time. Empirical evidence shows that this method can achieve good approximation accuracy at low online computational cost.

The second application is classification. A common approach to this task is to formulate it as a density estimation problem: one constructs the class-conditional density for each class and then uses Bayes' rule to classify. We propose to estimate those class-conditional densities using either EAST or Pyramid. Empirical evidence shows that this method yields good classification performance. Moreover, the latent tree models built for the class-conditional densities are often meaningful, which is conducive to user confidence. A comparison between EAST and Pyramid reveals that Pyramid is significantly more efficient than EAST, while it achieves roughly the same classification performance.
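The classification scheme described in the abstract (one class-conditional density per class, combined through Bayes' rule) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the thesis: in particular, `IndependentCategoricalDensity` is only a placeholder that treats attributes as independent, standing in for the latent tree models that the thesis would learn with EAST or Pyramid. The sketch assumes discrete attributes that each take values in `0 .. n_values - 1`.

```python
import math
from collections import Counter


class IndependentCategoricalDensity:
    """Placeholder density estimator (an assumption of this sketch).

    It treats attributes as independent and is NOT a latent tree model;
    it only stands in for one so the Bayes-rule step can be shown.
    """

    def __init__(self, rows, n_values, alpha=1.0):
        self.n_values = n_values      # number of states per attribute
        self.alpha = alpha            # Laplace smoothing constant
        self.n_rows = len(rows)
        n_attrs = len(rows[0])
        # One value-count table per attribute.
        self.counts = [Counter(row[j] for row in rows) for j in range(n_attrs)]

    def log_prob(self, x):
        """Return log P(x | class) under the independence assumption."""
        total = 0.0
        for j, value in enumerate(x):
            num = self.counts[j][value] + self.alpha
            den = self.n_rows + self.alpha * self.n_values
            total += math.log(num / den)
        return total


def fit_classifier(X, y, n_values):
    """Estimate class priors and one class-conditional density per class."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    priors = {c: len(rows) / len(X) for c, rows in by_class.items()}
    densities = {
        c: IndependentCategoricalDensity(rows, n_values)
        for c, rows in by_class.items()
    }
    return priors, densities


def posterior(x, priors, densities):
    """Bayes' rule: P(c | x) is proportional to P(c) * P(x | c)."""
    log_joint = {c: math.log(priors[c]) + densities[c].log_prob(x) for c in priors}
    shift = max(log_joint.values())   # subtract the max for numerical stability
    unnorm = {c: math.exp(v - shift) for c, v in log_joint.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


def classify(x, priors, densities):
    """Pick the class with the highest posterior probability."""
    post = posterior(x, priors, densities)
    return max(post, key=post.get)
```

Swapping the placeholder for any other estimator that exposes a `log_prob(x)` method leaves `posterior` and `classify` unchanged, which is the point of formulating classification as density estimation: the quality of the classifier then rests on the quality of the class-conditional densities.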

Citations

•Journal Article•10.1016/J.IJAR.2012.08.001

Model-based clustering of high-dimensional data: Variable selection versus facet determination

Leonard K. M. Poon, +3 more

- 01 Jan 2013

- International Journal of Approximate Rea...

TL;DR: This paper proposes a generalization of Gaussian mixture models and demonstrates its ability to automatically identify natural facets of data and cluster data along each of those facets simultaneously. It also shows that facet determination usually leads to better clustering results than variable selection.

47

•Journal Article•10.1016/J.IJAR.2012.06.024

LTC: A latent tree approach to classification

Yi Wang, +3 more

- 01 Jun 2013

- International Journal of Approximate Rea...

TL;DR: This paper proposes a novel generative classifier called the latent tree classifier (LTC), which represents each class-conditional distribution of attributes using a latent tree model and uses Bayes' rule to make predictions.

10

•Proceedings Article•10.1109/IJCNN.2016.7727414

Semi-hierarchical naïve Bayes classifier

Hasna Njah, +2 more

- 24 Jul 2016

TL;DR: A new semi-hierarchical naïve Bayes classifier that uses latent variables to abstract the features of a given dataset in order to reduce the dimensionality, and that is suitable for finding graphically and semantically analyzable models.

8

•Book Chapter•10.1007/978-3-642-22152-1_35

Latent tree classifier

Yi Wang, +3 more

- 29 Jun 2011

TL;DR: This work proposes a novel generative model for classification called the latent tree classifier (LTC), which represents each class-conditional distribution of attributes using a latent tree model and uses Bayes' rule to make predictions.

5

•Journal Article

Latent tree classifier

Yi Wang, +3 more

- 01 Jan 2011

- Lecture Notes in Computer Science

TL;DR: In this article, a generative model called the latent tree classifier (LTC) is proposed; it represents each class-conditional distribution of attributes using a latent tree model and uses Bayes' rule to make predictions.

5
