Open Access, Posted Content
Deep Exponential Families
TL;DR: An extensive study shows that going beyond one layer improves predictions for deep exponential families (DEFs), that DEFs find interesting exploratory structure in large data sets, and that they give better predictive performance than state-of-the-art models.
Abstract: We describe deep exponential families (DEFs), a class of latent variable models that are inspired by the hidden structures used in deep neural networks. DEFs capture a hierarchy of dependencies between latent variables, and are easily generalized to many settings through exponential families. We perform inference using recent "black box" variational inference techniques. We then evaluate various DEFs on text and combine multiple DEFs into a model for pairwise recommendation data. In an extensive study, we show that going beyond one layer improves predictions for DEFs. We demonstrate that DEFs find interesting exploratory structure in large data sets, and give better predictive performance than state-of-the-art models.
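The hierarchy of dependencies described in the abstract can be illustrated with a minimal sketch of ancestral sampling: a top latent layer is drawn from a prior, each lower layer is drawn conditionally on the layer above, and observations are drawn conditionally on the lowest layer. The sketch assumes gamma-distributed latent variables with a fixed shape, gamma-distributed weights, and Poisson word counts. The layer sizes, the hyperparameter values, and the function names (`sample_poisson`, `sample_gamma`, `sample_document`) are hypothetical choices for illustration, not the authors' implementation.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw a Poisson count with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_gamma(shape, mean, rng):
    """Draw from a gamma with fixed shape and the given mean (scale = mean / shape)."""
    # The floor is a numerical guard added for this sketch, not part of the model.
    return rng.gammavariate(shape, max(mean, 1e-12) / shape)


def sample_document(n_top=3, n_bottom=5, vocab_size=8, shape=0.1, seed=0):
    """Ancestral sampling through two latent layers down to word counts."""
    rng = random.Random(seed)
    # Hypothetical positive weights; sizes and hyperparameters are illustrative only.
    w1 = [[rng.gammavariate(shape, 1.0) for _ in range(n_bottom)] for _ in range(n_top)]
    w0 = [[rng.gammavariate(shape, 1.0) for _ in range(vocab_size)] for _ in range(n_bottom)]

    # Top layer: drawn from a fixed prior with mean 1 (an assumed value).
    z_top = [sample_gamma(shape, 1.0, rng) for _ in range(n_top)]
    # Lower layer: each unit's mean is a weighted sum of the layer above.
    z_bottom = [
        sample_gamma(shape, sum(z_top[j] * w1[j][k] for j in range(n_top)), rng)
        for k in range(n_bottom)
    ]
    # Observations: Poisson counts whose rates depend on the lowest latent layer.
    counts = [
        sample_poisson(sum(z_bottom[k] * w0[k][v] for k in range(n_bottom)), rng)
        for v in range(vocab_size)
    ]
    return z_top, z_bottom, counts


if __name__ == "__main__":
    z_top, z_bottom, counts = sample_document()
    print("word counts:", counts)
```

Swapping the gamma draws for another exponential family member, or changing how the weighted sum maps to the conditional mean, would give a different member of the model class; this sketch covers only one such choice.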
Figures

Table 1: A summary of all the DEFs we present in terms of their layer distributions, weight distributions, and link functions. 
Figure 4: A fraction of the three-layer topic hierarchy of the Science corpus. The top words are shown for each “topic.” The arrows represent hierarchical groupings. We choose the top three components at each layer. Similar “topics” are grouped into “super topics.” The two “concepts” share a “super topic.”
Table 2: Perplexity on a held-out collection of 1K Science and NYT documents. Lower values are better. The DEF W column indicates the type of prior distribution over the DEF weights: Γ for the gamma prior and N for the normal prior. Recall that one-layer DEFs consist only of a layer of latent variables, so we represent their prior with ∅.
Figure 1: A fraction of the three-layer topic hierarchy on 166K articles from The New York Times. The top words are shown for each topic. The arrows represent hierarchical groupings.
Figure 2: The deep exponential family with V observations. 
Figure 3: Draws from the Poisson (blue) and sparse gamma (orange) distributions with low and high mean. The shape of the sparse gamma is held fixed. Note that the high mean shifts the Poisson, while it does not shift the sparse gamma. Notice the spike-and-slab appearance of the sparse gamma distribution.
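The contrast described in the Figure 3 caption can be checked numerically with a small sketch. It assumes that "sparse gamma" means a gamma distribution with a small fixed shape (0.1 here, an illustrative value) whose mean is set through the scale. The helper names `sample_poisson` and `compare_draws` are hypothetical. The sketch compares sample medians at a low and a high mean: raising the mean should move the bulk of the Poisson draws, while most gamma draws should stay close to zero and only the tail stretches.

```python
import math
import random
import statistics


def sample_poisson(rate, rng):
    """Draw a Poisson count with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def compare_draws(mean, shape=0.1, n=2000, seed=0):
    """Return the sample medians of Poisson and fixed-shape gamma draws with the same mean."""
    rng = random.Random(seed)
    poisson_draws = [sample_poisson(mean, rng) for _ in range(n)]
    # Fixed shape, so the mean is controlled only through the scale (mean / shape).
    gamma_draws = [rng.gammavariate(shape, mean / shape) for _ in range(n)]
    return statistics.median(poisson_draws), statistics.median(gamma_draws)


if __name__ == "__main__":
    for mean in (1.0, 10.0):
        poisson_median, gamma_median = compare_draws(mean)
        print(f"mean={mean}: Poisson median={poisson_median}, gamma median={gamma_median:.4f}")
```

The median is used instead of the sample mean because both distributions have the same mean by construction. The difference lies in where most of the probability mass sits.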