About: Multinomial distribution is a research topic. Over the lifetime, 3622 publications have been published within this topic receiving 114213 citations.
TL;DR: A First Course in Probability (8th ed.) by S. Ross is a lively text that covers the basic ideas of probability theory including those needed in statistics.
Abstract: Office hours: MWF, immediately after class or early afternoon (time TBA). We will cover the mathematical foundations of probability theory. The basic terminology and concepts of probability theory include: random experiments, sample or outcome spaces (discrete and continuous case), events and their algebra, probability measures, conditional probability A First Course in Probability (8th ed.) by S. Ross. This is a lively text that covers the basic ideas of probability theory including those needed in statistics. Theoretical concepts are introduced via interesting concrete examples. In 394 I will begin my lectures with the basics of probability theory in Chapter 2. However, your first assignment is to review Chapter 1, which treats elementary counting methods. They are used in applications in Chapter 2. I expect to cover Chapters 2-5 plus portions of 6 and 7. You are encouraged to read ahead. In lectures I will not be able to cover every topic and example in Ross, and conversely, I may cover some topics/examples in lectures that are not treated in Ross. You will be responsible for all material in my lectures, assigned reading, and homework, including supplementary handouts if any.
TL;DR: It is found that the multi-variate Bernoulli performs well with small vocabulary sizes, but that the multinomial performs usually performs even better at larger vocabulary sizes--providing on average a 27% reduction in error over the multi -variateBernoulli model at any vocabulary size.
Abstract: Recent work in text classification has used two different first-order probabilistic models for classification, both of which make the naive Bayes assumption. Some use a multi-variate Bernoulli model, that is, a Bayesian Network with no dependencies between words and binary word features (e.g. Larkey and Croft 1996; Koller and Sahami 1997). Others use a multinomial model, that is, a uni-gram language model with integer word counts (e.g. Lewis and Gale 1994; Mitchell 1997). This paper aims to clarify the confusion by describing the differences and details of these two models, and by empirically comparing their classification performance on five text corpora. We find that the multi-variate Bernoulli performs well with small vocabulary sizes, but that the multinomial performs usually performs even better at larger vocabulary sizes--providing on average a 27% reduction in error over the multi-variate Bernoulli model at any vocabulary size.
TL;DR: In this paper, the Gibbs sampler is used to indirectly sample from the multinomial posterior distribution on the set of possible subset choices to identify the promising subsets by their more frequent appearance in the Gibbs sample.
Abstract: A crucial problem in building a multiple regression model is the selection of predictors to include. The main thrust of this article is to propose and develop a procedure that uses probabilistic considerations for selecting promising subsets. This procedure entails embedding the regression setup in a hierarchical normal mixture model where latent variables are used to identify subset choices. In this framework the promising subsets of predictors can be identified as those with higher posterior probability. The computational burden is then alleviated by using the Gibbs sampler to indirectly sample from this multinomial posterior distribution on the set of possible subset choices. Those subsets with higher probability—the promising ones—can then be identified by their more frequent appearance in the Gibbs sample.
TL;DR: In this paper, a simple algorithm is constructed and shown to converge monotonically to yield a maximum likelihood estimate of a distribution function when the data are incomplete due to grouping, censoring and/or truncation.
Abstract: SUMMARY This paper is concerned with the non-parametric estimation of a distribution function F, when the data are incomplete due to grouping, censoring and/or truncation. Using the idea of self-consistency, a simple algorithm is constructed and shown to converge monotonically to yield a maximum likelihood estimate of F. An application to hypothesis testing is indicated.
TL;DR: In this article, the basics of Bayesian analysis are discussed, and a WinBUGS-based approach is presented to get started with WinBUGs, which is based on the SIMPLE model of memory.
Abstract: Part I. Getting Started: 1. The basics of Bayesian analysis 2. Getting started with WinBUGS Part II. Parameter Estimation: 3. Inferences with binomials 4. Inferences with Gaussians 5. Some examples of data analysis 6. Latent mixture models Part III. Model Selection: 7. Bayesian model comparison 8. Comparing Gaussian means 9. Comparing binomial rates Part IV. Case Studies: 10. Memory retention 11. Signal detection theory 12. Psychophysical functions 13. Extrasensory perception 14. Multinomial processing trees 15. The SIMPLE model of memory 16. The BART model of risk taking 17. The GCM model of categorization 18. Heuristic decision-making 19. Number concept development.