About: Conditional expectation is a research topic. Over the lifetime, 3762 publications have been published within this topic receiving 130325 citations.
TL;DR: This paper describes the k-means procedure, which partitions an N-dimensional population into k sets on the basis of a sample. The k-means concept generalizes the ordinary sample mean, and the procedure appears to give partitions that are reasonably efficient in the sense of within-class variance.
Abstract: The main purpose of this paper is to describe a process for partitioning an N-dimensional population into k sets on the basis of a sample. The process, which is called 'k-means,' appears to give partitions which are reasonably efficient in the sense of within-class variance. That is, if p is the probability mass function for the population, $S = \{S_1, S_2, \ldots, S_k\}$ is a partition of $E_N$, and $u_i$, $i = 1, 2, \ldots, k$, is the conditional mean of p over the set $S_i$, then $W^2(S) = \sum_{i=1}^{k} \int_{S_i} |z - u_i|^2 \, dp(z)$ tends to be low for the partitions S generated by the method. We say 'tends to be low,' primarily because of intuitive considerations, corroborated to some extent by mathematical analysis and practical computational experience. Also, the k-means procedure is easily programmed and is computationally economical, so that it is feasible to process very large samples on a digital computer. Possible applications include methods for similarity grouping, nonlinear prediction, approximating multivariate distributions, and nonparametric tests for independence among several variables. In addition to suggesting practical classification methods, the study of k-means has proved to be theoretically interesting. The k-means concept represents a generalization of the ordinary sample mean, and one is naturally led to study the pertinent asymptotic behavior, the object being to establish some sort of law of large numbers for the k-means. This problem is sufficiently interesting, in fact, for us to devote a good portion of this paper to it. The k-means are defined in section 2.1, and the main results which have been obtained on the asymptotic behavior are given there. The rest of section 2 is devoted to the proofs of these results. Section 3 describes several specific possible applications, and reports some preliminary results from computer experiments conducted to explore the possibilities inherent in the k-means idea. The extension to general metric spaces is indicated briefly in section 4.
The original point of departure for the work described here was a series of problems in optimal classification (MacQueen [9]) which represented special
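As a rough illustration of the partitioning idea in the abstract above, the following Python sketch runs one sequential pass over a sample and then evaluates a sample analogue of the within-class variance. The abstract does not spell out the update rule, so the seeding by the first k points, the running-average update, and all function names here are assumptions made for illustration, not a transcription of the paper's procedure.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, means):
    """Index of the mean closest to `point`."""
    return min(range(len(means)), key=lambda j: sq_dist(point, means[j]))


def sequential_k_means(points, k):
    """Single pass: seed with the first k points, then fold in the rest.

    Assumption: each later point joins its nearest mean, and that mean
    is updated as a running average of the points assigned to it.
    """
    means = [list(p) for p in points[:k]]
    counts = [1] * k
    for p in points[k:]:
        i = nearest(p, means)
        counts[i] += 1
        means[i] = [m + (x - m) / counts[i] for m, x in zip(means[i], p)]
    return means


def within_class_variance(points, means):
    """Sample analogue of W^2(S) for the nearest-mean partition.

    Each cell S_i holds the points closest to means[i]; u_i is the cell
    mean, and the result averages |z - u_i|^2 over the whole sample.
    """
    cells = [[] for _ in means]
    for p in points:
        cells[nearest(p, means)].append(p)
    total = 0.0
    for cell in cells:
        if cell:
            u = [sum(c) / len(cell) for c in zip(*cell)]
            total += sum(sq_dist(p, u) for p in cell)
    return total / len(points)


# Hypothetical toy sample with two well-separated groups.
sample = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
means = sequential_k_means(sample, 2)
w2 = within_class_variance(sample, means)
```

With k = 1 the same function reduces to a running computation of the ordinary sample mean, which is the sense in which k-means generalizes it.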
TL;DR: In this paper, the authors study the properties of the quasi-maximum likelihood estimator and related test statistics in dynamic models that jointly parameterize conditional means and conditional covariances, when a normal log-likelihood is maximized but the assumption of normality is violated.
Abstract: We study the properties of the quasi-maximum likelihood estimator (QMLE) and related test statistics in dynamic models that jointly parameterize conditional means and conditional covariances, when a normal log-likelihood is maximized but the assumption of normality is violated. Because the score of the normal log-likelihood has the martingale difference property when the first two conditional moments are correctly specified, the QMLE is generally consistent and has a limiting normal distribution. We provide easily computable formulas for asymptotic standard errors that are valid under nonnormality. Further, we show how robust LM tests for the adequacy of the jointly parameterized mean and variance can be computed from simple auxiliary regressions. An appealing feature of these robust inference procedures is that only first derivatives of the conditional mean and variance functions are needed. A Monte Carlo study indicates that the asymptotic results carry over to finite samples. Estimation of several AR a...
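To make the idea of standard errors that remain valid under nonnormality concrete, here is a minimal Python sketch for a deliberately simple static case: a constant mean and constant variance fitted by maximizing a normal log-likelihood. This is not one of the dynamic models the paper treats; the function name and the restriction to the variance parameter are assumptions for illustration. The robust standard error uses the usual sandwich form (inverse Hessian, score variance, inverse Hessian), which for the variance parameter reduces to the fourth central moment minus the squared variance.

```python
import math


def gaussian_qmle_variance_se(y):
    """Gaussian QMLE of a constant variance with two standard errors.

    Returns (s2, se_normal, se_robust), where se_normal assumes the data
    are normal and se_robust uses the sandwich formula A^-1 B A^-1.
    """
    n = len(y)
    mu = sum(y) / n                           # QMLE of the mean
    s2 = sum((v - mu) ** 2 for v in y) / n    # QMLE of the variance
    m4 = sum((v - mu) ** 4 for v in y) / n    # fourth central moment

    # Score for sigma^2: -1/(2 s2) + (y - mu)^2 / (2 s2^2).
    a = 1.0 / (2.0 * s2 ** 2)                 # minus expected Hessian
    b = (m4 - s2 ** 2) / (4.0 * s2 ** 4)      # variance of the score

    se_normal = math.sqrt((1.0 / a) / n)      # valid only under normality
    se_robust = math.sqrt((b / a ** 2) / n)   # valid under nonnormality
    return s2, se_normal, se_robust


# Hypothetical toy data, lighter-tailed than a normal sample.
s2, se_normal, se_robust = gaussian_qmle_variance_se([-2, -1, 0, 1, 2])
```

In this toy sample the robust standard error is smaller than the normal-theory one because the data have thin tails; with heavy-tailed data the ordering reverses.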
TL;DR: In this paper, a new class of semiparametric estimators, based on inverse probability weighted estimating equations, is proposed for the parameter vector α0 of the conditional mean model when the data are missing at random in the sense of Rubin and the missingness probabilities are either known or can be parametrically modeled.
Abstract: In applied problems it is common to specify a model for the conditional mean of a response given a set of regressors. A subset of the regressors may be missing for some study subjects either by design or happenstance. In this article we propose a new class of semiparametric estimators, based on inverse probability weighted estimating equations, that are consistent for parameter vector α0 of the conditional mean model when the data are missing at random in the sense of Rubin and the missingness probabilities are either known or can be parametrically modeled. We show that the asymptotic variance of the optimal estimator in our class attains the semiparametric variance bound for the model by first showing that our estimation problem is a special case of the general problem of parameter estimation in an arbitrary semiparametric model in which the data are missing at random and the probability of observing complete data is bounded away from 0, and then deriving a representation for the efficient score...
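The inverse probability weighting idea in the abstract above can be sketched in Python for the simplest case: a linear conditional mean with one regressor that is missing for some subjects, and observation probabilities that are known by design. Only complete cases enter the estimating equation, each weighted by the inverse of its probability of being observed. The function name, the linear model, and the toy data are assumptions for illustration; this is one simple member of the weighted class, not the optimal estimator the paper derives.

```python
def ipw_linear_fit(y, x, observed, pi):
    """Solve sum_i (R_i / pi_i) * (1, x_i)' * (y_i - a0 - a1 * x_i) = 0.

    y        : responses, always observed
    x        : regressor values (None where missing)
    observed : R_i, 1 if x_i is observed, 0 otherwise
    pi       : known probability that x_i is observed
    """
    # Keep complete cases only, each with weight 1 / pi_i.
    rows = [(yi, xi, 1.0 / p)
            for yi, xi, r, p in zip(y, x, observed, pi) if r]
    sw = sum(w for _, _, w in rows)
    xbar = sum(w * xi for _, xi, w in rows) / sw
    ybar = sum(w * yi for yi, _, w in rows) / sw
    sxy = sum(w * (xi - xbar) * (yi - ybar) for yi, xi, w in rows)
    sxx = sum(w * (xi - xbar) ** 2 for _, xi, w in rows)
    a1 = sxy / sxx
    a0 = ybar - a1 * xbar
    return a0, a1


# Hypothetical toy data lying exactly on y = 1 + 2x, one x missing.
y = [1, 3, 5, 7, 9]
x = [0, 1, 2, None, 4]
observed = [1, 1, 1, 0, 1]
pi = [0.5, 0.5, 1.0, 0.5, 1.0]
a0, a1 = ipw_linear_fit(y, x, observed, pi)
```

The weights compensate for subjects who were unlikely to be fully observed, so the weighted complete cases stand in for the full sample when the missingness probabilities are correct.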
TL;DR: This paper develops a test of manipulation related to continuity of the running variable density function and applies it to popular elections to the House of Representatives, where sorting is neither expected nor found, and to roll-call voting in the House, where sorting is both expected and found.
Abstract: Standard sufficient conditions for identification in the regression discontinuity design are continuity of the conditional expectation of counterfactual outcomes in the running variable. These continuity assumptions may not be plausible if agents are able to manipulate the running variable. This paper develops a test of manipulation related to continuity of the running variable density function. The methodology is applied to popular elections to the House of Representatives, where sorting is neither expected nor found, and to roll-call voting in the House, where sorting is both expected and found.
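The intuition behind checking the running variable density for a jump at the cutoff can be illustrated with a much cruder device than the paper's test: compare how many observations fall just below and just above the cutoff. The Python sketch below swaps in a simple binomial count comparison within a window of half-width h; it does not reproduce the paper's methodology, and the function name, the window choice, and the data are assumptions for illustration. If the density is continuous and h is small, an observation in the window is roughly equally likely to fall on either side, so the scaled difference in counts is approximately standard normal.

```python
import math


def side_count_z(running, cutoff, h):
    """Count observations within h of the cutoff on each side.

    Returns (left, right, z) with z = (right - left) / sqrt(left + right),
    which is approximately N(0, 1) if neither side is favored.
    """
    left = sum(1 for v in running if cutoff - h <= v < cutoff)
    right = sum(1 for v in running if cutoff <= v < cutoff + h)
    n = left + right
    if n == 0:
        raise ValueError("no observations within h of the cutoff")
    z = (right - left) / math.sqrt(n)
    return left, right, z


# Hypothetical running-variable values with a cutoff at 0.
values = [-0.9, -0.5, -0.1, 0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 2.0]
left, right, z = side_count_z(values, cutoff=0.0, h=1.0)
```

A large positive z would suggest bunching just above the cutoff, the kind of sorting that makes the continuity assumptions implausible.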