TL;DR: Marking the 100th anniversary of Galton's first discussion of regression and correlation, this paper presents 13 formulas, each representing a different computational and conceptual definition of Pearson's r.
Abstract: In 1885, Sir Francis Galton first defined the term “regression” and completed the theory of bivariate correlation. A decade later, Karl Pearson developed the index that we still use to measure correlation, Pearson's r. Our article is written in recognition of the 100th anniversary of Galton's first discussion of regression and correlation. We begin with a brief history. Then we present 13 different formulas, each of which represents a different computational and conceptual definition of r. Each formula suggests a different way of thinking about this index, from algebraic, geometric, and trigonometric settings. We show that Pearson's r (or simple functions of r) may variously be thought of as a special type of mean, a special type of variance, the ratio of two means, the ratio of two variances, the slope of a line, the cosine of an angle, and the tangent to an ellipse, and may be looked at from several other interesting perspectives.
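Three of the perspectives named in the abstract (a special type of mean, the slope of a line, and the cosine of an angle) can be checked numerically. The sketch below is an illustration under assumptions, not a reproduction of the article's 13 formulas: it uses the standard textbook forms (mean of products of z-scores, least-squares slope between standardized variables, cosine between mean-centered vectors), and the function names and the small data set are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def z_scores(values):
    """Standardize using the population standard deviation (divide by n)."""
    m = mean(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return [(v - m) / sd for v in values]


def r_as_mean_of_z_products(x, y):
    """r as a special type of mean: the average product of paired z-scores."""
    zx, zy = z_scores(x), z_scores(y)
    return mean([a * b for a, b in zip(zx, zy)])


def r_as_standardized_slope(x, y):
    """r as the slope of a line: least-squares slope of z_y regressed on z_x."""
    zx, zy = z_scores(x), z_scores(y)
    # Both standardized variables have mean 0, so no centering is needed here.
    return sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)


def r_as_cosine(x, y):
    """r as the cosine of the angle between the two mean-centered vectors."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    dot = sum(a * b for a, b in zip(dx, dy))
    norm_x = math.sqrt(sum(a * a for a in dx))
    norm_y = math.sqrt(sum(b * b for b in dy))
    return dot / (norm_x * norm_y)


if __name__ == "__main__":
    # Hypothetical illustrative data, not taken from the article.
    x = [1, 2, 3, 4, 5]
    y = [2, 1, 4, 3, 5]
    print(r_as_mean_of_z_products(x, y))
    print(r_as_standardized_slope(x, y))
    print(r_as_cosine(x, y))
```

All three functions return the same value for any data set with nonzero variance in both variables, which is the point of viewing r from several angles.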
TL;DR: This paper warns that a statistical biologist may wrongly attribute the correlation between two functions such as u and v to organic relationship. The case most likely to occur is when u and v are indices with the same denominator, because the correlation of indices seems at first sight a very plausible measure of organic correlation.
Abstract: (1) If the ratio of two absolute measurements on the same or different organs be taken it is convenient to term this ratio an index. If u = f1(x, y) and v = f2(z, y) be two functions of the three variables x, y, z, and these variables be selected at random so that there exists no correlation between x and y, y and z, or z and x, there will still be found to exist correlation between u and v. Thus a real danger arises when a statistical biologist attributes the correlation between two functions like u and v to organic relationship. The particular case that is likely to occur is when u and v are indices with the same denominator, for the correlation of indices seems at first sight a very plausible measure of organic correlation.
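The effect described in this abstract can be seen in a small simulation. The sketch below is a minimal illustration under assumptions that are not in the source: x, y, and z are drawn independently from a uniform distribution on [5, 15] (chosen only to keep the denominator away from zero), and the indices are taken as u = x/y and v = z/y. The function names, sample size, and seed are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_indices(n=20000, seed=0):
    """Return (r between x and z, r between the indices u = x/y and v = z/y).

    x, y, z are generated independently, so any correlation between
    u and v comes only from the shared denominator y.
    """
    rng = random.Random(seed)
    x = [rng.uniform(5.0, 15.0) for _ in range(n)]
    y = [rng.uniform(5.0, 15.0) for _ in range(n)]
    z = [rng.uniform(5.0, 15.0) for _ in range(n)]
    u = [a / b for a, b in zip(x, y)]
    v = [c / b for c, b in zip(z, y)]
    return pearson_r(x, z), pearson_r(u, v)


if __name__ == "__main__":
    r_xz, r_uv = simulate_indices()
    print("r(x, z) =", round(r_xz, 3))      # near zero: numerators are unrelated
    print("r(x/y, z/y) =", round(r_uv, 3))  # clearly positive: shared denominator
```

The numerators are uncorrelated by construction, yet the two indices are clearly positively correlated, because both rise when y is small and fall when y is large.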
TL;DR: In this paper, it is shown that using a single ratio as either the dependent or one of the independent variables in a multiple-regression analysis can lead to incorrect or misleading inferences.
Abstract: Spurious correlation refers to the correlation between indices that have a common component. A “per ratio” standard is based on a biological measurement adjusted for some physical measurement by division. Renowned statisticians and biologists (Pearson, Neyman and Tanner) have warned about the problems in interpretation that ratios cause. This warning has been largely ignored. The consequences of using a single ratio as either the dependent or one of the independent variables in a multiple-regression analysis are described. It is shown that the use of ratios in regression analyses can lead to incorrect or misleading inferences. A recommendation is made that the use of ratios in regression analyses be avoided.
TL;DR: In this article, the authors apply a difference-in-differences approach to circumvent the problem of spurious correlation between good reviews and high demand, using the timing of the reviews by two popular movie critics, Siskel and Ebert, relative to opening weekend box office revenue.
Abstract: An inherent problem in measuring the influence of expert reviews on the demand for experience goods is that a correlation between good reviews and high demand may be spurious, induced by an underlying correlation with unobservable quality signals. Using the timing of the reviews by two popular movie critics, Siskel and Ebert, relative to opening weekend box office revenue, we apply a difference-in-differences approach to circumvent the problem of spurious correlation. After purging the spurious correlation, the measured influence effect is smaller though still detectable. Positive reviews have a particularly large influence on the demand for dramas and narrowly-released movies.