TL;DR: A slightly more complex rule-of-thumb is introduced that estimates minimum sample size as a function of effect size as well as the number of predictors, and it is argued that researchers should determine sample size with methods that incorporate effect size.
Abstract: Numerous rules-of-thumb have been suggested for determining the minimum number of subjects required to conduct multiple regression analyses. These rules-of-thumb are evaluated by comparing their results against those based on power analyses for tests of hypotheses of multiple and partial correlations. The results did not support the use of rules-of-thumb that simply specify some constant (e.g., 100 subjects) as the minimum number of subjects or a minimum ratio of number of subjects (N) to number of predictors (m). Some support was obtained for a rule-of-thumb that N ≥ 50 + 8m for the multiple correlation and N ≥ 104 + m for the partial correlation. However, the rule-of-thumb for the multiple correlation yields values too large for N when m ≥ 7, and both rules-of-thumb assume all studies have a medium-size relationship between criterion and predictors. Accordingly, a slightly more complex rule-of-thumb is introduced that estimates minimum sample size as a function of effect size as well as the number of predictors. It is argued that researchers should use methods to determine sample size that incorporate effect size.
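The two simple rules quoted in this abstract can be written out as a minimal sketch. The function names are hypothetical and chosen only for illustration. As the abstract itself notes, both rules assume a medium-size relationship, so the results are rough lower bounds and not a substitute for a power analysis.

```python
def min_n_multiple_correlation(m: int) -> int:
    """Rule-of-thumb from the abstract: N >= 50 + 8m (test of the multiple correlation)."""
    if m < 1:
        raise ValueError("m must be a positive number of predictors")
    return 50 + 8 * m


def min_n_partial_correlation(m: int) -> int:
    """Rule-of-thumb from the abstract: N >= 104 + m (test of a partial correlation)."""
    if m < 1:
        raise ValueError("m must be a positive number of predictors")
    return 104 + m


if __name__ == "__main__":
    # Tabulate both rules for a few predictor counts.
    for m in (1, 3, 5, 7, 10):
        print(m, min_n_multiple_correlation(m), min_n_partial_correlation(m))
```

For five predictors, for example, the rules give 50 + 8·5 = 90 subjects for the multiple correlation and 104 + 5 = 109 for the partial correlation.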
TL;DR: Set correlation is a realization of the general multivariate linear model, can be viewed as a multivariate generalization of multiple correlation analysis, and may be employed in the analysis of m...
Abstract: Set correlation is a realization of the general multivariate linear model, can be viewed as a multivariate generalization of multiple correlation analysis, and may be employed in the analysis of m...
TL;DR: This paper discusses the uses of the correlation coefficient r, either as a way to infer correlation or to test linearity, and recommends using the Fisher z transformation instead of raw r values because r is not normally distributed but z is (at least approximately).
Abstract: Correlation and regression are different, but not mutually exclusive, techniques. Roughly, regression is used for prediction (which does not extrapolate beyond the data used in the analysis) whereas correlation is used to determine the degree of association. There are situations in which the x variable is not fixed or readily chosen by the experimenter, but instead is a random covariate to the y variable. This paper shows the relationships between the coefficient of determination, the multiple correlation coefficient, the covariance, the correlation coefficient and the coefficient of alienation, for the case of two related variables x and y. It discusses the uses of the correlation coefficient r, either as a way to infer correlation, or to test linearity. A number of graphical examples are provided as well as examples of actual chemical applications. The paper recommends the use of the Fisher z transformation instead of r values because r is not normally distributed but z is (at least approximately). For eithe...
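The Fisher z transformation recommended in this abstract can be illustrated with a short sketch. It relies on two standard textbook results that the abstract does not state: z = atanh(r), and an approximate standard error of 1/sqrt(n − 3) for bivariate normal data. The function names and the 95% critical value are assumptions made for this example, not taken from the paper.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher z transformation: z = atanh(r) = 0.5 * ln((1 + r) / (1 - r))."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return math.atanh(r)


def correlation_confidence_interval(r: float, n: int, z_crit: float = 1.959964):
    """Approximate confidence interval for a correlation, built on the z scale.

    Assumes the textbook standard error 1 / sqrt(n - 3); z_crit defaults to
    the two-sided 95% normal quantile.
    """
    if n <= 3:
        raise ValueError("n must exceed 3")
    z = fisher_z(r)
    se = 1.0 / math.sqrt(n - 3)
    # Build the interval on the z scale, then map back to the r scale.
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


if __name__ == "__main__":
    low, high = correlation_confidence_interval(0.5, 30)
    print(f"r = 0.5, n = 30: interval ({low:.3f}, {high:.3f})")
```

Because the interval is computed on the z scale and transformed back, it is asymmetric around r and always stays inside (−1, 1), which a symmetric interval built directly on r would not guarantee.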
TL;DR: The onion method is explained in terms of elliptical distributions and extended to allow generating random correlation matrices from the same joint distribution as the vine method to study the relationship between the multiple correlation and partial correlations on a regular vine.
TL;DR: It is shown that several of the rejection methods, of differing types, each discard precisely those variables known to be redundant, for all but a few sets of data.
Abstract: Often, results obtained from the use of principal component analysis are little changed if some of the variables involved are discarded beforehand. This paper examines some of the possible methods for deciding which variables to reject and these rejection methods are tested on artificial data containing variables known to be “redundant”. It is shown that several of the rejection methods, of differing types, each discard precisely those variables known to be redundant, for all but a few sets of data.