TL;DR: This paper provides an illustrative simulation to demonstrate how a simple model becomes adversely affected by small numbers of clusters and outlines methodological topics that have yet to be addressed in the literature on multilevel models with a small number of clusters.
Abstract: Multilevel models are an increasingly popular method for analyzing data that originate from a clustered or hierarchical structure. To use multilevel models effectively, one must have an adequately large number of clusters; otherwise, some model parameters will be estimated with bias. The goals of this paper are to (1) raise awareness of the problems associated with a small number of clusters, (2) review previous studies on multilevel models with a small number of clusters, (3) provide an illustrative simulation to demonstrate how a simple model becomes adversely affected by small numbers of clusters, (4) provide researchers with remedies if they encounter clustered data with a small number of clusters, and (5) outline methodological topics that have yet to be addressed in the literature.
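As a rough illustration of the kind of bias the abstract refers to, the sketch below simulates a balanced random-intercept design and averages the maximum-likelihood estimate of the between-cluster variance over many replications. This is not the paper's simulation: the choice of parameter (between-cluster variance), the closed-form balanced-design estimator, and all names and settings (`ml_between_variance`, `average_estimate`, 20 observations per cluster, true variances of 1) are assumptions made for illustration.

```python
import random
import statistics


def ml_between_variance(n_clusters, cluster_size, sigma_u=1.0, sigma_e=1.0, rng=None):
    """ML estimate of the between-cluster variance in one simulated balanced dataset."""
    rng = rng or random.Random()
    cluster_means = []
    within_ss = 0.0
    for _ in range(n_clusters):
        u = rng.gauss(0.0, sigma_u)  # random intercept for this cluster
        y = [u + rng.gauss(0.0, sigma_e) for _ in range(cluster_size)]
        m = statistics.fmean(y)
        cluster_means.append(m)
        within_ss += sum((v - m) ** 2 for v in y)
    # Pooled within-cluster mean square.
    msw = within_ss / (n_clusters * (cluster_size - 1))
    grand = statistics.fmean(cluster_means)
    # ML divides by the number of clusters J rather than J - 1.
    between = sum((m - grand) ** 2 for m in cluster_means) / n_clusters
    return max(0.0, between - msw / cluster_size)


def average_estimate(n_clusters, cluster_size=20, reps=500, seed=1):
    """Average the ML estimate over `reps` simulated datasets."""
    rng = random.Random(seed)
    estimates = [ml_between_variance(n_clusters, cluster_size, rng=rng) for _ in range(reps)]
    return statistics.fmean(estimates)


few = average_estimate(5)    # small number of clusters
many = average_estimate(50)  # larger number of clusters
print(f"true variance 1.0, mean estimate with 5 clusters: {few:.2f}, with 50 clusters: {many:.2f}")
```

Under these assumptions, the average estimate with 5 clusters falls noticeably below the true value of 1, while the average with 50 clusters is much closer to it.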
TL;DR: The results suggest that the development of numerical estimation is built on a logarithmic coding of numbers--the hallmark of the approximate number system--and is subsequently shaped by the acquisition of cultural practices with numbers.
Abstract: Children’s sense of numbers before formal education is thought to rely on an approximate number system based on logarithmically compressed analog magnitudes that increases in resolution throughout childhood. School-age children performing a numerical estimation task have been shown to increasingly rely on a formally appropriate, linear representation and decrease their use of an intuitive, logarithmic one. We investigated the development of numerical estimation in a younger population (3.5- to 6.5-year-olds) using 0–100 and 2 novel sets of 1–10 and 1–20 number lines. Children’s estimates shifted from logarithmic to linear in the small number range, whereas they became more accurate but increasingly logarithmic on the larger interval. Estimation accuracy was correlated with knowledge of Arabic numerals and numerical order. These results suggest that the development of numerical estimation is built on a logarithmic coding of numbers—the hallmark of the approximate number system—and is subsequently shaped by the acquisition of cultural practices with numbers.
TL;DR: The theoretical background, algorithm, and validation of a recently developed novel method of ranking based on the sum of ranking differences, called Sum of Ranking differences (SRD) and Comparison of Ranks by Random Numbers (CRNN), respectively, are described.
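A minimal sketch of the idea behind SRD, assuming no tied values: rank the objects by a method and by a reference, then sum the absolute rank differences. The comparison against random rankings is approximated here by shuffling ranks. The function names and the example data are hypothetical and are not taken from the paper.

```python
import random


def ranks(values):
    """Return 1-based ranks of `values` (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def srd(method_values, reference_values):
    """Sum of absolute differences between method ranks and reference ranks."""
    pairs = zip(ranks(method_values), ranks(reference_values))
    return sum(abs(a - b) for a, b in pairs)


def random_srd_distribution(n_objects, n_draws=10_000, seed=0):
    """SRD values obtained when the method ranking is a random permutation."""
    rng = random.Random(seed)
    reference = list(range(1, n_objects + 1))
    draws = []
    for _ in range(n_draws):
        shuffled = reference[:]
        rng.shuffle(shuffled)
        draws.append(sum(abs(a - b) for a, b in zip(shuffled, reference)))
    return draws


# Hypothetical data: eight objects, one reference column, two methods.
reference = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
method_a = [1.1, 1.9, 3.2, 4.1, 4.9, 6.3, 6.8, 8.2]  # same ordering as the reference
method_b = [5.0, 1.0, 8.0, 2.0, 7.0, 3.0, 6.0, 4.0]  # scrambled ordering

random_srds = random_srd_distribution(len(reference))
for name, column in [("method_a", method_a), ("method_b", method_b)]:
    value = srd(column, reference)
    share = sum(d <= value for d in random_srds) / len(random_srds)
    print(f"{name}: SRD = {value}, share of random rankings with SRD <= this: {share:.3f}")
```

A method whose SRD sits well below the bulk of the random distribution ranks the objects more like the reference than chance would; one whose SRD falls in the middle of that distribution is hard to distinguish from a random ranking.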
TL;DR: The results show that adaptation towards a fixed optimum is generally characterized by an exponential effects trend, a finding tested for robustness to changes in the distribution of mutational effects and in the nature of the character studied.
Abstract: It is now clear that the genetic basis of adaptation does not resemble that assumed by the infinitesimal model. Instead, adaptation often involves a modest number of factors of large effect and a greater number of factors of smaller effect. After reviewing relevant experimental studies, I consider recent theoretical attempts to predict the genetic architecture of adaptation from first principles. In particular, I review the history of work on Fisher's geometric model of adaptation, including recent studies which suggest that adaptation should be characterized by exponential distributions of gene effects. I also present the results of new simulation studies that test the robustness of this finding. I explore the effects of changes in the distribution of mutational effects (absolute versus relative) as well as in the nature of the character studied (total phenotypic effect versus single characters). The results show that adaptation towards a fixed optimum is generally characterized by an exponential effects trend. Epigraph: "The situation to which these studies point is not one of a large number of genes all with more or less equal effect. It seems, rather, that a small number of genes with large effects are responsible for most of the response, the remainder of the response being due to a larger number of loci with small effects." D. S. Falconer (1981)
TL;DR: Although the gamma model often provides a good parametric model for this type of data, rate estimates from an equal-probability discrete gamma model with a small number of categories will tend to underestimate the largest rates. An alternative implementation of the gamma distribution is proposed that is computationally more efficient during optimization and can provide more accurate estimates of site rates.
Abstract: Previous work has shown that it is often essential to account for the variation in rates at different sites in phylogenetic models in order to avoid phylogenetic artifacts such as long branch attraction. In most current models, the gamma distribution is used for the rates-across-sites distributions and is implemented as an equal-probability discrete gamma. In this article, we introduce discrete distribution estimates with large numbers of equally spaced rate categories allowing us to investigate the appropriateness of the gamma model. With large numbers of rate categories, these discrete estimates are flexible enough to approximate the shape of almost any distribution. Likelihood ratio statistical tests and a nonparametric bootstrap confidence-bound estimation procedure based on the discrete estimates are presented that can be used to test the fit of a parametric family. We applied the methodology to several different protein data sets, and found that although the gamma model often provides a good parametric model for this type of data, rate estimates from an equal-probability discrete gamma model with a small number of categories will tend to underestimate the largest rates. In cases when the gamma model assumption is in doubt, rate estimates coming from the discrete rate distribution estimate with a large number of rate categories provide a robust alternative to gamma estimates. An alternative implementation of the gamma distribution is proposed that, for equal numbers of rate categories, is computationally more efficient during optimization than the standard gamma implementation and can provide more accurate estimates of site rates.
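To make the underestimation of the largest rates concrete, the sketch below approximates an equal-probability discrete gamma by Monte Carlo: it draws rates from a mean-one gamma distribution, sorts them, splits them into equally sized bins, and uses each bin's mean as the category rate. This is an illustrative approximation, not the paper's implementation or its proposed alternative; the shape value 0.5, the four categories, and the function name `discrete_gamma_rates` are assumptions.

```python
import random


def discrete_gamma_rates(alpha, n_categories, n_samples=100_000, seed=0):
    """Approximate equal-probability discrete gamma category rates by sampling.

    Rates are drawn from a gamma distribution with shape `alpha` and scale
    1 / alpha, so the mean rate is 1. `n_samples` should be divisible by
    `n_categories`.
    """
    rng = random.Random(seed)
    draws = sorted(rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_samples))
    size = n_samples // n_categories
    # Each category covers an equal share of probability mass; its rate is the bin mean.
    rates = [sum(draws[i * size:(i + 1) * size]) / size for i in range(n_categories)]
    return rates, draws


rates, draws = discrete_gamma_rates(alpha=0.5, n_categories=4)
p99 = draws[int(0.99 * len(draws))]  # empirical 99th percentile of the continuous rates
print("category rates:", [round(r, 3) for r in rates])
print("largest category rate:", round(rates[-1], 3), "vs 99th percentile rate:", round(p99, 3))
```

With only a few categories, every site in the upper tail is assigned the single top category rate, which lies far below the rates of the fastest sites in the continuous distribution.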