About: Circular analysis is a research topic. Over its lifetime, 5 publications have been published within this topic, receiving 2334 citations. The topic is also known as: double dipping.
TL;DR: It is argued that systems neuroscience needs to adjust some widespread practices to avoid the circularity that can arise from selection. 'Double dipping', the use of the same dataset for selection and selective analysis, is shown to distort descriptive statistics and invalidate statistical inference, and a policy for avoiding circularity is suggested.
Abstract: A neuroscientific experiment typically generates a large amount of data, of which only a small fraction is analyzed in detail and presented in a publication. However, selection among noisy measurements can render circular an otherwise appropriate analysis and invalidate results. Here we argue that systems neuroscience needs to adjust some widespread practices to avoid the circularity that can arise from selection. In particular, ‘double dipping’, the use of the same dataset for selection and selective analysis, will give distorted descriptive statistics and invalid statistical inference whenever the results statistics are not inherently independent of the selection criteria under the null hypothesis. To demonstrate the problem, we apply widely used analyses to noise data known to not contain the experimental effects in question. Spurious effects can appear in the context of both univariate activation analysis and multivariate pattern-information analysis. We suggest a policy for avoiding circularity.
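The noise-data demonstration described in this abstract can be illustrated with a minimal sketch. It is not the authors' analysis: the function name `simulate`, the "voxel" and "trial" counts, and the top-N selection rule are all assumptions chosen for illustration. Every simulated voxel is pure Gaussian noise, so any apparent effect is produced by selection alone.

```python
import random
import statistics


def simulate(n_voxels=500, n_trials=40, n_select=10, seed=0):
    """Compare a circular and an independent effect estimate on pure noise.

    All names and sizes here are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    # Pure Gaussian noise: the true effect in every voxel is exactly zero.
    data = [[rng.gauss(0.0, 1.0) for _ in range(n_trials)]
            for _ in range(n_voxels)]
    half = n_trials // 2

    def mean_over(voxel, start, stop):
        return statistics.fmean(data[voxel][start:stop])

    # Circular ("double dipping"): select the voxels with the largest mean
    # over ALL trials, then report the mean of those same trials.
    chosen = sorted(range(n_voxels),
                    key=lambda v: mean_over(v, 0, n_trials),
                    reverse=True)[:n_select]
    circular = statistics.fmean(
        [mean_over(v, 0, n_trials) for v in chosen])

    # Independent: select on the first half of the trials and estimate the
    # effect on the second half, which played no part in the selection.
    chosen = sorted(range(n_voxels),
                    key=lambda v: mean_over(v, 0, half),
                    reverse=True)[:n_select]
    independent = statistics.fmean(
        [mean_over(v, half, n_trials) for v in chosen])

    return circular, independent


if __name__ == "__main__":
    circular, independent = simulate()
    print(f"circular estimate:    {circular:.3f}")
    print(f"independent estimate: {independent:.3f}")
```

Under these assumptions the circular estimate should sit well above zero even though no effect exists, while the split-half estimate should stay close to zero. The price of the split is that each step sees only half of the data.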
TL;DR: High classification accuracies in neuroimaging studies of ADHD appear to be inflated by circular analysis and small sample size, implying additional bias contributing to reported accuracies at lower sample sizes.
TL;DR: This work proposes different implementations of the classification-based approach for the case in which it comprises a variable selection step together with a classification step, and investigates the bias associated with each implementation.
Abstract: Classification-based approaches for data analysis are provoking wide interest and increasing adoption within the neuroscience community. Topics like "brain decoding", "multi-voxel pattern analysis" and "brain-computer interface" are prominent examples of this trend. The core problem of these investigations is hypothesis testing, i.e., finding evidence of some effect produced by the stimulation protocol within neural correlates. A classification algorithm is trained on the recorded data to learn how to discriminate between different stimuli. Then the misclassification rate of the predictions is estimated to answer the statistical test. This generic classification problem can be implemented in several ways depending on the exact neuroscientific question under investigation. However, some implementations produce biased estimates due to circular analysis issues that could invalidate the conclusion of the scientific study. Therefore, the most suitable implementation of the classification problem must be used in order to avoid biases, to detect weak stimulus-related information within noise and to give the proper answer to the neuroscientific question at hand. In this work we propose different implementations of the classification-based approach for the case in which it comprises a variable selection step together with a classification step. For each implementation we investigate the associated bias. Analyses are conducted on synthetic data and MEG data from a covert spatial attention task. The effects of different implementations of the classification algorithm are quantified by means of expected misclassification rate. Results prove the importance of adopting a proper error rate estimation process.
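The bias that arises when variable selection is combined with classification can be sketched as follows. This is an illustrative assumption, not one of the implementations proposed in the paper: the nearest-centroid classifier, the mean-difference selection rule, and every function name are hypothetical. The labels carry no information about the synthetic data, so an honest estimate of the misclassification rate should be near 0.5.

```python
import random


def top_features(X, y, idx, k):
    """Rank features by absolute class-mean difference, using only rows in idx."""
    scores = []
    for j in range(len(X[0])):
        a = [X[i][j] for i in idx if y[i] == 0]
        b = [X[i][j] for i in idx if y[i] == 1]
        scores.append(abs(sum(a) / len(a) - sum(b) / len(b)))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def predict(X, y, train, test_i, feats):
    """Assign sample test_i to the closer class centroid on the chosen features."""
    dists = []
    for c in (0, 1):
        members = [i for i in train if y[i] == c]
        d = 0.0
        for j in feats:
            centroid = sum(X[i][j] for i in members) / len(members)
            d += (X[test_i][j] - centroid) ** 2
        dists.append(d)
    return 0 if dists[0] <= dists[1] else 1


def cv_error(X, y, k, n_folds, select_inside):
    """Cross-validated misclassification rate.

    select_inside=False: features are chosen once from ALL samples (circular).
    select_inside=True:  features are re-chosen from each training fold only.
    """
    all_idx = list(range(len(X)))
    global_feats = top_features(X, y, all_idx, k)
    errors = 0
    for f in range(n_folds):
        test = all_idx[f::n_folds]
        train = [i for i in all_idx if i not in test]
        feats = top_features(X, y, train, k) if select_inside else global_feats
        errors += sum(predict(X, y, train, i, feats) != y[i] for i in test)
    return errors / len(X)


def demo(n=40, n_feat=1000, k=10, n_folds=5, seed=0):
    rng = random.Random(seed)
    # Noise features and alternating labels: no stimulus-related information.
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_feat)] for _ in range(n)]
    y = [i % 2 for i in range(n)]
    return (cv_error(X, y, k, n_folds, select_inside=False),
            cv_error(X, y, k, n_folds, select_inside=True))


if __name__ == "__main__":
    circular, nested = demo()
    print(f"selection outside CV (circular): error = {circular:.2f}")
    print(f"selection inside CV (nested):    error = {nested:.2f}")
```

In this sketch the only difference between the two estimates is whether the held-out samples were allowed to influence which features were chosen. Selecting outside the cross-validation loop should produce an optimistically low error on data that contain no signal.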
TL;DR: In experiments where a number of individuals are each observed more than once, second-order statistics must be used, because twice-repeated first-order statistics do not allow valid conclusions in experiments of this type.
Abstract: Improved statistical techniques for the analysis of circular data have increased our awareness of certain invalid procedures in earlier orientation research. Even now, however, the current literature contains many examples of inappropriately analyzed data sets. In experiments where a number of individuals are each observed more than once, second-order statistics must be used. Twice repeated, first-order statistics do not allow valid conclusions in experiments of this type.
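The two-stage structure behind second-order statistics can be sketched as below. This is a descriptive illustration only, under stated assumptions: the function names and the bearings are hypothetical, and no significance test is performed (a formal second-order test would still have to be applied to the resulting mean vectors). Each individual's repeated bearings are first reduced to one mean vector, and the sample for the second stage consists of those mean vectors, one per individual.

```python
import math


def mean_vector(angles_deg):
    """First-order summary: (mean direction in degrees, mean vector length r)."""
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    return math.degrees(math.atan2(s, c)) % 360.0, math.hypot(c, s)


def second_order_summary(bearings_by_individual):
    """Reduce each individual to one mean vector, then average those vectors.

    Returns the per-individual mean vectors, the grand mean vector, and the
    sample size for the second stage, which is the number of individuals
    rather than the total number of observations.
    """
    firsts = [mean_vector(b) for b in bearings_by_individual]
    n = len(firsts)
    # Average the mean vectors as Cartesian points, so each individual
    # contributes in proportion to its own concentration r.
    x = sum(r * math.cos(math.radians(d)) for d, r in firsts) / n
    y = sum(r * math.sin(math.radians(d)) for d, r in firsts) / n
    grand = (math.degrees(math.atan2(y, x)) % 360.0, math.hypot(x, y))
    return firsts, grand, n


if __name__ == "__main__":
    # Hypothetical bearings (degrees) for two individuals, three trials each.
    bearings = [[10.0, 20.0, 30.0], [350.0, 0.0, 10.0]]
    firsts, grand, n = second_order_summary(bearings)
    print("individual mean vectors:", firsts)
    print("grand mean vector:", grand, "from n =", n, "individuals")
```

Pooling all six bearings into a single first-order sample would treat repeated observations of the same individual as independent. Keeping one vector per individual avoids that.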
TL;DR: It is demonstrated that selection bias is as relevant for EEG/MEG analysis as for fMRI methods, and that sensor selection must be independent of the contrast analyzed with statistical comparisons, because otherwise a distorted or 'circular' analysis might result.