TL;DR: This chapter discusses accuracy assessment, examining the impact of sample design on cost, statistical validity, and the measurement of variability in the context of data collection and analysis.
Abstract: Introduction Why Accuracy Assessment? Overview Historical Review Aerial Photography Digital Assessments Data Collection Considerations Classification Scheme Statistical Considerations Data Distribution Randomness Spatial Autocorrelation Sample Size Sampling Scheme Sample Unit Reference Data Collection Basic Collection Forms Basic Analysis Techniques Non-Site Specific Assessments Site Specific Assessments Area Estimation/Correction Practicals Impact of Sample Design on Cost Recommendations for Collecting Reference Data Sources of Variation in Reference Data Photo Interpretation vs. Ground Visitation Interpreter Variability Observations vs. Measurements What is Correct? Labeling Map vs. Labeling the Reference Data Qualitative vs. Quantitative Analysis Local vs. Regional vs. Global Assessments Advanced Topics Beyond the Error Matrix Modifying the Error Matrix Fuzzy Set Theory Measuring Variability Complex Data Sets Change Detection Multi-Layer Assessments California Hardwood Rangeland Monitoring Project Case Study Balancing Statistical Validity with Practical Reality Bibliography
TL;DR: This work provides practitioners with a set of “good practice” recommendations for designing and implementing an accuracy assessment of a change map and estimating area based on the reference sample data.
TL;DR: Achieving the full potential from CS projects requires meta-data describing the sampling process, reference data to allow for standardization, and insightful modeling suitable to the question of interest.
TL;DR: This work proposes a method for conducting epigenome-wide association study analyses when a reference dataset is unavailable, including a bootstrap method for estimating standard errors, and demonstrates that the method can perform as well as or better than methods that make explicit use of reference datasets.
Abstract: Motivation: Recently there has been increasing interest in the effects of cell mixture on the measurement of DNA methylation, specifically the extent to which small perturbations in cell mixture proportions can register as changes in DNA methylation. A recently published set of statistical methods exploits this association to infer changes in cell mixture proportions, and these methods are presently being applied to adjust for cell mixture effect in the context of epigenome-wide association studies. However, these adjustments require the existence of reference datasets, which may be laborious or expensive to collect. For some tissues such as placenta, saliva, adipose or tumor tissue, the relevant underlying cell types may not be known. Results: We propose a method for conducting epigenome-wide association studies analysis when a reference dataset is unavailable, including a bootstrap method for estimating standard errors. We demonstrate via simulation study and several real data analyses that our proposed method can perform as well as or better than methods that make explicit use of reference datasets. In particular, it may adjust for detailed cell type differences that may be unavailable even in existing reference datasets. Availability and implementation: Software is available in the R package RefFreeEWAS. Data for three of four examples were obtained from Gene Expression Omnibus (GEO), accession numbers GSE37008, GSE42861 and GSE30601, while reference data were obtained from GEO accession number GSE39981. Contact: andres.houseman@oregonstate.edu Supplementary information: Supplementary data are available at Bioinformatics online.
TL;DR: The current status of accuracy assessment that has emerged from nearly 50 years of practice is described, and improved methods are required to address new challenges created by advanced technology that has expanded the capacity to map land cover extensively in space and intensively in time.