TL;DR: A screening window coefficient, called "Z-factor," is defined, which is reflective of both the assay signal dynamic range and the data variation associated with the signal measurements, and therefore is suitable for assay quality assessment.
Abstract: The ability to identify active compounds ("hits") from large chemical libraries accurately and rapidly has been the ultimate goal in developing high-throughput screening (HTS) assays. The ability to identify hits from a particular HTS assay depends largely on the suitability or quality of the assay used in the screening. The criteria or parameters for evaluating the "suitability" of an HTS assay for hit identification are not well defined and hence it still remains difficult to compare the quality of assays directly. In this report, a screening window coefficient, called "Z-factor," is defined. This coefficient is reflective of both the assay signal dynamic range and the data variation associated with the signal measurements, and therefore is suitable for assay quality assessment. The Z-factor is a dimensionless, simple statistical characteristic for each HTS assay. The Z-factor provides a useful tool for comparison and evaluation of the quality of assays, and can be utilized in assay optimization and validation.
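The abstract above does not state the formula itself. As a minimal sketch, assuming the commonly cited form Z = 1 - 3(sd_sample + sd_control) / |mean_sample - mean_control|, the coefficient could be computed as follows. The function name `z_factor` and the example readings are illustrative assumptions, not taken from the paper.

```python
import statistics


def z_factor(sample, control):
    """Screening window coefficient, assuming the commonly cited form
    Z = 1 - 3 * (sd_sample + sd_control) / |mean_sample - mean_control|.

    `sample` and `control` are sequences of raw signal readings
    (hypothetical inputs; each needs at least two values).
    """
    mean_s = statistics.mean(sample)
    mean_c = statistics.mean(control)
    # Sample standard deviations capture the data variation of each group.
    sd_s = statistics.stdev(sample)
    sd_c = statistics.stdev(control)
    # The absolute mean difference is the signal dynamic range.
    dynamic_range = abs(mean_s - mean_c)
    if dynamic_range == 0:
        raise ValueError("sample and control means are identical")
    return 1.0 - 3.0 * (sd_s + sd_c) / dynamic_range


# Made-up readings: a wide dynamic range with small variation
# yields a coefficient close to 1.
signal_wells = [100, 102, 98, 101, 99]
background_wells = [10, 11, 9, 10, 10]
print(round(z_factor(signal_wells, background_wells), 3))
```

Because both the numerator and the denominator carry the units of the signal, the result is dimensionless, which is what allows assays with different readouts to be compared on one scale.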
TL;DR: It is argued that replicate measurements are needed to verify assumptions of current methods and to suggest data analysis strategies when assumptions are not met, and the integration of replicates with robust statistical methods in primary screens will facilitate the discovery of reliable hits, ultimately improving the sensitivity and specificity of the screening process.
Abstract: High-throughput screening is an early critical step in drug discovery. Its aim is to screen a large number of diverse chemical compounds to identify candidate 'hits' rapidly and accurately. Few statistical tools are currently available, however, to detect quality hits with a high degree of confidence. We examine statistical aspects of data preprocessing and hit identification for primary screens. We focus on concerns related to positional effects of wells within plates, choice of hit threshold and the importance of minimizing false-positive and false-negative rates. We argue that replicate measurements are needed to verify assumptions of current methods and to suggest data analysis strategies when assumptions are not met. The integration of replicates with robust statistical methods in primary screens will facilitate the discovery of reliable hits, ultimately improving the sensitivity and specificity of the screening process.
TL;DR: The authors describe and show numerous real examples from the biologist-friendly Stat Server® HTS application (SHS), a custom-developed software tool built on the commercially available S-PLUS and StatServer statistical analysis and server software that remotely processes HTS data using powerful and sophisticated statistical methodology.
Abstract: High-throughput screening (HTS) plays a central role in modern drug discovery, allowing the rapid screening of large compound collections against a variety of putative drug targets. HTS is an industrial-scale process, relying on sophisticated automation, control, and state-of-the-art detection technologies to organize, test, and measure hundreds of thousands to millions of compounds in nano- to microliter volumes. Despite this high technology, hit selection for HTS is still typically done using simple data analysis and basic statistical methods. The authors discuss in this article some shortcomings of these methods and present alternatives based on modern methods of statistical data analysis. Most important, they describe and show numerous real examples from the biologist-friendly Stat Server® HTS application (SHS), a custom-developed software tool built on the commercially available S-PLUS® and StatServer® statistical analysis and server software. This system remotely processes HTS data using powerful an...
TL;DR: Basic concepts of z score, z* score, strictly standardized mean difference (SSMD), SSMD*, and t statistic are presented, their commonalities and differences are elaborated, and some common misuses that should be avoided are described.
Abstract: Hit selection is the ultimate goal in many high-throughput screens. Various analytic methods are available for this purpose. Some commonly used ones are z score, z* score, strictly standardized mean difference (SSMD), SSMD*, and t statistic. It is critical to know how to use them correctly because misusing them can readily produce misleading results. Here, the author presents basic concepts, elaborates their commonalities and differences, describes some common misuses that should be avoided, and uses simple simulated examples to illustrate how to use them correctly.
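The abstract names these statistics without defining them. The sketch below uses their usual textbook forms, which are an assumption here rather than a quotation from the paper: the z score standardizes by mean and standard deviation, the z* score by median and scaled median absolute deviation (MAD), and SSMD divides the mean difference of two groups by the square root of the sum of their variances, which assumes the groups are independent. All function names and data are illustrative.

```python
import math
import statistics


def z_scores(values):
    """Classical z score: (x - mean) / standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def robust_z_scores(values):
    """Robust z* score: (x - median) / (1.4826 * MAD).

    The factor 1.4826 makes the MAD comparable to a standard
    deviation when the data are normally distributed.
    """
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; robust z score is undefined")
    return [(v - median) / (1.4826 * mad) for v in values]


def ssmd(group1, group2):
    """SSMD for two groups, assuming they are independent."""
    difference = statistics.mean(group1) - statistics.mean(group2)
    pooled = statistics.variance(group1) + statistics.variance(group2)
    return difference / math.sqrt(pooled)


# Made-up plate readings with one strong outlier in the last well.
wells = [10, 11, 9, 10, 12, 8, 10, 50]
print(round(z_scores(wells)[-1], 2))         # outlier inflates the SD
print(round(robust_z_scores(wells)[-1], 2))  # median and MAD resist it
print(round(ssmd([100, 102, 98, 101, 99], [10, 11, 9, 10, 10]), 2))
```

In this made-up plate the outlier inflates the mean and standard deviation enough that its own z score stays below a typical cutoff of 3, while the z* score flags it clearly. This is one way a naive choice of statistic can mislead.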
TL;DR: A pair of new parameters, strictly standardized mean difference (SSMD) and coefficient of variability in difference (CVD), is proposed as QC metrics in RNAi HTS assays; compared with S/B and S/N, these parameters capture the variability in both compared populations.