About: Evaluation of binary classifiers is a research topic. Over its lifetime, 2 publications have been published on this topic, receiving 21 citations.
TL;DR: This paper presents a number of statistically grounded performance evaluation metrics capable of evaluating binary classifiers in the absence of annotated Ground Truth, and shows that these metrics, which require no Ground Truth, have high correlation with traditional metrics.
Abstract: In this paper, we present a number of statistically grounded performance evaluation metrics capable of evaluating binary classifiers in the absence of annotated Ground Truth. These metrics are generic and can be applied to any type of classifier, but they are experimentally validated on binarization algorithms. We applied the statistically grounded metrics and compared them with metrics based on annotated data. In classifier selection, our approach performs better than random by a statistically significant margin, and our evaluation metrics requiring no Ground Truth have high correlation with traditional metrics. We conducted experiments on the images from the DIBCO binarization contests between 2009 and 2013.
TL;DR: This article is an introduction to some of the most commonly used performance measures for the evaluation of binary classifiers, and explains how to assess the statistical significance of an obtained performance value, how to calculate approximate and exact parametric confidence intervals, and how to derive percentile bootstrap confidence intervals for a performance measure.
Abstract: This article is an introduction to some of the most commonly used performance measures for the evaluation of binary classifiers. These measures are categorized into three broad families: measures based on a single classification threshold, measures based on a probabilistic interpretation of error, and ranking measures. Graphical methods, such as ROC curves, precision-recall curves, TPR-FPR plots, gain charts, and lift charts, are also discussed. Using a simple example, we illustrate how to calculate the various performance measures and show how they are related. The article also explains how to assess the statistical significance of an obtained performance value, how to calculate approximate and exact parametric confidence intervals, and how to derive percentile bootstrap confidence intervals for a performance measure.
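As a rough illustration of two ideas named in this abstract, the sketch below computes a few single-threshold measures from confusion-matrix counts and derives a percentile bootstrap confidence interval for one of them. It is not taken from the article: the toy labels, the function names (confusion_counts, accuracy, precision, recall, percentile_bootstrap_ci), and the choices of 2000 resamples and a 95% level are all assumptions made for illustration.

```python
import random

# Toy data (hypothetical): 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    # Undefined when nothing is predicted positive.
    return tp / (tp + fp) if (tp + fp) > 0 else None


def recall(y_true, y_pred):
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    # Undefined when there are no actual positives.
    return tp / (tp + fn) if (tp + fn) > 0 else None


def percentile_bootstrap_ci(y_true, y_pred, metric,
                            n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample cases with replacement,
    recompute the metric, and read off the empirical quantiles."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        value = metric([y_true[i] for i in idx], [y_pred[i] for i in idx])
        if value is not None:  # skip resamples where the metric is undefined
            stats.append(value)
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi


acc = accuracy(y_true, y_pred)
ci_low, ci_high = percentile_bootstrap_ci(y_true, y_pred, accuracy)
print("accuracy:", acc, "precision:", precision(y_true, y_pred),
      "recall:", recall(y_true, y_pred))
print("95% percentile bootstrap CI for accuracy:", (ci_low, ci_high))
```

With only ten cases the interval is very wide, which is the practical point of reporting it alongside the point estimate. The quantile indexing here is a simple approximation; a statistics library would typically interpolate between order statistics.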