Proceedings Article | DOI: 10.1109/ISBI.2010.5490373
Predicting classifier performance with a small training set: Applications to computer-aided diagnosis and prognosis
Ajay Basavanhally, Scott Doyle, Anant Madabhushi
- 14 Apr 2010
- pp 229-232
TL;DR: A power-law model is used to evaluate and compare several classifiers (Support Vector Machine, C4.5 decision tree, k-nearest neighbor) on four distinct CAD problems; the results suggest that, given sufficient training data, SVMs tend to be the best classifiers.
Abstract: Selection of an appropriate classifier for computer-aided diagnosis (CAD) applications has typically been an ad hoc process. It is difficult to know a priori which classifier will yield high accuracies for a specific application, especially when well-annotated data for classifier training are scarce. In this study, we utilize an inverse power-law model of statistical learning to predict classifier performance when only limited amounts of annotated training data are available. The objectives of this study are to (a) predict classifier error in the context of different CAD problems when larger data cohorts become available, and (b) compare classifier performance and trends (both at the sample/patient level and at the pixel level) as additional data are accrued (such as in a clinical trial). In this paper we utilize a power-law model to evaluate and compare various classifiers (Support Vector Machine (SVM), C4.5 decision tree, k-nearest neighbor) for four distinct CAD problems. The first two datasets deal with sample/patient-level classification for distinguishing (1) high- from low-grade breast cancers and (2) high from low levels of lymphocytic infiltration in breast cancer specimens. The other two datasets are pixel-level classification problems for discriminating cancerous and non-cancerous regions on prostate (3) MRI and (4) histopathology. Our empirical results suggest that, given sufficient training data, SVMs tend to be the best classifiers. This was true for datasets (1), (2), and (3), while the C4.5 decision tree was the best classifier for dataset (4). Our results also suggest that classifier comparisons made on small data cohorts should not be assumed to hold when large amounts of data become available.
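The abstract does not state the model's functional form. As a minimal sketch, assume the form commonly used for empirical learning curves, e(n) = a * n^(-alpha) + b, where n is the training set size, a is the learning rate, alpha the decay rate, and b the asymptotic error. The function names, the grid search over b, and the log-space least-squares fit below are illustrative assumptions, not the authors' implementation.

```python
import math


def fit_inverse_power_law(sizes, errors, b_steps=200):
    """Fit e(n) = a * n**(-alpha) + b to (training size, error) pairs.

    Assumed approach (not from the paper): for each candidate asymptotic
    error b, the model is linear in log space,
        log(e - b) = log(a) - alpha * log(n),
    so a and alpha come from ordinary least squares. The candidate with
    the lowest squared error in the original space is returned.
    """
    if len(sizes) != len(errors) or len(sizes) < 3:
        raise ValueError("need at least three (size, error) pairs")
    if min(sizes) <= 0 or min(errors) <= 0:
        raise ValueError("sizes and errors must be positive")

    xs = [math.log(n) for n in sizes]
    x_mean = sum(xs) / len(xs)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("training sizes must not all be equal")

    best = None
    lowest = min(errors)
    for i in range(b_steps):
        # Candidate b stays strictly below the smallest observed error,
        # so log(e - b) is always defined.
        b = lowest * i / b_steps
        ys = [math.log(e - b) for e in errors]
        y_mean = sum(ys) / len(ys)
        slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
        a = math.exp(y_mean - slope * x_mean)
        alpha = -slope
        sse = sum((a * n ** (-alpha) + b - e) ** 2 for n, e in zip(sizes, errors))
        if best is None or sse < best[0]:
            best = (sse, a, alpha, b)
    return best[1], best[2], best[3]


def predict_error(n, a, alpha, b):
    """Extrapolate the fitted curve to a larger training set size n."""
    return a * n ** (-alpha) + b


if __name__ == "__main__":
    # Hypothetical error rates measured on small training sets.
    sizes = [10, 20, 40, 80, 160]
    errors = [0.23, 0.18, 0.15, 0.135, 0.125]
    a, alpha, b = fit_inverse_power_law(sizes, errors)
    print("predicted error at n=1000:", predict_error(1000, a, alpha, b))
```

Fitting one curve per classifier and comparing the extrapolated errors illustrates the abstract's closing point: the classifier with the lowest error at small n need not have the lowest predicted error at large n.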
Citations
Evaluating the effect of dataset size on predictive model using supervised learning technique
Adeleke Raheem Ajiboye, Ruzaini Abdullah-Arshah, Hongwu Qin, H. Isah-Kebbe
- 01 Feb 2015
TL;DR: Findings from this study reveal that the data partitioned for training must represent the entire set well and be sufficient to span the input space; the learning model trained on the largest training set appears to be the most accurate and consistently delivers better and more stable results.
3D Lacunarity in Multifractal Analysis of Breast Tumor Lesions in Dynamic Contrast-Enhanced Magnetic Resonance Imaging
TL;DR: A novel method of 3D multifractal analysis to characterize the spatial complexity (spatial arrangement of texture) of breast tumors at multiple scales; it confirms the presence of multifractality in DCE-MR volumes of the breast, whereby multiple degrees of self-similarity prevail at multiple scales.
46
Improving classifier training efficiency for automatic cyberbullying detection with Feature Density
Juuso Kalevi Kristian Eronen, Michal Ptaszynski, Fumito Masui, Aleksander Smywiński-Pohl, Gniewosz Leliwa, Michal Wroczynski
TL;DR: It is hypothesized that estimating dataset complexity allows the number of required experiment iterations to be reduced. This can optimize the resource-intensive training of ML models, which is becoming a serious issue due to increases in available dataset sizes and the ever-rising popularity of models based on Deep Neural Networks.
39
Classification of Breast Masses on Contrast-Enhanced Magnetic Resonance Images Through Log Detrended Fluctuation Cumulant-Based Multifractal Analysis
TL;DR: The results suggest that the log-cumulant C2 can be effective in classifying typically biopsy-recommended cases and can contribute to novel feature classification techniques to aid radiologists every time there is a change in the clinical course, namely, when biopsy should be considered.
Predicting Classifier Performance with Limited Training Data: Applications to Computer-Aided Diagnosis in Breast and Prostate Cancer
TL;DR: This paper presents a framework for comparative evaluation of classifiers using only limited amounts of training data by using random repeated sampling (RRS) in conjunction with a cross-validation sampling strategy and suggests that this approach consistently yields error rates with lower variability.
References
Histological grading and prognosis in breast cancer; a study of 1409 cases of which 359 have been followed for 15 years.
H. J. G. Bloom, W. W. Richardson
3.1K
Estimating dataset size requirements for classifying DNA microarray data.
Sayan Mukherjee, Pablo Tamayo, Simon Rogers, Ryan Rifkin, Anna Engle, Colin Campbell, Todd R. Golub, Jill P. Mesirov
TL;DR: A statistical methodology for estimating dataset size requirements for classifying microarray data using learning curves is introduced, based on fitting inverse power-law models to construct empirical learning curves.
316
Computerized Image-Based Detection and Grading of Lymphocytic Infiltration in HER2+ Breast Cancer Histopathology
Ajay Basavanhally, Shridar Ganesan, Shannon Agner, James Monaco, Michael Feldman, John E. Tomaszewski, Gyan Bhanot, Anant Madabhushi
TL;DR: A computer-aided diagnosis (CADx) scheme to automatically detect and grade the extent of lymphocytic infiltration in digitized HER2+ BC histopathology, which could help clinicians determine disease outcome and make better therapy recommendations for patients with HER2+ BC.
298
Integrating structural and functional imaging for computer assisted detection of prostate cancer on multi-protocol in vivo 3 Tesla MRI
Satish Viswanath, B. Nicolas Bloch, Mark A. Rosen, Jonathan Chappelow, Robert Toth, Neil Rofsky, Robert E. Lenkinski, Elizabeth Genega, Arjun Kalyanpur, Anant Madabhushi
TL;DR: A novel comprehensive computer-aided scheme for CaP detection from high resolution in vivo multi-protocol MRI by integrating functional and structural information obtained via dynamic-contrast enhanced (DCE) and T2-weighted (T2-w) MRI, respectively is presented.
Computer-aided prognosis of ER+ breast cancer histopathology and correlating survival outcome with Oncotype DX assay
Ajay Basavanhally, Jun Xu, Anant Madabhushi, Shridar Ganesan
- 28 Jun 2009
TL;DR: A novel computer-aided prognosis (CAP) scheme that employs quantitatively derived image information to predict patient outcome analogous to the Oncotype DX Recurrence Score (RS), with high RS implying poor outcome and vice versa, is presented.