Towards Practical Active Learning for Classification

doi:10.4233/UUID:12720D56-8A35-4B36-B287-2E301AE69BD0

Open Access

- 20 Nov 2018

6

TL;DR: This thesis addresses the particular challenge of using as few annotations as possible, while at the same time, maintaining a good learning performance, and proposes two novel active learning methods that show a clear advantage over passive learning.

Abstract: In recent decades, the availability of large amounts of data has propelled the field of machine learning enormously. Machine learning, however, relies heavily on the availability of annotated data, typically labels indicating to which class a data instance belongs. With such huge amounts of data, this raises the question of how to annotate data efficiently, particularly when resources are limited. This thesis addresses the particular challenge of using as few annotations as possible while maintaining good learning performance. To that end we use active learning, which iteratively chooses the most valuable instances for which to obtain labels from an oracle (e.g. a human expert). Though many studies have demonstrated that active learning can reduce the annotation cost, there are still several issues that limit its practical use. This thesis takes a further step towards making active learning more practical for real-world applications. We first provide a benchmark and comparison of six different categories of active learning algorithms built on logistic regression. This work provides a better understanding of the underlying characteristics of various active learners and illustrates the potential benefits of using such techniques, but it also reveals many cases in which active learning fails to outperform passive learning (i.e. randomly selecting instances for labeling). Those failed cases motivate us to propose two novel active learning methods that show a clear advantage over passive learning. The first weights the so-called retraining-based criteria with an uncertainty score that is measured by the estimated posterior probability. The second measures the usefulness of unlabeled instances according to the variance of the predictive probability. This method takes an additional step towards practical active learning, clearly outperforming the current state of the art on binary and multi-class classification tasks.
We further consider two realistic issues that arise when applying active learning to real-world problems. One is how to find an initial set that contains at least one instance per class to start the active labeling cycle. The other is dealing with the absence of human annotators in the interactive labeling loop. We propose new approaches to tackle these problems and observe good performance compared to existing methods. This thesis concludes with an analysis of the contributions and limitations of our work, as well as research directions that deserve further study. We hope that this thesis also inspires others to make active learning more suitable for real-world applications.
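
The labeling cycle described in the abstract can be illustrated with a minimal sketch. It is not either of the thesis's two proposed methods: it uses plain uncertainty sampling (query the instance whose estimated posterior is closest to 0.5) on top of a small hand-written logistic regression, with random selection available as the passive baseline. The synthetic two-blob data, the function names (`fit_logistic`, `active_learning`, `make_blobs`), and all hyperparameters are assumptions made for illustration only.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, epochs=200, lr=0.5, l2=1e-2):
    """Binary logistic regression via batch gradient descent (illustrative)."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        # L2 penalty on the weights only; the bias is left unregularized.
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(model, x):
    """Estimated posterior probability of class 1."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def active_learning(pool_X, oracle, seed_idx, budget, strategy="uncertainty", rng=None):
    """Pool-based loop: retrain, pick one instance, ask the oracle, repeat."""
    rng = rng or random.Random(0)
    labeled = list(seed_idx)
    labels = {i: oracle(i) for i in labeled}
    unlabeled = [i for i in range(len(pool_X)) if i not in labels]
    for _ in range(budget):
        model = fit_logistic([pool_X[i] for i in labeled], [labels[i] for i in labeled])
        if strategy == "uncertainty":
            # Most uncertain instance: posterior closest to 0.5.
            pick = min(unlabeled, key=lambda i: abs(predict_proba(model, pool_X[i]) - 0.5))
        else:
            # Passive learning: select an instance uniformly at random.
            pick = rng.choice(unlabeled)
        labels[pick] = oracle(pick)
        labeled.append(pick)
        unlabeled.remove(pick)
    model = fit_logistic([pool_X[i] for i in labeled], [labels[i] for i in labeled])
    return model, labeled


def make_blobs(n_per_class, rng):
    """Two 2-D Gaussian blobs, one per class (synthetic stand-in data)."""
    X, y = [], []
    for label, centre in ((0, -2.0), (1, 2.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
            y.append(label)
    return X, y


rng = random.Random(42)
pool_X, pool_y = make_blobs(100, rng)
seed_idx = [pool_y.index(0), pool_y.index(1)]  # initial set: one instance per class
oracle = lambda i: pool_y[i]                   # simulated annotator
model, queried = active_learning(pool_X, oracle, seed_idx, budget=10)
accuracy = sum((predict_proba(model, x) >= 0.5) == bool(t)
               for x, t in zip(pool_X, pool_y)) / len(pool_X)
print(f"labels used: {len(queried)}, pool accuracy: {accuracy:.2f}")
```

Passing `strategy="random"` to `active_learning` gives the passive baseline under the same label budget, which is the kind of comparison the benchmark in the abstract refers to. The seed set here is taken directly from known labels, which sidesteps the initial-set problem the thesis discusses.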

Citations

•Journal Article•10.1007/s10844-022-00746-0

TROMPA-MER: an open dataset for personalized music emotion recognition

Juan Sebastián Gómez-Cañón, +10 more

- 19 Sep 2022

- Journal of Intelligent Information Syste...

TL;DR: In this article, the authors present a platform and a dataset to help research on music emotion recognition (MER) using citizen science strategies and generate music emotion annotations, where participants annotated each music excerpt with single free-text emotion words (in native language), distinct forced-choice emotion categories, preference and familiarity.

16

Enhancing Deep Active Learning Using Selective Self-Training For Image Classification

Emmeleia Panagiota Mastoropoulou

- 01 Jan 2019

TL;DR: A high quality and large scale training data-set is an important guarantee to teach an ideal classifier for image classification.

9

•Posted Content•10.21203/rs.3.rs-1926421/v1

TROMPA-MER: an open dataset for personalized Music Emotion Recognition

10 Aug 2022

TL;DR: In this paper, the authors present a platform and a dataset to help research on music emotion recognition (MER) using citizen science strategies and generate music emotion annotations, where participants annotated each music excerpt with single free-text emotion words (in native language), distinct forced-choice emotion categories, preference and familiarity.

8

•10.5281/ZENODO.5624399

Let's agree to disagree: Consensus Entropy Active Learning for Personalized Music Emotion Recognition

Juan S. Gómez-Cañón, +4 more

- 07 Nov 2021

TL;DR: This work proposes a methodology based on uncertainty sampling and query-by-committee, adopting prior knowledge from the agreement of human annotations as an oracle for active learning (AL), and suggests that this methodology can be beneficial to produce personalized classification models that exhibit different results depending on the algorithms’ complexity.

5

•Journal Article

A variance minimization criterion to active learning on graphs

Ming Ji, +1 more

- 01 Jan 2012

- Journal of Machine Learning Research

TL;DR: In this paper, the authors considered the problem of active learning over the vertices in a graph, without feature representation, based on the common graph smoothness assumption, which is formulated in a Gaussian random field model and analyzed the probability distribution over the unlabeled vertices conditioned on the label information.

3
