XAL: EXplainable Active Learning Makes Classifiers Better Low-resource Learners

doi:10.48550/arxiv.2310.05502

Journal Article

Yun Luo, +7 more

- 09 Oct 2023

- arXiv.org

- Vol. abs/2310.05502

TL;DR: This work proposes a novel Explainable Active Learning framework (XAL) for low-resource text classification, which aims to encourage classifiers to justify their inferences and delve into unlabeled data for which they cannot provide reasonable explanations.

Abstract: Active learning aims to construct an effective training set by iteratively curating the most informative unlabeled data for annotation, which is practical in low-resource tasks. Most active learning techniques in classification rely on the model's uncertainty or disagreement to choose unlabeled data. However, previous work indicates that existing models are poor at quantifying predictive uncertainty, which can lead to over-confidence in superficial patterns and a lack of exploration. Inspired by the cognitive processes in which humans deduce and predict through causal information, we propose a novel Explainable Active Learning framework (XAL) for low-resource text classification, which aims to encourage classifiers to justify their inferences and delve into unlabeled data for which they cannot provide reasonable explanations. Specifically, besides using a pre-trained bi-directional encoder for classification, we employ a pre-trained uni-directional decoder to generate and score the explanation. A ranking loss is proposed to enhance the decoder's capability in scoring explanations. During the selection of unlabeled data, we combine the predictive uncertainty of the encoder and the explanation score of the decoder to acquire informative data for annotation. As XAL is a general framework for text classification, we test our methods on six different classification tasks. Extensive experiments show that XAL achieves substantial improvement on all six tasks over previous AL methods. Ablation studies demonstrate the effectiveness of each component, and human evaluation shows that the model trained in XAL performs surprisingly well in explaining its prediction.
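The abstract says a ranking loss sharpens the decoder's ability to score explanations, but this section does not give the loss's form. The following is a minimal sketch that assumes a standard pairwise margin (hinge) loss. Under this assumption, the decoder should score an explanation that supports the gold label above any explanation for another label, by at least a margin. The explanation score is also assumed to be the mean token log-likelihood under the decoder. `explanation_score`, `margin_ranking_loss`, and `margin` are hypothetical names, not taken from the paper.

```python
from typing import Sequence


def explanation_score(token_logprobs: Sequence[float]) -> float:
    """Assumed decoder score: mean per-token log-likelihood of an explanation."""
    if not token_logprobs:
        raise ValueError("explanation must contain at least one token")
    return sum(token_logprobs) / len(token_logprobs)


def margin_ranking_loss(pos_scores: Sequence[float],
                        neg_scores: Sequence[float],
                        margin: float = 1.0) -> float:
    """Generic pairwise hinge loss, averaged over all (positive, negative) pairs.

    pos_scores: scores of explanations that support the gold label.
    neg_scores: scores of explanations that support other labels.
    A pair contributes zero once the positive outscores the negative by `margin`.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    total = 0.0
    for p in pos_scores:
        for n in neg_scores:
            total += max(0.0, margin - (p - n))
    return total / (len(pos_scores) * len(neg_scores))


# Toy example: one gold-label explanation vs. two alternatives.
pos = [explanation_score([-0.5, -1.5])]  # mean log-likelihood = -1.0
neg = [-3.0, -0.5]
print(margin_ranking_loss(pos, neg))  # prints 0.75: only the second pair violates the margin
```

With a target of 1 for every pair, PyTorch's `torch.nn.MarginRankingLoss` computes the same per-pair hinge over tensors. A real training loop would backpropagate through the decoder's scores rather than operate on Python floats.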

Figures

Figure 4: Results with a data selection budget of 500 instances on six text classification tasks, where 100 instances are selected for annotation in each iteration. The plot annotates the exact values at 500 instances for XAL and for the second-best method; detailed performance values are given in Appendix D.1.
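The following is a minimal sketch of a budgeted, pool-based active learning loop of the kind Figure 4 describes: 100 instances per iteration until 500 are labeled. The caption does not say how the 500 instances split between an initial labeled set and acquisition rounds. For that reason, the sketch accepts an optional `initial` set that counts against the budget. `run_budgeted_al` and `make_scorer` are hypothetical names. `make_scorer` stands in for retraining the model on the labeled data and returning that round's acquisition score.

```python
from typing import Callable, Iterable, List, Sequence

# A scorer maps a pool index to an acquisition score (higher = more informative).
Scorer = Callable[[int], float]


def run_budgeted_al(pool_ids: Iterable[int],
                    make_scorer: Callable[[List[int]], Scorer],
                    initial: Sequence[int] = (),
                    budget: int = 500,
                    per_round: int = 100) -> List[List[int]]:
    """Each round, pick the top-`per_round` unlabeled instances by score.

    `initial` (assumed) counts against the budget but is not an acquisition round.
    """
    labeled: List[int] = list(initial)
    seen = set(labeled)
    remaining = [i for i in pool_ids if i not in seen]
    rounds: List[List[int]] = []
    while len(labeled) < budget and remaining:
        score = make_scorer(labeled)  # hypothetical: retrain, then return a scorer
        k = min(per_round, budget - len(labeled), len(remaining))
        remaining.sort(key=score, reverse=True)
        picked, remaining = remaining[:k], remaining[k:]
        labeled.extend(picked)  # human annotation would happen here
        rounds.append(picked)
    return rounds


# Toy run with a fixed scorer that simply prefers larger indices.
prefer_large = lambda labeled: (lambda i: float(i))
print(len(run_budgeted_al(range(1000), prefer_large)))                        # 5
print(len(run_budgeted_al(range(1000), prefer_large, initial=range(100))))    # 4
```

Re-sorting the whole pool each round keeps the sketch short. A real implementation would instead score the pool once per round with the retrained model and take a top-k.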

Table 3: Tasks and examples for experiments.

Table 10: Case study of explanations generated by our model. The model is trained on 500 labeled data instances following the AL process in Section 4.1.

Figure 1: Data selection strategy in Explainable Active Learning (XAL). Previous work selects unlabeled data relying mostly on the model's uncertainty (a), whereas XAL additionally leverages the model's explanation of its prediction (b).
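Figure 1 contrasts uncertainty-only selection (a) with XAL's additional use of explanations (b). The following is a minimal sketch that rests on three assumptions. First, predictive uncertainty is the entropy of the encoder's class distribution. Second, the decoder's explanation score is a log-likelihood-style value, where higher means better explained. Third, the two are combined as a weighted sum with a hypothetical weight `lam`. This section does not give the paper's exact combination rule, and the toy probabilities and scores are invented for illustration.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Predictive entropy of the encoder's class distribution (nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def acquisition_scores(probs: Dict[str, Sequence[float]],
                       expl_scores: Dict[str, float],
                       lam: float = 1.0) -> Dict[str, float]:
    """Assumed rule: uncertainty + lam * (how poorly the prediction is explained).

    Higher expl_scores mean a more plausible explanation, so they enter with
    a minus sign. Setting lam = 0 recovers uncertainty-only selection.
    """
    return {k: entropy(probs[k]) - lam * expl_scores[k] for k in probs}


def select_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k example ids with the highest acquisition score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


probs = {"a": [0.5, 0.5], "b": [0.95, 0.05], "c": [0.9, 0.1]}
expl = {"a": -0.2, "b": -3.0, "c": -0.1}  # "b": confident, but poorly explained

print(select_top_k(acquisition_scores(probs, expl, lam=0.0), 1))  # ['a']
print(select_top_k(acquisition_scores(probs, expl, lam=1.0), 1))  # ['b']
```

Example "b" illustrates the case panel (b) targets. The encoder is confident, so uncertainty alone ranks it last, but the decoder cannot explain the prediction well, so the combined score moves it to the top.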

Figure 5: Amount of data, selected with each AL method, that models require to reach 90% of the performance of models trained on the complete training sets. In each iteration, 50 instances are annotated. The performance of models trained on the full training sets is: (a) RTE, 83.11%; (b) MRPC, 84.74%; (c) COVID19, 75.45%; (d) DEBA, 65.71%. Green triangles mark the average over experiments with three different initial sets Dl and three different random seeds; circles mark outliers. Detailed results are given in Appendix D.2.
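The 90% criterion in Figure 5 reduces to simple arithmetic on the reported full-data scores. The following is a minimal sketch. The learning curve is invented for illustration. Counting annotated instances as `(iteration + 1) × 50` is an assumption that ignores any initial labeled set, whose size this caption does not state. `target` and `instances_needed` are hypothetical helpers.

```python
from typing import Optional, Sequence

# Full-training-set performance (%) reported in the Figure 5 caption.
FULL_DATA = {"RTE": 83.11, "MRPC": 84.74, "COVID19": 75.45, "DEBA": 65.71}


def target(full_score: float, ratio: float = 0.9) -> float:
    """Score an AL-trained model must reach: 90% of full-data performance."""
    return ratio * full_score


def instances_needed(curve: Sequence[float], goal: float,
                     per_iter: int = 50) -> Optional[int]:
    """curve[i] is the score after i + 1 iterations of `per_iter` instances.

    Returns the number of annotated instances at the first iteration that
    meets the goal, or None if the curve never reaches it.
    """
    for i, score in enumerate(curve):
        if score >= goal:
            return (i + 1) * per_iter
    return None


for task, full in FULL_DATA.items():
    print(f"{task}: target = {target(full):.3f}")
# targets: RTE 74.799, MRPC 76.266, COVID19 67.905, DEBA 59.139

made_up_curve = [60.0, 70.0, 75.5, 80.0]  # illustrative numbers, not from the paper
print(instances_needed(made_up_curve, target(FULL_DATA["RTE"])))  # 150
```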

Figure 6: Results of the ablation study on the six text classification tasks. We select 100 instances in each iteration and run 4 iterations (as in Section 4.1). Results are macro-F1 scores averaged over three different initial sets Dl and three different random seeds.
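Figure 6 reports macro-F1 averaged over three initial sets Dl and three random seeds, nine runs in total. The following is a minimal pure-Python sketch of both computations. The per-run scores are placeholders, not the paper's results, and `average_over_runs` is a hypothetical helper. scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same metric.

```python
from itertools import product
from statistics import mean
from typing import Dict, Hashable, Sequence, Tuple


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean over classes of F1 = 2TP / (2TP + FP + FN)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = set(y_true) | set(y_pred)
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return mean(f1s)


def average_over_runs(scores: Dict[Tuple[int, int], float]) -> float:
    """Mean over (initial set, seed) runs; the caption uses 3 x 3 = 9 runs."""
    return mean(scores.values())


print(round(macro_f1([0, 0, 1, 1], [0, 1, 1, 1]), 4))  # 0.7333

# Placeholder scores for 3 initial sets x 3 seeds (not results from the paper).
runs = {(s, seed): 0.70 + 0.01 * s for s, seed in product(range(3), range(3))}
print(round(average_over_runs(runs), 4))  # 0.71
```

Macro averaging weights every class equally, so a classifier that ignores a rare class is penalized even if its accuracy stays high.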
