Journal Article (DOI: 10.48550/arxiv.2310.05502)
XAL: EXplainable Active Learning Makes Classifiers Better Low-resource Learners
Yun Luo, Zhen Yang, Fandong Meng, Yingjie Li, Fang Guo, Qinglin Qi, Jie Zhou, Yue Zhang
TL;DR: This work proposes a novel Explainable Active Learning framework (XAL) for low-resource text classification, which aims to encourage classifiers to justify their inferences and delve into unlabeled data for which they cannot provide reasonable explanations.
Abstract: Active learning (AL) aims to construct an effective training set by iteratively curating the most informative unlabeled data for annotation, which makes it practical for low-resource tasks. Most active learning techniques for classification rely on the model's uncertainty or disagreement to choose unlabeled data. However, previous work indicates that existing models are poor at quantifying predictive uncertainty, which can lead to over-confidence in superficial patterns and a lack of exploration. Inspired by the cognitive processes in which humans deduce and predict through causal information, we propose a novel Explainable Active Learning framework (XAL) for low-resource text classification, which aims to encourage classifiers to justify their inferences and to delve into unlabeled data for which they cannot provide reasonable explanations. Specifically, besides using a pre-trained bidirectional encoder for classification, we employ a pre-trained unidirectional decoder to generate and score explanations. A ranking loss is proposed to enhance the decoder's capability to score explanations. During the selection of unlabeled data, we combine the predictive uncertainty of the encoder and the explanation score of the decoder to acquire informative data for annotation. As XAL is a general framework for text classification, we test it on six different classification tasks. Extensive experiments show that XAL achieves substantial improvements over previous AL methods on all six tasks. Ablation studies demonstrate the effectiveness of each component, and human evaluation shows that the model trained in XAL performs surprisingly well in explaining its predictions.
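The abstract does not give the form of the ranking loss used to train the decoder. As a minimal sketch, assuming a pairwise margin (hinge) loss in which the decoder's score for the explanation tied to the gold label should exceed the score of each explanation tied to another label by at least a margin, the loss for one instance could look like this. The function name `pairwise_ranking_loss`, the `margin` value, and the idea of one scalar score per label are illustrative assumptions, not the paper's definition.

```python
from typing import Sequence


def pairwise_ranking_loss(scores: Sequence[float], gold: int, margin: float = 1.0) -> float:
    """Hypothetical margin ranking loss over per-label explanation scores.

    scores[k] is the decoder's score for the explanation associated with
    label k (assumed: higher means more plausible); gold is the true label.
    Each non-gold label contributes max(0, margin - (s_gold - s_k)).
    """
    if not 0 <= gold < len(scores):
        raise ValueError("gold index out of range")
    s_gold = scores[gold]
    losses = [max(0.0, margin - (s_gold - s_k))
              for k, s_k in enumerate(scores) if k != gold]
    # Average over the negative labels so the loss scale does not grow with the label count.
    return sum(losses) / len(losses) if losses else 0.0
```

For example, `pairwise_ranking_loss([2.0, 0.5, 1.5], gold=0)` returns `0.25`: label 1 is already beaten by more than the margin and contributes nothing, while label 2 trails the gold score by only 0.5 and contributes 0.5.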
Figures

Figure 4: Results on six text classification tasks with a data selection budget of 500 instances, where 100 instances are selected for annotation in each iteration. We plot the exact values of XAL and of the second-best method at 500 instances; detailed performance values can be found in Appendix D.1.
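The budgeted protocol in this caption (a fixed total budget, a fixed number of instances annotated per iteration) follows the usual pool-based active learning loop. The sketch below is a generic version under stated assumptions: the function name `active_learning_loop`, the random initial labeled set, the `initial_size` parameter, and the hypothetical `score_fn(labeled, x)` acquisition function are all illustrative, and model retraining is only indicated by a comment.

```python
import random
from typing import Callable, List, Sequence


def active_learning_loop(pool: Sequence[int],
                         score_fn: Callable[[List[int], int], float],
                         initial_size: int, batch_size: int, budget: int,
                         seed: int = 0) -> List[List[int]]:
    """Hypothetical pool-based AL loop; returns the labeled set after each round."""
    rng = random.Random(seed)
    unlabeled = list(pool)
    rng.shuffle(unlabeled)
    # Assumption: the initial labeled set is drawn at random from the pool.
    labeled = unlabeled[:initial_size]
    unlabeled = unlabeled[initial_size:]
    history = [list(labeled)]
    while len(labeled) < budget and unlabeled:
        # A real system would retrain the classifier on `labeled` here;
        # this sketch only passes the current labeled set to score_fn.
        k = min(batch_size, budget - len(labeled), len(unlabeled))
        ranked = sorted(unlabeled, key=lambda x: score_fn(labeled, x), reverse=True)
        picked = ranked[:k]
        labeled.extend(picked)
        picked_set = set(picked)
        unlabeled = [x for x in unlabeled if x not in picked_set]
        history.append(list(labeled))
    return history
```

With `initial_size=100`, `batch_size=100`, and `budget=500`, the loop runs four acquisition rounds and the labeled set grows 100, 200, 300, 400, 500; whether the paper's initial set has exactly this size is an assumption here.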
Table 3: Tasks and examples for experiments. 
Table 10: Case study of the explanation generation of our model. The model is trained on 500 labeled data instances following the AL process in Section 4.1. 
Figure 1: Data selection strategy in Explainable Active Learning (XAL). Previous work selects unlabeled data mostly based on the model’s uncertainty (a), whereas XAL additionally leverages the model’s explanation of its prediction (b).
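The contrast between (a) and (b) can be illustrated with a small acquisition function. The abstract only says that XAL combines the encoder's predictive uncertainty with the decoder's explanation score; the exact combination is not stated here. This sketch assumes entropy as the uncertainty measure, an explanation score in [0, 1] where higher means a more reasonable explanation, and a linear mix controlled by a hypothetical weight `lam`. All function names are illustrative.

```python
import math
from typing import List, Sequence


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one class distribution from the encoder."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def combined_score(probs: Sequence[float], expl_score: float, lam: float = 0.5) -> float:
    """Hypothetical acquisition score: higher means more worth annotating."""
    if len(probs) < 2:
        raise ValueError("need at least two classes")
    # Normalize entropy to [0, 1] by its maximum value, log(C), for C classes.
    h = predictive_entropy(probs) / math.log(len(probs))
    # Assumption: expl_score lies in [0, 1] and higher means a more reasonable
    # explanation, so (1 - expl_score) is large when the model cannot explain itself.
    return (1.0 - lam) * h + lam * (1.0 - expl_score)


def select_top_k(batch_probs: Sequence[Sequence[float]], expl_scores: Sequence[float],
                 k: int, lam: float = 0.5) -> List[int]:
    """Indices of the k unlabeled instances with the highest combined score."""
    scores = [combined_score(p, e, lam) for p, e in zip(batch_probs, expl_scores)]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```

For example, `select_top_k([[0.95, 0.05], [0.95, 0.05]], [0.9, 0.1], k=1)` returns `[1]`: both predictions are equally confident, so uncertainty alone (a) cannot separate them, but the second instance's explanation scores poorly, matching the intuition of (b). Setting `lam=0.0` reduces the score to pure entropy-based selection.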
Figure 5: Amount of data, selected with each AL method, that models require to reach 90% of the performance of models trained on the complete training sets. In each iteration, we annotate 50 instances. The performance of models trained on the whole training sets is (a) RTE: 83.11%, (b) MRPC: 84.74%, (c) COVID19: 75.45%, and (d) DEBA: 65.71%. Green triangles denote the mean over experiments with three different initial labeled sets D_l and three different random seeds; circles denote outliers. Detailed results can be seen in Appendix D.2.
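The 90% criterion in this caption is simple arithmetic on the reported full-data scores: for RTE the target is 0.9 × 83.11 = 74.799, for MRPC 76.266, for COVID19 67.905, and for DEBA 59.139. The sketch below computes these targets and finds the first labeled-set size at which a learning curve reaches them. The `(num_labeled, score)` curve format and the function names are hypothetical; the paper's own evaluation code may differ.

```python
from typing import Dict, List, Optional, Tuple

# Full-training-set performance reported in the Figure 5 caption (percent).
FULL_DATA_PERF: Dict[str, float] = {
    "RTE": 83.11, "MRPC": 84.74, "COVID19": 75.45, "DEBA": 65.71,
}


def target_score(full_perf: float, fraction: float = 0.9) -> float:
    """Score a model must reach to match `fraction` of full-data performance."""
    return fraction * full_perf


def instances_to_reach(curve: List[Tuple[int, float]], full_perf: float,
                       fraction: float = 0.9) -> Optional[int]:
    """Smallest labeled-set size whose score reaches fraction * full_perf.

    curve holds (num_labeled, score) pairs from one AL run, e.g. one point per
    iteration of 50 annotated instances. Returns None if the target is never met.
    """
    target = target_score(full_perf, fraction)
    for n, score in sorted(curve):
        if score >= target:
            return n
    return None


if __name__ == "__main__":
    for task, perf in FULL_DATA_PERF.items():
        print(f"{task}: 90% target = {target_score(perf):.3f}")
```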
Figure 6: Results of the ablation study on the six text classification tasks. We select 100 instances in each iteration and conduct 4 iterations (the same setting as in Section 4.1). Results are macro-F1 scores averaged over three different initial sets D_l and three different random seeds.
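The reported numbers are macro-F1 scores averaged over a 3 × 3 grid of runs (three initial sets D_l times three seeds, i.e. nine runs). As a sketch of that aggregation, macro-F1 is the unweighted mean of per-class F1 = 2TP / (2TP + FP + FN). The `run(initial_set, seed)` callable is hypothetical and stands in for one complete AL run returning its macro-F1.

```python
from itertools import product
from statistics import mean
from typing import Callable, Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred), key=repr)
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); define it as 0 when the class never appears.
        f1s.append(2 * tp / denom if denom else 0.0)
    return mean(f1s)


def averaged_over_runs(run: Callable[[int, int], float],
                       initial_sets: Sequence[int] = (0, 1, 2),
                       seeds: Sequence[int] = (0, 1, 2)) -> float:
    """Mean score over every (initial set, seed) pair, nine runs by default."""
    return mean(run(i, s) for i, s in product(initial_sets, seeds))
```

For instance, `macro_f1([0, 0, 1, 1], [0, 1, 1, 1])` averages F1 = 2/3 for class 0 and F1 = 4/5 for class 1, giving 11/15 ≈ 0.733.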