Multi-Granular Text Encoding for Self-Explaining Categorization
Zhiguo Wang, Yue Zhang, Mo Yu, Wei Zhang, Lin Pan, Linfeng Song, Kun Xu, Yousef El-Kurdi
- 01 Jul 2019
- pp 41-45
TL;DR: This work defines multi-granular ngrams as basic units for explanation and organizes all ngrams into a hierarchical structure, so that shorter ngrams can be reused when computing longer ones. The resulting model can extract intuitive multi-granular evidence to support its predictions.
Abstract: Self-explaining text categorization requires a classifier to make a prediction along with supporting evidence. A popular type of evidence is sub-sequences extracted from the input text which are sufficient for the classifier to make the prediction. In this work, we define multi-granular ngrams as basic units for explanation, and organize all ngrams into a hierarchical structure, so that shorter ngrams can be reused while computing longer ngrams. We leverage the tree-structured LSTM to learn a context-independent representation for each unit via parameter sharing. Experiments on medical disease classification show that our model is more accurate, efficient and compact than the BiLSTM and CNN baselines. More importantly, our model can extract intuitive multi-granular evidence to support its predictions.
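The hierarchical ngram encoding described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration and not the authors' implementation. It assumes that each ngram node is composed from the two overlapping (n-1)-grams it contains (its left and right children) by a single binary Tree-LSTM cell in the style of Tai et al. (2015), and that unigram leaves are plain word vectors. `BinaryTreeLSTMCell`, `make_embedder` (random stand-in embeddings), `encode_ngrams`, its `max_n` cap, and `top_evidence` (a made-up linear scoring rule for picking evidence) are all illustrative names.

```python
# Hypothetical sketch of hierarchical ngram encoding; not the authors' code.
import math
import random
from typing import Callable, Dict, List, Optional, Tuple

Vector = List[float]
State = Tuple[Vector, Vector]  # (hidden state h, memory cell c)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class BinaryTreeLSTMCell:
    """Binary Tree-LSTM composition (Tai et al., 2015 style), random untrained weights."""

    GATES = ("i", "f_left", "f_right", "o", "u")

    def __init__(self, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.dim = dim
        # Each gate maps the concatenation [h_left; h_right] (2*dim values) to dim values.
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(2 * dim)]
                      for _ in range(dim)] for g in self.GATES}
        self.b = {g: [0.0] * dim for g in self.GATES}

    def _affine(self, gate: str, x: Vector) -> Vector:
        return [sum(w * v for w, v in zip(row, x)) + bias
                for row, bias in zip(self.W[gate], self.b[gate])]

    def __call__(self, left: State, right: State) -> State:
        (h_l, c_l), (h_r, c_r) = left, right
        x = h_l + h_r  # list concatenation, length 2*dim
        i = [sigmoid(v) for v in self._affine("i", x)]
        f_l = [sigmoid(v) for v in self._affine("f_left", x)]
        f_r = [sigmoid(v) for v in self._affine("f_right", x)]
        o = [sigmoid(v) for v in self._affine("o", x)]
        u = [math.tanh(v) for v in self._affine("u", x)]
        # A separate forget gate per child decides how much of each child's memory to keep.
        c = [ik * uk + flk * clk + frk * crk
             for ik, uk, flk, clk, frk, crk in zip(i, u, f_l, c_l, f_r, c_r)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c


def make_embedder(dim: int, seed: int = 1) -> Callable[[str], Vector]:
    """Hypothetical stand-in for trained word embeddings."""
    rng = random.Random(seed)
    table: Dict[str, Vector] = {}

    def embed(token: str) -> Vector:
        if token not in table:
            table[token] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        return table[token]

    return embed


def encode_ngrams(tokens: List[str], embed: Callable[[str], Vector],
                  cell: BinaryTreeLSTMCell,
                  max_n: Optional[int] = None) -> Dict[Tuple[int, int], State]:
    """Map (start, n) to the state of the ngram tokens[start:start + n]."""
    length = len(tokens)
    max_n = length if max_n is None else min(max_n, length)
    nodes: Dict[Tuple[int, int], State] = {}
    for i, tok in enumerate(tokens):  # unigrams are the leaves
        nodes[(i, 1)] = (embed(tok), [0.0] * cell.dim)
    for n in range(2, max_n + 1):
        for i in range(length - n + 1):
            # Reuse the two overlapping (n-1)-grams computed at the previous level.
            nodes[(i, n)] = cell(nodes[(i, n - 1)], nodes[(i + 1, n - 1)])
    return nodes


def top_evidence(tokens: List[str], nodes: Dict[Tuple[int, int], State],
                 scorer: Vector) -> List[str]:
    """Hypothetical evidence rule: the ngram whose h scores highest under a linear scorer."""
    def score(key: Tuple[int, int]) -> float:
        h, _ = nodes[key]
        return sum(w * v for w, v in zip(scorer, h))

    start, n = max(nodes, key=score)
    return tokens[start:start + n]


if __name__ == "__main__":
    words = "patient reports chest pain".split()
    nodes = encode_ngrams(words, make_embedder(4), BinaryTreeLSTMCell(4))
    print(len(nodes))  # 10 = 4 unigrams + 3 bigrams + 2 trigrams + 1 four-gram
    print(top_evidence(words, nodes, scorer=[1.0, -0.5, 0.25, 0.0]))
```

In this sketch each ngram's vector depends only on its own tokens, which mirrors the context-independent representation the abstract mentions. Because every ngram reuses two already-computed (n-1)-grams, an uncapped sentence of L tokens yields L(L+1)/2 nodes but needs only L(L-1)/2 cell applications, and `max_n` limits the height of the hierarchy.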
Citations
Evaluating Explanation Methods for Neural Machine Translation
Jierui Li, Lemao Liu, Huayang Li, Guanlin Li, Guoping Huang, Shuming Shi
- 04 May 2020
TL;DR: This article proposes a principled metric based on fidelity with respect to the predictive behavior of the NMT model, uses it to quantitatively evaluate several explanation methods, and reports some valuable findings about these methods.
•Posted Content
Evaluating Explanation Methods for Neural Machine Translation
TL;DR: This work makes an initial attempt to evaluate explanation methods from an alternative viewpoint and proposes a principled metric based on fidelity with respect to the predictive behavior of the NMT model.
Cited by 16
•Posted Content
Local Interpretations for Explainable Natural Language Processing: A Survey
TL;DR: This article surveys various methods for improving the interpretability of deep neural networks on natural language processing (NLP) tasks, including machine translation and sentiment analysis, and opens with a comprehensive discussion of the definition of the term "interpretability" and its various aspects.
Cited by 14
•Posted Content
Open-Retrieval Conversational Machine Reading
TL;DR: The authors propose a multi-passage Discourse-aware Entailment Reasoning Network (MUDERN), which extracts conditions from the rule texts through discourse segmentation and performs multi-passage entailment reasoning to either answer user questions directly or ask clarifying follow-up questions to request more information.
Cited by 12
Local Interpretations for Explainable Natural Language Processing: A Survey
Siwen Luo, Hamish Ivison, Soyeon Caren Han, Josiah Poon
TL;DR: A survey of local interpretation techniques for improving the interpretability of deep neural networks on NLP tasks.
Cited by 5