Multi-Granular Text Encoding for Self-Explaining Categorization.

Open Access • Posted Content

- 19 Jul 2019

TL;DR: This work uses a tree-structured LSTM to learn a context-independent representation for each unit via parameter sharing, so that the model can extract intuitive multi-granular evidence to support its predictions.
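
The idea of a context-independent representation per unit can be illustrated with a minimal sketch. It assumes the units are all contiguous n-grams of the sentence (Figure 2 mentions phrases or n-grams as nodes) and swaps the tree-structured LSTM for a hypothetical `toy_encoder`, so it shows only the sharing pattern, not the paper's model. The names `ngram_units`, `encode_units`, and `toy_encoder` are illustrative and do not come from the paper.

```python
from typing import Callable, Dict, List, Tuple

Unit = Tuple[str, ...]


def ngram_units(tokens: List[str]) -> List[Unit]:
    """Return every contiguous n-gram of the sentence, shortest first."""
    n = len(tokens)
    return [
        tuple(tokens[start:start + size])
        for size in range(1, n + 1)
        for start in range(n - size + 1)
    ]


def encode_units(
    units: List[Unit], encoder: Callable[[Unit], List[float]]
) -> Dict[Unit, List[float]]:
    """Encode each unit on its own with one shared encoder.

    The encoder only ever sees the unit itself, so the result does not
    depend on the words surrounding the unit in the sentence.
    """
    return {unit: encoder(unit) for unit in units}


def toy_encoder(unit: Unit) -> List[float]:
    """Hypothetical stand-in for a learned encoder (NOT a tree-LSTM).

    Features: number of words in the unit and total character count.
    """
    return [float(len(unit)), float(sum(len(word) for word in unit))]


units = ngram_units(["w1", "w2", "w3", "w4"])
vectors = encode_units(units, toy_encoder)
```

Because the same encoder is applied to every unit in isolation, the unit `("w2", "w3")` receives the same vector in any sentence that contains it, which is what would let a model point to individual units as evidence for a prediction.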

Figures

Figure 2: Structures for a sentence w1w2w3w4, where each node corresponds to a phrase or ngram.

Figure 4: Effectiveness of the extracted evidence.

Table 1: A medical report snippet and its diagnoses.

Citations

•Posted Content

Evaluating Explanation Methods for Neural Machine Translation

Jierui Li, +5 more

- 04 May 2020

- arXiv: Computation and Language

TL;DR: This work makes an initial attempt to evaluate explanation methods from an alternative viewpoint and proposes a principled metric based on fidelity with regard to the predictive behavior of the NMT model.

•Journal Article•10.1155/2022/7559523

Intelligent Classification Method of Archive Data Based on Multigranular Semantics

Xiaobo Jiang

- 14 May 2022

- Computational Intelligence and Neuroscie...

TL;DR: An intelligent classification method for archive data based on multigranular semantics is proposed, which uses a multilabel data set to train the constructed semantic-label multigranular attention model and outputs the classification result.

•Posted Content

Answering Ambiguous Questions through Generative Evidence Fusion and Round-Trip Prediction

Yifan Gao, +9 more

- 26 Nov 2020

- arXiv: Computation and Language

TL;DR: The proposed round-trip prediction is a model-agnostic general approach for answering ambiguous open-domain questions, which improves the state-of-the-art Refuel as well as several baseline models.
