Interpretable Entity Representations through Large-Scale Typing
Yasumasa Onoe, Greg Durrett
- 01 Apr 2020
- pp 612-624
TL;DR: This paper presents an approach to creating entity representations that are human readable and achieve high performance on entity-related tasks out of the box, and shows that these embeddings can be post-hoc modified through a small number of rules to incorporate domain knowledge and improve performance.
Abstract: In standard methodology for natural language processing, entities in text are typically embedded in dense vector spaces with pre-trained models. The embeddings produced this way are effective when fed into downstream models, but they require end-task fine-tuning and are fundamentally difficult to interpret. In this paper, we present an approach to creating entity representations that are human readable and achieve high performance on entity-related tasks out of the box. Our representations are vectors whose values correspond to posterior probabilities over fine-grained entity types, indicating the confidence of a typing model’s decision that the entity belongs to the corresponding type. We obtain these representations using a fine-grained entity typing model, trained either on supervised ultra-fine entity typing data (Choi et al. 2018) or distantly-supervised examples from Wikipedia. On entity probing tasks involving recognizing entity identity, our embeddings used in parameter-free downstream models achieve competitive performance with ELMo- and BERT-based embeddings in trained models. We also show that it is possible to reduce the size of our type set in a learning-based way for particular domains. Finally, we show that these embeddings can be post-hoc modified through a small number of rules to incorporate domain knowledge and improve performance.
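The idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the six-type vocabulary, the probability values, the choice of cosine similarity as the parameter-free comparison, and the `apply_rule` helper are all hypothetical and chosen only to show how a vector of per-type probabilities can be read, compared, and edited by a rule.

```python
import math

# Hypothetical type vocabulary (an assumption for illustration only).
TYPES = ["person", "politician", "athlete", "location", "city", "organization"]


def cosine(u, v):
    """Parameter-free similarity between two type-probability vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_types(vec, k=2):
    """Read the embedding directly: return the k most probable type names."""
    ranked = sorted(zip(TYPES, vec), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def apply_rule(vec, if_type, then_type, threshold=0.5):
    """Hypothetical post-hoc rule: when `if_type` is confident, raise
    `then_type` to at least the same probability. Returns a new vector."""
    out = list(vec)
    i, j = TYPES.index(if_type), TYPES.index(then_type)
    if out[i] >= threshold:
        out[j] = max(out[j], out[i])
    return out


# Made-up per-type probabilities for three entity mentions.
mention_a = [0.95, 0.90, 0.02, 0.01, 0.01, 0.05]  # reads as a politician
mention_b = [0.90, 0.85, 0.05, 0.02, 0.01, 0.10]  # reads as a politician
mention_c = [0.02, 0.01, 0.01, 0.97, 0.92, 0.03]  # reads as a city

if __name__ == "__main__":
    print(top_types(mention_a))
    print(cosine(mention_a, mention_b) > cosine(mention_a, mention_c))
    # A vector where "politician" is confident but "person" is not.
    print(apply_rule([0.10, 0.90, 0.0, 0.0, 0.0, 0.0], "politician", "person"))
```

Because every dimension carries a type name, the nearest-neighbor comparison needs no trained parameters and the rule is a direct edit to named coordinates, which is the property the abstract highlights.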
Citations
Modeling Fine-Grained Entity Types with Box Embeddings
Yasumasa Onoe, Michael Boratko, Andrew McCallum, Greg Durrett
- 01 Aug 2021
TL;DR: The authors propose to represent both types and entity mentions as boxes and use a BERT-based model to embed each mention and its context into a box space, which is then used to derive both the posterior probability of a mention exhibiting a given type and the conditional probability relations between types themselves.
Ultra-Fine Entity Typing with Weak Supervision from a Masked Language Model
Hongliang Dai, Yangqiu Song, Haixun Wang
- 01 Aug 2021
TL;DR: The authors use a BERT masked language model (MLM) to predict context-dependent hypernyms of the mention, which can then serve as type labels for fine-grained entity typing.
•Posted Content
CREAK: A Dataset for Commonsense Reasoning over Entity Knowledge.
TL;DR: The authors introduce CREAK, a testbed for commonsense reasoning about entity knowledge, bridging fact-checking about entities with commonsense inferences (if you are good at a skill you can teach others how to do it).
Unified Semantic Typing with Meaningful Label Inference
01 Jan 2022
TL;DR: Huang et al. presented this paper at the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2022).
BLINK with Elasticsearch for Efficient Entity Linking in Business Conversations
01 Jan 2022
TL;DR: Laskar et al. presented this paper at the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Industry Track.
References
•Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
- 01 Jan 2015
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
•Proceedings Article
Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin
- 12 Jun 2017
TL;DR: This paper proposes the Transformer, a simple network architecture based solely on attention mechanisms that dispenses with recurrence and convolutions entirely, and reports state-of-the-art single-model performance on English-to-French translation.
•Posted Content
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: BERT is a language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; it can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Related Papers (5)
Eunsol Choi, Omer Levy, Yejin Choi, Luke Zettlemoyer
- 01 Jul 2018
Yasumasa Onoe, Greg Durrett
- 01 May 2019