Learning Sentence-Level Representations with Predictive Coding
TL;DR: The authors extend BERT-style models with bottom-up and top-down computation to predict future sentences in latent space at each intermediate layer in the networks and conduct extensive experimentation with various benchmarks for the English and Spanish languages.
read more
Abstract: Learning sentence representations is an essential and challenging topic in the deep learning and natural language processing communities. Recent methods pre-train big models on a massive text corpus, focusing mainly on learning the representation of contextualized words. As a result, these models cannot generate informative sentence embeddings since they do not explicitly exploit the structure and discourse relationships existing in contiguous sentences. Drawing inspiration from human language processing, this work explores how to improve sentence-level representations of pre-trained models by borrowing ideas from predictive coding theory. Specifically, we extend BERT-style models with bottom-up and top-down computation to predict future sentences in latent space at each intermediate layer in the networks. We conduct extensive experimentation with various benchmarks for the English and Spanish languages, designed to assess sentence- and discourse-level representations and pragmatics-focused assessments. Our results show that our approach improves sentence representations consistently for both languages. Furthermore, the experiments also indicate that our models capture discourse and pragmatics knowledge. In addition, to validate the proposed method, we carried out an ablation study and a qualitative study with which we verified that the predictive mechanism helps to improve the quality of the representations.
read more
Chat with Paper
AI Agents for this Paper
Find similar papers on Google Scholar, PubMed and Arxiv
Write a critical review of this paper
Analyze citations of this paper to find unaddressed research gaps
Citations
The Linguistic Pathways Model: Capturing the Multiple Dimensions of Reading Development
Xiuhong Tong,S. Hélène Deacon,Xiuhong Tong,S. Hélène Deacon +3 more
Abstract: ABSTRACT The importance of oral language skills in reading comprehension is widely recognized in contemporary models. Building on this foundation, we propose the Linguistic Pathways Model. In this model, we illuminate mechanistic and developmental detail by which individual components of oral language support reading comprehension and embrace the multiple dimensions across which reading development plays out. This is the level of theoretical detail needed to inform instruction in the classroom that is most likely to propel children on strong trajectories of reading development. We illustrate the value of this model by focusing on syntactic skills—the ability to understand and manipulate sentence structure. We hypothesize two core pathways by which syntactic skills impact reading comprehension. In the syntax‐to‐lexicon pathway, syntactic skills influence how readers construct lexical representations, ultimately impacting reading comprehension. In the syntax‐to‐sentence pathway, syntactic skills affect reading comprehension by shaping how readers parse sentences and generate predictions about upcoming information. In each, we elaborate on mechanisms of these influences. We also detail the nature of developmental effects, including changes in relative reliance on skills over time and the temporal order of effects, and the interactions between the two. This work provides a new theoretical model for understanding the precise pathways through which individual oral language skills contribute to reading comprehension development, making predictions that are testable in classrooms.
Next-Gen Language Mastery: Exploring Advances in Natural Language Processing Post-transformers
Mily Lal,Manisha Bhende,Swati Sharma,Pallavi Thorat,Ashok K. Goel,Poi Tamrakar,A. K. Pathak +6 more
- 01 Jan 2024
Learning to Plan Long-Term for Language Modeling
Florian Mai,Nathan Cornille,Marie‐Francine Moens +2 more
- 23 Aug 2024
TL;DR: Researchers propose a planner for language models to predict long-term text continuations by sampling multiple plans, conditioning the model on a distribution of text continuations, and trading computation time for improved next token prediction accuracy.
Extracting and Encoding: Leveraging Large Language Models and Medical Knowledge to Enhance Radiological Text Representation
Pablo Messina,René Víctor Valqui Vidal,Denis Parra,Álvaro Soto,Vladimir Araujo +4 more
- 02 Jul 2024
TL;DR: This study presents a two-stage framework leveraging large language models and medical knowledge to enhance radiological text representation, outperforming state-of-the-art methods in tasks like sentence ranking and label extraction from radiology reports.
GenKubeSec: LLM-Based Kubernetes Misconfiguration Detection, Localization, Reasoning, and Remediation
Ehud Malul,Yair Meidan,Dudu Mimran,Yuval Elovici,Asaf Shabtai +4 more
- 30 May 2024
TL;DR: GenKubeSec is an LLM-based method for detecting, localizing, reasoning about, and remediating KCF misconfigurations. It achieves high precision and recall, and provides detailed explanations for misconfigurations.
References
•Journal Article
Visualizing Data using t-SNE
TL;DR: A new technique called t-SNE that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map, a variation of Stochastic Neighbor Embedding that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
Learning Phrase Representations using RNN Encoder--Decoder for Statistical Machine Translation
Kyunghyun Cho,Bart van Merriënboer,Caglar Gulcehre,Dzmitry Bahdanau,Fethi Bougares,Holger Schwenk,Yoshua Bengio,Yoshua Bengio,Yoshua Bengio +8 more
- 01 Jan 2014
TL;DR: In this paper, the encoder and decoder of the RNN Encoder-Decoder model are jointly trained to maximize the conditional probability of a target sequence given a source sequence.
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Nils Reimers,Iryna Gurevych +1 more
- 14 Aug 2019
TL;DR: Sentence-BERT (SBERT), a modification of the pretrained BERT network that use siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine-similarity is presented.
Deep contextualized word representations
Matthew E. Peters,Mark Neumann,Mohit Iyyer,Matt Gardner,Christopher Clark,Kenton Lee,Luke Zettlemoyer +6 more
- 15 Feb 2018
TL;DR: This paper introduced a new type of deep contextualized word representation that models both complex characteristics of word use (e.g., syntax and semantics), and how these uses vary across linguistic contexts (i.e., to model polysemy).
•Proceedings Article
XLNet: Generalized Autoregressive Pretraining for Language Understanding
Zhilin Yang,Zihang Dai,Yiming Yang,Jaime G. Carbonell,Ruslan Salakhutdinov,Quoc V. Le +5 more
- 19 Jun 2019
TL;DR: The authors proposes XLNet, a generalized autoregressive pretraining method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and overcomes the limitations of BERT The authors.