Proceedings Article · DOI: 10.48550/arXiv.2203.00274
TableFormer: Robust Transformer Modeling for Table-Text Encoding
Jingfeng Yang,Aditya Gupta,Shyam Upadhyay,Luheng He,Rahul Goel,Shachi Paul +5 more
- 01 Mar 2022
pp 528-537
TL;DR: This work proposes TableFormer, a robust and structurally aware table-text encoding architecture in which tabular structural biases are incorporated entirely through learnable attention biases; these tabular inductive biases help it understand tables better.
Abstract: Understanding tables is an important aspect of natural language understanding. Existing models for table understanding require linearization of the table structure, where row or column order is encoded as an unwanted bias. Such spurious biases make the model vulnerable to row and column order perturbations. Additionally, prior work has not thoroughly modeled the table structures or table-text alignments, hindering the table-text understanding ability. In this work, we propose a robust and structurally aware table-text encoding architecture TableFormer, where tabular structural biases are incorporated completely through learnable attention biases. TableFormer is (1) strictly invariant to row and column orders, and, (2) could understand tables better due to its tabular inductive biases. Our evaluations showed that TableFormer outperforms strong baselines in all settings on SQA, WTQ and TabFact table reasoning datasets, and achieves state-of-the-art performance on SQA, especially when facing answer-invariant row and column order perturbations (6% improvement over the best baseline), because previous SOTA models’ performance drops by 4% - 6% when facing such perturbations while TableFormer is not affected.
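The abstract's central idea, structure entering only through attention biases rather than through row or column position ids, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the four relation types, the single attention head, the random weights and the names `relation_ids` and `biased_self_attention` are all assumptions made for illustration. It shows that when the bias depends only on whether two cells share a row or a column, reordering the rows merely reorders the per-cell outputs, whereas adding absolute position embeddings breaks that property.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def relation_ids(rows, cols):
    """Pairwise relation type (hypothetical set): 0 other, 1 same row, 2 same column, 3 same cell."""
    same_row = rows[:, None] == rows[None, :]
    same_col = cols[:, None] == cols[None, :]
    rel = np.zeros((len(rows), len(rows)), dtype=int)
    rel[same_row] = 1
    rel[same_col] = 2
    rel[same_row & same_col] = 3
    return rel

def biased_self_attention(x, rel, bias_table, wq, wk, wv):
    """Single-head self-attention with one scalar bias per relation type added to the scores."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores + bias_table[rel]  # look up a bias for every (query cell, key cell) pair
    return softmax(scores) @ v

rng = np.random.default_rng(0)
n_rows, n_cols, d = 3, 2, 8
rows = np.repeat(np.arange(n_rows), n_cols)  # [0, 0, 1, 1, 2, 2]
cols = np.tile(np.arange(n_cols), n_rows)    # [0, 1, 0, 1, 0, 1]

x = rng.normal(size=(n_rows * n_cols, d))    # cell content embeddings, no position information
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
bias_table = rng.normal(size=4)              # stands in for the learnable biases

rel = relation_ids(rows, cols)
out = biased_self_attention(x, rel, bias_table, wq, wk, wv)

# Row-order perturbation: the table is re-linearized with rows in the order 2, 0, 1.
perm = np.array([4, 5, 0, 1, 2, 3])
x_perm = x[perm]
rel_perm = relation_ids(rows, cols)          # relations recomputed from the new layout
out_perm = biased_self_attention(x_perm, rel_perm, bias_table, wq, wk, wv)

# Contrast: absolute position embeddings tie each cell to its slot in the linearization.
pos = rng.normal(size=(n_rows * n_cols, d))
out_pos = biased_self_attention(x + pos, rel, bias_table, wq, wk, wv)
out_pos_perm = biased_self_attention(x_perm + pos, rel_perm, bias_table, wq, wk, wv)

print("bias-only outputs match after reordering:", np.allclose(out_perm, out[perm]))
print("position-embedding outputs match:", np.allclose(out_pos_perm, out_pos[perm]))
```

In this toy setting each cell receives the same representation regardless of where its row appears, so any order-independent readout (for example, mean pooling or per-cell selection) is unaffected by the perturbation.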
Figures

Table 3: Denotation accuracy on WTQ development and test sets. The median of 5 independent runs is reported.
Table 2: Binary classification accuracy on TABFACT development and 4 splits of test set, as well as performance on test sets with our perturbation evaluation. The median of 5 independent runs is reported. Missing values are those not reported in the original paper.
Table 4: Model size comparison. 
Table 5: ALL questions’ cell selection accuracy of TABLEFORMER variants on SQA development set. rcgp represents the setting including row ids, column ids and global positional ids, c-gp represents column ids and global positional ids, gp represents global positional ids, and pcp represents per-cell positional ids. “SAT” represents masking out some attention scores. “SO” represents adding attention bias before scaling. 
Figure 2: TABLEFORMER input and attention biases in the self attention module. This example corresponds to table (a) in Figure 1 and its paired question “query”. Different colors in the attention bias matrix denote different types of task independent biases derived based on the table structure and the associated text. 
Table 7: Ablation study of proposed attention biases.
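The "SO" variant named in the Table 5 caption (adding the attention bias before scaling) can be read in more than one way. The sketch below takes one plausible reading, which is an assumption and is not confirmed by the captions: the bias is added to the raw dot products and the sum is then divided by the square root of the head dimension, instead of the bias being added after that division. Under this reading, the pre-scaling variant is equivalent to shrinking the bias by the same factor.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n, d = 5, 16
q = rng.normal(size=(n, d))
k = rng.normal(size=(n, d))
bias = rng.normal(size=(n, n))  # stands in for learned attention biases

# Bias added after scaling the dot products.
post_scale = softmax(q @ k.T / np.sqrt(d) + bias)

# Bias added before scaling (assumed reading of "SO").
pre_scale = softmax((q @ k.T + bias) / np.sqrt(d))

# The pre-scaling form equals post-scaling with the bias divided by sqrt(d).
pre_scale_equiv = softmax(q @ k.T / np.sqrt(d) + bias / np.sqrt(d))

print(np.allclose(pre_scale, pre_scale_equiv))
```

The practical difference is the effective magnitude of the bias relative to the content scores, which is why the two placements can behave differently during training.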
Citations
Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
TL;DR: This paper presents a comprehensive and practical guide for practitioners and end-users working with large language models (LLMs) in their downstream natural language processing (NLP) tasks.
MultiHiertt: Numerical Reasoning over Multi Hierarchical Tabular and Textual Data
Yilun Zhao,Yunxiang Li,Chenying Li,Rui Zhang +3 more
- 03 Jun 2022
TL;DR: A new large-scale benchmark, MultiHiertt, with QA pairs over Multi Hierarchical Tabular and Textual data is constructed, and a novel QA model termed MT2Net is introduced, which first applies fact retrieval to extract relevant supporting facts from both tables and text and then uses a reasoning module to perform symbolic reasoning over the retrieved facts.
Large Language Models are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning
TL;DR: The authors decompose huge evidence (a huge table) into sub-evidence (a small table) to mitigate the interference of useless information for table reasoning, and decompose complex questions into simpler sub-questions for text reasoning.
TransTab: Learning Transferable Tabular Transformers Across Tables
Zifeng Wang,Jimeng Sun +1 more
- 19 May 2022
TL;DR: The goal of TransTab is to convert each sample to a generalizable embedding vector, and then apply stacked transformers for feature encoding, and one methodology insight is combining column description and table cells as the raw input to a gated transformer model.
Unified Training of Universal Time Series Forecasting Transformers
Gerald Woo,Chenghao Liu,Akshat Kumar,Caiming Xiong,Silvio Savarese,Doyen Sahoo +5 more
TL;DR: This work presents novel enhancements to the conventional time series Transformer architecture, resulting in the proposed Masked Encoder-based Universal Time Series Forecasting Transformer (Moirai), which achieves competitive or superior performance when compared to full-shot models.
References
•Proceedings Article
Attention is All you Need
Ashish Vaswani,Noam Shazeer,Niki Parmar,Jakob Uszkoreit,Llion Jones,Aidan N. Gomez,Lukasz Kaiser,Illia Polosukhin +7 more
- 12 Jun 2017
TL;DR: This paper proposed a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely and achieved state-of-the-art performance on English-to-French translation.
•Posted Content
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Transformer-XL: Attentive Language Models beyond a Fixed-Length Context.
Zihang Dai,Zhilin Yang,Yiming Yang,Jaime G. Carbonell,Quoc V. Le,Ruslan Salakhutdinov +5 more
- 09 Jan 2019
TL;DR: This work proposes a novel neural architecture Transformer-XL that enables learning dependency beyond a fixed length without disrupting temporal coherence, which consists of a segment-level recurrence mechanism and a novel positional encoding scheme.
•Posted Content
Longformer: The Long-Document Transformer
TL;DR: Following prior work on long-sequence transformers, the Longformer is evaluated on character-level language modeling and achieves state-of-the-art results on text8 and enwik8; the authors also pretrain Longformer and finetune it on a variety of downstream tasks.
TaPas: Weakly Supervised Table Parsing via Pre-training
Jonathan Herzig,Pawel Krzysztof Nowak,Thomas Müller,Francesco Piccinno,Julian Martin Eisenschlos +4 more
- 01 Jul 2020
TL;DR: TaPas is presented, an approach to question answering over tables without generating logical forms that outperforms or rivals semantic parsing models by improving state-of-the-art accuracy on SQA and performing on par with the state of the art on WikiSQL and WikiTQ, but with a simpler model architecture.