BERTgrid: Contextualized Embedding for 2D Document Representation and Understanding

Open Access • Posted Content

- 11 Sep 2019

Cited by 71

TL;DR: BERTgrid represents a document as a grid of contextualized word piece embedding vectors, thereby making the document's spatial structure and semantics accessible to the processing neural network. The authors use BERTgrid in combination with a fully convolutional network to extract fields from invoices.
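The grid representation described in the TL;DR can be illustrated with a short sketch. It is not the authors' implementation: it assumes each word piece's bounding box has already been mapped to integer grid cells and that one embedding vector per word piece is available. BERTgrid uses contextualized BERT vectors there; the example below uses made-up 2-dimensional vectors. `WordPiece` and `build_grid` are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence

Grid = List[List[List[float]]]


@dataclass
class WordPiece:
    """One word piece: its text, its box in grid cells, and its embedding.

    Box coordinates are half-open: columns x0..x1-1 and rows y0..y1-1.
    In BERTgrid the embedding would be the contextualized BERT vector for
    this word piece; in this sketch it is simply supplied by the caller.
    """
    text: str
    x0: int
    y0: int
    x1: int
    y1: int
    embedding: Sequence[float]


def build_grid(pieces: List[WordPiece], height: int, width: int, dim: int) -> Grid:
    """Paint each word piece's embedding over its box; background stays zero."""
    grid: Grid = [[[0.0] * dim for _ in range(width)] for _ in range(height)]
    for piece in pieces:
        if len(piece.embedding) != dim:
            raise ValueError(f"{piece.text!r}: expected a {dim}-dim embedding")
        # Clip boxes to the grid; where boxes overlap, the later piece wins.
        for y in range(max(piece.y0, 0), min(piece.y1, height)):
            for x in range(max(piece.x0, 0), min(piece.x1, width)):
                grid[y][x] = list(piece.embedding)
    return grid


# Toy usage: two word pieces on one text line, with made-up 2-dim "embeddings".
pieces = [
    WordPiece("total", x0=0, y0=0, x1=3, y1=1, embedding=[1.0, 0.0]),
    WordPiece("42", x0=4, y0=0, x1=5, y1=1, embedding=[0.0, 1.0]),
]
grid = build_grid(pieces, height=2, width=6, dim=2)
print(grid[0][1])  # [1.0, 0.0]
print(grid[0][4])  # [0.0, 1.0]
print(grid[1][0])  # [0.0, 0.0]
```

The resulting height × width × dim tensor is the kind of input a fully convolutional network would consume; the sketch omits that network as well as any scaling from page coordinates to grid cells.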

Citations

•Book Chapter•10.1007/978-3-030-86331-9_47

Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer

Rafal Powalski, +5 more

- 05 Sep 2021

TL;DR: This article proposes the TILT neural network architecture, which simultaneously learns layout information, visual features, and textual semantics, and achieves state-of-the-art results in extracting information from documents and answering questions that demand layout understanding.

Cited by 198

•Posted Content

DocFormer: End-to-End Transformer for Document Understanding

Srikar Appalaraju, +4 more

- 22 Jun 2021

- arXiv: Computer Vision and Pattern Recog...

TL;DR: DocFormer uses text, vision, and spatial features and combines them using a novel multi-modal self-attention layer, which makes it easy for the model to correlate text with visual tokens and vice versa.

Cited by 173

•Proceedings Article•10.18653/v1/2022.acl-long.534

LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding

Jiapeng Wang, +2 more

- 28 Feb 2022

TL;DR: Experimental results on eight languages show that LiLT can achieve competitive or even superior performance on diverse, widely used downstream benchmarks, enabling language-independent benefits from pre-training on document layout structure.

Cited by 154

•Proceedings Article•10.18653/V1/2020.ACL-MAIN.580

Representation Learning for Information Extraction from Form-like Documents

Bodhisattwa Prasad Majumder, +5 more

- 01 Jul 2020

TL;DR: The authors propose an extraction system that uses knowledge of the target field types to generate extraction candidates, together with a neural network architecture that learns a dense representation of each candidate from its neighboring words in the document.

Cited by 146

•Book Chapter•10.1007/978-3-030-86549-8_34

LAMBERT: Layout-Aware Language Modeling for Information Extraction

Łukasz Garncarek, +6 more

- 05 Sep 2021

TL;DR: The authors modify the Transformer encoder architecture in a way that allows it to use layout features obtained from an OCR system, without the need to re-learn language semantics from scratch.

Cited by 126

References

•Proceedings Article

Efficient Estimation of Word Representations in Vector Space

Tomas Mikolov, +3 more

- 16 Jan 2013

TL;DR: Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed, and the resulting vectors are shown to provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.

Cited by 27.5K

•Posted Content

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren, +3 more

- 04 Jun 2015

- arXiv: Computer Vision and Pattern Recog...

TL;DR: Faster R-CNN introduces a Region Proposal Network (RPN) that generates high-quality region proposals, which Fast R-CNN then uses for detection.

Cited by 25.3K

•Proceedings Article•10.18653/V1/N19-1423

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, +3 more

- 11 Oct 2018

TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can then be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

Cited by 24.6K

•Proceedings Article

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren, +3 more

- 07 Dec 2015

TL;DR: Ren et al. propose a region proposal network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.

Cited by 13.8K

•Proceedings Article•10.18653/V1/D18-1476

Chargrid: Towards Understanding 2D Documents

Anoop R Katti, +6 more

- 24 Sep 2018

TL;DR: The paper presents a generic document understanding pipeline for structured documents, built on a fully convolutional encoder-decoder network that predicts a segmentation mask and bounding boxes.

Cited by 202
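Chargrid, the predecessor that BERTgrid extends, feeds its encoder-decoder a page encoded as a grid of characters rather than word-piece vectors. Below is a minimal sketch of that kind of input encoding, assuming character boxes have already been mapped to integer grid cells: each character's box is filled with its vocabulary index, background cells stay 0, and the index grid is then one-hot encoded. The digit-only vocabulary, the shared index for unknown characters, and the names `build_chargrid` and `one_hot` are assumptions of this sketch, not details taken from the paper; the network itself is not shown.

```python
from typing import Dict, List, Tuple

# (character, x0, y0, x1, y1): box in grid cells, half-open on x1 and y1.
CharBox = Tuple[str, int, int, int, int]


def build_chargrid(chars: List[CharBox], height: int, width: int,
                   vocab: Dict[str, int]) -> List[List[int]]:
    """Return a height x width grid of character indices (0 = background).

    vocab must map characters to indices 1..len(vocab); any character not in
    vocab gets the extra index len(vocab) + 1 (an assumption of this sketch).
    """
    unknown = len(vocab) + 1
    grid = [[0] * width for _ in range(height)]
    for ch, x0, y0, x1, y1 in chars:
        idx = vocab.get(ch, unknown)
        # Clip each box to the grid before filling it with the index.
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                grid[y][x] = idx
    return grid


def one_hot(grid: List[List[int]], depth: int) -> List[List[List[int]]]:
    """Expand each cell's index into a one-hot vector of length depth."""
    return [[[1 if k == v else 0 for k in range(depth)] for v in row] for row in grid]


vocab = {c: i + 1 for i, c in enumerate("0123456789")}  # '0' -> 1, ..., '9' -> 10
chars = [("4", 0, 0, 1, 1), ("2", 1, 0, 2, 1), ("$", 3, 0, 4, 1)]
grid = build_chargrid(chars, height=1, width=5, vocab=vocab)
print(grid)  # [[5, 3, 0, 11, 0]]
encoded = one_hot(grid, depth=len(vocab) + 2)  # background + 10 digits + unknown
```

Here index 0 gets its own one-hot channel for background; that is a choice of the sketch. Replacing the per-character index with a per-word-piece BERT vector is, in essence, the step from this encoding to the BERTgrid representation.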