Open Access •Posted Content
BERTgrid: Contextualized Embedding for 2D Document Representation and Understanding.
Timo I. Denk,Christian Reisswig +1 more
TL;DR: BERTgrid represents a document as a grid of contextualized word piece embedding vectors, making its spatial structure and semantics accessible to the processing neural network. The authors combine BERTgrid with a fully convolutional network to extract fields from invoices.
Abstract: For understanding generic documents, information like font sizes, column layout, and generally the positioning of words may carry semantic information that is crucial for solving a downstream document intelligence task. Our novel BERTgrid, which is based on Chargrid by Katti et al. (2018), represents a document as a grid of contextualized word piece embedding vectors, thereby making its spatial structure and semantics accessible to the processing neural network. The contextualized embedding vectors are retrieved from a BERT language model. We use BERTgrid in combination with a fully convolutional network on a semantic instance segmentation task for extracting fields from invoices. We demonstrate its performance on tabulated line item and document header field extraction.
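The grid representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that every grid cell inside a word piece's bounding box receives that word piece's embedding vector and that cells with no text stay zero, in the spirit of a Chargrid-style construction. The function name `build_embedding_grid`, the box format `(x0, y0, x1, y1)`, and the random vectors standing in for BERT outputs are all hypothetical choices made for illustration.

```python
import numpy as np


def build_embedding_grid(pieces, height, width, dim):
    """Rasterize word piece embeddings onto a 2D grid.

    pieces: iterable of (x0, y0, x1, y1, vector) tuples, where the box is
            given in grid coordinates with exclusive upper bounds
            (hypothetical format chosen for this sketch).
    Returns an array of shape (height, width, dim).
    """
    # Assumption: background cells (no text) are all-zero vectors.
    grid = np.zeros((height, width, dim), dtype=np.float32)
    for x0, y0, x1, y1, vec in pieces:
        vec = np.asarray(vec, dtype=np.float32)
        if vec.shape != (dim,):
            raise ValueError(f"expected vector of shape ({dim},), got {vec.shape}")
        # Broadcast the embedding over every cell covered by the box.
        grid[y0:y1, x0:x1, :] = vec
    return grid


# Stand-in embeddings: random vectors, NOT real BERT outputs.
rng = np.random.default_rng(0)
dim = 4
pieces = [
    (0, 0, 3, 2, rng.normal(size=dim)),  # first word piece
    (4, 0, 8, 2, rng.normal(size=dim)),  # second word piece on the same line
]
grid = build_embedding_grid(pieces, height=5, width=10, dim=dim)
print(grid.shape)  # (5, 10, 4)
```

A tensor of this shape can be passed to a convolutional network in the same way as an image, with the embedding dimension playing the role of the channel axis.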
Citations
Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer
Rafal Powalski,Lukasz Borchmann,Dawid Jurkiewicz,Tomasz Dwojak,Michał Pietruszka,Gabriela Pałka +5 more
- 05 Sep 2021
TL;DR: This article proposed a TILT neural network architecture which simultaneously learns layout information, visual features, and textual semantics, and achieved state-of-the-art results in extracting information from documents and answering questions which demand layout understanding.
198
•Posted Content
DocFormer: End-to-End Transformer for Document Understanding
TL;DR: DocFormer combines text, vision, and spatial features using a novel multi-modal self-attention layer, which makes it easy for the model to correlate text with visual tokens and vice versa.
173
LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding
Jiapeng Wang,Lianwen Jin,Kai Ding +2 more
- 28 Feb 2022
TL;DR: Experimental results on eight languages have shown that LiLT can achieve competitive or even superior performance on diverse widely-used downstream benchmarks, which enables language-independent benefit from the pre-training of document layout structure.
Representation Learning for Information Extraction from Form-like Documents
Bodhisattwa Prasad Majumder,Navneet Potti,Sandeep Tata,James B. Wendt,Qi Zhao,Marc Najork +5 more
- 01 Jul 2020
TL;DR: An extraction system that uses knowledge of the types of the target fields to generate extraction candidates and a neural network architecture that learns a dense representation of each candidate based on neighboring words in the document is proposed.
LAMBERT: Layout-Aware Language Modeling for Information Extraction.
Łukasz Garncarek,Rafał Powalski,Tomasz Stanisławek,Bartosz Topolski,Piotr Halama,Michał Turski,Filip Graliński +6 more
- 05 Sep 2021
TL;DR: The authors modify the Transformer encoder architecture in a way that allows it to use layout features obtained from an OCR system, without the need to re-learn language semantics from scratch.
126
References
•Proceedings Article
Efficient Estimation of Word Representations in Vector Space
Tomas Mikolov,Kai Chen,Greg S. Corrado,Jeffrey Dean +3 more
- 16 Jan 2013
TL;DR: Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.
27.5K
•Posted Content
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
TL;DR: Faster R-CNN introduces a Region Proposal Network (RPN) that generates high-quality region proposals, which Fast R-CNN then uses for detection.
25.3K
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin,Ming-Wei Chang,Kenton Lee,Kristina Toutanova +3 more
- 11 Oct 2018
TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. The pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
24.6K
•Proceedings Article
Faster R-CNN: towards real-time object detection with region proposal networks
Shaoqing Ren,Kaiming He,Ross Girshick,Jian Sun +3 more
- 07 Dec 2015
TL;DR: Ren et al. propose a region proposal network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.
Chargrid: Towards Understanding 2D Documents.
Anoop R Katti,Christian Reisswig,Cordula Guder,Sebastian Brarda,Steffen Bickel,Johannes Höhne,Jean Baptiste Faddoul +6 more
- 24 Sep 2018
TL;DR: In this paper, a generic document understanding pipeline for structured documents is presented, which makes use of a fully convolutional encoder-decoder network that predicts a segmentation mask and bounding boxes.
202