Document layout analysis

Papers

Journal Article•10.1109/34.244677

The document spectrum for page layout analysis

Lawrence O'Gorman¹

Bell Labs¹

01 Nov 1993-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: The document spectrum (or docstrum) is a method for structural page layout analysis based on bottom-up, nearest-neighbor clustering of page components. It yields accurate measures of skew and of within-line and between-line spacing, and it locates text lines and text blocks.

Abstract: Page layout analysis is a document processing technique used to determine the format of a page. This paper describes the document spectrum (or docstrum), which is a method for structural page layout analysis based on bottom-up, nearest-neighbor clustering of page components. The method yields an accurate measure of skew, within-line, and between-line spacings and locates text lines and text blocks. It is advantageous over many other methods in three main ways: independence from skew angle, independence from different text spacings, and the ability to process local regions of different text orientations within the same image. Results of the method shown for several different page formats and for randomly oriented subpages on the same image illustrate the versatility of the method. We also discuss the differences, advantages, and disadvantages of the docstrum with respect to other layout methods.
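The nearest-neighbor idea in this abstract can be illustrated with a minimal sketch. It is not O'Gorman's implementation: it assumes that connected components have already been reduced to centroid coordinates, and the function names (`nearest_neighbor_pairs`, `estimate_skew`, `estimate_within_line_spacing`), the choice of `k`, the one-degree histogram bins, and the angle tolerance are all hypothetical choices made for illustration.

```python
import math
import statistics
from collections import Counter


def nearest_neighbor_pairs(centroids, k=5):
    """Return (distance, angle_deg) for each centroid and its k nearest neighbors.

    Angles are folded into [-90, 90) so that a neighbor to the left and a
    neighbor to the right along the same text line give the same angle.
    """
    pairs = []
    for i, (xi, yi) in enumerate(centroids):
        neighbors = sorted(
            (math.hypot(xj - xi, yj - yi), xj - xi, yj - yi)
            for j, (xj, yj) in enumerate(centroids)
            if j != i
        )[:k]
        for dist, dx, dy in neighbors:
            angle = math.degrees(math.atan2(dy, dx))
            if angle >= 90:
                angle -= 180
            elif angle < -90:
                angle += 180
            pairs.append((dist, angle))
    return pairs


def estimate_skew(pairs, bin_width=1.0):
    """Estimate skew as the most populated bin of the angle histogram."""
    bins = Counter(round(angle / bin_width) for _, angle in pairs)
    peak_bin, _ = bins.most_common(1)[0]
    return peak_bin * bin_width


def estimate_within_line_spacing(pairs, skew, tolerance=10.0):
    """Median distance of the pairs whose angle lies close to the skew angle."""
    along_line = [d for d, a in pairs if abs(a - skew) <= tolerance]
    return statistics.median(along_line)


if __name__ == "__main__":
    # Synthetic page: 4 lines of 10 components, 10 units apart within a line,
    # 30 units between lines, rotated by 5 degrees.
    theta = math.radians(5)
    page = [
        (10 * c * math.cos(theta) - 30 * r * math.sin(theta),
         10 * c * math.sin(theta) + 30 * r * math.cos(theta))
        for r in range(4)
        for c in range(10)
    ]
    pairs = nearest_neighbor_pairs(page, k=2)
    skew = estimate_skew(pairs)
    print(skew, round(estimate_within_line_spacing(pairs, skew), 3))
```

Because the estimate is built from local neighbor pairs rather than from a global projection, the same statistics could in principle be computed per region, which is in the spirit of the abstract's point about locally different text orientations.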

756 citations

Journal Article•10.1006/JVLC.1995.1010

Layout Adjustment and the Mental Map

Kazuo Misue, Peter Eades, Wei Lai, Kozo Sugiyama

01 Jun 1995-Journal of Visual Languages and Computing

TL;DR: This paper discusses layout adjustment methods that preserve the user's 'mental map' of a diagram. It describes two kinds of layout adjustment: an algorithm for rearranging a diagram to avoid overlapping nodes, and a method for changing the user's focus of interest without destroying the mental map.

Abstract: Many models in software and information engineering use graph representations; examples are data flow diagrams, state transition diagrams, flow charts, PERT charts, organization charts, Petri nets and entity-relationship diagrams. The usefulness of these graph representations depends on the quality of the layout of the graphs. Automatic graph layout, which can release humans from graph drawing, is now available in several visualization systems. Most automatic layout facilities take a purely combinatorial description of a graph and produce a layout of the graph; these methods are called 'layout creation' methods. For interactive systems, another kind of layout is needed: a facility which can adjust a layout after a change is made by the user or by the application. Although layout adjustment is essential in interactive systems, most existing layout algorithms are designed for layout creation. The use of a layout creation method for layout adjustment may totally rearrange the layout and thus destroy the user's 'mental map' of the diagram; thus a set of layout adjustment methods, separate from layout creation methods, is needed. This paper discusses some layout adjustment methods and the preservation of the 'mental map' of the diagram. First, several models are proposed to make the concept of 'mental map' more precise. Then two kinds of layout adjustments are described. One is an algorithm for rearranging a diagram to avoid overlapping nodes, and the other is a method aimed at changing the focus of interest of the user without destroying the mental map. Next, some experience with visualization systems in which the techniques have been employed is also described.

662 citations

Book

The document spectrum for page layout analysis

Lawrence O'Gorman

1 Jan 1995

TL;DR: The document spectrum (or docstrum), which is a method for structural page layout analysis based on bottom-up, nearest-neighbor clustering of page components, yields an accurate measure of skew, within-line, and between-line spacings and locates text lines and text blocks.

628 citations

Journal Article•10.1109/2.144436

A prototype document image analysis system for technical journals

George Nagy¹, Sharad C. Seth², Mahesh Viswanathan³

Rensselaer Polytechnic Institute¹, University of Nebraska–Lincoln², IBM³

01 Jul 1992-IEEE Computer

TL;DR: The paper describes the document image acquisition process, the knowledge base that must be entered into the system to process a family of page images, and the process by which the X-Y tree data structure converts a 2-D page-segmentation problem into a series of 1-D string-parsing problems that can be tackled using conventional compiler tools.

Abstract: Gobbledoc, a system providing remote access to stored documents, which is based on syntactic document analysis and optical character recognition (OCR), is discussed. In Gobbledoc, image processing, document analysis, and OCR operations take place in batch mode when the documents are acquired. The document image acquisition process and the knowledge base that must be entered into the system to process a family of page images are described. The process by which the X-Y tree data structure converts a 2-D page-segmentation problem into a series of 1-D string-parsing problems that can be tackled using conventional compiler tools is also described. Syntactic analysis is used in Gobbledoc to divide each page into labeled rectangular blocks. Blocks labeled text are converted by OCR to obtain a secondary (ASCII) document representation. Since such symbolic files are better suited for computerized search than for human access to the document content and because too many visual layout clues are lost in the OCR process (including some special characters), Gobbledoc preserves the original block images for human browsing. Storage, networking, and display issues specific to document images are also discussed.
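The way a 2-D segmentation problem reduces to 1-D problems can be illustrated with a simplified recursive X-Y cut on projection profiles. This is a sketch under stated assumptions, not the Gobbledoc implementation: it assumes the page is a binary grid (lists of 0/1 values, 1 meaning ink), it omits the grammar-driven string parsing and block labeling entirely, and the names `xy_cut` and `_widest_gap` and the `min_gap` threshold are hypothetical.

```python
def _widest_gap(profile, min_gap):
    """Return (start, end) of the widest run of zeros at least min_gap long.

    The profile is assumed to begin and end with a nonzero value, so every
    run of zeros is an interior gap. Returns None if no gap qualifies.
    """
    best, start = None, None
    for i, value in enumerate(profile):
        if value == 0:
            if start is None:
                start = i
        elif start is not None:
            width = i - start
            if width >= min_gap and (best is None or width > best[1] - best[0]):
                best = (start, i)
            start = None
    return best


def xy_cut(img, top=0, bottom=None, left=0, right=None, min_gap=2):
    """Recursively split a binary image at its widest whitespace gaps.

    Returns leaf blocks as (top, bottom, left, right), end-exclusive.
    """
    if bottom is None:
        bottom = len(img)
    if right is None:
        right = len(img[0]) if img else 0

    # Shrink the region to the bounding box of its ink.
    rows = [r for r in range(top, bottom) if any(img[r][left:right])]
    cols = [c for c in range(left, right)
            if any(img[r][c] for r in range(top, bottom))]
    if not rows or not cols:
        return []
    top, bottom, left, right = rows[0], rows[-1] + 1, cols[0], cols[-1] + 1

    # 1-D projection profiles: ink count per row and per column.
    row_profile = [sum(img[r][left:right]) for r in range(top, bottom)]
    col_profile = [sum(img[r][c] for r in range(top, bottom))
                   for c in range(left, right)]
    row_gap = _widest_gap(row_profile, min_gap)
    col_gap = _widest_gap(col_profile, min_gap)

    if row_gap is None and col_gap is None:
        return [(top, bottom, left, right)]

    cut_rows = col_gap is None or (
        row_gap is not None
        and row_gap[1] - row_gap[0] >= col_gap[1] - col_gap[0]
    )
    if cut_rows:
        start, end = row_gap
        return (xy_cut(img, top, top + start, left, right, min_gap)
                + xy_cut(img, top + end, bottom, left, right, min_gap))
    start, end = col_gap
    return (xy_cut(img, top, bottom, left, left + start, min_gap)
            + xy_cut(img, top, bottom, left + end, right, min_gap))
```

Each recursive call examines only two 1-D profiles, which is the sense in which the page is handled as a sequence of one-dimensional problems. The sequence of cuts implicitly forms a tree whose leaves are the returned rectangles.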

480 citations

Proceedings Article•10.1109/ICDAR.2019.00166

PubLayNet: Largest Dataset Ever for Document Layout Analysis

Xu Zhong¹, Jianbin Tang¹, Antonio Jimeno Yepes¹

IBM¹

16 Aug 2019

TL;DR: The PubLayNet dataset for document layout analysis is developed by automatically matching the XML representations and the content of over 1 million PDF articles that are publicly available on PubMed Central. Experiments demonstrate that deep neural networks trained on PubLayNet accurately recognize the layout of scientific articles.

Abstract: Recognizing the layout of unstructured digital documents is an important step when parsing the documents into a structured machine-readable format for downstream applications. Deep neural networks that are developed for computer vision have been proven to be an effective method to analyze the layout of document images. However, document layout datasets that are currently publicly available are several orders of magnitude smaller than established computer vision datasets. Models have to be trained by transfer learning from a base model that is pre-trained on a traditional computer vision dataset. In this paper, we develop the PubLayNet dataset for document layout analysis by automatically matching the XML representations and the content of over 1 million PDF articles that are publicly available on PubMed Central. The size of the dataset is comparable to established computer vision datasets, containing over 360 thousand document images, where typical document layout elements are annotated. The experiments demonstrate that deep neural networks trained on PubLayNet accurately recognize the layout of scientific articles. The pre-trained models are also a more effective base model for transfer learning on a different document domain. We release the dataset (https://github.com/ibm-aur-nlp/PubLayNet) to support development and evaluation of more advanced models for document layout analysis.

377 citations

Year	Papers
2024	5
2023	21
2022	20
2021	34
2020	19
2019	14
