Enhancing document structure analysis using visual analytics

doi:10.1145/1774088.1774091

Open Access • Proceedings Article

Andreas Stoffel, +3 more

- 22 Mar 2010

- pp 8-12

Cited by 29

TL;DR: A new approach for analyzing the logical structure of text documents is presented that combines state-of-the-art machine learning with novel interactive visualization techniques. This allows the structure analysis process to be adapted quickly to unknown document classes and new tasks without requiring a predefined training set.

Figures

Figure 1: The different components of the system.

Figure 2: Visualization of the structure analysis results.

Figure 3: Average F-measure of the structure analysis in the different reference data iterations.

Table 1: Performance of different algorithms on INTEGERS articles.

Table 2: Performance of different algorithms on computer science publications.

Table 4: Performance of the proposed system on product manuals.
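
Figure 3 reports an average F-measure. As a reminder of what that score measures, the following is a minimal Python sketch that computes per-label precision, recall, and F1 (the harmonic mean of precision and recall) from predicted and reference labels, then averages the F1 scores across labels. The unweighted (macro) averaging, the function name `macro_f_measure`, and label names such as "title" and "body" are assumptions for illustration only; this is not the paper's evaluation code, and the paper's exact averaging scheme is not stated here.

```python
from collections import Counter


def macro_f_measure(predicted: list[str], reference: list[str]) -> float:
    """Unweighted mean of per-label F1 scores (macro average).

    Assumption (hypothetical setup): each text block carries exactly one
    logical label, e.g. "title", "body", "caption".
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for p, r in zip(predicted, reference):
        if p == r:
            tp[p] += 1  # correct label
        else:
            fp[p] += 1  # label p was assigned wrongly
            fn[r] += 1  # true label r was missed

    labels = set(predicted) | set(reference)
    scores = []
    for label in labels:
        p_denom = tp[label] + fp[label]
        r_denom = tp[label] + fn[label]
        precision = tp[label] / p_denom if p_denom else 0.0
        recall = tp[label] / r_denom if r_denom else 0.0
        # F1 is the harmonic mean of precision and recall
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    pred = ["title", "body", "body", "caption"]
    ref = ["title", "body", "caption", "caption"]
    print(round(macro_f_measure(pred, ref), 4))  # prints 0.7778
```

In the usage example, "title" is scored perfectly (F1 = 1), while "body" and "caption" each reach F1 = 2/3 because of the single mislabeled block, giving a macro average of 7/9. A macro average weights rare labels as heavily as frequent ones; a micro average, which pools all counts first, would instead be dominated by frequent labels such as body text.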

Citations

Proceedings Article • doi:10.1109/ICDAR.2013.292

ICDAR 2013 Table Competition

Max Göbel, +3 more

- 25 Aug 2013

TL;DR: The Table Competition held in the context of ICDAR 2013 is the first attempt at objectively evaluating table-understanding techniques against each other in a standardized way, across several input formats.

Cited by 289

Journal Article • doi:10.1109/TVCG.2011.266

Visual Readability Analysis: How to Make Your Writings Easier to Read

Daniela Oelke, +3 more

- 01 May 2012

- IEEE Transactions on Visualization and C...

TL;DR: A semiautomatic feature selection approach is discussed that selects appropriate measures from a collection of 141 candidate readability features, and the visual analysis tool VisRA is presented, which allows the user to analyze the feature values across the text and within single sentences.

Cited by 57

Journal Article • doi:10.1002/ASI.21651

Using structural information and citation evidence to detect significant plagiarism cases in scientific publications

Salha Alzahrani, +3 more

- 01 Feb 2012

- Journal of the Association for Informati...

TL;DR: An empirical study of the system's responses shows that, unlike existing plagiarism detectors, using structural information helps to flag significant plagiarism cases, improve the similarity index, and provide human-like plagiarism screening results.

Cited by 29

Proceedings Article • doi:10.1109/VAST.2010.5652926

Visual readability analysis: How to make your writings easier to read

Daniela Oelke, +3 more

- 10 Dec 2010

TL;DR: A semi-automatic feature selection approach is discussed that selects appropriate measures from a collection of 141 candidate readability features, and the visual analysis tool VisRA is presented, which allows the user to analyze the feature values across the text and within single sentences.

Cited by 25

Proceedings Article • doi:10.1145/3474085.3475541

CanvasEmb: Learning Layout Representation with Large-scale Pre-training for Graphic Design

Yuxi Xie, +3 more

- 17 Oct 2021

TL;DR: CanvasEmb pre-trains deep representations from unlabeled graphic designs by jointly conditioning on all the context elements in a canvas, using a multidimensional feature encoder and a multi-task learning objective.

Cited by 20
