Top 24 papers presented at Document Analysis Systems in 1996

Proceedings Article

Document Image Skew Detection: Survey and Annotated Bibliography.

1 Jan 1996

TL;DR: Algorithms that estimate the angle at which a document image is rotated (called a document’s skew) are surveyed and the contributions of individual algorithms within each class are discussed.

Abstract: Algorithms that estimate the angle at which a document image is rotated (called a document’s skew) are surveyed. Four broad classes of technique are identified. These include methods that calculate skew from a horizontal projection profile, a distribution of feature locations, a Hough transform, or the distribution of responses from local, directionally sensitive masks. The basic method used by each class of technique is presented and the contributions of individual algorithms within each class are discussed.
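The first class, skew from a horizontal projection profile, can be illustrated with a minimal sketch. It is not taken from the survey: the function names (`profile_score`, `estimate_skew`, `synthetic_page`), the sum-of-squares sharpness criterion, and the angle grid are assumptions chosen for brevity. The idea is to de-rotate the black-pixel coordinates by each candidate angle and keep the angle whose row histogram is most sharply peaked.

```python
import math
from collections import Counter


def profile_score(points, angle_deg):
    """Sharpness of the horizontal projection profile after de-rotating
    the points by angle_deg: sum of squared per-row pixel counts."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    # Row coordinate of each pixel once the candidate skew is undone.
    rows = Counter(round(y * c - x * s) for x, y in points)
    return sum(n * n for n in rows.values())


def estimate_skew(points, max_angle=10.0, step=0.5):
    """Brute-force search over candidate angles in [-max_angle, max_angle]."""
    steps = int(round(2 * max_angle / step))
    candidates = [-max_angle + i * step for i in range(steps + 1)]
    return max(candidates, key=lambda ang: profile_score(points, ang))


def synthetic_page(skew_deg, n_lines=8, line_len=200, spacing=20):
    """Hypothetical test data: n_lines straight 'text lines', rotated by skew_deg."""
    a = math.radians(skew_deg)
    c, s = math.cos(a), math.sin(a)
    pts = []
    for k in range(n_lines):
        y0 = k * spacing
        for x0 in range(line_len):
            pts.append((x0 * c - y0 * s, x0 * s + y0 * c))
    return pts
```

Calling `estimate_skew(synthetic_page(3.0))` searches the grid from -10 to 10 degrees in half-degree steps. Real pages would need a finer grid and some tolerance for noise, which this sketch does not attempt.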

175 citations

Proceedings Article

The Development of a General Framework for Intelligent Document Image Retrieval.

David Doermann, Jaakko Sauvola, Hannu Kauniskangas, Christian K. Shin, Matti Pietikäinen, Azriel Rosenfeld

1 Jan 1996

TL;DR: The general framework, feature extraction modules, query capabilities, a graphical query interface, and the application interface are introduced; each component of the system is demonstrated, along with how the query mechanisms can handle both content and structural queries effectively.

Abstract: Work has recently begun on a joint project between the Universities of Maryland and Oulu on the development of a system for Intelligent Document Image Retrieval (IDIR). The IDIR system will provide close connections with and utilization of document analysis and image processing techniques, advanced computing and networking, and modern approaches to database management. The system design consists of aggressively modularized components to enhance the development of individual parts which are used in the complete solution, including: interface specifications, multipurpose feature extraction, an integrated efficient query language, physical retrieval from an object-oriented database, and delivery of retrieved objects. In this paper, we introduce the general framework, feature extraction modules, query capabilities, a graphical query interface, and the application interface. We demonstrate each component of the system and how the query mechanisms can be used to handle both content and structural queries effectively.

66 citations

Proceedings Article

Language Identification in Complex, Unoriented, and Degraded Document Images.

Dar-Shyang Lee, Craig R. Nohl, Henry S. Baird

1 Jan 1996

TL;DR: Algorithms for identifying the language of text in document images which are complex, unoriented, and degraded are described and a variety of decision procedures are used.

Abstract: We describe algorithms for identifying the language of text in document images which are complex, unoriented, and degraded. We distinguish among seven languages [...] Page layouts may be complex, containing text blocks in unknown, roughly Manhattan arrangements. The pages may be unoriented, that is, upright or rotated by 90, 180, or 270 degrees. The images may be degraded by digitization at coarse and unequal spatial sampling rates, as in FAXes. We begin by segmenting the page into text lines in a manner oblivious to page skew and both page and text-line orientation. Then we distinguish between Asian and Latin scripts at any orientation. Chinese versus Japanese is decided at any orientation, and then their orientation is detected. On Latin scripts, we detect first orientation and then language. A variety of decision procedures are used, some hand-crafted (e.g. using spatial features and optical density distributions) and others trainable (e.g. using word unigram relative entropy models). Tests on 1088 standard (low) resolution FAX images show that our method accurately identifies scripts (98.16%), and language and page orientations (94.76%).

49 citations

Proceedings Article

Document Analysis and the World Wide Web.

Daniel P. Lopresti, Jiangying Zhou

1 Jan 1996

38 citations

Proceedings Article

A Multi-Layered Corroboration-Based Check Reader.

Gilles Houle, David Aragon, Robert W. Smith, Malayappan Shridhar, D. Kimura

1 Jan 1996

32 citations

Proceedings Article

Advances in Document Classification by Voting of Competitive Approaches.

Claudia Wenzel, Stephan Baumann, Thorsten Jäger

1 Jan 1996

TL;DR: This paper presents a complex approach for the content-based text categorization of printed German business letters into pre-defined message types such as order, invoice, offer, etc.

Abstract: This paper presents a complex approach for the content-based text categorization of printed German business letters into pre-defined message types such as order, invoice, offer, etc. The categorization results of two competing classifiers are combined by means of a voting component embodying knowledge about the strengths and weaknesses of the classifiers. The individual classifiers differ strongly in their basic assumptions: while the first one considers layout and typographic information with respect to certain keywords, the second one is a more conventional text categorization approach which merely incorporates textual features. Since this whole categorization tool is embedded into a document analysis system, a highly precise classification is essential for a subsequent goal-directed extraction of structured information aimed at the integration of the document into the current business workflow of a company.
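One simple way to picture a voting component that encodes each classifier's strengths and weaknesses is a per-class reliability table. The sketch below is an assumption for illustration only: the abstract does not describe the actual voting rule, and the function `vote`, the reliability tables, and all numbers are hypothetical.

```python
# Hypothetical per-class reliabilities, e.g. measured on a validation set.
LAYOUT_RELIABILITY = {"invoice": 0.9, "order": 0.6}
TEXT_RELIABILITY = {"invoice": 0.7, "offer": 0.8}


def vote(pred_a, pred_b, reliability_a, reliability_b, default=0.5):
    """Combine two (label, confidence) predictions.

    If the classifiers agree, return the shared label. Otherwise weight each
    confidence by how reliable that classifier is for the label it proposes
    and return the label with the higher weighted score (ties go to pred_a).
    """
    label_a, conf_a = pred_a
    label_b, conf_b = pred_b
    if label_a == label_b:
        return label_a
    score_a = conf_a * reliability_a.get(label_a, default)
    score_b = conf_b * reliability_b.get(label_b, default)
    return label_a if score_a >= score_b else label_b
```

With these tables, a layout-based "invoice" at confidence 0.8 outvotes a text-based "offer" at 0.7, because the layout classifier is assumed to be more reliable on invoices.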

18 citations

Proceedings Article

A System for the Automated Evaluation of Invoices.

Mario Köppen, Dörte Waldöstl, Bertram Nickolay

1 Jan 1996

TL;DR: The system runs as a C++ class library under the OS/2 operating system and shows encouraging recognition results.

Abstract: This paper presents a system for the automated evaluation of invoices. The purpose of the system is to detect and recognize price entries of item tables in invoices. Due to the lack of a layout model of all possible invoices, the system is composed of several processing stages. These stages are: text stripe extraction and skew detection by a combination of mathematical morphology and heuristic search; orientation detection by a new approach called "row-delta-histogram"; the optical character recognition of text stripes using a multilayer backpropagation network as a classifier and fractal based Peano features; table extraction which uses a genetic algorithm to adapt a suitable table row template; price entry extraction by using the context of optical character recognition. The system also offers further functions, including reprocessing of all processing stages with different parameter settings, font typeface estimation, and evaluation of divider information. The system runs as a C++ class library under the OS/2 operating system and shows encouraging recognition results.

15 citations

Proceedings Article

Formclas - A System for OCR Free Identification of Forms.

Frank Dubiel, Andreas Dengel

1 Jan 1996

12 citations

Proceedings Article

Integrated Multi-Agent Architecture for Assisted Document Recognition.

Frédéric Bapst, Rolf Brugger, Abdel Wahab Zramdini, Rolf Ingold

1 Jan 1996

TL;DR: This work proposes a concrete framework built upon existing software pieces and following the multi-agent paradigm, with DAFS serving as the main document management package; a prototype that interactively recognizes the entire physical structure has been developed.

Abstract: In the context of a new project around structured document recognition, we address the problem of designing a software architecture which is able to integrate all the necessary, but heterogeneous know-how. Starting from the new needs brought by the CIDRE project, we propose a concrete framework built upon existing software pieces, and following the multi-agent paradigm. DAFS serves as the main document management package. The computational engine is written on a distributed and multi-threaded platform, and an original coupling with the GUI is presented. To demonstrate the validity of the approach, a prototype that interactively recognizes the entire physical structure has been developed.

11 citations

Proceedings Article

Priming the Recognizer.

George Nagy, Yihong Xu

1 Jan 1996

11 citations

Proceedings Article

Docbrowse: A System for Textual and Graphical Querying on Degraded Document Image Data.

Mysore Y. Jaisimha, Andrew G. Bruce, Thien Nguyen

1 Jan 1996

Proceedings Article

Adaptive Coordination of Multiple Classifiers.

Tin Kam Ho

1 Jan 1996

Proceedings Article

Clustering and Error-Correcting Matching of Graphs for Learning and Recognition of Symbols in Engineering Drawings.

Bruno T. Messmer, Horst Bunke

1 Jan 1996

Proceedings Article

Language-Independent and Segmentation-Free Techniques for Optical Character Recognition.

John Makhoul, Richard Schwartz, Christopher LaPre, Christopher Raphael, Issam Bazzi

1 Jan 1996

Proceedings Article

A System for Text Recognition Based on Graph Embedding Matching.

Hongwei Shi, Theodosios Pavlidis

1 Jan 1996

Proceedings Article

Document Understanding System for Multiple Document Representations.

Suzanne Liebowitz Taylor, Mark Lipshutz

1 Jan 1996

Proceedings Article

Documents on the Move: DA&IR-Driven Mail Piece Processing Today and Tomorrow.

Udo Miletzki

1 Jan 1996

Proceedings Article

Evaluating the Performance of Techniques for the Extraction of Primitives from Line Drawings Composed of Horizontal and Vertical Lines.

Juan F. Arias, Rangachar Kasturi, Atul K. Chhabra

1 Jan 1996

Proceedings Article

Document De-Blurring Using Maximum Likelihood Methods.

Theodosios Pavlidis

1 Jan 1996

Proceedings Article

A Prototype for Extracting Logical Elements from Tables of Contents of Journals.

Tao Hu, Katsumi Marukawa, Yoshihiro Shima, Hiromichi Fujisawa

1 Jan 1996

Proceedings Article

Information Extraction from Tax Assessment Forms.

Heike Mogg-Schneider, Claus Aufmuth

1 Jan 1996

Proceedings Article

Evaluating Japanese Document Recognition in the Internet/Intranet Environment.

Tao Hong, ShuFang Wu, Sargur N. Srihari

1 Jan 1996

Proceedings Article

Prediction of OCR Accuracy Using a Neural Network.

Juan González, Junichi Kanai, Thomas A. Nartker

1 Jan 1996

TL;DR: In this project, multilayer perceptrons were trained to predict the character accuracy performance of two OCR systems using the backpropagation training method, and results show that a prediction system can reduce the total cost of converting a set of documents.

Abstract: A method for predicting the accuracy achieved by an OCR system on an input image is presented. It is assumed that there is an ideal prediction function. A neural network is trained to estimate the unknown ideal function. In this project, multilayer perceptrons were trained to predict the character accuracy performance of two OCR systems using the backpropagation training method. The results show that this approach is sound. The feasibility of using an accuracy prediction system as a filter to discriminate good quality images (for OCR) from poor quality images (for manual keying) was also examined using a cost model of a large-scale document conversion process. Results show that a prediction system can reduce the total cost of converting a set of documents.
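The filtering idea can be made concrete with a toy cost model. This is a sketch under stated assumptions, not the cost model used in the paper: the function `conversion_cost`, the per-character costs, and the sample pages are all hypothetical. Pages whose predicted accuracy meets a threshold go to OCR and pay for correcting the resulting errors; the rest are keyed manually.

```python
def conversion_cost(pages, threshold, key_cost=1.0, fix_cost=3.0):
    """Total cost of converting pages given a routing threshold.

    pages: list of (predicted_accuracy, actual_accuracy, n_chars).
    key_cost: assumed cost of manually keying one character.
    fix_cost: assumed cost of finding and correcting one OCR error.
    """
    total = 0.0
    for predicted, actual, n_chars in pages:
        if predicted >= threshold:
            # Routed to OCR: pay only for the characters OCR got wrong.
            total += (1.0 - actual) * n_chars * fix_cost
        else:
            # Routed to manual keying: pay for every character.
            total += n_chars * key_cost
    return total


# Hypothetical sample: one clean page and one badly degraded page.
SAMPLE_PAGES = [(0.99, 0.98, 1000), (0.60, 0.55, 1000)]
```

On `SAMPLE_PAGES`, a threshold of 0.9 sends the clean page to OCR and the degraded page to keying, which under these assumed costs is cheaper than sending both pages down either route alone.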

Proceedings Article

Spam: A Scientific Paper Access Method.

A. Lawrence Spitz

1 Jan 1996
