TL;DR: Algorithms that estimate the angle at which a document image is rotated (called a document’s skew) are surveyed and the contributions of individual algorithms within each class are discussed.
Abstract: Algorithms that estimate the angle at which a document image is rotated (called a document’s skew) are surveyed. Four broad classes of technique are identified. These include methods that calculate skew from a horizontal projection profile, a distribution of feature locations, a Hough transform, or the distribution of responses from local, directionally sensitive masks. The basic method used by each class of technique is presented and the contributions of individual algorithms within each class are discussed.
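As an illustration of the first class of technique (skew from a horizontal projection profile), the following is a minimal sketch and not any specific surveyed algorithm. It assumes a binary image whose nonzero pixels are ink, a caller-supplied list of candidate angles, and a sum-of-squares sharpness score. The function name `estimate_skew` and the sign convention (text lines follow y = y0 + x·tan(angle), with y pointing down) are assumptions made for this example.

```python
import numpy as np

def estimate_skew(binary, angles_deg):
    """Return the candidate angle (degrees) whose projection profile is sharpest.

    Assumption: `binary` is a 2-D array in which nonzero pixels are ink.
    """
    ys, xs = np.nonzero(binary)  # coordinates of ink pixels
    if ys.size == 0:
        raise ValueError("image contains no ink pixels")
    best_angle, best_score = None, -1.0
    for angle in angles_deg:
        theta = np.deg2rad(angle)
        # Row each ink pixel would fall on if the page were deskewed by `angle`.
        rows = np.round(ys * np.cos(theta) - xs * np.sin(theta)).astype(int)
        # Horizontal projection profile: ink count per (deskewed) row.
        profile = np.bincount(rows - rows.min())
        # Sharp peaks (text lines aligned with rows) give a large sum of squares.
        score = float(np.sum(profile.astype(float) ** 2))
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

Projecting pixel coordinates instead of rotating the image keeps the sketch dependent on NumPy only; the resolution of the estimate is limited by the spacing of the candidate angles.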
TL;DR: The general framework, feature extraction modules, query capabilities, graphical query interface, and application interface of a document image retrieval system are introduced; each component is demonstrated, along with how the query mechanisms can handle both content and structural queries effectively.
Abstract: Work has recently begun on a joint project between the Universities of Maryland and Oulu on the development of a system for Intelligent Document Image Retrieval (IDIR). The IDIR system will provide close connections with and utilization of document analysis and image processing techniques, advanced computing and networking, and modern approaches to database management. The system design consists of aggressively modularized components to enhance the development of individual parts which are used in the complete solution, including: interface specifications, multipurpose feature extraction, an integrated efficient query language, physical retrieval from an object-oriented database, and delivery of retrieved objects. In this paper, we introduce the general framework, feature extraction modules, query capabilities, a graphical query interface, and the application interface. We demonstrate each component of the system and how the query mechanisms can be used to handle both content and structural queries effectively.
TL;DR: Algorithms for identifying the language of text in document images which are complex, unoriented, and degraded are described and a variety of decision procedures are used.
Abstract: We describe algorithms for identifying the language of text in document images which are complex, unoriented, and degraded. We distinguish among seven languages. Page layouts may be complex, containing text blocks in unknown, roughly Manhattan arrangements. The pages may be unoriented, that is, upright or rotated by 90, 180, or 270 degrees. The images may be degraded by digitization at coarse and unequal spatial sampling rates, as in FAXes. We begin by segmenting the page into text lines in a manner oblivious to page skew and to both page and text-line orientation. Then we distinguish between Asian and Latin scripts at any orientation. Chinese versus Japanese is decided at any orientation, and then their orientation is detected. On Latin scripts, we detect first orientation and then language. A variety of decision procedures are used, some hand-crafted (e.g. using spatial features and optical density distributions) and others trainable (e.g. using word unigram relative entropy models). Tests on 1088 standard (low) resolution FAX images show that our method accurately identifies scripts (98.16%), and language and page orientations (94.76%).
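The abstract mentions trainable decisions based on word unigram relative entropy models without giving details. The sketch below shows one generic way such a decision could work, under assumptions: each language model is a dictionary of word probabilities, unseen words receive a small floor probability, and the language with the lowest relative entropy (Kullback-Leibler divergence) wins. The names `relative_entropy`, `identify_language`, and `floor` are hypothetical, as are the toy models in the usage note.

```python
import math
from collections import Counter

def relative_entropy(observed_words, model, floor=1e-6):
    """KL divergence of the observed word distribution from a unigram model.

    Assumption: `model` maps words to probabilities; words missing from the
    model are given the probability `floor`.
    """
    counts = Counter(observed_words)
    total = sum(counts.values())
    return sum(
        (c / total) * math.log((c / total) / model.get(w, floor))
        for w, c in counts.items()
    )

def identify_language(observed_words, models):
    """Pick the language whose unigram model is closest to the observed words."""
    return min(models, key=lambda lang: relative_entropy(observed_words, models[lang]))
```

For example, with toy models `{"en": {"the": 0.5, "and": 0.5}, "de": {"der": 0.5, "und": 0.5}}`, the word list `["the", "and", "the"]` is assigned to `"en"`, because every observed word falls to the floor probability under the `"de"` model.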
TL;DR: This paper presents a complex approach for the content-based text categorization of printed German business letters into pre-defined message types such as order, invoice, offer, etc.
Abstract: This paper presents a complex approach for the content-based text categorization of printed German business letters into pre-defined message types such as order, invoice, offer, etc. The categorization results of two competing classifiers are combined by means of a voting component embodying knowledge about the strengths and weaknesses of the classifiers. The individual classifiers differ strongly in their basic assumptions: While the first one considers layout and typographic information with respect to certain keywords the second one is a more conventional text categorization approach which merely incorporates textual features. Since this whole categorization tool is embedded into a document analysis system, a highly precise classification is essential for a subsequent goal-directed extraction of structured information aimed at the integration of the document into the current business workflow of a company.
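The abstract does not specify how the voting component encodes the strengths and weaknesses of the two classifiers. One simple possibility, shown here purely as an assumption, is a table of per-class reliabilities (for instance, precision measured on held-out letters) consulted whenever the classifiers disagree. The function `vote` and both reliability tables are hypothetical.

```python
def vote(pred_a, pred_b, reliability_a, reliability_b):
    """Combine two predicted message types using per-class reliability tables.

    Assumption: each table maps a message type to how often that classifier
    is right when it proposes that type (0.0 to 1.0).
    """
    if pred_a == pred_b:
        return pred_a
    # On disagreement, trust the classifier that is more reliable for the
    # class it is proposing; ties go to classifier A.
    score_a = reliability_a.get(pred_a, 0.0)
    score_b = reliability_b.get(pred_b, 0.0)
    return pred_a if score_a >= score_b else pred_b
```

With `reliability_a = {"invoice": 0.95, "order": 0.60}` and `reliability_b = {"invoice": 0.70, "order": 0.90}`, a disagreement of `"invoice"` (A) against `"order"` (B) resolves to `"invoice"`, while `"order"` (A) against `"invoice"` (B) also resolves to `"invoice"`.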
TL;DR: A system for the automated evaluation of invoices, which detects and recognizes price entries in item tables, runs as a C++ class library under the OS/2 operating system and shows encouraging recognition results.
Abstract: This paper presents a system for the automated evaluation of invoices. The purpose of the system is to detect and recognize price entries of item tables in invoices. Due to the lack of a layout model of all possible invoices, the system is composed of several processing stages. These stages are: text stripe extraction and skew detection by a combination of mathematical morphology and heuristic search; orientation detection by a new approach called "row-delta-histogram"; the optical character recognition of text stripes using a multilayer backpropagation network as a classifier and fractal-based Peano features; table extraction, which uses a genetic algorithm to adapt a suitable table row template; and price entry extraction using the context of optical character recognition. The system also comprises further functionalities, to name a few: reprocessing of all processing stages with different parameter settings, font typeface estimation, and evaluation of divider information. The system runs as a C++ class library under the OS/2 operating system and shows encouraging recognition results.
TL;DR: This work proposes a concrete framework for structured document recognition, built upon existing software pieces and following the multi-agent paradigm, with DAFS serving as the main document management package; a prototype that interactively recognizes the entire physical structure has been developed.
Abstract: In the context of a new project around structured document recognition, we address the problem of designing a software architecture which is able to integrate all the necessary, but heterogeneous know-how. Starting from the new needs brought by the CIDRE project, we propose a concrete framework built upon existing software pieces, and following the multi-agent paradigm. DAFS serves as the main document management package. The computational engine is written on a distributed and multi-threaded platform, and an original coupling with the GUI is presented. To demonstrate the validity of the approach, a prototype that interactively recognizes the entire physical structure has been developed.
TL;DR: In this project, multilayer perceptrons were trained to predict the character accuracy performance of two OCR systems using the backpropagation training method, and results show that a prediction system can reduce the total cost of converting a set of documents.
Abstract: A method for predicting the accuracy achieved by an OCR system on an input image is presented. It is assumed that there is an ideal prediction function. A neural network is trained to estimate the unknown ideal function. In this project, multilayer perceptrons were trained to predict the character accuracy performance of two OCR systems using the backpropagation training method. The results show that this approach is sound. The feasibility of using an accuracy prediction system as a filter to discriminate good quality images (for OCR) from poor quality images (for manual keying) was also examined using a cost model of a large-scale document conversion process. Results show that a prediction system can reduce the total cost of converting a set of documents.
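The cost model itself is not given in the abstract. The following sketch illustrates the filtering idea under simple assumed costs: pages whose predicted accuracy meets a threshold are sent to OCR and pay a per-error correction cost, and all other pages are keyed manually at a per-character cost. The function name, the page tuple layout, and both cost parameters are hypothetical values chosen for illustration.

```python
def conversion_cost(pages, threshold, key_cost_per_char=1.0, fix_cost_per_error=5.0):
    """Total cost of converting `pages` when a predictor routes each page.

    Assumption: each page is a tuple (n_chars, predicted_accuracy, true_accuracy),
    with accuracies expressed as fractions between 0 and 1.
    """
    total = 0.0
    for n_chars, predicted_acc, true_acc in pages:
        if predicted_acc >= threshold:
            # Routed to OCR: pay to correct each misrecognized character.
            total += n_chars * (1.0 - true_acc) * fix_cost_per_error
        else:
            # Routed to manual keying: pay for every character.
            total += n_chars * key_cost_per_char
    return total
```

With one clean page `(1000, 0.99, 0.99)` and one degraded page `(1000, 0.5, 0.5)`, filtering at a threshold of 0.9 costs about 1050, compared with about 2550 for running OCR on everything and 2000 for keying everything, which matches the abstract's qualitative claim under these assumed costs.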
TL;DR: A holder and dispenser for razor blades of the type having double cutting edges and a central longitudinal slot, in which the stacked blades are adapted to be discharged one at a time from the end of the holder toward which the particular blade is oriented.
Abstract: A holder and dispenser for razor blades of the type having double cutting edges and a central longitudinal slot, the blades being arranged in a stack in alternately longitudinally offset relation and adapted to be discharged one at a time from the end of the holder to which the particular blade is oriented, the blades being mounted on three lugs or vertical ribs comprising a left blade guide lug, a right blade guide lug and an intermediate blade retaining lug, the inner ends of the blades all being looped over the blade retaining lug in overlapping relation, the three lugs each being elastically mounted and depressible independently of the other two.