Journal Article · DOI: 10.1142/S0218001488000388
Multifont character recognition for typeset documents
15
TL;DR: Describes the design of a multifont character recognizer that uses a binary decision tree to classify each character on the basis of 197 geometric features; error rates were highly sensitive to typeface, varying between 10 percent and 0.1 percent.
Abstract: An optical character reader for processing typeset documents must be able to handle proportional spacing, the presence of touching characters and a wide variety of type fonts. This paper describes the design of a multifont character recognizer which uses a binary decision tree to classify a character on the basis of 197 geometric features. The algorithm for designing the decision tree is based upon an entropy minimization procedure, and makes no assumptions on the distribution or independence of the binary features. The decision tree classifier provides confidence measures which may be used to reduce the substitution error rate at the expense of higher rejection rates. Methods of reducing the overall error rate by combining the decision tree classifier with other classifiers were examined. In particular, the paper evaluates the performance of a classifier using a combination of multiple decision trees, template matching and contextual post-processing. Error rates were highly sensitive to typeface and varied between 10 percent and 0.1 percent. Computer processing times for the various stages of the system are presented.
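The abstract names two mechanisms: an entropy-minimization procedure for designing the tree over binary features, and leaf confidence measures that trade substitution errors for rejections. The following is a minimal Python sketch of that general idea, not the paper's algorithm. It assumes a greedy top-down choice of the split feature (the abstract does not say whether the original procedure is greedy), and `build_tree`, `classify` and `reject_below` are hypothetical names. The three-feature toy data stand in for the paper's 197 geometric features.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def partition(samples, labels, f):
    """Split (sample, label) pairs on binary feature f into a 0-side and a 1-side."""
    zero = [(x, y) for x, y in zip(samples, labels) if x[f] == 0]
    one = [(x, y) for x, y in zip(samples, labels) if x[f] == 1]
    return zero, one


def split_entropy(samples, labels, f):
    """Expected label entropy after splitting on binary feature f."""
    zero, one = partition(samples, labels, f)
    n = len(labels)
    return sum(len(side) / n * entropy([y for _, y in side]) for side in (zero, one))


def build_tree(samples, labels, features):
    """Grow a binary tree top-down, each time picking the feature that minimizes
    the expected entropy of the class labels (greedy choice is an assumption)."""
    counts = Counter(labels)
    if len(counts) == 1 or not features:
        return {"leaf": counts}
    best = min(features, key=lambda f: split_entropy(samples, labels, f))
    zero, one = partition(samples, labels, best)
    if not zero or not one or split_entropy(samples, labels, best) >= entropy(labels):
        return {"leaf": counts}  # no remaining feature reduces label uncertainty
    rest = [f for f in features if f != best]
    return {
        "feature": best,
        "zero": build_tree([x for x, _ in zero], [y for _, y in zero], rest),
        "one": build_tree([x for x, _ in one], [y for _, y in one], rest),
    }


def classify(tree, x, reject_below=0.0):
    """Return (label, confidence); label is None (a rejection) when the leaf's
    majority-class fraction falls below reject_below."""
    while "leaf" not in tree:
        tree = tree["one"] if x[tree["feature"]] == 1 else tree["zero"]
    label, count = tree["leaf"].most_common(1)[0]
    confidence = count / sum(tree["leaf"].values())
    return (label if confidence >= reject_below else None), confidence


# Toy data: three binary features per character image (hypothetical values).
samples = [(1, 0, 0), (1, 0, 1), (0, 1, 0), (0, 1, 1), (0, 0, 1), (0, 0, 1)]
labels = ["l", "l", "o", "o", "c", "e"]
tree = build_tree(samples, labels, features=[0, 1, 2])
print(classify(tree, (1, 0, 0)))                     # ('l', 1.0)
print(classify(tree, (0, 0, 1), reject_below=0.75))  # (None, 0.5): rejected
```

Raising `reject_below` converts low-confidence leaves (here the leaf where "c" and "e" share identical features) from likely substitutions into rejections, which mirrors the trade-off the abstract describes between substitution and rejection rates.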
Citations
At the frontiers of OCR
George Nagy
- 01 Jul 1992
TL;DR: It is argued that it is time for a major change of approach in optical character recognition (OCR) research, and that new OCR systems should take advantage of the typographic uniformity of paragraphs or other layout components.
131
Multiple binary decision tree classifiers
TL;DR: This paper demonstrates that higher classification accuracies can be obtained from the same training set by using a combination of decision trees and by reaching a consensus using Dempster and Shafer's theory of evidence.
76
Document image analysis: a bibliography
Rangachar Kasturi, Lawrence O'Gorman +1 more
- 01 Jul 1992
TL;DR: Computer Vision, Graphics, and Image Processing (CVGIP); CVGIP: Graphical Models and Image Processing (CVGIP: GMIP); and CVGIP: Image Understanding (CVGIP: IU).
49
Hidden markov models in text recognition
Julian C. Anigbogu, Abdel Belaïd +1 more
TL;DR: A multilevel multifont character recognition system that combines stochastic and dictionary verification methods for word recognition and error correction, together with a majority-vote system that polls the other systems for advice before deciding on the identity of a character.
22
A binary-tree-based OCR technique for machine-printed characters
TL;DR: The proposed OCR system is trained for Latin and Greek typewritten text, but it can be easily adapted to any typewritten character set, and the recognition rate can exceed 99.5%.
12
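The "Multiple binary decision tree classifiers" entry in the list above describes reaching a consensus among several decision trees with Dempster and Shafer's theory of evidence. A minimal Python sketch of Dempster's rule of combination follows. How that paper turns a tree's output into a mass function is not stated here, so `tree_mass` (confidence on the chosen class, remainder on the whole frame of classes) is a hypothetical assumption, as are the toy votes.

```python
from functools import reduce
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    A mass function maps focal elements (frozensets of class labels) to masses
    summing to 1. Mass landing on empty intersections is the conflict K; the
    remaining mass is renormalized by 1 - K.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def tree_mass(label, confidence, frame):
    """Hypothetical mapping: the tree's confidence goes to its chosen label and
    the remainder to the whole frame of classes (i.e. "don't know")."""
    return {frozenset([label]): confidence, frozenset(frame): 1.0 - confidence}


def decide(m):
    """Pick the single class carrying the largest combined mass."""
    singles = {next(iter(s)): w for s, w in m.items() if len(s) == 1}
    return max(singles, key=singles.get)


frame = {"c", "e", "o"}
votes = [("c", 0.6), ("e", 0.7), ("c", 0.5)]  # (label, confidence) from three trees
masses = [tree_mass(label, conf, frame) for label, conf in votes]
belief = reduce(combine, masses)
print(decide(belief))                      # c
print(round(belief[frozenset({"c"})], 4))  # 0.5455
```

In this toy run two weaker votes for "c" outweigh a single stronger vote for "e": after renormalizing away the conflicting mass, "c" receives 6/11 of the belief, "e" 7/22 and the uncommitted frame 3/22.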