Journal Article · DOI: 10.1109/TPAMI.2004.1261096
Online handwritten script recognition
Anoop M. Namboodiri, Anil K. Jain +1 more
TL;DR: A method to classify words and lines in an online handwritten document into one of the six major scripts: Arabic, Cyrillic, Devnagari, Han, Hebrew, or Roman is proposed.
Abstract: Automatic identification of handwritten script facilitates many important applications such as automatic transcription of multilingual documents and search for documents on the Web containing a particular script. The increase in usage of handheld devices which accept handwritten input has created a growing demand for algorithms that can efficiently analyze and retrieve handwritten data. This paper proposes a method to classify words and lines in an online handwritten document into one of the six major scripts: Arabic, Cyrillic, Devnagari, Han, Hebrew, or Roman. The classification is based on 11 different spatial and temporal features extracted from the strokes of the words. The proposed system attains an overall classification accuracy of 87.1 percent at the word level with 5-fold cross validation on a data set containing 13,379 words. The classification accuracy improves to 95 percent as the number of words in the test sample is increased to five, and to 95.5 percent for complete text lines consisting of an average of seven words.
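The abstract reports that accuracy rises from 87.1 percent for single words to 95 percent for five-word samples, but it does not say how the word-level decisions are combined. The sketch below assumes a simple plurality vote over per-word labels; the helper names `classify_sample` and `majority_correct_probability` are hypothetical, and neither is presented as the authors' method. The second function computes an idealized lower bound on sample accuracy, assuming each word is classified correctly and independently with probability `p`.

```python
from collections import Counter
from math import comb

# The six script classes named in the abstract.
SCRIPTS = ("Arabic", "Cyrillic", "Devnagari", "Han", "Hebrew", "Roman")


def classify_sample(word_predictions):
    """Combine per-word script labels into one label by plurality vote.

    Hypothetical helper (assumed combination rule, not taken from the paper).
    Ties go to the label seen first, because Counter.most_common orders
    equal counts by first occurrence.
    """
    if not word_predictions:
        raise ValueError("need at least one word prediction")
    for label in word_predictions:
        if label not in SCRIPTS:
            raise ValueError(f"unknown script label: {label!r}")
    return Counter(word_predictions).most_common(1)[0][0]


def majority_correct_probability(p, n):
    """Probability that a strict majority of n words is classified correctly.

    Assumes each word is correct independently with probability p. A strict
    majority of correct labels guarantees a correct plurality vote, so this
    is a lower bound on sample-level accuracy under that assumption.
    """
    k_min = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


if __name__ == "__main__":
    print(classify_sample(["Roman", "Cyrillic", "Roman", "Roman", "Hebrew"]))  # Roman
    print(round(majority_correct_probability(0.871, 5), 3))  # 0.982
```

Under the independence assumption, 87.1 percent word accuracy implies at least about 98.2 percent accuracy on five-word samples. This is an idealized calculation only; words from the same writer are unlikely to be classified independently in practice.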
Citations
Script Recognition—A Review
TL;DR: An overview of script identification methods is given under each of two broad categories: structure-based and visual-appearance-based techniques.
HMM-Based Lexicon-Driven and Lexicon-Free Word Recognition for Online Handwritten Indic Scripts
A. Bharath, Sriganesh Madhvanath +1 more
TL;DR: This paper proposes two word recognition techniques based on Hidden Markov Models (HMMs), one lexicon-driven and one lexicon-free; their combination significantly outperforms either technique used in isolation on handwritten Devanagari word samples.
Cited by 122
End-to-End Online Writer Identification With Recurrent Neural Network
TL;DR: This paper proposes an end-to-end framework for online text-independent writer identification that uses a recurrent neural network (RNN) to represent a particular writer's handwriting data by a set of random hybrid strokes.
Cited by 120
On-line Arabic handwriting recognition system based on visual encoding and genetic algorithm
TL;DR: A handwriting recognition system based on visual coding and a genetic algorithm (GA) is applied to Arabic script; the results obtained show that the hybrid method combining visual codes and the GA is effective.
Cited by 80
A Review of Research on Devnagari Character Recognition
Vikas J. Dongre, Vijay H. Mankar +1 more
TL;DR: An overview of Devnagari optical character recognition (DOCR) systems is presented, the available DOCR techniques are reviewed, the current status of DOCR is discussed, and directions for future research are suggested.
References
Pattern Classification and Scene Analysis
TL;DR: We provide a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition.
Cited by 12.5K
Feature selection: evaluation, application, and small sample performance
Anil K. Jain, D. Zongker +1 more
TL;DR: This work studies the problem of choosing an optimal feature set for land-use classification of SAR satellite images using four different texture models, and shows that pooling features derived from different texture models, followed by feature selection, results in a substantial improvement in classification accuracy.
Determination of the script and language content of document images
TL;DR: This work develops techniques for determining which language is represented in an image of text, based on character shape codes, a representation of Latin text that is inexpensive to compute.
Cited by 294
Automatic script identification from document images using cluster-based templates
TL;DR: An automated script identification system for typeset document images is presented; it processes thirteen scripts with minimal preprocessing and high accuracy.
Cited by 218
Script line separation from Indian multi-script documents
Umapada Pal, Bidyut B. Chaudhuri +1 more
20 Sep 1999
TL;DR: An automatic technique for separating text lines using script characteristics and shape-based features is presented; it achieves an overall accuracy of about 98.5%.
Cited by 138