Top 93 papers presented at Document Analysis Systems in 2012

Proceedings Article•10.1109/DAS.2012.22•

Automatic Room Detection and Room Labeling from Architectural Floor Plans

Sheraz Ahmed¹, Marcus Liwicki¹, Markus Weber¹, Andreas Dengel¹•Institutions (1)

German Research Centre for Artificial Intelligence¹

27 Mar 2012

TL;DR: An automatic system for analyzing and labeling architectural floor plans that outperforms other state-of-the-art approaches for room detection and splits rooms into sub-regions when several semantic rooms share the same physical room.

Abstract: This paper presents an automatic system for analyzing and labeling architectural floor plans. In order to detect the locations of the rooms, the proposed system extracts both structural and semantic information from given floor plans. Furthermore, OCR is applied on the text layer to retrieve the meaningful room labeling. Finally, a novel post-processing step is proposed to split rooms into several sub-regions if several semantic rooms share the same physical room. Our fully automatic system is evaluated on a publicly available dataset of architectural floor plans. In our experiments, we could clearly outperform other state-of-the-art approaches for room detection.

121 citations

Proceedings Article•10.1109/DAS.2012.61•

Offline handwritten English character recognition based on convolutional neural network

Aiquan Yuan¹, Gang Bai¹, Lijing Jiao¹, Yajie Liu¹•Institutions (1)

College of Information Technology¹

27 Mar 2012

TL;DR: This paper applies Convolutional Neural Networks for offline handwritten English character recognition using a modified LeNet-5 CNN model, with special settings of the number of neurons in each layer and the connecting way between some layers.

Abstract: This paper applies Convolutional Neural Networks (CNNs) for offline handwritten English character recognition. We use a modified LeNet-5 CNN model, with special settings of the number of neurons in each layer and the connecting way between some layers. Outputs of the CNN are set with error-correcting codes, thus the CNN has the ability to reject recognition results. For training of the CNN, an error-samples-based reinforcement learning strategy is developed. Experiments are evaluated on UNIPEN lowercase and uppercase datasets, with recognition rates of 93.7% for uppercase and 90.2% for lowercase, respectively.

102 citations
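
The abstract above says that the CNN outputs are set with error-correcting codes so that the network can reject recognition results. A minimal sketch of that idea follows, assuming nearest-codeword decoding by Hamming distance. The codebook, the 0.5 binarization threshold, and the `max_distance` rejection parameter are hypothetical and are not taken from the paper.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def decode_with_reject(outputs, codebook, max_distance=1):
    """Map network outputs to the nearest codeword, or return None to reject.

    outputs: real-valued activations in [0, 1], one per code bit.
    codebook: dict mapping class label -> tuple of 0/1 code bits (illustrative).
    max_distance: hypothetical threshold; a larger distance means rejection.
    """
    bits = tuple(1 if o >= 0.5 else 0 for o in outputs)
    best_label, best_dist = None, None
    for label, code in codebook.items():
        d = hamming(bits, code)
        if best_dist is None or d < best_dist:
            best_label, best_dist = label, d
    if best_dist is None or best_dist > max_distance:
        return None  # too far from every codeword: reject the result
    return best_label


# Illustrative 6-bit codebook with minimum pairwise Hamming distance 3,
# so a single flipped bit can still be corrected.
CODEBOOK = {
    "A": (0, 0, 0, 0, 0, 0),
    "B": (1, 1, 1, 0, 0, 0),
    "C": (0, 0, 0, 1, 1, 1),
    "D": (1, 1, 1, 1, 1, 1),
}
```

With this codebook, an output one bit away from a codeword is still assigned to that class, while an output two or more bits away from every codeword is rejected.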

Proceedings Article•10.1109/DAS.2012.29•

Dataset, Ground-Truth and Performance Metrics for Table Detection Evaluation

Jing Fang¹, Xin Tao¹, Zhi Tang¹, Ruiheng Qiu¹, Ying Liu²•Institutions (2)

Peking University¹, KAIST²

27 Mar 2012

TL;DR: A table detection dataset that is representative, large and, most importantly, publicly available is provided; the compatible format of its ground truth makes evaluation independent of document medium.

Abstract: Table detection is an important task in the field of document analysis. It has been extensively studied for a couple of decades. Various kinds of document media are involved, from scanned images to web pages, from plain texts to PDF files. The numerous published algorithms raise a challenging issue: how to evaluate algorithms in different contexts. Currently, most work on table detection conducts experiments on in-house datasets. Even the few online datasets are targeted at image documents only. Moreover, precision and recall measurements based on human evaluation are the usual practice for reporting performance. In this paper, we provide a dataset that is representative, large and, most importantly, publicly available. The compatible format of the ground truth makes evaluation independent of document medium. We also propose a set of new measures, implement them, and open the source code. Finally, three existing table detection algorithms are evaluated to demonstrate the reliability of the dataset and metrics.

102 citations

Proceedings Article•10.1109/DAS.2012.99•

Writer Retrieval and Writer Identification Using Local Features

Stefan Fiel¹, Robert Sablatnig¹•Institutions (1)

Vienna University of Technology¹

27 Mar 2012

TL;DR: A method for writer retrieval and writer identification that uses local features and therefore does not depend on a binarization step; for writer retrieval it outperforms previous methods.

Abstract: Writer identification determines the writer of one document among a number of known writers where at least one sample is known. Writer retrieval searches all documents of one particular writer by creating a ranking of the similarity of the handwriting in a dataset. This paper presents a method for writer retrieval and writer identification using local features and therefore the proposed method is not dependent on a binarization step. First the local features of the image are calculated and with the help of a predefined codebook an occurrence histogram can be created. This histogram is compared to determine the identity of the writer or the similarity of other handwritten documents. The proposed method has been evaluated on two datasets, namely the IAM dataset which contains 650 writers and the Trigraph Slant dataset which contains 47 writers. Experiments have shown that it can keep up with previous writer identification approaches. Regarding writer retrieval it outperforms previous methods.

74 citations
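
The abstract above describes assigning local features to a predefined codebook, building an occurrence histogram, and comparing histograms to rank writers. A minimal sketch of that pipeline follows. The descriptors here are plain numeric tuples, and the chi-square distance is an assumed comparison measure; the abstract does not say which local features or which histogram distance the authors use.

```python
import math


def nearest_codeword(descriptor, codebook):
    """Index of the codebook entry closest (Euclidean) to a descriptor."""
    return min(range(len(codebook)),
               key=lambda i: math.dist(descriptor, codebook[i]))


def occurrence_histogram(descriptors, codebook):
    """Normalized count of how often each codebook entry is the nearest one."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_codeword(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


def chi_square_distance(h1, h2):
    """Chi-square distance between two histograms (assumed measure)."""
    return 0.5 * sum((a - b) ** 2 / (a + b)
                     for a, b in zip(h1, h2) if a + b > 0)


def rank_writers(query_hist, known_hists):
    """Writer ids sorted from most to least similar to the query histogram."""
    return sorted(known_hists,
                  key=lambda w: chi_square_distance(query_hist, known_hists[w]))
```

In this sketch, the first entry of the ranking serves as the identification result, and the full ranking serves as the retrieval result.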

Proceedings Article•10.1109/DAS.2012.6•

A New Method for Arbitrarily-Oriented Text Detection in Video

Nabin Sharma¹, Palaiahnakote Shivakumara², Umapada Pal, Michael Blumenstein¹, Chew Lim Tan²•Institutions (2)

Griffith University¹, National University of Singapore²

27 Mar 2012

TL;DR: The proposed method outperforms the existing method in terms of recall and f-measure and results in extraction of arbitrarily-oriented text from the video frame.

Abstract: Text detection in video frames plays a vital role in enhancing the performance of information extraction systems because the text in video frames helps in indexing and retrieving video efficiently and accurately. This paper presents a new method for arbitrarily-oriented text detection in video, based on dominant text pixel selection, text representatives and region growing. The method uses gradient pixel direction and magnitude corresponding to Sobel edge pixels of the input frame to obtain dominant text pixels. Edge components in the Sobel edge map corresponding to dominant text pixels are then extracted and we call them text representatives. We eliminate broken segments of each text representative to get candidate text representatives. Then the perimeter of candidate text representatives grows along the text direction in the Sobel edge map to group the neighboring text components, which we call word patches. The word patches are used for finding the direction of text lines and then the word patches are expanded in the same direction in the Sobel edge map to group the neighboring word patches and to restore missing text information. This results in extraction of arbitrarily-oriented text from the video frame. To evaluate the method, we considered arbitrarily-oriented data, non-horizontal data, horizontal data, Hua's data and ICDAR-2003 competition data (camera images). The experimental results show that the proposed method outperforms the existing method in terms of recall and f-measure.

59 citations

Proceedings Article•10.1109/DAS.2012.23•

Binarization-Free Text Line Segmentation for Historical Documents Based on Interest Point Clustering

Angelika Garz¹, Andreas Fischer, Robert Sablatnig¹, Horst Bunke•Institutions (1)

Vienna University of Technology¹

27 Mar 2012

TL;DR: A novel binarization-free line segmentation method that is robust to noise, copes with overlapping and touching text lines, and shows promising results for real-world applications in terms of both accuracy and efficiency.

Abstract: Segmenting page images into text lines is a crucial pre-processing step for automated reading of historical documents. Challenging issues in this open research field are given, e.g., by paper or parchment background noise, ink bleed-through, artifacts due to aging, stains, and touching text lines. In this paper, we present a novel binarization-free line segmentation method that is robust to noise and copes with overlapping and touching text lines. First, interest points representing parts of characters are extracted from gray-scale images. Next, word clusters are identified in high-density regions and touching components such as ascenders and descenders are separated using seam carving. Finally, text lines are generated by concatenating neighboring word clusters, where neighborhood is defined by the prevailing orientation of the words in the document. An experimental evaluation on the Latin manuscript images of the Saint Gall database shows promising results for real-world applications in terms of both accuracy and efficiency.

56 citations

Proceedings Article•10.1109/DAS.2012.72•

Recent Advances in Video Based Document Processing: A Review

Nabin Sharma¹, Umapada Pal, Michael Blumenstein¹•Institutions (1)

Griffith University¹

27 Mar 2012

TL;DR: This paper presents a review of various state-of-the-art techniques proposed towards different stages (e.g. detection, localization, extraction, etc.) of text information processing in video frames.

Abstract: Extraction and recognition of text present in video has become a very popular research area in the last decade. Generally, text present in video frames is of different size, orientation, style, etc. with complex backgrounds, noise, low resolution and contrast. These factors make the automatic text extraction and recognition in video frames a challenging task. A large number of techniques have been proposed by various researchers in the recent past to address the problem. This paper presents a review of various state-of-the-art techniques proposed towards different stages (e.g. detection, localization, extraction, etc.) of text information processing in video frames. Looking at the growing popularity and the recent developments in the processing of text in video frames, this review imparts details of current trends and potential directions for further research activities to assist researchers.

49 citations

Proceedings Article•10.1109/DAS.2012.54•

Logo Retrieval in Document Images

Rajiv Jain¹, David Doermann¹•Institutions (1)

University of Maryland, College Park¹

27 Mar 2012

TL;DR: A scalable algorithm for segmentation-free logo retrieval in document images, combining the SURF feature, a novel indexing algorithm for efficient retrieval, and a method to filter results using the orientation of local features and geometric constraints.

Abstract: This paper presents a scalable algorithm for segmentation-free logo retrieval in document images. The contributions include the use of the SURF feature for logo retrieval, a novel indexing algorithm for efficient retrieval and a method to filter results using the orientation of local features and geometric constraints. Results demonstrate that logo retrieval can be performed with high accuracy and efficiently scaled to large datasets.

49 citations

Proceedings Article•10.1109/DAS.2012.86•

Text Independent Writer Identification for Oriya Script

Sukalpa Chanda¹, Katrin Franke¹, Umapada Pal²•Institutions (2)

Gjøvik University College¹, Indian Statistical Institute²

27 Mar 2012

TL;DR: A writer identification system for Oriya script is proposed that performs reasonably well even with a small amount of text; experiments with the curvature feature are reported.

Abstract: Automatic identification of an individual based on his/her handwriting characteristics is an important forensic tool. In a computational forensic scenario, the presence of a large amount of text/information in a questioned document cannot be ensured. Lack of data threatens system reliability in such cases. We here propose a writer identification system for Oriya script which is capable of performing reasonably well even with a small amount of text. Experiments with the curvature feature are reported here, using a Support Vector Machine (SVM) as classifier. We obtained promising results of 94.00% writer identification accuracy at the first top choice and 99% when considering the first three top choices.

46 citations

Proceedings Article•10.1109/DAS.2012.16•

An Effective Staff Detection and Removal Technique for Musical Documents

Bolan Su¹, Shijian Lu², Umapada Pal, Chew Lim Tan¹•Institutions (2)

National University of Singapore¹, Institute for Infocomm Research Singapore²

27 Mar 2012

TL;DR: An effective staff line detection and removal method is proposed that uses the global information of the musical document and models the staff line shape; it is simple, robust, and involves few parameters.

Abstract: Musical staff line detection and removal techniques detect the staff positions in musical documents and segment the musical score from musical documents by removing those staff lines. It is an important preprocessing step for the ensuing Optical Music Recognition tasks. This paper proposes an effective staff line detection and removal method that makes use of the global information of the musical document and models the staff line shape. It first estimates the staff height and space, and then models the shape of the staff line by examining the orientation of the staff pixels. Finally, the estimated model is used to find the location of the staff lines and hence to remove those detected staff lines. The proposed technique is simple, robust, and involves few parameters. It has been tested on the dataset of the recent staff removal competition held under the International Conference of Document Analysis and Recognition (ICDAR) 2011. Experimental results show the effectiveness and robustness of our proposed technique on musical documents with various types of deformations.

41 citations
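
The abstract above says the method first estimates the staff height and space but does not say how. One common way to obtain such estimates, shown here purely as an assumption and not as the authors' procedure, is to take the most frequent vertical run length of ink pixels as the staff line height and the most frequent vertical run length of background pixels as the staff space. The function names and the list-of-rows image format are illustrative.

```python
from collections import Counter


def vertical_runs(image):
    """Count vertical run lengths of ink (1) and background (0) pixels.

    image: list of equal-length rows, each a list of 0/1 values.
    Returns two Counters: ink run lengths and background run lengths.
    """
    ink, background = Counter(), Counter()
    if not image or not image[0]:
        return ink, background
    height, width = len(image), len(image[0])
    for x in range(width):
        run_value, run_len = image[0][x], 1
        for y in range(1, height):
            if image[y][x] == run_value:
                run_len += 1
            else:
                (ink if run_value else background)[run_len] += 1
                run_value, run_len = image[y][x], 1
        (ink if run_value else background)[run_len] += 1  # close the last run
    return ink, background


def estimate_staff_metrics(image):
    """Return (staff_line_height, staff_space) as the most frequent run lengths.

    Returns None when the image has no ink or no background runs.
    """
    ink, background = vertical_runs(image)
    if not ink or not background:
        return None
    return ink.most_common(1)[0][0], background.most_common(1)[0][0]
```

Because the estimate is a mode over all columns, it tends to tolerate symbols that cross the staff, as long as staff lines dominate the run-length statistics.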

Proceedings Article•10.1109/DAS.2012.20•

Arabic Handwritten Text Line Extraction by Applying an Adaptive Mask to Morphological Dilation

Muna Khayyat¹, Louisa Lam¹, Ching Y. Suen¹, Fei Yin, Cheng-Lin Liu•Institutions (1)

Concordia University¹

27 Mar 2012

TL;DR: This paper uses morphological dilation with a dynamic adaptive mask for handwritten text line extraction and demonstrates the effectiveness of the approach on the CENPARMI Arabic handwritten documents database, which contains multi-skewed and touching lines.

Abstract: This paper presents a robust method for handwritten text line extraction. We use morphological dilation with a dynamic adaptive mask for line extraction. Line separation occurs because of the repulsion and attraction between connected components. The characteristics of the Arabic script are considered to ensure a high performance of the algorithm. Our method is evaluated on the CENPARMI Arabic handwritten documents database, which contains multi-skewed and touching lines. With a matching score of 0.95, our method achieved precision and recall rates of 96.3% and 96.7% respectively, which demonstrates the effectiveness of our approach.

Proceedings Article•10.1109/DAS.2012.26•

Combining Multi-scale Character Recognition and Linguistic Knowledge for Natural Scene Text OCR

Khaoula Elagouni, Christophe Garcia, Franck Mamalet, Pascale Sébillot

27 Mar 2012

TL;DR: A novel method is proposed that recognizes scene text without the conventional character segmentation step: the text image is scanned with multi-scale windows, and a recognition model relying on a neural classification approach is applied to every window to recognize valid characters and identify non-valid ones.

Abstract: Understanding text captured in real-world scenes is a challenging problem in the field of visual pattern recognition and continues to generate a significant interest in the OCR (Optical Character Recognition) community. This paper proposes a novel method to recognize scene texts avoiding the conventional character segmentation step. The idea is to scan the text image with multi-scale windows and apply a robust recognition model, relying on a neural classification approach, to every window in order to recognize valid characters and identify non valid ones. Recognition results are represented as a graph model in order to determine the best sequence of characters. Some linguistic knowledge is also incorporated to remove errors due to recognition confusions. The designed method is evaluated on the ICDAR 2003 database of scene text images and outperforms state-of-the-art approaches.

Proceedings Article•10.1109/DAS.2012.65•

OTCYMIST: Otsu-Canny Minimal Spanning Tree for Born-Digital Images

Deepak Kumar¹, A. G. Ramakrishnan¹•Institutions (1)

Indian Institute of Science¹

27 Mar 2012

TL;DR: Text segmentation and localization algorithms are proposed for the born-digital image dataset; the text components are represented as nodes of a graph, and long edges are broken from the minimum spanning tree of the graph.

Abstract: Text segmentation and localization algorithms are proposed for the born-digital image dataset. Binarization and edge detection are separately carried out on the three colour planes of the image. Connected components (CC's) obtained from the binarized image are thresholded based on their area and aspect ratio. CC's which contain sufficient edge pixels are retained. A novel approach is presented, where the text components are represented as nodes of a graph. Nodes correspond to the centroids of the individual CC's. Long edges are broken from the minimum spanning tree of the graph. Pairwise height ratio is also used to remove likely non-text components. A new minimum spanning tree is created from the remaining nodes. Horizontal grouping is performed on the CC's to generate bounding boxes of text strings. Overlapping bounding boxes are removed using an overlap area threshold. Non-overlapping and minimally overlapping bounding boxes are used for text segmentation. Vertical splitting is applied to generate bounding boxes at the word level. The proposed method is applied on all the images of the test dataset and values of precision, recall and H-mean are obtained using different approaches.
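
The graph step in the abstract above (centroids as nodes, a minimum spanning tree, and long edges broken) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it builds the tree with Prim's algorithm over Euclidean distances, and `max_edge_length` is a hypothetical fixed threshold, since the abstract does not say how "long" is defined.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph over `points`.

    Returns a list of (u, v, length) edges indexed into `points`.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection cost for each node
    parent = [-1] * n
    edges = []
    if n:
        best[0] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def group_components(points, max_edge_length):
    """Break MST edges longer than the threshold and return the node groups."""
    root = list(range(len(points)))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    for u, v, length in minimum_spanning_tree(points):
        if length <= max_edge_length:  # keep short edges, drop long ones
            root[find(u)] = find(v)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Each returned group is a list of centroid indices that remain connected after the long edges are removed, which is the kind of grouping the later horizontal grouping and bounding box steps would operate on.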

Proceedings Article•10.1109/DAS.2012.77•

Scanning Neural Network for Text Line Recognition

Sheikh Faisal Rashid¹, Faisal Shafait², Thomas M. Breuel¹•Institutions (2)

Kaiserslautern University of Technology¹, German Research Centre for Artificial Intelligence²

27 Mar 2012

TL;DR: A segmentation-free text line recognition approach using a multi-layer perceptron (MLP) and hidden Markov models (HMMs) that achieves 98.4% character recognition accuracy, statistically significantly better than the accuracies obtained from state-of-the-art open source OCR systems.

Abstract: Optical character recognition (OCR) of machine printed Latin script documents is ubiquitously claimed as a solved problem. However, error free OCR of degraded or noisy text is still challenging for modern OCR systems. Most recent approaches perform segmentation based character recognition. This is tricky because segmentation of degraded text is itself problematic. This paper describes a segmentation free text line recognition approach using multi layer perceptron (MLP) and hidden Markov models (HMMs). A line scanning neural network, trained with character level contextual information and a special garbage class, is used to extract class probabilities at every pixel succession. The output of this scanning neural network is decoded by HMMs to provide character level recognition. In evaluations on a subset of UNLV-ISRI document collection, we achieve 98.4% character recognition accuracy that is statistically significantly better in comparison with character recognition accuracies obtained from state-of-the-art open source OCR systems.

Proceedings Article•10.1109/DAS.2012.42•

How Salient is Scene Text

Asif Shahab¹, Faisal Shafait¹, Andreas Dengel¹, Seiichi Uchida²•Institutions (2)

German Research Centre for Artificial Intelligence¹, Kyushu University²

27 Mar 2012

TL;DR: Initial results indicate that saliency maps produced by these attention models can be used for aiding scene text detection algorithms by suppressing non-text regions.

Abstract: Computational models of visual attention use image features to identify salient locations in an image that are likely to attract human attention. Attention models have been quite effectively used for various object detection tasks. However, their use for scene text detection is under-investigated. As a general observation, scene text often conveys important information and is usually prominent or salient in the scene itself. In this paper, we evaluate four state-of-the-art attention models for their response to scene text. Initial results indicate that saliency maps produced by these attention models can be used for aiding scene text detection algorithms by suppressing non-text regions.

Proceedings Article•10.1109/DAS.2012.10•

A Signature Verification Framework for Digital Pen Applications

Muhammad Imran Malik, Sheraz Ahmed, Andreas Dengel, Marcus Liwicki

27 Mar 2012

TL;DR: A framework for real-time online signature verification in any scenario where digital pens may be used is presented, together with a general approach to integrating the GMM descriptions into electronic ID cards so that behavioral biometrics can also be stored on these cards.

Abstract: In this paper we present a framework for real-time online signature verification scenarios. The proposed framework is based on state-of-the-art feature extraction and Gaussian Mixture Model (GMM) classification. While our signature verification library is generally applicable to any input device using digital pens, we have implemented verification scenarios using the Anoto digital pen. As such our automated signature verification framework becomes an interesting commodity for industry, because the Anoto SDK is easy to apply and the GMM-based classification can be seamlessly integrated. The novelty of this work is the application of our framework that takes real-time online signature verification to every scenario where digital pens may potentially be used. In this paper we describe several scenarios where our framework has been applied, including signatures in financial contracts or ordering processes. We also propose a general approach to integrate the GMM-descriptions into electronic ID-cards in order to also store behavioral biometrics on these cards. In experiments we have measured the performance of the signature verification system when skilled forgeries were present. The interest shown by our partner financial institutions and the results of our initial evaluations indicate that our signature verification framework suits exactly the demands of our clients.

Proceedings Article•10.1109/DAS.2012.32•

Effect of "Ground Truth" on Image Binarization

Elisa H. Barney Smith¹, Chang An²•Institutions (2)

Boise State University¹, Lehigh University²

27 Mar 2012

TL;DR: Three variations in pixel accurate ground truth were used to train a binarization classifier, and the performance can vary significantly depending on choice of ground truth, which can influence binarization design choices.

Abstract: Image binarization has a large effect on the rest of the document image analysis processes in character recognition. Algorithm development is still a major focus of research. Evaluation of image binarization has been done by comparison of the result of OCR systems on images binarized by different methods. That has been criticized in that the binarization alone is not evaluated, but rather how it interacts with the downstream processes. Recently pixel accurate "ground truth" images have been introduced for use in binarization algorithm evaluation. This has been shown to be open to interpretation. The choice of binarization ground truth affects the binarization algorithm design, either directly if design is by automated algorithm trying to match the provided ground truth, or indirectly if human designers adjust their designs to perform better on the provided data. Three variations in pixel accurate ground truth were used to train a binarization classifier. The performance can vary significantly depending on choice of ground truth, which can influence binarization design choices.

Proceedings Article•10.1109/DAS.2012.30•

Document Classification Using Multiple Views

Albert Gordo¹, Florent Perronnin², Ernest Valveny¹•Institutions (2)

Autonomous University of Barcelona¹, Xerox²

27 Mar 2012

TL;DR: The use of Canonical Correlation Analysis is considered to leverage 'expensive' views that are available only at training time to significantly improve the results in a classification task.

Abstract: The combination of multiple features or views when representing documents or other kinds of objects usually leads to improved results in classification (and retrieval) tasks. Most systems assume that those views will be available both at training and test time. However, some views may be too 'expensive' to be available at test time. In this paper, we consider the use of Canonical Correlation Analysis to leverage 'expensive' views that are available only at training time. Experimental results show that this information may significantly improve the results in a classification task.

Proceedings Article•10.1109/DAS.2012.71•

Real-Time Document Image Retrieval on a Smartphone

Kazutaka Takeda¹, Koichi Kise¹, Masakazu Iwamura¹•Institutions (1)

Osaka Prefecture University¹

27 Mar 2012

TL;DR: This paper presents a novel interface running on smart phones which is capable of seamlessly linking physical and digital worlds through paper documents, based on a real-time document image retrieval method called Locally Likely Arrangement Hashing.

Abstract: This paper presents a novel interface running on smart phones which is capable of seamlessly linking physical and digital worlds through paper documents. This interface is based on a real-time document image retrieval method called Locally Likely Arrangement Hashing. By simply pointing a smart phone at a paper document, the user can obtain its corresponding electronic document. This can easily provide the user with the information associated with the retrieved document. This relevant information can be superimposed on the display of smart phones. Therefore, we consider that with the help of this interface, the user can utilize paper documents as a new medium to display various information.

Proceedings Article•10.1109/DAS.2012.68•

Performance Evaluation of Mathematical Formula Identification

Xiaoyan Lin¹, Liangcai Gao¹, Zhi Tang¹, Xiaofan Lin, Xuan Hu²•Institutions (2)

Peking University¹, Beihang University²

27 Mar 2012

TL;DR: A ground-truth dataset is constructed and a tool is developed to automatically evaluate mathematical formula identification results, including the error type definitions and the scenario-adjustable scoring, based on the proposed evaluation metric.

Abstract: This paper presents a performance evaluation system for mathematical formula identification. First, a ground-truth dataset is constructed to facilitate the performance comparison of different mathematical formula identification algorithms. Statistical analysis of the dataset shows that its diversity reflects real-world documents. Second, a performance evaluation metric for mathematical formula identification is proposed, including the error type definitions and the scenario-adjustable scoring. The proposed metric enables in-depth analysis of mathematical formula identification systems in different scenarios. Finally, based on the proposed evaluation metric, a tool is developed to automatically evaluate mathematical formula identification results. It is worth noting that the ground-truth dataset and the evaluation tool are freely available for academic purposes.

Proceedings Article•10.1109/DAS.2012.60•

Off-Line Bangla Signature Verification

Srikanta Pal¹, Vu Nguyen¹, Michael Blumenstein¹, Umapada Pal²•Institutions (2)

Griffith University¹, Indian Statistical Institute²

27 Mar 2012

TL;DR: The performance of an off-line signature verification system involving Bangla signatures, whose style is distinct from Western scripts, was investigated and an encouraging accuracy of 90.4% was obtained.

Abstract: In the field of information security, biometric systems play an important role. Within biometrics, automatic signature identification and verification has been a strong research area because of the social and legal acceptance and extensive use of the written signature as an individual authentication. Signature verification is a process in which the questioned signature is examined in detail in order to determine whether it belongs to the claimed person or not. Despite substantial research in the field of signature verification involving Western signatures, very few works have been dedicated to non-Western signatures such as Chinese, Japanese, Arabic, or Persian etc. In this paper, the performance of an off-line signature verification system involving Bangla signatures, whose style is distinct from Western scripts, was investigated. The Gaussian Grid feature extraction technique was employed for feature extraction and Support Vector Machines (SVMs) were considered for classification. The Bangla signature database employed in the experiments consisted of 3000 forgeries and 2400 genuine signatures. An encouraging accuracy of 90.4% was obtained from the experiments.

Proceedings Article•10.1109/DAS.2012.39•

Extraction of Text Touching Graphics Using SURF

Sheraz Ahmed¹, Marcus Liwicki¹, Andreas Dengel¹•Institutions (1)

German Research Centre for Artificial Intelligence¹

27 Mar 2012

TL;DR: A novel part-based method for the extraction of text touching graphic components using the Speeded Up Robust Features (SURF) to localize the text components and distinguish them from graphics.

Abstract: In this paper we propose a novel part-based method for the extraction of text touching graphic components. The Speeded Up Robust Features (SURF) are used to localize the text components and distinguish them from graphics. We introduce several post-processing steps to finally detect the text. We have tested our method on a publicly available data set of architectural floor plans and on real geographical maps. On floor plans we have located more than 95% of the text components which were not identified as text beforehand because they were touching graphic components.

Proceedings Article•10.1109/DAS.2012.85•

Text Detection in Natural Scenes with Salient Region

Quan Meng¹, Yonghong Song¹•Institutions (1)

Xi'an Jiaotong University¹

27 Mar 2012

TL;DR: A novel approach to detect text in natural scenes using a type of bionic method that imitates how human beings detect text exactly and robustly and provides promising performance in comparison with existing methods.

Abstract: In this paper, we present a novel approach to detect text in natural scenes. This approach is a type of bionic method, which imitates how human beings detect text exactly and robustly. Practically, human beings follow two steps to detect text: the first step is to find salient regions in a scene and the second step is to determine whether these salient regions are text or not. Therefore, two similar steps, namely salient region computation and text localization, are used in our method. In the salient region computation step, a set of salient features including multi-scale contrast, modified center-surround histogram, color spatial distribution and similarity of stroke width is used to describe an image, followed by computation of salient regions based on the combination of a Conditional Random Fields model and the above features. Because a single letter rarely appears on its own, in the text localization step, salient regions are segmented and the connected components are grouped into text strings based on features such as spatial relationships, color difference and stroke width. As an elementary unit, the text string is refined by connected component analysis. We tested the effectiveness of our method on the ICDAR 2003 database. The experimental results show that the proposed method provides promising performance in comparison with existing methods.

Proceedings Article•10.1109/DAS.2012.98•

Writer Identification of Bangla Handwritings by Radon Transform Projection Profile

Samit Biswas¹, Amit Kumar Das²•Institutions (2)

Bengal Institute of Technology, Kolkata¹, Indian Institute of Engineering Science and Technology, Shibpur²

27 Mar 2012

TL;DR: A new approach for extracting two different sets of components (essentially fragments of characters), namely fragment set-A and fragment set-B, which uses less information from the handwritten samples, thus saving computation time as well as memory.

Abstract: Writer identification is the task of determining the person whose handwritten sample is available in a set of writings collected from a multitude of writers. This has useful applications in many areas, notably in forensic analysis. The task of writer identification is quite difficult due to minimal variations found in different handwritten samples from the same person/writer. Several identification algorithms have been proposed so far, mostly for non-Indic writings. This paper presents a new approach for extracting two different sets of components (essentially fragments of characters), namely fragment set-A and fragment set-B. Features are extracted from each element of these two sets to identify the writing style of a particular person. The features are computed based on the Radon transform projection profile. The proposed approach uses less information from the handwritten samples, thus saving computation time as well as memory. A condition to determine that the writer is unknown (i.e., there is no handwritten sample from that writer in the reference base) is also proposed. The approach is tested on a collected dataset of Bangla writings and the experimental results are encouraging.
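
To make the idea of a Radon transform projection profile concrete, here is a minimal sketch, not the authors' implementation. It assumes a binary fragment image given as a list of rows (1 = ink) and approximates the projection at a given angle by binning each ink pixel along the projection axis. The function name `projection_profile` and the binning scheme are illustrative assumptions.

```python
import math
from typing import Dict, List


def projection_profile(image: List[List[int]], angle_deg: float) -> List[int]:
    """Discrete projection of a binary image at the given angle.

    Each ink pixel (x, y) is assigned to the bin t = round(x*cos(a) + y*sin(a)),
    which is a coarse, pixel-level approximation of one Radon projection.
    The returned profile is trimmed to the occupied range of bins.
    """
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    bins: Dict[int, int] = {}
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value:
                t = round(x * c + y * s)
                bins[t] = bins.get(t, 0) + value
    if not bins:
        return []
    lo, hi = min(bins), max(bins)
    return [bins.get(t, 0) for t in range(lo, hi + 1)]


# Hypothetical 3x3 fragment: a vertical stroke with a small foot.
fragment = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 1],
]
vertical_profile = projection_profile(fragment, 0)     # ink per column
horizontal_profile = projection_profile(fragment, 90)  # ink per row
```

Profiles taken at several angles could then be concatenated into a feature vector per fragment; how the paper actually builds and compares its features is not specified in the abstract.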

Proceedings Article•10.1109/DAS.2012.38•

ExpressMatch: A System for Creating Ground-Truthed Datasets of Online Mathematical Expressions

Frank Dennis Julca Aguilar¹, Nina S. T. Hirata¹•Institutions (1)

University of São Paulo¹

27 Mar 2012

TL;DR: This paper presents Express Match, a system designed to support the creation and management of online mathematical expression datasets with ground-truth data, in which transcriptions of model expressions can be automatically annotated by matching them to the respective models.

Abstract: In recognition domains, publicly available ground-truthed datasets are essential to perform effective performance evaluation and comparison of existing methods and systems. However, in the field of online handwritten mathematical expression recognition, datasets are quite scarce and their creation is one of the current challenging issues. In this paper, we present Express Match, a system designed to support the creation and management of online mathematical expression datasets with ground-truth data. In this system, handwritten model expressions can be input and manually annotated with ground-truth data; transcriptions of these expressions can then be automatically annotated by matching them to the respective models. Additional metadata can also be attached to each sample expression. To test the system, a dataset consisting of 56 model expressions and 910 sample expressions with a total of 20,010 symbols, written by 25 different writers, has been created. This dataset, as well as Express Match, will be made publicly available.

Proceedings Article•10.1109/DAS.2012.70•

Quality Evaluation of Facsimiles of Hebrew First Temple Period Inscriptions

Arie Shaus¹, Eli Turkel¹, Eli Piasetzky¹•Institutions (1)

Tel Aviv University¹

27 Mar 2012

TL;DR: A new method is proposed, based on a measure, comparing the image of the inscription to the registered facsimile, which is relevant to quality evaluation of other types of facsimiles and binarization in general.

Abstract: The discipline of First Temple Period epigraphy (the study of writing) relies heavily on manually-drawn facsimiles (black and white images) of ancient inscriptions. This practice may unintentionally mix up documentation and interpretation. The article proposes a new method for evaluating the quality of the facsimile. It is based on a measure, comparing the image of the inscription to the registered facsimile. Some empirical results, supporting the methodology, are presented. The technique is also relevant to quality evaluation of other types of facsimiles and binarization in general.

Proceedings Article•10.1109/DAS.2012.45•

Improving Book OCR by Adaptive Language and Image Models

Dar-Shyang Lee¹, Ray Smith¹•Institutions (1)

Google¹

27 Mar 2012

TL;DR: This work describes a system that combines two parallel correction paths using document-specific image and language models; each model adapts to shapes and vocabularies within a book to identify inconsistencies as correction hypotheses, but relies on the other for effective cross-validation.

Abstract: In order to cope with the vast diversity of book content and typefaces, it is important for OCR systems to leverage the strong consistency within a book but adapt to variations across books. We describe a system that combines two parallel correction paths using document-specific image and language models. Each model adapts to shapes and vocabularies within a book to identify inconsistencies as correction hypotheses, but relies on the other for effective cross-validation. Using the open source Tesseract engine as baseline, results on a large data set of scanned books demonstrate that word error rates can be reduced by 25 percent using this approach.

Proceedings Article•10.1109/DAS.2012.50•

Lexicon Reduction Technique for Bangla Handwritten Word Recognition

Tapan Kumar Bhowmik, Utpal Roy¹, Swapan K. Parui²•Institutions (2)

Visva-Bharati University¹, Indian Statistical Institute²

27 Mar 2012

TL;DR: Though the proposed lexicon reduction technique is developed for recognition of Bangla handwritten words, its generalization property can easily be exploited for recognition of handwriting in other scripts as well.

Abstract: In this paper we introduce a stroke-based lexicon reduction technique in order to reduce the search space for recognition of handwritten words. The technique uses two aspects of a word image to constitute a feature vector: one is the word length and the other is the shape of the word. The length of the word image is represented by the number of specific vertical strokes present in the word image, while the shape of a word image is captured by the combination of both horizontal and vertical strokes. The experiment has been carried out with a database of 35,700 off-line handwritten Bangla word images. Though our proposed lexicon reduction technique is developed for recognition of Bangla handwritten words, its generalization property can easily be exploited for recognition of handwriting in other scripts as well.
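
As a rough illustration of how a length feature can prune a lexicon, the sketch below keeps only the entries whose expected vertical-stroke count is close to the count observed in the word image. This is a simplified assumption for illustration, not the paper's technique: the function `reduce_lexicon`, the tolerance parameter, and the stroke counts in the example are all hypothetical, and the shape feature is omitted.

```python
from typing import Dict, List


def reduce_lexicon(
    expected_strokes: Dict[str, int],
    observed_strokes: int,
    tolerance: int = 1,
) -> List[str]:
    """Return lexicon entries whose expected vertical-stroke count lies
    within `tolerance` of the count observed in the word image."""
    return sorted(
        word
        for word, count in expected_strokes.items()
        if abs(count - observed_strokes) <= tolerance
    )


# Hypothetical lexicon with made-up stroke counts per entry.
lexicon = {"alpha": 3, "beta": 5, "gamma": 4, "delta": 8}
candidates = reduce_lexicon(lexicon, observed_strokes=4, tolerance=1)
```

A recognizer would then only score the surviving candidates, which is where the reduction in search space comes from.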

Proceedings Article•10.1109/DAS.2012.34•

Efficient Word Retrieval Using a Multiple Ranking Combination Scheme

Georgios Louloudis, Anastasios L. Kesidis, B. Gatos

27 Mar 2012

TL;DR: A Minimum Ranking method is proposed for the efficient fusion of multiple ranking results produced by different word matching techniques and it is shown that the fusion of the ranked results outperforms the ranking efficiency of the individual systems.

Abstract: Word retrieval is an important task in the area of document analysis and recognition. The selection of appropriate features is a crucial step in the word matching and retrieval process. Several efficient techniques have been proposed which use a wide range of features. This paper proposes a methodology for the efficient fusion of multiple ranking results produced by different word matching techniques. Specifically, a Minimum Ranking method is proposed for the combination of two or more ranking results. The method is compared with two state-of-the-art ranking fusion methods. The experimental results show that the fusion of the ranked results outperforms the ranking efficiency of the individual systems. Moreover, the proposed Minimum Ranking method outperforms the other two state-of-the-art fusion methods.

Proceedings Article•10.1109/DAS.2012.81•

Skew Estimation of Sparsely Inscribed Document Fragments

Markus Diem¹, Florian Kleber¹, Robert Sablatnig¹•Institutions (1)

Vienna University of Technology¹

27 Mar 2012

TL;DR: Results show that the proposed skew estimation is comparable with state-of-the-art methods and outperforms them on a real dataset consisting of 658 snippets.

Abstract: Document analysis is done to analyze entire forms (e.g. intelligent form analysis, table detection) or to describe the layout/structure of a document for further processing. A pre-processing step of document analysis methods is a skew estimation of scanned or photographed documents. Current skew estimation methods require the existence of large text areas, are dependent on the text type and can be limited to a specific angle range. The proposed method is gradient-based in combination with a Focused Nearest Neighbor Clustering of interest points and has no limitations regarding the detectable angle range. The upside-down decision is based on statistical analysis of ascenders and descenders. It can be applied to entire documents as well as to document fragments containing only a few words. Results show that the proposed skew estimation is comparable with state-of-the-art methods and outperforms them on a real dataset consisting of 658 snippets.
