Showing papers presented at "Document Analysis Systems in 2014"
Proceedings Article•10.1109/DAS.2014.19•
A Context Based Text Summarization System

Rafael Ferreira1, Frederico Luiz Gonçalves de Freitas1, Luciano Cabral2, Rafael Dueire Lins1, Rinaldo Lima1, Gabriel Franca1, Steven J. Simske3, Luciano Favaro3 •
Universidade de Pernambuco1, Rio de Janeiro State University2, Hewlett-Packard3
7 Apr 2014
TL;DR: This paper advocates the thesis that the quality of the summary obtained with combinations of sentence scoring methods depends on the text subject, evaluates that hypothesis in three contexts (news, blogs and articles), and points to which techniques are more effective in each context.
Abstract: Text summarization is the process of creating a shorter version of one or more text documents. Automatic text summarization has become an important way of finding relevant information in large text libraries or on the Internet. Extractive text summarization techniques select entire sentences from documents according to some criteria to form a summary. Sentence scoring is the technique most used for extractive text summarization today. Depending on the context, however, some techniques may yield better results than others. This paper advocates the thesis that the quality of the summary obtained with combinations of sentence scoring methods depends on the text subject. This hypothesis is evaluated using three different contexts: news, blogs and articles. The results obtained show the validity of the hypothesis and point to which techniques are more effective in each of the contexts studied.

70 citations
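
The sentence scoring idea in the abstract above can be illustrated with a minimal sketch. It assumes word frequency as the only scoring method, which is just one of many possible choices and is not claimed to be the combination studied in the paper; the function names, the stop-word list, and the sample text are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it"}

def split_sentences(text):
    """Naively split text into sentences after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    """Return lowercase word tokens with stop words removed."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]

def summarize(text, n_sentences=1):
    """Extract the n highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    # Document-wide word frequencies drive the sentence scores.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n_sentences])]

sample = "Cats chase mice. Cats like cats. Dogs bark."
top_one = summarize(sample, 1)
top_two = summarize(sample, 2)
```

Swapping in a different `score` function (sentence position, title overlap, and so on) is how one would compare scoring methods across contexts in such a setup.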

Proceedings Article•10.1109/DAS.2014.40•
The A2iA Arabic Handwritten Text Recognition System at the Open HaRT2013 Evaluation

Theodore Bluche, Jérôme Louradour, Maxime Knibbe, Bastien Moysset, Mohamed Faouzi BenZeghiba, Christopher Kermorvant 
7 Apr 2014
TL;DR: This paper describes the Arabic handwriting recognition systems proposed by A2iA to the NIST OpenHaRT2013 evaluation, based on an optical model using Long Short-Term Memory recurrent neural networks trained to recognize the different forms of the Arabic characters directly from the image, without explicit feature extraction or segmentation.
Abstract: This paper describes the Arabic handwriting recognition systems proposed by A2iA to the NIST OpenHaRT2013 evaluation. These systems were based on an optical model using Long Short-Term Memory (LSTM) recurrent neural networks, trained to recognize the different forms of the Arabic characters directly from the image, without explicit feature extraction or segmentation. Large vocabulary selection techniques and n-gram language modeling were used to provide full paragraph recognition, without explicit word segmentation. Several recognition systems were also combined with the ROVER combination algorithm. The best system exceeded an 80% recognition rate.

59 citations

Proceedings Article•10.1109/DAS.2014.58•
The Maurdor Project: Improving Automatic Processing of Digital Documents

Sylvie Brunessaux1, Patrick Giroux1, Bruno Grilheres1, Mathieu Manta2, Maylis Bodin, Khalid Choukri, Olivier Galibert, Juliette Kahn •
Airbus Defence and Space1, Direction générale de l'armement2
7 Apr 2014
TL;DR: This paper presents the achievements of an experimental project called Maurdor (Moyens AUtomatisés de Reconnaissance de Documents ecRits - Automatic Processing of Digital Documents) funded by the French DGA that aims at improving processing technologies for handwritten and typewritten documents in French, English and Arabic.
Abstract: This paper presents the achievements of an experimental project called Maurdor (Moyens AUtomatisés de Reconnaissance de Documents ecRits - Automatic Processing of Digital Documents) funded by the French DGA that aims at improving processing technologies for handwritten and typewritten documents in French, English and Arabic. The first part describes the context and objectives of the project. The second part deals with the challenge of creating a realistic corpus of 10,000 annotated documents to support the efficient development and evaluation of processing modules. The third part presents the organisation, metric definition and results of the Maurdor International evaluation campaign. The last part presents the Maurdor demonstrator with a functional and technical perspective.

54 citations

Proceedings Article•10.1109/DAS.2014.46•
A Novel Learning-Free Word Spotting Approach Based on Graph Representation

Peng Wang1, Véronique Eglin2, Christophe Garcia2, Christine Largeron1, Josep Lladós3, Alicia Fornés3 •
Jean Monnet University1, Institut national des sciences Appliquées de Lyon2, Autonomous University of Barcelona3
7 Apr 2014
TL;DR: A novel handwritten word spotting approach based on graph representation that comprises both topological and morphological signatures of handwriting that outperforms the state-of-the-art structural methods.
Abstract: Effective information retrieval on handwritten document images has always been a challenging task. In this paper, we propose a novel handwritten word spotting approach based on graph representation. The presented model comprises both topological and morphological signatures of handwriting. Skeleton-based graphs with the Shape Context labelled vertexes are established for connected components. Each word image is represented as a sequence of graphs. In order to be robust to the handwriting variations, an exhaustive merging process based on DTW alignment result is introduced in the similarity measure between word images. With respect to the computation complexity, an approximate graph edit distance approach using bipartite matching is employed for graph matching. The experiments on the George Washington dataset and the marriage records from the Barcelona Cathedral dataset demonstrate that the proposed approach outperforms the state-of-the-art structural methods.

41 citations
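
The DTW alignment mentioned in the abstract above can be sketched in its textbook form. This is a generic dynamic time warping routine over two sequences, not the paper's method: in the paper the sequence elements are graphs compared with an approximate graph edit distance, whereas here the elements are numbers and the default cost is an absolute difference. The function name and the `cost` parameter are hypothetical.

```python
def dtw_distance(seq_a, seq_b, cost=lambda x, y: abs(x - y)):
    """Classic dynamic time warping distance between two sequences.

    `cost` is a pluggable element-wise dissimilarity; a graph edit
    distance could be passed instead when the elements are graphs.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # d[i][j] = cost of the best alignment of seq_a[:i] with seq_b[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = cost(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of: skip in a, skip in b, or match both.
            d[i][j] = c + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([1, 2, 3], [2, 3, 4])
```

The first call shows why DTW suits handwriting: a sequence that merely repeats an element aligns at zero cost.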

Proceedings Article•10.1109/DASC.2014.6979418•
Architecture and capabilities of a data warehouse for ATM research

Michelle M. Eshow1, Max Lui, Shubha S. Ranjan•
Ames Research Center1
11 Dec 2014
TL;DR: This paper describes the design, implementation, and use of Sherlock, a data warehouse that supports air traffic management research at NASA's Ames Research Center; it has been in development since 2009 and is a crucial piece of the ATM research infrastructure used by Ames and its partners.
Abstract: This paper describes the design, implementation, and use of a data warehouse that supports air traffic management (ATM) research at NASA’s Ames Research Center. The data warehouse, dubbed Sherlock, has been in development since 2009 and is a crucial piece of the ATM research infrastructure used by Ames and its partners. Sherlock comprises several components, including a database, a web-based user interface, and supplementary services for query and visualization. The information stored includes raw data collected from the National Airspace System (NAS), parsed and processed data, derived data, and reports derived from pre-defined queries. The raw data include a variety of flight information from live streams of FAA operational systems, weather observations and forecasts, and NAS advisories and statistics. The modified data comprise parsed and merged data sources and metadata, enabling parameterized searches for data of interest. The derived data represent the results of research analyses deemed to be of significant interest to a wide cross-section of users. Sherlock is implemented on an Oracle 11g database, with supplemental services built on open-source packages and custom software. It contains over 20 TB of data spanning several years, and more data are added daily. It has supported several research studies, such as finding similar days in the NAS and predicting imposition of traffic flow management restrictions. Planned enhancements include integrated search across data sources and the capability for large-scale analytics.

40 citations

Proceedings Article•10.1109/DAS.2014.29•
End-to-End Text Recognition Using Local Ternary Patterns, MSER and Deep Convolutional Nets

Michael Opitz1, Markus Diem1, Stefan Fiel1, Florian Kleber1, Robert Sablatnig1 •
Vienna University of Technology1
7 Apr 2014
TL;DR: The system presented outperforms state of the art methods on the ICDAR 2003 dataset in the text-detection, dictionary-driven cropped-word recognition and dictionary-driven end-to-end recognition tasks.
Abstract: Text recognition in natural scene images is a component of several computer vision applications such as licence plate recognition, automated translation of street signs, help for visually impaired people or image retrieval. In this work an end-to-end text recognition system is presented. For detection, an AdaBoost ensemble with a modified Local Ternary Pattern (LTP) feature set is used, with a post-processing stage built upon Maximally Stable Extremal Regions (MSER). The text recognition is done using a deep Convolutional Neural Network (CNN) trained with backpropagation. The system presented outperforms state of the art methods on the ICDAR 2003 dataset in the text-detection (F-Score: 74.2%), dictionary-driven cropped-word recognition (F-Score: 87.1%) and dictionary-driven end-to-end recognition (F-Score: 72.6%) tasks.

40 citations

Proceedings Article•10.1109/DAS.2014.11•
Combining Focus Measure Operators to Predict OCR Accuracy in Mobile-Captured Document Images

Marçal Rusiñol1, Joseph Chazalon1, Jean-Marc Ogier1•
University of La Rochelle1
7 Apr 2014
TL;DR: This paper presents 24 focus measures, never tested on document images, which are fast to compute and require no training, and shows that a combination of those measures enables state-of-the-art performance regarding the correlation with OCR accuracy.
Abstract: Mobile document image acquisition is a new trend raising serious issues in business document processing workflows. Such a digitization procedure is unreliable, and introduces many distortions which must be detected as soon as possible, on the mobile device, to avoid paying data transmission fees and losing information due to the inability to re-capture later a document with temporary availability. In this context, out-of-focus blur is a major issue: users have no direct control over it, and it seriously degrades OCR recognition. In this paper, we concentrate on the estimation of focus quality, to ensure a sufficient legibility of a document image for OCR processing. We propose two contributions to improve OCR accuracy prediction for mobile-captured document images. First, we present 24 focus measures, never tested on document images, which are fast to compute and require no training. Second, we show that a combination of those measures enables state-of-the-art performance regarding the correlation with OCR accuracy. The resulting approach is fast, robust, and easy to implement in a mobile device. Experiments are performed on a public dataset, and precise details about image processing are given.

38 citations
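
To give a feel for what a training-free focus measure looks like, here is a minimal sketch of one widely used operator, the variance of the Laplacian. The abstract above does not name its 24 measures, so this choice is an assumption and may or may not be among them; the function name and the toy images are hypothetical.

```python
def laplacian_variance(image):
    """Variance of the 4-neighbour Laplacian over the image interior.

    `image` is a 2D list of grayscale values, at least 3x3.
    Sharper images have stronger edges and therefore a higher variance.
    """
    height, width = len(image), len(image[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            lap = (image[y - 1][x] + image[y + 1][x]
                   + image[y][x - 1] + image[y][x + 1]
                   - 4 * image[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

# Toy 4x4 images: a hard vertical edge versus a smooth ramp (blurred edge).
sharp = [[0, 0, 255, 255] for _ in range(4)]
blurred = [[0, 85, 170, 255] for _ in range(4)]
sharp_score = laplacian_variance(sharp)
blurred_score = laplacian_variance(blurred)
```

A measure like this needs no training data, which matches the constraint stated in the abstract; combining several such scores would require a separate fusion step not shown here.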

Proceedings Article•10.1109/DASC.2014.6979499•
Scheduling methods for unmanned aerial vehicle based delivery systems

Hanlin Zhang1, Sixiao Wei1, Wei Yu1, Erik Blasch2, Genshe Chen, Dan Shen, Khanh Pham2 •
Towson University1, Air Force Research Laboratory2
11 Dec 2014
TL;DR: A weight-based scheduling scheme is developed, which considers the priority and delivery distance of service requests and shows that the service delivery delay and the probability of successfully handling UAV service requests can be significantly improved upon in comparison with existing baseline schemes.
Abstract: The recent Federal Aviation Administration (FAA) approval of Unmanned Aerial Vehicle (UAV) testing will transform our daily lives. Because of their inherent flexibility, ease of use, and low cost to operate, UAV-based delivery systems will become tremendously popular in the near future. In this paper, we address the issue of cyber-physical scheduling of UAV resources in order to achieve an efficient delivery service. Particularly, to reduce the overall delivery time, we develop a weight-based scheduling scheme, which considers the priority and delivery distance of service requests. To maximize the probability of effectively handling service requests with a limited number of UAVs, we first formalize the problem as an optimization problem and then use a dynamic programming approach to solve the problem. Through a simulation study, our data shows that the service delivery delay and the probability of successfully handling UAV service requests can be significantly improved upon in comparison with existing baseline schemes.

38 citations
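
A weight-based ordering of delivery requests, as described in the abstract above, might look roughly like the following sketch. The abstract does not give the weight formula, so the linear combination used here (priority rewarded, distance penalised) and the `alpha` and `beta` coefficients are purely hypothetical, as are the `Request` type and the sample requests.

```python
from dataclasses import dataclass

@dataclass
class Request:
    name: str
    priority: float      # larger means more urgent (assumed convention)
    distance_km: float   # delivery distance from the depot

def schedule(requests, alpha=1.0, beta=0.1):
    """Order requests by a hypothetical weight: urgent and nearby first."""
    def weight(request):
        # Assumed formula; the paper's actual weighting is not stated here.
        return alpha * request.priority - beta * request.distance_km
    return sorted(requests, key=weight, reverse=True)

pending = [
    Request("A", priority=1, distance_km=2),
    Request("B", priority=3, distance_km=10),
    Request("C", priority=3, distance_km=5),
]
order = [r.name for r in schedule(pending)]
```

With equal priority, the shorter delivery (C) is served before the longer one (B), which is one plausible way to cut overall delivery time.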

Proceedings Article•10.1109/DAS.2014.79•
A Complete Logo Detection/Recognition System for Document Images

Alireza Alaei1, Mathieu Delalandre1•
François Rabelais University1
7 Apr 2014
TL;DR: A template based recognition approach is proposed to recognize the logo which may be present in every detected logo-patch, which uses a search space reduction technique to decrease the number of template logo-models needed for the recognition of a logo in a detected logo-patch.
Abstract: In this paper, a complete logo detection/recognition system for document images is proposed. In the proposed system, first, a logo detection method is employed to detect a few regions of interest (logo-patches), which likely contain the logo(s), in a document image. The detection method is based on the piece-wise painting algorithm (PPA) and some probability features along with a decision tree. For the logo recognition, a template based recognition approach is proposed to recognize the logo which may be present in every detected logo-patch. The proposed logo recognition strategy uses a search space reduction technique to decrease the number of template logo-models needed for the recognition of a logo in a detected logo-patch. The features used for search space reduction are based on the geometric properties of a detected logo-patch. Based on our experiments on 1290 document images of the Tobacco800 dataset, 99.31% of the logos were detected as logo-patches. Among the detected logo-patches, 97.90% of logos were fairly recognized. Considering both logo detection and recognition results, 97.22% of the logos in the document images could truly be detected/recognized as the overall performance of the proposed system.

37 citations

Proceedings Article•10.1109/DAS.2014.45•
Adapting Tesseract for Complex Scripts: An Example for Urdu Nastalique

Qurat ul Ain Akram1, Sarmad Hussain1, Aneta Niazi1, Umair Anjum1, Faheem Irfan1 •
University of Engineering and Technology, Lahore1
7 Apr 2014
TL;DR: The Tesseract engine is analyzed and modified for the recognition of the Nastalique writing style for the Urdu language, which is a very complex and cursive writing style of Arabic script.
Abstract: The Tesseract engine supports multilingual text recognition. However, the recognition of cursive scripts using Tesseract is a challenging task. In this paper, the Tesseract engine is analyzed and modified for the recognition of the Nastalique writing style for the Urdu language, which is a very complex and cursive writing style of Arabic script. The original Tesseract system achieves accuracies of 65.59% and 65.84% for font sizes 14 and 16 respectively, whereas the modified system, with reduced search space, achieves 97.87% and 97.71% respectively. The efficiency is also improved, from an average of 170 milliseconds (ms) to an average of 84 ms for the recognition of Nastalique document images.

37 citations

Proceedings Article•10.1109/DAS.2014.61•
The RWTH Large Vocabulary Arabic Handwriting Recognition System

Mahdi Hamdani1, Patrick Doetsch1, Michal Kozielski1, Amr El-Desoky Mousa1, Hermann Ney1 •
RWTH Aachen University1
7 Apr 2014
TL;DR: The RWTH system for large vocabulary Arabic handwriting recognition is described; it is based on Hidden Markov Models (HMMs) with state of the art methods for visual/language modeling and decoding, and gave competitive results in previous handwriting recognition competitions.
Abstract: This paper describes the RWTH system for large vocabulary Arabic handwriting recognition. The recognizer is based on Hidden Markov Models (HMMs) with state of the art methods for visual/language modeling and decoding. The feature extraction is based on Recurrent Neural Networks (RNNs) which estimate the posterior distribution over the character labels for each observation. Discriminative training using the Minimum Phone Error (MPE) criterion is used to train the HMMs. The recognition is done with the help of n-gram Language Models (LMs) trained using in-domain text data. Unsupervised writer adaptation is also performed using the Constrained Maximum Likelihood Linear Regression (CMLLR) feature adaptation. The RWTH Arabic handwriting recognition system gave competitive results in previous handwriting recognition competitions. The techniques used improve the performance of the system participating in the OpenHaRT 2013 evaluation.
Proceedings Article•10.1109/DAS.2014.63•
CERMINE -- Automatic Extraction of Metadata and References from Scientific Literature

Dominika Tkaczyk1, Pawel Szostek1, Piotr Jan Dendek1, Mateusz Fedoryszak1, Lukasz Bolikowski1 •
University of Warsaw1
7 Apr 2014
TL;DR: The paper describes the overall workflow architecture of CERMINE, provides details about individual implementations and reports evaluation methodology and results.
Abstract: CERMINE is a comprehensive open source system for extracting metadata and parsed bibliographic references from scientific articles in born-digital form. The system is based on a modular workflow, whose architecture allows for single step training and evaluation, enables effortless modifications and replacements of individual components and simplifies further expansion of the architecture. The implementations of most steps are based on supervised and unsupervised machine-learning techniques, which simplifies the process of adjusting the system to new document layouts. The paper describes the overall workflow architecture, provides details about individual implementations and reports evaluation methodology and results. The CERMINE service is available at http://cermine.ceon.pl.
Proceedings Article•10.1109/DAS.2014.26•
Forgery Detection Based on Intrinsic Document Contents

Amr Gamal Hamed Ahmed1, Faisal Shafait•
German University in Cairo1
7 Apr 2014
TL;DR: The main idea behind the presented approaches is to automatically identify which parts of a document belong to the template and then detect distortions in those parts only, which results in an improvement up to 29% in accuracy of forgery detection.
Abstract: Nowadays, document forgery detection is becoming increasingly important as forgery techniques are becoming available even to untrained users. Hence, documents that do not contain any extrinsic security features (e.g. invoices) have become easier to forge. We previously presented a method to detect manipulated documents based on distortions introduced during the forgery creation process. In this paper, several approaches are explored to improve the accuracy of, and the time taken by, forgery detection based on document distortions. The main idea behind the presented approaches is to automatically identify which parts of a document belong to the template (and hence would remain static across different documents originating from the same source) and then detect distortions in those parts only. An improvement of up to 29% in accuracy of forgery detection is observed compared to our previous work. Furthermore, we also present an approximation of the original method that results in a reduction in run time of the method by several orders of magnitude, while having only a marginal reduction in its accuracy.
Proceedings Article•10.1109/DAS.2014.44•
Improving Classification of an Industrial Document Image Database by Combining Visual and Textual Features

Olivier Augereau, Nicholas Journet, Anne Vialard, Jean-Philippe Domenger
7 Apr 2014
TL;DR: A new method for classifying document images by combining textual features extracted with the Bag of Words (BoW) technique and visual features extracted with the Bag of Visual Words (BoVW) technique, which significantly improves the classification performance.
Abstract: The main contribution of this paper is a new method for classifying document images by combining textual features extracted with the Bag of Words (BoW) technique and visual features extracted with the Bag of Visual Words (BoVW) technique. The BoVW is widely used within the computer vision community for scene classification or object recognition, but few applications for the classification of entire document images have been submitted. While previous attempts at combining visual and textual features with the Borda-count technique have shown disappointing results, we propose here a combination-through-learning approach. Experiments conducted on an industrial database of 1925 document images reveal that this fusion scheme significantly improves the classification performance. Our concluding contribution deals with the choice and tuning of the BoW and/or BoVW techniques in an industrial context.
Proceedings Article•10.1109/DAS.2014.39•
A Typed and Handwritten Text Block Segmentation System for Heterogeneous and Complex Documents

Panos Barlas, Sébastien Adam, Clément Chatelain, Thierry Paquet
7 Apr 2014
TL;DR: A Document Image Analysis system able to extract homogeneous typed and handwritten text regions from complex layout documents of various types, based on two connected component classification stages that successively discriminate text/non-text and typed/handwritten shapes.
Abstract: This paper presents a Document Image Analysis (DIA) system able to extract homogeneous typed and handwritten text regions from complex layout documents of various types. The method is based on two connected component classification stages that successively discriminate text/non-text and typed/handwritten shapes, followed by an original block segmentation method based on white rectangle detection. We present the results obtained by the system during the first competition round of the MAURDOR campaign.
Proceedings Article•10.1109/DAS.2014.52•
A System for Recognizing Online Handwritten Mathematical Expressions and Improvement of Structure Analysis

Anh Duc Le1, Truyen Van Phan2, Masaki Nakagawa1•
Tokyo University of Agriculture and Technology1, University of Tokyo2
7 Apr 2014
TL;DR: A method to learn structural relations from training patterns without any heuristic decisions by using two SVM models is proposed and stroke order is employed to reduce the complexity of the parsing algorithm.
Abstract: This paper presents a system for recognizing online handwritten mathematical expressions (MEs) and improvement of structure analysis. We represent MEs in Context Free Grammars (CFGs) and employ the Cocke-Younger-Kasami (CYK) algorithm to parse the 2D structure of on-line handwritten MEs and select the best interpretation in terms of symbol segmentation, recognition and structure analysis. We propose a method to learn structural relations from training patterns without any heuristic decisions by using two SVM models. We employ stroke order to reduce the complexity of the parsing algorithm. Moreover, we revise structure analysis. Even though CFG does not resolve ambiguities in some cases, our method still gives users a list of candidates that contains the expected result. We evaluate our method on the CROHME 2013 database and demonstrate the improvement of our system in recognition rate as well as processing time.
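
For readers unfamiliar with the CYK algorithm named in the abstract above, the following is a minimal sketch of its textbook one-dimensional form for a grammar in Chomsky normal form. It is not the paper's 2D parser, which additionally handles spatial relations and stroke order; the function name, the rule encoding, and the toy grammar for sums such as `x + y` are hypothetical.

```python
def cyk_parse(tokens, terminal_rules, binary_rules, start="S"):
    """Return True if `tokens` can be derived from `start`.

    terminal_rules: dict token -> set of nonterminals A with A -> token
    binary_rules:   dict (B, C) -> set of nonterminals A with A -> B C
    """
    n = len(tokens)
    if n == 0:
        return False
    # table[i][k] holds the nonterminals deriving tokens[i : i + k + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, token in enumerate(tokens):
        table[i][0] = set(terminal_rules.get(token, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for b in left:
                    for c in right:
                        table[i][length - 1] |= binary_rules.get((b, c), set())
    return start in table[0][n - 1]

# Toy grammar (assumed): S -> E R | x | y,  E -> E R | x | y,  R -> P E,  P -> +
terminals = {"x": {"S", "E"}, "y": {"S", "E"}, "+": {"P"}}
binaries = {("E", "R"): {"S", "E"}, ("P", "E"): {"R"}}
valid = cyk_parse(["x", "+", "y", "+", "x"], terminals, binaries)
invalid = cyk_parse(["x", "+"], terminals, binaries)
```

The cubic cost of the three nested span loops is what motivates complexity reductions such as the stroke-order constraint mentioned in the abstract.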
Proceedings Article•10.1109/DASC.2014.6979495•
Antenna and frequency diversity in the unmanned aircraft systems bands for the over-sea setting

David W. Matolak1, Ruoyu Sun1•
University of South Carolina1
11 Dec 2014
TL;DR: In this article, the authors report on measurements conducted as part of a project sponsored by NASA, which collected simultaneous wideband channel characteristics (impulse responses) in both L-band and C-band.
Abstract: The use of unmanned aircraft systems (UAS) is expected to grow rapidly in the next decade, and because of this, there are many UAS research, development, testing, and standardization efforts underway. A key concern in all this work is safety, and this has direct implications for the control and non-payload communication (CNPC) systems that must be used to operate UAS in the national airspace. Another key concern is sufficient (and protected) spectrum for UAS, and at present there are two bands in the United States, and potentially internationally: the L-band (960-977 MHz) and C-band (5030-5091 MHz). In any wireless system, the wireless channel can severely impair performance because of dispersion and time variation. Thus in order to design highly reliable CNPC systems, a thorough quantitative knowledge of the air-ground (AG) channel is required. Historically, AG channel research addressed simple cases and short duration, narrowband signals. Yet for UAS that may operate in more complex settings (e.g., low elevation angles, near ground clutter, etc.) and use wider bandwidth CNPC signals, more accurate representations of the AG channel are required. This paper addresses this topic via the quantification of channel characteristics for one of the simplest AG channels: the over-sea setting. We report on measurements conducted as part of a project sponsored by NASA. These measurements in the over-sea environment collected simultaneous wideband channel characteristics (impulse responses) in both L-band and C-band. This was done with a ground based transmitter, and two separate antennas in each band on the aircraft. Thus these measurements allow us to assess both spatial and frequency diversity for both straight and curved flight trajectories. We briefly describe the project and the measurements, and provide a short overview of our recently-published results for propagation path loss, delay spread, and correlations across antennas. 
We also quantify small-scale fading effects and the diversity attainable across both frequencies and antennas. We conclude with a short description of the statistical models being developed for the various AG channel settings.
Proceedings Article•10.1109/DAS.2014.76•
Evaluation of Texture Features for Offline Arabic Writer Identification

Chawki Djeddi1, Labiba-Souici Meslati, Imran Siddiqi2, Abdellatif Ennaji, Haikal El Abed3, Abdeljalil Gattal2 •
University of Annaba1, Bahria University2, Braunschweig University of Technology3
7 Apr 2014
TL;DR: A handwriting-based biometric identification system using a large database of Arabic handwritten documents, in which a set of features including run lengths, edge-hinge and edge-direction features is used by a Multiclass SVM (Support Vector Machine) classifier.
Abstract: Biometric identification of persons has mainly been based on fingerprints, face, iris and other similar attributes. We propose a handwriting-based biometric identification system using a large database of Arabic handwritten documents. The system first extracts, from each handwritten sample, a set of features including run lengths, edge-hinge and edge-direction features. These features are used by a Multiclass SVM (Support Vector Machine) classifier. Experiments are conducted on a new large database of Arabic handwritings contributed by 1000 writers. The highest identification rate achieved by the combination of run-length and edge-hinge features stands at 84.10%.
Proceedings Article•10.1109/DAS.2014.43•
NIST 2013 Open Handwriting Recognition and Translation (Open HaRT'13) Evaluation

Audrey Tong, Mark Przybocki, Volker Märgner1, Haikal El Abed1•
Braunschweig University of Technology1
7 Apr 2014
TL;DR: The test designs pertaining to the tasks, the data used, the performance measurements, and the protocols are presented, followed by the evaluation results and some preliminary analyses.
Abstract: This paper describes the NIST 2013 Open Handwriting Recognition and Translation evaluation (OpenHaRT'13). A short background leading to the start of OpenHaRT is included. The test designs pertaining to the tasks, the data used, the performance measurements, and the protocols are presented. The participants and their submissions are mentioned followed by the evaluation results and some preliminary analyses. The paper concludes with some thoughts toward future evaluations.
Proceedings Article•10.1109/DAS.2014.24•
Over-Generative Finite State Transducer N-Gram for Out-of-Vocabulary Word Recognition

Ronaldo Messina, Christopher Kermorvant
7 Apr 2014
TL;DR: A modification of a finite-state-transducer (fst) n-gram that enables the creation of a static transducer, i.e. when it is not possible to perform on-demand composition, is presented and it is shown that this model is competitive with state-of-the-art solutions.
Abstract: Hybrid statistical grammars both at word and character levels can be used to perform open-vocabulary recognition. This is usually done by allowing the special symbol for unknown-word in the word-level grammar and dynamically replacing it by a (long) n-gram at character level, as the full transducer does not fit in the memory of most current computers. We present a modification of a finite-state-transducer (fst) n-gram that enables the creation of a static transducer, i.e. when it is not possible to perform on-demand composition. By combining paths in the "LG" transducer (composition of lexicon and n-gram), making it over-generative with respect to the n-grams observed in the corpus, it is possible to reduce the number of actual occurrences of the character-level grammar, so that the resulting transducer fits in the memory of practical machines. We evaluate this model for handwriting recognition using the RIMES and the IAM databases. We study its effect on the vocabulary size and show that this model is competitive with state-of-the-art solutions.
Proceedings Article•10.1109/DAS.2014.51•
A Combined System for Text Line Extraction and Handwriting Recognition in Historical Documents

Andreas Fischer, Micheal Baechler1, Angelika Garz1, Marcus Liwicki1, Rolf Ingold1 •
University of Fribourg1
7 Apr 2014
TL;DR: A combined system for text localization and transcription in page images is presented that includes flexible learning-based methods for layout analysis and handwriting recognition, which were developed in the context of the Swiss research project HisDoc.
Abstract: Automated reading of historical handwriting is needed to search and browse ancient manuscripts in digital libraries based on their textual content. In this paper, we present a combined system for text localization and transcription in page images. It includes flexible learning-based methods for layout analysis and handwriting recognition, which were developed in the context of the Swiss research project HisDoc. A comprehensive experimental evaluation is provided for the medieval Parzival database, demonstrating a promising word recognition accuracy of 93.0% with closed vocabulary. In order to harmonize the evaluation of the two document analysis tasks, we introduce a novel evaluation measure for text line extraction that takes substitution, deletion, as well as insertion errors into account.
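
An accuracy figure that accounts for substitution, deletion, and insertion errors, as mentioned in the abstract above, is conventionally computed from an edit distance. The sketch below shows that standard word-level computation; it is not the paper's novel measure for text line extraction, whose definition is not given here, and the function names and sample transcriptions are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Minimum number of substitutions, deletions and insertions
    needed to turn `reference` into `hypothesis` (Levenshtein)."""
    n, m = len(reference), len(hypothesis)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i          # delete all i reference items
    for j in range(m + 1):
        d[0][j] = j          # insert all j hypothesis items
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = d[i - 1][j - 1] + (reference[i - 1] != hypothesis[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[n][m]

def accuracy(reference, hypothesis):
    """1 - (S + D + I) / N, where N is the reference length (N > 0)."""
    return 1.0 - edit_distance(reference, hypothesis) / len(reference)

truth = "the quick brown fox".split()
output = "the quick fox jumps".split()
errors = edit_distance(truth, output)
word_accuracy = accuracy(truth, output)
```

Because insertions are counted, this accuracy can drop below zero when the hypothesis is much longer than the reference, which is a known property of the measure rather than a bug.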
Proceedings Article•10.1109/DAS.2014.74•
Towards a Robust OCR System for Indic Scripts

Praveen Krishnan1, Naveen Sankaran1, Ajeet Kumar Singh1, C. V. Jawahar1•
International Institute of Information Technology, Hyderabad1
7 Apr 2014
TL;DR: A web based OCR system is proposed which follows a unified architecture for seven Indian languages, is robust against popular degradations, follows a segmentation free approach, addresses the UNICODE re-ordering issues, and can enable continuous learning with user inputs and feedback.
Abstract: The current Optical Character Recognition (OCR) systems for Indic scripts are not robust enough for recognizing arbitrary collections of printed documents. Reasons for this limitation include the lack of resources (e.g. not enough examples with natural variations, lack of documentation available about the possible font/style variations) and the architecture, which necessitates hard segmentation of word images followed by an isolated symbol recognition. Variations among scripts, latent symbol to UNICODE conversion rules, non-standard fonts/styles and large degradations are some of the major reasons for the unavailability of robust solutions. In this paper, we propose a web based OCR system which (i) follows a unified architecture for seven Indian languages, (ii) is robust against popular degradations, (iii) follows a segmentation free approach, (iv) addresses the UNICODE re-ordering issues, and (v) can enable continuous learning with user inputs and feedback. Our system is designed to aid the continuous learning while being usable, i.e., we capture the user inputs (say example images) for further improving the OCRs. We use the popular BLSTM based transcription scheme to achieve our target. This also enables incremental training and refinement in a seamless manner. We report superior accuracy rates in comparison with the available OCRs for the seven Indian languages.
Proceedings Article•10.1109/DASC.2014.6979419•
Recent IEEE 802 developments and their relevance for the avionics industry

[...]

Wilfried Steiner, Peter Heise1, Stefan Schneele1•
Airbus Group1
11 Dec 2014
TL;DR: The need for high data-rates enabled the adoption of Ethernet in avionics systems and, indeed, today Ethernet according to the ARINC 664 part 7 standard is used in prominent airplanes such as the Airbus A380 or the Boeing 787 as well as several other aerospace programs.
Abstract: The need for high data-rates enabled the adoption of Ethernet in avionics systems and, indeed, today Ethernet according to the ARINC 664 part 7 standard is used in prominent airplanes such as the Airbus A380 or the Boeing 787 as well as several other aerospace programs. Meanwhile, Ethernet continues to evolve also in its native standardization body, the IEEE. In particular, Ethernet grows in two directions, speed and services, and we consider two specific developments to be of particular interest to the avionics industry. First, the 1000BASE-T1 PHY Task Force (IEEE P802.3bp) and the IEEE 100BASE-T1 Task Force (IEEE 802.3bw) are standardizing Ethernet PHYs considering usage in harsh environments like industrial or automotive at Gbit/sec and 100 Mbit/sec transmission speeds. Secondly, the time-sensitive networking task group (IEEE 802.1 TSN) is currently standardizing basic forms of time-triggered communication to minimize the transmission latency and jitter of Ethernet. In this paper we give an overview of the current developments in IEEE 802.3bp, IEEE 802.3bw and IEEE 802.1 TSN and formulate a perspective on the future use of Ethernet in avionics systems.
Proceedings Article•10.1109/DAS.2014.42•
Newspaper Article Extraction Using Hierarchical Fixed Point Model

[...]

Anukriti Bansal1, Santanu Chaudhury1, Sumantra Dutta Roy1, J.B. Srivastava1•
Indian Institute of Technology Delhi1
7 Apr 2014
TL;DR: A novel learning based framework to extract articles from newspaper images using a Fixed-Point Model that uses contextual information and features of each block to learn the layout of newspaper images and attains a contraction mapping to assign a unique label to every block.
Abstract: This paper presents a novel learning based framework to extract articles from newspaper images using a Fixed-Point Model. The input to the system comprises blocks of text and graphics, obtained using standard image processing techniques. The fixed point model uses contextual information and features of each block to learn the layout of newspaper images and attains a contraction mapping to assign a unique label to every block. We use a hierarchical model which works in two stages. In the first stage, a semantic label (heading, sub-heading, text-blocks, image and caption) is assigned to each segmented block. The labels are then used as input to the next stage to group the related blocks into news articles. Experimental results show the applicability of our algorithm in newspaper labeling and article extraction.
Proceedings Article•10.1109/DAS.2014.38•
Curriculum Learning for Handwritten Text Line Recognition

[...]

Jérôme Louradour, Christopher Kermorvant
7 Apr 2014
TL;DR: This article proposed to first learn to recognize short sequences before training on all available training sequences, which can significantly speed up the training of RNN for text recognition, and even significantly improve performance in some cases.
Abstract: Recurrent Neural Networks (RNN) have recently achieved the best performance in off-line Handwriting Text Recognition. At the same time, learning RNN by gradient descent leads to slow convergence, and training times are particularly long when the training database consists of full lines of text. In this paper, we propose an easy way to accelerate stochastic gradient descent in this set-up, and in the general context of learning to recognize sequences. The principle is called Curriculum Learning, or shaping. The idea is to first learn to recognize short sequences before training on all available training sequences. Experiments on three different handwritten text databases (Rimes, IAM, OpenHaRT) show that a simple implementation of this strategy can significantly speed up the training of RNN for Text Recognition, and even significantly improve performance in some cases.
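The idea of training on short sequences first can be sketched as a batch generator. This is a minimal illustration assuming a hard two-phase schedule (a warm-up on short samples, then all samples); the abstract does not state the paper's actual schedule, and `curriculum_batches` and all of its parameters are hypothetical names.

```python
import random


def curriculum_batches(samples, num_epochs, warmup_epochs,
                       max_short_len, batch_size, seed=0):
    """Yield (epoch, batch) pairs for (sequence, label) samples.

    During the first warmup_epochs epochs only sequences of length
    <= max_short_len are used (assumed schedule); afterwards every
    sample is used. Falls back to all samples if none are short.
    """
    rng = random.Random(seed)
    short = [s for s in samples if len(s[0]) <= max_short_len]
    for epoch in range(num_epochs):
        in_warmup = epoch < warmup_epochs and short
        pool = list(short if in_warmup else samples)
        rng.shuffle(pool)  # stochastic gradient descent needs shuffled data
        for i in range(0, len(pool), batch_size):
            yield epoch, pool[i:i + batch_size]
```

Each yielded batch would be passed to whatever training step the recognizer uses; the generator itself is independent of the model.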
Proceedings Article•10.1109/DAS.2014.33•
Historical Chinese Character Recognition Method Based on Style Transfer Mapping

[...]

Bohan Li1, Liangrui Peng1, Jingning Ji1•
Tsinghua University1
7 Apr 2014
TL;DR: Experimental results showed that supervised STM may improve the generalization ability of the classifier.
Abstract: Historical Chinese character recognition has been a challenging topic in the pattern recognition field because of the large character set, varied writing styles and lack of training samples. In this paper, we applied the Style Transfer Mapping (STM) method to historical Chinese character recognition. Optimal selection of parameters was discussed. Two sets of experiments were conducted. The first set of experiments was designed to test the performance of STM on different font styles by using available printed traditional Chinese characters. The second set of experiments was carried out on samples extracted from practical historical Chinese documents. Experimental results showed that supervised STM may improve the generalization ability of the classifier.
Proceedings Article•10.1109/DAS.2014.20•
Separation of Graphics (Superimposed) and Scene Text in Video Frames

[...]

Palaiahnakote Shivakumara1, N.V. Kumar2, Devanur S. Guru2, Chew Lim Tan3•
University of Malaya1, University of Mysore2, National University of Singapore3
7 Apr 2014
TL;DR: A novel method that uses the Ring Radius Transform to identify the radius representing the medial axis in Canny and Sobel edge images; the resulting distribution is Gaussian for graphics and non-Gaussian for scene text, which allows the two to be separated to achieve a good recognition rate.
Abstract: The presence of both graphics and scene text in video frames makes the text detection and recognition problem more challenging because the nature of the two texts differs significantly. This paper aims to propose a novel method for separation of graphics and scene text to achieve a good recognition rate, based on the fact that Canny and Sobel edge patterns share a common property for text. We propose to use the Ring Radius Transform to identify the radius that represents the medial axis in the edge image. We study the intra relationship between bins of the histograms over respective radius values, resulting in intra line graphs. In this way, the method finds intra line graphs for both Canny and Sobel edge images of the input text lines. To identify the unique distribution for separation of graphics and scene texts, we explore the inter relationship between intra line graphs of Canny and Sobel edge images with respective medial axis values. This results in a Gaussian distribution for graphics and a non-Gaussian one for scene text. Experimental results on horizontal, non-horizontal, different scripts etc. show that the proposed method is effective for classification, and the results of baseline recognition methods show that the recognition rate is significantly improved after classification.
Proceedings Article•10.1109/DAS.2014.41•
A Two Level Algorithm for Text Detection in Natural Scene Images

[...]

Li Rong, Wang Suyu, Zhixin Shi
7 Apr 2014
TL;DR: A two-level method to detect text in natural scene images that uses not only the features of a single CC but also the information of whether a CC is in a group to make the final decision of whether the CC is text or non-text.
Abstract: In this paper we present a two-level method to detect text in natural scene images. In the first level, connected components (referred to as CCs) are obtained from the images. Then candidate text lines are extracted and groups of connected components that align in the horizontal or vertical direction are obtained. We assume that CCs in these groups have a high probability of being text. To validate which CCs are text, an SVM is trained to make an initial decision. The output of the SVM is calibrated to a posterior probability. Then we use the posterior probability of the SVM and the information of whether the connected component is in a group to divide the connected components into four classes: texts, non-texts, probable texts and undetermined CCs. In the second level, a conditional random field model is used to make the final decision. The relationship between CCs is modeled by a network G(V, E), where the vertices of the graph correspond to CCs. The determination in the first level influences the second level's determination by giving different parameters of the data term for the four classes of CCs. In this way, we use not only the features of a single CC but also the information of whether a CC is in a group to make the final decision of whether the CC is text or non-text. Experiments show that the method is effective.
Proceedings Article•10.1109/DASC.2014.6979400•
Optimizing integrated terminal airspace operations under uncertainty

[...]

Christabelle S. Bosson1, Min Xue1, Shannon Zelinski2•
University of California, Santa Cruz1, Ames Research Center2
11 Dec 2014
TL;DR: This paper presents an alternate method using a machine jobshop scheduling formulation to model the integrated airspace operations using a multistage stochastic programming approach to solve sample average approximation problems with finite sample size.
Abstract: In the terminal airspace, integrated departures and arrivals have the potential to increase operations efficiency. Recent research has developed genetic-algorithm-based schedulers for integrated arrival and departure operations under uncertainty. This paper presents an alternate method using a machine jobshop scheduling formulation to model the integrated airspace operations. A multistage stochastic programming approach is chosen to formulate the problem and candidate solutions are obtained by solving sample average approximation problems with finite sample size. Because approximate solutions are computed, the proposed algorithm incorporates the computation of statistical bounds to estimate the optimality of the candidate solutions. A proof-of-concept study is conducted on a baseline implementation of a simple problem considering a fleet mix of 14 aircraft evolving in a model of the Los Angeles terminal airspace. A more thorough statistical analysis is also performed to evaluate the impact of the number of scenarios considered in the sampled problem. To handle extensive sampling computations, a multithreading technique is introduced.
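Sample average approximation replaces an expected cost with its average over a finite set of sampled scenarios and optimizes that average instead. The sketch below shows the idea for a single-stage decision chosen from a finite candidate list, which is a deliberate simplification of the multistage scheduling problem described above; `saa_solve` and its arguments are hypothetical names, and the statistical bounds mentioned in the abstract are not computed here.

```python
def saa_solve(candidates, scenarios, cost):
    """Return (best_candidate, average_cost) for the candidate that
    minimises the mean of cost(x, s) over the sampled scenarios."""
    if not scenarios:
        raise ValueError("at least one scenario is required")

    def average_cost(x):
        return sum(cost(x, s) for s in scenarios) / len(scenarios)

    best = min(candidates, key=average_cost)
    return best, average_cost(best)
```

In practice the scenarios would be drawn from a model of the uncertainty (for instance, sampled delays), and a larger sample generally gives an average closer to the true expectation at a higher computational cost.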
Proceedings Article•10.1109/DAS.2014.70•
Color Descriptor for Content-Based Drawing Retrieval

[...]

Christophe Rigaud, Dimosthenis Karatzas1, Jean-Christophe Burie, Jean-Marc Ogier•
Autonomous University of Barcelona1
7 Apr 2014
TL;DR: A color-based approach for comics character retrieval using content-based drawing retrieval and color palette is presented, which is an essential step towards a fully automatic comic book understanding.
Abstract: Human detection is an active field of research in computer vision. Extending this to human-like drawings such as the main characters in comic book stories is not trivial. Comics analysis is a very recent field of research at the intersection of graphics, text, object and people recognition. The detection of the main comic characters is an essential step towards fully automatic comic book understanding. This paper presents a color-based approach for comics character retrieval using content-based drawing retrieval and color palette.
...
