Top 38 papers published on the topic of Automatic indexing in 2002

Showing papers on "Automatic indexing" published in 2002

Journal Article•10.1109/TMM.2002.802024•

Automatic detection and indexing of video-event shots for surveillance applications

Gian Luca Foresti¹, Lucio Marcenaro², Carlo S. Regazzoni²•Institutions (2)

University of Udine¹, University of Genoa²

01 Dec 2002-IEEE Transactions on Multimedia

TL;DR: A novel approach is presented that allows layered content-based retrieval of video-event shots referring to potentially dangerous situations; abandoned objects and predefined human events are considered.

Abstract: Increased communication capabilities and automatic scene understanding allow human operators to simultaneously monitor multiple environments. Due to the amount of data to be processed in new surveillance systems, the human operator must be helped by automatic processing tools in the work of inspecting video sequences. In this paper, a novel approach allowing layered content-based retrieval of video-event shots referring to potentially interesting situations is presented. Interpretation of events is used for defining new video-event shot detection and indexing criteria. Interesting events refer to potentially dangerous situations: abandoned objects and predefined human events are considered in this paper. Video-event shot detection and indexing capabilities are used for online and offline content-based retrieval of scenes to be detected.

67 citations

Journal Article•10.1016/S0306-4573(01)00039-5•

Indexing aids at corporate websites: the use of robots.txt and META Tags

M. Carl Drott¹•Institutions (1)

Drexel University¹

01 Mar 2002-Information Processing and Management

TL;DR: An increase in the use of indexing aids, especially Meta tags, represents one way in which web robots could index sites more quickly and thus improve overall index coverage of the web.

Abstract: Sixty corporate websites selected from the Fortune Global 500 companies were examined in 2000 and again in 2001 to see if they provided support for automatic indexing. In particular, use of the robots.txt and Meta tags for "keywords" and "description" was examined. Slightly fewer than half of the sites provided one or both of these aids. Among sites providing indexing aids there was a clear under-representation of Asian sites. Nearly 80% of the sites used Java, suggesting a reasonable level of technical sophistication among website creators. About one-third of the sites used cookies, raising the possibility that repeat visitors might find the navigation of the site customized to their needs. Overall an increase in the use of indexing aids, especially Meta tags, represents one way in which web robots could index sites more quickly and thus improve overall index coverage of the web.
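As a rough illustration of the two indexing aids examined in this study, the sketch below checks a robots.txt body and an HTML page for them. The sample inputs and the names `MetaTagCollector` and `indexing_aids` are hypothetical and not taken from the paper; only Python standard-library calls are used, and nothing is fetched over the network.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class MetaTagCollector(HTMLParser):
    """Collects the content of <meta name="keywords"> and <meta name="description">."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None (e.g. a bare attribute), so normalise them.
        attr_map = {key: (value or "") for key, value in attrs}
        name = attr_map.get("name", "").lower()
        if name in ("keywords", "description"):
            self.found[name] = attr_map.get("content", "")


def indexing_aids(robots_txt, html, path="/", agent="*"):
    """Report which indexing aids are present (hypothetical helper)."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())  # parse text that was already downloaded
    collector = MetaTagCollector()
    collector.feed(html)
    return {
        "has_robots_txt": bool(robots_txt.strip()),
        "may_fetch": robots.can_fetch(agent, path),
        "meta": collector.found,
    }


if __name__ == "__main__":
    sample_robots = "User-agent: *\nDisallow: /private/\n"
    sample_html = (
        '<html><head><meta name="keywords" content="steel, mining">'
        '<meta name="description" content="Corporate site"></head></html>'
    )
    print(indexing_aids(sample_robots, sample_html, path="/private/report.html"))
```

A site counts as "providing an aid" in this sketch if either the robots.txt body is non-empty or one of the two Meta tags is found; the study's own counting rules may differ.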

39 citations

Proceedings Article•10.1109/ICME.2002.1035400•

Compressed domain object tracking for automatic indexing of objects in MPEG home video

Radhakrishna Achanta¹, Mohan S. Kankanhalli¹, Philippe Mulhem•Institutions (1)

National University of Singapore¹

7 Nov 2002

TL;DR: This work presents an object tracker that operates directly on MPEG compressed data, which offers speed, simplicity and robustness against occlusion and camera motion, with good intra-shot tracking for shots in excess of 500 frames, as shown in the experimental results.

Abstract: Object tracking is of utmost importance for automatic indexing of video content. This work presents an object tracker that operates directly on MPEG compressed data. Motion vectors and discrete cosine transform (DCT) coefficients directly available from the compressed video stream are exploited for the purpose of tracking. Tracking proceeds in two steps: motion vector based tracking in P and B frames within the groups of pictures (GOPs), and object identification in I frames. Colour, which is one of the strongest cues for tracking, is used for the identification step. Such a system offers speed, simplicity and robustness against occlusion and camera motion, with good intra-shot tracking for shots in excess of 500 frames, as shown in the experimental results.

30 citations

Proceedings Article•10.1109/ICASSP.2002.5745038•

A probabilistic layered framework for integrating multimedia content and context information

Radu S. Jasinschi¹, Nevenka Dimitrova¹, Thomas Mcgee¹, Lalitha Agnihotri¹, John Zimmerman¹, Dongge Li¹, J. Louie¹•Institutions (1)

Philips¹

13 May 2002

TL;DR: A probabilistic framework that combines (a) Bayesian networks that describe both content and context and (b) hierarchical priors that describe the integration of content and context is introduced.

Abstract: Automatic indexing of large collections of multimedia data is important for enabling retrieval functions. Current approaches mostly draw on a single or dual modality of video content analysis. Here we describe a framework for the integration of multimedia content and context information, which generalizes and systematizes current methods. Content information in the visual, audio, and text domains, is described at different levels of granularity and abstraction. Context describes the underlying structural information that can be used to constrain the possible number of interpretations. We introduce a probabilistic framework that combines (a) Bayesian networks that describe both content and context and (b) hierarchical priors that describe the integration of content and context. We present an application that uses this framework to segment and index TV programs. We discuss experimental results on segment classification on six and a half hours of broadcast video. In our experiments we used audio context information. Classification results for financial segments yield 83% and for celebrity segments 89%.

28 citations

Book Chapter•10.1007/3-540-45747-X_46•

Using Human Language Technology for Automatic Annotation and Indexing of Digital Library Content

Kalina Bontcheva¹, Diana Maynard¹, Hamish Cunningham¹, Horacio Saggion¹•Institutions (1)

University of Sheffield¹

16 Sep 2002

TL;DR: This paper shows how robust human language technology, such as the domain-independent and customisable named entity recogniser, is used for automatic content annotation and indexing in two digital library applications.

Abstract: In this paper we show how we used robust human language technology, such as our domain-independent and customisable named entity recogniser, for automatic content annotation and indexing in two digital library applications. Each of these applications posed a unique challenge: one required adapting the language processing components to the non-standard written conventions of 18th century English, while the other presented the challenge of processing material in multiple modalities. This reusable technology could also form the basis for the creation of computational tools for the study of cultural heritage languages, such as Ancient Greek and Latin.

27 citations

Journal Article•10.1016/S0306-4573(01)00049-8•

A feature mining based approach for the classification of text documents into disjoint classes

Salvador Nieto Sanchez¹, Evangelos Triantaphyllou¹, Donald H. Kraft¹•Institutions (1)

Louisiana State University¹

12 Jul 2002-Information Processing and Management

TL;DR: A guided strategy for the OCAT-based approach is presented for deciding which document to consider next while building the training example sets; experiments indicate that the OCAT-based approach has many advantages over the VSM approach for this type of text document classification problem.

Abstract: This paper proposes a new approach for classifying text documents into two disjoint classes. The new approach is based on extracting patterns, in the form of two logical expressions, which are defined on various features (indexing terms) of the documents. The pattern extraction is aimed at providing descriptions (in the form of two logical expressions) of the two classes of positive and negative examples. This is achieved by means of a data mining approach, called One Clause At a Time (OCAT), which is based on mathematical logic. The application of a logic-based approach to text document classification is critical when one wishes to be able to justify why a particular document has been assigned to one class versus the other class. This situation occurs, for instance, in declassifying documents that have been previously considered important to national security and thus are currently being kept as secret. Some computational experiments have investigated the effectiveness of the OCAT-based approach and compared it to the well-known vector space model (VSM). These tests also have investigated finding the best indexing terms that could be used in making these classification decisions. The results of these computational experiments on a sample of 2897 text documents from the TIPSTER collection indicate that the first approach has many advantages over the VSM approach for solving this type of text document classification problem. Moreover, a guided strategy for the OCAT-based approach is presented for deciding which document one needs to consider next while building the training example sets.

26 citations

Journal Article•10.1016/S0306-4573(01)00050-4•

In vitro evaluation of a program for machine-aided indexing

Christian Jacquemin¹, Béatrice Daille², Jean Royanté¹, Xavier Polanco¹•Institutions (2)

Centre national de la recherche scientifique¹, University of Nantes²

01 Nov 2002-Information Processing and Management

TL;DR: The human evaluation of ILIAD, a program for machine-aided indexing (MAI), which consists of two language engineering modules and is designed to assist expert librarians in computer-aided indexing and document analysis, is presented.

Abstract: This article presents the human evaluation of ILIAD, a program for machine-aided indexing (MAI). It consists of two language engineering modules and is designed to assist expert librarians in computer-aided indexing and document analysis. Our aim is the expert evaluation of automatic multi-word term indexing. Evaluation is performed by documentary engineers. Cataloging and indexing are their principal tasks. They also have a good scientific knowledge of the domain to which the indexed documents belong. We first present the ILIAD program and the two systems submitted to this evaluation, the methodology (protocol) adopted, the differences between the protocol and the implementation, and the results of these evaluations. Human evaluation is divided into three parts: firstly the evaluation of controlled indexing, then free indexing and finally term variant extraction performed during controlled indexing. Finally, we analyze the relevance of this evaluation by calculating the agreement frequency and the Kappa coefficient and propose some future developments.
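The abstract mentions agreement frequency and the Kappa coefficient without giving formulas. As a minimal sketch, assuming the common two-judge form (Cohen's kappa), agreement is corrected for chance as kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement rate and p_e the agreement expected from each judge's label frequencies. The function name and the sample judgements are hypothetical, and the paper may have used a different variant.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two judges who labelled the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both judges must rate the same non-empty set of items")
    n = len(ratings_a)
    # Observed agreement: the plain agreement frequency.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the two judges' marginal label frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both judges used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    judge_1 = ["relevant", "relevant", "irrelevant", "irrelevant"]
    judge_2 = ["relevant", "irrelevant", "irrelevant", "irrelevant"]
    print(cohen_kappa(judge_1, judge_2))
```

Reporting kappa alongside the raw agreement frequency shows how much of the agreement exceeds what label frequencies alone would produce.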

23 citations

Patent•

Method and large syntactical analysis system of a corpus, a specialised corpus in particular

Didier Bourigault, Cécile Fabre

28 May 2002

TL;DR: This patent proposes a method for large syntactical analysis based on unsupervised learning on a corpus, comprising an iterative sequencing of two phases: a learning phase wherein linguistic information is acquired using unambiguous analysis cases, and a resolution phase wherein ambiguous analysis cases are resolved using information acquired during the learning phase.

Abstract: The invention relates to a method for large syntactical analysis based on unsupervised learning on a corpus comprising an iterative sequencing of two phases: a learning phase wherein linguistic information is acquired using unambiguous analysis cases, and a resolution phase wherein ambiguous analysis cases are resolved using information acquired during the learning phase. The invention is used in particular for creating specialised terminological resources for an information processing system, for creating an ontology for a specialised information search engine on the web, for a terminological lexicon for an automatic translation system, or for a thesaurus for an automatic indexing system.

17 citations

Book Chapter•10.1007/3-540-45747-X_35•

Alignment of Performances with Scores Aimed at Content-Based Music Access and Retrieval

Nicola Orio¹•Institutions (1)

University of Padua¹

16 Sep 2002

TL;DR: The research work reported in this paper proposes to index and retrieve music performances through an automatic alignment of acoustic recordings with the music scores, based on the use of hidden Markov models, a powerful tool that has been successfully used in many research areas, like speech recognition and molecular biology.

Abstract: Music digital libraries pose interesting and challenging research problems, in particular for the development of methodologies and tools for the retrieval of music documents. One difficult aspect of content-based retrieval of musical works is that only scores can be represented by a symbolic notation, while performances, which are of interest for the majority of users, allow for access based on bibliographic values only. The research work reported in this paper proposes to index and retrieve music performances through an automatic alignment of acoustic recordings with the music scores. Alignment may allow for: automatic recognition of performances, aimed at cataloging large collections of recordings; automatic tagging of performances, aimed at an easy access to long recordings. The methodology is based on the use of hidden Markov models, a powerful tool that has been successfully used in many research areas, like speech recognition and molecular biology. The approach has been tested on a collection of acoustic and synthetic performances, showing good results in the recognition and in the tagging of performances. The proposed approach can be used to increase the functionalities of a music digital library, allowing for content-based access to scores and recordings.

17 citations

Book Chapter•10.1007/3-540-36137-5_8•

Interactive Indexing and Retrieval of Multimedia Content

Marcel Worring¹, Andrew D. Bagdanov¹, Jan C. van Gemert¹, Jan-Mark Geusebroek¹, Hoang Minh¹, Guus Schreiber¹, Cees G. M. Snoek¹, Jeroen Vendrig¹, Jan Wielemaker¹, Arnold W. M. Smeulders¹•Institutions (1)

University of Amsterdam¹

22 Nov 2002

TL;DR: This contribution considers the nature of the semantic gap in more detail and shows examples of methods that help in limiting the gap and how to employ the user's interaction for limiting the semantic gap.

Abstract: The indexing and retrieval of multimedia items is difficult due to the semantic gap between the user's perception of the data and the descriptions we can derive automatically from the data using computer vision, speech recognition, and natural language processing. In this contribution we consider the nature of the semantic gap in more detail and show examples of methods that help in limiting the gap. These methods can be automatic, but in general the indexing and retrieval of multimedia items should be a collaborative process between the system and the user. We show how to employ the user's interaction for limiting the semantic gap.

16 citations

Journal Article•10.1515/LIBR.2002.48•

Innovative Solutions in Automatic Classification: A Brief Summary

Erzsébet Tóth

01 Jan 2002-Libri

TL;DR: This paper presents a brief review of the various methods applied in automatic classification and describes the approaches taken in the Nordic WAIS/WWW; DESIRE II – Engineering Electronic Library System (EELS); GERHARD; and SCORPION projects.

Abstract: There is a growing need for practical solutions to provide flexible access to digital documents in a structured form on the Web. The existing library classification schemes serve as good bases for achieving this goal. This paper presents a brief review of the various methods applied in automatic classification. It focuses on the main activities fulfilled within various research projects to make possible the effective automatic indexing and classification of Web sources. It describes the approaches taken in the Nordic WAIS/WWW; DESIRE II - Engineering Electronic Library System (EELS); GERHARD; and SCORPION projects. Artificial neural networks and artificial intelligence show great potential.

Proceedings Article•

Extracting Information for Automatic Indexing of Multimedia Material.

Horacio Saggion¹, Hamish Cunningham, Diana Maynard, Kalina Bontcheva, Oana Hamza, Christian Ursu, Yorick Wilks•Institutions (1)

University of Sheffield¹

1 May 2002

TL;DR: The approach to IE relies on a finite state machinery provided by GATE, a General Architecture for Text Engineering, pipelined with full syntactic analysis and discourse interpretation implemented in Prolog.

Abstract: This paper discusses our work on information extraction (IE) from multi-lingual, multi-media, multi-genre Language Resources, in a domain where there are many different event types. This work is being carried out in the context of MUMIS, an EU-funded project that aims at the development of basic technology for the creation of a composite index from multiple and multi-lingual sources. Our approach to IE relies on a finite state machinery provided by GATE, a General Architecture for Text Engineering, pipelined with full syntactic analysis and discourse interpretation implemented in Prolog.

Proceedings Article•10.1109/EURMIC.2002.1046145•

Advanced indexing and retrieval in present-day content management systems

T. Kunkelmann, R. Brunelli

10 Dec 2002

TL;DR: The ongoing work of integrating new automatic indexing and retrieval systems into a content management system is presented and some European-funded projects with partners from industry, broadcast organizations, and research institutes are described.

Abstract: A content management system in the broadcast domain comprises a system that provides functionality for long-term preservation of continuous digital media, as well as for annotation, retrieval, and re-use of the content. In order to enrich the annotation without increasing the human workloads a system should generate as much metadata as possible automatically. In this paper, the ongoing work of integrating new automatic indexing and retrieval systems into a content management system is presented. The integration efforts described here are part of some European-funded projects with partners from industry, broadcast organizations, and research institutes.

Journal Article•10.1107/S0021889802001838•

Automatic indexing of area-detector data of periodic and aperiodic crystals

Katrin Pilz¹, Michael Estermann¹, Sander van Smaalen¹•Institutions (1)

University of Bayreuth¹

01 Apr 2002-Journal of Applied Crystallography

TL;DR: An autoindexing procedure is described that produces the indexing of diffraction data of aperiodic crystals using a computer program called BAYINDEX, and very good agreement between experimental and theoretical reflection positions is found.

Abstract: An autoindexing procedure is described that produces the indexing of diffraction data of aperiodic crystals. The procedure has been designed for indexing the data obtained with an area detector, but it can also be applied to data obtained with a single-point detector. The essential step in the indexing process is the ability to discriminate between reflections that fit to a reciprocal lattice, the satellite reflections and possible reflections that do not belong to this indexing. To achieve this goal, the refinement of the orientation matrix and the diffractometer parameters is made an intrinsic part of the process of indexing. The proposed autoindexing procedure has been implemented in a computer program called BAYINDEX. Successful application to data sets of three different one-dimensionally modulated structures, one two-dimensionally modulated structure and a periodic crystal is presented. Very good agreement between experimental and theoretical reflection positions is found. The indexing produced by BAYINDEX can serve as the basis for integration routines.

Bilingual Indexing for Information Retrieval with AUTINDEX

Rita, Pease, Catherine, Schmidt, Paul, Maas

1 Jan 2002

TL;DR: AUTINDEX is a bilingual automatic indexing system for the two languages German and English to automatically index large quantities of abstracts of scientific and technical papers from several areas of engineering.

Abstract: AUTINDEX is a bilingual automatic indexing system for the two languages German and English. It is being developed within the EU-funded BINDEX project. The aim of the system is to automatically index large quantities of abstracts of scientific and technical papers from several areas of engineering. Automatic indexing takes place using a controlled vocabulary provided in monolingual and bilingual thesauri. AUTINDEX produces for a given abstract a list of descriptors as well as a list of classification codes using these thesauri. It also allows for free indexing, i.e. indexing with an unrestricted vocabulary (delivering so-called 'free descriptors'). These free descriptors are used to enhance and extend the thesauri. The bilingual AUTINDEX module indexes German abstracts in English and

Proceedings Article•10.1109/ICASSP.2002.5743639•

Automatic indexing of lecture speech by extracting topic-independent discourse markers

Tatsuya Kawahara¹, Masahiro Hasegawa¹•Institutions (1)

Kyoto University¹

13 May 2002

TL;DR: Experimental results show that the proposed method realizes better indexing performance (better precision at high recall rates) than the simple baseline method using pause information only, and it is shown to be robust against speech recognition errors.

Abstract: Automatic detection of section (sub-topic) boundaries in lecture speech is addressed. The method makes use of the characteristic expressions used in initial utterances of sections, defined as discourse markers, as well as pause and language model information. The discourse markers are derived in a totally unsupervised manner based on word statistics used in the information retrieval technique. The statistics are used to select candidates picked up by other information. Experimental results show that the proposed method realizes better indexing performance (better precision at high recall rates) than the simple baseline method using pause information only. Moreover, it is shown to be robust against speech recognition errors.

Journal Article•10.1016/S0165-0114(01)00054-9•

Fuzzy tolerance relations and relational maps applied to information retrieval

László T. Kóczy, Tamás D. Gedeon¹, Judit A. Kóczy•Institutions (1)

University of New South Wales¹

16 Feb 2002-Fuzzy Sets and Systems

TL;DR: This study addresses the problem of how fuzzy tolerance and similarity relations can be generated from the occurrence frequencies, especially as these are based on possibilistic rather than probabilistic measures, and also how the relations can be implemented by fuzzy relevance matrices.

Journal Article•10.1527/TJSAI.17.398•

Automatic Indexing Based on Term Activity

Naohiro Matsumura¹, Yukio Ohsawa², Mitsuru Ishizuka¹•Institutions (2)

University of Tokyo¹, University of Tsukuba²

01 Jan 2002-Transactions of The Japanese Society for Artificial Intelligence

TL;DR: This paper proposes an automatic indexing method named PAI (Priming Activation Indexing) which extracts keywords expressing assertions of a document by employing a spreading activation model to extract keywords based on the activity of terms.

Abstract: With the increasing number of electronic documents, automatic indexing of documents is an essential approach in information retrieval systems such as search engines. This paper proposes an automatic indexing method named PAI (Priming Activation Indexing), which extracts keywords expressing the assertions of a document. The basic idea is that, since an author writes a document to assert his/her main point, the terms that leave an impression in the reader's mind could represent the asserted keywords of the document. Our approach employs a spreading activation model to extract keywords based on the activity of terms, without using a corpus, thesaurus, syntactic analysis, dependency relations between terms, or any other knowledge except for a stop-word list. Experimental evaluations are reported by applying PAI to both papers and the archives of a mailing list.

Proceedings Article•10.1109/ICASSP.2002.1005660•

Automatic indexing of lecture speech by extracting topic-independent discourse markers

Kawahara, Hasegawa

1 Jan 2002

Journal Article•

Combining voice recognition and automatic indexing of medical reports.

André Happe, Bruno Pouliquen, Anita Burgun, Marc Cuggia, Le Beux P

01 Jan 2002-Studies in health technology and informatics

TL;DR: The methods used to evaluate existing voice recognition software programs are introduced and NOMINDEX, a system that turns a medical text into MeSH codes, using the French ADM lexical database is presented.

Abstract: Medical records have been evolving from traditional paper-based records to digital ones, and from dictated reports and transcription to voice recognition systems. The transition to digital operations will not be complete until we have the ability to combine voice recognition with automated indexing of texts. This paper introduces the methods we used to evaluate existing voice recognition software programs and presents NOMINDEX, a system that turns a medical text into MeSH codes, using the French ADM lexical database. Those systems were applied to 28 patient discharge summaries in French, produced after a coronarography, and extracted from the MENELAS corpus of texts. Using the best configuration for voice recognition, the rate of accurate recognition exceeds 98 percent. Among the indexing concepts assigned by NOMINDEX, 25 percent were not pertinent and 12 percent of the relevant concepts were missing. Most errors were related to confusion between common language and medical language, and to the coverage of the ADM lexical database. Better results would be expected with a more comprehensive lexical resource. In addition, only 3 percent of the errors generated by inadequate voice recognition that remained in the best-performing configuration had an impact on automatic indexing by NOMINDEX.

Proceedings Article•10.1109/ICPR.2002.1047419•

Image retrieval using re-segmentation driven by query rectangles

Luigi Cinque, F. G. De Rosa, F. Lecca, Stefano Levialdi, Steven L. Tanimoto¹•Institutions (1)

University of Washington¹

11 Aug 2002

TL;DR: This method enables the construction of image retrieval systems with completely automatic indexing because the rectangles in the user's query are used to control a partial re-segmentation of each candidate image.

Abstract: In this paper we address two key issues in image retrieval: (1) the use of rectangles in queries to express properties of regions in the desired target images; and (2) the use of over-segmentation to build the index of images in the database. In our method, the rectangles in the user's query are used to control a partial re-segmentation of each candidate image. These query-driven partial re-segmentations provide the features for determining the distance between the query and each candidate, so that the closest candidates can be determined and retrieved. This method enables the construction of image retrieval systems with completely automatic indexing.

Posted Content•

An approach to automatic indexing of scientific publications in High Energy Physics for database SPIRES HEP

A. V. Averin, L. A. Vassilevskaya

28 Nov 2002-arXiv: Information Retrieval

TL;DR: An automatic indexing system, AUTEX, is presented, which is applied to keyword index e-prints in selected areas in high energy physics (HEP) making use of the DESY-HEPI thesaurus as a controlled vocabulary.

Abstract: We introduce an approach to automatic indexing of e-prints based on a pattern-matching technique making extensive use of an Associative Patterns Dictionary (APD), developed by us. Entries in the APD consist of natural language phrases with the same semantic interpretation as a set of keywords from a controlled vocabulary. The method also allows recognition, within e-prints, of formulae written in TeX notation that might also appear as keywords. We present an automatic indexing system, AUTEX, which we have applied to keyword index e-prints in selected areas in high energy physics (HEP), making use of the DESY-HEPI thesaurus as a controlled vocabulary.

Dissertation•

N-Gram-Based Automatic Indexing for Amharic text

Mengistum Bethelhem

1 Jun 2002

Journal Article•10.5771/0943-7444-2002-3-4-171•

Morpho-syntactic parsing for a text mining environment: An NP recognition model for knowledge visualization and information retrieval

Sahbi Sidhom, Mohamed Hassoun

15 Sep 2002-Knowledge Organization

TL;DR: Sidhom and Hassoun implemented the Cascaded Augmented Transition Network (ATN) in order to analyse French text descriptions of multimedia documents; the platform is considered an investigative tool for the knowledge organization and management of multiform e(electronic)-documents (text, multimedia, audio, image) using their text descriptions.

Abstract: Sidhom and Hassoun discuss the crucial role of NLP (Natural Language Processing) tools in Knowledge Extraction and Management as well as in the design of Information Retrieval Systems. The authors focus more specifically on the morpho-syntactic issues by describing their morpho-syntactic analysis platform, which has been implemented to cover the automatic indexing and information retrieval topics. To this end they implemented the Cascaded Augmented Transition Network (ATN). They used this formalism in order to analyse French text descriptions of Multimedia documents. An implementation of an ATN parsing automaton is briefly described. The Platform in its logical operation is considered as an investigative tool towards the knowledge organization (based on an NP -Noun Phrase- recognition model) and management of multiform e(electronic)-documents (text, multimedia, audio, image) using their text descriptions.

Book•

Document Analysis Systems V: 5th International Workshop, DAS 2002, Princeton, NJ, USA, August 19-21, 2002. Proceedings

Daniel P. Lopresti, Jianying Hu, Ramanujan S. Kashi

7 Aug 2002

TL;DR: OCR Features and Systems, including a Stochastic Model Combining Discrete Symbols and Continuous Attributes and Its Application to Handwriting Recognition, and a Learning Pseudo Bayes Discriminant Method based on Difference Distribution of Feature Vectors are presented.

Abstract: OCR Features and Systems.- Relating Statistical Image Differences and Degradation Features.- Script Identification in Printed Bilingual Documents.- Optimal Feature Extraction for Bilingual OCR.- Machine Recognition of Printed Kannada Text.- An Integrated System for the Analysis and the Recognition of Characters in Ancient Documents.- A Complete Tamil Optical Character Recognition System.- Distinguishing between Handwritten and Machine Printed Text in Bank Cheque Images.- Multi-expert Seal Imprint Verification System for Bankcheck Processing.- Automatic Reading of Traffic Tickets.- Handwriting Recognition.- A Stochastic Model Combining Discrete Symbols and Continuous Attributes and Its Application to Handwriting Recognition.- Top-Down Likelihood Word Image Generation Model for Holistic Word Recognition.- The Segmentation and Identification of Handwriting in Noisy Document Images.- The Impact of Large Training Sets on the Recognition Rate of Off-line Japanese Kanji Character Classifiers.- Automatic Completion of Korean Words for Open Vocabulary Pen Interface.- Using Stroke-Number-Characteristics for Improving Efficiency of Combined Online and Offline Japanese Character Classifiers.- Closing Gaps of Discontinuous Lines: A New Criterion for Choosing the Best Prolongation.- Classifiers and Learning.- Classifier Adaptation with Non-representative Training Data.- A Learning Pseudo Bayes Discriminant Method Based on Difference Distribution of Feature Vectors.- Increasing the Number of Classifiers in Multi-classifier Systems: A Complementarity-Based Analysis.- Discovering Rules for Dynamic Configuration of Multi-classifier Systems.- Multiple Classifier Combination for Character Recognition: Revisiting the Majority Voting System and Its Variations.- Layout Analysis.- Correcting for Variable Skew.- Two Geometric Algorithms for Layout Analysis.- Text/Graphics Separation Revisited.- A Study on the Document Zone Content Classification Problem.- Logical Labeling of Document Images Using Layout Graph Matching with Adaptive Learning.- A Ground-Truthing Tool for Layout Analysis Performance Evaluation.- Simple Layout Segmentation of Gray-Scale Document Images.- Tables and Forms.- Detecting Tables in HTML Documents.- Document-Form Identification Using Constellation Matching of Keywords Abstracted by Character Recognition.- Table Detection via Probability Optimization.- Complex Table Form Analysis Using Graph Grammar.- Detection Approaches for Table Semantics in Text.- A Theoretical Foundation and a Method for Document Table Structure Extraction and Decomposition.- Text Extraction.- Fuzzy Segmentation of Characters in Web Images Based on Human Colour Perception.- Word and Sentence Extraction Using Irregular Pyramid.- Word Searching in Document Images Using Word Portion Matching.- Scene Text Extraction in Complex Images.- Text Extraction in Digital News Video Using Morphology.- Indexing and Retrieval.- Retrieval by Layout Similarity of Documents Represented with MXY Trees.- Automatic Indexing of Newspaper Microfilm Images.- Improving Document Retrieval by Automatic Query Expansion Using Collaborative Learning of Term-Based Concepts.- Spotting Where to Read on Pages - Retrieval of Relevant Parts from Page Images.- Mining Documents for Complex Semantic Relations by the Use of Context Classification.- Hairetes: A Search Engine for OCR Documents.- Text Verification in an Automated System for the Extraction of Bibliographic Data.- Document Engineering.- smartFIX: A Requirements-Driven System for Document Analysis and Understanding.- Machine Learning of Generalized Document Templates for Data Extraction.- Configuration REcognition Model for Complex Reverse Engineering Methods: 2(CREM).- Electronic Document Publishing Using DjVu.- DAN: An Automatic Segmentation and Classification Engine for Paper Documents.- Document Reverse Engineering: From Paper to XML.- New Applications.- Human Interactive Proofs and Document Image Analysis.- Data GroundTruth, Complexity, and Evaluation Measures for Color Document Analysis.- Exploiting WWW Resources in Experimental Document Analysis Research.- An Automated Tachograph Chart Analysis System.- A Multimodal System for Accessing Driving Directions.

Proceedings Article•10.1109/ITCC.2002.1000394•

Determining the usefulness of manually assigned keywords for a vector space system

Kazem Taghva, Thomas A. Nartker, Julie Borsack, Allen Condit

8 Apr 2002

TL;DR: It is concluded that query expansion using manually-assigned keywords has no advantage over expansion using terms from the text of the document.

Abstract: In this paper, we report on a series of experiments involving feedback and query expansion. We conclude that query expansion using manually-assigned keywords has no advantage over expansion using terms from the text of the document.

Reference Entry•10.1002/0471443395.IMG039•

Image Search and Retrieval Strategies

Yi Lu Murphey¹•Institutions (1)

University of Michigan¹

15 Jan 2002

TL;DR: The demand for systems that combine pictorial information with textual description in image retrieval is growing, and automatic indexing and retrieval based on image content have become the most promising techniques for large image databases and digital image libraries.

Abstract: The proliferation of computer technology and digital image-acquisition hardware has led to the widespread use of image data across a variety of applications including astronomy, art, natural resources, engineering design, military, business operation, medicine, education, etc. Major research activities in digital image databases surged after the U.S. government's Digital Library Initiative (DLI) from 1994 to 1998. The DLI research team at the University of California at Berkeley developed a work-centered digital information system that contains 450 digital text documents, 200 air photos, and over 11,000 ground photographs. The system allowed a user or a working group to access its own collections of varying data types, and to generate new materials to be added to the collection. The DLI project at the University of California (UC) at Santa Barbara focused on the development of digital databases that contain geographically referenced materials such as maps and aerial photos. The research team developed a number of tools for browsing and retrieving map images at multiple resolutions. Two other major efforts in developing digital image databases were at the National Library of Medicine (NLM), a component of the National Institutes of Health (NIH), and at Time Warner. More recently, research scientists at the University of California at Berkeley and the Fine Arts Museums of San Francisco successfully launched on the Internet the largest art image database in the world, the Thinker ImageBase. Automatic image indexing and retrieval technology is fundamental in digital image databases. There are two general strategies in image retrieval: browsing and searching. Browsing is an information retrieval strategy in which a user navigates through an ordered arrangement of images by making selections from the progressive levels of a hierarchy into which the available images have been logically grouped. Image browsing relies on the user's cognitive abilities to recognize images of interest without having to formulate a specific query. Searching is an information retrieval strategy in which the user communicates with an information retrieval system through an interface that requires the user to input a search query. Images can be indexed and retrieved by text-tag information and/or by image content. Conventionally, images are indexed using text-tag information, such as title, key words, date of the work, artist, author, photographer, legend, captions, etc. Images are often subject to a wide range of interpretations, and textual descriptions can only begin to capture the richness and complexity of the semantic content of a visual image. In many applications, the complexity of the information embedded in image content, such as types of objects, object attributes, and spatial relationships of objects, cannot be synthesized in a few key words. It has been reported that users querying an image collection tend to be much more specific in their requests and information needs than when querying a text database. Furthermore, text indexing is language dependent and requires large human effort in creating the metadata that enables visual queries. Large databases containing thousands or millions of digital images, which can occupy gigabytes of space, are almost impossible to index and search manually. The demand for systems that combine pictorial information with textual description in image retrieval is growing, and automatic indexing and retrieval based on image content have become the most promising techniques for large image databases and digital image libraries. This article describes the techniques developed for content-based retrieval.

Keywords: retrieval; image search; indexing color; query-by-color; texture features; shape; spatial relationships; image databases

Patent•

Automatically indexing robot system and processing method using the system

Nin Mohyuku

22 Mar 2002

TL;DR: An automatic indexing robot system is described, consisting of a server 10 that stores information such as XML documents, WPS material, scanned images, moving video material, and photographs, and a robot PC 20 that retrieves the information stored in the server 10 using an index word retrieval machine or a character recognition machine, or extracts vector image data.

Abstract: PROBLEM TO BE SOLVED: To provide an automatic indexing robot system capable of automatically indexing text-type material such as text or word processor data, images and representative frames of video, and material obtained by developing graphic data into an image, and to provide a processing method using this system. SOLUTION: The automatic indexing robot system consists of a server 10, which stores information such as XML documents, WPS material, scanned images, moving video material, and photographs together with index words or image indexes, and a robot PC 20, which retrieves the information stored in the server 10 using an index word retrieval machine or a character recognition machine, or extracts vector image data. Processing with this system is performed by successively executing a first process 100 for automatically indexing character-resource material stored in the server 10, a second process 200 for automatically indexing scanned original images stored in the server 10, and a third process 300 for automatically indexing photographic images stored in the server 10.

Journal Article•10.1080/13614560108914728•

A prototype multilingual document browser for ancient Greek texts

Jeffrey A. Rydberg-Cox¹•Institutions (1)

University of Missouri–Kansas City¹

30 Jul 2002-The New Review of Hypermedia and Multimedia

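TL;DR: A prototype multilingual keyword extraction and information browsing system for Classical Greek texts is described; it extracts keywords with a tf x idf routine, clusters documents into thematic groups, and translates the keywords into English so that users with limited knowledge of Ancient Greek can browse the documents.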

Abstract: This paper describes a prototype multilingual keyword extraction and information browsing system for texts written in Classical Greek. This system automatically extracts keywords from Greek texts using a tf x idf keyword discovery routine, clusters documents into thematically coherent groups based on these keywords, translates the keywords into English, and presents this information in two different formats so that users with limited knowledge of Ancient Greek can browse the documents and orient themselves to important concepts in the collections of a digital library.
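
The tf x idf keyword discovery step mentioned in this abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `tfidf_keywords`, the pre-tokenized input, the weighting formula (relative term frequency times the natural log of inverse document frequency), and the transliterated sample tokens are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring tf x idf terms for each document.

    docs is a list of token lists; tokenization and lemmatization are
    assumed to have been done upstream (hypothetical preprocessing).
    """
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        tf = Counter(doc)
        # Score = relative term frequency * log(inverse document frequency).
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Rank by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results


# Hypothetical transliterated tokens standing in for Greek lemmas.
docs = [
    ["polis", "logos", "polis", "kai"],
    ["logos", "kai", "arete"],
    ["kai", "nomos", "nomos"],
]
print(tfidf_keywords(docs, top_k=2))
```

A term that occurs in every document (here `kai`) receives a score of zero, which is why frequent function words tend to drop out of the keyword lists. The clustering and translation stages described in the abstract are not covered by this sketch.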

Book Chapter•10.1007/3-540-46043-8_3•

A Conceptual Model for Surveillance Video Content and Event-Based Indexing and Retrieval

Farhi Marir¹, Kamel Zerzour¹, Karim Ouazzane¹, Yong Xue¹•Institutions (1)

University of North London¹

21 Apr 2002

TL;DR: The VIGILANT conceptual model for content and event-based retrieval of video images and clips using automatic annotation and indexing of contents and events representing the extracted features and recognised objects in the images captured by a video camera in a car park environment is presented.

Abstract: This paper addresses the need for a semantic videoobject approach for efficient storage and manipulation of video data to respond to the needs of several classes of potential applications when efficient management and deductions over voluminous data are involved. We present the VIGILANT conceptual model for content and event-based retrieval of video images and clips using automatic annotation and indexing of contents and events representing the extracted features and recognised objects in the images captured by a video camera in a car park environment. The underlying videoobject model combines Object-Oriented modelling (OO) techniques and Description Logics (DLs) Knowledge representation. The OO technique models the static aspects of video clips and instances and their indexes will be stored in an Object-Oriented Database. The DLs model will extend the OO model to cater for the inherent dynamic content descriptions of the video, as events tend to spread over a sequence of frames.
