Top 23 papers published on the topic of automatic indexing in 2011

Showing papers on "Automatic indexing" published in 2011

Patent

A method for indexing multimedia information

Conejero David, Duxans Helenca, Escalada Gregorio

12 Apr 2011

TL;DR: In this patent, a speech-to-text transcription of multimedia files is performed automatically by means of an ASR process, and acoustic and language models adapted for the ASR process are selected at least before it processes the multimedia file, i.e. "a priori".

Abstract: The method comprises analyzing the audio content of multimedia files, automatically performing a speech-to-text transcription of it by means of an ASR process, and selecting acoustic and language models adapted for the ASR process at least before the latter processes the multimedia file, i.e. "a priori". The method is particularly applicable to the automatic indexing, aggregation and clustering of news from different sources and from different types of files, including text, audio and audiovisual documents, without any manual annotation.

106 citations

Journal Article • 10.1007/S11042-010-0544-9

Bayesian belief network based broadcast sports video indexing

Maheshkumar H. Kolekar¹

Indian Institute of Technology Patna¹

01 Aug 2011-Multimedia Tools and Applications

TL;DR: A probabilistic Bayesian belief network (BBN) method is presented for automatic indexing of excitement clips of sports video sequences; it offers a general approach to the automatic tagging of large-scale multimedia content with rich semantics.

Abstract: This paper presents a probabilistic Bayesian belief network (BBN) method for automatic indexing of excitement clips of sports video sequences. The excitement clips from sports video sequences are extracted using audio features. The excitement clips consist of multiple subclips corresponding to events such as replays, field views, close-ups of players, close-ups of referees/umpires, spectators, and players' gatherings. The events are detected and classified using a hierarchical classification scheme. The BBN based on observed events is used to assign semantic concept labels to the excitement clips, such as goals, saves, and cards in soccer video, and wickets and hits in cricket video sequences. The BBN-based indexing results are compared with our previously proposed event-association-based approach, and the BBN is found to perform better. The proposed scheme provides a generalizable method for linking low-level video features with high-level semantic concepts. The generic nature of the proposed approach in the sports domain is validated by demonstrating successful indexing of soccer and cricket video excitement clips. The proposed scheme offers a general approach to the automatic tagging of large-scale multimedia content with rich semantics. The collection of labeled excitement clips provides a video summary for highlight browsing, video skimming, indexing and retrieval.
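
To make the labelling step concrete, here is a minimal Python sketch of Bayesian inference over concept labels given observed subclip events. It assumes a simplified naive-Bayes structure (one concept node whose children are the event nodes), which is flatter than the paper's network with hierarchical event detection; the function name `concept_posterior`, the concept and event names, and all probabilities are hypothetical.

```python
def concept_posterior(prior, likelihood, observed):
    """Posterior P(concept | observed events) for a naive-Bayes-shaped network.

    prior:      {concept: P(concept)}
    likelihood: {concept: {event: P(event present | concept)}}
    observed:   {event: True if the event was detected in the clip, else False}
    """
    unnormalized = {}
    for concept, p in prior.items():
        for event, present in observed.items():
            q = likelihood[concept][event]
            # Multiply by P(event | concept) or P(not event | concept).
            p *= q if present else (1.0 - q)
        unnormalized[concept] = p
    total = sum(unnormalized.values())
    return {c: v / total for c, v in unnormalized.items()}


# Hypothetical soccer example: two concept labels, two observable events.
prior = {"goal": 0.5, "card": 0.5}
likelihood = {
    "goal": {"spectators": 0.9, "referee_closeup": 0.2},
    "card": {"spectators": 0.3, "referee_closeup": 0.9},
}
post = concept_posterior(prior, likelihood,
                         {"spectators": True, "referee_closeup": False})
label = max(post, key=post.get)  # "goal", posterior about 0.96
```

The clip is labelled with the most probable concept; a full BBN would add intermediate event nodes and conditional tables between the audio/visual evidence and the concept.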

78 citations

Automatic Keywords Extraction for Punjabi Language

Vishal Gupta, Gurpreet Singh Lehal

1 Jan 2011

TL;DR: The extracted keywords are very helpful in automatic indexing, text summarization, information retrieval, classification, clustering, topic detection and tracking, web searches, etc.

Abstract: Automatic keyword extraction is the task of identifying a small set of words, key phrases, keywords, or key segments from a document that can describe the meaning of the document. Keywords are useful tools as they give the shortest summary of the document. This paper concentrates on automatic keyword extraction for Punjabi text. It includes several phases: removing stop words, identifying Punjabi nouns and stemming them, calculating Term Frequency and Inverse Sentence Frequency (TF-ISF), selecting Punjabi nouns with high TF-ISF scores as keywords, and applying a title/headline feature for Punjabi text. The extracted keywords are very helpful in automatic indexing, text summarization, information retrieval, classification, clustering, topic detection and tracking, web searches, etc.
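
The TF-ISF scoring phase can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: tokens are assumed to be already identified and stemmed nouns, ISF is taken as log(N / sentence frequency), and the title bonus (`title_boost`) is a hypothetical stand-in for the paper's title/headline feature.

```python
import math
from collections import Counter


def tf_isf_keywords(sentences, stop_words, title_words=frozenset(),
                    title_boost=2.0, top_k=5):
    """Rank terms by TF-ISF (a sketch; the paper's exact formula may differ).

    sentences:   list of sentences, each a list of normalized tokens
    stop_words:  tokens to discard before scoring
    title_words: tokens that also occur in the title/headline
    """
    n = len(sentences)
    tf = Counter()  # term frequency over the whole document
    sf = Counter()  # number of sentences containing the term
    for sentence in sentences:
        tokens = [t for t in sentence if t not in stop_words]
        tf.update(tokens)
        sf.update(set(tokens))
    scores = {}
    for term, freq in tf.items():
        score = freq * math.log(n / sf[term])  # TF x ISF
        if term in title_words:
            score *= title_boost  # hypothetical title/headline bonus
        scores[term] = score
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


sentences = [["river", "water", "the"], ["river", "boat"], ["boat", "fish", "the"]]
print(tf_isf_keywords(sentences, {"the"}, top_k=2))  # ['fish', 'water']
```

With this ISF definition a term that occurs in every sentence scores zero, which is why a separate title/headline feature is useful for promoting topic words.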

18 citations

Journal Article • 10.7152/ACRO.V11I1.12773

Automatic indexing by discipline and high-level categories: Methodology and potential applications.

Susanne M. Humphrey, Thomas C. Rindflesch, Alan R. Aronson

02 Nov 2011-Advances in Classification Research Online

TL;DR: It is suggested, with several examples, that ST's may convey a unique slant of a document's content not normally represented in standard indexing vocabularies.

Abstract: This paper first describes the methodology of journal descriptor (JD) indexing, based on human indexing at the journal level using only 127 descriptors, and applying statistical methods that associate this journal indexing with text words in a training set of MEDLINE® citations. These associations form the basis for automatic indexing of documents outside the training set. The paper then presents the new technique of semantic type (ST) indexing, based on JD indexing associated with each of 134 ST's, and applying the standard cosine coefficient measure to compare the similarity between the JD indexing of a document and the JD indexing of each ST. The ST indexing of the document is the list of ST's ranked in decreasing order of similarity between the JD indexing of the document and the JD indexing of the ST's. Discussion of the potential usefulness and application of the very general indexing provided by JD's and ST's comprises the remainder of the paper. JD's have been used for more than thirty years to search MEDLINE by discipline, and discipline-based indexing is in evidence on the Web. It is suggested, with several examples, that ST's may convey a unique slant of a document's content not normally represented in standard indexing vocabularies. Use of ST indexing to rank retrieved output is mentioned as a possible application. Notwithstanding the importance of methodology and performance issues, the intent of this paper is to explore questions of the potential utility and applicability of JD and ST indexing.
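
The ST ranking step described above reduces to cosine similarity between sparse JD vectors. The Python sketch below shows that step only; the JD weights are made-up illustrations, the functions `cosine` and `st_indexing` are hypothetical names, and the statistical training that produces real JD vectors is not modelled.

```python
import math


def cosine(u, v):
    """Cosine coefficient between two sparse vectors stored as {dimension: weight}."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def st_indexing(doc_jd, st_jd):
    """Rank semantic types by decreasing similarity of their JD vectors to the
    document's JD vector (ties keep their input order)."""
    return sorted(st_jd, key=lambda st: -cosine(doc_jd, st_jd[st]))


# Hypothetical JD weights; the real system uses 127 JDs and 134 STs.
doc = {"Cardiology": 0.8, "Pharmacology": 0.2}
sts = {
    "Disease or Syndrome": {"Cardiology": 0.7, "Pathology": 0.3},
    "Pharmacologic Substance": {"Pharmacology": 0.9, "Chemistry": 0.1},
    "Bird": {"Zoology": 1.0},
}
ranking = st_indexing(doc, sts)
```

Here `ranking` lists "Disease or Syndrome" first (cosine about 0.89), then "Pharmacologic Substance" (about 0.24), then "Bird" (0, no shared JDs).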

17 citations

Proceedings Article • 10.1145/2037342.2037352

Automatic indexing of French handwritten census registers for probate genealogy

Cedric Sibade, Thomas Retornaz, Thibauld Nion, Romain Lerallut, Christopher Kermorvant

16 Sep 2011

TL;DR: The complete indexing process of the registers of a French census dating back more than a hundred years is described, from image analysis to integration into the information system, in the context of probate genealogy.

Abstract: This paper describes the complete indexing process of the registers of a French census dating back more than a hundred years, from image analysis to integration into the information system, in the context of probate genealogy. The documents of interest are composed of a table of personal information in which the cells containing the first name, the surname and the relation to the head of household must be extracted and recognized. More than 30 million cells were processed and their content was either directly integrated into the information system or sent to keyers for manual validation, allowing an automation rate of 80% while keeping the error rate below 15% on average. Based on this project, we have started the development of a generic platform for processing table-based historical documents, including new functionalities and a more generic and user-friendly table model definition interface.
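
The abstract does not say how cells are split between direct integration and manual keying. A common pattern, assumed here purely for illustration, is to route each recognized cell by a recognizer confidence score against a single threshold; the function `route_cells`, the threshold value, and the sample cells are hypothetical.

```python
def route_cells(cells, threshold=0.9):
    """Split recognized cells into auto-integrated and manually keyed sets.

    cells: list of (text, confidence) pairs from a hypothetical recognizer.
    Returns (automatic, manual, automation_rate).
    """
    automatic = [c for c in cells if c[1] >= threshold]
    manual = [c for c in cells if c[1] < threshold]  # sent to keyers
    rate = len(automatic) / len(cells) if cells else 0.0
    return automatic, manual, rate


cells = [("Marie", 0.97), ("Dupont", 0.95), ("fils", 0.62),
         ("Jean", 0.91), ("Martin", 0.40)]
automatic, manual, rate = route_cells(cells)  # rate == 0.6
```

Raising the threshold sends more cells to keyers, trading a lower automation rate for fewer unchecked recognition errors; the 80% figure in the abstract reflects one such operating point.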

16 citations

Book Chapter • 10.1007/978-3-642-21034-1_15

Automatic semantic subject indexing of web documents in highly inflected languages

Reetta Sinkkilä¹, Osma Suominen¹, Eero Hyvönen¹

University of Helsinki¹

29 May 2011

TL;DR: This work tested the state-of-the-art automatic indexing tool Maui on Finnish texts using three stemming and lemmatization algorithms, with documents and vocabularies from different domains.

Abstract: Structured semantic metadata about unstructured web documents can be created using automatic subject indexing methods, avoiding laborious manual indexing. A successful automatic subject indexing tool for the web should work with texts in multiple languages and be independent of the domain of discourse of the documents and of the controlled vocabularies. However, analyzing text written in a highly inflected language requires word form normalization that goes beyond rule-based stemming algorithms. We have tested the state-of-the-art automatic indexing tool Maui on Finnish texts using three stemming and lemmatization algorithms, with documents and vocabularies from different domains. Both of the lemmatization algorithms we tested performed significantly better than a rule-based stemmer, and the subject indexing quality was found to be comparable to that of human indexers.

12 citations

Book Chapter • 10.1007/978-3-642-23291-6_25

Representation, indexing, and retrieval of biological cases for biologically inspired design

Bryan Wiltgen¹, Ashok K. Goel¹, Swaroop Vattam¹

Georgia Institute of Technology¹

12 Sep 2011

TL;DR: An information-processing analysis of biologically inspired design, a scheme for representing knowledge of designs of biological systems, and a computational technique for automatic indexing and retrieval of biological analogues of engineering problems are provided.

Abstract: Biologically inspired design is an increasingly popular design paradigm. Biologically inspired design differs from many traditional case-based reasoning tasks because it employs cross-domain analogies. The wide differences in biological source cases and technological target problems present challenges for determining what would make good or useful schemes for case representation, indexing, and adaptation. In this paper, we provide an information-processing analysis of biologically inspired design, a scheme for representing knowledge of designs of biological systems, and a computational technique for automatic indexing and retrieval of biological analogues of engineering problems. Our results highlight some important issues that a case-based reasoning system must overcome to succeed in supporting biologically inspired design.

11 citations

Patent

Information extraction method and device

Lin Xinxin, Jianbo Xu, Ning Dong, Hui Wang

22 Jun 2011

TL;DR: The patent proposes a method for extracting text block information from a page file, where the text block information comprises page text block information and manuscript text block information, and for determining whether the default page text block information has been extracted.

Abstract: The embodiment of the invention discloses an information extraction method and an information extraction device, relating to the technical field of information extraction, and aims to solve the problem that, in the prior art, default text block information cannot be extracted from the page information and manuscript information of a newspaper through automatic indexing. The information extraction method disclosed by the embodiment of the invention comprises the following steps: extracting text block information from a page file, wherein the text block information comprises page text block information and manuscript text block information; judging whether the default page text block information in the text block information has been extracted; if it has not been extracted, extracting the default page text block information; and if it has been extracted, extracting the default manuscript text block information. By using the method and device disclosed by the embodiment of the invention, the workload of indexing personnel can be reduced and the accuracy of indexing can be enhanced.

7 citations

Journal Article

Construction and Application of the Chinese Unified Medical Language System

Zhu Wenyan¹

Peking Union Medical College¹

01 Jan 2011-Journal of Intelligence

TL;DR: These applications demonstrate the practicality and validity of CUMLS for developing knowledge organization and services for medical information resources in network environments.

Abstract: The Chinese Unified Medical Language System (CUMLS), which consists of three components, namely a medical vocabulary, a semantic network and lexical tools, integrates more than ten biomedical sources such as biomedical thesauri, classifications, terminologies, and text words of biomedical literature. Based on CUMLS, applications including automatic indexing, knowledge navigation, and intelligent retrieval have been realized. These applications demonstrate the practicality and validity of CUMLS for developing knowledge organization and services for medical information resources in network environments.

7 citations

Journal Article • 10.7152/ACRO.V11I1.12783

Experiments in indexing multimedia data at multiple levels.

Alejandro Jaimes¹, Ana B. Benitez¹, Corinne Joergensen², Shih-Fu Chang¹

Columbia University¹, University at Buffalo²

02 Nov 2011-Advances in Classification Research Online

TL;DR: The increasing availability of digital images, video, and audio has created exciting new research challenges in organizing multimedia data for a variety of purposes; relevant efforts include the emerging MPEG-7 standard, which aims at standardizing tools for describing multimedia data.

Abstract: The increasing availability of digital images, video, and audio has created exciting new research challenges on the organization of multimedia data for a variety of purposes. While some of these challenges relate to computational techniques (e.g., automatic extraction of visual features for automatic indexing of visual data), others are conceptual in nature (e.g., design of templates for manual indexing of visual data). The key issues are what to index from the data, how to perform the indexing of the data, and how to organize the indices obtained. The indices used to describe content as well as the organization of those indices have a tremendous impact on applications, particularly on large digital libraries where different types of media need to be stored and accessed. Relevant efforts in this direction include the emerging MPEG-7 standard [5], which aims at standardizing tools for describing multimedia data.

6 citations

Book Chapter • 10.1007/978-3-642-22191-0_20

Building Knowledge Representation for Multiple Documents Using Semantic Skolem Indexing

Kasturi Dewi Varathan¹, Tengku Mohd Tengku Sembok¹, Rabiah Abdul Kadir², Nazlia Omar¹

National University of Malaysia¹, Universiti Putra Malaysia²

27 Jun 2011

TL;DR: A new approach to creating semantic skolem indexing for multiple documents is presented, which automatically indexes all the documents into a single knowledge representation that is then used to retrieve answers to users' queries.

Abstract: The rapid growth of digital data and users’ information needs have made automatic indexing more important than before. Keyword-based indexing has proven unable to cater to current needs. Thus, this paper presents a new approach to creating semantic skolem indexing for multiple documents that automatically indexes all the documents into a single knowledge representation. The skolem indexing matrix is then incorporated into a question answering system to retrieve answers to users’ queries.

Semi-automatic Knowledge Extraction, Representation, and Context-Sensitive Intelligent Retrieval of Video Content Using Collateral Context Modelling with Scalable Ontological Networks.

Atta Badii¹, Chattun Lallah¹, Meng Zhu¹, Michael Crouch¹

University of Reading¹

1 Jan 2011

TL;DR: This chapter describes the architecture of a system designed to semi-automatically and intelligently index huge repositories of special effects video clips and uses a network of scalable ontologies to represent the semantic content to further enable intelligent retrieval.

Abstract: Automatic indexing and retrieval of digital data poses major challenges. The main problem arises from the ever-increasing mass of digital media and the lack of efficient methods for indexing and retrieval of such data based on the semantic content rather than keywords. To enable intelligent web interactions, or even web filtering, we need to be capable of interpreting the information base in an intelligent manner. For a number of years research has been ongoing in the field of ontological engineering with the aim of using ontologies to add such (meta) knowledge to information. In this paper, we describe the architecture of a system (Dynamic REtrieval Analysis and semantic metadata Management (DREAM)) designed to automatically and intelligently index huge repositories of special effects video clips, based on their semantic content, using a network of scalable ontologies to enable intelligent retrieval. The DREAM Demonstrator has been evaluated as deployed in the film post-production phase to support the process of storage, indexing and retrieval of large data sets of special effects video clips as an exemplar application domain. This paper reports its performance and usability results and highlights the scope for future enhancements of the DREAM architecture, which has proven successful in its first and possibly most challenging proving ground, namely film production, where it is already in routine use within our test bed Partners' creative processes.

Book Chapter • 10.1007/978-3-642-21916-0_76

High-performance music information retrieval system for song genre classification

Amanda C. Schierz¹, Marcin Budka¹

Bournemouth University¹

28 Jun 2011

TL;DR: A music genre classification system which was a winning solution in the Music Information Retrieval ISMIS 2011 contest is described, which consisted of a powerful ensemble classifier using the Error Correcting Output Coding coupled with an original, multi-resolution clustering and iterative relabelling scheme.

Abstract: With the large amounts of multimedia data produced, recorded and made available every day, there is a clear need for well-performing automatic indexing and search methods. This paper describes a music genre classification system, which was a winning solution in the Music Information Retrieval ISMIS 2011 contest. The system consisted of a powerful ensemble classifier using Error Correcting Output Coding, coupled with an original multi-resolution clustering and iterative relabelling scheme. The two approaches used together outperformed other competing solutions by a large margin, reaching a final accuracy of close to 88%.
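
Error Correcting Output Coding assigns each class a binary codeword, trains one binary classifier per bit, and decodes a prediction by picking the nearest codeword. The Python sketch below shows only Hamming-distance decoding with a hypothetical 4-genre codebook; the authors' ensemble, clustering, and relabelling stages are not reproduced.

```python
def hamming(a, b):
    """Number of positions where two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def ecoc_decode(bits, codebook):
    """Return the class whose codeword is closest (in Hamming distance) to the
    vector of binary-classifier outputs."""
    return min(codebook, key=lambda label: hamming(bits, codebook[label]))


# Hypothetical 4-genre codebook; each column would be one binary classifier.
# Every pair of codewords differs in at least 3 bits.
codebook = {
    "rock":  (0, 0, 1, 1, 0),
    "jazz":  (1, 0, 0, 1, 1),
    "pop":   (0, 1, 1, 0, 1),
    "metal": (1, 1, 0, 0, 0),
}
# The last classifier disagrees with "rock", but decoding still recovers it
# because every other codeword is at least 2 bits away from this output.
genre = ecoc_decode((0, 0, 1, 1, 1), codebook)
```

Because the minimum distance between codewords is 3, any single misfiring binary classifier is corrected, which is the error-correcting property the technique is named for.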

A Semi-Automatic Approach of old Arabic Documents Indexing.

Abderrahmane Kefali, Chaouki Chemmam

1 Jan 2011

TL;DR: This paper proposes a semi-automatic approach to indexing and searching images of old Arabic documents without recognizing their contents, in order to cope with the inability of recognition techniques to understand the contents of old documents.

Abstract: Indexing is a widely used technique in retrieval systems. Its goal is to extract and represent the meaning of a document so that the document can be found by the user. There are two types of indexing: manual indexing and automatic indexing. Automatic indexing requires character and word recognition engines, which work only on the texts of contemporary documents. In this paper, we propose a semi-automatic approach to indexing and searching images of old Arabic documents without recognizing their contents, in order to cope with the inability of recognition techniques to understand the contents of old documents. The proposed approach relies on representing the documents by the structural features of their indexes, which are chosen manually from each document by an expert. The approach was tested on a sample of approximately 1100 envelopes and shows good results. Keywords: indexing, old documents, structural features, document analysis

Natural ontology representation based on NP's properties and semantic relations

Nabil Khemiri, Sahbi Sidhom, Malek Ghenima, Henda Ben Ghezala

17 Feb 2011

TL;DR: An ontology is proposed to capture knowledge as noun phrases (NPs) and their semantic relations, and indexing and information retrieval processes based on NPs and their semantic representation are developed.

Abstract: In the context of the valorization of Tunisian heritage, we propose an approach to represent semantic properties of contents: heterogeneous (multimedia) information concerning heritage objects. We develop indexing and information retrieval (IR) processes based on noun phrases (NPs) and their semantic representation. These processes use natural language processing (NLP) to take into account the structural organization of NPs. In view of this study, an ontology has been proposed to capture knowledge as NPs and their semantic relations.

A network approach to topic summary and knowledge discovery in social tagging

Xin Xiang

25 Aug 2011

TL;DR: This doctoral research focuses on studying the semantic relations between social tags, items and content creators through co-occurrence analysis, social network analysis and information visualization, thus revealing the role played by social tags in representing and classifying contents and creators, and implications they might have for facilitating information seeking practice.

Abstract: As evidenced by the growing popularity of collaborative tagging sites like LibraryThing, Last.fm and del.icio.us, social tagging has provided a social and information organizing platform that warrants public attention and academic investigation alike. This doctoral research focuses on studying the semantic relations between social tags, items and content creators through co-occurrence analysis, social network analysis and information visualization, thus revealing the role played by social tags in representing and classifying contents and creators, and implications they might have for facilitating information seeking practice, particularly knowledge discovery and information summary, and as a result, helping the design of information retrieval and browsing interfaces. User-oriented studies are conducted to evaluate the advantage of visual and presentational features based on tagging analysis over existing constructs such as tag clouds in performing high-level information seeking tasks. The social tagging paradigm is widely considered an extension beyond keyword-based indexing and hierarchical classification schemes. The new massive manual indexing method characterized by social tagging differs from the automatic indexing that lays the foundation of modern information retrieval in that its manual nature obviates the common pitfalls of computer-based automatic indexing. It also complements traditional manual indexing, since tag word distribution reflects the opinions of a large number of people with varied backgrounds and knowledge instead of a limited number of domain experts who are dominant in classification and cataloging undertakings. Parallel to the observation that an individual’s social identity is defined by the collectivities to which the individual belongs, the topical, temporal, geographic, and stylistic features
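
The co-occurrence analysis mentioned above starts from counting how often two tags are applied to the same item; those counts become edge weights for the social network analysis. A minimal Python sketch, assuming a simple `{item: tags}` mapping with hypothetical data (the function name `tag_cooccurrence` is illustrative):

```python
from collections import Counter
from itertools import combinations


def tag_cooccurrence(item_tags):
    """Count how often each unordered pair of tags is assigned to the same item.

    item_tags: {item_id: iterable of tags}. Returns a Counter of (tag_a, tag_b)
    pairs with tag_a < tag_b; the counts can serve as edge weights in a tag network.
    """
    pairs = Counter()
    for tags in item_tags.values():
        # Deduplicate and sort so each pair has one canonical orientation.
        for a, b in combinations(sorted(set(tags)), 2):
            pairs[(a, b)] += 1
    return pairs


items = {
    "book1": ["fantasy", "dragons", "fiction"],
    "book2": ["fantasy", "fiction"],
    "book3": ["history", "fiction"],
}
pairs = tag_cooccurrence(items)
```

Here the strongest edge is ("fantasy", "fiction") with weight 2; the same counting applied to tag-creator or item-creator pairs would give the other relations the study examines.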

LOHAI: Providing a baseline for KOS based automatic indexing

Kai Eckert¹

University of Mannheim¹

1 Jan 2011

TL;DR: This work proposes a straightforward linguistic indexer that can be used as a basis for one's own developments and for experiments and analyses exploring one's own documents and KOSs; it uses state-of-the-art information retrieval techniques and hence forms a suitable baseline for evaluations.

Abstract: Automatic KOS-based indexing (i.e., indexing based on a restricted, controlled vocabulary, a thesaurus or a classification) can play an important role in closing the gap between intellectually indexed, high-quality publications and the mass of unindexed publications. Especially for unknown, heterogeneous publications, like web publications, simple processes that do not rely on manually created training data are needed. With this contribution, we propose a straightforward linguistic indexer that can be used as a basis for one's own developments and for experiments and analyses exploring one's own documents and KOSs; it uses state-of-the-art information retrieval techniques and hence forms a suitable baseline for evaluations. Finally, it is free and open source.
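
The core idea of KOS-based indexing, mapping text to controlled-vocabulary concepts through their labels, can be sketched as an exact n-gram lookup. This Python sketch is not LOHAI's code: it does no lemmatization or weighting beyond match counts, and the labels and concept ids are hypothetical.

```python
import re
from collections import Counter


def kos_index(text, labels, max_ngram=3):
    """Assign KOS concepts to a text by exact matching of thesaurus labels.

    labels: {lowercased preferred or alternative label: concept_id}
    Returns a Counter of concept ids weighted by match frequency.
    Overlapping matches are all counted (e.g. both "information retrieval"
    and "retrieval" if both are labels).
    """
    tokens = re.findall(r"\w+", text.lower())
    hits = Counter()
    for n in range(1, max_ngram + 1):
        for i in range(len(tokens) - n + 1):
            concept = labels.get(" ".join(tokens[i:i + n]))
            if concept is not None:
                hits[concept] += 1
    return hits


labels = {"information retrieval": "C1", "ir": "C1",
          "indexing": "C2", "thesaurus": "C3"}
hits = kos_index("Indexing supports information retrieval; IR systems use indexing.", labels)
```

Mapping both preferred and alternative labels ("information retrieval", "IR") to one concept id is what lets a thesaurus normalize vocabulary; a real indexer would add lemmatization so inflected forms also match.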

Fuzzy Recognition Method for Fish Ontology Retrieving

Zheng Xitao (郑西涛), Zhang Yongwei (张永伟)

1 Jan 2011

TL;DR: This paper presents a new method based on ontology formation and fuzzy recognition of digital pictures, which enables automatic creation of a fish geometric ontology and automatic indexing into the existing Semantic Web.

Abstract: This paper presents a new method based on ontology formation and fuzzy recognition of digital pictures. Ontology creation and document indexing are well-known bottlenecks for integrating semantic services and for the Semantic Web, and thus the new method enables automatic creation of a fish geometric ontology and automatic indexing into the existing Semantic Web. Fuzzy sets and fuzzy recognition are used to decide whether a new fish picture belongs to an existing training set, using the carp as an example. Training samples are used to set up fuzzy sets and membership functions. The existing approach to fish ontology formation can be integrated with the new method, and existing work on the fish web can be reused.
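
The abstract does not give the membership functions used. The Python sketch below assumes Gaussian memberships fitted per feature from training samples, combined with a fuzzy AND (minimum), and a hypothetical acceptance threshold of 0.5. The two geometric features (e.g. a body length/height ratio and a relative head length) and all values are invented for illustration.

```python
import math
from statistics import mean, pstdev


def build_memberships(samples):
    """Fit one Gaussian membership function (mu, sigma) per feature.

    samples: list of equal-length feature tuples from training pictures.
    """
    params = []
    for column in zip(*samples):
        sigma = pstdev(column) or 1e-6  # avoid zero width for constant features
        params.append((mean(column), sigma))
    return params


def membership(x, params):
    """Fuzzy AND (minimum) of the per-feature Gaussian memberships."""
    return min(math.exp(-((xi - mu) ** 2) / (2 * sigma ** 2))
               for xi, (mu, sigma) in zip(x, params))


def belongs(x, params, threshold=0.5):
    """Decide whether a new picture's features belong to the trained class."""
    return membership(x, params) >= threshold


# Hypothetical carp training features: (length/height ratio, relative head length).
train = [(2.5, 0.30), (2.6, 0.32), (2.4, 0.28)]
params = build_memberships(train)
```

A picture whose features match the training mean gets membership near 1 and is accepted; one with a length/height ratio of 3.5 falls far outside the fitted width and is rejected.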

Journal Article • 10.5539/CIS.V4N2P125

Study on the Optimization Design of the Subject Indexing Based on the Word-frequency Statistics

Huafeng Xie, Fang Wu, Xuying Lu

11 Feb 2011-Computer and Information Science

TL;DR: A new weighting function is established in this article, comprehensively combining with four important factors such as the weight value of subject words, the classes, the specificity, and the cohesion relation to standardize the indexing of the subject words of the official document.

Abstract: Based on the traditional word-frequency statistical function, a new weighting function is established in this article that comprehensively combines four important factors: the weight value of subject words, the classes, the specificity, and the cohesion relation. This new method could standardize the indexing of the subject words of official documents, enhance work efficiency, realize automatic indexing, and reduce mistakes caused by personal factors. In addition, the program design and the computer-language implementation of this method are also introduced in this article.
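
The article's actual weighting formula is not given here, so the Python sketch below only illustrates the general shape of combining the four named factors into one score. It assumes a linear combination, inputs already normalized to [0, 1], and placeholder coefficients; none of these come from the article, and the function names and sample words are hypothetical.

```python
def subject_word_weight(freq_weight, class_factor, specificity, cohesion,
                        coefficients=(0.4, 0.2, 0.2, 0.2)):
    """Combine the four factors into one score (hypothetical linear form).

    All inputs are assumed to be pre-normalized to [0, 1]; the coefficients
    are illustrative placeholders, not values from the article.
    """
    factors = (freq_weight, class_factor, specificity, cohesion)
    return sum(c * f for c, f in zip(coefficients, factors))


def select_subject_words(candidates, top_k=3):
    """candidates: {word: (freq_weight, class_factor, specificity, cohesion)}."""
    scored = {w: subject_word_weight(*f) for w, f in candidates.items()}
    return sorted(scored, key=lambda w: -scored[w])[:top_k]


candidates = {
    "budget": (0.9, 0.8, 0.6, 0.7),       # score 0.78
    "meeting": (0.7, 0.2, 0.1, 0.3),      # score 0.40
    "procurement": (0.5, 0.9, 0.9, 0.8),  # score 0.72
}
```

The example shows the intended effect of multi-factor weighting: "meeting" is frequent but generic, so its low class, specificity, and cohesion values push it below the less frequent "procurement".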

10.34917/2255264

Topic detection and tracking using hidden Markov models

Aditya Sowmya Tatavarty

1 Jan 2011

Journal Article • 10.4028/WWW.SCIENTIFIC.NET/AMR.403-408.817

Automatic Language-Independent Indexing of Documents Using Image Processing

Aishanou Osha Rait¹, K.S. Venkatesh²

Birla Institute of Technology and Science¹, Indian Institutes of Technology²

01 Nov 2011-Advanced Materials Research

TL;DR: Using the horizontal and vertical white space present in any document, independent regions are identified; inherent characteristic disparities are then used to distinguish pictures from text, and section headings from the explanations that follow them, and the method was verified to be language independent.

Abstract: Image processing techniques have been used over the years to convert printed material into electronic form. In our work we exploit the fact that some applications may find such conversion redundant and can still satisfactorily meet the demands of the end user without it. Using the horizontal and vertical white space present in any document, independent regions of text, pictures, tables, etc. could be identified. Inherent characteristic disparities were then used to distinguish pictures from text, and section headings from the explanations that follow them. A table of contents, showing each heading and the associated page number, was generated and displayed in the browser. Each heading was hyperlinked to the corresponding page of the original document. HTML code was written dynamically, using file handling techniques in MATLAB, to accommodate the variable number of headings obtained for different documents and for different pages of a single document. The platform thus developed was tested on various languages, and it was verified that the implemented method was language independent.
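
Locating regions from white space is typically done with projection profiles: scan rows (or columns) of a binarized page and split wherever enough consecutive blank lines occur. The Python sketch below assumes a binary image given as nested lists (1 = ink) and a hypothetical `min_gap` parameter; the original work was done in MATLAB, and this is not its code.

```python
def row_segments(image, min_gap=1):
    """Find horizontal bands of ink separated by runs of blank rows.

    image: list of rows, each a list of 0 (white) / 1 (ink) pixels.
    Returns (start, end) row ranges, end exclusive. Bands separated by fewer
    than min_gap blank rows are merged into one band.
    """
    segments = []
    start = None
    last_ink = -1
    gap = 0
    for i, row in enumerate(image):
        if any(row):
            if start is None:
                start = i
            last_ink = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                segments.append((start, last_ink + 1))
                start = None
    if start is not None:
        segments.append((start, last_ink + 1))
    return segments


def column_segments(image, min_gap=1):
    """Same analysis on vertical white space, via the transposed image."""
    return row_segments([list(col) for col in zip(*image)], min_gap)


page = [[0, 0], [1, 0], [1, 1], [0, 0], [0, 0], [0, 1], [0, 0]]
print(row_segments(page, min_gap=2))  # [(1, 3), (5, 6)]
```

Applying row and column segmentation recursively yields candidate blocks; distinguishing headings from body text or pictures would then rely on further block properties, as the abstract describes.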

Journal Article • 10.9717/KMMS.2011.14.8.1050

Efficient Storage and Retrieval for Automatic Indexing of Persons in Videos

Jinseung Kim, Yongkoo Han, Young-Koo Lee

31 Aug 2011-Journal of Korea Multimedia Society

TL;DR: An efficient method for storing posting lists and a novel technique for ranking relevant videos for efficient retrieval are proposed.

Abstract: With the increasing need to index persons in large video databases, automatic indexing, which relies on automatic tagging instead of time-consuming and costly manual tagging, has been attracting great interest. However, an automatic indexing approach should provide a degree of recognition proximity, because it cannot identify persons with 100% accuracy. In this paper, we propose an efficient method for storing posting lists and a novel technique for ranking relevant videos for efficient retrieval. Through experimental evaluation we have shown that our storage method performs well in compressing the posting lists. We have also shown that the proposed ranking method is effective for finding relevant videos.
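
The abstract does not describe the paper's compression scheme. As a point of reference, here is a Python sketch of a standard posting-list technique swapped in for illustration: store gaps between sorted ids and encode each gap with variable-byte coding, so small gaps take a single byte. The function names are hypothetical.

```python
def vbyte_encode(doc_ids):
    """Variable-byte encode a non-decreasing list of non-negative ids as gaps."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        gap = doc_id - prev
        if gap < 0:
            raise ValueError("doc_ids must be sorted in non-decreasing order")
        prev = doc_id
        chunk = []
        while True:
            chunk.append(gap & 0x7F)  # low 7 bits first
            gap >>= 7
            if gap == 0:
                break
        chunk[0] |= 0x80  # flag the least-significant byte, emitted last
        out.extend(reversed(chunk))
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode: rebuild gaps, then prefix-sum them into ids."""
    ids, n, prev = [], 0, 0
    for b in data:
        n = (n << 7) | (b & 0x7F)
        if b & 0x80:  # last byte of this gap
            prev += n
            ids.append(prev)
            n = 0
    return ids


ids = [3, 130, 131, 20000]
encoded = vbyte_encode(ids)  # gaps 3, 127, 1, 19869 -> 1 + 1 + 1 + 3 = 6 bytes
```

Gap encoding pays off because posting lists for frequent persons are dense, so most gaps are small; a per-posting recognition score could be stored alongside each id to support the ranking step.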

Journal Article • 10.1002/ASI.21451

The importance of theories of knowledge: Indexing and information retrieval as an example

Birger Hjørland

01 Jan 2011-Journal of the Association for Information Science and Technology

TL;DR: The present article uses L&E as the point of departure for demonstrating in what way more social and interpretative understandings may provide fruitful improvements for research in indexing, knowledge organization, and information retrieval.

Abstract: A recent study in information science (IS), Lykke and Eslau (2010; hereafter L&E), raises important issues concerning the value of human indexing and basic theories of indexing and information retrieval, as well as the use of quantitative and qualitative approaches in IS and the underlying theories of knowledge informing the field. The present article uses L&E as the point of departure for demonstrating in what way more social and interpretative understandings may provide fruitful improvements for research in indexing, knowledge organization, and information retrieval. The article is motivated by the observation that philosophical contributions tend to be ignored in IS if they are not directly formed as criticisms or invitations to dialog. It is part of the author's ongoing publication of articles about philosophical issues in IS, and it is intended to be followed by analyses of other examples of contributions to core issues in IS. Although it is formulated as a criticism of a specific paper, it should be seen as part of a general discussion of the philosophical foundation of IS and as support for the emerging social paradigm in this field. © 2011 Wiley Periodicals, Inc.
