Top 35 papers published on the topic of Automatic indexing in 2003

Book Chapter•10.1007/978-3-540-39718-2_53•

KIM: semantic annotation platform

Borislav Popov¹, Atanas Kiryakov¹, Angel Kirilov¹, Dimitar Manov¹, Damyan Ognyanoff¹, Miroslav Goranov¹•Institutions (1)

Ontotext¹

20 Oct 2003

TL;DR: The KIM platform allows KIM-based applications to use it for automatic semantic annotation, content retrieval based on semantic restrictions, and querying and modifying the underlying ontologies and knowledge bases.

Abstract: The KIM platform provides a novel Knowledge and Information Management infrastructure and services for automatic semantic annotation, indexing, and retrieval of documents. It provides mature infrastructure for scalable and customizable information extraction (IE) as well as annotation and document management, based on GATE. In order to provide a basic level of performance and allow easy bootstrapping of applications, KIM is equipped with an upper-level ontology and a knowledge base providing extensive coverage of entities of general importance. The ontologies and knowledge bases involved are handled using cutting-edge Semantic Web technology and standards, including RDF(S) repositories, ontology middleware and reasoning. From a technical point of view, the platform allows KIM-based applications to use it for automatic semantic annotation, content retrieval based on semantic restrictions, and querying and modifying the underlying ontologies and knowledge bases. This paper presents the KIM platform, with emphasis on its architecture, interfaces, tools, and other technical issues.

322 citations

Patent•

Method and apparatus for fast metadata generation, delivery and access for live broadcast program

Sanghoon Sull, Hyeokman Kim, Ja-Cheon Yoon, Min Gyo Chung

19 Feb 2003

TL;DR: Techniques for fast indexing of live video broadcasts are provided that combine efficient manual processing with automatic indexing steps to generate a semantically meaningful and practically usable highlight hierarchy of broadcast television programs in real time.

Abstract: Techniques for fast indexing of live video broadcasts are provided which incorporate both efficient manual processing and automatic indexing steps to generate a semantically meaningful and practically usable highlight hierarchy of broadcast television programs in real time. In one technique, a list of predefined keywords describing the highlights is provided, and the manual marking process can be implemented by just a few mouse clicks. A technique is provided for grouping highlights into a semantic hierarchy in real time. A technique is provided for efficiently generating highlight metadata on live broadcast programs, using a coarse-to-fine indexing methodology so that an operator can quickly generate highlight summaries of live broadcast programs.

307 citations

Journal Article•10.1109/JOE.2003.819314•

Automatic indexing of underwater survey video: algorithm and benchmarking method

K. Lebart¹, Caleb Smith, Emanuele Trucco, David M. Lane•Institutions (1)

Heriot-Watt University¹

01 Oct 2003-IEEE Journal of Oceanic Engineering

TL;DR: A methodology is presented for evaluating, on real data, the performance of a system that automatically detects critical parts of underwater video, online or during post-mission tape analysis; an existing context-change detection algorithm is studied and benchmarked on real underwater data as an illustration.

Abstract: It is often the case that only a few sparse sequences of long videos from scientific underwater surveys actually contain important information for the expert. Locating such sequences is time consuming and tedious. A system that automatically detects those critical parts, online or during post-mission tape analysis, would alleviate the expert workload and improve data exploitation. In this paper, a methodology for evaluating the performance of such a system on real data is presented. Interesting sequences are started by changes of visual context. An algorithm to detect significant context changes in benthic videos in real time has been presented by Lebart et al. in 2000. It is used as an illustration for this methodology - its performance is studied and benchmarked on real underwater data, ground truthed by an expert biologist. Various issues relating to the complexity of the problems of automatically analyzing underwater video are also discussed.

52 citations

Journal Article•10.1016/S1386-5056(03)00055-8•

Automatic concept extraction from spoken medical reports.

André Happe, Bruno Pouliquen¹, Anita Burgun, Marc Cuggia, Pierre Le Beux•Institutions (1)

International Practical Shooting Confederation¹

01 Jul 2003-International Journal of Medical Informatics

TL;DR: A combination of speech recognition and automated indexing methods substitutes for current transcription and indexing practices, showing the potential benefits of combining the two techniques.

44 citations

Journal Article•10.1093/BIOINFORMATICS/BTG010•

Identification of key concepts in biomedical literature using a modified Markov heuristic.

William H. Majoros¹, Gangadharan Subramanian, Mark Yandell²•Institutions (2)

J. Craig Venter Institute¹, Howard Hughes Medical Institute²

12 Feb 2003-Bioinformatics

TL;DR: Employing prior knowledge during the HMM training procedure for the tagger, combined with appropriate training data, can greatly improve the quality and relevance of automatically extracted noun phrases, thereby enabling greater accuracy in downstream literature mining tasks.

Abstract: Motivation: The recent explosion of interest in mining the biomedical literature for associations between defined entities such as genes, diseases and drugs has made apparent the need for robust methods of identifying occurrences of these entities in biomedical text. Such concept-based indexing is strongly dependent on the availability of a comprehensive ontology or lexicon of biomedical terms. However, such ontologies are very difficult and expensive to construct, and often require extensive manual curation to render them suitable for use by automatic indexing programs. Furthermore, the use of statistically salient noun phrases as surrogates for curated terminology is not without difficulties, due to the lack of high-quality part-of-speech taggers specific to medical nomenclature. Results: We describe a method of improving the quality of automatically extracted noun phrases by employing prior knowledge during the HMM training procedure for the tagger. This enhancement, when combined with appropriate training data, can greatly improve the quality and relevance of the extracted phrases, thereby enabling greater accuracy in downstream literature mining tasks.

26 citations

Book Chapter•10.1007/978-3-540-45115-0_6•

Reducing Information Variation in Text

Agata Savary, Christian Jacquemin¹•Institutions (1)

Centre national de la recherche scientifique¹

01 Jan 2003-Lecture Notes in Computer Science

TL;DR: A review of natural language processing techniques for term acquisition and automatic indexing is given, along with an in-depth presentation of FASTR, a corpus processor for the recognition, normalization, and acquisition of multi-word terms.

Abstract: We discuss the nature and the scope of linguistic (morphological, syntactic and semantic) variation of terms and its impact on two information retrieval tasks: term acquisition and automatic indexing. A review of natural language processing techniques existing in these two areas is given, along with an in-depth presentation of FASTR, a corpus processor for the recognition, normalization, and acquisition of multi-word terms.

24 citations

Journal Article•10.1016/S0306-4573(02)00081-X•

Experiments in discourse analysis impact on information classification and retrieval algorithms

Jorge Morato¹, Juan Llorens¹, Gonzalo Génova¹, José Antonio Moreiro¹•Institutions (1)

Charles III University of Madrid¹

01 Nov 2003-Information Processing and Management

TL;DR: To check whether discourse variables have an impact on modern information retrieval and classification algorithms, a functional framework for information analysis in an automated environment is proposed; the results show that the n-grams algorithm has no clear dependence on discourse variables, whereas the k-means classification algorithm does, but only on domain terminology and document structure.

Abstract: Researchers in indexing and retrieval systems have been advocating the inclusion of more contextual information to improve results. The proliferation of full-text databases and advances in computer storage capacity have made it possible to carry out text analysis by means of linguistic and extralinguistic knowledge. Since the mid-1980s, research has tended to pay more attention to context, giving discourse analysis a more central role. The research presented in this paper aims to check whether discourse variables have an impact on modern information retrieval and classification algorithms. In order to evaluate this hypothesis, a functional framework for information analysis in an automated environment has been proposed, where the n-grams (filtering) and the k-means and Chen's classification algorithms have been tested against sub-collections of documents based on the following discourse variables: "Genre", "Register", "Domain terminology", and "Document structure". The results obtained with the algorithms for the different sub-collections were compared to the MeSH information structure. These demonstrate that the n-grams algorithm does not appear to have a clear dependence on discourse variables, that the k-means classification algorithm does, but only on domain terminology and document structure, and that Chen's algorithm has a clear dependence on all of the discourse variables. This information could be used to design better classification algorithms, where discourse variables should be taken into account. Other minor conclusions drawn from these results are also presented.

22 citations

Journal Article•10.1080/02286203.2003.11442267•

Arabic Text Data Mining: a Root-Based Hierarchical Indexing Model

Taisir Eldos¹•Institutions (1)

Jordan University of Science and Technology¹

01 Jan 2003-International Journal of Modelling and Simulation

TL;DR: This article focuses on speeding up the information retrieval process in Arabic document bases by using a root-based hierarchical indexing model, and results demonstrate that a speed gain in the range of 50-100 can be achieved for typical queries.

Abstract: The world has recently witnessed a tremendous growth in the volume of text documents available on the Internet, digital libraries, news sources, and company-wide intranets. Text data mining, as a m...

22 citations

Journal Article•10.1109/MIS.2003.1179194•

Intelligent indexing of crime scene photographs

Katerina Pastra¹, Horacio Saggion¹, Yorick Wilks¹•Institutions (1)

University of Sheffield¹

01 Jan 2003-IEEE Intelligent Systems

TL;DR: The Scene of Crime Information System's automatic image-indexing prototype goes beyond extracting keywords and syntactic relations from captions and applies advanced natural language processing techniques to text-based image indexing and retrieval to tackle crime investigation needs effectively and efficiently.

Abstract: The Scene of Crime Information System's automatic image-indexing prototype goes beyond extracting keywords and syntactic relations from captions. The semantic information it gathers gives investigators an intuitive, accurate way to search a database of cases for specific photographic evidence. Intelligent, automatic indexing and retrieval of crime scene photographs is one of the main functions of SOCIS, our research prototype developed within the Scene of Crime Information System project. The prototype, now in its final development and evaluation phase, applies advanced natural language processing techniques to text-based image indexing and retrieval to tackle crime investigation needs effectively and efficiently.

20 citations

Conceptual structures and computational methods for indexing and organization of visual information

Shih-Fu Chang, Alejandro Jaimes

1 Jan 2003

TL;DR: The approach is based on a multiple strategy that combines knowledge about the geometry of multiple views of the same scene, the extraction of low-level features, the detection of objects using the VA and domain knowledge.

Abstract: We address the problem of automatic indexing and organization of visual information through user interaction at multiple levels. Our work focuses on the following three important areas: (1) understanding of visual content and the way users search and index it; (2) construction of flexible computational methods that learn how to automatically classify images and videos from user input at multiple levels; (3) integration of generic visual detectors in solving practical tasks in the specific domain of consumer photography. In particular, we present the following: (1) novel conceptual structures for classifying visual attributes (the Multi-Level Indexing Pyramid); (2) a novel framework for learning structured visual detectors from user input (the Visual Apprentice); (3) a new study of human eye movements in observing images of different visual categories; (4) a new framework for the detection of non-identical duplicate consumer photographs in an interactive consumer image organization system; (5) a detailed study of duplicate consumer photographs. In the Visual Apprentice (VA), first a user defines a model via a multiple-level definition hierarchy (a scene consists of objects, object-parts, etc.). Then, the user labels example images or videos based on the hierarchy (a handshake image contains two faces and a handshake) and visual features are extracted from each example. Finally, several machine learning algorithms are used to learn classifiers for different nodes of the hierarchy. The best classifiers and features are automatically selected to produce a Visual Detector (e.g., for a handshake), which is applied to new images or videos. In the human eye tracking experiments we examine variations in the way people look at images within and across different visual categories and explore ways of integrating eye tracking analysis with the VA framework. Finally, we present a novel framework for the detection of non-identical duplicate consumer images for systems that help users automatically organize their collections. Our approach is based on a multiple strategy that combines knowledge about the geometry of multiple views of the same scene, the extraction of low-level features, the detection of objects using the VA and domain knowledge.

20 citations

Proceedings Article•

Cross-language MeSH indexing using morpho-semantic normalization.

Kornél G. Markó¹, Philipp Daumke, Stefan Schulz, Udo Hahn•Institutions (1)

University of Freiburg¹

1 Jan 2003

TL;DR: The morphological segmentation and normalization procedures, as well as the mappings from subwords to MeSH terms, are described, and results from an evaluation carried out on a German-language corpus are discussed.

Abstract: We consider three alternative procedures for the automatic indexing of medical documents using MeSH thesaurus identifiers as target units (document descriptors). Rather than considering complete words as the starting point of the indexing procedure, we here propose morphologically plausible subwords as basic units from which MeSH terms are derived. We describe the morphological segmentation and normalization procedures, as well as the mappings from subwords to MeSH terms, and discuss results from an evaluation carried out on a German-language corpus.

Proceedings Article•10.3115/1067737.1067769•

NLP for indexing and retrieval of captioned photographs

Katerina Pastra¹, Horacio Saggion¹, Yorick Wilks¹•Institutions (1)

University of Sheffield¹

12 Apr 2003

TL;DR: The research prototype, SOCIS, goes beyond keyword-based approaches and methods that extract syntactic relations from captions; it relies on advanced Natural Language Processing techniques in order to extract relational facts.

Abstract: We present a text-based approach for the automatic indexing and retrieval of digital photographs taken at crime scenes. Our research prototype, SOCIS, goes beyond keyword-based approaches and methods that extract syntactic relations from captions; it relies on advanced Natural Language Processing techniques in order to extract relational facts. These relational facts consist of a "pragmatic relation" and the entities this relation connects (triples of the form: ARG1-REL-ARG2). In SOCIS, the triples are used as complex image indexing terms; however, the extraction mechanism is used not only for indexing purposes but also for image retrieval using free text queries. The retrieval mechanism computes similarity scores between query-triples and indexing-triples, making use of a domain-specific ontology.
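
The idea of scoring query-triples against indexing-triples can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Triple` type, the toy `PARENT` ontology, and the scoring weights are hypothetical and are not taken from SOCIS, whose actual similarity measure is not described in the abstract.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """An ARG1-REL-ARG2 relational fact."""
    arg1: str
    rel: str
    arg2: str


# Hypothetical toy ontology: each term maps to its parent concept.
PARENT = {"knife": "weapon", "gun": "weapon", "table": "furniture"}


def term_similarity(a: str, b: str) -> float:
    """Return 1.0 for identical terms, 0.5 for ontology neighbours, else 0.0."""
    if a == b:
        return 1.0
    pa, pb = PARENT.get(a), PARENT.get(b)
    if pa is not None and pa == pb:
        return 0.5  # siblings under the same parent concept
    if pa == b or pb == a:
        return 0.5  # one term is the parent of the other
    return 0.0


def triple_similarity(query: Triple, indexed: Triple) -> float:
    """Average the similarity of the two arguments and the relation match."""
    rel_score = 1.0 if query.rel == indexed.rel else 0.0
    arg_score = (term_similarity(query.arg1, indexed.arg1)
                 + term_similarity(query.arg2, indexed.arg2))
    return (arg_score + rel_score) / 3


def rank(query: Triple, index: list) -> list:
    """Sort (image_id, triple) pairs by descending similarity to the query."""
    return sorted(index, key=lambda item: triple_similarity(query, item[1]),
                  reverse=True)
```

Calling `rank(Triple("knife", "on", "table"), index)` on a list of `(image_id, Triple)` pairs returns the images whose indexing triples are closest to the query first.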

Journal Article•10.1007/BF03042324•

PAI: automatic indexing for extracting asserted keywords from a document

Naohiro Matsumura¹, Yukio Ohsawa², Mitsuru Ishizuka¹•Institutions (2)

University of Tokyo¹, University of Tsukuba²

01 Feb 2003-New Generation Computing

TL;DR: An automatic indexing method named PAI (Priming Activation Indexing) is proposed that extracts keywords expressing the author's main point from a document based on the priming effect, without using a corpus, thesaurus, syntactic analysis, dependency relations between terms, or any other knowledge except a stop-word list.

Abstract: This paper proposes an automatic indexing method named PAI (Priming Activation Indexing) that extracts keywords expressing the author's main point from a document based on the priming effect. The basic idea is that since the author writes a document emphasizing his/her main point, impressive terms born in the mind of the reader could represent the asserted keywords. Our approach employs a spreading activation model without using a corpus, a thesaurus, syntactic analysis, dependency relations between terms or any other knowledge except for a stop-word list. Experimental evaluations are reported by applying PAI to journal/conference papers.

Proceedings Article•10.1109/ASRU.2003.1318508•

Automatic indexing of multimedia content by integration of audio, spoken language, and visual information

Katsutoshi Ohtsuki¹, Katsuji Bessho¹, Matsuo Yoshihiro¹, Shoichi Matsunaga¹, Yoshihiko Hayashi¹•Institutions (1)

Nippon Telegraph and Telephone¹

30 Nov 2003

TL;DR: Experimental results show that topic segmentation using word conceptual vectors is superior to the conventional method using local word co-occurrence frequencies, and that the integrated segmentation provides better news story structures than would be possible with any single type of information.

Abstract: This paper describes an automatic multimedia content indexing system that includes acoustic segmentation, automatic speech recognition, topic segmentation, and video indexing features. The system is intended for indexing of multimedia news programs. Speech segments extracted from news content are delivered to the speech recognition module. The speech recognition result is segmented into topics using a segmentation algorithm based on word conceptual vectors. The indexing results derived from audio and speech information are integrated with video indexing results to extract the story structure. Experimental results show that topic segmentation using word conceptual vectors is superior to the conventional method using local word co-occurrence frequencies, and that the integrated segmentation provides better news story structures than would be possible with any single type of information.

Audio Indexing on the Web: a Preliminary Study of Some Audio Descriptors

Nathalie Parlangeau-Vallès, Jérôme Farinas, Dominique Fohr, Irina Illina, Ivan Magrin-Chagnolleau¹, Odile Mella, Julien Pinquier, Jean-Luc Rouas, Christine Sénac•Institutions (1)

Laboratoire Dynamique du Langage¹

1 Jul 2003

TL;DR: This paper presents an overview and recent results of the RAIVES project, a French research project on audio indexing, covering speech/music segmentation, speaker tracking, and keyword detection.

Abstract: The "Invisible Web" is composed of documents which cannot currently be accessed by Web search engines, because they have a dynamic URL or are not textual, like video or audio documents. For audio documents, one solution is automatic indexing. It consists in finding good descriptors of audio documents which can be used as indexes for archiving and search. This paper presents an overview and recent results of the RAIVES project, a French research project on audio indexing. We present speech/music segmentation, speaker tracking, and keyword detection. We also give a few perspectives on the RAIVES project.

Journal Article•

Language interpretation and generation for football commentary

Anton Nijholt, H.J.A. op den Akker, F.M.G. (Franciska) de Jong

20 Jan 2003-CTIT technical reports series

TL;DR: A survey of research efforts that all deal with football commentary but differ in technology focus: retrieval, interpretation, or generation of commentary, and related (but not necessarily language-oriented) research.

Abstract: Our interest in the computer processing of football commentary was first prompted by the EU/IST funded project MUMIS (Multimedia Indexing and Searching) that started in 2000 [11, 18, 19]. This project's objective is to develop technology for automatic indexing of multimedia programme material (texts, news streams, speech transcripts) and to develop a user interface that supports the conceptual querying and browsing of related video content over the internet. One of the innovative features of MUMIS is that it aims at the disclosure of video archives by applying information extraction techniques originally developed for the textual domain. Information extraction is a technique that is typically suited for content in specific domains. In MUMIS the extraction technology is applied to the domain of football. The project requires the integration of lexicons, ontology and information extraction tools for this domain, and the development of merging algorithms to integrate the (incomplete) information coming from different sources. For example, the information extraction components should be able to extract some thirty different event types, using methods such as part-of-speech tagging, syntactic parsing, semantic tagging, and discourse analysis. Typical football events to be detected are: kick-off, penalty, goal, halftime, free-kick, etc. The user interface should help users to formulate queries that can be matched against the annotations generated by the extraction component and linked to the time-codes of the corresponding video fragments. In parallel to the MUMIS project we surveyed a number of research efforts that all deal with football commentary but for which the technology focus differs: retrieval, interpretation or generation of commentary and related (but not necessarily language-oriented) research. There were two reasons for this investigation. One obvious reason is to find out how the new technology and tools can be adapted to similar applications. The other is to see how the domain knowledge obtained for extraction purposes can be employed for other intelligent applications in the same domain. Since at the same time several of our M.Sc. students got interested in learning multi-agent systems for developing teams for the RoboCup leagues, we decided to stick close to the football domain in this survey as well. In this paper we start with the survey of the domain-specific research. The aim of the survey was to obtain a comprehensive view of the field that could guide the selection of new research themes. In the second part of the paper we give a short introduction to the MUMIS project as it can be embedded in the general football-related language technology research.

Archival moving imagery in the digital environment

C.J. Sandom, P.G.B. Enser¹•Institutions (1)

University of Brighton¹

1 Jan 2003

TL;DR: It may be determined that digitisation and automatic indexing and retrieval techniques do not at present offer an alternative to the textual subject descriptive process necessary for access to information stored in the form of moving imagery.

Abstract: Moving image media record much of the history of the twentieth century, and as such form an important aspect of our cultural heritage. Although potentially of great importance to both the education and commercial sectors, much of this store of knowledge is not accessible, because its content is not documented. Digitisation is being considered as a means of making historic footage more accessible by allowing moving imagery to be displayed via the Internet. Further, digitisation of still and moving imagery opens the possibility of relieving the time-consuming and expensive process of descriptive cataloguing, by using automated indexing and retrieval techniques, based on the physical attributes present in the imagery, such as colour, texture, shapes, spatial and spatiotemporal distribution. These techniques, developed by the computer science community, are generically known as Content Based Image Retrieval (CBIR). But will this type of image retrieval answer moving image archive users' information requirements? A project is being undertaken which researches the information needs of users of such archives; one of the objectives of this project is to determine whether CBIR techniques can be used to answer these requirements. An analysis of requests for moving image footage received by eleven representative film collections determined that nearly 70% of the requests were for footage of a uniquely named person, group, place, event or time, and in many cases a combination of several of these facets. These are data that need to be documented in words. From this and other analyses, it may be determined that digitisation and automatic indexing and retrieval techniques do not at present offer an alternative to the textual subject descriptive process necessary for access to information stored in the form of moving imagery.

Journal Article•10.1515/ABITECH.2003.23.4.305•

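OPAC-Erweiterung durch automatische Indexierung: Empirische Untersuchungen mit Daten aus dem Österreichischen Verbundkatalog [OPAC enhancement through automatic indexing: empirical studies with data from the Austrian Union Catalogue]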

Otto Oberhauser, Josef Labner

01 Jan 2003-ABI Technik

TL;DR: An empirical investigation was conducted that aimed at assessing and evaluating the use of automatic indexing for the OPACs of the Austrian Library Network and the results include an increase of relevant hits at only moderately lower precision, the reduction of zero-hit results and insights into the role of existing subject headings.

Abstract: In the 1990s the German MILOS projects examined the suitability of an automatic linguistic indexing technique for library OPACs. Following this approach, an empirical investigation was conducted that aimed at assessing and evaluating the use of automatic indexing for the OPACs of the Austrian Library Network. As most users prefer to do their OPAC searches in the basic index, the study focused on the effects of enriching this index with automatically generated terms. For this purpose, an Aleph 500 OPAC consisting of a representative random sample of records drawn from the Austrian Union Catalogue was used for searching 100 queries in the basic index before and after adding the new index terms. The results include an increase of relevant hits at only moderately lower precision, the reduction of zero-hit results and insights into the role of existing subject headings.

Patent•

Indexer for automatic indexing milling machine

Wang Tienan

25 Jun 2003

TL;DR: The indexing device of an automatic indexing milling machine is described, consisting mainly of a drive incomplete gear and a driven incomplete gear that engage each other.

Abstract: The utility model discloses the indexing device of an automatic indexing milling machine, mainly consisting of a drive incomplete gear and a driven incomplete gear which are mutually engaged. A drive shaft and a driven shaft extend out of an indexing box of the milling machine; the drive incomplete gear is sleeved on the external end of the drive shaft; the driven incomplete gear is sleeved on the external end of the driven shaft. Owing to the external mounting of the indexing device and the indexing implemented by the matching of incomplete gears, the whole indexing device of the utility model has the advantages of simple structure, precise and reliable indexing, and extremely easy replacement and maintenance. The incomplete gears can be conveniently replaced with ones of corresponding specifications according to the types of screwdriver heads to be processed, so the device is suitable for processing screwdriver heads of many types, serves multiple purposes with one machine, has a wide range of application, and reduces manufacturing cost.

Proceedings Article•10.1109/ASRU.2003.1318418•

Automatic indexing of key sentences for lecture archives

Tatsuya Kawahara¹, Kazuya Shitaoka¹, T. Kitade¹, Hiroaki Nanjo¹•Institutions (1)

Kyoto University¹

30 Nov 2003

TL;DR: Key sentences are automatically extracted from lecture audio archives using discourse markers combined with the tf-idf measure, and a statistical method for inserting periods into raw speech transcriptions to improve readability is presented.

Abstract: Automatic extraction of key sentences from lecture audio archives is addressed. The method makes use of the characteristic expressions used in initial utterances of sections, which are defined as discourse markers and derived in an unsupervised manner based on word statistics. The statistics of the discourse markers are then used to define the importance of the sentences. This is also combined with the conventional tf-idf measure for content words. Experimental results confirm the effectiveness of the method using the discourse markers and its combination with the keyword-based method. We also present a statistical method for inserting periods into raw speech transcriptions to improve readability.
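
A rough Python sketch of how a discourse-marker score might be combined with tf-idf to rank sentences is given below. It is not the authors' method: treating each sentence as its own document for the idf term, whitespace tokenization, the fixed `marker_weight` bonus, and the function name `tfidf_sentence_scores` are all assumptions introduced here for illustration.

```python
import math
from collections import Counter


def tfidf_sentence_scores(sentences, markers=(), marker_weight=1.0):
    """Score each sentence by summed tf-idf plus an optional marker bonus.

    Assumption: every sentence counts as one "document" for idf, and a
    sentence receives a fixed bonus when its first word is a discourse marker.
    """
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    # Document frequency: number of sentences that contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        tfidf = sum(count * math.log(n / df[word]) for word, count in tf.items())
        # Hypothetical bonus for sentences that open with a discourse marker.
        bonus = marker_weight if tokens and tokens[0] in markers else 0.0
        scores.append(tfidf + bonus)
    return scores
```

Sentences with the highest scores would be the candidates for key sentences; in practice the marker set would be derived from the data rather than supplied by hand as it is here.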

Proceedings Article•10.1117/12.531469•

Design and implementation of a concept-based image retrieval system with edge description templates

Jae-Hun Choi¹, Seong-hee Park¹, Soo-Jun Park¹•Institutions (1)

Electronics and Telecommunications Research Institute¹

18 Dec 2003-electronic imaging

TL;DR: This paper designs and implements a concept-based image retrieval system using feature information, more specifically, edge histogram description, and demonstrates that this approach compares favorably with an approach based on color or edge features.

Abstract: In this paper, we design and implement a concept-based image retrieval system using feature information, more specifically, edge histogram description. The general edge histogram framework is a novel index mechanism that allows us to describe the content of images. However, the framework has a significant drawback: it cannot accommodate concept-based retrieval. Even if images are conceptually related to a user query, the framework may judge them irrelevant because their features differ from one another. Our system adapts an edge histogram descriptor and includes a knowledge base used for capturing concepts from images. In the knowledge base, a concept is expressed as a set of templates, each described by edge histograms common to the images that represent the concept well. The templates can be generated by clustering the training images related to a concept. Consequently, since an image can be matched with some of the templates, our system supports an automatic mechanism for indexing the image with the concept. The indexing mechanism enables users to retrieve the images related to a query formulated with their intended concepts. In addition, we demonstrate that our concept-based approach compares favorably with an approach based on color or edge features.
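
The template-matching step described in this abstract can be sketched as follows. This is an illustrative assumption, not the authors' implementation: the templates are taken as given (for example, cluster centroids of training histograms), the L1 distance and the `threshold` parameter are choices made here, and the names `l1_distance` and `index_image` are hypothetical.

```python
def l1_distance(h1, h2):
    """Sum of absolute bin differences between two equal-length histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def index_image(histogram, templates, threshold):
    """Return the concepts whose closest template lies within the threshold.

    templates maps a concept name to a list of template histograms.
    Concepts are returned nearest first.
    """
    matches = []
    for concept, concept_templates in templates.items():
        best = min(l1_distance(histogram, t) for t in concept_templates)
        if best <= threshold:
            matches.append((best, concept))
    return [concept for _, concept in sorted(matches)]

# Toy 3-bin histograms; real edge histogram descriptors have many more bins.
templates = {
    "building": [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1]],
    "forest": [[0.1, 0.2, 0.7]],
}
labels = index_image([0.75, 0.15, 0.10], templates, threshold=0.3)
```

Keeping several templates per concept lets one concept cover visually different images, which is the point of clustering the training images rather than averaging them into a single histogram.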

Patent•

Chinese information automatic indexing system based on network environment

Zhang Mingsheng

16 Jul 2003

TL;DR: An automatic Chinese information indexing system for a network environment is composed of a subject table for the whole profession, a universal Chinese splitting rule library, special splitting rule libraries for each profession, a universal obsolete character library, and special obsolete character libraries for each profession.

Abstract: An automatic Chinese information indexing system for a network environment is composed of a subject table for the whole profession, a universal Chinese splitting rule library, special splitting rule libraries for each profession, a universal obsolete character library, special obsolete character libraries for each profession, a Chinese geographic name library, a geographic name splitting rule library, an index inference rule library, and special index inference rule libraries for each profession.

Book Chapter•10.1007/3-540-36618-0_48•

Automatic construction of theme melody index from music database for fast content-based retrievals

Chang-Hwan Shin¹, Kyong-I Ku¹, Ki-Chang Kim¹, Yoo-Sung Kim¹•Institutions (1)

Inha University¹

14 Apr 2003

TL;DR: An automatic mechanism for constructing a theme melody index from a large music database is suggested, and a prototype system shows how the theme melody index can be used for content-based music retrieval.

Abstract: In traditional content-based music information retrieval systems, users may face long response times, since these systems mostly perform syntactic processing to match the query melody against the whole melodies in the underlying music database. Hence, there has been a growing need for a theme melody index that supports quick retrieval of the music relevant to a user's query melody. In this paper, we suggest an automatic mechanism for constructing the theme melody index from a large music database and show how the theme melody index can be used for content-based music retrieval by implementing a prototype system.

Journal Article•10.1076/EPRI.12.6.430.19775•

Indización automática de vídeo

Toni Navarrete, Josep Blat

01 Nov 2003-Profesional De La Informacion

TL;DR: The article concludes by pointing out that the use of standards, like MPEG-7, can promote the development of new and richer applications based on video.

Abstract: After an initial discussion of the problems presented by image and video indexing as compared to text indexing, the authors describe some of the basic techniques for automatic video indexing. The content-based retrieval paradigm and some automatic methods for segmentation and key-frame identification are further described. Certain low-level parameters for identifying an image are also introduced. The authors discuss the drawbacks of such automatic methods based solely on the image and give examples from projects using accompanying information as well, such as audio and captions. The article concludes by pointing out that the use of standards, like MPEG-7, can promote the development of new and richer applications based on video.

Book Chapter•10.1007/978-3-540-39592-8_19•

Representing Audio Data by FS-Trees and Adaptable TV-Trees

Alicja Wieczorkowska¹, Zbigniew W. Raś², Zbigniew W. Raś³, Li-Shiang Tsay³•Institutions (3)

Polish-Japanese Academy of Information Technology¹, Polish Academy of Sciences², University of North Carolina at Chapel Hill³

28 Oct 2003

TL;DR: Spectro-temporal sound representation is used for the purpose of automatic musical instrument recognition and Telescopic vector trees are used jointly with FS-trees to construct a new Query Answering System (QAS) for audio data.

Abstract: Automatic content extraction from multimedia files, based on both manual and automatic indexing, has been extensively explored. However, in the domain of musical data, automatic content description of musical sounds has not yet been broadly investigated and still needs intensive research. In this paper, a spectro-temporal sound representation is used for automatic musical instrument recognition. Assuming that musical instruments can be learned in terms of a group of features, and that an audio file is indexed either automatically or manually on the basis of those features, Frame Segment Trees (FS-trees) can be used to identify segments of audio marked by the same indexes. Telescopic vector trees (TV-trees) are known from their applications in text processing and, recently, in data clustering algorithms. In this paper, we use them jointly with FS-trees to construct a new Query Answering System (QAS) for audio data. Audio segments are returned by the QAS as answers to user queries. A heuristic strategy for building adaptable TV-trees is proposed.

Book Chapter•10.1007/3-540-36456-0_56•

Using natural language processing for semantic indexing of scene-of-crime photographs

Horacio Saggion¹, Katerina Pastra¹, Yorick Wilks¹•Institutions (1)

University of Sheffield¹

16 Feb 2003

TL;DR: A new approach to the automatic semantic indexing of digital photographs based on the extraction of logic relations from their textual descriptions using an ontology for the domain of application is presented.

Abstract: In this paper we present a new approach to the automatic semantic indexing of digital photographs based on the extraction of logic relations from their textual descriptions. The method is based on shallow parsing and propositional analysis of the descriptions using an ontology for the domain of application. We describe the semantic representation formalism, the ontology, and the algorithms involved in the automatic derivation of semantic indexes from texts linked to images. The method has been integrated into the Scene of the Crime Information System, a crime management system for storing, indexing and retrieval of crime information.

Dissertation•10.26756/TH.2003.3•

Extensibility in Arabic full-text indexing. (c2003)

Saeed Mohammad Raheel

1 Jan 2003

Journal Article•10.1002/SCJ.10417•

Compilation of dictionaries for semantic attribute analysis of television news captions

Ichiro Ide¹, Reiko Hamada², Shuichi Sakai², Hidehiko Tanaka²•Institutions (2)

National Institute of Informatics¹, University of Tokyo²

15 Nov 2003-Systems and Computers in Japan

TL;DR: The process by which words are extracted from text corpora and a thesaurus for storage on the basis of specified conditions is described and it is concluded that the compiled dictionaries are of practical use for indexing since the recall is more important in that case.

Abstract: With the increase in the amount of video that is broadcast daily, there is an increasing need for storage of video in a systematic way for future reuse and retrieval. In particular, from the viewpoint of importance and usability, it is desirable to index news videos. For adequate automatic indexing based on the text information in the video, it is not sufficient to apply the simple index extraction and annotation methods which have been widely used in conventional methods. It is important to select index candidates with reference to semantic attributes. The purpose of this study is to compile dictionaries which are needed for analyzing the semantic attributes of captions (noun phrases) in TV news videos. We describe the process by which words are extracted from text corpora and a thesaurus for storage on the basis of specified conditions. The quality of the dictionaries is examined by analysis of the semantic attributes of the words appearing in actual news videos, and the results are presented. In evaluation experiments in which an existing proper noun dictionary and temporal noun dictionary were combined and used, a recall of 79 to 93% and a precision of 41 to 71% were obtained. Although the precision is low in this result, it is concluded that the compiled dictionaries are of practical use for indexing since the recall is more important in that case. © 2003 Wiley Periodicals, Inc. Syst Comp Jpn, 34(12): 32–44, 2003; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/scj.10417
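
The recall and precision figures quoted in this abstract follow the standard definitions, which a short sketch makes concrete. The function name `precision_recall` and the toy word sets below are hypothetical and are not data from the study.

```python
def precision_recall(extracted, relevant):
    """Return (precision, recall) for a set of extracted index terms.

    precision = correct extracted terms / all extracted terms
    recall    = correct extracted terms / all relevant terms
    """
    extracted, relevant = set(extracted), set(relevant)
    true_positives = len(extracted & relevant)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Toy example: four terms tagged by a dictionary, three actually correct.
p, r = precision_recall(
    extracted={"Tokyo", "yesterday", "minister", "rain"},
    relevant={"Tokyo", "yesterday", "Osaka"},
)
```

A dictionary that tags generously raises recall at the cost of precision, which matches the abstract's argument that missing an index term is the costlier error when indexing.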

Patent•

Automatic indexing of audio-textual documents based on their comprehension difficulty

Michel Plu¹•Institutions (1)

Orange S.A.¹

5 Feb 2003

TL;DR: In this article, the authors automatically index audio-textual (DAT) and audio-visual digital documents with a difficulty of comprehensibility index (IDC) of each specific document based on an index (IVE) resulting from a comparison of the elocution speed of the assessed document with at least a threshold.

Abstract: The server (SD) automatically indexes audio-textual (DAT) and audio-visual digital documents in particular with a difficulty of comprehensibility index (IDC) of each specific document based on an index (IVE) resulting from a comparison of the elocution speed of the assessed document with at least a threshold. The difficulty of comprehensibility index may also depend on the numbers (nbFG) of predetermined grammatical structures included in each document as well as a vocabulary index (IVC) determined relative to predetermined glossaries. Each document is thus associated with a label assembling all the indices thereof to enable in particular students and teachers of a foreign language in front of their terminals (TE) to search for the documents in a base (SGBD) depending on their comprehension and their knowledge of the language of the documents.

Book Chapter•10.1007/978-3-540-39907-0_12•

Text Categorization prior to Indexing for the CISMEF Health Catalogue

Alexandrina Rogozan¹, Aurélie Névéol¹, Stéfan Jacques Darmoni¹•Institutions (1)

Institut national des sciences appliquées de Rouen¹

18 Oct 2003

TL;DR: Preliminary results show that although this method is not as precise as others in terms of resource categorization, it can significantly benefit indexing.

Abstract: This paper is positioned within the development of an automated indexing system for the CISMeF quality-controlled health gateway. For disambiguation purposes, we wish to perform text categorization prior to indexing. Hence, a global approach is necessary, in contrast with the classical analytical methods based on counts of keywords extracted from the text. The use of statistical compression models enables us to proceed without keyword extraction at this stage. Preliminary results show that although this method is not as precise as others in terms of resource categorization, it can significantly benefit indexing.
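
The idea of categorizing text with compression models rather than keyword counts can be illustrated with a minimal sketch. It substitutes the general-purpose `zlib` compressor from the Python standard library for the statistical compression models the abstract mentions, so it is a stand-in and not the authors' system; the function names and the toy corpora are hypothetical.

```python
import zlib

def compressed_size(text):
    """Size in bytes of the text after zlib compression at the highest level."""
    return len(zlib.compress(text.encode("utf-8"), 9))

def categorize(document, category_corpora):
    """Pick the category whose corpus makes the document cheapest to compress.

    The cost of a category is the number of extra compressed bytes needed
    when the document is appended to that category's training text.
    """
    costs = {}
    for category, corpus in category_corpora.items():
        base = compressed_size(corpus)
        joint = compressed_size(corpus + " " + document)
        costs[category] = joint - base
    return min(costs, key=costs.get), costs

# Toy training text per category (assumed data, for illustration only).
corpora = {
    "cardiology": " ".join(["heart cardiac artery blood pressure"] * 20),
    "dermatology": " ".join(["skin rash eczema dermatitis itching"] * 20),
}
best, costs = categorize("cardiac artery blood pressure heart", corpora)
```

No keyword list is built at any point: the compressor's own model of each corpus decides how surprising the document is, which is what makes the approach "global" in the sense used in the abstract.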
