Scispace (Formerly Typeset)
Showing papers on "Automatic indexing" published in 2012
Proceedings Article•10.1117/12.908542•
Automatic indexing of scanned documents: a layout-based approach

Daniel Esser, Daniel Schuster, Klemens Muthmann, Michael Stübert Berger, Alexander Schill 
22 Jan 2012
TL;DR: This work presents a novel approach to automatic document indexing based on generic positional extraction of index terms, using document templates stored in a common full-text search index to find index positions that were successfully extracted in the past.
Abstract: Archiving official written documents such as invoices, reminders and account statements is becoming increasingly important in both business and private settings. Creating appropriate index entries for document archives, such as the sender's name, creation date or document number, is tedious manual work. We present a novel approach to automatic document indexing based on generic positional extraction of index terms. For this purpose, we apply knowledge of document templates stored in a common full-text search index to find index positions that were successfully extracted in the past.

58 citations
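The positional idea can be sketched in a few lines of Python. This is a minimal sketch under our own assumptions, not the authors' system: a scanned document is a list of OCR words with bounding-box coordinates, a plain in-memory word-overlap lookup stands in for the full-text search index, and `Word`, `Template`, `find_template` and `extract_fields` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Word:
    text: str
    x: float  # left edge of the word's bounding box
    y: float  # top edge of the word's bounding box

@dataclass
class Template:
    words: frozenset  # lower-cased vocabulary of a previously indexed document
    fields: dict      # index field name -> (x, y) where its value was found

def find_template(doc, templates):
    """Pick the stored template sharing the most words with `doc`.
    A simple in-memory stand-in for querying a full-text search index."""
    vocab = {w.text.lower() for w in doc}
    return max(templates, key=lambda t: len(vocab & t.words), default=None)

def extract_fields(doc, template, tolerance=10.0):
    """For each field, take the word nearest to the remembered position,
    ignoring words more than `tolerance` away along either axis."""
    result = {}
    for name, (fx, fy) in template.fields.items():
        near = [w for w in doc
                if abs(w.x - fx) <= tolerance and abs(w.y - fy) <= tolerance]
        if near:
            nearest = min(near, key=lambda w: (w.x - fx) ** 2 + (w.y - fy) ** 2)
            result[name] = nearest.text
    return result

# Toy usage: one invoice template learned from an earlier document.
invoice = Template(words=frozenset({"acme", "gmbh", "invoice"}),
                   fields={"sender": (50, 20), "invoice_no": (400, 120)})
scan = [Word("ACME", 52, 21), Word("GmbH", 110, 21),
        Word("Invoice", 300, 120), Word("R-2012-17", 402, 118)]
print(extract_fields(scan, find_template(scan, [invoice])))
# {'sender': 'ACME', 'invoice_no': 'R-2012-17'}
```

A real search index would rank templates with proper relevance scoring rather than raw word overlap; the fixed tolerance box is likewise only an illustrative choice.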

Journal Article•10.5626/JCSE.2012.6.2.151•
A One-Size-Fits-All Indexing Method Does Not Exist: Automatic Selection Based on Meta-Learning

Antonio Jimeno-Yepes, James G. Mork¹, Dina Demner-Fushman¹, Alan R. Aronson¹
¹ National Institutes of Health
30 Jun 2012-Journal of Computing Science and Engineering
TL;DR: Results show that selecting an indexing algorithm for each heading in Medical Subject Headings (MeSH), the National Library of Medicine's vocabulary for indexing MEDLINE, can be automated based on previously indexed MEDLINE citations.
Abstract: We present a methodology that automatically selects indexing algorithms for each heading in Medical Subject Headings (MeSH), the National Library of Medicine's vocabulary for indexing MEDLINE. While manually comparing indexing methods is manageable for a limited number of MeSH headings, the large number of headings makes automating this selection desirable. Results show that this process can be automated based on previously indexed MEDLINE citations. We find that AdaBoostM1 is better suited to indexing a group of MeSH headings named Check Tags, and helps improve the micro F-measure from 0.5385 to 0.7157 and the macro F-measure from 0.4123 to 0.5387 (both p < 0.01). Category: Convergence computing

37 citations
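To make the selection step and the two reported metrics concrete, here is a simplified sketch. It is our assumption, not the paper's method: instead of training a meta-learner, it simply picks, for each heading, the candidate indexer with the highest F-measure on previously indexed citations, then reports micro- and macro-averaged F-measure. The data layout, the toy headings and the function names are hypothetical.

```python
def f_measure(tp, fp, fn):
    """F1 = 2PR / (P + R), written with raw counts; 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def counts(predicted, gold):
    """TP, FP, FN for one heading; both arguments are sets of citation IDs."""
    return len(predicted & gold), len(predicted - gold), len(gold - predicted)

def select_per_heading(candidates, gold):
    """candidates: {algorithm: {heading: set of citation IDs}}
    gold:       {heading: set of citation IDs} from previously indexed citations.
    Returns {heading: name of the algorithm with the best F-measure}."""
    return {
        heading: max(candidates, key=lambda alg: f_measure(
            *counts(candidates[alg].get(heading, set()), gold_ids)))
        for heading, gold_ids in gold.items()
    }

def micro_macro_f(predictions, gold):
    """Micro-F pools TP/FP/FN over all headings; macro-F averages per-heading F."""
    per_heading = [counts(predictions.get(h, set()), g) for h, g in gold.items()]
    tp, fp, fn = (sum(c[i] for c in per_heading) for i in range(3))
    micro = f_measure(tp, fp, fn)
    macro = sum(f_measure(*c) for c in per_heading) / len(per_heading)
    return micro, macro

gold = {"Humans": {1, 2, 3}, "Female": {2, 4}}
candidates = {
    "rules":    {"Humans": {1, 2},       "Female": {2, 4, 5, 6}},
    "adaboost": {"Humans": {1, 2, 3, 7}, "Female": {4, 9}},
}
choice = select_per_heading(candidates, gold)
print(choice)  # {'Humans': 'adaboost', 'Female': 'rules'}
combined = {h: candidates[alg][h] for h, alg in choice.items()}
print(micro_macro_f(combined, gold))
```

Choosing and scoring on the same citations gives an optimistic estimate; selecting on one set of previously indexed citations and evaluating on a held-out set is the safer design.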

Journal Article•10.1108/00330331211221882•
Interactive Information Seeking, Behaviour and Retrieval

Maja Žumer
20 Apr 2012

32 citations

Journal Article•10.1002/ASI.22653•
Social tagging is no substitute for controlled indexing: A comparison of Medical Subject Headings and CiteULike tags assigned to 231,388 papers

Danielle H. Lee¹, Titus Schleyer²
¹ University UCINF, ² University of Pittsburgh
01 Sep 2012-Journal of the Association for Information Science and Technology
TL;DR: The results show that CiteULike tags and MeSH terms are quite distinct lexically, reflecting different viewpoints/processes between social tagging and controlled indexing.
Abstract: Social tagging and controlled indexing both facilitate access to information resources. Given the increasing popularity of social tagging and the limitations of controlled indexing (primarily cost and scalability), it is reasonable to investigate to what degree social tagging could substitute for controlled indexing. In this study, we compared CiteULike tags to Medical Subject Headings (MeSH) terms for 231,388 citations indexed in MEDLINE. In addition to descriptive analyses of the data sets, we present a paper-by-paper analysis of tags and MeSH terms: the number of common annotations, Jaccard similarity, and coverage ratio. In the analysis, we apply three increasingly progressive levels of text processing, ranging from normalization to stemming, to reduce the impact of lexical differences. Annotations of our corpus consisted of over 76,968 distinct tags and 21,129 distinct MeSH terms. The top 20 tags/MeSH terms showed little direct overlap. On a paper-by-paper basis, the number of common annotations ranged from 0.29 to 0.5 and the Jaccard similarity from 2.12% to 3.3% using increased levels of text processing. At most, 77,834 citations (33.6%) shared at least one annotation. Our results show that CiteULike tags and MeSH terms are quite distinct lexically, reflecting different viewpoints/processes between social tagging and controlled indexing. © 2012 Wiley Periodicals, Inc.

31 citations
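The paper-by-paper comparison can be sketched as follows. Assumptions, clearly ours: the normalization (lower-casing, treating hyphens, underscores and slashes as spaces, collapsing whitespace) is a guess at a first processing level, stemming is omitted, and the coverage ratio is taken to mean the share of a paper's MeSH terms matched by its tags, a definition the abstract does not spell out. The example tags are invented.

```python
import re

def normalize(label):
    """Lower-case, treat '-', '_' and '/' as spaces, collapse whitespace."""
    return " ".join(re.sub(r"[-_/]+", " ", label.lower()).split())

def compare(tags, mesh_terms, normalized=True):
    """Compare one paper's CiteULike tags with its MeSH terms.
    Returns (number of common annotations, Jaccard similarity, coverage ratio)."""
    prep = normalize if normalized else (lambda s: s)
    a = {prep(t) for t in tags}
    b = {prep(m) for m in mesh_terms}
    common = a & b
    union = a | b
    jaccard = len(common) / len(union) if union else 0.0
    # Assumed definition: share of the paper's MeSH terms matched by some tag.
    coverage = len(common) / len(b) if b else 0.0
    return len(common), jaccard, coverage

tags = ["gene-expression", "Microarray", "toread"]
mesh = ["Gene Expression", "Humans"]
print(compare(tags, mesh, normalized=False))  # (0, 0.0, 0.0)
print(compare(tags, mesh))                    # (1, 0.25, 0.5)
```

The toy case shows why the study applies progressively stronger text processing: the raw strings share nothing, while a light normalization already surfaces one common annotation.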

Journal Article•10.5120/8418-1092•
Efficient Content based Image Retrieval System using Mpeg-7 Features

Swapnalini Pattanaik, D. G. Bhalke
25 Sep 2012-International Journal of Computer Applications
TL;DR: This paper gives an overview of efficient image retrieval using different MPEG-7 features; it uses the Color Structure Descriptor for color and the Edge Histogram Descriptor for texture to increase the performance of CBIR systems.
Abstract: This paper gives an overview of efficient image retrieval using different MPEG-7 features. Content-Based Image Retrieval (CBIR) is a technique for automatically indexing and retrieving images from a large database. Feature extraction and similarity matching are the two major steps of CBIR systems. Color, texture and shape are the three visual features of any image. MPEG-7 stands for Multimedia Content Description Interface. The main objective of MPEG-7 is to provide a standardized set of technologies for describing multimedia content. It has enabled quick and efficient content identification and addresses a wide range of applications. The visual descriptors are classified according to features such as color, shape and texture. This paper uses the Color Structure Descriptor for color and the Edge Histogram Descriptor for texture. These two features are also integrated to increase the performance of CBIR systems. The efficiency of all methods is demonstrated with results.

17 citations
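The texture half of the pipeline (feature extraction followed by similarity matching) can be illustrated with a heavily simplified edge histogram. Assumptions: grey-level input as nested lists, 2x2-pixel image blocks (the standard uses larger blocks whose four quadrants are averaged), an illustrative edge threshold of 11, and L1 distance for matching. The five 2x2 filter masks follow the MPEG-7 Edge Histogram Descriptor; the Color Structure Descriptor used in the paper is not covered, and the function names are hypothetical.

```python
import math

# 2x2 edge filters of the MPEG-7 Edge Histogram Descriptor; coefficients apply to
# the four quadrants of an image block: top-left, top-right, bottom-left, bottom-right.
R2 = math.sqrt(2)
FILTERS = {
    "vertical": (1, -1, 1, -1),
    "horizontal": (1, 1, -1, -1),
    "diagonal_45": (R2, 0, 0, -R2),
    "diagonal_135": (0, R2, -R2, 0),
    "non_directional": (2, -2, -2, 2),
}

def edge_histogram(img, threshold=11.0):
    """Simplified 80-bin edge histogram (4x4 sub-images x 5 edge types).
    `img` is a list of rows of grey levels; height and width must be multiples
    of 8. Each image block is 2x2 pixels, so every quadrant is a single pixel."""
    h, w = len(img), len(img[0])
    sh, sw = h // 4, w // 4
    hist = []
    for si in range(4):
        for sj in range(4):
            bins = dict.fromkeys(FILTERS, 0)
            blocks = 0
            for y in range(si * sh, (si + 1) * sh, 2):
                for x in range(sj * sw, (sj + 1) * sw, 2):
                    quad = (img[y][x], img[y][x + 1], img[y + 1][x], img[y + 1][x + 1])
                    strength = {k: abs(sum(a * v for a, v in zip(f, quad)))
                                for k, f in FILTERS.items()}
                    best = max(strength, key=strength.get)
                    if strength[best] >= threshold:  # weak blocks count as edgeless
                        bins[best] += 1
                    blocks += 1
            hist.extend(bins[k] / blocks for k in FILTERS)
    return hist

def l1_distance(h1, h2):
    return sum(abs(a - b) for a, b in zip(h1, h2))

def rank(query, database):
    """Return image IDs from `database` ({id: histogram}), nearest first."""
    return sorted(database, key=lambda k: l1_distance(query, database[k]))

vertical = [[255 * (x % 2) for x in range(8)] for _ in range(8)]
horizontal = [[255 * (y % 2) for _ in range(8)] for y in range(8)]
flat = [[128] * 8 for _ in range(8)]
db = {name: edge_histogram(im) for name, im in
      [("vertical", vertical), ("horizontal", horizontal), ("flat", flat)]}
print(rank(edge_histogram(vertical), db))  # ['vertical', 'flat', 'horizontal']
```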

Proceedings Article•10.1109/FSKD.2012.6234350•
Domain-specific term extraction from free texts

Chunxia Zhang¹, Zhendong Niu¹, Peng Jiang¹, Hongping Fu¹
¹ Beijing Institute of Technology
29 May 2012
TL;DR: This paper proposes an iterative bootstrapping approach to extracting domain-specific terms from unannotated Chinese free texts, identifying as candidate terms the strings whose internal components have higher probabilities of occurring inside domain-specific terms.
Abstract: Domain-specific term extraction is a subtask of domain-specific ontology construction and has been applied to text classification, information retrieval, question answering, automatic indexing, machine translation and other tasks. In this paper, we propose an iterative bootstrapping approach to extracting domain-specific terms from unannotated Chinese free texts. Strings whose internal components have higher probabilities of occurring inside domain-specific terms are identified as candidate terms. Experimental results on three domains (computer, military, and archaeology) demonstrate the effectiveness and domain-independent nature of our approach.

12 citations

Patent•
Automatic indexing method of quotations

Jinzhi Shen, Yang Shen, Chengeng Tian
30 May 2012
TL;DR: In this paper, an automatic indexing method of quotations is proposed, characterized by the following steps: step 1: cutting a submitted document into text blocks, extracting characteristic expression strings or information fingerprints from the text blocks, and then submitting the characteristic expression strings or information fingerprints to a search engine; step 2: for the submitted characteristic expression strings or information fingerprints, recording the search results as quotation sources of the corresponding text block, the ending position of the text block in the document and the correlation between the quotation sources and the relationship between the correlation
Abstract: The invention provides an automatic indexing method of quotations. The method is characterized by comprising the following steps: step 1: cutting a submitted document to obtain text blocks, extracting characteristic expression strings or information fingerprints from the text blocks, and then submitting the characteristic expression strings or information fingerprints to a search engine; step 2: for the submitted characteristic expression strings or information fingerprints, when the search engine returns the corresponding search results, recording the search results as quotation sources of the corresponding text block, together with the ending position of the text block in the document and the correlation between the quotation sources and that ending position; and step 3: eliminating repeated quotation sources using the quotation indexes and the search results in the submitted document, and indexing the ordered quotation sources according to their front-to-back positions in the submitted document. The method helps overcome the extremely low efficiency of the existing manual method and improves indexing speed and accuracy.

11 citations
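The three steps of the patent can be sketched as follows. Assumptions, all ours: the search engine is a hypothetical callable mapping a fingerprint to a list of source IDs (stubbed here with a dictionary), sentence-like blocks stand in for the unspecified block-cutting rule, a SHA-1 hash of the block's normalized wording serves as the information fingerprint, and the correlation recorded in step 2 is omitted.

```python
import hashlib
import re

def cut_blocks(document):
    """Step 1a: cut the document into sentence-like blocks, keeping each block's end offset."""
    blocks = []
    for m in re.finditer(r"[^.!?]+[.!?]?", document):
        text = m.group().strip()
        if text:
            blocks.append((text, m.end()))
    return blocks

def fingerprint(text):
    """Step 1b: an 'information fingerprint' of the block's normalized wording."""
    normalized = " ".join(re.findall(r"\w+", text.lower()))
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

def index_quotations(document, search):
    """Steps 2-3: look up every block, record where each source first appears,
    drop repeated sources, and order sources by position in the document.
    `search` maps a fingerprint to a list of source IDs (hypothetical interface)."""
    first_end = {}
    for text, end in cut_blocks(document):
        for source in search(fingerprint(text)):
            first_end.setdefault(source, end)  # later repeats are ignored
    return sorted(first_end, key=first_end.get)

# Toy "search engine": a dictionary from fingerprints to source IDs.
known = {fingerprint("Knowledge is power"): ["src-A"],
         fingerprint("Time is money"): ["src-B"]}
doc = "Time is money. We agree. Knowledge is power! Time is money."
print(index_quotations(doc, lambda fp: known.get(fp, [])))  # ['src-B', 'src-A']
```

Exact-hash fingerprints only catch verbatim quotations (up to case and punctuation); catching paraphrased quotations would need a fuzzier fingerprint such as shingling.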

Journal Article•10.4403/JLIS.IT-5474•
The Nuovo soggettario as a service for the linked data world

Giovanni Bergamin, Anna Lucarelli
09 Jun 2012-JLIS.it
TL;DR: Three working areas have been taken into account: improving the accessibility and usability of the NS in the linked data environment (SPARQL endpoint, mapping to other datasets); addressing the costs of bibliographic control, starting from a project of quality-controlled automatic indexing using the NS in SKOS/RDF format and open-source software tools; and cooperating with other institutions that are publishing linked open data.
Abstract: Nuovo soggettario (NS), edited by the National Central Library of Florence (BNCF), is the Italian subject indexing tool for various types of resources. It has been developed in compliance with the IFLA recommendations and other international standards in the field of subject indexing. This tool has been created for general and specialized Italian libraries, and for museums, multimedia libraries, archives and documentation centres. The main component of the NS is a general thesaurus, available on the web since 2007 (http://thes.bncf.firenze.sbn.it/ricerca.php). The thesaurus currently comprises approximately 46,000 terms and is kept up to date. It supports the new subject indexing practices and manages terminology arising from collaboration between the BNCF and other libraries. The project is evolving in many directions and supports interoperability. The main goal of making the NS dataset available in SKOS/RDF format (since November 2010) is to promote the use of this tool beyond the traditional library environment as well. In this context, three working areas have been taken into account: 1) improving the accessibility and usability of the NS in the linked data environment: SPARQL endpoint, mapping to other datasets (including LCSH, RAMEAU, AGROVOC, EUROVOC, DBpedia); 2) addressing the costs of bibliographic control, starting from a project of quality-controlled automatic indexing using the NS in SKOS/RDF format and open-source software tools; 3) cooperating with other institutions that are publishing linked open data.

9 citations

Proceedings Article•10.1145/2382636.2382688•
Automatic annotation of tagged content using predefined semantic concepts

Marcelo G. Manzato¹, Rudinei Goularte¹
¹ University of São Paulo
15 Oct 2012
TL;DR: An automatic technique for semantic annotation of multimedia content based on collaborative user tags that is able to predict semantic concepts for new items without the need of complex multimedia indexing techniques is proposed.
Abstract: User tags are an important source of information that can be used to gather semantic data about the content, reducing the semantic gap and the restrictive domain of automatic indexing approaches. In this paper, we propose an automatic technique for semantic annotation of multimedia content based on collaborative user tags. Our technique faces some of the challenges of using user-generated terms, such as noise and incompleteness. Based on the actual context of a multimedia item and the co-occurrence of concepts and tags from the training set, we are able to predict semantic concepts for new items without the need of complex multimedia indexing techniques. We describe the results of our approach with an evaluation of our algorithm using a large scale dataset composed of images and user tags.

7 citations
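A bare-bones version of co-occurrence-based concept prediction is sketched below, assuming training items that carry both user tags and semantic concepts. It scores each concept by summing, over a new item's tags, the conditional frequency P(concept | tag) observed in training. This is our simplification: the paper's use of the item's context and its treatment of noisy or incomplete tags are not reproduced, and the toy data and function names are hypothetical.

```python
from collections import Counter, defaultdict

def train(items):
    """items: (tags, concepts) pairs from the training set.
    Returns {tag: {concept: P(concept | tag)}} from co-occurrence counts."""
    tag_count = Counter()
    co = defaultdict(Counter)
    for tags, concepts in items:
        for tag in set(tags):
            tag_count[tag] += 1
            co[tag].update(set(concepts))
    return {tag: {c: n / tag_count[tag] for c, n in co[tag].items()}
            for tag in tag_count}

def predict(model, tags, k=2):
    """Score each concept by summing P(concept | tag) over the item's known tags;
    tags never seen in training (often noise or typos) are ignored."""
    scores = Counter()
    for tag in set(tags):
        for concept, p in model.get(tag, {}).items():
            scores[concept] += p
    return [c for c, _ in scores.most_common(k)]

model = train([
    (["beach", "sun"], ["sea", "sky"]),
    (["beach", "sand"], ["sea"]),
    (["sun", "clouds"], ["sky"]),
])
print(predict(model, ["beach", "sand", "bech"], k=1))  # ['sea']
```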

Original Article•
Automatic indexing of health documents in French: Evaluating and analysing errors (French title: Indexation automatique de documents en santé : évaluation et analyse de sources d'erreurs)

Wiem Chebil, Lina Fatima Soualmia, Badisse Dahamna, S. J. Darmoni
LITIS-TIBS EA
1 Jan 2012
TL;DR: The automatically generated index is compared with the manual index, which is considered the "gold standard", and the automatic indexing of short titles and their associated subtitles is analyzed.
Abstract: The Catalogue and Index of French Medical Sites (CISMeF) was developed to retrieve relevant medical information on the Internet for health professionals, patients and medical students. The gathered resources are indexed manually, semi-automatically or automatically. Currently, the CISMeF indexing function indexes only the part of the resources judged least important. Objectives: The objective of this work is to evaluate the indexing function developed for CISMeF and to analyze the errors it generates. Material and method: We used 500 clinical guidelines to evaluate the indexing function, which since its implementation has been based on the "bag of words" algorithm. The automatically generated index is compared with the manual one, which is considered the "gold standard". We analyze the automatic indexing of short titles with their associated subtitles, of long titles with their associated subtitles, of both long and short titles with their associated subtitles, and of abstracts. The measures used for the evaluation are precision, recall and F-measure.

6 citations
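A toy version of the evaluation set-up is sketched below: a naive bag-of-words indexer assigns a controlled term whenever all of its words occur in a title, and the result is scored against a manual "gold standard" with precision, recall and F-measure. The matching rule, the tiny French vocabulary and the function names are illustrative assumptions, not CISMeF's actual indexing function.

```python
import re

def bag_of_words(text):
    """Lower-cased word set; \\w also matches accented letters such as 'è'."""
    return set(re.findall(r"\w+", text.lower()))

def auto_index(text, vocabulary):
    """Assign every vocabulary term whose words all appear in the text."""
    words = bag_of_words(text)
    return {term for term in vocabulary if bag_of_words(term) <= words}

def evaluate(automatic, manual):
    """Precision, recall and F-measure of an automatic index against a manual one."""
    tp = len(automatic & manual)
    precision = tp / len(automatic) if automatic else 0.0
    recall = tp / len(manual) if manual else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

vocab = {"diabète", "insuline", "diabète gestationnel", "grossesse"}
auto = auto_index("Prise en charge du diabète gestationnel", vocab)
print(sorted(auto))                                              # ['diabète', 'diabète gestationnel']
print(evaluate(auto, {"diabète gestationnel", "grossesse"}))     # (0.5, 0.5, 0.5)
```

The example reproduces, in miniature, two error types such an evaluation can surface: an over-general term ("diabète") that lowers precision, and a concept that never appears literally in the title ("grossesse") that lowers recall.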

Proceedings Article•10.1145/2132176.2132297•
The HIVE impact: contributing to consistency via automatic indexing

Hollie White¹, Craig Willis², Jane Greenberg²
¹ Duke University, ² University of North Carolina at Chapel Hill
7 Feb 2012
TL;DR: The results of an exploratory experiment examining consistency stemming from a machine-aided indexing approach were presented, and a framework for further exploration of automatic indexing in manual workflows was provided.
Abstract: Research has shown that automatic subject indexing is more efficient and consistent than manual indexing; yet many organizations continue to use manual indexing because of the unacceptable quality of automatically produced results. This poster presents the results of an exploratory experiment examining consistency stemming from a machine-aided indexing approach. The HIVE vocabulary server was used to present concepts to 31 workshop participants. The presentation of terms via an automatic sequence reduced the indexer burden and contributed to increased consistency. This poster reports initial results and provides a framework for further exploration of automatic indexing in manual workflows.
Patent•
Indexing method of patent document

Xiaoshan Jiang
10 Oct 2012
TL;DR: In this paper, an indexing method of a patent document is described, and the method comprises the following steps: supplying a patent document database of a related technology subject; establishing a technological category of the technology subject and its corresponding key characters/keywords; dividing the technological category according to the content generally included in the patent document; in addition, classifying technological means, wherein if necessary, further classifying the technological means and selecting some or all of the patent documents, carrying out indexing, establishing a corresponding relationship of each patent document with the key characters and the
Abstract: The invention discloses an indexing method of a patent document, comprising the following steps: supplying a patent document database of a related technology subject; establishing a technological category of the technology subject and its corresponding key characters/keywords; dividing the technological category according to the content generally included in the patent documents; in addition, classifying technological means, wherein, if necessary, the technological means are further classified; selecting some or all of the patent documents, carrying out indexing, and establishing a corresponding relationship of each patent document with the key characters/keywords and the technological category; and, in this step, revising the technological category or its corresponding key characters according to the indexed patent documents. In the indexing process, only some of the patent documents need to be indexed, and the remaining patent documents are indexed by an automatic indexing method. With this indexing method, the reading and indexing of patent documents can be accelerated, and the benefit is especially clear when a large number of patent documents need to be read and indexed.
Posted Content•
Detecting multiword phrases in mathematical text corpora

Winfried Gödert
26 Mar 2012-arXiv: Computation and Language
TL;DR: An approach is presented for detecting multiword phrases in mathematical text corpora, based on characteristic features of mathematical terminology and using a software tool named Lingo, which identifies words by means of previously defined dictionaries for specific word classes such as adjectives, personal names or nouns.
Abstract: We present an approach for detecting multiword phrases in mathematical text corpora. The method is based on characteristic features of mathematical terminology. It makes use of a software tool named Lingo, which identifies words by means of previously defined dictionaries for specific word classes such as adjectives, personal names or nouns. The detection of multiword groups is done algorithmically. Possible advantages of the method for indexing and information retrieval are discussed, as are conclusions for applying dictionary-based methods of automatic indexing instead of stemming procedures.
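The dictionary-driven idea can be illustrated with a small sketch: each word is tagged by looking it up in word-class dictionaries, and maximal runs of adjectives followed by one or more nouns are reported as multiword phrases. The tiny dictionaries and the adjective-noun pattern are our assumptions for illustration; Lingo's actual dictionaries and detection rules are not reproduced here.

```python
import re

# Hypothetical word-class dictionaries (Lingo-style lookup tables).
ADJECTIVES = {"abelian", "finite", "compact", "smooth"}
NOUNS = {"group", "groups", "field", "manifold", "theorem", "extension"}

def tag(word):
    w = word.lower()
    if w in ADJECTIVES:
        return "ADJ"
    if w in NOUNS:
        return "NOUN"
    return "OTHER"

def multiword_phrases(text):
    """Return maximal runs matching ADJ* NOUN+ that contain at least two words."""
    words = re.findall(r"[A-Za-z]+", text)
    tags = [tag(w) for w in words]
    phrases, i = [], 0
    while i < len(words):
        j = i
        while j < len(words) and tags[j] == "ADJ":   # leading adjectives
            j += 1
        k = j
        while k < len(words) and tags[k] == "NOUN":  # one or more nouns
            k += 1
        if k > j and k - i >= 2:
            phrases.append(" ".join(words[i:k]).lower())
            i = k
        else:
            i += 1
    return phrases

text = ("Every finite abelian group is a product of groups, "
        "and every compact smooth manifold is metrizable.")
print(multiword_phrases(text))  # ['finite abelian group', 'compact smooth manifold']
```

Because the phrase boundaries come from dictionary word classes rather than from stemming, "group" and "groups" stay distinct index forms, which is the kind of trade-off the abstract weighs against stemming procedures.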
Patent•
Method for retrieval of arabic historical manuscripts

Mohammad Husni Najib Yahia¹, Wasfi G. Al-Khatib¹
¹ King Fahd University of Petroleum and Minerals
12 Dec 2012
TL;DR: In this paper, a method for retrieval of Arabic historical manuscripts using Latent Semantic Indexing (LSI) approaches the problem of manuscript indexing and retrieval through automatic indexing of historical Arabic manuscripts by word spotting, using text image similarity of keywords.
Abstract: The method for retrieval of Arabic historical manuscripts using Latent Semantic Indexing approaches the problem of manuscript indexing and retrieval through automatic indexing of Arabic historical manuscripts by word spotting, using "text image" similarity of keywords. The similarity is computed using Latent Semantic Indexing (LSI). The method involves a manuscript page preprocessing step, a segmentation step, and a feature extraction step. Feature extraction uses a circular polar grid feature set. Once the salient features have been extracted, historical Arabic manuscripts are indexed using LSI in support of content-based image retrieval (CBIR).
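The LSI step can be sketched with NumPy, assuming each word image has already been turned into a feature vector (for example, counts over the cells of a circular polar grid) so that features and word images form a term-document-style matrix. The matrix below is toy data, the rank `k` is an illustrative choice, and the function names are hypothetical; queries are folded into the latent space as Σₖ⁻¹Uₖᵀq and compared by cosine similarity.

```python
import numpy as np

def lsi_fit(X, k):
    """X: features x documents. Keep the k largest singular triplets."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def fold_in(q, U_k, s_k):
    """Map a query feature vector into the latent space: S_k^-1 U_k^T q."""
    return (U_k.T @ q) / s_k

def lsi_rank(q, U_k, s_k, Vt_k):
    """Document indices ordered by cosine similarity to the query in latent space."""
    docs = Vt_k.T                     # row j = latent vector of document j
    qk = fold_in(q, U_k, s_k)
    sims = docs @ qk / (np.linalg.norm(docs, axis=1) * np.linalg.norm(qk) + 1e-12)
    return [int(i) for i in np.argsort(-sims)]

# Toy matrix: 3 visual features x 4 word images (documents).
X = np.array([[2.0, 2.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 4.0, 2.0]])
U_k, s_k, Vt_k = lsi_fit(X, k=2)
# Word images 2 and 3 (dominated by feature 2) come first, in either order.
print(lsi_rank(np.array([0.0, 0.0, 1.0]), U_k, s_k, Vt_k))
```

Folding a query in with the same Σₖ⁻¹Uₖᵀ mapping that produces the rows of Vₖ keeps query and document vectors directly comparable without recomputing the SVD.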
Book Chapter•10.1007/978-3-642-36137-1_28•
Design and Realization of Agricultural Information Intelligent Processing and Application Platform

Dan Wang
19 Oct 2012
TL;DR: The software product is an information processing platform that integrates intelligent gathering, automatic indexing, and search and utilization of agricultural information; testing showed it to be accurate, and it can substitute for people in processing information to some extent.
Abstract: This paper aims to provide a practical software product for intelligently gathering, processing and utilizing information, intended for leaders of agricultural departments, research institutions, library and information systems, information consultancy departments, etc. The software product is an information processing platform that integrates intelligent gathering, automatic indexing, and search and utilization of agricultural information. Testing showed the software to be accurate, and it can substitute for people in processing information to some extent.
Patent•
Rapid typesetting system and method

Xu Qian, Yu Dazhou, Liang Xun, Yuan Renhui
26 Dec 2012
TL;DR: In this article, a rapid typesetting system consisting of an automatic typesetting module, a check module, and a typesetting error correction and management module is described, where the automatic typesetting module is used for indexing a pre-processed document, and the check module is used for checking the indexing result and correcting wrong index terms.
Abstract: The invention discloses a rapid typesetting system and method. The system comprises an automatic typesetting module, a check module, and a typesetting error correction and management module. The automatic typesetting module indexes a pre-processed document, applying a knowledge-based automatic indexing algorithm to obtain an indexing result, and processes the indexing result to regulate the contents; the check module checks the indexing result and corrects wrong index terms; and the typesetting error correction and management module performs typesetting according to the indexing result and corrects typesetting errors. The rapid typesetting system offers high typesetting speed, high quality and high indexing accuracy: the accuracy for main index terms exceeds 95%, the accuracy for secondary index terms exceeds 90%, and the overall typesetting cost of an automatic typesetting system based on the automatic indexing algorithm is reduced by 30%.
Automatic indexing in e-government: Improved access to administrative documents for professional users?

Tanja Svarre Jonasen
18 Oct 2012
Book Chapter•10.1007/978-3-642-34478-7_25•
Approaches for the detection of the keywords in spoken documents application for the field of e-libraries

Bendib Issam, Laouar Mohamed Ridda
12 Nov 2012
TL;DR: The goal of this paper is to propose a document management approach based on multimedia indexing techniques for detecting speech and keywords, built on a combination of techniques (PSPL, S-PSPL and CN, together with LVR).
Abstract: Automatic indexing of multimedia documents spans several application tasks, including spoken-word search, keyword detection and audio information retrieval. Despite advances in speech indexing, much remains to be done, particularly for keyword search in spontaneous speech. Although spoken-word and audio retrieval have been well studied, significant limitations remain, especially given the resources available on the web today. The goal of this paper is to propose an approach for document management based on multimedia indexing techniques to detect speech and keywords. We present the various indexing methods together with keyword detection techniques. These methods fall into three principal approaches to speech indexing: keyword detection, keyword detection on phonetic streams (PSPL, CN,...) and indexing based on large-vocabulary recognition (LVR). We then present the proposed approach, which is based on combining techniques (PSPL, S-PSPL and CN, together with LVR). This indexing and information retrieval approach is currently being validated in the field of e-libraries.
Journal Article•
Korean Document Indexing and Evaluating Based on N-GRAM

Jiang Ding-de
DPR Korea
01 Jan 2012-Journal of Chinese Computer Systems
TL;DR: A new Korean automatic indexing method is proposed that uses rules of statement types to separate particles from statements, separates out the statements that fail morphemic analysis, and applies the N-gram method to unregistered words.
Abstract: When Korean documents are indexed for information retrieval, nouns are generally extracted as index words after statement and morphemic analysis. During morphemic analysis, however, the ambiguity of the analysis makes it very difficult to correctly extract, as index words, unregistered words that are not in the reference dictionary. The N-gram method needs no linguistic analysis, so indexing is fast, and it is very effective for analyzing unregistered words that are not in the morphemic analysis dictionary. It is also effective for analyzing compound nouns. Compared with other indexing methods, however, N-gram indexing extracts relatively many index words and uses storage space inefficiently, and it also lowers the efficiency of the index. In this paper, a new Korean automatic indexing method is proposed to cope with these disadvantages of N-gram indexing. In this method, substantives and terms are first extracted as index words; then, using rules of statement types, particles are separated from the statements, and the statements that fail morphemic analysis are separated out. Finally, the N-gram indexing method is used for processing unregistered words. Comparative analysis and performance evaluation show that the proposed indexing method is effective.
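The hybrid strategy (dictionary lookup first, character N-grams only for words the dictionary does not know) can be sketched as follows. This is a simplification made for illustration: whitespace-separated tokens stand in for the output of statement and morphemic analysis, particle stripping uses a short hypothetical particle list, bigrams (N = 2) are assumed, and the sample dictionary is invented.

```python
# Hypothetical particle list; real Korean has many more particles and endings.
PARTICLES = ("에서", "은", "는", "이", "가", "을", "를", "의", "에")

def strip_particle(token):
    """Remove one trailing particle, longest match first, if something remains."""
    for p in sorted(PARTICLES, key=len, reverse=True):
        if token.endswith(p) and len(token) > len(p):
            return token[: -len(p)]
    return token

def ngrams(word, n=2):
    """Overlapping character n-grams (precomposed Hangul syllables are single code points)."""
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]

def index_terms(text, dictionary, n=2):
    """Registered words are indexed whole; unregistered words fall back to n-grams."""
    terms = []
    for token in text.split():
        stem = token if token in dictionary else strip_particle(token)
        if stem in dictionary:
            terms.append(stem)
        else:
            terms.extend(ngrams(stem, n))
    return terms

dictionary = {"시스템", "한국어", "문서"}
print(index_terms("정보검색 시스템은 한국어 문서를", dictionary))
# ['정보', '보검', '검색', '시스템', '한국어', '문서']
```

Restricting N-grams to unregistered words keeps the index small for known vocabulary, which is the storage and efficiency concern the abstract raises about pure N-gram indexing. Blind suffix stripping can also cut a genuine final syllable that happens to look like a particle, which is why real systems rely on morphemic analysis.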
Patent•
Novel semi-automatic indexing method of Chinese scientific and technical documents

Liu Wei
19 Dec 2012
TL;DR: A novel semi-automatic indexing method for Chinese scientific and technical documents is proposed, which comprises the following steps: acquiring the cited documents of a document collection to be labeled by users, so as to obtain a cited-document collection.
Abstract: The invention provides a novel semi-automatic indexing method for Chinese scientific and technical documents. The method comprises the following steps: acquiring the cited documents of a document collection to be labeled by users, so as to obtain a cited-document collection; labeling all documents in the cited-document collection; constructing the network of citing relations among the Chinese documents in the cited-document collection; and iteratively labeling the documents in the collection to be labeled until every document in it is labeled. The method effectively overcomes the low indexing efficiency and low accuracy of current automatic indexing methods for Chinese scientific and technical documents.
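The iterative labeling step can be pictured as label propagation over the citation network. In this sketch, which is our assumption about how such an iteration might work rather than the patent's procedure, each unlabeled document repeatedly takes the most frequent labels among its already-labeled neighbours (documents it cites or is cited by), until every reachable document is labeled or no further progress is possible. Document IDs and labels are invented.

```python
from collections import Counter

def propagate_labels(citations, labels, top=1):
    """citations: {doc: set of docs it cites}; labels: {doc: set of labels}
    for the manually labeled documents. Each round, every unlabeled document
    takes the `top` most frequent labels among its already-labeled neighbours
    (citing or cited). Stops when all are labeled or no progress is possible."""
    neighbours = {}
    for doc, cited in citations.items():
        for other in cited:
            neighbours.setdefault(doc, set()).add(other)
            neighbours.setdefault(other, set()).add(doc)
    labels = {d: set(ls) for d, ls in labels.items()}  # do not mutate the input
    pending = set(neighbours) - set(labels)
    while pending:
        newly = {}
        for doc in pending:
            votes = Counter(lab for n in neighbours[doc] if n in labels
                            for lab in labels[n])
            if votes:
                newly[doc] = {lab for lab, _ in votes.most_common(top)}
        if not newly:
            break  # the remaining documents have no labeled neighbours
        labels.update(newly)
        pending -= set(newly)
    return labels

citations = {"A": {"X", "Y"}, "B": {"A"}, "C": {"Y", "Z"}}
seed = {"X": {"retrieval"}, "Y": {"retrieval", "indexing"}, "Z": {"indexing"}}
result = propagate_labels(citations, seed)
print(result["A"], result["B"], result["C"])  # {'retrieval'} {'retrieval'} {'indexing'}
```

Votes in each round are computed from the labels fixed at the start of that round, so the outcome does not depend on the order in which pending documents are visited (apart from ties inside `most_common`).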
Proceedings Article•10.1145/2393347.2396390•
Toward next generation coaching tools for court based racquet sports

Damien Connaghan¹, Noel E. O'Connor¹
¹ Dublin City University
29 Oct 2012
TL;DR: An automatic event indexing and event retrieval system for tennis, which can be used to coach from beginners upwards, is presented to allow coaches to build advanced queries which existing sports coaching solutions cannot facilitate without an inordinate amount of manual indexing.
Abstract: Even with today's advances in automatic indexing of multimedia content, existing coaching tools for court sports lack the ability to automatically index a competitive match into key events. This paper proposes an automatic event indexing and event retrieval system for tennis, which can be used to coach from beginners upwards. Event indexing is possible using either visual or inertial sensing, with the latter potentially providing system portability. To achieve maximum performance in event indexing, multi-sensor data integration is implemented, where data from both sensors is merged to automatically index key tennis events. A complete event retrieval system is also presented to allow coaches to build advanced queries which existing sports coaching solutions cannot facilitate without an inordinate amount of manual indexing.
Journal Article•10.4018/IJIRR.2012100105•
Outline Shape Retrieval Using Textual Descriptors and Geometric Features

Saliha Aouat¹, Slimane Larabi¹
¹ University of Science and Technology Houari Boumediene
1 Oct 2012
TL;DR: Geometric features extracted from Textual Description of Outline Shapes are used in this paper to perform the retrieval process and select the best model for a query silhouette.
Abstract: Content-based image retrieval is a promising approach because of its automatic indexing, recognition and retrieval. This paper is a contribution to the field of content-based image retrieval (CBIR). Objects are represented by their outline shapes (silhouettes) and described following the XLWDOS Textual Description (Larabi et al., 2003). Textual descriptors are sensitive to noise. The authors have already developed an approach to smooth the outlines at different scales (Aouat & Larabi, 2010). The smoothing is performed by convolving noisy shapes with a Gaussian filter in order to match shape descriptors. The authors have also applied an indexing process after silhouette smoothing (Aouat & Larabi, 2009). These approaches (Aouat & Larabi, 2010; Aouat & Larabi, 2009) are very useful for shape matching and indexing, but they are not appropriate for recognition and retrieval because they use no similarity measures. In order to perform the retrieval process and select the best model for a query silhouette, the authors use in this paper geometric features extracted from the Textual Description of Outline Shapes.
Patent•
Automatic indexing device

Chengbing Yu
2 May 2012
TL;DR: In this paper, an automatic indexing device is described for automatic indexing when machining circular and planar parts with punched teeth, such as abrasion-resistant disk chucks, brake pads and bevel gears.
Abstract: An automatic indexing device is used for automatic indexing when machining circular and planar parts with punched teeth, such as abrasion-resistant disk chucks, brake pads and bevel gears. A support is mounted on a transverse guide rail of a planer, an indexing disc and an indexing head are mounted on the plane of a worktable, a lateral shaft transmission disc of the planer is connected with a first transmission hinge pin, and the first transmission hinge pin is connected with an elastic hinge pin of the handle of the indexing head by a connecting rod mechanism. When a full-cycle operation of the double housing planer is completed, a lateral rotation shaft of the double housing planer drives a connecting rod to move correspondingly through a full cycle in a linked manner; the connecting rod drives the handle of the indexing head to rotate to a certain graduation, and the elastic hinge pin at the end of the handle drives the indexing disc to realize automatic indexing, so that automatic reciprocating processing is achieved. The device is simple and convenient to operate. It solves the problem that the handle of an existing indexing device must be tediously turned by hand to a certain graduation each time a tooth is processed; it realizes automatic indexing and one-step forming, indexes accurately, allows indexing angles to be changed flexibly, effectively reduces labor intensity, and increases production efficiency and yield.
Book Chapter•10.1016/B978-1-84334-292-2.50002-7•
Automatic indexing versus manual indexing

Pierre de Keyser
1 Jan 2012
TL;DR: This chapter gives an overview of the arguments used in the discussion between the supporters of manual indexing and those of automatic indexing.
Abstract: This chapter gives an overview of the arguments used in the discussion between the supporters of manual indexing and those of automatic indexing. The arguments against manual indexing are that it is slow, expensive and not detailed enough, that it does not lead to better retrieval, that it is outdated and document-centred, and that there is no consistency between indexers. The arguments against automatic indexing are that it does not provide an overview of the index terms, that it does not solve the problem of synonyms and variants, that it does not take the context into account, that it does not allow browsing related terms, that orthography may be an impediment and, finally, that indexing is too complex a task for computers. The end of the chapter gives an overview of the six most popular misconceptions about automatic indexing.
Book Chapter•10.1016/B978-1-84334-292-2.50003-9•
Techniques applied in automatic indexing of text material

Pierre de Keyser
1 Jan 2012
TL;DR: This chapter discusses automatic indexing of text material, which normally begins with lexical analysis and can involve stop word lists, stemming techniques, the extraction of meaningful word combinations or statistical term weighting.
Abstract: Automatic indexing of text material can be very basic, or it can involve some advanced techniques. It normally begins with lexical analysis and can involve stop word lists, stemming techniques, the extraction of meaningful word combinations or statistical term weighting. Sometimes word combinations are linked to controlled vocabularies or classifications. For two decades now, the Text REtrieval Conferences (TREC) have been the laboratory for specialists in this field.
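The basic pipeline named in this abstract (lexical analysis, stop word removal, stemming and statistical term weighting) can be illustrated with a small sketch. The stop word list and the suffix-stripping rule below are toy assumptions standing in for real resources such as the Porter stemmer, and the weighting is plain tf-idf (raw term frequency times log(N/df)), one common choice among many. Word-combination extraction and links to controlled vocabularies are not covered.

```python
import math
import re
from collections import Counter

# Toy stop word list (assumption; real systems use much longer lists).
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lexical analysis: lower-case the text and keep runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def stem(word):
    """Crude suffix stripping, a hypothetical stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Tokens that survive stop word removal, reduced to their stems."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]


def tf_idf(docs):
    """Weight each term by raw term frequency times log(N / document frequency)."""
    term_lists = [index_terms(d) for d in docs]
    n = len(docs)
    df = Counter(term for terms in term_lists for term in set(terms))
    return [
        {term: tf * math.log(n / df[term]) for term, tf in Counter(terms).items()}
        for terms in term_lists
    ]


weights = tf_idf(["Indexing the documents", "Documents are indexed", "Stemming of words"])
```

Under this formula a term that occurs in every document gets weight 0, which is why very common words contribute little even when they slip past the stop list.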
Automatic Indexing and Information Visualization: A Study Based on Paraconsistent Logic

Carlos Alberto Correa, Nair Yumiko Kobashi
1 Jan 2012
TL;DR: The results confirm the hypothesis that, under the experimental conditions, the para-analyser generates more effective clusters of similar documents, which matters because more meaningful clusters can be used to enhance information indexing and retrieval.
Abstract: This paper reports research evaluating the potential and the effects of using annotated paraconsistent logic in automatic indexing. Paraconsistent logic deals with contradictions: it studies and develops inconsistency-tolerant logical systems. Because it is flexible and has logical states that go beyond the yes/no dichotomy, it suggests the hypothesis that indexing results could be better than those obtained by traditional methods. The research drew on the interactions between several disciplines, such as information retrieval, automatic indexing, information visualization and nonclassical logics. Methodologically, an algorithm for treating uncertainty and imprecision, developed under paraconsistent logic, was used to modify the weights assigned to the indexing terms of the text collections. The tests were performed on Projection Explorer (PEx), an information visualization system with available source code, created at the Institute of Mathematics and Computer Science (ICMC – USP São Carlos). PEx uses the traditional vector space model to represent the documents of a collection. The results were evaluated using criteria built into the information visualization system itself and showed measurable gains in display quality, confirming the hypothesis that, under the experimental conditions, the para-analyser generates more effective clusters of similar documents. This is noteworthy, since more meaningful clusters can be used to enhance information indexing and retrieval. It can be argued that adopting non-dichotomous (non-exclusive) parameters opens new possibilities for relating similar information.
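Annotated paraconsistent logic usually describes a proposition by a favourable evidence degree μ and an unfavourable evidence degree λ, both in [0, 1], from which the para-analyser derives a certainty degree Gc = μ − λ and a contradiction degree Gct = μ + λ − 1. The sketch below computes these degrees and classifies the four extreme states (true, false, inconsistent, paracomplete) using control limits of 0.5, a simplification of the full lattice. How the paper turns evidence degrees into modified term weights is not stated in the abstract, so `adjust_weight` is a hypothetical rule for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ParaAnalysis:
    certainty: float      # Gc = mu - lambda, in [-1, 1]
    contradiction: float  # Gct = mu + lambda - 1, in [-1, 1]
    state: str            # one of the four extreme states, or "non-extreme"


def para_analyser(mu, lam, certainty_limit=0.5, contradiction_limit=0.5):
    """Simplified para-analyser: degrees of certainty/contradiction and extreme state.

    mu: favourable evidence degree; lam: unfavourable evidence degree (both in [0, 1]).
    The 0.5 control limits are illustrative defaults.
    """
    if not (0.0 <= mu <= 1.0 and 0.0 <= lam <= 1.0):
        raise ValueError("evidence degrees must lie in [0, 1]")
    gc = mu - lam
    gct = mu + lam - 1.0
    if gc >= certainty_limit:
        state = "true"
    elif gc <= -certainty_limit:
        state = "false"
    elif gct >= contradiction_limit:
        state = "inconsistent"
    elif gct <= -contradiction_limit:
        state = "paracomplete"
    else:
        state = "non-extreme"
    return ParaAnalysis(gc, gct, state)


def adjust_weight(weight, mu, lam):
    """Hypothetical rule: scale a vector-space term weight by (1 + Gc) / 2.

    Terms with fully favourable evidence keep their weight; contradicted or
    unsupported terms are halved; fully refuted terms drop to 0.
    Not the paper's (unspecified) rule.
    """
    return weight * (1.0 + para_analyser(mu, lam).certainty) / 2.0


# Example: a term with favourable evidence 0.9 and unfavourable evidence 0.1.
result = para_analyser(0.9, 0.1)
```

The point of the non-dichotomous states is visible in the last two branches: a term can be both supported and contradicted ("inconsistent") or neither ("paracomplete"), which a yes/no weight cannot express.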
Book Chapter•10.1016/B978-1-84334-292-2.50005-2•
The black art of indexing moving images

Pierre de Keyser
1 Jan 2012
TL;DR: Keyframe indexing extracts the most meaningful images from video fragments and can be compared with the automatic generation of a table of contents of a written text.
Abstract: Moving images are still indexed manually, but the overwhelming number of video fragments makes it necessary to look for automatic indexing techniques. Because many videos contain spoken text, text recognition can be applied to extract that text, which can then be indexed using methods designed for automatic text indexing. Keyframe indexing extracts the most meaningful images from video fragments. It can be compared with the automatic generation of a table of contents for a written text.
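Keyframe extraction can be done in many ways. One simple approach, assumed here for illustration and not taken from the chapter, compares grey-level histograms of successive frames and starts a new keyframe whenever a frame's histogram differs enough from the last keyframe's. Frames are assumed to be flat lists of 0-255 intensity values, and the threshold of 0.5 (on an L1 distance that ranges from 0 to 2) is an arbitrary choice.

```python
def histogram(frame, bins=8):
    """Normalised grey-level histogram of a frame given as a flat list of 0-255 values."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    return [c / len(frame) for c in counts]


def l1_distance(h1, h2):
    """Sum of absolute bin differences: 0 for identical histograms, at most 2."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def select_keyframes(frames, threshold=0.5, bins=8):
    """Indices of frames whose histogram differs from the last keyframe by more than threshold."""
    if not frames:
        return []
    keyframes = [0]  # the first frame always opens the first segment
    last = histogram(frames[0], bins)
    for i, frame in enumerate(frames[1:], start=1):
        current = histogram(frame, bins)
        if l1_distance(current, last) > threshold:
            keyframes.append(i)
            last = current
    return keyframes


# Example: two dark frames, two bright frames, then a dark frame again.
dark, bright = [10] * 100, [240] * 100
keys = select_keyframes([dark, dark, bright, bright, dark])  # [0, 2, 4]
```

Lowering the threshold produces more keyframes (a more detailed "table of contents"); raising it keeps only the largest visual changes.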
Journal Article•
A method for improving the accuracy of automatic indexing of Chinese-English mixed documents

Zhao Yan, Shi Hui
25 Dec 2012-Journal of Data and Information Science
TL;DR: This method distinguishes between Chinese and English text by grammatical structure and word-formation rules; applying it in the three phases of automatic indexing of Chinese-English mixed documents gave encouraging results.
Abstract: Purpose: The thrust of this paper is to present a method for improving the accuracy of automatic indexing of Chinese-English mixed documents. Design/methodology/approach: Based on the inherent characteristics of Chinese-English mixed texts and on cybernetics theory, we propose an integrated control method for indexing documents. It consists of "feed-forward control", "in-progress control" and "feed-back control", all aimed at improving the accuracy of automatic indexing of Chinese-English mixed documents. An experiment was conducted to investigate the effect of the proposed method. Findings: The method distinguishes between Chinese and English text by grammatical structure and word-formation rules. Applying it in the three phases of automatic indexing of Chinese-English mixed documents gave encouraging results: precision increased from 88.54% to 97.10% and recall improved from 97.37% to 99.47%. Research limitations: The indexing method is relatively complicated, and the whole indexing process requires substantial human intervention. Because pattern matching uses a brute-force (BF) approach, indexing efficiency is reduced to some extent. Practical implications: The research has both theoretical significance and practical value for improving the accuracy of automatic indexing of multilingual documents (not only Chinese-English mixed documents). The proposed method will benefit the indexing not only of life science documents but also of documents in other subject areas. Originality/value: So far, few studies have been published on methods for increasing the accuracy of multilingual automatic indexing. This study provides insights into the automatic indexing of multilingual documents, especially Chinese-English mixed documents.
Proceedings Article•10.1109/IJCNN.2012.6252796•
OntoHop: An information filtering agent using hopfield nets and ontologies

Juan Manuel Adán-Coello1, Carlos Miguel Tobar1•
Pontifícia Universidade Católica de Campinas1
10 Jun 2012
TL;DR: The architecture of an information filtering agent based on an implementation of a Hopfield neural network (HNN) is presented; experiments show that using an ontology tends to favor recall over precision, and that this bias can be adjusted by setting the minimum level of similarity required for a document term and a network term to be considered similar.
Abstract: The size of the Web and its dynamic nature, together with the fact that stored documents are written in natural language (and are therefore intended to be read by people rather than processed by computers), make it challenging to build automatic personalized information filtering systems. This article presents the architecture of an information filtering agent based on an implementation of a Hopfield neural network (HNN). Network nodes (neurons) represent relevant terms in the domain of interest, and links between neurons represent asymmetric co-occurrence probabilities of terms in the domain, that is, the relevance weight between a pair of terms. Relevant terms are derived automatically from a corpus related to the domain of interest using automatic indexing and an ontology. Co-occurrence probabilities are computed by a cluster function that produces asymmetric links between terms. When a document is filtered, input neurons are activated according to whether the document contains terms that are identical or semantically similar to the terms stored in the network. Semantic similarity between terms is calculated using a hierarchical ontology that describes the concepts of the domain of interest. Experiments evaluating the agent's precision and recall with and without ontologies show that ontology use tends to favor recall over precision. The degree of this bias can be adjusted by setting the minimum level of similarity required for a document term and a network term to be considered similar.
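The filtering step described in the OntoHop abstract (input neurons activated by matching terms, activation spreading over asymmetric co-occurrence links) can be sketched with the parallel-relaxation update often used for Hopfield-net term spreading in IR: each neuron's next activation is a sigmoid of the weighted sum of incoming activations, repeated until the total change is small. Everything here is an assumption for illustration: link weights are estimated as P(j | i) from document co-occurrence rather than with the paper's cluster function, the sigmoid parameters are arbitrary, the ontology is replaced by a caller-supplied similarity function, and the document score is simply the sum of final activations.

```python
import math
from itertools import permutations


def cooccurrence_weights(corpus_terms):
    """Asymmetric link weights w[i][j] = P(j | i), estimated from document co-occurrence.

    corpus_terms: list of sets of index terms, one set per domain document.
    A simplified stand-in for the paper's (unspecified) cluster function.
    """
    doc_freq, pair_freq = {}, {}
    for terms in corpus_terms:
        for t in terms:
            doc_freq[t] = doc_freq.get(t, 0) + 1
        for i, j in permutations(terms, 2):
            pair_freq[(i, j)] = pair_freq.get((i, j), 0) + 1
    weights = {t: {} for t in doc_freq}
    for (i, j), count in pair_freq.items():
        weights[i][j] = count / doc_freq[i]  # dividing by df(i) makes links asymmetric
    return weights


def sigmoid(net, theta=0.5, theta0=0.1):
    """Transfer function; theta (bias) and theta0 (steepness) are illustrative values."""
    return 1.0 / (1.0 + math.exp((theta - net) / theta0))


def exact_match(a, b):
    """Default term similarity: 1.0 for identical terms, else 0.0 (no ontology)."""
    return 1.0 if a == b else 0.0


def relevance(weights, doc_terms, similarity=exact_match, min_similarity=1.0,
              epsilon=1e-4, max_iter=100):
    """Spread activation from a document's terms and return the summed final activation."""
    # Input activation: each network term takes its best similarity to any document
    # term, but only if that similarity reaches the minimum threshold.
    mu = {}
    for node in weights:
        best = max((similarity(node, t) for t in doc_terms), default=0.0)
        mu[node] = best if best >= min_similarity else 0.0
    # Parallel updates until the total change is below epsilon (or max_iter is hit).
    for _ in range(max_iter):
        new_mu = {
            j: sigmoid(sum(weights[i].get(j, 0.0) * mu[i] for i in weights))
            for j in weights
        }
        change = sum(abs(new_mu[j] - mu[j]) for j in weights)
        mu = new_mu
        if change <= epsilon:
            break
    return sum(mu.values())


corpus = [{"neural", "network"}, {"neural", "network", "ontology"}, {"ontology", "filtering"}]
links = cooccurrence_weights(corpus)
score = relevance(links, {"neural", "network"})
```

Passing a looser `min_similarity` together with an ontology-backed `similarity` lets more network terms fire on near-synonyms, which mirrors the recall-versus-precision trade-off the experiments report.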
Book Chapter•10.1016/B978-1-84334-292-2.50004-0•
Automatic indexing of images

Pierre de Keyser
1 Jan 2012
TL;DR: The engineering behind the three techniques of automatic image indexing may be quite advanced and difficult to understand, but the essence of each one can be grasped by a few examples that are freely available on the web.
Abstract: Basic techniques of automatic image indexing are discussed: context-based indexing, content-based indexing and automatic image annotation. Context-based indexing relies on the words surrounding the image and assumes that they express the content of the image. Content-based indexing uses one or more aspects of the image itself, such as its colour or texture, to make the image retrievable. Automatic image annotation compares some characteristics of the image with those of the images in a manually indexed sample database. It assumes that images sharing certain features express the same content. Although the engineering behind the three techniques may be quite advanced and difficult to understand, the essence of each one can be grasped from a few examples that are freely available on the web.
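Content-based indexing and automatic image annotation can be illustrated together with a small sketch. It assumes images are lists of (r, g, b) pixels, uses a quantised colour histogram as the content feature and histogram intersection as the similarity measure, and annotates a query by transferring keywords from its k most similar images in a manually indexed sample database. These choices, and all names, are assumptions for illustration; the chapter does not prescribe a particular feature or similarity measure.

```python
from collections import Counter


def colour_histogram(pixels, levels=4):
    """Quantise each RGB channel into `levels` bins; return bin -> fraction of pixels."""
    step = 256 // levels
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    return {bin_: c / len(pixels) for bin_, c in counts.items()}


def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(v, h2.get(k, 0.0)) for k, v in h1.items())


def annotate(query_pixels, sample_db, k=2):
    """Transfer keywords from the k most similar manually indexed images.

    sample_db: list of (pixels, keywords) pairs standing in for a manually indexed
    collection. Keywords are ranked by how many of the k neighbours carry them.
    """
    query = colour_histogram(query_pixels)
    ranked = sorted(sample_db,
                    key=lambda item: intersection(query, colour_histogram(item[0])),
                    reverse=True)
    votes = Counter(word for _, words in ranked[:k] for word in words)
    return [word for word, _ in votes.most_common()]


sky = [(30, 60, 220)] * 10
grass = [(40, 200, 40)] * 10
sky_and_field = sky[:5] + grass[:5]
sample_db = [(sky, {"sky", "blue"}), (grass, {"grass"}), (sky_and_field, {"sky", "field"})]
query_image = [(35, 50, 230)] * 8 + [(40, 200, 40)] * 2
labels = annotate(query_image, sample_db)
```

The annotation step embodies the assumption stated in the abstract: images whose colour distributions overlap are taken to share content, so their manual keywords are reused for the query.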

© 2026 | PubGenius Inc. | Suite # 217 691 S Milpitas Blvd Milpitas CA 95035, USA
