Automatic indexing

Papers published on a yearly basis

Papers

Journal Article•10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9•

Indexing by Latent Semantic Analysis

Scott Deerwester¹, Susan T. Dumais², George W. Furnas², Thomas K. Landauer², Richard A. Harshman³•Institutions (3)

University of Chicago¹, Telcordia Technologies², University of Western Ontario³

01 Sep 1990-Journal of the Association for Information Science and Technology

TL;DR: A new method for automatic indexing and retrieval that takes advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) to improve the detection of relevant documents on the basis of terms found in queries.

Abstract: A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries. The particular technique used is singular-value decomposition, in which a large term by document matrix is decomposed into a set of ca. 100 orthogonal factors from which the original matrix can be approximated by linear combination. Documents are represented by ca. 100 item vectors of factor weights. Queries are represented as pseudo-document vectors formed from weighted combinations of terms, and documents with supra-threshold cosine values are returned. Initial tests find this completely automatic method for retrieval to be promising.

13,504 citations
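
The retrieval pipeline described in the abstract (truncated SVD of a term-by-document matrix, queries folded in as pseudo-documents, a cosine threshold) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the toy matrix, the choice of k = 2 factors (instead of ca. 100), the threshold of 0.5, and the helper names lsi_index, fold_in and retrieve are all assumptions. Documents are represented as rows of V_k S_k and the query as U_k^T q, which places both in the same scaled factor space.

```python
import numpy as np

def lsi_index(term_doc, k):
    """Truncated SVD A ~= U_k S_k V_k^T; returns U_k and one k-dim vector per document."""
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)  # s is sorted descending
    u_k = u[:, :k]
    doc_vecs = (s[:k, None] * vt[:k, :]).T  # rows of V_k S_k, one row per document
    return u_k, doc_vecs

def fold_in(u_k, query_terms):
    """Project a query's term vector into the factor space as a pseudo-document.

    U_k^T q equals S_k times the q^T U_k S_k^{-1} fold-in, so it is scaled like doc_vecs.
    """
    return u_k.T @ query_terms

def retrieve(u_k, doc_vecs, query_terms, threshold):
    """Return indices of documents whose cosine with the query is >= threshold."""
    q = fold_in(u_k, query_terms)
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q)
    cosines = (doc_vecs @ q) / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    return [int(i) for i in np.flatnonzero(cosines >= threshold)], cosines

# Toy term-by-document matrix (hypothetical data): rows are terms, columns are documents.
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 0],   # car
    [0, 1, 0],   # automobile
    [1, 1, 0],   # engine
    [0, 0, 1],   # flower
    [0, 0, 1],   # petal
], dtype=float)

u_k, doc_vecs = lsi_index(A, k=2)
query = np.array([1, 0, 0, 0, 0], dtype=float)  # the one-word query "car"
hits, cosines = retrieve(u_k, doc_vecs, query, threshold=0.5)
print(hits)  # [0, 1]
```

With only two factors, "car" and "automobile" fall onto the same dimension because both co-occur with "engine", so document 1 is returned even though it never contains the query term.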

Journal Article•10.1016/0306-4573(88)90021-0•

Term Weighting Approaches in Automatic Text Retrieval

Gerard Salton¹, Chris Buckley¹•Institutions (1)

Cornell University¹

01 Aug 1988-Information Processing and Management

TL;DR: This paper summarizes the insights gained in automatic term weighting, and provides baseline single term indexing models with which other more elaborate content analysis procedures can be compared.

Abstract: The experimental evidence accumulated over the past 20 years indicates that text-indexing systems based on the assignment of appropriately weighted single terms produce retrieval results that are superior to those obtainable with other more elaborate text representations. These results depend crucially on the choice of effective term weighting systems. This paper summarizes the insights gained in automatic term weighting, and provides baseline single term indexing models with which other more elaborate content analysis procedures can be compared.

10,566 citations
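
As one concrete instance of "appropriately weighted single terms", here is tf×idf weighting with cosine length normalization, which is close to one of the weighting combinations this line of work compares. The exact formula used here (raw term frequency times log(N/n), then unit-length normalization), the toy documents, and the function name tfidf_vectors are assumptions chosen for illustration, not the paper's definitive scheme.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Weight each term by tf * log(N / df), then normalize every document to unit length.

    docs: list of token lists. Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency within this document
        weights = {t: f * math.log(n_docs / df[t]) for t, f in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # Cosine normalization; a document whose terms all occur everywhere keeps zero weights.
        vectors.append({t: w / norm for t, w in weights.items()} if norm else weights)
    return vectors

docs = [
    ["text", "retrieval", "weighting"],
    ["text", "retrieval", "evaluation"],
    ["text", "indexing", "indexing"],
]
vecs = tfidf_vectors(docs)
print(vecs[0]["text"])  # 0.0, since "text" occurs in every document
```

The idf factor removes terms that cannot discriminate between documents, and the normalization keeps long documents from dominating cosine scores.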

Journal Article•10.1145/361219.361220•

A vector space model for automatic indexing

Gerard Salton¹, A. Wong¹, C. S. Yang¹•Institutions (1)

Cornell University¹

01 Nov 1975-Communications of the ACM

TL;DR: An approach based on space density computations is used to choose an optimum indexing vocabulary for a collection of documents, demonstrating the usefulness of the model.

Abstract: In a document retrieval, or other pattern matching environment where stored entities (documents) are compared with each other or with incoming patterns (search requests), it appears that the best indexing (property) space is one where each entity lies as far away from the others as possible; in these circumstances the value of an indexing system may be expressible as a function of the density of the object space; in particular, retrieval performance may correlate inversely with space density. An approach based on space density computations is used to choose an optimum indexing vocabulary for a collection of documents. Typical evaluation results are shown, demonstrating the usefulness of the model.

7,995 citations
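
The abstract ties retrieval quality to the density of the document space. Below is a minimal sketch of one centroid-based density measure: the average cosine similarity between each document and the collection centroid. Treat the formula, the toy vectors, and the name space_density as assumptions; the example shows that dropping a term that occurs in every document spreads the documents apart and lowers the density.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def space_density(doc_vectors):
    """Average cosine similarity between each document and the collection centroid."""
    n = len(doc_vectors)
    centroid = [sum(col) / n for col in zip(*doc_vectors)]
    return sum(cosine(d, centroid) for d in doc_vectors) / n

# Columns are the terms a, b, c, x; x occurs in every document (hypothetical data).
with_x = [[1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1]]
without_x = [row[:3] for row in with_x]  # same collection with term x dropped
print(round(space_density(with_x), 3), round(space_density(without_x), 3))  # 0.816 0.577
```

Under the abstract's hypothesis that retrieval performance correlates inversely with density, the vocabulary without x would be preferred.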

Journal Article•10.1137/1037127•

Using linear algebra for intelligent information retrieval

Michael W. Berry, Susan T. Dumais, Gavin W. O'Brien

01 Dec 1995-SIAM Review

TL;DR: Most current approaches to retrieving textual materials from scientific databases depend on a lexical match between words in users’ requests and those in, or assigned to, documents in a database.

Abstract: Currently, most approaches to retrieving textual materials from scientific databases depend on a lexical match between words in users’ requests and those in or assigned to documents in a database. ...

1,729 citations
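
To make "lexical match" concrete, here is a toy sketch in which a document is returned only if it shares at least one word with the request. The documents and the function name lexical_match are hypothetical; the example shows the vocabulary-mismatch case, where a request for "car" misses a document that says "automobile".

```python
def lexical_match(query, docs):
    """Return indices of documents sharing at least one word with the query."""
    query_words = set(query.lower().split())
    return [i for i, doc in enumerate(docs) if query_words & set(doc.lower().split())]

docs = ["car engine repair", "automobile engine repair", "flower petal"]
print(lexical_match("car", docs))  # [0]: document 1 is missed despite being about cars
```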

Journal Article•10.1109/TPAMI.2007.70847•

Real-Time Computerized Annotation of Pictures

Jia Li¹, James Z. Wang¹•Institutions (1)

Pennsylvania State University¹

01 Jun 2008-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: New optimization and estimation techniques for two fundamental problems in machine learning are developed; they serve as the basis for the Automatic Linguistic Indexing of Pictures - Real Time (ALIPR) system for fully automatic, high-speed annotation of online pictures.

Abstract: Developing effective methods for automated annotation of digital pictures continues to challenge computer scientists. The capability of annotating pictures by computers can lead to breakthroughs in a wide range of applications, including Web image search, online picture-sharing communities, and scientific experiments. In this work, the authors developed new optimization and estimation techniques to address two fundamental problems in machine learning. These new techniques serve as the basis for the automatic linguistic indexing of pictures - real time (ALIPR) system of fully automatic and high-speed annotation for online pictures. In particular, the D2-clustering method, in the same spirit as K-Means for vectors, is developed to group objects represented by bags of weighted vectors. Moreover, a generalized mixture modeling technique (kernel smoothing as a special case) for nonvector data is developed using the novel concept of hypothetical local mapping (HLM). ALIPR has been tested on thousands of pictures from an Internet photo-sharing site, unrelated to the source of those pictures used in the training process. Its performance has also been studied at an online demonstration site, where arbitrary users provide pictures of their choice and indicate the correctness of each annotation word. The experimental results show that a single computer processor can suggest annotation terms in real time and with good accuracy.

700 citations

Year	Papers
2024	2
2023	11
2022	8
2021	5
2020	11
2019	12

Related Topics (5)

Performance Metrics