Multistep Sparse Approximation Technology in Information Retrieval

doi:10.5120/17406-7991

Open Access | Journal Article

Chi Shen and two co-authors

- 20 Aug 2014

- International Journal of Computer Applic...

- Vol. 99, No. 10, pp. 1-8

TL;DR: This paper presents a class of multistep sparse matrix strategies for concept decomposition matrix approximation that reduces storage costs and query time compared with the least-squares based approach while maintaining comparable retrieval quality.

Abstract: With large collections of text documents growing rapidly, efficiently utilizing this vast volume of new information and service resources presents challenges to computational scientists. Text documents are usually modeled as a term-document matrix whose document vectors are high-dimensional and sparse. To reduce the dimensionality, researchers have developed concept decomposition, one of various dimensionality reduction methods. This method uses document clustering techniques and least-squares matrix approximation to approximate the matrix of document vectors. However, the numerical computation is expensive, because it requires the inverse of a dense matrix formed from the concept vector matrix. In this paper, we present a class of multistep sparse matrix strategies for concept decomposition matrix approximation. In this approach, a series of simple sparse matrices is used to approximate the decompositions. Our numerical experiments on both small and large datasets show the advantage of this approach over the least-squares based approach in terms of storage costs and query time, while maintaining comparable retrieval quality.
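
The least-squares baseline that the abstract compares against (cluster the documents, then fit A ≈ CZ) can be sketched as follows. This is a minimal NumPy sketch, not the authors' multistep sparse method: it assumes a nonnegative terms x documents matrix A with no all-zero columns, uses spherical k-means for the clustering step (as in the Dhillon and Modha reference below), and the function names spherical_kmeans and least_squares_coefficients are hypothetical.

```python
import numpy as np


def spherical_kmeans(A, k, iters=20, seed=0):
    """Cluster the columns (documents) of a nonnegative terms x documents matrix A.

    Returns C, a terms x k matrix of unit-norm concept vectors (normalized
    cluster centroids), and the cluster label of each document.
    Assumes A has no all-zero columns.
    """
    rng = np.random.default_rng(seed)
    X = A / np.linalg.norm(A, axis=0, keepdims=True)         # unit-length document vectors
    C = X[:, rng.choice(X.shape[1], size=k, replace=False)]  # seed with k random documents
    for _ in range(iters):
        labels = np.argmax(C.T @ X, axis=0)                  # assign by cosine similarity
        for j in range(k):
            members = X[:, labels == j]
            if members.shape[1] == 0:
                continue                                     # empty cluster: keep old vector
            s = members.sum(axis=1)
            C[:, j] = s / np.linalg.norm(s)                  # concept vector = normalized centroid
    return C, labels


def least_squares_coefficients(A, C):
    """Z minimizing ||A - C Z||_F, i.e. Z = (C^T C)^{-1} C^T A.

    C^T C is a dense k x k matrix even when C itself is sparse.
    """
    return np.linalg.solve(C.T @ C, C.T @ A)


# Toy sparse, nonnegative term-document matrix (50 terms x 30 documents).
rng = np.random.default_rng(1)
A = rng.random((50, 30)) * (rng.random((50, 30)) < 0.2)
A[rng.integers(0, 50, size=30), np.arange(30)] = 1.0        # every document gets a term

C, labels = spherical_kmeans(A, k=5)
Z = least_squares_coefficients(A, C)
rel_residual = np.linalg.norm(A - C @ Z) / np.linalg.norm(A)

# Query scoring against the approximation A ~ C Z: q^T (C Z) = (C^T q)^T Z.
q = A[:, 0]
scores = (C.T @ q) @ Z
```

The dense system C^T C that np.linalg.solve factors here is the cost the abstract refers to; the paper's approach replaces this step with a series of sparse matrices, whose exact construction is not described on this page.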

References

Journal Article | doi:10.1023/A:1007612920971

Concept Decompositions for Large Sparse Text Data Using Clustering

Inderjit S. Dhillon and one co-author

- 01 Jan 2001

- Machine Learning

TL;DR: The concept vectors produced by the spherical k-means algorithm form a powerful sparse and localized “basis” for text data sets: they are localized in the word space, are sparse, and tend toward orthonormality.

Journal Article | doi:10.1145/65943.65945

A critical investigation of recall and precision as measures of retrieval system performance

Vijay V. Raghavan and two co-authors

- 01 Jul 1989

- ACM Transactions on Information Systems

TL;DR: This paper systematically investigates the problems associated with using recall and precision as measures of retrieval system performance. It also compares methods for defining precision in a probabilistic sense, to promote a better understanding of the issues involved in evaluating retrieval performance.

Book

Understanding Search Engines: Mathematical Modeling and Text Retrieval

Michael W. Berry and one co-author

- 01 Jul 1999

TL;DR: In this book, the authors bridge the gap between applied mathematics and information retrieval and discuss current problems in information retrieval that may be unfamiliar to applied mathematicians and computer scientists.

Journal Article | doi:10.1137/S106482759833913X

A Priori Sparsity Patterns for Parallel Sparse Approximate Inverse Preconditioners

Edmond Chow

- 11 Dec 1999

- SIAM Journal on Scientific Computing

TL;DR: This paper demonstrates that, for PDE problems, the patterns of powers of sparsified matrices (PSMs) can be used a priori as effective approximate inverse patterns, and that the additional effort of adaptive sparsity pattern calculations may not be required.

Proceedings Article | doi:10.1145/1008992.1009014

On scaling latent semantic indexing for large peer-to-peer systems

Chunqiang Tang and two co-authors

- 25 Jul 2004

TL;DR: The approach reduces the size of the LSI input matrix through document clustering and term selection; it retains the retrieval quality of LSI while being several orders of magnitude more efficient.

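The Chow reference above proposes using the nonzero pattern of a power of a sparsified matrix as an a priori sparsity pattern for an approximate inverse, a sparse-inverse idea related to avoiding the dense inverse mentioned in the abstract. The following is a minimal NumPy sketch under stated assumptions: an absolute drop threshold on an already scaled square matrix, a pattern that always keeps the diagonal, and a column-by-column Frobenius-norm minimization to fill the pattern. The helper names psm_pattern and approximate_inverse are hypothetical, dense arrays are used for brevity, and this is not claimed to match Chow's parallel implementation or the method of the paper above.

```python
import numpy as np


def psm_pattern(A, threshold, power):
    """Boolean nonzero pattern of (sparsified A)^power, for power >= 1.

    Sparsification drops entries with |a_ij| <= threshold (an absolute
    threshold, assumed here) and always keeps the diagonal.
    """
    S = np.abs(A) > threshold
    np.fill_diagonal(S, True)
    S_int = S.astype(np.int64)
    P = S_int.copy()
    for _ in range(power - 1):
        P = ((P @ S_int) > 0).astype(np.int64)   # pattern of the next power
    return P > 0


def approximate_inverse(A, pattern):
    """M with nonzeros only where pattern is True, minimizing ||A M - I||_F.

    The objective splits by column: min ||A[:, J] m - e_j||_2 over the
    allowed rows J of column j.
    """
    n = A.shape[0]
    M = np.zeros((n, n))
    for j in range(n):
        J = np.flatnonzero(pattern[:, j])        # allowed nonzero rows in column j
        e_j = np.zeros(n)
        e_j[j] = 1.0
        m, *_ = np.linalg.lstsq(A[:, J], e_j, rcond=None)
        M[J, j] = m
    return M


# Tridiagonal test matrix plus one weak entry that the threshold drops.
n = 5
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A[0, n - 1] = 0.1

P1 = psm_pattern(A, threshold=0.5, power=1)      # tridiagonal
P2 = psm_pattern(A, threshold=0.5, power=2)      # bandwidth 2
res1 = np.linalg.norm(A @ approximate_inverse(A, P1) - np.eye(n))
res2 = np.linalg.norm(A @ approximate_inverse(A, P2) - np.eye(n))
```

Because the diagonal is kept, each higher power's pattern contains the previous one, so the Frobenius residual can only stay the same or shrink as the power grows, at the price of more stored nonzeros.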

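The abstract evaluates retrieval quality, and the Raghavan et al. reference examines precision and recall as measures of it. A minimal Python sketch of precision and recall at a rank cutoff k follows, assuming one ranked result list without ties and a known set of relevant document IDs; it does not implement the probabilistic precision definitions that reference compares, and the function name precision_recall_at_k is hypothetical.

```python
def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Precision and recall over the top-k documents of one ranked list."""
    top = ranked_ids[:k]
    hits = sum(1 for doc in top if doc in relevant_ids)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall


ranked = [3, 1, 4, 5, 9, 2]   # document IDs in ranked order
relevant = {1, 2, 7}          # IDs judged relevant to the query
p3, r3 = precision_recall_at_k(ranked, relevant, 3)   # precision 1/3, recall 1/3
p6, r6 = precision_recall_at_k(ranked, relevant, 6)   # precision 1/3, recall 2/3
```

Reporting both numbers at several cutoffs shows the usual trade-off: retrieving more documents can raise recall while precision stays flat or drops.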

Related Papers (5)

Construction of data-sparse H2-matrices by hierarchical compression

Hybrid sparse-matrix methods

Parallel solution of unstructured, sparse systems of linear equations

Fast and Low Memory Cost Matrix Factorization: Algorithm, Analysis, and Case Study

SCED: A General Framework for Sparse Tensor Decomposition with Constraints and Elementwise Dynamic Learning