Multistep Sparse Approximation Technology in Information Retrieval
TL;DR: The paper presents a class of multistep sparse matrix strategies for approximating the concept decomposition matrix. Compared with the least-squares based approach, these strategies reduce storage costs and query time while maintaining comparable retrieval quality.
Abstract: As large collections of text documents grow rapidly, making efficient use of this vast volume of new information and service resources presents challenges to computational scientists. Text documents are usually modeled as a term-document matrix whose column vectors are high-dimensional and sparse. Concept decomposition is one of several dimensionality reduction methods developed to reduce this high dimensionality. The method is based on document clustering techniques and least-squares matrix approximation to approximate the matrix of document vectors. However, the numerical computation is expensive, as it requires the inverse of a dense matrix formed from the concept vector matrix. In this paper we present a class of multistep sparse matrix strategies for concept decomposition matrix approximation. In this approach, a series of simple sparse matrices is used to approximate the decomposition. Our numerical experiments on both small and large datasets show the advantage of this approach in terms of storage costs and query time compared with the least-squares based approach, while maintaining comparable retrieval quality.
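To make the least-squares baseline concrete, the following is a minimal sketch of a concept decomposition in NumPy. It illustrates only the baseline the abstract compares against, not the multistep sparse strategies the paper proposes. The function names, the toy matrix, and the fixed cluster labels are all hypothetical; in practice the labels would come from a document clustering step, and the sketch assumes every cluster is non-empty and the concept vectors are linearly independent.

```python
import numpy as np

def concept_vectors(A, labels, k):
    """Return an m x k matrix whose columns are normalized cluster centroids.

    A      : m x n term-document matrix (columns are document vectors)
    labels : length-n array of cluster indices in [0, k)
    """
    m = A.shape[0]
    C = np.zeros((m, k))
    for j in range(k):
        members = A[:, labels == j]
        if members.shape[1] == 0:
            raise ValueError(f"cluster {j} is empty")
        centroid = members.sum(axis=1)
        C[:, j] = centroid / np.linalg.norm(centroid)
    return C

def least_squares_decomposition(C, A):
    """Solve min_Z ||A - C Z||_F via the normal equations.

    The k x k Gram matrix C^T C is dense in general; solving with it is
    the expensive step that the abstract refers to.
    """
    gram = C.T @ C
    return np.linalg.solve(gram, C.T @ A)

# Hypothetical toy data: 4 terms, 4 documents, 2 clusters.
A = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 1.0, 1.0, 0.0]])
A = A / np.linalg.norm(A, axis=0)   # unit-length document vectors
labels = np.array([0, 0, 1, 1])     # assumed output of a clustering step

C = concept_vectors(A, labels, k=2)
Z = least_squares_decomposition(C, A)
A_approx = C @ Z                    # rank-2 approximation of A
```

The product `C @ Z` replaces the original matrix at query time. The sketch uses a dense solve for clarity; it says nothing about how the paper's sparse matrices are constructed.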