Query expansion using random walk models
Kevyn Collins-Thompson, Jamie Callan
- 31 Oct 2005
- pp 704-711
TL;DR: This paper describes a Markov chain framework that combines multiple sources of knowledge on term associations, evaluates the accuracy and robustness of the resulting query expansion methods, and investigates the relative effectiveness of various sources of term evidence.
Abstract: It has long been recognized that capturing term relationships is an important aspect of information retrieval. Even with large amounts of data, we usually only have significant evidence for a fraction of all potential term pairs. It is therefore important to consider whether multiple sources of evidence may be combined to predict term relations more accurately. This is particularly important when trying to predict the probability of relevance of a set of terms given a query, which may involve both lexical and semantic relations between the terms. We describe a Markov chain framework that combines multiple sources of knowledge on term associations. The stationary distribution of the model is used to obtain probability estimates that a potential expansion term reflects aspects of the original query. We use this model for query expansion and evaluate the effectiveness of the model by examining the accuracy and robustness of the expansion methods, and investigate the relative effectiveness of various sources of term evidence. Statistically significant differences in accuracy were observed depending on the weighting of evidence in the random walk. For example, using co-occurrence data later in the walk was generally better than using it early, suggesting further improvements in effectiveness may be possible by learning walk behaviors.
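The abstract's central idea, ranking candidate expansion terms by the stationary distribution of a walk over a term graph built from several evidence sources, can be illustrated with a small sketch. This is not the authors' model: it assumes a simple random walk with restart to the query terms, two hypothetical evidence sources (`synonym` and `cooccurrence`) with made-up link weights, and fixed mixing weights. All function names, the `restart` parameter, and the data are illustrative assumptions.

```python
def normalize(weights):
    """Scale non-negative weights so they sum to 1 (empty dict if all zero)."""
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()} if total > 0 else {}


def combine_sources(sources, mix):
    """Mix per-source link weights into one row-stochastic transition table.

    sources: {source_name: {term: {neighbor: weight}}}  (hypothetical evidence)
    mix:     {source_name: mixing weight for that source}
    """
    terms = set()
    for links in sources.values():
        for term, row in links.items():
            terms.add(term)
            terms.update(row)
    transitions = {}
    for term in terms:
        row = {}
        for name, links in sources.items():
            for target, p in normalize(links.get(term, {})).items():
                row[target] = row.get(target, 0.0) + mix[name] * p
        # Renormalize over the sources that had evidence for this term;
        # a term with no outgoing links gets a self-loop.
        transitions[term] = normalize(row) or {term: 1.0}
    return transitions


def stationary_distribution(transitions, query, restart=0.3, iters=200, tol=1e-12):
    """Power iteration for a walk that restarts at the query terms."""
    start = {t: 1.0 / len(query) for t in query}
    dist = dict(start)
    for _ in range(iters):
        nxt = {t: restart * p for t, p in start.items()}
        for term, mass in dist.items():
            for target, p in transitions.get(term, {term: 1.0}).items():
                nxt[target] = nxt.get(target, 0.0) + (1.0 - restart) * mass * p
        delta = sum(abs(nxt.get(t, 0.0) - dist.get(t, 0.0)) for t in set(nxt) | set(dist))
        dist = nxt
        if delta < tol:
            break
    return dist


def rank_expansions(dist, query):
    """Candidate expansion terms (non-query terms), most probable first."""
    candidates = [(t, p) for t, p in dist.items() if t not in query]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


# Toy evidence, invented purely for illustration.
SOURCES = {
    "synonym": {"car": {"automobile": 1.0}, "automobile": {"car": 1.0}},
    "cooccurrence": {
        "car": {"engine": 2.0, "rental": 1.0},
        "engine": {"car": 1.0},
        "rental": {"car": 1.0},
        "automobile": {"engine": 1.0},
    },
}
MIX = {"synonym": 0.5, "cooccurrence": 0.5}

transitions = combine_sources(SOURCES, MIX)
dist = stationary_distribution(transitions, ["car"])
for term, prob in rank_expansions(dist, ["car"]):
    print(f"{term}: {prob:.4f}")
```

In this sketch the mixing weights are the same at every step of the walk. The abstract's observation that co-occurrence evidence worked better later in the walk than early suggests step-dependent weights, which this simplified version does not model.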
Citations
A Survey of Automatic Query Expansion in Information Retrieval
TL;DR: This survey presents a unified view of a large number of recent approaches to AQE that leverage various data sources and employ very different principles and techniques.
Discovering key concepts in verbose queries
Michael Bendersky, W. Bruce Croft
- 20 Jul 2008
TL;DR: This paper develops and evaluates a technique that uses query-dependent, corpus-dependent, and corpus-independent features for automatic extraction of key concepts from verbose queries, and shows that this method achieves higher accuracy in the identification of key concepts than standard weighting methods such as inverse document frequency.
Query expansion techniques for information retrieval: A survey
TL;DR: This paper surveys QE techniques in IR from 1960 to 2017 with respect to core techniques, data sources used, weighting and ranking methodologies, user participation and applications – bringing out similarities and differences.
•Book
Statistical Language Models for Information Retrieval
ChengXiang Zhai
- 30 Nov 2008
TL;DR: A great deal of recent work has shown that statistical language models not only achieve superior empirical performance, but also facilitate parameter tuning and provide a more principled way for modeling various kinds of complex and non-traditional retrieval problems.
Information Retrieval by Semantic Similarity
Angelos Hliaoutakis, Giannis Varelas, Epimenidis Voutsakis, Euripides G. M. Petrakis, Evangelos E. Milios
TL;DR: This work proposes the Semantic Similarity based Retrieval Model (SSRM), a novel information retrieval method capable of discovering similarities between documents containing conceptually similar terms, and demonstrates promising performance improvements over classic information retrieval methods that use plain lexical matching.
References
The anatomy of a large-scale hypertextual Web search engine
Sergey Brin, Lawrence Page
- 01 Apr 1998
TL;DR: This paper provides an in-depth description of Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext and looks at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.
Indexing by Latent Semantic Analysis
TL;DR: A new method for automatic indexing and retrieval to take advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries.
•Journal Article
The Anatomy of a Large-Scale Hypertextual Web Search Engine.
Sergey Brin, Lawrence Page
TL;DR: Google is a prototype of a large-scale search engine that makes heavy use of the structure present in hypertext and is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems.
Authoritative sources in a hyperlinked environment
TL;DR: This work proposes and tests an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of “hub pages” that join them together in the link structure, and has connections to the eigenvectors of certain matrices associated with the link graph.
•Proceedings Article
The Anatomy of Large-scale Hypertextual Web Search Engine
S. Brin
- 01 Jan 1998
TL;DR: We present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext to produce better search results.