Journal Article (DOI: 10.1145/2795403.2795405)
Beyond Independent Relevance: Methods and Evaluation Metrics for Subtopic Retrieval
ChengXiang Zhai, William W. Cohen, John Lafferty
- 28 Jul 2003
- Vol. 49, Iss: 1, pp 10-17
TL;DR: A framework for evaluating subtopic retrieval is proposed that generalizes the traditional precision and recall metrics by accounting for intrinsic topic difficulty as well as redundancy in documents; several subtopic retrieval methods based on statistical language models and a maximal marginal relevance (MMR) ranking strategy are also proposed and evaluated.
Abstract: We present a non-traditional retrieval problem we call subtopic retrieval. The subtopic retrieval problem is concerned with finding documents that cover many different subtopics of a query topic. In such a problem, the utility of a document in a ranking depends on the other documents in the ranking, violating the assumption of independent relevance made by most traditional retrieval methods. Subtopic retrieval poses challenges for evaluating performance, as well as for developing effective algorithms. We propose a framework for evaluating subtopic retrieval which generalizes the traditional precision and recall metrics by accounting for intrinsic topic difficulty as well as redundancy in documents. We propose and systematically evaluate several methods for performing subtopic retrieval using statistical language models and a maximal marginal relevance (MMR) ranking strategy. A mixture model combined with query likelihood relevance ranking is shown to modestly outperform a baseline relevance ranking on a data set used in the TREC interactive track.
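As we understand the paper's metrics, subtopic recall (S-recall) at rank k is the fraction of a topic's subtopics covered by the top k documents, and subtopic precision (S-precision) at recall level r is minRank(S_opt, r) / minRank(S, r): the rank an optimal ranking needs to reach recall r, divided by the rank the evaluated ranking needs. Dividing by the optimal rank is what accounts for intrinsic topic difficulty, and counting only newly covered subtopics is what penalizes redundancy. Finding the optimal ranking is a set-cover problem (hence the set-cover approximation result listed under References), so the minimal sketch below substitutes the standard greedy approximation for minRank(S_opt, r); with that substitution the computed S-precision is an estimate, not an exact value. The function names, the toy judgments, and the `EPS` tolerance are hypothetical illustrations, not the authors' code.

```python
from typing import Dict, List, Optional, Set

EPS = 1e-9  # hypothetical tolerance for comparing coverage against r * n_subtopics


def s_recall(ranking: List[str], subtopics: Dict[str, Set[int]],
             n_subtopics: int, k: int) -> float:
    """Fraction of the topic's subtopics covered by the top-k documents."""
    covered: Set[int] = set()
    for doc in ranking[:k]:
        covered |= subtopics.get(doc, set())
    return len(covered) / n_subtopics


def min_rank(ranking: List[str], subtopics: Dict[str, Set[int]],
             n_subtopics: int, r: float) -> Optional[int]:
    """Smallest rank k at which the ranking reaches S-recall >= r (None if never)."""
    covered: Set[int] = set()
    for k, doc in enumerate(ranking, start=1):
        covered |= subtopics.get(doc, set())
        if len(covered) >= r * n_subtopics - EPS:
            return k
    return None


def greedy_min_rank(subtopics: Dict[str, Set[int]],
                    n_subtopics: int, r: float) -> Optional[int]:
    """Greedy set-cover estimate of minRank(S_opt, r): repeatedly pick the
    document covering the most not-yet-covered subtopics."""
    covered: Set[int] = set()
    remaining = dict(subtopics)
    k = 0
    while len(covered) < r * n_subtopics - EPS:
        best = max(remaining, key=lambda d: len(remaining[d] - covered), default=None)
        if best is None or not remaining[best] - covered:
            return None  # recall level r is unreachable with these documents
        covered |= remaining.pop(best)
        k += 1
    return k


def s_precision(ranking: List[str], subtopics: Dict[str, Set[int]],
                n_subtopics: int, r: float) -> float:
    """S-precision at recall r, with the optimal rank approximated greedily."""
    opt = greedy_min_rank(subtopics, n_subtopics, r)
    got = min_rank(ranking, subtopics, n_subtopics, r)
    if opt is None or got is None:
        return 0.0
    return opt / got


# Toy judgments (hypothetical): document -> subtopic ids, 4 subtopics in total.
subtopics = {"d1": {1, 2}, "d2": {1}, "d3": {3}, "d4": {2, 3, 4}}
ranking = ["d2", "d1", "d3", "d4"]
print(s_recall(ranking, subtopics, 4, k=2))       # 0.5
print(s_precision(ranking, subtopics, 4, r=1.0))  # 0.5: greedy needs 2 docs, the ranking needs 4
```

The ranking in the example is not wrong document by document (every document is relevant to some subtopic), yet it scores poorly because `d2` is redundant with `d1` and the broad `d4` arrives last; this is the kind of dependence between documents that independent-relevance metrics cannot see.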
Citations
Jointly optimising relevance and diversity in image retrieval
Thomas Deselaers, Tobias Gass, Philippe Dreuw, Hermann Ney
- 08 Jul 2009
TL;DR: A method is proposed to jointly optimise the relevance and the diversity of the results in image retrieval, using techniques inspired by dynamic programming algorithms; the diverse results are observed to be more attractive to an average user.
Diversifying Citation Recommendations
TL;DR: This paper addresses the problem of result diversification in citation-based bibliographic search, assuming that the citation graph itself is the only information available and that no categories or intents are known.
Explicit search result diversification through sub-queries
Rodrygo L. T. Santos, Jie Peng, Craig Macdonald, Iadh Ounis
- 28 Mar 2010
TL;DR: This paper introduces xQuAD, a novel framework for search result diversification that builds a diversified ranking by explicitly accounting for the relationship between the documents retrieved for the original query and the possible aspects underlying this query, in the form of sub-queries.
What makes a query difficult
David Carmel, Elad Yom-Tov, Adam Darlow, Dan Pelleg
- 06 Aug 2006
TL;DR: This work presents a novel model that captures the main components of a topic and the relationship between those components and topic difficulty, and demonstrates the applicability of the difficulty model for several uses, such as predicting query difficulty, predicting the number of topic aspects expected to be covered by the search results, and analyzing the findability of a specific domain.
Ranking with submodular functions on a budget
TL;DR: In this paper, a max-submodular ranking problem with cardinality and knapsack-type budget constraints is studied, where each submodular function is associated with a budget, and the problem is to find a ranking of the set of items that maximizes the sum of values achieved by all functions under the budget constraints.
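The xQuAD framework cited above can be sketched as a greedy re-ranker. As we understand it, each step selects the candidate d maximizing (1 − λ)·P(d|q) + λ·Σ_i P(q_i|q)·P(d|q_i)·Π_{d_j ∈ S}(1 − P(d_j|q_i)), where S is the set already selected: a document earns diversity credit only for sub-queries q_i that the current ranking has not yet satisfied. This is a minimal sketch, assuming the three probability estimates are supplied as plain dictionaries; how sub-queries are generated and how the probabilities are estimated is out of scope here, and the function name, parameters, and example values are hypothetical.

```python
from typing import Dict, List


def xquad_rerank(rel: Dict[str, float],
                 aspect_weight: Dict[str, float],
                 aspect_rel: Dict[str, Dict[str, float]],
                 lam: float = 0.5, k: int = 10) -> List[str]:
    """Greedy xQuAD-style re-ranking (hypothetical sketch).

    rel[d]            ~ P(d | q), relevance to the original query
    aspect_weight[qi] ~ P(qi | q), importance of sub-query qi
    aspect_rel[qi][d] ~ P(d | qi), relevance of d to sub-query qi
    """
    selected: List[str] = []
    candidates = list(rel)
    # not_covered[qi] = product over selected d_j of (1 - P(d_j | qi)),
    # i.e. the chance that sub-query qi is still unsatisfied by the ranking.
    not_covered = {qi: 1.0 for qi in aspect_weight}

    def score(d: str) -> float:
        diversity = sum(aspect_weight[qi] * aspect_rel.get(qi, {}).get(d, 0.0) * not_covered[qi]
                        for qi in aspect_weight)
        return (1 - lam) * rel[d] + lam * diversity

    while candidates and len(selected) < k:
        best = max(candidates, key=score)  # ties go to the earlier candidate
        selected.append(best)
        candidates.remove(best)
        for qi in aspect_weight:
            not_covered[qi] *= 1.0 - aspect_rel.get(qi, {}).get(best, 0.0)
    return selected


# Hypothetical example: two equally weighted aspects of an ambiguous query.
rel = {"a": 0.9, "b": 0.85, "c": 0.5}
aspect_weight = {"car": 0.5, "cat": 0.5}
aspect_rel = {"car": {"a": 0.9, "b": 0.9}, "cat": {"c": 0.9}}
print(xquad_rerank(rel, aspect_weight, aspect_rel, lam=0.0))  # ['a', 'b', 'c']
print(xquad_rerank(rel, aspect_weight, aspect_rel, lam=0.5))  # ['a', 'c', 'b']
```

With λ = 0 the sketch reduces to plain relevance ranking; with λ = 0.5, once `a` has largely satisfied the "car" aspect, the less relevant but novel `c` overtakes `b`.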
References
Latent dirichlet allocation
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Proceedings Article
Latent Dirichlet Allocation
David M. Blei, Andrew Y. Ng, Michael I. Jordan
- 03 Jan 2001
TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
A threshold of ln n for approximating set cover
TL;DR: It is proved that (1 - o(1)) ln n is a threshold below which set cover cannot be approximated efficiently, unless NP has slightly superpolynomial time algorithms.
The Use of MMR and Diversity-Based Reranking for Reordering Documents and Producing Summaries
Jaime G. Carbonell, Jade Goldstein
- 01 Jan 1998
TL;DR: The Maximal Marginal Relevance (MMR) criterion aims to reduce redundancy while maintaining query relevance when re-ranking retrieved documents and selecting appropriate passages for text summarization.
The use of MMR, diversity-based reranking for reordering documents and producing summaries
Jaime Carbonell, Jade Goldstein
- 01 Aug 1998
TL;DR: A method for combining query-relevance with information-novelty in the context of text retrieval and summarization and preliminary results indicate some benefits for MMR diversity ranking in document retrieval and in single document summarization.
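The MMR criterion of Carbonell and Goldstein selects, at each step, the unselected document D_i maximizing λ·Sim1(D_i, Q) − (1 − λ)·max_{D_j ∈ S} Sim2(D_i, D_j), where S is the set already selected. The minimal sketch below assumes cosine similarity over whitespace-tokenized term counts for both Sim1 and Sim2; that choice is a stand-in for illustration, not the similarity used by Carbonell and Goldstein or the language-model-based novelty measures evaluated by Zhai et al. The function names and toy documents are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_rerank(query: str, docs: Dict[str, str],
               lam: float = 0.7, k: int = 10) -> List[str]:
    """Greedy MMR: balance relevance to the query against similarity
    to the documents already selected."""
    q_vec = Counter(query.lower().split())
    vecs = {d: Counter(text.lower().split()) for d, text in docs.items()}
    selected: List[str] = []
    candidates = list(docs)

    def marginal_score(d: str) -> float:
        relevance = cosine(vecs[d], q_vec)
        redundancy = max((cosine(vecs[d], vecs[s]) for s in selected), default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=marginal_score)  # ties go to the earlier candidate
        selected.append(best)
        candidates.remove(best)
    return selected


docs = {
    "a": "jaguar car speed",
    "b": "jaguar car speed review",
    "c": "jaguar cat",
}
print(mmr_rerank("jaguar car", docs, lam=1.0))  # ['a', 'b', 'c'] (relevance only)
print(mmr_rerank("jaguar car", docs, lam=0.5))  # ['a', 'c', 'b']
```

Setting λ = 1 recovers a pure relevance ranking; lowering λ lets the near-duplicate `b` be displaced by `c`, which is less relevant to the query but adds a different aspect.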
Related Papers (5)
Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, Samuel Ieong
- 09 Feb 2009
Olivier Chapelle, Donald Metzler, Ya Zhang, Pierre Grinspan
- 02 Nov 2009