A formal study of information retrieval heuristics
Hui Fang, Tao Tao, ChengXiang Zhai
- 25 Jul 2004
- pp 49-56
TL;DR: A formal study of retrieval heuristics is presented and it is found that the empirical performance of a retrieval formula is tightly related to how well it satisfies basic desirable constraints.
Abstract: Empirical studies of information retrieval methods show that good retrieval performance is closely related to the use of various retrieval heuristics, such as TF-IDF weighting. One basic research question is thus what exactly are these "necessary" heuristics that seem to cause good retrieval performance. In this paper, we present a formal study of retrieval heuristics. We formally define a set of basic desirable constraints that any reasonable retrieval function should satisfy, and check these constraints on a variety of representative retrieval functions. We find that none of these retrieval functions satisfies all the constraints unconditionally. Empirical results show that when a constraint is not satisfied, it often indicates non-optimality of the method, and when a constraint is satisfied only for a certain range of parameter values, its performance tends to be poor when the parameter is out of the range. In general, we find that the empirical performance of a retrieval formula is tightly related to how well it satisfies these constraints. Thus the proposed constraints provide a good explanation of many empirical observations and make it possible to evaluate any existing or new retrieval formula analytically.
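The abstract does not spell out the constraints themselves, so the following is only a minimal sketch of what "checking a constraint on a retrieval function" could look like. It assumes one illustrative constraint (a term's score should increase with its frequency in a document of fixed length) and a simplified BM25-style single-term score. The function names, parameter defaults, and collection statistics are hypothetical and are not taken from the paper.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, df, num_docs, k1=1.2, b=0.75):
    """Simplified BM25-style score of one query term in one document.

    Assumption: this is an illustrative formula, not the exact one
    analysed in the paper.
    """
    # Robertson/Sparck Jones style IDF; negative when df > num_docs / 2.
    idf = math.log((num_docs - df + 0.5) / (df + 0.5))
    # Document-length normalisation factor.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + norm)


def satisfies_tf_monotonicity(score_fn, df, num_docs,
                              doc_len=100, avg_doc_len=100, max_tf=20):
    """Check a hypothetical constraint numerically: with document length
    held fixed, more occurrences of the term must give a strictly
    higher score."""
    scores = [score_fn(tf, doc_len, avg_doc_len, df, num_docs)
              for tf in range(1, max_tf + 1)]
    return all(later > earlier for earlier, later in zip(scores, scores[1:]))


# Rare term (positive IDF): the constraint holds.
rare_ok = satisfies_tf_monotonicity(bm25_term_score, df=10, num_docs=1000)
# Very common term (negative IDF): the constraint is violated.
common_ok = satisfies_tf_monotonicity(bm25_term_score, df=900, num_docs=1000)
print(rare_ok, common_ok)
```

In this sketch the check passes for a rare term and fails for a term that occurs in most documents, which is one way a constraint can be satisfied only conditionally rather than unconditionally. A numerical check like this covers only the sampled values; the paper's analysis is analytical.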
Citations
Information-based models for ad hoc IR
Stéphane Clinchant, Eric Gaussier
- 19 Jul 2010
TL;DR: A long-standing hypothesis in IR, namely the fact that the difference in the behaviors of a word at the document and collection levels brings information on the significance of the word for the document, is shown to lead to simpler and better models.
H-ERNIE: A Multi-Granularity Pre-Trained Language Model for Web Search
Xiaokai Chu, Jiashu Zhao, Lixin Zou, Dawei Yin +3 more
- 06 Jul 2022
TL;DR: This paper proposes H-ERNIE, a framework comprising a query-document analysis component and a hierarchical ranking component, discusses its time complexity, and shows that it can be implemented efficiently in real applications.
A constraint to automatically regulate document-length normalisation
Ronan Cummins, Colm O'Riordan
- 29 Oct 2012
TL;DR: This paper formally describes the interaction between query-terms and document length normalisation using a constraint, and develops a general pre-retrieval approach to adapt a number of state-of-the-art ranking functions so that they adhere to the constraint.
Efficient processing of complex features for information retrieval
W. B. Croft, Trevor Strohman +1 more
- 01 Jan 2008
TL;DR: The TupleFlow framework, an extension of MapReduce, provides a basis for custom binned indexes, which efficiently store feature data, and work in binning probabilities shows how to effectively map language model probabilities into the space of small positive integers.
•Posted Content
Evaluation of Information Retrieval Systems Using Structural Equation Modelling
TL;DR: This work discusses how Structural Equation Modeling (SEM) can provide an in-depth explanation of evaluation results, including the failures and successes of a system, in particular for the evaluation of Information Retrieval systems.
References
•Book
Introduction to Modern Information Retrieval
Gerard Salton, Michael J. McGill
- 01 Jan 1983
Term Weighting Approaches in Automatic Text Retrieval
Gerard Salton, Chris Buckley
TL;DR: This paper summarizes the insights gained in automatic term weighting, and provides baseline single term indexing models with which other more elaborate content analysis procedures can be compared.
A language modeling approach to information retrieval
Jay Ponte, W. Bruce Croft
- 01 Aug 1998
TL;DR: It will be shown that probabilistic methods can be used to predict topic changes in the context of the task of new event detection and provide further proof of concept for the use of language models for retrieval tasks.