A formal study of information retrieval heuristics

doi:10.1145/1008992.1009004

Open Access•Proceedings Article•10.1145/1008992.1009004

Hui Fang, +2 more

- 25 Jul 2004

- pp 49-56

389

TL;DR: A formal study of retrieval heuristics is presented and it is found that the empirical performance of a retrieval formula is tightly related to how well it satisfies basic desirable constraints.

Abstract: Empirical studies of information retrieval methods show that good retrieval performance is closely related to the use of various retrieval heuristics, such as TF-IDF weighting. One basic research question is thus what exactly are these "necessary" heuristics that seem to cause good retrieval performance. In this paper, we present a formal study of retrieval heuristics. We formally define a set of basic desirable constraints that any reasonable retrieval function should satisfy, and check these constraints on a variety of representative retrieval functions. We find that none of these retrieval functions satisfies all the constraints unconditionally. Empirical results show that when a constraint is not satisfied, it often indicates non-optimality of the method, and when a constraint is satisfied only for a certain range of parameter values, its performance tends to be poor when the parameter is out of the range. In general, we find that the empirical performance of a retrieval formula is tightly related to how well it satisfies these constraints. Thus the proposed constraints provide a good explanation of many empirical observations and make it possible to evaluate any existing or new retrieval formula analytically.
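The idea of checking a retrieval function against desirable constraints can be sketched numerically. The example below is only an illustration, not the paper's method: the two constraints are paraphrased assumptions (the score should grow with the count of a query term at fixed document length, and should not grow when a document merely gets longer), the BM25-style formula and its defaults `k1` and `b` are a common textbook variant, and all function names are hypothetical.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, df, n_docs,
                    k1=1.2, b=0.75, smooth_idf=True):
    """BM25-style score of a single query term in a single document.

    Illustrative variant only; parameter defaults are assumptions.
    """
    ratio = (n_docs - df + 0.5) / (df + 0.5)
    # smooth_idf=True keeps the IDF positive; the classic form below
    # turns negative once the term occurs in more than half the documents.
    idf = math.log(1.0 + ratio) if smooth_idf else math.log(ratio)
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + norm)


def satisfies_tf_constraint(score, max_tf=20, **kw):
    """True if the score strictly increases with tf at fixed document length."""
    scores = [score(tf, **kw) for tf in range(0, max_tf + 1)]
    return all(a < b for a, b in zip(scores, scores[1:]))


def satisfies_length_constraint(score, tf=3, lengths=range(50, 500, 50), **kw):
    """True if, at fixed tf, a longer document never scores higher."""
    scores = [score(tf, doc_len=n, **kw) for n in lengths]
    return all(a >= b for a, b in zip(scores, scores[1:]))


if __name__ == "__main__":
    stats = dict(avg_doc_len=120, df=10, n_docs=1000)
    print(satisfies_tf_constraint(bm25_term_score, doc_len=100, **stats))
    print(satisfies_length_constraint(bm25_term_score, **stats))
```

A grid check like this can only falsify a constraint on the sampled points; it does not prove that the constraint holds analytically. Calling the same checks with `smooth_idf=False` and a very common term (for example `df=80, n_docs=100`) makes the term-frequency check fail, which is one way a constraint can hold only under certain conditions.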

Citations

Proceedings Article•10.1145/1835449.1835490

Information-based models for ad hoc IR

Stéphane Clinchant, +1 more

- 19 Jul 2010

TL;DR: A long-standing hypothesis in IR, namely that the difference in the behavior of a word at the document and collection levels carries information about the significance of the word for the document, is shown to lead to simpler and better models.

Proceedings Article•10.1145/3477495.3531986

H-ERNIE: A Multi-Granularity Pre-Trained Language Model for Web Search

Xiaokai Chu, +3 more

- 06 Jul 2022

TL;DR: This paper proposes a novel H-ERNIE framework, which includes a query-document analysis component and a hierarchical ranking component. It also discusses the time complexity of the framework and shows that it can be implemented efficiently in real applications.

Proceedings Article•10.1145/2396761.2398662

A constraint to automatically regulate document-length normalisation

Ronan Cummins, +1 more

- 29 Oct 2012

TL;DR: This paper formally describes the interaction between query-terms and document length normalisation using a constraint, and develops a general pre-retrieval approach to adapt a number of state-of-the-art ranking functions so that they adhere to the constraint.

Efficient processing of complex features for information retrieval

W. B. Croft, +1 more

- 01 Jan 2008

TL;DR: The TupleFlow framework, an extension of MapReduce, provides a basis for custom binned indexes, which efficiently store feature data, and work in binning probabilities shows how to effectively map language model probabilities into the space of small positive integers.

•Posted Content

Evaluation of Information Retrieval Systems Using Structural Equation Modelling

Massimo Melucci

- 25 Jun 2018

- arXiv: Information Retrieval

TL;DR: The use of Structural Equation Modeling (SEM) is discussed in providing an in-depth explanation of evaluation results and an explanation of failures and successes of a system; in particular, the case of evaluation of Information Retrieval systems.

References

•Book

Introduction to Modern Information Retrieval

Gerard Salton, +1 more

- 01 Jan 1983

12.6K

•Journal Article•10.1016/0306-4573(88)90021-0

Term Weighting Approaches in Automatic Text Retrieval

Gerard Salton, +1 more

- 01 Aug 1988

- Information Processing and Management

TL;DR: This paper summarizes the insights gained in automatic term weighting, and provides baseline single term indexing models with which other more elaborate content analysis procedures can be compared.

10.5K

Journal Article•10.1016/0306-4573(83)90062-6

Introduction to modern information retrieval: G. Salton and M. McGill. McGraw-Hill, New York (1983). xv + 448 pp., $32.95 ISBN 0-07-054484-0

Martin Dillon

- 01 Jan 1983

- Information Processing and Management

5.4K

•Book

Automatic text processing: the transformation, analysis, and retrieval of information by computer

Gerard Salton

- 03 Jan 1989

3.8K

Journal Article•10.1145/3130348.3130368

A language modeling approach to information retrieval

Jay Ponte, +1 more

- 01 Aug 1998

TL;DR: It will be shown that probabilistic methods can be used to predict topic changes in the context of the task of new event detection and provide further proof of concept for the use of language models for retrieval tasks.

2.8K