A formal study of information retrieval heuristics
Hui Fang, Tao Tao, ChengXiang Zhai
- 25 Jul 2004
- pp 49-56
TL;DR: A formal study of retrieval heuristics is presented and it is found that the empirical performance of a retrieval formula is tightly related to how well it satisfies basic desirable constraints.
Abstract: Empirical studies of information retrieval methods show that good retrieval performance is closely related to the use of various retrieval heuristics, such as TF-IDF weighting. One basic research question is thus what exactly are these "necessary" heuristics that seem to cause good retrieval performance. In this paper, we present a formal study of retrieval heuristics. We formally define a set of basic desirable constraints that any reasonable retrieval function should satisfy, and check these constraints on a variety of representative retrieval functions. We find that none of these retrieval functions satisfies all the constraints unconditionally. Empirical results show that when a constraint is not satisfied, it often indicates non-optimality of the method, and when a constraint is satisfied only for a certain range of parameter values, its performance tends to be poor when the parameter is out of the range. In general, we find that the empirical performance of a retrieval formula is tightly related to how well it satisfies these constraints. Thus the proposed constraints provide a good explanation of many empirical observations and make it possible to evaluate any existing or new retrieval formula analytically.
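The idea of checking a retrieval function against a constraint analytically can be illustrated with a small sketch. The code below is not taken from the paper: it assumes one common textbook form of the BM25 term score, and it tests a simplified term-frequency constraint (at fixed document length, more occurrences of a query term should give a strictly higher score). The function names, the parameter defaults `k1=1.2` and `b=0.75`, and the collection statistics are all illustrative assumptions.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, df, n_docs, k1=1.2, b=0.75):
    """Score of a single query term under an assumed textbook BM25 form."""
    # This IDF variant turns negative once df exceeds half the collection.
    idf = math.log((n_docs - df + 0.5) / (df + 0.5))
    # Document-length normalization component.
    norm = k1 * ((1 - b) + b * doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1)) / (tf + norm)


def satisfies_tf_constraint(score_fn, df, n_docs,
                            doc_len=100, avg_doc_len=100, max_tf=20):
    """Return True if the score strictly increases with tf at fixed length."""
    scores = [score_fn(tf, doc_len, avg_doc_len, df, n_docs)
              for tf in range(1, max_tf + 1)]
    return all(later > earlier for earlier, later in zip(scores, scores[1:]))


# Hypothetical collection of 1000 documents.
rare_ok = satisfies_tf_constraint(bm25_term_score, df=10, n_docs=1000)
common_ok = satisfies_tf_constraint(bm25_term_score, df=900, n_docs=1000)
print(rare_ok, common_ok)
```

Under these assumptions the check passes for the rare term and fails for the very common one, because the IDF factor becomes negative. This is one concrete way a constraint can hold only conditionally rather than for every input.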
Citations
An axiomatic comparison of learned term-weighting schemes in information retrieval: clarifications and extensions
Ronan Cummins, Colm O'Riordan
TL;DR: A new axiom is introduced and empirically validated by modifying the standard BM25 scheme, and it is found that one learned term-weighting approach is consistent with more axioms than any of the other schemes.
Predicting Query Performance by Query-Drift Estimation
Anna Shtok, Oren Kurland, David Carmel
- 03 Sep 2009
TL;DR: It is argued that query-drift can potentially be estimated by measuring the diversity of the retrieval scores of top-retrieved documents, and the prediction success is better, over most tested TREC corpora, than that of state-of-the-art prediction methods.
Towards Best Practices of Axiomatic Activation Patching in Information Retrieval
Gregory Polyakov, Carsten Eickhoff
Development and empirical user-centered evaluation of semantically-based query recommendation for an electronic health record search engine
David A. Hanauer, Danny T. Y. Wu, Lei Yang, Qiaozhu Mei, Katherine B. Murkowski-Steffy, V. G. Vinod Vydiswaran, Kai Zheng
TL;DR: Users still struggle to construct effective search queries when retrieving information from biomedical documents, including those from EHRs, and this study demonstrates that semantically-based query recommendation is a viable way to address this challenge.
Query aspect based term weighting regularization in information retrieval
Wei Zheng, Hui Fang
- 28 Mar 2010
TL;DR: This paper develops a general strategy that can systematically integrate a term weighting regularization function into existing retrieval functions, and proposes two specific regularization functions based on the guidance provided by constraint analysis.
References
•Book
Introduction to Modern Information Retrieval
Gerard Salton, Michael J. McGill
- 01 Jan 1983
Term Weighting Approaches in Automatic Text Retrieval
Gerard Salton, Chris Buckley
TL;DR: This paper summarizes the insights gained in automatic term weighting, and provides baseline single term indexing models with which other more elaborate content analysis procedures can be compared.
A language modeling approach to information retrieval
Jay Ponte, W. Bruce Croft
- 01 Aug 1998
TL;DR: It will be shown that probabilistic methods can be used to predict topic changes in the context of the task of new event detection and provide further proof of concept for the use of language models for retrieval tasks.