A formal study of information retrieval heuristics
Hui Fang, Tao Tao, ChengXiang Zhai
- 25 Jul 2004
- pp 49-56
TL;DR: A formal study of retrieval heuristics is presented and it is found that the empirical performance of a retrieval formula is tightly related to how well it satisfies basic desirable constraints.
Abstract: Empirical studies of information retrieval methods show that good retrieval performance is closely related to the use of various retrieval heuristics, such as TF-IDF weighting. One basic research question is thus what exactly are these "necessary" heuristics that seem to cause good retrieval performance. In this paper, we present a formal study of retrieval heuristics. We formally define a set of basic desirable constraints that any reasonable retrieval function should satisfy, and check these constraints on a variety of representative retrieval functions. We find that none of these retrieval functions satisfies all the constraints unconditionally. Empirical results show that when a constraint is not satisfied, it often indicates non-optimality of the method, and when a constraint is satisfied only for a certain range of parameter values, its performance tends to be poor when the parameter is out of the range. In general, we find that the empirical performance of a retrieval formula is tightly related to how well it satisfies these constraints. Thus the proposed constraints provide a good explanation of many empirical observations and make it possible to evaluate any existing or new retrieval formula analytically.
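As a rough illustration of what checking a constraint on a retrieval function might look like, here is a minimal Python sketch. The scoring formula `toy_score`, its parameter `s`, and the two constraints are simplified assumptions made for this example; the abstract does not state the paper's actual constraint definitions or retrieval functions. The checks are numerical spot checks over a small grid, not an analytical proof.

```python
import math


def toy_score(tf, doc_len, s, avg_len=100.0, idf=1.0):
    """Toy single-term score: dampened TF with length normalization.

    Hypothetical formula for illustration only; `s` controls how strongly
    long documents are penalized.
    """
    if tf == 0:
        return 0.0
    norm = (1.0 - s) + s * doc_len / avg_len
    return idf * (1.0 + math.log(tf)) / norm


def satisfies_tf_constraint(score, s, doc_len=100, max_tf=20):
    """Assumed constraint: for equal-length documents, more occurrences
    of the query term must give a strictly higher score."""
    return all(
        score(tf + 1, doc_len, s) > score(tf, doc_len, s)
        for tf in range(1, max_tf)
    )


def satisfies_length_constraint(score, s, tf=3, lengths=range(50, 500, 50)):
    """Assumed constraint: adding one non-query word (longer document,
    same term count) must not raise the score."""
    return all(
        score(tf, dl + 1, s) <= score(tf, dl, s)
        for dl in lengths
    )


def satisfying_values(check, score, grid):
    """Return the parameter values in `grid` for which `check` holds."""
    return [s for s in grid if check(score, s)]


if __name__ == "__main__":
    grid = (-0.2, 0.0, 0.2, 0.5)
    print("TF constraint holds for s in:",
          satisfying_values(satisfies_tf_constraint, toy_score, grid))
    print("Length constraint holds for s in:",
          satisfying_values(satisfies_length_constraint, toy_score, grid))
```

In this toy setup the length constraint holds only for non-negative `s`, which mirrors the abstract's point that a constraint may be satisfied only within a certain range of parameter values.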
Citations
•Book
Relevance Ranking for Vertical Search Engines
TL;DR: This reference book for professionals covers concepts and theories from the fundamental to the advanced, such as relevance, query intention, location-based relevance ranking, and cross-property ranking.
23
Improving term frequency normalization for multi-topical documents and application to language modeling approaches
Seung-Hoon Na, In-Su Kang, Jong-Hyeok Lee
- 30 Mar 2008
TL;DR: A novel TF normalization method, a type of partially axiomatic approach, is proposed, and language modeling approaches are modified to better satisfy two formal constraints that a retrieval model should satisfy for verbose and multi-topical documents, respectively.
VIRLab: a web-based virtual lab for learning and studying information retrieval models
Hui Fang, Hao Wu, Peilin Yang, ChengXiang Zhai
- 03 Jul 2014
TL;DR: Unlike existing command-line-based IR toolkits, the VIRLab system provides a more interactive tool that enables easy implementation of retrieval functions with only a few lines of code, a simplified evaluation process over multiple data sets and parameter settings, and a straightforward result analysis interface through operational search engines and pair-wise comparisons.
Patent
Power management contracts for accessory devices
Gene Robert Obie, Heng Huang, Yi He, Duane Martin Evans
- 16 May 2015
TL;DR: In this paper, power management contracts for accessory devices are described, which define operating constraints for power exchange between components of the system, including at least a power exchange direction and current limits.
21
Is document frequency important for PRF
Stéphane Clinchant, Eric Gaussier
- 12 Sep 2011
TL;DR: Analysis of state-of-the-art PRF models according to their relation with a new heuristic constraint, referred to as the Document Frequency (DF) constraint, reveals that, unlike several recently proposed models, the standard mixture model for PRF in the language modeling family does not satisfy the DF constraint.
21