A formal study of information retrieval heuristics
Hui Fang, Tao Tao, ChengXiang Zhai
- 25 Jul 2004
- pp 49-56
TL;DR: A formal study of retrieval heuristics is presented and it is found that the empirical performance of a retrieval formula is tightly related to how well it satisfies basic desirable constraints.
Abstract: Empirical studies of information retrieval methods show that good retrieval performance is closely related to the use of various retrieval heuristics, such as TF-IDF weighting. One basic research question is thus what exactly are these "necessary" heuristics that seem to cause good retrieval performance. In this paper, we present a formal study of retrieval heuristics. We formally define a set of basic desirable constraints that any reasonable retrieval function should satisfy, and check these constraints on a variety of representative retrieval functions. We find that none of these retrieval functions satisfies all the constraints unconditionally. Empirical results show that when a constraint is not satisfied, it often indicates non-optimality of the method, and when a constraint is satisfied only for a certain range of parameter values, its performance tends to be poor when the parameter is out of the range. In general, we find that the empirical performance of a retrieval formula is tightly related to how well it satisfies these constraints. Thus the proposed constraints provide a good explanation of many empirical observations and make it possible to evaluate any existing or new retrieval formula analytically.
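To make the idea of checking a retrieval function against a constraint concrete, here is a minimal sketch. It is not taken from the paper: the two checks are informal stand-ins written in the spirit of the constraints the abstract describes (the formal definitions are not reproduced on this page), the function names `okapi_term_score`, `satisfies_tf_constraint`, and `satisfies_length_constraint` are hypothetical, and the collection statistics are made up for illustration. The scoring function is a single-term Okapi BM25 variant with the classic IDF, which becomes negative for terms that occur in more than half of the documents.

```python
import math


def okapi_term_score(tf, doc_len, avg_len, df, n_docs, k1=1.2, b=0.75):
    """Single-query-term Okapi BM25 score using the classic IDF (may be negative)."""
    idf = math.log((n_docs - df + 0.5) / (df + 0.5))
    norm = k1 * ((1 - b) + b * doc_len / avg_len)
    return idf * (tf * (k1 + 1)) / (tf + norm)


def satisfies_tf_constraint(score, **stats):
    """Illustrative check: among equal-length documents, more occurrences
    of the query term should give a strictly higher score."""
    return all(
        score(tf + 1, 100, 100, **stats) > score(tf, 100, 100, **stats)
        for tf in range(1, 20)
    )


def satisfies_length_constraint(score, **stats):
    """Illustrative check: adding one non-query word (length + 1, same tf)
    should not raise the score."""
    return all(
        score(3, n + 1, 100, **stats) <= score(3, n, 100, **stats)
        for n in range(50, 150)
    )


# Made-up collection statistics: a rare term and a very common term.
rare = dict(df=10, n_docs=1000)
common = dict(df=900, n_docs=1000)

for label, stats in (("rare", rare), ("common", common)):
    print(
        label,
        satisfies_tf_constraint(okapi_term_score, **stats),
        satisfies_length_constraint(okapi_term_score, **stats),
    )
```

Under these assumptions both checks pass for the rare term and fail for the common one, because the negative IDF flips the sign of the score. This is one way a function can satisfy a constraint only conditionally, which is the kind of behavior the abstract ties to poor empirical performance.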
Citations
•Posted Content
Using temporal IDF for efficient novelty detection in text streams
TL;DR: A resource-aware mechanism that is able to handle massive text streams such as the ones present today thanks to the burst of social media and the emergence of the Web as the main source of information is described.
25
Multilingual information retrieval in the language modeling framework
TL;DR: A direct MLIR approach is proposed by using the language modeling framework that includes a novel multilingual language model estimation for documents, and a new way to globally estimate word statistics to enable ranking documents in multiple languages in one retrieval phase without having the problems of the previous direct methods.
25
Patent
Reversible connector for accessory devices
Heng Huang, Yi He, Duane Martin Evans, Gene Robert Obie +3 more
- 11 Jun 2015
TL;DR: In this article, reversible connectors for accessory devices are described, where a connector cable for an accessory of a host computing device is configured such that a head of the connector cable may be plugged into a corresponding port of the host in either orientation (straight or reverse).
25
An outranking approach for information retrieval
TL;DR: This paper proposes a multiple criteria framework using a new aggregation mechanism based on decision rules identifying positive and negative reasons for judging whether a document should get a better ranking than another, and the resulting procedure also handles imprecision in criteria design.
24
Semantic annotation of frequent patterns
TL;DR: The goal is to discover the hidden meanings of a frequent pattern by annotating it with in-depth, concise, and structured information by constructing its context model, selecting informative context indicators, and extracting representative transactions and semantically similar patterns.
References
•Book
Introduction to Modern Information Retrieval
Gerard Salton, Michael J. McGill
- 01 Jan 1983
TL;DR: Reading is a need and a hobby at once, and this condition is the one that will make you feel that you must read.
12.6K
Term Weighting Approaches in Automatic Text Retrieval
Gerard Salton, Chris Buckley
TL;DR: This paper summarizes the insights gained in automatic term weighting, and provides baseline single term indexing models with which other more elaborate content analysis procedures can be compared.
A language modeling approach to information retrieval
Jay Ponte, W. Bruce Croft
- 01 Aug 1998
TL;DR: It will be shown that probabilistic methods can be used to predict topic changes in the context of the task of new event detection and provide further proof of concept for the use of language models for retrieval tasks.