Proceedings Article, DOI: 10.1145/1835449.1835490
Information-based models for ad hoc IR
Stéphane Clinchant, Eric Gaussier
- 19 Jul 2010
- pp 234-241
165
TL;DR: A long-standing hypothesis in IR, namely the fact that the difference in the behaviors of a word at the document and collection levels brings information on the significance of the word for the document, is shown to lead to simpler and better models.
Abstract: We introduce in this paper the family of information-based models for ad hoc information retrieval. These models draw their inspiration from a long-standing hypothesis in IR, namely the fact that the difference in the behaviors of a word at the document and collection levels brings information on the significance of the word for the document. This hypothesis has been exploited in the 2-Poisson mixture models, in the notion of eliteness in BM25, and more recently in DFR models. We show here that, combined with notions related to burstiness, it can lead to simpler and better models.
Citations
Accurate and Effective Latent Concept Modeling for Ad Hoc Information Retrieval
Romain Deveaud, Eric SanJuan, Patrice Bellot
- 01 Jun 2014
TL;DR: We propose an unsupervised method for modeling the implicit concepts of a query, with the aim of recreating the conceptual representation of the initial information need.
A Similarity Measure for Text Classification and Clustering
TL;DR: The proposed measure is extended to gauge the similarity between two sets of documents and shows that the performance obtained is better than that achieved by other measures.
327
Anserini: Reproducible Ranking Baselines Using Lucene
Peilin Yang, Hui Fang, Jimmy Lin
TL;DR: Anserini is described, an information retrieval toolkit built on Lucene that allows researchers to easily reproduce results with modern bag-of-words ranking models on diverse test collections and demonstrates that Lucene provides a suitable framework for supporting information retrieval research.
275
Lower-bounding term frequency normalization
Yuanhua Lv, ChengXiang Zhai
- 24 Oct 2011
TL;DR: This paper proposes a general and efficient method to introduce a sufficiently large lower bound for TF normalization which can be shown analytically to fix or alleviate the problem of very long documents being overly penalized.
172