Journal Article, DOI: 10.1016/J.IPM.2003.10.003
Choosing document structure weights
TL;DR: Vector space, probability, and Okapi BM25 ranking are extended to include structure weighting; analysis suggests that BM25 cannot be improved using structure weighting.
Abstract: Existing ranking schemes assume all term occurrences in a given document are of equal influence. Intuitively, terms occurring in some places should have a greater influence than those elsewhere. An occurrence in an abstract may be more important than an occurrence in the body text. Although this observation is not new, there remains the issue of finding good weights for each structure. Vector space, probability, and Okapi BM25 ranking are extended to include structure weighting. Weights are then selected for the TREC WSJ collection using a genetic algorithm. The learned weights are then tested on an evaluation set of queries. Structure-weighted vector space inner product and structure-weighted probabilistic retrieval show an improvement of about 5% in mean average precision over their unstructured counterparts. Structure-weighted BM25 shows almost no improvement. Analysis suggests BM25 cannot be improved using structure weighting.
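The idea of structure weighting described in the abstract can be illustrated with a minimal sketch. It assumes a document is a mapping from structure name to text, and that each term occurrence contributes its structure's weight to the term frequency instead of a flat count of 1. The weight values, the function names, and the choice to compute BM25 document length from the weighted counts are all assumptions made for illustration; they are not taken from the paper, which learns its weights with a genetic algorithm.

```python
import math
from collections import Counter

# Hypothetical weights for illustration only (the paper learns them with a GA).
WEIGHTS = {"title": 3.0, "body": 1.0}
UNIFORM = {"title": 1.0, "body": 1.0}

def weighted_tf(doc, weights):
    """Term frequency where each occurrence counts as its structure's weight."""
    tf = Counter()
    for structure, text in doc.items():
        w = weights.get(structure, 1.0)
        for term in text.lower().split():
            tf[term] += w
    return tf

def idf(term, docs):
    """Smoothed inverse document frequency (always positive)."""
    n = sum(
        1 for d in docs
        if any(term in text.lower().split() for text in d.values())
    )
    return math.log((len(docs) - n + 0.5) / (n + 0.5) + 1.0)

def inner_product_score(query, doc, docs, weights):
    """Vector space inner product using structure-weighted tf and idf."""
    tf = weighted_tf(doc, weights)
    return sum(tf[t] * idf(t, docs) for t in query.lower().split())

def bm25_score(query, doc, docs, weights, k1=1.2, b=0.75):
    """BM25 with structure-weighted tf.

    Assumption: document length is the sum of weighted term counts.
    """
    lengths = [sum(weighted_tf(d, weights).values()) for d in docs]
    avdl = sum(lengths) / len(docs)
    tf = weighted_tf(doc, weights)
    dl = sum(tf.values())
    score = 0.0
    for t in query.lower().split():
        norm = tf[t] + k1 * (1 - b + b * dl / avdl)
        score += idf(t, docs) * tf[t] * (k1 + 1) / norm
    return score

# Toy collection: the query term is in the title of one document
# and in the body of the other.
DOCS = [
    {"title": "genetic search", "body": "ranking documents with weights"},
    {"title": "ranking documents", "body": "genetic methods for weights"},
]

for name, w in (("weighted", WEIGHTS), ("uniform", UNIFORM)):
    ip = [inner_product_score("genetic", d, DOCS, w) for d in DOCS]
    bm = [bm25_score("genetic", d, DOCS, w) for d in DOCS]
    print(name, "inner product:", ip, "BM25:", bm)
```

With uniform weights the two toy documents tie, while the title-heavy weights rank the first document higher. The BM25 term saturates as tf grows, so the same weight change moves its score less than it moves the inner product, which is at least consistent with the abstract's report that BM25 gained little from structure weighting.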
Citations
Patent
System and Method for a Vector-Space Search Engine
Aurelian Dumitru,Jimmy Doyle Pike +1 more
- 19 Jan 2010
TL;DR: A system and method for a search engine are described, which include calculating a plurality of document vectors, receiving a search request, calculating a search vector, and returning a list of documents that are within a predetermined distance of the search request vector.
145
Wiki-based rapid prototyping for teaching-material design in e-Learning grids
TL;DR: Experimental results indicate that teaching materials can be rapidly generated with the proposed WARP (Wiki-based Authoring by Rapid Prototyping), which is composed of five phases: requirement verification, query expansion, teaching-material retrieval, draft generation and Wiki-based revision.
87
A Web page classification system based on a genetic algorithm using tagged-terms as features
TL;DR: Using both HTML tags and the terms in each tag as separate features improves classification accuracy. Accuracy also depends on the number of documents in the training dataset: it increases up to 95%, exceeding that of the well-known Naive Bayes and k-nearest-neighbor classifiers.
78
Field-weighted XML retrieval based on BM25
Wei Lu,Stephen Robertson,Andrew MacFarlane +2 more
- 28 Nov 2005
TL;DR: In its first year of participation in INEX (INEX 2004), the Centre for Interactive Systems Research (CISR) extended the field-weighted BM25F document retrieval function to an element-level retrieval function, BM25E.
•Proceedings Article
Clinical Information Retrieval using Document and PICO Structure
Florian Boudin,Jian-Yun Nie,Martin Dawes +2 more
- 02 Jun 2010
TL;DR: A method that extends the language modeling approach to incorporate both document structure and PICO query formulation and an analysis of the distribution of PICO elements in medical abstracts that motivates the use of a location-based weighting strategy are presented.
54