Open Access
Efficient processing of complex features for information retrieval
W. B. Croft, Trevor Strohman +1 more
- 01 Jan 2008
TL;DR: The TupleFlow framework, an extension of MapReduce, provides a basis for custom binned indexes that efficiently store feature data. The work on binning probabilities shows how to map language model probabilities into the space of small positive integers.
Abstract: Text search systems research has primarily focused on simple occurrences of query terms within documents to compute document relevance scores. However, recent research shows that additional document features are crucial for improving retrieval effectiveness.
We develop a series of techniques for efficiently processing queries with feature-based models. Our TupleFlow framework, an extension of MapReduce, provides a basis for custom binned indexes, which efficiently store feature data. Our work in binning probabilities shows how to effectively map language model probabilities into the space of small positive integers, which helps improve speeds without reducing query effectiveness. We also show new efficient query processing results for both document-sorted and score-sorted indexes. All of our work is evaluated using the largest available research dataset.
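To make the idea of binning probabilities concrete, the following is a minimal sketch, not the scheme used in the paper. It assumes uniform-width bins over a fixed log-probability range, and the names `make_binner`, `to_bin`, and `score_with_bins`, as well as the range and bin count, are hypothetical choices made for illustration only.

```python
def make_binner(min_log_prob, max_log_prob, num_bins):
    """Build a function that maps a log-probability to an integer in 1..num_bins.

    Assumption: bins have equal width in log space. The abstract does not
    specify how bins are chosen, so this is only one plausible layout.
    """
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")
    if not min_log_prob < max_log_prob:
        raise ValueError("min_log_prob must be below max_log_prob")
    width = (max_log_prob - min_log_prob) / num_bins

    def to_bin(log_prob):
        # Clamp values outside the range so every input gets a valid bin.
        clamped = min(max(log_prob, min_log_prob), max_log_prob)
        index = int((clamped - min_log_prob) / width)
        # The upper boundary itself falls into the last bin.
        return min(index, num_bins - 1) + 1

    return to_bin


def score_with_bins(term_log_probs, to_bin):
    """Score a document by summing the integer bins of its query-term log-probabilities."""
    return sum(to_bin(lp) for lp in term_log_probs)


if __name__ == "__main__":
    # Hypothetical range and bin count, chosen only for the demonstration.
    to_bin = make_binner(min_log_prob=-20.0, max_log_prob=0.0, num_bins=8)
    doc_a = [-3.2, -7.5]    # per-query-term log-probabilities for document A
    doc_b = [-12.0, -15.1]  # per-query-term log-probabilities for document B
    print(score_with_bins(doc_a, to_bin), score_with_bins(doc_b, to_bin))
```

Because the mapping is monotonic, a term with a higher probability never receives a lower bin than a term with a lower probability. The small integers can then be stored compactly in an index and summed with integer arithmetic at query time.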