Flexible and efficient IR using array databases

doi:10.1007/S00778-007-0071-0

Open Access•Journal Article•10.1007/S00778-007-0071-0

Roberto Cornacchia, +4 more

- 01 Jan 2008

- Vol. 17, Iss: 1, pp 151-168

65

TL;DR: Array query optimization rules enable SRAM to automatically translate BM25 array queries into the relational equivalent of inverted list processing, including compression, score materialization, and quantization, as employed by custom-built IR systems.

Abstract: The Matrix Framework is a recent proposal by Information Retrieval (IR) researchers to flexibly represent information retrieval models and concepts in a single multi-dimensional array framework. We provide computational support for exactly this framework with the array database system SRAM (Sparse Relational Array Mapping), which works on top of a DBMS. Information retrieval models can be specified in its comprehension-based array query language in a way that directly corresponds to the underlying mathematical formulas. SRAM efficiently stores sparse arrays in (compressed) relational tables, and it translates array queries into relational queries and optimizes them. In this work, we describe a number of array query optimization rules. To demonstrate their effect on text retrieval, we apply them in the TREC TeraByte track (TREC-TB) efficiency task, using the Okapi BM25 model as our example. It turns out that these optimization rules enable SRAM to automatically translate the BM25 array queries into the relational equivalent of inverted list processing, including compression, score materialization, and quantization, as employed by custom-built IR systems. The use of the high-performance MonetDB/X100 relational backend, which provides transparent database compression, allows the system to achieve very fast response times with good precision and low resource usage.
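To make the idea of scoring over a sparse array held in relational form more concrete, the following is a minimal Python sketch, not SRAM's query language or the paper's implementation. It assumes a sparse term-frequency array stored as (term, doc, tf) rows with one row per (term, doc) pair, and it uses a common textbook form of Okapi BM25 with a non-negative IDF; the exact BM25 variant used in the paper is not given here. The names `TF_ROWS` and `bm25_scores`, the toy data, and the default values of `k1` and `b` are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy data: a sparse array TF[term, doc] kept as relational rows.
# Only non-zero cells are stored, one row per (term, doc) pair.
TF_ROWS = [
    ("array", 0, 2), ("database", 0, 1),
    ("array", 1, 1), ("retrieval", 1, 3),
    ("database", 2, 2), ("retrieval", 2, 1),
]


def bm25_scores(query_terms, tf_rows, k1=1.2, b=0.75):
    """Score documents with a textbook BM25 formula over (term, doc, tf) rows."""
    doc_len = defaultdict(int)  # document length = sum of tf over its terms
    df = defaultdict(int)       # document frequency = number of rows per term
    for term, doc, tf in tf_rows:
        doc_len[doc] += tf
        df[term] += 1

    n_docs = len(doc_len)
    avg_len = sum(doc_len.values()) / n_docs

    wanted = set(query_terms)
    scores = defaultdict(float)
    # Touching only rows of query terms mirrors reading just the relevant
    # inverted lists instead of the whole (mostly empty) dense array.
    for term, doc, tf in tf_rows:
        if term not in wanted:
            continue
        idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
        norm = tf + k1 * (1 - b + b * doc_len[doc] / avg_len)
        scores[doc] += idf * tf * (k1 + 1) / norm
    return dict(scores)


if __name__ == "__main__":
    ranked = sorted(bm25_scores(["array"], TF_ROWS).items(),
                    key=lambda kv: kv[1], reverse=True)
    for doc, score in ranked:
        print(doc, round(score, 4))
```

Documents that contain none of the query terms never appear in the result, which is the behaviour one would expect from a join between the query terms and the stored non-zero cells.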

Citations

Journal Article•10.14778/3025111.3025117

The TileDB array data storage manager

Stavros Papadopoulos, +3 more

- 01 Nov 2016

TL;DR: This work presents a novel storage manager for multi-dimensional arrays that arise in scientific applications, which is part of a larger scientific data management system called TileDB, and shows that TileDB delivers comparable performance to the HDF5 dense array storage manager, while providing much faster random writes.

145

Balancing vectorized query execution with bandwidth-optimized storage

M. Żukowski

- 01 Jan 2009

TL;DR: A new database system architecture is presented, realized in the MonetDB/X100 prototype, that combines a coherent set of new architecture-conscious techniques that are designed to work well together and achieves in-memory performance often one or two orders of magnitude higher than the existing approaches.

132

Journal Article•10.1007/S10707-009-0087-2

The OGC web coverage processing service (WCPS) standard

Peter Baumann

- 01 Oct 2010

- Geoinformatica

TL;DR: This contribution reports on the WCPS standard by introducing its coverage model and processing language; design rationales are discussed, as well as background and relation to other OGC standards.

125

Journal Article

Vectorwise: Beyond Column Stores

Marcin Zukowski, +1 more

- 01 Jan 2012

- IEEE Data(base) Engineering Bulletin

TL;DR: This paper tells the story of Vectorwise, a high-performance analytical database system, from multiple perspectives: its history from academic project to commercial product, the evolution of its technical architecture, customer reactions to the product and its future research and development roadmap.

99

Proceedings Article•10.1145/1966895.1966896

SciQL, a query language for science applications

Martin L. Kersten, +3 more

- 25 Mar 2011

TL;DR: SciQL provides a seamless symbiosis of array-, set-, and sequence-interpretation using a clear separation of the mathematical object from its underlying implementation, and leads to a generalization of window-based query processing with wide applicability in science domains.

64

References

Journal Article•10.1108/EB046814

An algorithm for suffix stripping

M. F. Porter

- 01 Dec 1997

- Program: Electronic Library and Informat...

TL;DR: An algorithm for suffix stripping is described, which has been implemented as a short, fast program in BCPL, and performs slightly better than a much more elaborate system with which it has been compared.

9.1K

Journal Article•10.1109/JRPROC.1952.273898

A Method for the Construction of Minimum-Redundancy Codes

David A. Huffman

- 01 Sep 1952

TL;DR: A minimum-redundancy code is one constructed in such a way that the average number of coding digits per message is minimized.

6.1K

Journal Article•10.1007/BF02837279

A method for the construction of minimum-redundancy codes

David A. Huffman

- 01 Feb 2006

- Resonance

TL;DR: A minimum-redundancy code is one constructed in such a way that the average number of coding digits per message is minimized.

5.2K

Journal Article•10.1145/3130348.3130368

A language modeling approach to information retrieval

Jay Ponte, +1 more

- 01 Aug 1998

TL;DR: It will be shown that probabilistic methods can be used to predict topic changes in the context of the task of new event detection and provide further proof of concept for the use of language models for retrieval tasks.

2.8K

Journal Article•10.1002/ASI.4630270302

Relevance weighting of search terms

Stephen Robertson, +1 more

- 01 May 1976

- Journal of the Association for Informati...

TL;DR: In this article, a series of relevance weighting functions is derived and justified by theoretical considerations. In particular, it is shown that specific weighted search methods are implied by a general probabilistic theory of retrieval.

2K