Exploiting programmable network interfaces for parallel query execution in workstation clusters

doi:10.5555/1898953.1899010

Open Access•Proceedings Article•10.5555/1898953.1899010

V. Santhosh Kumar, +2 more

- 25 Apr 2006

- pp 77-77

4

TL;DR: This paper proposes schemes where certain application-level processing in parallel database query execution is performed on the network processor, and observes that the proposed schemes perform equally well even in a scaled architecture, i.e., when the number of processors is increased from 2 to 64.

Abstract: Workstation clusters equipped with high-performance interconnects that have programmable network processors offer interesting opportunities to enhance the performance of parallel applications run on them. In this paper, we propose schemes where certain application-level processing in parallel database query execution is performed on the network processor. Using a timed Petri net model, we evaluate the performance of TPC-H queries executing on a high-end cluster where all tuple processing is done on the host processor, and find that tuple processing costs on the host processor dominate the execution time. These results are validated using a small cluster. We therefore propose 4 schemes where certain tuple processing activity is offloaded to the network processor. The first 2 schemes offload the tuple splitting activity (the computation to identify the node on which to process the tuples), resulting in an execution time speedup of 1.09 relative to the base scheme, but with the I/O bus becoming the bottleneck resource. In the 3rd scheme, in addition to offloading tuple processing activity, the disk and network interface are combined to avoid the I/O bus bottleneck, which results in speedups of up to 1.16, but with high host processor utilization. Our 4th scheme, where the network processor also performs a part of the join operation along with the host processor, gives a speedup of 1.47 along with balanced system resource utilization. Further, we observe that the proposed schemes perform equally well even in a scaled architecture, i.e., when the number of processors is increased from 2 to 64.
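
The abstract describes tuple splitting only as the computation that identifies the node on which each tuple is processed. The sketch below illustrates that idea, assuming hash partitioning on a join key, which is a common choice in parallel databases but is not confirmed by the abstract. The names `target_node` and `split_tuples`, the CRC32 hash, and the sample rows are all hypothetical and chosen only for illustration. The code runs on an ordinary host and does not model the network processor.

```python
import zlib
from collections import defaultdict


def target_node(join_key, num_nodes):
    """Map a join key to a node id using a deterministic hash.

    CRC32 is an illustrative assumption, not the paper's hash function.
    """
    return zlib.crc32(str(join_key).encode("utf-8")) % num_nodes


def split_tuples(tuples, key_index, num_nodes):
    """Group tuples by the node that should process them."""
    buckets = defaultdict(list)
    for row in tuples:
        buckets[target_node(row[key_index], num_nodes)].append(row)
    return dict(buckets)


# Hypothetical sample data: (join_key, payload)
rows = [(1, "a"), (2, "b"), (1, "c"), (3, "d")]
partitions = split_tuples(rows, key_index=0, num_nodes=4)
```

Because the node id depends only on the key, tuples that share a join key always land on the same node, which is what allows each node to join its partition independently.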

Citations

Proceedings Article•10.1145/3126908.3126970

sPIN: High-performance streaming Processing In the Network

Torsten Hoefler, +4 more

- 12 Nov 2017

TL;DR: sPIN is a portable programming model for offloading simple packet processing functions to the network card; it can be used to accelerate redundant in-memory filesystems and several other use cases.

87

Performance evaluation of interconnection networks using simulation: tools and case studies

Javier Navaridas Palma

- 01 Jan 2009

TL;DR: This work evaluates the performance of interconnection networks using a simulation environment developed within the author's research group, together with its related tools, including a trace-driven engine and application kernels.

9

•Dissertation

HyQoZ - Optimisation de requêtes hybrides basée sur des contrats SLA

Carlos-Manuel Lopez-Enriquez

- 23 Oct 2014

TL;DR: The approach combines data and computation services to build a query evaluator adapted to the SLA required by the user, while taking into account the QoS conditions of the services and the network.

2

Book Chapter•10.1007/11602569_21

Offloading bloom filter operations to network processor for parallel query processing in cluster of workstations

V. Santhosh Kumar, +2 more

- 18 Dec 2005

TL;DR: This paper proposes and evaluates a scheme to offload the Bloom filter operations to the network processor, and explores offloading certain tuple processing activities onto the network processor by adopting a network interface attached disk scheme.

1

References

•Journal Article•10.1145/129888.129894

Parallel database systems: the future of high performance database systems

David J. DeWitt, +1 more

- 01 Jun 1992

- Communications of The ACM

TL;DR: Teradata, Tandem, and a host of startup companies have successfully developed and marketed highly parallel database machines.

1.4K

•Proceedings Article•10.1145/602259.602261

Implementation techniques for main memory database systems

David J. DeWitt, +5 more

- 01 Jun 1984

TL;DR: This paper considers the changes necessary to permit a relational database system to take advantage of large amounts of main memory, evaluates AVL vs B+-tree access methods and hash-based query processing strategies vs sort-merge, and studies recovery issues when most or all of the database fits in main memory.

953

•Journal Article•10.1145/128762.128764

Join processing in relational databases

Priti Mishra, +1 more

- 01 Mar 1992

- ACM Computing Surveys

TL;DR: The different kinds of joins and the various implementation techniques are surveyed and they are classified based on how they partition tuples from different relations.

522

Journal Article•10.1109/MM.2004.1268994

Microbenchmark performance comparison of high-speed cluster interconnects

Jiuxing Liu, +7 more

- 01 Jan 2004

- IEEE Micro

TL;DR: The results show that to gain more insight into the performance characteristics of these interconnects, it is important to go beyond simple tests such as those for latency and bandwidth; the authors plan to expand the microbenchmark suite to include more tests and more interconnects.

102

•Proceedings Article

Using Segmented Right-Deep Trees for the Execution of Pipelined Hash Joins

Ming-Syan Chen, +3 more

- 23 Aug 1992

TL;DR: This paper derives an analytical model for the execution of a pipeline segment, and develops heuristic schemes to determine the query execution plan based on a segmented right-deep tree so that the query can be efficiently executed.

99