A technique for parallel query optimization using MapReduce framework and a semantic-based clustering method.

doi:10.7717/PEERJ-CS.580

Open Access • Journal Article • 10.7717/PEERJ-CS.580

Elham Azhir, +4 more

- 01 Jan 2021

- PeerJ

- Vol. 7

4

TL;DR: In this article, the authors applied and tested a model for clustering large query datasets of varying sizes in parallel using MapReduce, and showed that the parallel implementation of query workload clustering achieves good scalability.

Abstract: Query optimization is the process of identifying the best Query Execution Plan (QEP). The query optimizer produces a close-to-optimal QEP for a given query based on minimum resource usage. The problem is that a given query has many equivalent execution plans, each with its own execution cost, so producing an effective plan requires examining a large number of alternatives. Access plan recommendation is an alternative technique to database query optimization that reuses previously generated QEPs to execute new queries. In this technique, the query optimizer uses clustering methods to identify groups of similar queries. However, clustering such large datasets is challenging for traditional clustering algorithms because of the long processing time. Numerous cloud-based platforms, such as Hadoop, Hive, and Pig, offer low-cost solutions for processing distributed queries. This paper applies and tests a model for clustering large query datasets of varying sizes in parallel using MapReduce. The results demonstrate that the parallel implementation of query workload clustering achieves good scalability.
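
As a rough illustration of how clustering a query workload might be expressed as map and reduce steps, the following Python sketch runs both phases in a single process. It is not the authors' implementation: the token-set features, the Jaccard similarity measure, the fixed representative queries, and all function names are assumptions made for this example, and a real Hadoop job would run the map step on separate partitions of the workload.

```python
import re
from collections import defaultdict


def tokens(query):
    """Lower-case word tokens of a SQL string, as a set (a crude, assumed feature set)."""
    return frozenset(re.findall(r"[a-z_]+", query.lower()))


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def map_phase(queries, representatives):
    """Map: emit (index of the most similar representative, query) for each query."""
    for q in queries:
        feats = tokens(q)
        best = max(
            range(len(representatives)),
            key=lambda i: jaccard(feats, tokens(representatives[i])),
        )
        yield best, q


def reduce_phase(pairs):
    """Reduce: group the emitted queries by their cluster key."""
    clusters = defaultdict(list)
    for key, q in pairs:
        clusters[key].append(q)
    return dict(clusters)


# Hypothetical representatives (one per cluster) and a tiny hypothetical workload.
representatives = [
    "SELECT name FROM customers WHERE id = 1",
    "SELECT total FROM orders WHERE order_date > '2020-01-01'",
]
workload = [
    "SELECT name FROM customers WHERE id = 42",
    "SELECT total FROM orders WHERE order_date > '2021-06-30'",
    "SELECT name, city FROM customers WHERE id = 7",
]

clusters = reduce_phase(map_phase(workload, representatives))
```

Here `clusters` maps representative 0 to the two customer queries and representative 1 to the order query. A new query assigned to a cluster could then reuse the QEP stored for that cluster, which is the idea behind access plan recommendation.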

Citations

•Journal Article•10.3390/math10193517

Performance Evaluation of Query Plan Recommendation with Apache Hadoop and Apache Spark

Elham Azhir, +3 more

- 17 Sep 2022

- Mathematics

TL;DR: The results of the experiments demonstrated the effectiveness of parallel query clustering in achieving high scalability, and Apache Spark achieved better performance than Apache Hadoop, reaching an average speedup of 2x.

5

•Posted Content•10.31219/osf.io/mgpr7

Performance Evaluation of Query Plan Recommendation with Apache Hadoop and Apache Spark

17 Sep 2022

TL;DR: In this paper, a MapReduce-based access plan recommendation method is proposed to cluster query datasets of different sizes in the query space based on their query execution plans (QEPs), and performance is evaluated in terms of execution time.

5

•Journal Article•10.1155/2022/9095330

Visual Dynamic Simulation Model of Unstructured Data in Social Networks

Zhang Xiang

- 13 Jan 2022

- Security and Communication Networks

TL;DR: The experimental results show that the design, which uses a Hadoop cluster, persists data in HDFS, uses MapReduce to extract data clusters for distributed computing, and builds a visual dynamic simulation model of unstructured data in social networks, has a good visualization effect and can effectively improve the stability and efficiency of unstructured data visualization in social networks.

2

•Journal Article•10.31154/cogito.v9i1.444.60-72

Database Optimization Techniques with Logic Execution Optimization on Microservices Architecture

Samidi Samidi

- 30 Jun 2023

- Cogito smart journal

TL;DR: In this paper, the authors used database optimization techniques with logic execution optimization on a microservices architecture to obtain efficient query response times for accounting applications. The data came from the Accounting Harmony Accounting Module, which has an API (get-list-attachment) with data sourced from Service Accounting (581253 records) and Service Users (2182 records).

References

•Journal Article•10.1016/0377-0427(87)90125-7

Silhouettes: a graphical aid to the interpretation and validation of cluster analysis

Peter J. Rousseeuw

- 01 Nov 1987

- Journal of Computational and Applied Mat...

TL;DR: A new graphical display is proposed for partitioning techniques, where each cluster is represented by a so-called silhouette, which is based on the comparison of its tightness and separation, and provides an evaluation of clustering validity.

19K

•Journal Article•10.1145/1327452.1327492

MapReduce: simplified data processing on large clusters

Jeffrey Dean, +1 more

- 01 Jan 2008

- Communications of the ACM

TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.

18.6K

•Proceedings Article

A density-based algorithm for discovering clusters in large spatial databases with noise

Martin Ester, +3 more

- 01 Jan 1996

TL;DR: DBSCAN, a new clustering algorithm relying on a density-based notion of clusters which is designed to discover clusters of arbitrary shape, is presented which requires only one input parameter and supports the user in determining an appropriate value for it.

17.8K

•Journal Article•10.1111/J.1469-8137.1912.TB05611.X

The distribution of the flora in the alpine zone

Paul Jaccard

- 01 Feb 1912

- New Phytologist

4.6K

•Journal Article•10.1093/BIOINFORMATICS/17.9.763

Principal component analysis for clustering gene expression data.

Ka Yee Yeung, +1 more

- 01 Sep 2001

- Bioinformatics

TL;DR: The empirical study showed that clustering with the PCs instead of the original variables does not necessarily improve, and often degrades, cluster quality; the authors would not recommend PCA before clustering except in special circumstances.

1.3K

Related Papers (5)

Query optimization for massively parallel data processing

Query optimization using clustering and Genetic Algorithm for Distributed Databases

Dynamic query re-optimization

Plan selection based on query clustering

JOMR: Multi-join optimizer technique to enhance map-reduce job