Structural Join Processing for XML Based on MapReduce

doi:10.3778/j.issn.1673-9418.1509011

Dong Li et al.

TL;DR: This study proposes a MapReduce-based XML query processing framework. It implements three XML encodings (interval, prefix, and hierarchical), builds XML structural join processing on top of them, and develops a cost model for query optimization. Experiments show that interval encoding yields the fastest query processing of the three.

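Abstract: Extensible Markup Language (XML) has become the de facto standard for data representation and data exchange on the Web, and Hadoop has become one of the typical frameworks supporting cloud computing and big data processing, so implementing XML query processing on Hadoop MapReduce is highly necessary. To realize MapReduce-based XML query processing, three different XML data encoding schemes are first implemented: interval encoding, prefix encoding, and hierarchical encoding. On this basis, MapReduce-based XML structural join processing is studied and implemented. A cost model is built for query processing, and an optimized query plan tree is obtained through cost estimation. Finally, XML query processing is evaluated experimentally. The results show that query processing under interval encoding is faster than under the other two XML encoding schemes, and that the cost-estimation-based optimization method further improves XML query processing performance effectively.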
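
The abstract does not spell out the encodings or the join algorithm, so what follows is only a minimal Python sketch under stated assumptions, not the authors' method. It uses a textbook region labeling (start, end, level) as a stand-in for interval encoding and Dewey-style labels as a stand-in for prefix encoding, and it simulates one MapReduce job in memory to answer an ancestor//descendant structural join. The range partitioning on start positions, the replication of each ancestor to every partition its interval overlaps, and all names (`Node`, `encode`, `map_phase`, `shuffle`, `reduce_phase`, `structural_join`) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    tag: str
    start: int    # interval label: position when the element opens (preorder)
    end: int      # interval label: position when the element closes
    level: int    # depth (root = 1); a level difference of 1 marks parent-child
    dewey: tuple  # prefix label, e.g. (1, 1, 2) = root's 1st child's 2nd child


def encode(tree):
    """Label a (tag, [children]) tree with interval and prefix codes, in document order."""
    nodes, counter = [], 0

    def visit(subtree, level, dewey):
        nonlocal counter
        tag, children = subtree
        counter += 1
        start, slot = counter, len(nodes)
        nodes.append(None)  # reserve the slot so output stays in document order
        for i, child in enumerate(children, start=1):
            visit(child, level + 1, dewey + (i,))
        counter += 1
        nodes[slot] = Node(tag, start, counter, level, dewey)

    visit(tree, 1, (1,))
    return nodes


def is_ancestor_interval(a, d):
    # Interval encoding: a's interval strictly contains d's interval.
    return a.start < d.start and d.end < a.end


def is_ancestor_prefix(a, d):
    # Prefix encoding: a's label is a proper prefix of d's label.
    return len(a.dewey) < len(d.dewey) and d.dewey[:len(a.dewey)] == a.dewey


def map_phase(nodes, anc_tag, desc_tag, num_partitions, max_pos):
    width = max_pos // num_partitions + 1  # assumed range partitioning on positions

    def part(pos):
        return pos // width

    for n in nodes:
        if n.tag == anc_tag:
            # Replicate an ancestor to every partition its interval overlaps.
            for p in range(part(n.start), part(n.end) + 1):
                yield p, ("A", n)
        if n.tag == desc_tag:
            # A descendant goes to exactly one partition, chosen by its start.
            yield part(n.start), ("D", n)


def shuffle(pairs):
    # Group map output by key, as the MapReduce shuffle would.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    for key in sorted(groups):
        ancestors = [n for kind, n in groups[key] if kind == "A"]
        descendants = [n for kind, n in groups[key] if kind == "D"]
        # Naive nested-loop join inside one partition.
        for a in ancestors:
            for d in descendants:
                if is_ancestor_interval(a, d):
                    yield a, d


def structural_join(nodes, anc_tag, desc_tag, num_partitions=2):
    """Evaluate anc_tag//desc_tag as one simulated MapReduce job."""
    max_pos = max(n.end for n in nodes)
    mapped = map_phase(nodes, anc_tag, desc_tag, num_partitions, max_pos)
    return list(reduce_phase(shuffle(mapped)))


if __name__ == "__main__":
    doc = ("book", [("chapter", [("title", []), ("section", [("title", [])])]),
                    ("chapter", [("title", [])])])
    for a, d in structural_join(encode(doc), "chapter", "title"):
        print(a.tag, (a.start, a.end), "//", d.tag, (d.start, d.end))
```

For the small `book` document, the `chapter//title` join returns three pairs: the first chapter with both titles nested under it, and the second chapter with its own title. The design choice is that each descendant is routed to exactly one reducer by its start position, while an ancestor is copied to every partition between its start and end; since a containing interval always covers the descendant's start, each pair meets in exactly one reducer and is emitted once. A stack-based merge over start-sorted lists is the usual refinement of the nested loop in the reducer.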

Citations

doi:10.3969/j.issn.1000-386x.2018.07.004

An OCL parallel query method based on MapReduce

Xianli Jin et al.

TL;DR: A MapReduce-based OCL parallel query method, OPQM, is proposed to efficiently handle large-scale queries in single-machine environments, leveraging object property collections and parallelizing attribute queries to significantly reduce OCL query time.

References

Proceedings Article · doi:10.1145/1807167.1807222

Efficient parallel set-similarity joins using MapReduce

Rares Vernica et al.

06 Jun 2010

TL;DR: This paper proposes a 3-stage approach for end-to-end set-similarity joins in parallel using the popular MapReduce framework, and reports results from extensive experiments on real datasets to evaluate the speedup and scaleup properties of the proposed algorithms using Hadoop.

Proceedings Article · doi:10.1145/1989323.1989423

Processing theta-joins using MapReduce

Alper Okcan et al.

12 Jun 2011

TL;DR: This work derives a surprisingly simple randomized algorithm, called 1-Bucket-Theta, for implementing arbitrary joins (theta-joins) in a single MapReduce job, and provides evidence that for a variety of join problems, it is either close to optimal or the best possible option.

Journal Article · doi:10.14778/2212351.2212353

V-SMART-join: a scalable MapReduce framework for all-pair similarity joins of multisets and vectors

Ahmed Metwally et al.

01 Apr 2012

TL;DR: V-SMART-Join is a scalable MapReduce-based framework for discovering all pairs of similar entities; it is applicable to sets, multisets, and vectors.

Proceedings Article · doi:10.1145/2038916.2038928

Query optimization for massively parallel data processing

Sai Wu et al.

26 Oct 2011

TL;DR: This paper proposes a query optimization scheme for MapReduce-based query processing systems by embedding into Hive a query optimizer designed to generate an efficient query plan based on the proposed cost model.

Proceedings Article · doi:10.1109/ICDE.2012.87

Parallel top-k similarity join algorithms using MapReduce

Younghoon Kim et al.

01 Apr 2012

TL;DR: This paper investigates how top-k similarity join algorithms can benefit from the popular MapReduce framework, develops divide-and-conquer and branch-and-bound algorithms, and proposes all-pair partitioning and essential-pair partitioning to minimize the amount of data transferred between the map and reduce functions.