Structural Join Processing for XML Based on MapReduce
Dong LI, Zehang DENG, Zuli LI +2 more
1
TL;DR: This study proposes a MapReduce-based XML query processing framework that implements interval encoding, prefix encoding, and hierarchical encoding, and develops a cost model for query optimization; of the three encodings, interval encoding gives the fastest query processing.
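Abstract: Extensible Markup Language (XML) has become the de facto standard for data representation and data exchange on the Web, and Hadoop has become one of the typical supporting frameworks for cloud computing and big data processing, so implementing XML query processing on Hadoop MapReduce is highly necessary. To realize MapReduce-based XML query processing, three XML data encoding schemes are first implemented: interval encoding, prefix encoding, and hierarchical encoding. On this basis, MapReduce-based XML structural join processing is studied and implemented. A cost model is established for query processing, and an optimized query plan tree is obtained through cost estimation. Finally, an experimental evaluation of XML query processing is carried out. The results show that query processing under interval encoding is faster than under the other two encoding schemes, and that the cost-estimation-based optimization method further improves XML query processing performance.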
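To make the idea of interval encoding and a structural join concrete, here is a minimal single-machine sketch in Python. It is not the paper's implementation: the `Label` type, the function names, the numbering scheme (one counter incremented on entering and leaving each element), and the sample document are all assumptions chosen for illustration, and the join is a naive nested loop rather than a MapReduce job.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple, Tuple


class Label(NamedTuple):
    """Hypothetical interval label: (start, end) positions plus depth."""
    tag: str
    start: int
    end: int
    level: int


def interval_encode(root: ET.Element) -> List[Label]:
    """Assign each element a (start, end, level) label via depth-first traversal."""
    labels: List[Label] = []
    counter = 0

    def visit(node: ET.Element, level: int) -> None:
        nonlocal counter
        counter += 1              # position when the element is entered
        start = counter
        for child in node:
            visit(child, level + 1)
        counter += 1              # position when the element is left
        labels.append(Label(node.tag, start, counter, level))

    visit(root, 0)
    return sorted(labels, key=lambda label: label.start)


def is_ancestor(a: Label, d: Label) -> bool:
    """a is an ancestor of d exactly when a's interval strictly contains d's."""
    return a.start < d.start and d.end < a.end


def structural_join(labels: List[Label], anc_tag: str, desc_tag: str,
                    parent_child: bool = False) -> List[Tuple[Label, Label]]:
    """Naive join: all (ancestor, descendant) pairs with the given tags.

    With parent_child=True, only pairs one level apart are kept.
    """
    ancestors = [l for l in labels if l.tag == anc_tag]
    descendants = [l for l in labels if l.tag == desc_tag]
    return [
        (a, d)
        for a in ancestors
        for d in descendants
        if is_ancestor(a, d) and (not parent_child or d.level == a.level + 1)
    ]


# Illustrative document (an assumption, not data from the paper).
DOC = "<book><chapter><title/><section><title/></section></chapter><title/></book>"
labels = interval_encode(ET.fromstring(DOC))
desc_pairs = structural_join(labels, "chapter", "title")
child_pairs = structural_join(labels, "chapter", "title", parent_child=True)
```

Because the ancestor test reduces to two integer comparisons, a join over interval labels needs no access to the original tree. That property is one plausible reason an interval scheme can be cheap to evaluate in a distributed setting, though the sketch says nothing about the paper's measured results.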
Citations
An OCL parallel query method based on MapReduce
Xianli Jin, Kaixuan Ma +1 more
TL;DR: A MapReduce-based OCL parallel query method, OPQM, is proposed to efficiently handle large-scale queries in single-machine environments, leveraging object property collections and parallelizing attribute queries to significantly reduce OCL query time.
References
Efficient parallel set-similarity joins using MapReduce
Rares Vernica, Michael J. Carey, Chen Li +2 more
- 06 Jun 2010
TL;DR: This paper proposes a 3-stage approach for end-to-end set-similarity joins in parallel using the popular MapReduce framework, and reports results from extensive experiments on real datasets to evaluate the speedup and scaleup properties of the proposed algorithms using Hadoop.
Processing theta-joins using MapReduce
Alper Okcan, Mirek Riedewald +1 more
- 12 Jun 2011
TL;DR: This work derives a surprisingly simple randomized algorithm, called 1-Bucket-Theta, for implementing arbitrary joins (theta-joins) in a single MapReduce job, and provides evidence that for a variety of join problems, it is either close to optimal or the best possible option.
V-SMART-Join: a scalable MapReduce framework for all-pair similarity joins of multisets and vectors
Ahmed Metwally, Christos Faloutsos +1 more
- 01 Apr 2012
TL;DR: V-SMART-Join is a scalable MapReduce-based framework for discovering all pairs of similar entities; it is applicable to sets, multisets, and vectors.
Query optimization for massively parallel data processing
Sai Wu, Feng Li, Sharad Mehrotra, Beng Chin Ooi +3 more
- 26 Oct 2011
TL;DR: This paper proposes a query optimization scheme for MapReduce-based query processing systems, embedding into Hive a query optimizer designed to generate an efficient query plan based on the proposed cost model.
186
Parallel Top-K Similarity Join Algorithms Using MapReduce
Younghoon Kim, Kyuseok Shim +1 more
- 01 Apr 2012
TL;DR: This paper investigates how top-k similarity join algorithms can benefit from the popular MapReduce framework, develops divide-and-conquer and branch-and-bound algorithms, and proposes all-pair partitioning and essential-pair partitioning to minimize the amount of data transferred between map and reduce functions.
95