An efficient multi-dimensional index for cloud data management
Xiangyu Zhang,Jing Ai,Zhongyuan Wang,Jiaheng Lu,Xiaofeng Meng +4 more
- 02 Nov 2009
- pp 17-24
113
TL;DR: This paper proposes an efficient approach to building a multi-dimensional index for cloud computing systems, using a combination of R-tree and KD-tree to organize data records and to offer fast query processing and efficient index maintenance.
Abstract: Cloud computing platforms have recently attracted growing attention as a new trend in data management, and several cloud computing products now provide a variety of services. However, current cloud platforms support only simple keyword-based queries and cannot answer complex queries efficiently because they lack efficient index techniques. In this paper we propose an efficient approach to building a multi-dimensional index for cloud computing systems. We use a combination of R-tree and KD-tree to organize data records and to offer fast query processing and efficient index maintenance. Our approach can efficiently process typical multi-dimensional queries, including point queries and range queries. In addition, frequent changes to data spread over a large number of machines make index maintenance a challenging problem; to cope with it, we propose a cost estimation-based index update strategy that can effectively update the index structure. Our experiments show that our indexing techniques improve query efficiency by an order of magnitude compared with alternative approaches, and scale well with the size of the data. Our approach is quite general, is independent of the underlying infrastructure, and can easily be carried over to various cloud computing platforms.
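The abstract names point queries and range queries but does not spell out the index layout. As a rough illustration of the KD-tree half only, the following is a minimal in-memory sketch, not the authors' distributed structure: the names `KDNode`, `build_kdtree`, `range_query`, and `point_query` are hypothetical, and the R-tree layer and the cost-based update strategy are left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class KDNode:
    """One node of a KD-tree: a record plus the axis it splits on."""
    point: Point
    axis: int
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None


def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[KDNode]:
    """Build a KD-tree by splitting on the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return KDNode(
        point=ordered[mid],
        axis=axis,
        left=build_kdtree(ordered[:mid], depth + 1),
        right=build_kdtree(ordered[mid + 1:], depth + 1),
    )


def range_query(node: Optional[KDNode], low: Point, high: Point) -> List[Point]:
    """Return all points p with low[i] <= p[i] <= high[i] on every axis."""
    if node is None:
        return []
    found: List[Point] = []
    if all(lo <= c <= hi for lo, c, hi in zip(low, node.point, high)):
        found.append(node.point)
    split = node.point[node.axis]
    # The left subtree holds values <= split, the right holds values >= split,
    # so a subtree is skipped only when the query box cannot reach it.
    if low[node.axis] <= split:
        found.extend(range_query(node.left, low, high))
    if high[node.axis] >= split:
        found.extend(range_query(node.right, low, high))
    return found


def point_query(node: Optional[KDNode], target: Point) -> bool:
    """A point query is a range query whose box collapses to a single point."""
    return bool(range_query(node, target, target))


if __name__ == "__main__":
    records = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
    tree = build_kdtree(records)
    print(sorted(range_query(tree, (3, 1), (8, 5))))
    print(point_query(tree, (4, 7)))
```

The pruning tests in `range_query` are what make such an index faster than a full scan: a subtree is visited only if the query box overlaps its side of the split.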
Citations
From Big Data to Big Data Mining: Challenges, Issues, and Opportunities
Dunren Che,Mejdl Safran,Zhiyong Peng +2 more
- 22 Apr 2013
TL;DR: This paper provides an overview of big data mining and discusses the related challenges and the new opportunities, including a review of state-of-the-art frameworks and platforms for processing and managing big data as well as the efforts expected on big data mining.
231
$\mathcal{MD}$-HBase: design and implementation of an elastic data infrastructure for cloud-scale location services
TL;DR: This paper presents the design and implementation of MD-HBase, a scalable data management infrastructure for LBSs that bridges the gap between scale and functionality, and shows that two standard index structures, the K-d tree and the Quad tree, can be layered over a range-partitioned key-value store to provide a scalable multi-dimensional data infrastructure.
111
ST-Hadoop: a MapReduce framework for spatio-temporal data
TL;DR: The key idea behind the performance gains of ST-Hadoop is its ability to index spatio-temporal data within the Hadoop Distributed File System.
98
Multi-dimensional Index on Hadoop Distributed File System
Haojun Liao,Jizhong Han,Jinyun Fang +2 more
- 15 Jul 2010
TL;DR: Experimental evaluation demonstrates that the built-in index structure can efficiently improve query performance, and serve as cornerstones for structured or semi-structured data management.
93
An efficient index for massive IOT data in cloud environment
Youzhong Ma,Jia Rao,Weisong Hu,Xiaofeng Meng,Xu Han,Yu Zhang,Yunpeng Chai,Chunqiu Liu +7 more
- 29 Oct 2012
TL;DR: This work proposes an update- and query-efficient index framework (UQE-Index), based on a key-value store, that can support high insert throughput and provide efficient multi-dimensional queries at the same time.
68
References
MapReduce: simplified data processing on large clusters
Jeffrey Dean,Sanjay Ghemawat +1 more
- 06 Dec 2004
TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.
MapReduce: simplified data processing on large clusters
Jeffrey Dean,Sanjay Ghemawat +1 more
TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
Chord: A scalable peer-to-peer lookup service for internet applications
Ion Stoica,Robert Morris,David R. Karger,M. Frans Kaashoek,Hari Balakrishnan +4 more
- 27 Aug 2001
TL;DR: Results from theoretical analysis, simulations, and experiments show that Chord is scalable, with communication cost and the state maintained by each node scaling logarithmically with the number of Chord nodes.
11.2K
A scalable content-addressable network
Sylvia Ratnasamy,Paul Francis,Mark Handley,Richard M. Karp,Scott Shenker +4 more
- 27 Aug 2001
TL;DR: The concept of a Content-Addressable Network (CAN) as a distributed infrastructure that provides hash table-like functionality on Internet-like scales is introduced and its scalability, robustness and low-latency properties are demonstrated through simulation.
7.2K
The Google file system
Sanjay Ghemawat,Howard Gobioff,Shun-Tak Albert Leung +2 more
- 19 Oct 2003
TL;DR: This paper presents file system interface extensions designed to support distributed applications, discusses many aspects of the design, and reports measurements from both micro-benchmarks and real world use.