Proceedings Article, DOI: 10.1109/DEXA.2007.118
Querying XML Data using PC Cluster System
Toshiyuki Amagasa,K. Kido,Hiroyuki Kitagawa +2 more
- 03 Sep 2007
- pp 5-9
19
TL;DR: This paper proposes a novel approach for querying large-scale XML data using a PC cluster system. It discusses XML data partitioning to enable parallel processing of XML queries and introduces path-based partitioning for XML data.
Abstract: This paper proposes a novel approach for querying large-scale XML data using a PC cluster system. With the recent spread of the XML format, large-scale data coded in XML, ranging from several hundred megabytes to several gigabytes, has become common. However, XML databases are often inefficient in dealing with huge XML data. The problem lies in the complexity of the XML data model and of query processing. To cope with this problem, we attempt to construct a parallel XML database on top of a PC cluster system. To this end, we discuss XML data partitioning to enable parallel processing of XML queries. We introduce a path-based partitioning scheme for XML data. The obtained XML fragments are then allocated to cluster nodes. To obtain a cost-efficient allocation of the fragments, we discuss cost functions for parallel XPath processing and an algorithm, based on the well-known genetic algorithm, that computes a pseudo-optimal allocation. Finally, we demonstrate the effectiveness of the proposed scheme through a series of experiments.
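The two steps named in the abstract, path-based partitioning and genetic-algorithm allocation, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`partition_by_path`, `makespan`, `allocate_ga`), the fixed partitioning depth, and the cost function (the load of the busiest node, measured in element counts) are all assumptions made for illustration. The paper's actual cost functions for parallel XPath processing are not given on this page.

```python
import random
import xml.etree.ElementTree as ET
from collections import defaultdict

# Toy document used only for illustration.
DOC = "<lib><book><title/><author/></book><book><title/></book><cd><track/></cd></lib>"


def partition_by_path(xml_text, depth=2):
    """Group subtrees by their root-to-node label path (hypothetical scheme).

    Subtrees rooted at `depth` (or at a shallower leaf) that share the same
    label path form one fragment. Returns {path: number of elements}.
    """
    sizes = defaultdict(int)

    def walk(elem, path, level):
        path = f"{path}/{elem.tag}"
        if level == depth or len(elem) == 0:
            sizes[path] += sum(1 for _ in elem.iter())  # elem plus descendants
            return
        for child in elem:
            walk(child, path, level + 1)

    walk(ET.fromstring(xml_text), "", 1)
    return dict(sizes)


def makespan(assignment, sizes, n_nodes):
    """Assumed cost: total fragment size on the most heavily loaded node."""
    load = [0] * n_nodes
    for size, node in zip(sizes, assignment):
        load[node] += size
    return max(load)


def allocate_ga(sizes, n_nodes, pop_size=30, generations=100,
                mutation_rate=0.1, seed=0):
    """Search for a low-cost fragment-to-node assignment with a simple GA."""
    rng = random.Random(seed)
    n = len(sizes)
    # An individual is a list: individual[i] is the node holding fragment i.
    pop = [[rng.randrange(n_nodes) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda a: makespan(a, sizes, n_nodes))
        survivors = pop[: pop_size // 2]  # keep the better half (elitism)
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = p1[:cut] + p2[cut:]
            for i in range(n):  # per-gene mutation
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(n_nodes)
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda a: makespan(a, sizes, n_nodes))


if __name__ == "__main__":
    fragments = partition_by_path(DOC)  # {'/lib/book': 5, '/lib/cd': 2}
    paths = list(fragments)
    best = allocate_ga([fragments[p] for p in paths], n_nodes=2)
    for path, node in zip(paths, best):
        print(f"{path} -> node {node}")
```

Because the better half of each generation is carried over unchanged, the returned allocation is never worse than the best one in the initial random population. A result that is pseudo-optimal rather than optimal is what one would expect from this kind of search.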
Citations
Word shape descriptor-based document image indexing: a new DBH-based approach
TL;DR: The exhaustive experimental evaluation of the proposed framework on a collection of documents belonging to Devanagari, Bengali and English scripts has yielded encouraging results.
18
XML data partitioning strategies to improve parallelism in parallel holistic twig joins
Imam Machdi,Toshiyuki Amagasa,Hiroyuki Kitagawa +2 more
- 15 Feb 2009
TL;DR: This paper proposes XML data partitioning strategies that are able to alleviate system performance degradation due to workload imbalance, especially for parallel holistic twig joins processing.
17
A novel approach to perform context‐based automatic spoken document retrieval of political speeches based on wavelet tree indexing
Anishka Gupta,Divakar Yadav +1 more
TL;DR: The proposed system develops a speech recognition system and introduces a novel indexing scheme, based on wavelet trees for retrieving data, on the basis of spoken document retrieval for political speeches, delivered in a variety of environments.
13
GMX: an XML data partitioning scheme for holistic twig joins
Imam Machdi,Toshiyuki Amagasa,Hiroyuki Kitagawa +2 more
- 24 Nov 2008
TL;DR: A grid metadata model for XML is proposed that gives a conceptual view to partition XML data, specifically for holistic twig joins processing and adopts a cost-based model and facilitates a set of partition refinement methods for workload balancing purpose.
11
A Research Survey on Large XML Data: Streaming, Selectivity Estimation and Parallelism
Muath Alrammal,Gaétan Hains +2 more
- 01 Jan 2014
TL;DR: This chapter surveys a large body of recent research on efficient querying methods for XML data and analysis of the literature follows the three dimensions of stream-processing, parallel processing and performance variability.
8
References
Genetic algorithms in search, optimization and machine learning
David E. Goldberg
- 01 Jan 1989
TL;DR: This book brings together the computer techniques, mathematical tools, and research results that will enable both students and practitioners to apply genetic algorithms to problems in many fields.
58.6K
•Book
Genetic algorithms in search, optimization, and machine learning
David E. Goldberg
- 01 Sep 1988
TL;DR: In this article, the authors present the computer techniques, mathematical tools, and research results that will enable both students and practitioners to apply genetic algorithms to problems in many fields, including computer programming and mathematics.
A Method for the Construction of Minimum-Redundancy Codes
David A. Huffman
- 01 Sep 1952
TL;DR: A minimum-redundancy code is one constructed in such a way that the average number of coding digits per message is minimized.
6.1K
Structural joins: a primitive for efficient XML query pattern matching
Shurug Al-Khalifa,H. V. Jagadish,Nick Koudas,Jignesh M. Patel,Divesh Srivastava,Yuqing Wu +5 more
- 07 Aug 2002
TL;DR: It is shown that, while in some cases tree-merge algorithms can have performance comparable to stack-tree algorithms, in many cases they are considerably worse. This behavior is explained by analytical results demonstrating that, on sorted inputs, the stack-tree algorithms have worst-case I/O and CPU complexities linear in the sum of the sizes of inputs and output, while the tree-merge algorithms do not have the same guarantee.
High-order entropy-compressed text indexes
Roberto Grossi,Ankur Gupta,Jeffrey Scott Vitter +2 more
- 12 Jan 2003
TL;DR: A novel implementation of compressed suffix arrays exhibiting new tradeoffs between search time and space occupancy for a given text (or sequence) of n symbols over an alphabet σ, where each symbol is encoded by lg|σ| bits.