Proceedings Article, DOI: 10.1109/CCECE.2005.1557377
Parallel processing of XML databases
Ghassan Z. Qadah
- 01 May 2005
- pp 2000-2004
TL;DR: This paper examines several techniques for structuring and storing XML data across the different cluster nodes and develops a number of algorithms suitable for processing a certain class of queries, namely, the containment queries, against the parallel XML database.
Abstract: A Beowulf cluster is a high-performance, low-cost parallel computer system built from commodity hardware and software components. It consists of a number of processing nodes interconnected via a switch. The Extensible Markup Language (XML) data model, meanwhile, has recently gained huge popularity because of its ability to represent a wide variety of structured (tabular-like) and semi-structured (textual-like) data. Several query languages have been proposed for the XML data model, the most widely known being XQuery. This paper reviews the XML data model and its query language within the context of a cluster/parallel computing environment. It examines several techniques for structuring and storing XML data across the different cluster nodes. It develops a number of algorithms suitable for processing a certain class of queries, namely containment queries, against the parallel XML database. The paper also shows that one of these algorithms, the one that takes advantage of the parallelism existing between the different documents within the XML database, outperforms all of the other algorithms presented.
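The inter-document parallelism mentioned in the abstract can be illustrated with a small sketch. This is not the paper's algorithm, which the abstract does not spell out; it is a minimal illustration under stated assumptions: whole documents are assigned round-robin to hypothetical "nodes" (simulated here by threads rather than real cluster nodes), and a containment query is simplified to "does some element with a given tag contain a given word?". The names `contains`, `partition`, `node_query`, and `parallel_containment` are all invented for this example.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def contains(doc_xml, ancestor_tag, word):
    """True if some <ancestor_tag> element contains `word` anywhere in its text."""
    root = ET.fromstring(doc_xml)
    for elem in root.iter(ancestor_tag):
        # itertext() walks the element's own text and all descendant text.
        words = " ".join(elem.itertext()).split()
        if word in words:
            return True
    return False


def partition(docs, n_nodes):
    """Assumed placement scheme: whole documents, round-robin over nodes."""
    return [docs[i::n_nodes] for i in range(n_nodes)]


def node_query(local_docs, ancestor_tag, word):
    """Work done independently by one node on its local documents."""
    return [doc_id for doc_id, xml in local_docs
            if contains(xml, ancestor_tag, word)]


def parallel_containment(docs, ancestor_tag, word, n_nodes=2):
    """Run the query on every partition concurrently and merge the answers."""
    parts = partition(docs, n_nodes)
    # Threads stand in for cluster nodes; a real cluster would use message passing.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        results = pool.map(lambda p: node_query(p, ancestor_tag, word), parts)
        return sorted(doc_id for part in results for doc_id in part)


# Toy database of (document id, XML text) pairs, invented for illustration.
DOCS = [
    ("d1", "<book><title>XML basics</title><section>parallel query</section></book>"),
    ("d2", "<book><title>Databases</title><section>relational model</section></book>"),
    ("d3", "<article><section>parallel XML</section></article>"),
]

if __name__ == "__main__":
    print(parallel_containment(DOCS, "section", "parallel"))
```

Because each document is evaluated entirely on one node, the nodes need no communication until the final merge of matching document identifiers, which is one plausible reason a document-level scheme could scale well.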
Citations
Simultaneous transducers for data-parallel XML parsing
Yinfei Pan,Ying Zhang,Kenneth Chiu +2 more
- 14 Apr 2008
TL;DR: This work parallelizes the preparsing pass itself by using a simultaneous finite transducer (SFT), which implicitly maintains multiple preparser results and addresses the challenge of determining the correct initial state at the beginning of a chunk by simply considering all possible initial states simultaneously.
Hybrid Parallelism for XML SAX Parsing
Yinfei Pan,Ying Zhang,Kenneth Chiu +2 more
- 23 Sep 2008
TL;DR: To handle inherent data dependencies in XML while still allowing reasonable scalability, this work uses a 4-stage software pipeline with a combination of strictly sequential stages and stages that can be further data-parallelized within the stage, a hybrid between pipelined parallelism and data parallelism.
Speculative p-DFAs for parallel XML parsing
Ying Zhang,Yinfei Pan,Kenneth Chiu +2 more
- 01 Dec 2009
TL;DR: This paper explores the use of speculation to improve the performance of parallel XML parsing by using an initial preparsing stage to build a sketch of the document which is called the skeleton, and shows good performance and scalability on both a 30 CPU Sun E6500 machine running Solaris and a Linux machine with two Intel Xeon L5320 CPUs.
Parsing XML using parallel traversal of streaming trees
Yinfei Pan,Ying Zhang,Kenneth Chiu +2 more
- 17 Dec 2008
TL;DR: This paper investigates parallel, SAX-style parsing of XML via a parallel, depth-first traversal of the streaming document, and shows good scalability up to about 6 cores on a Linux platform.
Practical and Theoretical Aspects of a Parallel Twig Join Algorithm for XML Processing using a GPGPU
Lila Shnaiderman,Oded Shmueli +1 more
- 01 Jan 2012
TL;DR: GPU-Twig uses the data and task parallelism of the GPU to perform memory-intensive tasks, whereas the CPU is used to perform I/O and resource management; this reduces the running time of queries in comparison with other algorithms on CPU-based and multicore-based platforms.
References
A Relational Model of Data for Large Shared Data Banks
E. F. Codd
- 01 Jan 1970
TL;DR: In this paper, a model based on n-ary relations, a normal form for data base relations, and the concept of a universal data sublanguage are introduced, and certain operations on relations are discussed and applied to the problems of redundancy and consistency in the user's model.
A Relational Model of Data for Large Shared Data Banks (Original Manuscript)
E. F. Codd
- 01 Jan 1970
TL;DR: A model based on n-ary relations, a normal form for data base relations, and the concept of a universal data sublanguage are introduced and certain operations on relations are discussed and applied to the problems of redundancy and consistency in the user's model.
On supporting containment queries in relational database management systems
Chun Zhang,Jeffrey F. Naughton,David J. DeWitt,Qiong Luo,Guy M. Lohman +4 more
- 01 May 2001
TL;DR: The results suggest that, contrary to most expectations, with some modifications a native implementation in an RDBMS can support this class of query much more efficiently.
Object-oriented database systems
François Bancilhon
- 01 Mar 1988
TL;DR: This paper describes the author's view of the current state of object-oriented database research and what it considers to be the main characteristics of an object-oriented system: encapsulation, object identity, classes or types, inheritance, overriding, and late binding.
•Book
Object-Oriented Database Systems
Elisa Bertino,Lorenzo D. Martino +1 more
- 01 Jan 1993
Abstract: Object-oriented data models; query languages; versions; evolution; authorization; query processing; storage management and indexing techniques; systems; definition of covariance and contravariance; formulation of derived parameters for the cost model; conclusions and future developments.