TL;DR: This paper shows that XML's ordered data model can indeed be efficiently supported by a relational database system, and proposes three order encoding methods that can be used to represent XML order in the relational data model, and also proposes algorithms for translating ordered XPath expressions into SQL using these encoding methods.
Abstract: XML is quickly becoming the de facto standard for data exchange over the Internet. This is creating a new set of data management requirements involving XML, such as the need to store and query XML documents. Researchers have proposed using relational database systems to satisfy these requirements by devising ways to "shred" XML documents into relations, and translate XML queries into SQL queries over these relations. However, a key issue with such an approach, which has largely been ignored in the research literature, is how (and whether) the ordered XML data model can be efficiently supported by the unordered relational data model. This paper shows that XML's ordered data model can indeed be efficiently supported by a relational database system. This is accomplished by encoding order as a data value. We propose three order encoding methods that can be used to represent XML order in the relational data model, and also propose algorithms for translating ordered XPath expressions into SQL using these encoding methods. Finally, we report the results of an experimental study that investigates the performance of the proposed order encoding methods on a workload of ordered XML queries and updates.
TL;DR: The results suggest that contrary to most expectations, with some modifications, a native implementations in an RDBMS can support this class of query much more efficiently.
Abstract: Virtually all proposals for querying XML include a class of query we term “containment queries”. It is also clear that in the foreseeable future, a substantial amount of XML data will be stored in relational database systems. This raises the question of how to support these containment queries. The inverted list technology that underlies much of Information Retrieval is well-suited to these queries, but should we implement this technology (a) in a separate loosely-coupled IR engine, or (b) using the native tables and query execution machinery of the RDBMS? With option (b), more than twenty years of work on RDBMS query optimization, query execution, scalability, and concurrency control and recovery immediately extend to the queries and structures that implement these new operations. But all this will be irrelevant if the performance of option (b) lags that of (a) by too much. In this paper, we explore some performance implications of both options using native implementations in two commercial relational database systems and in a special purpose inverted list engine. Our performance study shows that while RDBMSs are generally poorly suited for such queries, under conditions they can outperform an inverted list engine. Our analysis further identifies two significant causes that differentiate the performance of the IR and RDBMS implementations: the join algorithms employed and the hardware cache utilization. Our results suggest that contrary to most expectations, with some modifications, a native implementations in an RDBMS can support this class of query much more efficiently.
TL;DR: Wang et al. as mentioned in this paper proposed a new system for indexing and storing XML data based on a numbering scheme for elements, which quickly determines the ancestor-descendant relationship between elements in the hierarchy of XML data.
Abstract: With the advent of XML as a standard for data representation and exchange on the Internet, storing and querying XML data becomes more and more important. Several XML query languages have been proposed, and the common feature of the languages is the use of regular path expressions to query XML data. This poses a new challenge concerning indexing and searching XML data, because conventional approaches based on tree traversals may not meet the processing requirements under heavy access requests. In this paper, we propose a new system for indexing and storing XML data based on a numbering scheme for elements. This numbering scheme quickly determines the ancestor-descendant relationship between elements in the hierarchy of XML data. We also propose several algorithms for processing regular path expressions, namely, (1) -Join for searching paths from an element to another, (2) -Join for scanning sorted elements and attributes to find element-attribute pairs, and (3) -Join for finding Kleene-Closure on repeated paths or elements. The -Join algorithm is highly effective particularly for searching paths that are very long or whose lengths are unknown. Experimental results from our prototype system implementation show that the proposed algorithms can process XML queries with regular path expressions by up to an or
TL;DR: A method for encoding XML tree data that includes the step of encoding the semi-structured data into strings of arbitrary length in a way that maintains non-structural and structural information about the XML data, and enables indexing the encoded XML data in a manner that facilitates efficient search and browsing is presented in this paper.
Abstract: A method for encoding XML tree data that includes the step of encoding the semi-structured data into strings of arbitrary length in a way that maintains non-structural and structural information about the XML data, and enables indexing the encoded XML data in a way that facilitates efficient search and browsing.
TL;DR: This work proposes a framework for enforcing access control policies on published XML documents using cryptography, and describes cryptographic techniques for enforcing the protection model on published data, and provides a performance analysis using real datasets.
Abstract: We propose a framework for enforcing access control policies on published XML documents using cryptography. In this framework the owner publishes a single data instance, which is partially encrypted, and which enforces all access control policies. Our contributions include a declarative language for access policies, and the resolution of these policies into a logical "protection model" which protects an XML tree with keys. The data owner enforces an access control policy by granting keys to users. The model is quite powerful, allowing the data owner to describe complex access scenarios, and is also quite elegant, allowing logical optimizations to be described as rewriting rules. Finally, we describe cryptographic techniques for enforcing the protection model on published data, and provide a performance analysis using real datasets.