TL;DR: Equivalences of XPath 1.0 location paths involving reverse axes, such as anc and prec, are established and used as rewriting rules in an algorithm for transforming location paths with reverse axes into equivalent reverse-axis-free ones.
Abstract: The location path language XPath is of particular importance for XML applications since it is a core component of many XML processing standards such as XSLT or XQuery. In this paper, based on axis symmetry of XPath, equivalences of XPath 1.0 location paths involving reverse axes, such as anc and prec, are established. These equivalences are used as rewriting rules in an algorithm for transforming location paths with reverse axes into equivalent reverse-axis-free ones. Location paths without reverse axes, as generated by the presented rewriting algorithm, enable efficient SAX-like streamed data processing of XPath.
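To illustrate the kind of equivalence the abstract describes, the following minimal sketch checks one simple case on a toy document: `/descendant::b/parent::a` selects the same nodes as the reverse-axis-free `/descendant::a[child::b]`. The function names and the sample document are assumptions made for illustration; this is not the paper's rewriting algorithm, only a check of a single equivalence using Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET


def parents_via_reverse_axis(root, child_tag, parent_tag):
    """Evaluate /descendant::child_tag/parent::parent_tag by looking upward."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    return {
        parent_of[c]
        for c in root.iter(child_tag)
        if c in parent_of and parent_of[c].tag == parent_tag
    }


def parents_via_forward_axes(root, child_tag, parent_tag):
    """Evaluate /descendant::parent_tag[child::child_tag] looking only downward."""
    return {p for p in root.iter(parent_tag) if p.find(child_tag) is not None}


# Hypothetical sample document used only for this illustration.
doc = ET.fromstring("<r><a><b/></a><a><c/></a><c><b/></c><a><a><b/></a></a></r>")
reverse = parents_via_reverse_axis(doc, "b", "a")
forward = parents_via_forward_axes(doc, "b", "a")
print(reverse == forward, len(forward))
```

The forward-only version never needs to look back at a node that has already been passed, which is the property that makes reverse-axis-free paths attractive for streamed processing.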
TL;DR: A system and method for filtering an XML document with XPath expressions are described, together with a selective data dissemination system that incorporates them. A tree builder constructs a document data tree and an XPath expression tree from substrings of the expressions, and an associated tree prober uses the expression tree to probe the document data tree for matches with those substrings.
Abstract: A system for, and method of, filtering an XML document with XPath expressions and a selective data dissemination system incorporating the system or the method. In one embodiment, the filtering system includes: (1) a tree builder that builds a document data tree for the XML document and an XPath expression tree based on substrings in the XPath expressions and (2) a tree prober, associated with the tree builder, that employs the XPath expression tree to probe the document data tree and obtain matches with the substrings.
TL;DR: The main idea of this article is to describe some of the main algorithmic techniques that have been proposed for XPath Query Containment, to decrease online computation time in an XML publish-subscribe scenario with hundreds of subscribers and tens of thousands of XML documents to be delivered per day.
Abstract: Consider an XML publish-subscribe scenario with hundreds of subscribers and tens of thousands of XML documents to be delivered per day. Subscribers specify the documents in which they are interested by means of XPath [8] expressions. If an expression matches a (part of a) document, the document is delivered to the subscriber. Naturally, the decision as to which subscribers a document must be sent should be taken quickly. Although the test whether a single XPath expression matches can be done in polynomial time, it is not efficient to test every such expression for every document. Fortunately, there is a partial order on expressions, i.e., for some expressions p, q it might hold that whenever a document matches p it also matches q (denoted p ⊆₀ q). If we already know that a document matches p, we do not need to test q anymore, as it matches automatically. Correspondingly, if we know that q does not match, then p will not match either. Hence, the inclusion structure of the XPath expressions should be computed in advance to decrease online computation time. This leads to the algorithmic problem of XPath Query Containment, i.e., checking whether p ⊆₀ q (for a different, index-based approach see, e.g., [6]). The main idea of this article is to describe some of the main algorithmic techniques that have been proposed for XPath Query Containment. These techniques are described in Section 5. Before that, in Sections 2 and 3 the basic definitions on XPath and the
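The use of a precomputed inclusion structure described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the expressions are limited to the path subset supported by Python's `xml.etree.ElementTree`, "matches" means the expression selects at least one element, and the containment table `supersets` is written by hand here rather than computed by any containment algorithm. All names are hypothetical.

```python
import xml.etree.ElementTree as ET


def matches(root, path):
    """True if the ElementTree path selects at least one element."""
    return root.find(path) is not None


def filter_with_containment(root, exprs, supersets):
    """Decide which expressions match, skipping tests implied by containment.

    supersets[p] lists every q such that p is contained in q
    (assumed to be precomputed offline).
    Returns (results, number_of_evaluations).
    """
    results = {}
    evaluations = 0
    for p in exprs:
        if p in results:
            continue  # already decided through containment
        evaluations += 1
        results[p] = matches(root, p)
        if results[p]:
            # p matches, so every q that contains p matches as well.
            for q in supersets.get(p, ()):
                results.setdefault(q, True)
        else:
            # p fails, so every r contained in p fails as well.
            for r, qs in supersets.items():
                if p in qs:
                    results.setdefault(r, False)
    return results, evaluations


# Hand-written containments (an assumption for this example): a document
# with a title under a book also has a book and also has some title.
exprs = ["./book/title", "./book", ".//title"]
supersets = {"./book/title": ["./book", ".//title"]}

doc = ET.fromstring("<catalog><book><title>t</title></book></catalog>")
results, evaluations = filter_with_containment(doc, exprs, supersets)
print(results, evaluations)  # all three match after a single evaluation
```

Testing the most specific expressions first, as in the list above, gives positive matches the best chance of propagating; a different ordering would favor early propagation of failures instead.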
TL;DR: This paper shows how XPath can be used to specify the semantics of an access control policy for XML documents, and uses the developed framework to give a formal specification of the five most prominent approaches to access control for XML documents from the literature.
Abstract: Access control for XML documents is a non-trivial topic, as can be witnessed from the number of approaches presented in the literature. Trying to compare these, we discovered the need for a simple, clear and unambiguous language to state the declarative semantics of an access control policy. All current approaches state the semantics in natural language, which has none of the above properties. This makes it hard to assess whether the proposed algorithms are correct (i.e., really implement the described semantics). It is also hard to assess the proposed policy on its merits, and to compare it to others (for file systems for instance). This paper shows how XPath can be used to specify the semantics of an access control policy for XML documents. Using XPath has great advantages: it is standard technology, widely used and it has clear and easy syntax and semantics. We use the developed framework to give a formal specification of the five most prominent approaches to access control for XML documents from the literature.
TL;DR: This work presents a streaming algorithm for evaluating XPath expressions that use backward axes (parent and ancestor) and forward axes in a single document-order traversal of an XML document that significantly outperforms a traditional nonstreaming XPath engine.
Abstract: We present a streaming algorithm for evaluating XPath expressions that use backward axes (parent and ancestor) and forward axes in a single document-order traversal of an XML document. Other streaming XPath processors handle only forward axes. We show through experiments that our algorithm significantly outperforms (by more than a factor of two) a traditional nonstreaming XPath engine. Furthermore, our algorithm scales better because it retains only the relevant portions of the input document in memory. Our engine successfully processes documents over 1 GB in size, whereas the traditional XPath engine degrades considerably in performance for documents over 100 MB in size and fails to complete for documents of size over 200 MB.
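The idea of answering a backward-axis query in one document-order pass can be illustrated with a much simpler sketch than the algorithm the abstract refers to. The code below evaluates the equivalent of `//b/ancestor::a` over a stream of start/end events, keeping only a stack of the currently open `a` elements. The function name, the `id` attribute used to report results, and the sample input are assumptions for illustration only.

```python
import io
import xml.etree.ElementTree as ET


def stream_ancestors(source, target="b", ancestor="a"):
    """Return the ids of `ancestor` elements that contain a `target` descendant.

    Results are reported when an ancestor element closes, so they appear
    in end-tag order rather than document order.
    """
    open_ancestors = []  # one [id, selected] entry per open ancestor element
    selected_ids = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if elem.tag == target:
                # Every currently open ancestor element is an answer.
                for entry in open_ancestors:
                    entry[1] = True
            if elem.tag == ancestor:
                open_ancestors.append([elem.get("id"), False])
        else:
            if elem.tag == ancestor:
                elem_id, selected = open_ancestors.pop()
                if selected:
                    selected_ids.append(elem_id)
            elem.clear()  # drop content that is no longer needed
    return selected_ids


# Hypothetical sample input for this illustration.
xml = '<r><a id="1"><c><b/></c></a><a id="2"/><a id="3"><a id="4"><b/></a></a></r>'
ids = stream_ancestors(io.BytesIO(xml.encode("utf-8")))
print(ids)  # ['1', '4', '3']
```

Memory use here is bounded by the nesting depth of open `a` elements rather than by document size, which is the general property that lets streaming evaluators scale to large inputs.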