TL;DR: A theoretical framework about “matching cross” is established which demonstrates the intrinsic reason in the proof of optimality on holistic algorithms and a set of novel algorithms to efficiently process three categories of extended XML tree patterns are proposed.
Abstract: As businesses and enterprises generate and exchange XML data more often, there is an increasing need for efficient processing of queries on XML data. Searching for the occurrences of a tree pattern query in an XML database is a core operation in XML query processing. Prior works demonstrate that the holistic twig pattern matching algorithm is an efficient technique to answer an XML tree pattern with parent-child (P-C) and ancestor-descendant (A-D) relationships, as it can effectively control the size of intermediate results during query processing. However, XML query languages (e.g., XPath and XQuery) define more axes and functions, such as the negation function, order-based axes, and wildcards. In this paper, we study a large set of XML tree patterns, called extended XML tree patterns, which may include P-C and A-D relationships, negation functions, wildcards, and order restrictions. We establish a theoretical framework about “matching cross” which demonstrates the intrinsic reason in the proof of optimality on holistic algorithms. Based on our theorems, we propose a set of novel algorithms to efficiently process three categories of extended XML tree patterns. A set of experimental results on both real-life and synthetic data sets demonstrates the effectiveness and efficiency of our proposed theories and algorithms.
TL;DR: An indexing classification scheme is suggested, and some of the current trends in indexing methods, which indicate a clear shift towards hybrid indexing, are discussed.
Abstract: With the rapid emergence of XML as a data exchange standard over the Web, storing and querying XML data have become critical issues. The two main approaches to storing XML data are (1) to employ traditional storage such as relational database, object-oriented database and so on, and (2) to create an XML-specific native storage. The storage representation affects the efficiency of query processing. In this paper, firstly, we review the two approaches for storing XML data. Secondly, we review various query optimization techniques such as indexing, labeling and join algorithms to enhance query processing in both approaches. Next, we suggest an indexing classification scheme and discuss some of the current trends in indexing methods, which indicate a clear shift towards hybrid indexing.
TL;DR: A practical attack on XML Encryption is described, which allows an adversary to decrypt a ciphertext by sending related ciphertexts to a Web Service and evaluating the server response, using only 14 requests per plaintext byte on average.
Abstract: XML Encryption was standardized by W3C in 2002, and is implemented in XML frameworks of major commercial and open-source organizations like Apache, redhat, IBM, and Microsoft. It is employed in a large number of major web-based applications, ranging from business communications, e-commerce, and financial services over healthcare applications to governmental and military infrastructures. In this work we describe a practical attack on XML Encryption, which allows an adversary to decrypt a ciphertext by sending related ciphertexts to a Web Service and evaluating the server response. We show that an adversary can decrypt a ciphertext by performing only 14 requests per plaintext byte on average. This poses a serious and truly practical security threat on all currently used implementations of XML Encryption. In a sense the attack can be seen as a generalization of padding oracle attacks (Vaudenay, Eurocrypt 2002). It exploits a subtle correlation between the block cipher mode of operation, the character encoding of encrypted text, and the response behaviour of a Web Service if an XML message cannot be parsed correctly.
TL;DR: This survey divides the existing approaches to keyword search on XML into several classes based on the problem they tackled, and performs a comprehensive analysis of these works.
Abstract: Keyword search is a user-friendly approach for users to retrieve information from XML data. Since an XML document can have a large size and contain a lot of information, an XML keyword search result should be a fragment of an XML document dynamically constructed at query time, which is achievable due to the structuredness of XML. Processing keyword searches on XML has several challenges, e.g., what are the elements in the XML document that are relevant to the query? How to generate the results efficiently and rank the results meaningfully? How to present the results to the user in a way such that the user can quickly find the desired information? In this survey, we review the papers in the literature that attempted to address these problems. We divide the existing approaches into several classes based on the problem they tackled, and perform a comprehensive analysis of these works.
TL;DR: A novel labeling scheme is introduced, called extended Dewey, which effectively extends the existing Dewey labeling scheme to combine the types and identifiers of elements in a label, and to avoid the scan of labels for internal query nodes to accelerate query processing (in I/O cost).
Abstract: Finding all the occurrences of a tree pattern in an XML database is a core operation for efficient evaluation of XML queries. The Dewey labeling scheme is commonly used to label an XML document to facilitate XML query processing by recording information on the path of an element. In order to improve the efficiency of XML tree pattern matching, we introduce a novel labeling scheme, called extended Dewey, which effectively extends the existing Dewey labeling scheme to combine the types and identifiers of elements in a label, and to avoid the scan of labels for internal query nodes to accelerate query processing (in I/O cost). Based on extended Dewey, we propose a series of holistic XML tree pattern matching algorithms. We first present TJFast to answer an XML twig pattern query. To efficiently answer a generalized XML tree pattern, we then propose GTJFast, an optimization that exploits the non-output nodes. In addition, we propose TJFastTL and GTJFastTL based on the tag+level data partition scheme to further reduce I/O costs by level pruning. Finally, we report our comprehensive experimental results to show that our set of XML tree pattern matching algorithms are superior to existing approaches in terms of the number of elements scanned, the size of intermediate results and query performance.
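The role of Dewey labels in recording path information can be illustrated with a minimal sketch. It shows only the plain Dewey scheme (each label is the parent's label plus the child's position), not the extended Dewey encoding of element types that the abstract introduces. The function names `dewey_labels`, `is_ancestor`, and `is_parent` and the sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def dewey_labels(root):
    """Return (label, tag) pairs in document order.

    A label is a tuple of 1-based child positions on the path from the
    root, so the label of a node extends the label of its parent.
    """
    out = []

    def walk(node, label):
        out.append((label, node.tag))
        for position, child in enumerate(node, start=1):
            walk(child, label + (position,))

    walk(root, (1,))
    return out


def is_ancestor(a, d):
    """A-D test: a is a proper prefix of d."""
    return len(a) < len(d) and d[:len(a)] == a


def is_parent(a, d):
    """P-C test: a is a prefix of d and d is exactly one step longer."""
    return len(d) == len(a) + 1 and d[:len(a)] == a


# Hypothetical sample document.
doc = ET.fromstring(
    "<bib><book><title/><author/></book><book><title/></book></bib>"
)
labels = dewey_labels(doc)
for label, tag in labels:
    print(".".join(map(str, label)), tag)
```

Because both relationship tests reduce to prefix comparisons on the labels, no access to the document tree is needed once the labels are assigned.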
TL;DR: It is concluded that XML Schema validation with a hardened XML Schema is capable of fending off XML Signature Wrapping attacks, but bears some pitfalls and disadvantages as well.
Abstract: In the context of security of Web Services, the XML Signature Wrapping attack technique has lately received increasing attention. Following a broad range of real-world exploits, general interest in applicable countermeasures rises. However, few approaches for countering these attacks have been investigated closely enough to make any claims about their effectiveness. In this paper, we analyze the effectiveness of the specific countermeasure of XML Schema validation in terms of fending Signature Wrapping attacks. We investigate the problems of XML Schema validation for Web Services messages, and discuss the approach of Schema Hardening, a technique for strengthening XML Schema declarations. We conclude that XML Schema validation with a hardened XML Schema is capable of fending XML Signature Wrapping attacks, but bears some pitfalls and disadvantages as well.
TL;DR: The ChuQL language incorporates records to support the key/value data model of MapReduce, leverages higher-order functions to provide clean semantics, and exploits side-effects to fully expose to XQuery developers the Hadoop framework.
Abstract: MapReduce/Hadoop has gained acceptance as a framework to process, transform, integrate, and analyze massive amounts of Web data on the Cloud. The MapReduce model (simple, fault tolerant, data parallelism on elastic clouds of commodity servers) is also attractive for processing enterprise and scientific data. Despite XML ubiquity, there is yet little support for XML processing on top of MapReduce. In this paper, we describe ChuQL, a MapReduce extension to XQuery, with its corresponding Hadoop implementation. The ChuQL language incorporates records to support the key/value data model of MapReduce, leverages higher-order functions to provide clean semantics, and exploits side-effects to fully expose to XQuery developers the Hadoop framework. The ChuQL implementation distributes computation to multiple XQuery engines, providing developers with an expressive language to describe tasks over big data.
TL;DR: Evaluation results based on the dataset from the ISO/IEC standardization of the vehicle to grid communication interface (V2G CI) prove the applicability of the generated XML-based Web services of restricted devices in terms of message size, performance, and code footprint.
Abstract: Embedded network programming remains a highly complex task for developers since unique characteristics of such networks have to be faced: one of them is the communication between a diversity of resource constraint nodes. Another one is the infrastructure dynamics. The widely-used standardized Web service technologies would perfectly meet such unique characteristics and ease the development of applications. Such technologies that enable, e.g., requesting or subscribing service data, however, process usually plain XML documents which are not suitable for small embedded devices with very limited resources. This is due to XML's verbosity, its bandwidth usage, and its associated processing overhead. The paper addresses these issues and describes an innovative and optimized source code generation technique by means of W3C's Efficient XML Interchange (EXI) format for developing XML-based Web services for the embedded domain. This offers developers a seamless use of the wide-spread service protocols in the embedded domain as well. Evaluation results based on the dataset from the ISO/IEC standardization of the vehicle to grid communication interface (V2G CI) prove the applicability of the generated XML-based Web services of restricted devices in terms of message size, performance, and code footprint.
TL;DR: This demonstration gives an overview of the facilities of the XSUpdate language and of the Eχup system, the corresponding engine for processing schema modification and document adaptation statements.
Abstract: Data on the Web are mostly in XML format, and the need often arises to update their structure, commonly described by an XML Schema. When a schema is modified, the effects of the modification on documents need to be faced. XSUpdate is a language that makes it easy to identify parts of an XML Schema, apply a modification primitive to them, and finally define an adaptation for associated documents, while Eχup is the corresponding engine for processing schema modification and document adaptation statements. The purpose of this demonstration is to provide an overview of the facilities of the XSUpdate language and of the Eχup system.
TL;DR: An overview on existing research related to XML document/grammar comparison is provided, presenting the background and discussing the various techniques related to the problem, as well as discussing some prominent application domains.
Abstract: XML document comparison is becoming an ever more popular research issue due to the increasingly abundant use of XML. Likewise, a growing interest fosters the development of XML grammar matching and comparison, due to the proliferation of heterogeneous XML data sources, particularly on the Web. Nonetheless, the process of comparing XML documents with XML grammars, i.e., XML document and grammar similarity evaluation, has not yet received the attention it deserves. In this paper, we provide an overview on existing research related to XML document/grammar comparison, presenting the background and discussing the various techniques related to the problem. We also discuss some prominent application domains, ranging over document classification and clustering, document transformation, grammar evolution, selective dissemination of XML information, XML querying, as well as alert filtering in intrusion detection systems and Web Services matching and communications.
TL;DR: This work presents an algebraic approach for propagating source updates to XML materialized views expressed in a powerful XML tree pattern formalism and highlights the benefits of this approach over existing algorithms through a series of experiments.
Abstract: Materialized views can bring important performance benefits when querying XML documents. In the presence of XML document changes, materialized views need to be updated to faithfully reflect the changed document. In this work, we present an algebraic approach for propagating source updates to XML materialized views expressed in a powerful XML tree pattern formalism. Our approach differs from the state of the art in the area in two important ways. First, it relies on set-oriented, algebraic operations, to be contrasted with node-based previous approaches. Second, it exploits state-of-the-art features of XML stores and XML query evaluation engines, notably XML structural identifiers and associated structural join algorithms. We present algorithms for determining how updates should be propagated to views, and highlight the benefits of our approach over existing algorithms through a series of experiments.
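The structural identifiers and structural joins mentioned above can be sketched as follows. This is a minimal illustration that assumes a common (start, end, level) region encoding and a naive nested-loop join. It is not the algebraic view-maintenance approach of the paper, and the names `region_labels` and `structural_join` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def region_labels(root):
    """Assign (tag, start, end, level) to every element.

    start and end come from one counter that is incremented when an
    element is entered and again when it is left, so an ancestor's
    interval strictly contains the intervals of its descendants.
    """
    labels = []
    counter = 0

    def walk(node, level):
        nonlocal counter
        counter += 1
        start = counter
        for child in node:
            walk(child, level + 1)
        counter += 1
        labels.append((node.tag, start, counter, level))

    walk(root, 0)
    return labels


def structural_join(ancestors, descendants, parent_child=False):
    """Naive nested-loop join on interval containment.

    With parent_child=True the descendant must also be exactly one
    level below the ancestor.
    """
    result = []
    for a in ancestors:
        for d in descendants:
            contains = a[1] < d[1] and d[2] < a[2]
            if contains and (not parent_child or d[3] == a[3] + 1):
                result.append((a, d))
    return result


# Hypothetical sample document.
doc = ET.fromstring("<a><b><c/></b><c/></a>")
labels = region_labels(doc)
a_nodes = [n for n in labels if n[0] == "a"]
c_nodes = [n for n in labels if n[0] == "c"]

ad_pairs = structural_join(a_nodes, c_nodes)                     # a//c
pc_pairs = structural_join(a_nodes, c_nodes, parent_child=True)  # a/c
print(len(ad_pairs), len(pc_pairs))
```

The join works on sets of labels rather than on individual tree nodes, which is the property a set-oriented approach relies on. Production structural join algorithms avoid the quadratic loop by merging inputs sorted on `start`.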
TL;DR: This paper designs the TwigTable algorithm to incorporate property and value information, which can be correctly discovered in any XML data, into query processing, and proposes three object-based optimization techniques for TwigTable.
Abstract: In this paper, we demonstrate how the semantic information, such as value, property, object class and relationship between object classes in XML data impacts XML query processing. We show that the lack of using semantics causes different problems in value management and content search in existing approaches. Motivated on solving these problems, we propose a semantic approach for XML twig pattern query processing. In particular, we design TwigTable algorithm to incorporate property and value information into query processing. This information can be correctly discovered in any XML data. In addition, we propose three object-based optimization techniques to TwigTable. If more semantics of object classes are known in an XML document, we can process queries more efficiently with these semantic optimizations. Last, we show the benefits of our approach by a comprehensive experimental study.
TL;DR: This work focuses on XML data integration by studying rewritings of XML target schemas in terms of source schemas, and considers Visibly pushdown Automata (VPAs), which accept Visibly Pushdown Languages (VPLs), which are the basis of formalisms for specifying XML schemas.
TL;DR: In this paper, a hybrid navigation/streaming format for XML documents is proposed to allow efficient storage and processing of queries on the XML data that provides the benefits of both navigation and streaming and ameliorates the disadvantages of each.
Abstract: A method for storing XML documents in a hybrid navigation/streaming format is provided to allow efficient storage and processing of queries on the XML data that provides the benefits of both navigation and streaming and ameliorates the disadvantages of each. Each XML document to be stored is independently analyzed to determine a combination of navigable and streamable storage format that optimizes the processing of the data for anticipated access patterns.
TL;DR: In this paper, an XML template having one or more nodes is received and mapping information indicating an association of data and nodes of the uploaded XML template is obtained. Once the mapping is received, the structure of the XML template is determined.
Abstract: An XML template having one or more nodes is received. Mapping information indicating an association of data and nodes of the uploaded XML template is obtained. Once the mapping is received, the structure of the XML template is determined. Based on the determined structure and the mapping provided, an XML based SQL query is generated. The generated SQL query can be executed to provide the XML document.
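The flow described above (template, mapping, structure, generated query) can be sketched as follows. The abstract does not specify the form of the mapping or of the generated query, so this sketch assumes a hypothetical mapping from leaf tag names to column names and emits the standard SQL/XML `XMLELEMENT` function. The names `to_sqlxml` and `build_query`, the `orders` table, and its columns are all illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def to_sqlxml(node, mapping):
    """Turn a template node into a SQL/XML XMLELEMENT expression.

    Inner nodes nest the expressions of their children; leaf nodes are
    bound to the column given by the (assumed) tag -> column mapping.
    """
    children = list(node)
    if children:
        args = [to_sqlxml(child, mapping) for child in children]
    else:
        args = [mapping[node.tag]]
    return 'XMLELEMENT(NAME "{}", {})'.format(node.tag, ", ".join(args))


def build_query(template_xml, mapping, table):
    """Parse the template to determine its structure, then emit the query."""
    root = ET.fromstring(template_xml)
    return "SELECT {} FROM {}".format(to_sqlxml(root, mapping), table)


# Hypothetical template, mapping, and table.
template = "<order><id/><customer/></order>"
mapping = {"id": "orders.id", "customer": "orders.customer_name"}
query = build_query(template, mapping, "orders")
print(query)
```

Executing such a query on a database that supports SQL/XML would return one XML fragment per row, shaped like the template.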
TL;DR: A new mapping method is developed to overcome the limitations of existing methods and is shown to be efficient in terms of removing relation redundancy.
Abstract: The eXtensible Markup Language (XML) has recently emerged as a standard for data representation and interchange on the web. Given its popularity in most applications, the critical issues are to store and to query XML data to exploit the full power of this technology. Since the relational database is a widely used technology for storing and querying, replacing it with a pure XML database is not a good choice and is a very expensive process. It is thus crucial to map XML data into relational data, and this process is one that occurs frequently. Many methods exist in the literature, and determining which mapping method is best is important. The intention of this paper is to review the existing mapping methods in terms of generating a good relational schema. At the end, a new mapping method is developed to overcome the limitations, and it is shown to be efficient in terms of removing relation redundancy.
TL;DR: A unified definition is presented, the key properties including validation of XML graphs against different XML schema languages are outlined, and a software package is provided that enables others to make use of these ideas.
TL;DR: It is shown that Visibly Pushdown Languages are closed under the defined language operators and this enables us to expand the schemas (for XML) in order to account for flexible or constrained evolution.
TL;DR: A notion of compactness is formally defined which allows for comparing documents, and it is shown that the update-based method produces time-stamped XML documents that are more satisfactory with respect to space-efficiency than the general method.
Abstract: The management of temporal data is a crucial issue in many applications. Recently, XML has become the standard for data exchange and representation. Consequently, important efforts have been made on the development of temporal extensions for XML. This paper investigates how to generate or maintain space-efficient time-stamped documents. We formally define a notion of compactness which allows for comparing documents. Then, we present two methods. For the first one, called the general method, no restriction is made on the evolution of the XML documents, whereas for the second one, called the update-based method, changes are assumed to be specified by updates. For both methods, the issue is to enable processing very large documents, to use existing engines, and to comply with the XQuery Update Facility. The two methods are compared in terms of space-efficiency. The update-based method produces time-stamped XML documents that are more satisfactory with respect to space-efficiency than the general method. This goes to show that the update-based method effectively takes advantage of the updates.
TL;DR: A conceptual model for XML data is exploited to generate SAWSDL enriched XML schemas, but mainly to automatically generate the so called Lifting and Lowering schema mappings in a form of XSLT scripts.
Abstract: With the introduction of the SAWSDL W3C recommendation, the possibility of enriching web service interfaces with semantic model references surfaced as a foundation for semantic web services. However, the recommendation says neither what the semantic model should be nor what to do with the actual XML data. In this paper, we exploit our conceptual model for XML data to generate SAWSDL enriched XML schemas, but mainly to automatically generate the so called Lifting and Lowering schema mappings in a form of XSLT scripts. These scripts can be used to transform the XML data produced by the web service into RDF data (lifting) and vice versa (lowering). In the RDF data state the data can be manipulated using a knowledge given by a corresponding ontology mapped to our model. Also the reasoning power granted by the ontology description can be exploited.
TL;DR: This work isolates a set of five requirements that must be fulfilled in order to have a faithful representation of the XML data-exchange problem by a relational translation, and demonstrates that these requirements naturally suggest the inlining technique for data-exchange tasks.
Abstract: We consider data exchange for XML documents: given source and target schemas, a mapping between them, and a document conforming to the source schema, construct a target document and answer target queries in a way that is consistent with source information. The problem has primarily been studied in the relational context, in which data-exchange systems have also been built. Since many XML documents are stored in relations, it is natural to consider using a relational system for XML data exchange. However, there is a complexity mismatch between query answering in relational and XML data exchange, which indicates that restrictions have to be imposed on XML schemas and mappings, and on XML shredding schemes, to make the use of relational systems possible. We isolate a set of five requirements that must be fulfilled in order to have a faithful representation of the XML data-exchange problem by a relational translation. We then demonstrate that these requirements naturally suggest the inlining technique for data-exchange tasks. Our key contribution is to provide shredding algorithms for schemas, documents, mappings and queries, and demonstrate that they enable us to correctly perform XML data-exchange tasks using a relational system.
TL;DR: The proposed access control labeling scheme supports the efficient processing of dynamic XML data, eliminating the need for re-labeling and secure query processing.
TL;DR: This paper proposes a direct parallel method to solve data-parallel XML parsing without pre-parsing, and proposes a non-synchronized splitter approach to the parallel XML querying using XPath expressions.
Abstract: Data-parallel XML parsing has a crucial problem in partitioning XML documents. Existing approaches need a pre-parse step to determine the partitions. In this paper, we propose a direct parallel method to solve this problem without pre-parsing. In the direct parallel method, we directly start the parallel parsing by finding the "light tower", which is a particular character with some exceptions, called clues. We handle the exceptions by watching the clues and reparsing the partition if it is required in the parsing stage. We also propose a non-synchronized splitter approach to the parallel XML querying using XPath expressions. In the non-synchronized splitter approach, we split an XPath expression into pieces to be executed by threads and we use a data structure, called the ancestor table, to help each thread handle its part of XPath expression independently without communications between threads. Our experiments show that our approach scales well from small sized files to huge sized files.
TL;DR: The XXACF (eXtensible Role-Based XML Access Control Framework) framework for controlling access to XML documents in different environments represents an improvement over the existing systems and enables defining context-sensitive access control policies on different priority and granularity levels.
Abstract: It is often the case that XML documents contain information of different sensitivity degrees that must be selectively shared by user communities. This paper presents the XXACF (eXtensible Role-Based XML Access Control Framework) framework for controlling access to XML documents in different environments. The proposed access control model of XXACF is described. The framework represents an improvement over the existing systems and enables defining context-sensitive access control policies on different priority and granularity levels, the enforcement of access control for different operations on XML documents, as well as different ways of access control enforcement for the same operation.
TL;DR: This paper investigates how an attacker can still interfere with Web Services communication even in the presence of XML Signatures, and discusses the interrelation of XML Signatures and XML Encryption, focussing on their security properties and expressiveness in different application scenarios.
Abstract: XML Signatures are used to protect XML-based Web Service communication against a broad range of attacks related to man-in-the-middle scenarios. However, due to the complexity of the Web Services specification landscape, the task of applying XML Signatures in a robust and reliable manner becomes more and more challenging. In this paper, we investigate this issue, describing how an attacker can still interfere with Web Services communication even in the presence of XML Signatures. Additionally, we discuss the interrelation of XML Signatures and XML Encryption, focussing on their security properties and expressiveness in different application scenarios.
TL;DR: An experimental study on an XML database shows that the proposed XML schemas provide high query performance on the relevant elements for the workload and, at the same time, low cost of data redundancy on elements that are not relevant for update operations.
Abstract: In general, the design of XML schemas involves translating conceptual schemas into XML schemas which aim to be: (i) normalized schemas, and (ii) connected structures in order to achieve good performance on queries. However, these requirements address a trade-off because highly connected XML structures allow data redundancy, and normalized schemas generate disconnected XML structures. This paper describes a workload-based approach which balances this trade-off on translating conceptual schemas into XML structures. An experimental study on an XML database shows that our XML schemas provide high query performance on the relevant elements for the workload and, at the same time, low cost of data redundancy on elements that are not relevant for update operations.
TL;DR: This paper presents the implementation of XML encryption utilizing Public Key Infrastructure (PKI) technology in compliance with W3C's working draft for XML encryption.
Abstract: XML is the de facto language of business transactions and is widely used as a standard format to exchange electronic documents and messages. The most popular features of XML are the structuring of data and XML-based encryption, which offer a natural way to handle complex requirements for securing XML data flow and exchange between applications. In this paper, we present the implementation of XML encryption utilizing Public Key Infrastructure (PKI) technology in compliance with W3C's working draft for XML encryption.
TL;DR: An XML representation of C source code is proposed with two models: one for intra-file information, which consists of syntax structure, flow, and type information, and the other for inter-file relations, which is cross-reference information.
Abstract: We propose an XML C source code representation to support developing CASE tools. Since source code is a main artifact of software development, most CASE tools have some features related to source code editor, static analyzer, profiler, etc. To develop such tools, detailed information related to source code is needed. However, it is quite difficult to reuse program analysis features because they do not have common interfaces even for parsing and data/control-flow analysis that are most common features for such CASE tools. To address this issue, we focus on XML as an intermediate representation for source code information. Existing XML representations only represent structure of syntax trees and lack some important information required for CASE tools. We propose two models for representing source code, one is for intra-file information, which consists of syntax structure, flow, and type information, the other is for inter-file relation, which is cross-reference information. We also introduce CASE tools with our representation and demonstrate the efficacy in CASE tool development. To evaluate the efficacy, we show that a coding rule checker and a cross-referencer can be easily implemented using common XML processing libraries such as XSLT and XPath.