TL;DR: This paper presents a method to convert data between a relational format and an XML document by creating a set of XML mapping definitions from metadata, selecting relational data from a relational application database, and converting the relational data to the XML document using those mapping definitions.
Abstract: A method to convert data between a relational format and an XML document is presented. The method creates a set of XML mapping definitions from metadata, selects relational data from a relational application database, and converts the relational data to the XML document using the set of XML mapping definitions.
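A minimal sketch of the relational-to-XML direction follows. It is an illustration only, not the method of the paper: the mapping format, the helper names `mapping_from_metadata` and `rows_to_xml`, and the `customer` table are all hypothetical, and SQLite's `PRAGMA table_info` stands in for whatever metadata source the original system uses.

```python
import sqlite3
import xml.etree.ElementTree as ET

def mapping_from_metadata(conn, table):
    """Build a (hypothetical) mapping definition from the table's column metadata."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    return {"root": table + "s", "row": table, "columns": {c: c for c in columns}}

def rows_to_xml(conn, table, mapping):
    """Select the mapped columns and emit one XML element per relational row."""
    cols = list(mapping["columns"])
    cursor = conn.execute(f"SELECT {', '.join(cols)} FROM {table}")
    root = ET.Element(mapping["root"])
    for row in cursor:
        item = ET.SubElement(root, mapping["row"])
        for col, value in zip(cols, row):
            ET.SubElement(item, mapping["columns"][col]).text = str(value)
    return root

# Tiny in-memory database used only for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Ada')")

mapping = mapping_from_metadata(conn, "customer")
xml_text = ET.tostring(rows_to_xml(conn, "customer", mapping), encoding="unicode")
print(xml_text)
```

Keeping the mapping as plain data, separate from the conversion routine, is what would allow the same definitions to drive the reverse (XML-to-relational) direction as well.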
TL;DR: An XML normal form, XNF, is defined that avoids update anomalies and redundancies in DTDs, and a lossless algorithm for converting any DTD into one in XNF is presented.
Abstract: This article takes a first step towards the design and normalization theory for XML documents. We show that, like relational databases, XML documents may contain redundant information, and may be prone to update anomalies. Furthermore, such problems are caused by certain functional dependencies among paths in the document. Our goal is to find a way of converting an arbitrary DTD into a well-designed one, that avoids these problems. We first introduce the concept of a functional dependency for XML, and define its semantics via a relational representation of XML. We then define an XML normal form, XNF, that avoids update anomalies and redundancies. We study its properties, and show that XNF generalizes BCNF; we also discuss the relationship between XNF and normal forms for nested relations. Finally, we present a lossless algorithm for converting any DTD into one in XNF.
TL;DR: This work proposes a new labeling scheme that takes advantage of the unique property of prime numbers to efficiently support order-sensitive XML queries in the presence of dynamic updates.
Abstract: Efficient evaluation of XML queries requires the determination of whether a relationship exists between two elements. A number of labeling schemes have been designed to label the element nodes such that the relationships between nodes can be easily determined by comparing their labels. With the increased popularity of XML on the Web, finding a labeling scheme that is able to support order-sensitive queries in the presence of dynamic updates becomes urgent. We propose a new labeling scheme that takes advantage of the unique property of prime numbers to meet this need. The global order of the nodes can be captured by generating simultaneous congruence values from the prime number node labels. Theoretical analysis of the label size requirements for the various labeling schemes is given. Experimental results indicate that the prime number labeling scheme is compact compared to existing dynamic labeling schemes, and provides efficient support to order-sensitive queries and updates.
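The idea can be sketched as follows, under stated assumptions: each node receives a distinct prime as its self-label, a node's full label is its parent's label multiplied by that prime, and document order is recovered from a single simultaneous congruence value computed with the Chinese remainder theorem. The function names, the dictionary-based tree, and the choice to label the root with the prime 2 are illustrative, not taken from the paper.

```python
from math import prod

def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for a small sketch)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1

def label_tree(tree, root):
    """Return {node: (self_label, label)} with nodes inserted in document order."""
    gen = primes()
    labels = {}
    stack = [(root, 1)]
    while stack:
        node, parent_label = stack.pop()
        p = next(gen)
        labels[node] = (p, parent_label * p)
        # Push children reversed so they are popped in left-to-right order.
        for child in reversed(tree.get(node, [])):
            stack.append((child, parent_label * p))
    return labels

def is_ancestor(labels, x, y):
    """x is a proper ancestor of y iff x's label divides y's label."""
    return x != y and labels[y][1] % labels[x][1] == 0

def crt(residues, moduli):
    """Solve value = r_i (mod m_i) for pairwise coprime moduli."""
    total = prod(moduli)
    value = 0
    for r, m in zip(residues, moduli):
        partial = total // m
        value += r * partial * pow(partial, -1, m)  # modular inverse, Python 3.8+
    return value % total

tree = {"a": ["b", "c"], "b": ["d"]}      # a has children b, c; b has child d
labels = label_tree(tree, "a")
order = list(labels)                        # document order: a, b, d, c
sc = crt(range(1, len(order) + 1), [labels[n][0] for n in order])

# The position of any node is recovered as sc mod its self-label.
positions = {n: sc % labels[n][0] for n in order}
print(labels, positions)
```

In this sketch an order-changing update only requires recomputing the single value `sc`; the node labels themselves stay untouched, which is the property that makes such a scheme attractive for dynamic documents.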
TL;DR: This work proposes a hierarchical algorithm (S-GRACE) for clustering XML documents based on structural information in the data, and proposes a computationally efficient distance metric defined between documents and sets of documents using the notion of structure graph (s-graph).
Abstract: With the standardization of XML as an information exchange language over the Internet, a huge amount of information is formatted in XML documents. In order to analyze this information efficiently, decomposing the XML documents and storing them in relational tables is a popular practice. However, query processing becomes expensive since, in many cases, an excessive number of joins is required to recover information from the fragmented data. If a collection consists of documents with different structures (for example, they come from different DTDs), mining clusters in the documents could alleviate the fragmentation problem. We propose a hierarchical algorithm (S-GRACE) for clustering XML documents based on structural information in the data. The notion of structure graph (s-graph) is proposed, supporting a computationally efficient distance metric defined between documents and sets of documents. This simple metric yields our new clustering algorithm which is efficient and effective, compared to other approaches based on tree-edit distance. Experiments on real data show that our algorithm can discover clusters not easily identified by manual inspection.
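A structure-based distance of this kind can be sketched as below. The abstract does not spell out the metric, so the definitions here are assumptions: the s-graph is taken to be the set of distinct parent-child tag pairs in a document (or in a set of documents, by union), and the distance is one minus the share of common edges relative to the larger graph. The names `s_graph`, `s_graph_of_set`, and `distance` are hypothetical.

```python
import xml.etree.ElementTree as ET

def s_graph(xml_text):
    """Collect the distinct (parent tag, child tag) edges of one document."""
    root = ET.fromstring(xml_text)
    edges = set()
    for parent in root.iter():
        for child in parent:
            edges.add((parent.tag, child.tag))
    return edges

def s_graph_of_set(documents):
    """The s-graph of a set of documents is the union of their edge sets."""
    edges = set()
    for doc in documents:
        edges |= s_graph(doc)
    return edges

def distance(g1, g2):
    """Assumed metric: 1 - |common edges| / max(|g1|, |g2|)."""
    larger = max(len(g1), len(g2))
    if larger == 0:
        return 0.0
    return 1.0 - len(g1 & g2) / larger

doc1 = "<a><b><c/></b><b/></a>"   # edges: (a, b), (b, c)
doc2 = "<a><b/><d/></a>"          # edges: (a, b), (a, d)
d12 = distance(s_graph(doc1), s_graph(doc2))
print(d12)
```

Because the graphs are plain edge sets, the comparison costs a set intersection rather than a tree-edit computation, which is the efficiency argument the abstract makes.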
TL;DR: In this article, techniques are provided for changing data for an XML construct in an SQL/XML compliant database management system (DBMS), in which an XML operator modifies the content of a particular component without modifying the entire instance.
Abstract: Techniques are provided for changing data for an XML construct in an SQL/XML compliant database management system (DBMS). The DBMS allows instances of XML type to represent XML constructs, such as XML documents, XML elements, XML attributes, and fragments of XML documents. An SQL statement is received that includes an XML operator that operates on a particular component in an instance of XML type. During execution of the SQL statement, the XML operator is evaluated by modifying content for the component without modifying the entire instance. For example, an XML delete operator deletes the particular component from the instance. Other XML operators include an insert operator, an insert-before operator, an append-child operator, and an update operator. During execution, these operators may be rewritten to operate on existing SQL constructs, or evaluated by updating only some of the existing SQL constructs, or both.
TL;DR: The matching algorithm is exploited for the classification of XML documents against a set of DTDs, the evolution of the DTD structure, the evaluation of structural queries, the selective dissemination of XML documents, and the protection of XML document contents.
TL;DR: This paper advocates a pre-processing method called QFilter that uses Non-deterministic Finite Automata (NFA) to rewrite the user's query such that any parts violating access control rules are pruned.
Abstract: At present, most of the state-of-the-art solutions for XML access controls are either (1) document-level access control techniques that are too limited to support fine-grained security enforcement; (2) view-based approaches that are often expensive to create and maintain; or (3) impractical proposals that require substantial security-related support from underlying XML databases. In this paper, we take a different approach that assumes no security support from underlying XML databases and examine three alternative fine-grained XML access control solutions, namely primitive, pre-processing and post-processing approaches. In particular, we advocate a pre-processing method called QFilter that uses Non-deterministic Finite Automata (NFA) to rewrite the user's query such that any parts violating access control rules are pruned. We show the construction and execution of a QFilter and demonstrate its superiority to other competing methods.
TL;DR: This paper formalizes the problem of query evaluation on Active XML documents, and provides algorithms to solve it, and presents an implementation that is compliant with XML and Web services standards, and is used as part of the ActiveXML system.
Abstract: In this paper, we study query evaluation on Active XML documents (AXML for short), a new generation of XML documents that has recently gained popularity. AXML documents are XML documents whose content is given partly extensionally, by explicit data elements, and partly intensionally, by embedded calls to Web services, which can be invoked to generate data. A major challenge in the efficient evaluation of queries over such documents is to detect which calls may bring data that is relevant for the query execution, and to avoid the materialization of irrelevant information. The problem is intricate, as service calls may be embedded anywhere in the document, and service invocations possibly return data containing calls to new services. Hence, the detection of relevant calls becomes a continuous process. Also, a good analysis must take the service signatures into consideration. We formalize the problem, and provide algorithms to solve it. We also present an implementation that is compliant with XML and Web services standards, and is used as part of the ActiveXML system. Finally, we experimentally measure the performance gains obtained by a careful filtering of the service calls to be triggered.
TL;DR: The most up to date, comprehensive, and practical guide to Web services security, and the first to cover the final release of new standards SAML 1.1 and WS-Security.
Abstract: The most up to date, comprehensive, and practical guide to Web services security, and the first to cover the final release of new standards SAML 1.1 and WS-Security. Comprehensive coverage and practical examples of the industry standards XML Signature and XML Encryption, and the first book to cover the final WS-Security and SAML 1.1 specifications. Authors Jothy Rosenberg and David Remy are security experts who co-founded GeoTrust, the #2 Web site certificate authority and currently work for Service Integrity and BEA Systems, respectively. According to IBM, American Express, Sun Microsystems, and other industry leaders, well-defined security standards and procedures are a crucial element to the adoption of web services in industry.
TL;DR: XACT is introduced, a high-level approach for Java using XML templates as a first-class data type with operations for manipulating XML values based on XPath, which permits static type checking using DTD schemas as types.
Abstract: XML documents generated dynamically by programs are typically represented as text strings or DOM trees. This is a low-level approach for several reasons: 1) traversing and modifying such structures can be tedious and error prone, 2) although schema languages, e.g., DTD, allow classes of XML documents to be defined, there are generally no automatic mechanisms for statically checking that a program transforms from one class to another as intended. We introduce XACT, a high-level approach for Java using XML templates as a first-class data type with operations for manipulating XML values based on XPath. In addition to an efficient runtime representation, the data type permits static type checking using DTD schemas as types. By specifying schemas for the input and output of a program, our analysis algorithm will statically verify that valid input data is always transformed into valid output data and that the operations are used consistently.
TL;DR: A fast and efficient indexing technique for XML documents is discussed, and the XML graph numbering scheme can be used for indexing and securing graph structure of XML documents, providing an efficient method to speed up XML data processing.
Abstract: XML (extensible markup language) is becoming the current standard for establishing interoperability on the Web. XML data are self-descriptive and syntax-extensible; this makes them very suitable for representation and exchange of semi-structured data, and allows users to define new elements for their specific applications. As a result, the number of documents incorporating this standard is continuously increasing over the Web. The processing of XML documents may require a traversal of the entire document structure, and therefore the cost could be very high. A strong demand for a means of efficient and effective XML processing has posed a new challenge for the database world. This paper discusses a fast and efficient indexing technique for XML documents, and introduces the XML graph numbering scheme. It can be used for indexing and securing the graph structure of XML documents. This technique provides an efficient method to speed up XML data processing. Furthermore, the paper explores the classification of existing methods and their impact on query processing and indexing.
TL;DR: ShreX is the first comprehensive and end-to-end solution to the relational storage of XML data; it supports not only all the mapping strategies proposed in the literature, but also new useful strategies that had not been considered previously.
Abstract: The use of relational database management systems (RDBMSs) to store and query XML data has attracted considerable interest with a view to leveraging their powerful and reliable data management services. Due to the mismatch between the relational and XML data models, it is necessary to first shred and load the XML data into relational tables, and then translate XML queries over the original data into equivalent SQL queries over the mapped tables. Although there is a rich literature on XML-relational storage, none of the existing solutions addresses all the storage problems in a single framework. Works on mapping strategies often have little or no details about query translation, and proposals for query translation often target a specific mapping strategy. XML-storage solutions provided by RDBMSs also have limitations. Notably, they are tied to a specific backend and use proprietary mapping languages, which not only may require a steep learning curve, but often are unable to express certain desirable mappings. In order to address these limitations, we developed ShreX, an XML-to-relational mapping framework and system that provides the first comprehensive and end-to-end solution to the relational storage of XML data. Mappings in ShreX are defined through annotations to an XML Schema. The use of XML Schema simplifies the mapping process, since it does not require users to master a new specialized mapping language. The use of annotations allows mapping choices to be combined in many different ways. As a result, ShreX not only supports all the mapping strategies proposed in the literature, but also new useful strategies that had not been considered previously. ShreX provides generic (and automatic) document shredding and query translation capabilities, and it is portable: its mapping specifications are independent of the database backend.
TL;DR: A deeper understanding of the performance impacts of binary XML encodings is provided in order to clarify the ongoing and often contentious debate over their merits, particularly in the domain of high performance XML stream processing.
Abstract: This paper provides an objective evaluation of the performance impacts of binary XML encodings, using a fast stream-based XQuery processor as our representative application. Instead of proposing one binary format and comparing it against standard XML parsers, we investigate the individual effects of several binary encoding techniques that are shared by many proposals. Our goal is to provide a deeper understanding of the performance impacts of binary XML encodings in order to clarify the ongoing and often contentious debate over their merits, particularly in the domain of high performance XML stream processing.
TL;DR: The paper describes an approach to easily conduct analysis of source-code differences, termed meta-differencing to reflect the fact that additional knowledge of the differences can be automatically derived.
Abstract: The paper describes an approach to easily conduct analysis of source-code differences. The approach is termed meta-differencing to reflect the fact that additional knowledge of the differences can be automatically derived. Meta-differencing is supported by an underlying source-code representation developed by the authors. The representation, srcML, is an XML format that explicitly embeds abstract syntax within the source code while preserving the documentary structure as dictated by the developer. XML tools are leveraged together with standard differencing utilities (i.e., diff) to generate a meta-difference. The meta-difference is also represented in an XML format called srcDiff. The meta-difference contains specific syntactic information regarding the source-code changes. In turn, this can be queried and searched with XML tools for the purpose of extracting information about the specifics of the changes. A case study of using the meta-differencing approach on an open-source system is presented to demonstrate its usefulness and validity.
TL;DR: In this article, a method, mechanism, and computer program product for storing, accessing, and managing XML data is described, which is applicable to all database systems and other servers that support storing and managing XML content.
Abstract: A method, mechanism, and computer program product for storing, accessing, and managing XML data is disclosed. The approach supports efficient evaluation of XPath queries and also improves the performance of data/fragment extraction. The approach can be applied to schema-less documents. The approach is applicable to all database systems and other servers which support storing and managing XML content. In addition, the approach can be applied to store, manage, and retrieve other types of unstructured or semi-structured data in a database system.
TL;DR: This paper introduces a new numbering scheme called DLN (Dynamic Level Numbering), along with several variants, focusing on efficient insert operations, support of streamed data, and fast retrieval of document fragments.
Abstract: Relational database systems are increasingly used to manage XML documents, especially for data-centric XML. In this paper we present a new approach to efficiently manage document-centric XML data based on a generic relational mapping. Such a generic XML storage is especially useful in data integration systems to manage highly diverse XML documents. We focus on efficient insert operations, support of streamed data and fast retrieval of document fragments. Therefore we introduce a new numbering scheme called DLN (Dynamic Level Numbering) and several variants of it. A performance evaluation based on a prototypical implementation demonstrates the high efficiency of DLN.
TL;DR: This work presents a generic algorithm to translate path expression queries into SQL in the presence of recursion in the schema and queries; it handles a general class of XML-to-relational mappings that includes all techniques proposed in the literature.
Abstract: We consider the problem of translating XML queries into SQL when XML documents have been stored in an RDBMS using a schema-based relational decomposition. Surprisingly, there is no published XML-to-SQL query translation algorithm for this scenario that handles recursive XML schemas. We present a generic algorithm to translate path expression queries into SQL in the presence of recursion in the schema and queries. This algorithm handles a general class of XML-to-relational mappings, which includes all techniques proposed in the literature. Some of the salient features of this algorithm are: (i) it translates a path expression query into a single SQL query, irrespective of how complex the XML schema is, (ii) it uses the "with" clause in SQL99 to handle recursive queries even over nonrecursive schemas, (iii) it reconstructs recursive XML subtrees with a single SQL query, and (iv) it shows that the support for linear recursion in SQL99 is sufficient for handling path expression queries over arbitrarily complex recursive XML schemas.
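The role of the SQL99 "with" clause can be illustrated with a small sketch. It is not the paper's translation algorithm: it uses a generic edge table instead of a schema-based decomposition, the `node` table and its columns are hypothetical, and the SQL for the path expression `/book/section//title` is written by hand. It only shows that one linearly recursive query can answer a descendant step over recursively nested `section` elements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, tag TEXT)")
conn.executemany("INSERT INTO node VALUES (?, ?, ?)", [
    (1, None, "book"),
    (2, 1, "section"),   # /book/section
    (3, 2, "section"),   # nested section (recursive schema)
    (4, 3, "title"),     # title below a nested section
    (5, 1, "title"),     # title directly under book, outside any section
])

# Hand-written SQL for /book/section//title using one linear recursion.
QUERY = """
WITH RECURSIVE reachable(id, tag) AS (
    -- Seed: section elements that are children of the root book element.
    SELECT s.id, s.tag
    FROM node s JOIN node b ON s.parent = b.id
    WHERE b.tag = 'book' AND b.parent IS NULL AND s.tag = 'section'
  UNION
    -- Step: every child of an already reachable node.
    SELECT n.id, n.tag
    FROM node n JOIN reachable r ON n.parent = r.id
)
SELECT id FROM reachable WHERE tag = 'title' ORDER BY id
"""

title_ids = [row[0] for row in conn.execute(QUERY)]
print(title_ids)
```

The recursive member references `reachable` exactly once, which is what linear recursion means here; no procedural loop or per-level query is needed regardless of how deeply sections are nested.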
TL;DR: This paper proposes an efficient client-based evaluator of access control rules for regulating access to XML documents that takes benefit from a dedicated index to quickly converge towards the authorized parts of a - potentially streaming - document.
Abstract: The erosion of trust put in traditional database servers and in Database Service Providers, the growing interest in different forms of data dissemination, and the concern for protecting children from suspicious Internet content are different factors that lead to moving access control from servers to clients. Several encryption schemes can be used to serve this purpose, but all suffer from a static way of sharing data. With the emergence of hardware and software security elements on client devices, more dynamic client-based access control schemes can be devised. This paper proposes an efficient client-based evaluator of access control rules for regulating access to XML documents. This evaluator takes benefit from a dedicated index to quickly converge towards the authorized parts of a - potentially streaming - document. Additional security mechanisms guarantee that prohibited data can never be disclosed during the processing and that the input document is protected from any form of tampering. Experiments on synthetic and real datasets demonstrate the effectiveness of the approach.
TL;DR: In this paper, the authors consider the problem of specifying and verifying cryptographic security protocols for XML web services and propose an approach to the specification and verification of security protocols based on a faithful account of the XML wire format.
Abstract: We consider the problem of specifying and verifying cryptographic security protocols for XML web services. The security specification WS-Security describes a range of XML security tokens, such as username tokens, public-key certificates, and digital signature blocks, amounting to a flexible vocabulary for expressing protocols. To describe the syntax of these tokens, we extend the usual XML data model with symbolic representations of cryptographic values. We use predicates on this data model to describe the semantics of security tokens and of sample protocols distributed with the Microsoft WSE implementation of WS-Security. By embedding our data model within Abadi and Fournet's applied pi calculus, we formulate and prove security properties with respect to the standard Dolev-Yao threat model. Moreover, we informally discuss issues not addressed by the formal model. To the best of our knowledge, this is the first approach to the specification and verification of security protocols based on a faithful account of the XML wire format.
TL;DR: The use of XML types can allow the combination of XML-and Java-type systems, which overcomes many deficiencies in existing marshaling and unmarshaling systems by translating XML schemas which define XML data in an XML document into XML types in Java.
Abstract: The use of XML types can allow the combination of XML- and Java-type systems, which overcomes many deficiencies in existing marshaling and unmarshaling systems by translating XML schemas which define XML data in an XML document into XML types in Java. Unlike traditional attempts at translating between XML and Java, XML schemas realized as XML types can remain fully faithful to the XML, and are capable of a number of XML data operations. In addition, the XML types can be easily transformed among themselves and Java types, and a lightweight store retaining XML information at tag level allows incremental XML marshaling and unmarshaling.
TL;DR: An implementation of these algorithms that is independent of, and can be customized for different storage mechanisms for XML, is discussed, and extensive experimental results showing that the approach is highly efficient and scalable are presented.
Abstract: We discuss incremental validation of XML documents with respect to DTDs and XML schema definitions. We consider insertions and deletions of subtrees, as opposed to leaf nodes only, and we also consider the validation of ID and IDREF attributes. For arbitrary schemas, we give a worst-case n log n time and linear space algorithm, and show that it often is far superior to revalidation from scratch. We present two classes of schemas, which capture most real-life DTDs, and show that they admit a logarithmic time incremental validation algorithm that, in many cases, requires only constant auxiliary space. We then discuss an implementation of these algorithms that is independent of, and can be customized for different storage mechanisms for XML. Finally, we present extensive experimental results showing that our approach is highly efficient and scalable.
TL;DR: In this paper, a method, apparatus, and article of manufacture for processing an extensible markup language (XML) script using an XML based scripting language are provided, and the XML script is parsed.
Abstract: Various embodiments of a method, apparatus, and article of manufacture for processing an extensible markup language (XML) script using an XML based scripting language are provided. The XML script is parsed. The XML script comprises element nodes. Each element node comprises a component name. A first element node comprises a first component name referencing a first user-defined component. An argument is passed to the first user-defined component. The argument is evaluated when an element node associated with the first user-defined component comprises an evaluate-component name that explicitly specifies that the argument be evaluated.
TL;DR: ShreX is a freely-available system for shredding, loading and querying XML documents in relational databases that provides generic (mapping-independent) functions for loading shredded documents into relations and for translating XML queries into SQL.
Abstract: We describe ShreX, a freely-available system for shredding, loading and querying XML documents in relational databases. ShreX supports all mapping strategies proposed in the literature as well as strategies available in commercial RDBMSs. It provides generic (mapping-independent) functions for loading shredded documents into relations and for translating XML queries into SQL. ShreX is portable and can be used with any relational database backend.
TL;DR: In this paper, a word processor including a native XML file format is provided, in which a well-formed XML file fully represents the word-processor document and fully supports 100% of the word processor's rich formatting.
Abstract: A word processor including a native XML file format is provided. The well-formed XML file fully represents the word-processor document, and fully supports 100% of the word processor's rich formatting. There are no feature losses when saving the word-processor documents as XML. A published XSD file defines all the rules behind the word processor's XML file format. Hints may be provided within the associated XML files, giving applications that understand XML a shortcut to understanding some of the features provided by the word processor. The word-processing document is stored in a single XML file. Additionally, manipulation of word-processing documents may be done on computing devices that do not include the word processor itself.
TL;DR: This paper describes a validating XML parsing method based on deterministic finite state automata (DFA) that supports the implementation of high-performance Web services.
Abstract: This paper describes a validating XML parsing method based on deterministic finite state automata (DFA). XML parsing and validation is performed by a schema-specific XML parser that encodes the admissible parsing states as a DFA. This DFA is automatically constructed from the XML schemas of XML messages using a code generator. A two-level DFA architecture is used to increase efficiency and to reduce the generated code size. The lower-level DFA efficiently parses syntactically well-formed XML messages. The higher-level DFA validates the messages and produces application events associated with transitions in the DFA. Two example case studies are presented and performance results are given to demonstrate that the approach supports the implementation of high-performance Web services.
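The validation level of such a design can be sketched as a transition table over element names. This is a simplified illustration under assumptions: the content model (an `order` element containing one or more `item` elements followed by exactly one `total`) is hypothetical, the table is written by hand rather than generated from an XML schema, and Python's standard ElementTree parser stands in for the lower-level syntax-parsing automaton.

```python
import xml.etree.ElementTree as ET

# Hand-written DFA for the hypothetical content model: item+ , total
TRANSITIONS = {
    ("start", "item"): "items",
    ("items", "item"): "items",
    ("items", "total"): "done",
}
ACCEPTING = {"done"}

def validate_order(xml_text):
    """Return True if the children of <order> follow the content model."""
    root = ET.fromstring(xml_text)   # stand-in for the lower-level parser
    if root.tag != "order":
        return False
    state = "start"
    for child in root:
        # A missing table entry means the element is not admissible here.
        state = TRANSITIONS.get((state, child.tag))
        if state is None:
            return False
    return state in ACCEPTING

valid = validate_order("<order><item/><item/><total/></order>")
missing_item = validate_order("<order><total/></order>")
missing_total = validate_order("<order><item/></order>")
print(valid, missing_item, missing_total)
```

Each child element costs a single dictionary lookup, which hints at why a schema-specific automaton can validate faster than a generic validator that interprets the schema at run time.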
TL;DR: In this article, a validator exposes public APIs that allow such validation-time requests from an event handler that is associated with an external application and that is registered with the XML stream.
Abstract: An XML processing model enables applications that use an XML stream to perform metadata-based or other processing of data during a data validation operation while preserving a streaming processing model. For example, while an XML node is being validated, requests can be received regarding the status of the validation and any processing that may be required with the node in order to conform it to requirements of an external application. A validator exposes public APIs that allow such validation-time requests from an event handler that is associated with an external application and that is registered with the XML stream. Messages that identify schema annotation definitions are provided to an external application to direct the type of processing to be performed on nodes at application runtime. Thus, applications can process a node according to the annotation definition concurrently with validation of the given node by the validator.
TL;DR: In this article, the concept of renderers and translators is introduced in connection with bidirectional conversion between object models and XML, which can be shared and reused by any and all renderer implementations.
Abstract: The concept of “renderers” and “translators” is introduced in connection with bidirectional conversion between object models and XML. A renderer embodies the logic responsible for mediating the parser specific APIs for reading and writing XML. It utilizes a plurality of translator objects, which embody the mapping information needed to convert the XML into object model instances. The translator objects themselves do not contain “knowledge” of parser implementations; thus, the translators are common and can be shared and reused by any and all renderer implementations. Since each translator embodies the knowledge and rules regarding how to convert an XML model to an object model, and how to convert object models to XML, it is thus independent of the particular renderer that is being used, whether it be SAX, DOM, or some other renderer.
TL;DR: In this article, an Extensible Mark-up Language (XML) schema is used to generate configuration settings files; for example, a wide area network (WAN) configuration schema defines an XML file for configuring a WAN device.
Abstract: An Extensible Mark-up Language (XML) schema is used to generate configuration settings files. A wireless configuration XML schema defines an XML file for configuring wireless network settings on a wireless device. A wide area network (WAN) configuration schema defines an XML file for configuring a WAN device. A local area network (LAN) configuration schema defines an XML file for configuring a LAN device. A broadband modem configuration schema defines an XML file for configuring a broadband modem device. A device configuration schema defines an XML file for reporting the configuration of a device.
TL;DR: In this article, the authors present a system for managing data stored in a hierarchical format, such as a relational database, when the data is retrieved and manipulated using a schema-driven format such as XML; each logical unit in at least one of the original XML data or a copy of the XML data is annotated in a manner that uniquely identifies it.
Abstract: Systems, methods, and computer program products are disclosed for management of data that is stored in a hierarchical format, such as a relational database, when the data is retrieved and manipulated using a schema-driven format such as XML. In one implementation, a copy of the XML data retrieved from the database is generated, and each logical unit in at least one of the original XML data or the copy of the XML data is annotated in a manner that uniquely identifies each logical unit. For example, each XML node may be assigned a unique numerical or string identifier. As the data is manipulated, algorithms may be implemented to use the annotations to track changes to the XML data and to ensure that the manipulated XML data complies with one or more required data formats. When the XML data is ready to be transferred back to the database(s) from which it was obtained, a series of operations are implemented to validate the data and to determine the nature of the operation to be performed to restore the data to the databases.
TL;DR: This work proposes combining role-based access control as found in the Role Graph Model, with a methodology originally designed for object-oriented databases, to provide a general access control methodology for parts of XML documents.
Abstract: In order to provide a general access control methodology for parts of XML documents, we propose combining role-based access control as found in the Role Graph Model, with a methodology originally designed for object-oriented databases. We give a description of the methodology, showing how different access modes, XPath expressions and roles can be combined, and how propagation of permissions is handled. Given this general approach, a system developer can design a complex authorization model for collections of XML documents.