TL;DR: A technique is presented that represents the tree structure of an XML document efficiently by “compressing” it, so that queries can be executed directly without prior decompression.
Abstract: Implementations that load XML documents and give access to them via, e.g., the DOM, suffer from huge memory demands: the space needed to load an XML document is usually many times larger than the size of the document. A considerable amount of memory is needed to store the tree structure of the XML document. Here a technique is presented that represents the tree structure of an XML document efficiently. The representation exploits the high regularity in XML documents by “compressing” their tree structure, that is, by detecting and removing repetitions of tree patterns. The functionality of basic tree operations, like traversal along edges, is preserved in the compressed representation. This makes it possible to execute queries (and in particular, bulk operations) directly, without prior decompression. For certain tasks like validation against an XML type or checking equality of documents, the representation allows for provably more efficient algorithms than those running on conventional representations.
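To make the idea of removing repeated tree patterns concrete, here is a minimal sketch that shares identical subtrees so the tree becomes a DAG. It is an illustration under simplifying assumptions, not the paper's algorithm: only element tags are considered (text and attributes are ignored), and the names `compress`, `count_nodes`, and `table` are hypothetical.

```python
import xml.etree.ElementTree as ET

def compress(root):
    """Share identical subtrees; return (root_id, table).

    table[i] is a (tag, child_ids) pair, so each distinct subtree
    shape is stored exactly once.
    """
    ids = {}    # (tag, child_ids) -> node id
    table = []  # node id -> (tag, child_ids)

    def visit(elem):
        key = (elem.tag, tuple(visit(child) for child in elem))
        if key not in ids:
            ids[key] = len(table)
            table.append(key)
        return ids[key]

    return visit(root), table

def count_nodes(node_id, table):
    """Count the nodes of the original tree by traversing the shared form."""
    _tag, children = table[node_id]
    return 1 + sum(count_nodes(c, table) for c in children)

doc = ET.fromstring(
    "<bib>"
    "<book><title/><author/></book>"
    "<book><title/><author/></book>"
    "</bib>"
)
root_id, table = compress(doc)
print(len(table), count_nodes(root_id, table))
```

In this example the seven element nodes of the document collapse to four shared entries, and `count_nodes` answers a question about the full tree while walking only the compressed form.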
TL;DR: This paper starts looking into the basic properties of XML data exchange, that is, restructuring of XML documents that conform to a source DTD under a target DTD, and answering queries written over the target schema, and proves a dichotomy theorem that classifies data exchange settings into those over which query answering is tractable, and those over which it is coNP-complete.
Abstract: Data exchange is the problem of finding an instance of a target schema, given an instance of a source schema and a specification of the relationship between the source and the target. Theoretical foundations of data exchange have recently been investigated for relational data. In this paper, we start looking into the basic properties of XML data exchange, that is, restructuring of XML documents that conform to a source DTD under a target DTD, and answering queries written over the target schema. We define XML data exchange settings in which source-to-target dependencies refer to the hierarchical structure of the data. Combining DTDs and dependencies makes some XML data exchange settings inconsistent. We investigate the consistency problem and determine its exact complexity. We then move to query answering, and prove a dichotomy theorem that classifies data exchange settings into those over which query answering is tractable, and those over which it is coNP-complete, depending on classes of regular expressions used in DTDs. Furthermore, for all tractable cases we give polynomial-time algorithms that compute target XML documents over which queries can be answered.
TL;DR: This paper develops a method to perform holistic twig pattern matching on XML documents partitioned using various streaming schemes and can process a large class of twig patterns consisting of both ancestor-descendant and parent-child relationships and avoid generating redundant intermediate results.
Abstract: Searching for all occurrences of a twig pattern in an XML document is an important operation in XML query processing. Recently a holistic method, TwigStack [2], has been proposed. The method avoids generating large intermediate results which do not contribute to the final answer and is CPU and I/O optimal when twig patterns only have ancestor-descendant relationships. Another important direction of XML query processing is to build structural indexes [3][8][13][15] over XML documents to avoid unnecessary scanning of source documents. We regard XML structural indexing as a technique to partition XML documents and call it a streaming scheme in our paper. In this paper we develop a method to perform holistic twig pattern matching on XML documents partitioned using various streaming schemes. Our method avoids unnecessary scanning of irrelevant portions of XML documents. More importantly, depending on the streaming scheme used, it can process a large class of twig patterns consisting of both ancestor-descendant and parent-child relationships and avoid generating redundant intermediate results. Our experiments demonstrate the applicability and the performance advantages of our approach.
TL;DR: The TurboXPath path processor is proposed, which accepts a language equivalent to a subset of the for-let-where constructs of XQuery over a single document, and can be extended to provide full XQuery support or used to augment federated database engines for efficient handling of queries over XML data streams produced by external sources.
Abstract: Efficient querying of XML streams will be one of the fundamental features of next-generation information systems. In this paper we propose the TurboXPath path processor, which accepts a language equivalent to a subset of the for-let-where constructs of XQuery over a single document. TurboXPath can be extended to provide full XQuery support or used to augment federated database engines for efficient handling of queries over XML data streams produced by external sources. Internally, TurboXPath uses a tree-shaped path expression with multiple outputs to drive the execution. The result of a query execution is a sequence of tuples of XML fragments matching the output nodes. Based on a streamed execution model, TurboXPath scales up to large documents and has limited memory consumption for increased concurrency. Experimental evaluation of a prototype demonstrates performance gains compared to other state-of-the-art path processors.
TL;DR: A method and apparatus are described for translating queries such as path expressions and SQL/XML constructs into SQL statements to be executed against an XML index, which reduces processor time compared with applying path expressions directly to the original XML documents to extract the desired information.
Abstract: A method and apparatus is provided for translating queries, such as path expressions and SQL/XML constructs, into SQL statements to be executed against an XML index, which improves processor time as opposed to applying path expressions directly to the original XML documents to extract the desired information. Simple path expressions, filter expressions, descendant axes, wildcards, logical expressions, relational expressions, literals, and other path expressions are all translated into SQL for efficient querying of an XML index. Similarly, rules for translating SQL/XML constructs into SQL are provided.
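As a rough illustration of translating a path expression into SQL over an index, the sketch below handles only child steps (`/`), descendant steps (`//`), and the `*` wildcard. Everything concrete here is an assumption made for the example and is not taken from the source: the `path_index(path, value)` table, the use of SQLite, and the choice to express the match as a regular expression through a user-defined `REGEXP` function. Filter, logical, and relational expressions are not covered.

```python
import re
import sqlite3

def path_to_regex(path):
    """Turn a simple path expression into a regex over stored root-to-node paths."""
    parts = []
    for sep, name in re.findall(r"(//|/)([^/]+)", path):
        if sep == "//":
            parts.append(r"(?:/[^/]+)*")  # any number of intermediate steps
        parts.append("/" + ("[^/]+" if name == "*" else re.escape(name)))
    return "".join(parts)

def path_to_sql(path):
    """Return (sql, params) for the hypothetical path_index table."""
    sql = "SELECT value FROM path_index WHERE path REGEXP ? ORDER BY rowid"
    return sql, (path_to_regex(path),)

conn = sqlite3.connect(":memory:")
# SQLite evaluates `X REGEXP Y` by calling the user function regexp(Y, X).
conn.create_function(
    "REGEXP", 2, lambda pattern, text: int(re.fullmatch(pattern, text) is not None)
)
conn.execute("CREATE TABLE path_index (path TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO path_index VALUES (?, ?)",
    [
        ("/bib/book/title", "XML Basics"),
        ("/bib/book/author", "Ada"),
        ("/bib/article/title", "Twig Joins"),
    ],
)

def run(path):
    sql, params = path_to_sql(path)
    return [row[0] for row in conn.execute(sql, params)]

print(run("/bib//title"))
print(run("/bib/*/author"))
```

The point of the sketch is only that the query runs against the index rows rather than against the original documents; a production index would more likely use dedicated path or order keys than a regular expression scan.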
TL;DR: A collection of papers from INEX 2004 covering methodology, ad hoc retrieval, relevance feedback, heterogeneous document collections, natural language processing of topics, and interactive studies.
Abstract: Overview of INEX 2004.- Overview of INEX 2004.- Methodology.- Narrowed Extended XPath I (NEXI).- NEXI, Now and Next.- If INEX Is the Answer, What Is the Question?.- Reliability Tests for the XCG and inex-2002 Metrics.- Ad Hoc Retrieval.- Component Ranking and Automatic Query Refinement for XML Retrieval.- MultiText Experiments for INEX 2004.- Logic-Based XML Information Retrieval for Determining the Best Element to Retrieve.- An Algebra for Structured Queries in Bayesian Networks.- IR of XML Documents - A Collective Ranking Strategy.- TRIX 2004 - Struggling with the Overlap.- The Utrecht Blend: Basic Ingredients for an XML Retrieval System.- Hybrid XML Retrieval Revisited.- Analyzing the Properties of XML Fragments Decomposed from the INEX Document Collection.- A Voting Method for XML Retrieval.- Mixture Models, Overlap, and Structural Hints in XML Element Retrieval.- GPX - Gardens Point XML Information Retrieval at INEX 2004.- Hierarchical Language Models for XML Component Retrieval.- Ranked Retrieval of Structured Documents with the S-Term Vector Space Model.- Merging XML Indices.- DocBase - The INEX Evaluation Experience.- Ad Hoc Retrieval and Relevance Feedback.- TIJAH at INEX 2004 Modeling Phrases and Relevance Feedback.- Flexible Retrieval Based on the Vector Space Model.- Relevance Feedback.- Relevance Feedback for XML Retrieval.- Ad Hoc Retrieval and Heterogeneous Document Collection.- A Universal Model for XML Information Retrieval.- Cheshire II at INEX '04: Fusion and Feedback for the Adhoc and Heterogeneous Tracks.- Using a Relevance Propagation Method for Adhoc and Heterogeneous Tracks at INEX 2004.- Heterogeneous Document Collection.- Building and Experimenting with a Heterogeneous Collection.- A Test Platform for the INEX Heterogeneous Track.- EXTIRP 2004: Towards Heterogeneity.- Natural Language Processing of Topics.- NLPX at INEX 2004.- Analysing Natural Language Queries at INEX 2004.- Interactive Studies.- The Interactive Track at INEX 2004.- Interactive Searching Behavior with Structured XML Documents.
TL;DR: In this paper, the authors present techniques, systems and apparatus for automatically generating schema using an initial document constructed in an XML-compatible format, which can be implemented as software operating on a computer system, as a computer module, as a computer program product and as a series of related devices and products.
Abstract: Techniques, systems and apparatus for automatically generating schema using an initial document constructed in an XML-compatible format are disclosed. A method involves providing an initial XML document, analyzing the XML document to identify the XML data structures in the document, and generating a data framework that corresponds to the data structure of the XML document. The data items of the initial XML document are analyzed to determine data constraints based on those data items. Schema are then generated based on the generated data framework and the data constraints determined from the raw XML data. These principles can be implemented as software operating on a computer system, as a computer module, as a computer program product and as a series of related devices and products.
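The two analysis steps described above (deriving a structural framework, then deriving constraints from the data items) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the disclosed method: the "schema" is a plain dictionary rather than a real schema language, the only constraint inferred is integer versus string content, and `infer_type` and `infer_schema` are hypothetical names.

```python
import xml.etree.ElementTree as ET

def infer_type(text):
    """Guess a simple content type from one data item."""
    try:
        int(text)
        return "integer"
    except ValueError:
        return "string"

def infer_schema(root):
    """Map each element name to its observed child names and content types."""
    schema = {}
    for elem in root.iter():
        entry = schema.setdefault(elem.tag, {"children": set(), "types": set()})
        # Structural framework: which child elements occur under this tag.
        entry["children"].update(child.tag for child in elem)
        # Data constraints: which content types occur in this tag.
        text = (elem.text or "").strip()
        if text:
            entry["types"].add(infer_type(text))
    return schema

doc = ET.fromstring(
    "<orders>"
    "<order><id>1</id><item>pen</item></order>"
    "<order><id>2</id><item>ink</item></order>"
    "</orders>"
)
schema = infer_schema(doc)
for tag, entry in schema.items():
    print(tag, sorted(entry["children"]), sorted(entry["types"]))
```

An element whose observed types are exactly `{"integer"}` could then be given an integer constraint in the generated schema, while mixed observations would fall back to a string type.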
TL;DR: This paper proposes an architecture and adaptive algorithms for efficiently computing top-k matches to XML queries that can be used to evaluate both exact and approximate matches where approximation is defined by relaxing XPath axes.
Abstract: The ability to compute top-k matches to XML queries is gaining importance due to the increasing number of large XML repositories. The efficiency of top-k query evaluation relies on using scores to prune irrelevant answers as early as possible in the evaluation process. In this context, evaluating the same query plan for all answers might be too rigid because, at any time in the evaluation, answers have gone through the same number and sequence of operations, which limits the speed at which scores grow. Therefore, adaptive query processing that permits different plans for different partial matches and maximizes the best scores is more appropriate. In this paper, we propose an architecture and adaptive algorithms for efficiently computing top-k matches to XML queries. Our techniques can be used to evaluate both exact and approximate matches where approximation is defined by relaxing XPath axes. In order to compute the scores of query answers, we extend the traditional tf*idf measure to account for document structure. We conduct extensive experiments on a variety of benchmark data and queries, and demonstrate the usefulness of the adaptive approach for computing top-k queries in XML.
TL;DR: The problem of query equivalence is addressed with respect to the transformation of XML data into sequences, and a performance-oriented principle is introduced to guide the sequencing of tree structures.
Abstract: Sequence-based XML indexing aims at avoiding expensive join operations in query processing. It transforms structured XML data into sequences so that a structured query can be answered holistically through subsequence matching. In this paper, we address the problem of query equivalence with respect to this transformation, and we introduce a performance-oriented principle for sequencing tree structures. With query equivalence, XML queries can be performed through subsequence matching without join operations, post-processing, or other special handling for problems such as false alarms. We identify a class of sequencing methods for this purpose, and we present a novel subsequence matching algorithm that observes query equivalence. Still, query equivalence is just a prerequisite for sequence-based XML indexing. Our goal is to find the best sequencing strategy with regard to the time and space complexity in indexing and querying XML data. To this end, we introduce a performance-oriented principle to guide the sequencing of tree structures. For any given XML data set, the principle finds an optimal sequencing strategy according to its schema and its data distribution. We present a novel method that realizes this principle. In our experiments, we show the advantages of sequence-based indexing over traditional XML indexing methods, and we compare several sequencing strategies and demonstrate the benefit of the performance-oriented sequencing principle.
TL;DR: The HOPI index is a connection index for XML documents based on the concept of a 2-hop cover, which provides space- and time-efficient reachability tests along the ancestor, descendant, and link axes to support path expressions with wildcards in XML search engines.
Abstract: The HOPI index, a connection index for XML documents based on the concept of a 2-hop cover, provides space- and time-efficient reachability tests along the ancestor, descendant, and link axes to support path expressions with wildcards in XML search engines. This paper presents enhanced algorithms for building HOPI, shows how to augment the index with distance information, and discusses incremental index maintenance. Our experiments show substantial improvements over the existing divide-and-conquer algorithm for index creation, low space overhead for including distance information in the index, and efficient updates.
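For readers unfamiliar with 2-hop covers, the following sketch shows how a reachability or distance test works once the labels exist. It assumes the usual 2-hop labelling convention (each node stores an out-label and an in-label of "hop" nodes with distances, and u reaches v exactly when the two labels share a hop). The labels below are written by hand for a four-node graph; building them efficiently is the subject of the paper and is not attempted here. The names `l_out`, `l_in`, `distance`, and `reachable` are illustrative.

```python
# Example graph: a -> b -> c and a -> d.
# l_out[u][w] = distance from u to hop w; l_in[v][w] = distance from hop w to v.
l_out = {
    "a": {"a": 0, "b": 1},
    "b": {"b": 0},
    "c": {"c": 0},
    "d": {"d": 0},
}
l_in = {
    "a": {"a": 0},
    "b": {"b": 0},
    "c": {"c": 0, "b": 1},
    "d": {"d": 0, "a": 1},
}

def distance(u, v, l_out, l_in):
    """Shortest distance from u to v via a shared hop, or None if unreachable."""
    common = l_out[u].keys() & l_in[v].keys()
    if not common:
        return None
    return min(l_out[u][w] + l_in[v][w] for w in common)

def reachable(u, v, l_out, l_in):
    """u reaches v exactly when the out-label of u and in-label of v intersect."""
    return distance(u, v, l_out, l_in) is not None

print(distance("a", "c", l_out, l_in), reachable("d", "c", l_out, l_in))
```

Each test touches only the two labels involved, which is why the index can answer ancestor, descendant, and link-axis checks without traversing the document graph.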
TL;DR: The proposed methodology for building XML data warehouses covers data cleaning and integration, summarization, intermediate XML documents, updating/linking existing documents, and creating fact tables, and uses XQuery in all of these processes.
Abstract: Developing a data warehouse for XML documents involves two major processes: one of creating it, by processing XML raw documents into a specified data warehouse repository; and the other of querying it, by applying techniques to better answer users’ queries. This paper focuses on the first part; that is identifying a systematic approach for building a data warehouse of XML documents, specifically for transferring data from an underlying XML database into a defined XML data warehouse. The proposed methodology on building XML data warehouses covers processes including data cleaning and integration, summarization, intermediate XML documents, and updating/linking existing documents and creating fact tables. In this paper, we also present a case study on how to put this methodology into practice. We utilise the XQuery technology in all of the above processes.
TL;DR: A collection of papers from INEX 2006 covering methodology and the ad hoc, natural language processing, heterogeneous collection, multimedia, interactive, use case, and document mining tracks.
Abstract: Methodology.- Overview of INEX 2006.- The Wikipedia XML Corpus.- INEX 2006 Evaluation Measures.- Choosing an Ideal Recall-Base for the Evaluation of the Focused Task: Sensitivity Analysis of the XCG Evaluation Measures.- Ad Hoc Track.- A Method of Preferential Unification of Plural Retrieved Elements for XML Retrieval Task.- CISR at INEX 2006.- Compact Representations in XML Retrieval.- CSIRO's Participation in INEX 2006.- Dynamic Element Retrieval in a Semi-structured Collection.- Efficient, Effective and Flexible XML Retrieval Using Summaries.- Evaluating Structured Information Retrieval and Multimedia Retrieval Using PF/Tijah.- EXTIRP: Baseline Retrieval from Wikipedia.- Filtering and Clustering XML Retrieval Results.- GPX - Gardens Point XML IR at INEX 2006.- IBM HRL at INEX 06.- Indexing "Reading Paths" for a Structured Information Retrieval at INEX 2006.- Influence Diagrams and Structured Retrieval: Garnata Implementing the SID and CID Models at INEX'06.- Information Theoretic Retrieval with Structured Queries and Documents.- SIRIUS XML IR System at INEX 2006: Approximate Matching of Structure and Textual Content.- Structured Content-Only Information Retrieval Using Term Proximity and Propagation of Title Terms.- Supervised and Semi-supervised Machine Learning Ranking.- The University of Kaiserslautern at INEX 2006.- TopX - AdHoc Track and Feedback Task.- Tuning and Evolving Retrieval Engine by Training on Previous INEX Testbeds.- Using Language Models and the HITS Algorithm for XML Retrieval.- Using Topic Shifts in XML Retrieval at INEX 2006.- XSee: Structure Xposed.- Natural Language Processing Track.- Shallow Parsing of INEX Queries.- Using Rich Document Representation in XML Information Retrieval.- NLPX at INEX 2006.- Heterogeneous Collection Track.- The Heterogeneous Collection Track at INEX 2006.- Probabilistic Retrieval Approaches for Thorough and Heterogeneous XML Retrieval.- Multimedia Track.- The INEX 2006 Multimedia Track.- Fusing Visual and Textual Retrieval Techniques to Effectively Search Large Collections of Wikipedia Images.- Social Media Retrieval Using Image Features and Structured Text.- XFIRM at INEX 2006. Ad-Hoc, Relevance Feedback and MultiMedia Tracks.- Interactive Track.- The Interactive Track at INEX 2006.- Use Case Track.- XML-IR Users and Use Cases.- A Taxonomy for XML Retrieval Use Cases.- What XML-IR Users May Want.- Document Track.- Report on the XML Mining Track at INEX 2005 and INEX 2006.- Classifying XML Documents Based on Structure/Content Similarity.- Document Mining Using Graph Neural Network.- Evaluating the Performance of XML Document Clustering by Structure Only.- FAT-CAT: Frequent Attributes Tree Based Classification.- Unsupervised Classification of Text-Centric XML Document Collections.- XML Document Mining Using Contextual Self-organizing Maps for Structures.- XML Document Transformation with Conditional Random Fields.- XML Structure Mapping.
TL;DR: In this article, the role of context is investigated through incorporation of the parent's model for XML component retrieval, and it is shown that context can improve the effectiveness of finding relevant components slightly.
Abstract: Experiments using hierarchical language models for XML component retrieval are presented in this paper. The role of context is investigated through incorporation of the parent's model. We find that context can improve the effectiveness of finding relevant components slightly. Additionally, biasing the results toward long components through the use of component priors improves exhaustivity but harms specificity, so care must be taken to find an appropriate trade-off.
TL;DR: A method for processing XML documents in a computer-based system associates each of a plurality of information items with a corresponding one of a plurality of binary-data units and provides an XML document associated with an XML information set comprising one or more of those information items.
Abstract: A method for processing XML documents in a computer-based system includes associating each of a plurality of information items with a corresponding one of a plurality of binary-data units and providing an XML document associated with an XML information set comprising one or more of the plurality of information items. The method includes serializing the XML document into a binary XML format, or de-serializing the XML document from the binary XML format. Serializing includes translating the one or more information items of the XML information set into their corresponding one or more binary-data units. De-serializing includes translating one or more binary-data units of the binary XML format into their corresponding one or more information items. A computer readable medium is encoded with a program for execution on at least one processor. The program, when executed on the at least one processor, can perform the method for processing XML documents.
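A minimal round-trip sketch may help picture the mapping between information items and binary-data units. The byte layout below is invented purely for illustration and is not the format claimed in the source: element names map to one-byte codes through a shared table, start and end tags become opcodes, and text is length-prefixed. Attributes, namespaces, and tail text are ignored, and `codes`, `names`, `serialize`, and `deserialize` are hypothetical names.

```python
import struct
import xml.etree.ElementTree as ET

START, END, TEXT = 1, 2, 3  # hypothetical opcodes

def serialize(elem, codes):
    """Translate element items into binary units using the codes table."""
    out = bytearray([START, codes[elem.tag]])
    if elem.text:
        data = elem.text.encode("utf-8")
        out += bytes([TEXT]) + struct.pack(">H", len(data)) + data
    for child in elem:
        out += serialize(child, codes)
    out.append(END)
    return bytes(out)

def deserialize(data, names):
    """Translate binary units back into element items."""
    stack, root, i = [], None, 0
    while i < len(data):
        op = data[i]
        if op == START:
            elem = ET.Element(names[data[i + 1]])
            if stack:
                stack[-1].append(elem)
            else:
                root = elem
            stack.append(elem)
            i += 2
        elif op == TEXT:
            (n,) = struct.unpack_from(">H", data, i + 1)
            stack[-1].text = data[i + 3:i + 3 + n].decode("utf-8")
            i += 3 + n
        elif op == END:
            stack.pop()
            i += 1
        else:
            raise ValueError(f"unknown opcode {op}")
    return root

codes = {"note": 0, "to": 1, "body": 2}
names = {code: name for name, code in codes.items()}
doc = ET.fromstring("<note><to>Ann</to><body>Hi</body></note>")
blob = serialize(doc, codes)
restored = deserialize(blob, names)
print(len(blob), ET.tostring(restored))
```

Both sides must agree on the same name table, which is the role the association between information items and binary-data units plays in the description above.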
TL;DR: A book in two parts: XML technologies (HTML and Web pages, XML documents, XPath, schema languages, XSLT, XQuery, XML programming) and Web technologies (HTTP, servlets, JSP, Web services, a complete application).
Abstract: Foreword Preface I: XML Technologies 1. HTML and Web Pages 2. XML Documents 3. Navigating XML Trees with XPath 4. Schema Languages 5. Transforming XML Documents with XSLT 6. Querying XML Documents with XQuery 7. XML Programming II: Web Technologies 8. The HTTP Protocol 9. Programming Web Applications with Servlets 10. Programming Web Applications with JSP 11. Web Services 12. A Complete Application Bibliography Index
TL;DR: The diffX algorithm for detecting changes between two versions of an XML document is presented, in order to optimize the runtime of mapping the nodes between the two versions and to minimize the size of the edit script.
Abstract: This paper presents the diffX algorithm for detecting changes between two versions of an XML document. The identified changes are reported as a script of edit operations. The script, when applied to the first version of the XML document, will produce the second version. The goal is to optimize the runtime of mapping the nodes between the two versions and to minimize the size of the edit script. To achieve this goal, an isolated tree fragment mapping technique is used to iteratively identify the largest matching tree fragments between the tree representations of the two versions of the document. The mapping technique is robust enough to handle differences in both the structure and the content of the two trees. The edit script generated from the mapping accounts for the different order sensitivity of elements and attributes in the XML data model. The primitives for the edit script comprise both the atomic (node) and non-atomic (subtree) edit operations natural to XML document modification. The runtime of the algorithm is O(n^2).
TL;DR: Experimental results verify that the disk-based F&B Index can scale up for large data size with good query performance compared with state-of-the-art XML query processing algorithms.
Abstract: With the proliferation of XML data and applications on the Internet, efficient XML query processing techniques are in great demand. Answering queries using XML indexes is a natural approach. A number of XML indexes have been proposed in the literature: among them, FB the result is a disk-based F&B Index with good clustering properties. In addition, novel query processing algorithms exploiting the physical organization of the disk-based F&B Indexes are proposed. Experimental results verify that our disk-based F&B Index can scale up for large data size with good query performance compared with state-of-the-art XML query processing algorithms.
TL;DR: A new Labelling Scheme for Dynamic XML data (LSDX) is proposed that supports the representation of the ancestor-descendant relationship and sibling relationship between nodes and facilitates fast update of XML data.
Abstract: In order to facilitate query processing for XML data, several path indexing, labelling and numbering schemes have been proposed. However, if XML data need to be updated frequently, most of these approaches need to re-compute existing labels, which is rather time-consuming. In this paper, we propose a new Labelling Scheme for Dynamic XML data (LSDX) that supports the representation of the ancestor-descendant relationship and sibling relationship between nodes. Moreover, LSDX supports updating XML data without the need to re-label existing nodes, hence facilitating fast updates. Experiments have been conducted to show its effectiveness.
TL;DR: A prototype compiler for XJ is built, and preliminary experiments demonstrate that the performance of XJ programs can approach that of traditional low-level API-based interfaces, while providing a higher level of abstraction.
Abstract: The increased importance of XML as a data representation format has led to several proposals for facilitating the development of applications that operate on XML data. These proposals range from runtime API-based interfaces to XML-based programming languages. The subject of this paper is XJ, a research language that proposes novel mechanisms for the integration of XML as a first-class construct into Java™. The design goals of XJ distinguish it from past work on integrating XML support into programming languages --- specifically, the XJ design adheres to the XML Schema and XPath standards. Moreover, it supports in-place updates of XML data thereby keeping with the imperative nature of Java. We have built a prototype compiler for XJ, and our preliminary experiments demonstrate that the performance of XJ programs can approach that of traditional low-level API-based interfaces, while providing a higher level of abstraction.
TL;DR: An overview of emerging XML storage approaches highlights current practices along with prospective research and implementation trends.
Abstract: We survey emerging native XML storage approaches and identify and highlight popular implementations tailored to XML's "nature" and syntax. By understanding the storage practices of emerging native XML environments, programmers and software designers can better exploit the technology's scalability and reliability benefits. Because XML is rapidly becoming the Internet standard for data representation and exchange, efficient XML document storage has become a core data management issue. Most early XML storage practices rely on conventional database management systems. However, such systems involve mappings and transformations between XML and the underlying database structure. More recent efforts are based on specific XML-tailored systems that provide ad hoc functionalities. This overview of emerging XML storage approaches highlights current practices along with prospective research and implementation trends.
TL;DR: A multiversion data model based on XML schema is introduced, basic mechanisms for the maintenance and retrieval of multiversion norm texts are defined, and a prototype management system is described.
Abstract: In this paper, we present the results of a research project concerning the temporal management of normative texts in XML format. In particular, four temporal dimensions (publication, validity, efficacy and transaction times) are used to correctly represent the evolution of norms in time and their resulting versioning. Hence, we introduce a multiversion data model based on XML schema and define basic mechanisms for the maintenance and retrieval of multiversion norm texts. Finally, we describe a prototype management system which has been implemented and evaluated.
TL;DR: The proposed XML parser, Deltarser, is adaptive since it partially parses and then remembers XML document fragments that it has not met before, and processes safely since its partial parsing correctly checks the well-formedness of documents.
Abstract: XML (Extensible Markup Language) processing can incur significant runtime overhead in XML-based infrastructural middleware such as Web service application servers. This paper proposes a novel mechanism for efficiently processing similar XML documents. Given a new XML document as a byte sequence, the XML parser proposed in this paper normally avoids syntactic analysis and simply matches the document with previously processed ones, reusing those results. Our parser is adaptive since it partially parses and then remembers XML document fragments that it has not met before. Moreover, it processes safely since its partial parsing correctly checks the well-formedness of documents. Our implementation of the proposed parser complies with the JSR 63 standard of the Java API for XML Processing (JAXP) 1.1 specification. We evaluated Deltarser performance with messages using Google Web services. Compared with Piccolo (and Apache Xerces), it effectively parses 35% (106%) faster in a server-side use-case scenario, and 73% (126%) faster in a client-side use-case scenario.
TL;DR: This work presents XSugar, which makes it possible to manage dual syntax for XML languages, and statically checks that the transformations are reversible and that all XML documents generated from the alternative syntax are valid according to a given XML schema.
Abstract: XML is successful as a machine processable data interchange format, but it is often too verbose for human use. For this reason, many XML languages permit an alternative more legible non-XML syntax. XSLT stylesheets are often used to convert from the XML syntax to the alternative syntax; however, such transformations are not reversible since no general tool exists to automatically parse the alternative syntax back into XML.
We present XSugar, which makes it possible to manage dual syntax for XML languages. An XSugar specification is built around a context-free grammar that unifies the two syntaxes of a language. Given such a specification, the XSugar tool can translate from alternative syntax to XML and vice versa. Moreover, the tool statically checks that the transformations are reversible and that all XML documents generated from the alternative syntax are valid according to a given XML schema.
TL;DR: This work provides a formal framework for XML Schema-driven decompositions, which encompasses the decompositions proposed in prior work and extends them with decompositions that employ denormalized tables and binary-coded XML fragments.
Abstract: XML database systems emerge as a result of the acceptance of the XML data model. Recent works have followed the promising approach of building XML database management systems on underlying RDBMS’s. Achieving query processing performance reduces to two questions: (i) How should the XML data be decomposed into data that are stored in the RDBMS? (ii) How should the XML query be translated into an efficient plan that sends one or more SQL queries to the underlying RDBMS and combines the data into the XML result? We provide a formal framework for XML Schema-driven decompositions, which encompasses the decompositions proposed in prior work and extends them with decompositions that employ denormalized tables and binary-coded XML fragments. We provide corresponding query processing algorithms that translate the XML query conditions into conditions on the relational tables and assemble the decomposed data into the XML query result. Our key performance focus is the response time for delivering the first results of a query. The most effective of the described decompositions have been implemented in XCacheDB, an XML DBMS built on top of a commercial RDBMS, which serves as our experimental basis. We present experiments and analysis that point to a class of decompositions, called inlined decompositions, that improve query performance for full results and first results, without significant increase in the size of the database.
TL;DR: This tutorial will provide an insight into how XML functionality fits into relational database management systems as seen by three major relational vendors: IBM, Microsoft and Oracle.
Abstract: As XML has evolved from a document markup language to a widely-used format for exchange of structured and semistructured data, managing large amounts of XML data has become increasingly important. A number of companies, including both established database vendors and startups, have recently announced new XML database systems or new XML functionality integrated into existing database systems. This tutorial will provide an insight into how XML functionality fits into relational database management systems as seen by three major relational vendors: IBM, Microsoft and Oracle.
TL;DR: A method of parsing an XML data stream is described in which a namespace prefix and its associated element tag name are converted into a token that uniquely represents the namespace specification associated with the prefix and the element tag.
Abstract: In one embodiment, a method of parsing an XML data stream comprises receiving an XML data stream containing a namespace prefix and an associated element tag name. The element tag name is associated with an element tag. The namespace prefix and the element tag name are converted into a token that uniquely represents a namespace specification that is associated with the namespace prefix and the element tag. A stack is defined and is configured to receive one or more tokens during parsing of the XML data stream. Parsing of the XML data stream is performed without requiring that an XML tree structure, comprising the XML document embodied by the XML data stream, be built.
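The token-and-stack idea can be sketched with Python's standard SAX parser, which delivers events without building a document tree. In this minimal sketch, each (namespace URI, local name) pair is interned as a small integer token, and a stack holds the tokens of the currently open elements. The `TokenizingHandler` class and `tokenize` function are hypothetical names, and resolving prefixes through the SAX namespace feature is an assumption of this example rather than the method of the embodiment itself.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class TokenizingHandler(ContentHandler):
    """Interns each (namespace URI, local name) pair as an integer token."""

    def __init__(self):
        super().__init__()
        self.tokens = {}   # (uri, local name) -> token id
        self.stack = []    # tokens of the elements that are currently open
        self.events = []   # (depth, token) for each start tag, in document order

    def startElementNS(self, name, qname, attrs):
        # 'name' is the resolved (uri, local name) pair, so two prefixes bound
        # to the same namespace URI produce the same token.
        token = self.tokens.setdefault(name, len(self.tokens))
        self.stack.append(token)
        self.events.append((len(self.stack), token))

    def endElementNS(self, name, qname):
        # The parser guarantees well-formedness, so the top of the stack is
        # the token of the element being closed.
        self.stack.pop()


def tokenize(xml_text):
    """Stream through the document once; no tree is ever materialized."""
    handler = TokenizingHandler()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return handler
```

For example, in `<a:root xmlns:a="urn:x" xmlns:b="urn:x"><b:root/><a:item/></a:root>` the elements `a:root` and `b:root` receive the same token, because both prefixes are bound to the same namespace URI.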
TL;DR: This paper formally defines the elements in data warehousing and discusses various semantic conflicts occurring among heterogeneous data cubes, and proposes the system architecture and related resolution procedures for all kinds of semantic conflicts.
Abstract: Data warehousing has been widely adopted by contemporary enterprises. For inter-organizational information sharing, the need for research on the integration of heterogeneous data warehouses cannot be overemphasized. That makes it urgent to establish a systematic methodology for integrating heterogeneous data warehouses via the Internet or proprietary extranets. Traditionally, researchers have employed a canonical format as the integration medium for logical data integration among heterogeneous systems. In this paper, to fully utilize the power of the Internet, we propose a framework and develop a prototype to integrate heterogeneous data warehouses by XML technologies. We first formally define the elements in data warehousing and discuss various semantic conflicts occurring among heterogeneous data cubes. Then, we propose the system architecture and related resolution procedures for all kinds of semantic conflicts. For local data cubes with different schemas, we define a global XML Schema to integrate the local cube structures, and transform each local cube into an XML document conforming to the global XML Schema. The XML documents obtained from the local cubes are then manipulated by pre-defined XQuery commands to form a unified XML document, which can be regarded as the global cube. The integrated global cube can be easily stored and manipulated in native XML databases. The proposed methodology enables global users to browse the global cube, or pose multi-dimensional expressions (MDX) on it, and obtain results in the same way as they do locally.
TL;DR: This work extends the usual XML data model with symbolic representations of cryptographic values and uses predicates on this data model to describe the semantics of security elements and of sample protocols distributed with the Microsoft WSE implementation of WS-Security.
TL;DR: To enforce the access constraints on user queries, the Secure Query Rewrite (SQR) is proposed -- a set of rules that can be used to rewrite a user XPath query on the security view into an equivalent XQuery expression against the original data, with the guarantee that the users only see information in the view but not any data that was blocked.
Abstract: Being able to express and enforce role-based access control on XML data is a critical component of XML data management. However, given the semi-structured nature of XML, this is non-trivial, as access control can be applied on the values of nodes as well as on the structural relationship between nodes. In this context, we adopt and extend a graph editing language for specifying role-based access constraints in the form of security views. A Security Annotated Schema (SAS) is proposed as the internal representation for the security views and can be automatically constructed from the original schema and the security view specification. To enforce the access constraints on user queries, we propose Secure Query Rewrite (SQR) -- a set of rules that can be used to rewrite a user XPath query on the security view into an equivalent XQuery expression against the original data, with the guarantee that the users only see information in the view but not any data that was blocked. Experimental evaluation demonstrates the efficiency and the expressiveness of our approach.