TL;DR: The XRANK system is presented, designed to handle the novel features of XML keyword search, which naturally generalizes a hyperlink based HTML search engine such as Google and can be used to query a mix of HTML and XML documents.
Abstract: We consider the problem of efficiently producing ranked results for keyword search queries over hyperlinked XML documents. Evaluating keyword search queries over hierarchical XML documents, as opposed to (conceptually) flat HTML documents, introduces many new challenges. First, XML keyword search queries do not always return entire documents, but can return deeply nested XML elements that contain the desired keywords. Second, the nested structure of XML implies that the notion of ranking is no longer at the granularity of a document, but at the granularity of an XML element. Finally, the notion of keyword proximity is more complex in the hierarchical XML data model. In this paper, we present the XRANK system that is designed to handle these novel features of XML keyword search. Our experimental results show that XRANK offers both space and performance benefits when compared with existing approaches. An interesting feature of XRANK is that it naturally generalizes a hyperlink based HTML search engine such as Google. XRANK can thus be used to query a mix of HTML and XML documents.
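The first challenge named above, returning nested elements rather than whole documents, can be illustrated with a small sketch. This is not XRANK's algorithm or ranking function; it is a naive traversal, assuming plain substring matching, that returns the most specific elements containing every keyword. The function names and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def text_of(elem):
    """All text under elem (including descendants), lowercased."""
    return " ".join(elem.itertext()).lower()

def deepest_matches(root, keywords):
    """Return elements that contain every keyword while no child element does."""
    kws = [k.lower() for k in keywords]
    results = []

    def visit(elem):
        if not all(k in text_of(elem) for k in kws):
            return False
        # Recurse first: prefer a more specific (deeper) match when one exists.
        child_hits = [visit(child) for child in elem]
        if not any(child_hits):
            results.append(elem)
        return True

    visit(root)
    return results

# Hypothetical sample document, not taken from the paper.
DOC = """
<workshop>
  <title>XML and IR</title>
  <paper>
    <title>XQL and Proximal Nodes</title>
    <abstract>Ranking for XML keyword search</abstract>
  </paper>
  <paper>
    <title>Querying XML in Xyleme</title>
  </paper>
</workshop>
"""

root = ET.fromstring(DOC)
hits = deepest_matches(root, ["ranking", "keyword"])
print([h.tag for h in hits])  # ['abstract']
```

When the keywords are spread over sibling elements, for example "xql" in a title and "ranking" in an abstract, the enclosing paper element is returned instead, which is why ranking has to operate at element granularity.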
TL;DR: An exhaust gas recirculator in an internal combustion engine wherein a part of the exhaust gas is supplied to a suction manifold only at the time when the throttle valve is opened and for some time interval immediately afterwards.
Abstract: An exhaust gas recirculator in an internal combustion engine wherein a part of the exhaust gas is supplied to a suction manifold only at the time when the throttle valve is opened and for some time interval immediately afterwards, and the supply of the exhaust gas is stopped during steady operation of the engine at the degree of opening of the throttle valve reached in the above supply period.
TL;DR: eXist is an Open Source native XML database system that supports keyword search on element and attribute contents; an enhanced indexing scheme at the architecture's core supports quick identification of structural node relationships.
Abstract: With the advent of native and XML enabled database systems, techniques for efficiently storing, indexing and querying large collections of XML documents have become an important research topic. This paper presents the storage, indexing and query processing architecture of eXist, an Open Source native XML database system. eXist is tightly integrated with existing tools and covers most of the native XML database features. An enhanced indexing scheme at the architecture's core supports quick identification of structural node relationships. Based on this scheme, we extend the application of path join algorithms to implement most parts of the XPath query language specification and add support for keyword search on element and attribute contents.
TL;DR: Techniques are described for determining whether an XML component operation in a database command can be transformed to a relational database operation on the underlying relational database constructs, one which does not involve the XML component operation, and for rewriting it as such.
Abstract: Techniques for executing database commands include receiving a database command that includes an XML component operation that operates on an XML construct that is based on a first set of one or more relational database constructs. It is determined whether the XML component operation can be transformed to a relational database operation on a particular set of one or more relational database constructs of the first set, which does not involve the XML component operation. If it is determined that the XML component operation can be transformed, then the XML component operation is rewritten as a particular relational database operation that operates on the particular set and that does not involve the XML component operation. The particular relational database operation is evaluated. In another aspect, techniques include determining a primitive set of XML generation operations and replacing non-primitive XML generation operations with one or more operations from the primitive set.
TL;DR: The main result of the paper is that typechecking for k-pebble transducers is decidable, and therefore, typechecking can be performed for a broad range of XML transformation languages, including XML-QL and a fragment of XSLT.
TL;DR: A static analysis technique that can identify at compile time which parts of the input document are needed to answer an arbitrary XQuery, and a loading algorithm that takes the resulting information to build a projected document, which is smaller than the original document, and on which the query yields the same result.
Abstract: XQuery is not only useful to query XML in databases, but also to applications that must process XML documents as files or streams. These applications suffer from the limitations of current main-memory XQuery processors which break for rather small documents. In this paper we propose techniques, based on a notion of projection for XML, which can be used to drastically reduce memory requirements in XQuery processors. The main contribution of the paper is a static analysis technique that can identify at compile time which parts of the input document are needed to answer an arbitrary XQuery. We present a loading algorithm that takes the resulting information to build a projected document, which is smaller than the original document, and on which the query yields the same result. We implemented projection in the Galax XQuery processor. Our experiments show that projection reduces memory requirements by a factor of 20 on average, and is effective for a wide variety of queries. In addition, projection results in some speedup during query evaluation.
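The loading step can be sketched as follows. This is a minimal illustration, not the Galax implementation: the static analysis that derives the needed paths from a query is not shown, so the path set is written by hand, and the function name `project` and the sample document are assumptions.

```python
import copy
import xml.etree.ElementTree as ET

def project(elem, paths, prefix=()):
    """Copy elem, keeping only the parts that lie on one of the needed paths.

    paths is a set of tag tuples starting at the root, for example
    ("site", "people", "person", "name"). An element whose path equals a
    needed path is kept with its whole subtree; an element whose path is a
    proper prefix of a needed path is kept as an empty shell around the
    children that survive; everything else is dropped.
    """
    here = prefix + (elem.tag,)
    if here in paths:
        return copy.deepcopy(elem)
    if not any(p[:len(here)] == here for p in paths):
        return None
    shell = ET.Element(elem.tag, dict(elem.attrib))
    for child in elem:
        kept = project(child, paths, here)
        if kept is not None:
            shell.append(kept)
    return shell

# Hypothetical document and hand-written path set.
DOC = ("<site><people><person><name>Ann</name><bio>long text</bio></person></people>"
       "<regions><item>unused</item></regions></site>")
needed = {("site", "people", "person", "name")}

projected = project(ET.fromstring(DOC), needed)
print(ET.tostring(projected, encoding="unicode"))
# <site><people><person><name>Ann</name></person></people></site>
```

A query that only reads person names gives the same answer on the projected tree as on the original, while the `bio` and `regions` subtrees are never kept in the copy. This sketch still parses the full document first; a real loader would prune while parsing.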
TL;DR: A complete framework for distributed and replicated dynamic XML documents, and an algorithm that, for a given peer, chooses data and services that the peer should replicate to improve the efficiency of maintaining and querying its dynamic data are described.
Abstract: The advent of XML as a universal exchange format, and of Web services as a basis for distributed computing, has fostered the emergence of a new class of documents: dynamic XML documents. These are XML documents where some data is given explicitly while other parts are given only intensionally by means of embedded calls to web services that can be called to generate the required information. By the sole presence of Web services, dynamic documents already include inherently some form of distributed computation. A higher level of distribution that also allows (fragments of) dynamic documents to be distributed and/or replicated over several sites is highly desirable in today's Web architecture, and in fact is also relevant for regular (non dynamic) documents. The goal of this paper is to study new issues raised by the distribution and replication of dynamic XML data. Our study has originated in the context of the Active XML system [1, 3, 22] but the results are applicable to many other systems supporting dynamic XML data. Starting from a data model and a query language, we describe a complete framework for distributed and replicated dynamic XML documents. We provide a comprehensive cost model for query evaluation and show how it applies to user queries and service calls. Finally, we describe an algorithm that, for a given peer, chooses data and services that the peer should replicate to improve the efficiency of maintaining and querying its dynamic data.
TL;DR: This paper extends XML DTDs with several classes of integrity constraints and investigates the complexity of reasoning about these constraints and establishes complexity and axiomatization results for the (finite) implication problems associated with these constraints.
TL;DR: In this paper, a SQL statement includes a particular operator that operates on a first instance of XML type that represents a first set of XML elements, during execution of the SQL statement, the particular operator is evaluated by generating an ordered collection of instances of XML types.
Abstract: Techniques for managing XML data in an SQL compliant DBMS include receiving an SQL statement. The SQL statement includes a particular operator that operates on a first instance of XML type that represents a first set of XML elements. During execution of the SQL statement, the particular operator is evaluated by generating an ordered collection of instances of XML type. Each different instance in the ordered collection is based on a different XML element from the first set; and there is an instance in the ordered collection for every XML element from either the first set or from the first set and its descendents. When descendents are included, each entry in the ordered collection indicates a level in the XML tree. In another aspect, an aggregate operator in the SQL statement operates on a collection of instances, with associated levels, to generate a single instance of XML type.
TL;DR: Comparison with relational database performance shows that the desired response times and transaction rates over XML data cannot be achieved without major improvements in XML parsing technology, and the paper identifies research topics which are most promising for XML parser performance in database systems.
Abstract: XML parsing is generally known to have poor performance characteristics relative to transactional database processing. Yet, its potentially fatal impact on overall database performance is being underestimated. We report real-world database applications where XML parsing performance is a key obstacle to a successful XML deployment. There is a considerable share of XML database applications which are prone to fail at an early and simple road block: XML parsing. We analyze XML parsing performance and quantify the extra overhead of DTD and schema validation. Comparison with relational database performance shows that the desired response times and transaction rates over XML data cannot be achieved without major improvements in XML parsing technology. Thus, we identify research topics which are most promising for XML parser performance in database systems.
TL;DR: This paper is the first attempt at describing the XML Web and the documents contained in it and shows that, despite its short history, XML already permeates the Web, both in terms of generic domains and geographically.
Abstract: Although originally designed for large-scale electronic publishing, XML plays an increasingly important role in the exchange of data on the Web. In fact, it is expected that XML will become the lingua franca of the Web, eventually replacing HTML. Not surprisingly, there has been a great deal of interest in XML both in industry and in academia. Nevertheless, to date no comprehensive study of the XML Web (i.e., the subset of the Web made of XML documents only) or of its contents has been made. This paper is the first attempt at describing the XML Web and the documents contained in it. Our results are drawn from a sample of a repository of the publicly available XML documents on the Web, consisting of about 200,000 documents. Our results show that, despite its short history, XML already permeates the Web, both in terms of generic domains and geographically. Also, our results about the contents of the XML Web provide valuable input for the design of algorithms, tools and systems that use XML in one form or another.
TL;DR: This paper focuses on the engineering and the experimental evaluation of the MARS system, a system for publishing as XML data from mixed (relational+XML) proprietary storage, while supporting redundancy in storage for tuning purposes.
Abstract: We present a system for publishing as XML data from mixed (relational+XML) proprietary storage, while supporting redundancy in storage for tuning purposes. The correspondence between public and proprietary schemas is given by a combination of LAV- and GAV-style views expressed in XQuery. XML and relational integrity constraints are also taken into consideration. Starting with client XQueries formulated against the public schema, the system achieves the combined effect of rewriting-with-views, composition-with-views and query minimization under integrity constraints to obtain optimal reformulations against the proprietary schema. The paper focuses on the engineering and the experimental evaluation of the MARS system.
TL;DR: A lightweight fact extractor is presented that utilizes XML tools, such as XPath and XSLT to extract static information from C++ source code programs to facilitate the use of a wide variety of XML tools.
Abstract: A lightweight fact extractor is presented that utilizes XML tools, such as XPath and XSLT, to extract static information from C++ source code programs. The source code is first converted into an XML representation, srcML, to facilitate the use of a wide variety of XML tools. The method is deemed lightweight because only a partial parsing of the source is done. Additionally, the technique is quite robust and can be applied to incomplete and noncompilable source code. The trade off to this approach is that queries on some low level details cannot be directly addressed. This approach is applied to a fact extractor benchmark for comparison with other, heavier-weight fact extractors. Fact extractors are widely used to support understanding tasks associated with maintenance, reverse engineering and various other software engineering tasks.
TL;DR: In this article, a system and method for the efficient indexing and delivery of information to interested users who have expressed an interest in or subscribed to information items that are continuously released or published by some data source in XML format is presented.
Abstract: The present invention provides a system and method for the efficient indexing and delivery of information to interested users who have expressed an interest in or “subscribed” to information items that are continuously released or “published” by some data source in XML format. Previously, publish and subscribe systems accepted keyword-based subscription profiles and did not support subscription to XML documents according to their structures. A direct approach to implementing an XML-based publish and subscribe system, checking each user profile against an XML document, is very time consuming. The present invention, though, provides an efficient method to identify interested subscribers for each XML document by indexing queries utilizing a graphical structure of nodes. When an XML document is published, the index identifies all matched expressions in the index and delivers at least a portion of an XML document to a user who has expressed an interest in receiving this information.
TL;DR: A program product, system and method for transforming data between an XML representation and a relational database system wherein a mapping description is created in a mark-up language such as XML and XSL is described in this article.
Abstract: A program product, system and method for transforming data between an XML representation and a relational database system wherein a mapping description is created in a mark-up language such as XML and XSL. The mapping description specifies a set of conditions for source data to satisfy. When mapping to XML, an XML output format is specified in the mapping description and the data is formatted accordingly. When mapping to an RDBMS, actions to be executed on the RDBMS tables are specified in the mapping description and the actions are performed.
TL;DR: In this paper, an XML index can be implemented as a node table and the node table may have a B+-tree structure and be populated by shredding the XML values in the primary table.
Abstract: Storing and querying XML data in a primary table or document utilizes an index of XML data and includes creating a primary table structure, creating a primary XML index commensurate with the primary table structure, populating the primary table and the primary XML index, and running a query on the XML data in a primary table by utilizing the XML index. The XML index can be implemented as a node table. The node table may have a B+-tree structure and be populated by shredding the XML values in the primary table. The XML data may be stored as binary large objects in an XML column of the primary table. Secondary XML indexes may be created to assist in the search and retrieval of XML data stored in the primary table. Both the primary XML index and the secondary XML index tables may be created using data definition language statements.
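The idea of a node table populated by shredding can be sketched with an in-memory SQLite database. This is only an illustration under stated assumptions: the table and column names (`docs`, `xml_nodes`, `node_id`, `parent_id`) are hypothetical, SQLite is used merely because it ships with Python, and the layout is not the one described in the patent.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# Primary table: one row per document, with the XML kept as a large value.
conn.execute("CREATE TABLE docs (doc_id INTEGER PRIMARY KEY, body TEXT)")
# Node table acting as the primary XML index: one row per element.
conn.execute("""CREATE TABLE xml_nodes (
    doc_id INTEGER, node_id INTEGER, parent_id INTEGER, tag TEXT, value TEXT,
    PRIMARY KEY (doc_id, node_id))""")
# A secondary index to speed up lookups by tag and value.
conn.execute("CREATE INDEX xml_nodes_tag ON xml_nodes (tag, value)")

def shred(conn, doc_id, xml_text):
    """Insert one node-table row per element, numbered in document order."""
    counter = 0

    def walk(elem, parent_id):
        nonlocal counter
        counter += 1
        node_id = counter
        conn.execute(
            "INSERT INTO xml_nodes VALUES (?, ?, ?, ?, ?)",
            (doc_id, node_id, parent_id, elem.tag, (elem.text or "").strip()),
        )
        for child in elem:
            walk(child, node_id)

    walk(ET.fromstring(xml_text), None)

# Hypothetical sample documents.
books = {
    1: "<book><title>TAOCP</title><author>Knuth</author></book>",
    2: "<book><title>SICP</title><author>Abelson</author></book>",
}
for doc_id, body in books.items():
    conn.execute("INSERT INTO docs VALUES (?, ?)", (doc_id, body))
    shred(conn, doc_id, body)

# Answer a query from the node table, then fetch matching rows of the primary table.
rows = conn.execute(
    "SELECT d.doc_id FROM docs d JOIN xml_nodes n ON n.doc_id = d.doc_id "
    "WHERE n.tag = 'author' AND n.value = 'Knuth'"
).fetchall()
print(rows)  # [(1,)]
```

The query never parses the stored XML; it is answered entirely from the shredded rows, which is the point of maintaining the index alongside the primary table.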
TL;DR: In QRS, reefs (regions expressed by floating-point numbers), a variant of regions, are used for expressing node-numbers, and thus they can be used for detecting ancestor-descendant relationship among nodes for the purpose of efficient query processing.
Abstract: Update management of XML documents is an increasingly important research issue in XML databases, because contents of XML documents evolve as time goes by. Even so, XML databases should be able to effectively process XML queries as well as updates on the documents. We propose a robust node-numbering scheme for XML documents named QRS (quartering-regions scheme). In QRS, reefs (regions expressed by floating-point numbers), a variant of regions, are used for expressing node-numbers. Reefs are almost compatible with regions, and thus they can be used for detecting ancestor-descendant relationships among nodes for the purpose of efficient query processing. Moreover, reefs can cope with updates by utilizing gaps between reefs in terms of floating-point numbers. Consequently, we can avoid node renumbering as much as possible.
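The two properties described above, containment tests for ancestor-descendant detection and insertion into floating-point gaps, can be sketched as follows. This is a generic region-numbering illustration, not the QRS rule itself: the abstract does not give the exact quartering formula, so the quarter-point placement in `insert_between` is an assumption, as are all names.

```python
from dataclasses import dataclass

@dataclass
class Region:
    start: float
    end: float

def is_ancestor(a, d):
    """a is an ancestor of d when d's region lies strictly inside a's region."""
    return a.start < d.start and d.end < a.end

def insert_between(left, right):
    """Place a new region inside the gap (left, right) without renumbering.

    Assumption: the new region takes the middle half of the gap, leaving a
    quarter of the gap free on each side for later insertions.
    """
    width = right - left
    return Region(left + width / 4, right - width / 4)

# Hypothetical existing numbering: a root with two children a and b.
root = Region(0.0, 10.0)
a = Region(1.0, 2.0)
b = Region(3.0, 4.0)

# Insert a new sibling into the gap between a and b, then a child inside it.
new_sibling = insert_between(a.end, b.start)
new_child = insert_between(new_sibling.start, new_sibling.end)

print(new_sibling)  # Region(start=2.25, end=2.75)
print(is_ancestor(root, new_child), is_ancestor(a, new_child))  # True False
```

Because floating-point precision is finite, repeated insertion into the same gap eventually leaves no representable room, which is consistent with the abstract's claim that renumbering is avoided "as much as possible" rather than always.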
TL;DR: It is shown that while most XML path query processing techniques work off SAX events, in some cases it pays off to preprocess the input document, augmenting it with auxiliary information that can be used to evaluate the queries faster.
Abstract: XML path queries form the basis of complex filtering of XML data. Most current XML path query processing techniques can be divided in two groups. Navigation-based algorithms compute results by analyzing an input document one tag at a time. In contrast, index-based algorithms take advantage of precomputed numbering schemes over the input XML document. We introduce a new index-based technique, index-filter, to answer multiple XML path queries. Index-filter uses indexes built over the document tags to avoid processing large portions of the input document that are guaranteed not to be part of any match. We analyze index-filter and compare it against Y-filter, a state-of-the-art navigation-based technique. We show that both techniques have their advantages, and we discuss the scenarios under which each technique is superior to the other one. In particular, we show that while most XML path query processing techniques work off SAX events, in some cases it pays off to preprocess the input document, augmenting it with auxiliary information that can be used to evaluate the queries faster. We present experimental results over real and synthetic XML documents that validate our claims.
TL;DR: In this paper, a system and method for validating an extensible markup language (XML) document and reporting schema violations in real-time is presented, where a parallel tree is maintained that includes nodes corresponding to non-native XML elements of the XML document.
Abstract: A system and method for validating an extensible markup language (XML) document and reporting schema violations in real time. A parallel tree is maintained that includes nodes corresponding to non-native XML elements of the XML document. When changes occur to the XML document, the non-native XML elements corresponding to the changes are marked. The nodes corresponding to the marked non-native XML elements are validated against an XML schema that corresponds to the non-native XML markup. The elements and nodes corresponding to errors in the non-native XML markup are then reported to the user according to display indicators in the XML document and the parallel tree.
TL;DR: A semi-automated methodology for designing web warehouses from XML sources modeled by XML Schemas, with particular relevance to the problem of detecting shared hierarchies and convergence of dependencies, and of modeling many-to-many relationships.
Abstract: Web warehousing plays a key role in providing managers with up-to-date and comprehensive information about their business domain. On the other hand, since XML is now a de facto standard for the exchange of semi-structured data, integrating XML data into web warehouses is a hot topic. In this paper we propose a semi-automated methodology for designing web warehouses from XML sources modeled by XML Schemas. In the proposed methodology, design is carried out by first creating a schema graph, then navigating its arcs in order to derive a correct multidimensional representation. Unlike previous approaches in the literature, particular relevance is given to the problem of detecting shared hierarchies and convergence of dependencies, and of modeling many-to-many relationships. The approach is implemented in a prototype that reads an XML Schema and outputs the logical schema of the warehouse.
TL;DR: In this article, a system and method for XML query cursor implementation through the steps of query translation and processing, query result navigation, and positioned update is described. Given a user's navigation patterns, the system and method select either a multi-cursor, outer union, or hybrid approach as an optimal implementation for an XML query cursor.
Abstract: A system and method are provided for XML query cursor implementation through the steps of query translation and processing, query result navigation, and positioned update. An XML query cursor implemented in Interface Definition Language (IDL) as well as an extension to XQuery, an XML query language, is described. These steps are addressed by one of three approaches: multi-cursor, outer union, or hybrid. In each approach, XML data is assumed to be stored in a relational database with a mapping that maps each element to a row in a relational database table. In each approach, a system and method provide for cursor movements and positioned updates in increments of a node, sub-tree, or entire document. Given a user's navigation patterns, a system and method is provided to select either a multi-cursor, outer union, or hybrid approach as an optimal implementation for an XML query cursor.
TL;DR: In this paper, a pre-boot execution environment for XML-based security and key management services is described, where XML console in and console out interfaces are loaded, and corresponding API's are published to enable use of the interfaces by various firmware and software components.
Abstract: Methods and systems to support XML-based security and key management services in a pre-boot execution environment. During pre-boot, XML console in and console out interfaces are loaded, and corresponding APIs are published to enable use of the interfaces by various firmware and software components. A network stack is set up to enable XML content received at the network interface to be forwarded to the XML console in interface and XML content provided at the XML content out interface to be sent out via the network interface. Security operations may then be performed to authenticate a client system hosting the XML interfaces, to authenticate remote servers with which the client system may communicate, and to validate boot images provided to the computer system. Key management services are also supported.
TL;DR: A temporal XML query language, τXQuery, is presented, in which valid time support is added to XQuery by minimally extending the syntax and semantics of XQuery and by adopting a stratum approach which maps a τXQuery query to a conventional XQuery.
Abstract: As with relational data, XML data changes over time with the creation, modification, and deletion of XML documents. Expressing queries on time-varying (relational or XML) data is more difficult than writing queries on nontemporal data. In this paper, we present a temporal XML query language, τXQuery, in which we add valid time support to XQuery by minimally extending the syntax and semantics of XQuery. We adopt a stratum approach which maps a τXQuery query to a conventional XQuery. The paper focuses on how to perform this mapping, in particular, on mapping sequenced queries, which are by far the most challenging. The critical issue of supporting sequenced queries (in any query language) is time-slicing the input data while retaining period timestamping. Timestamps are distributed throughout an XML document, rather than uniformly in tuples, complicating the temporal slicing while also providing opportunities for optimization. We propose four optimizations of our initial maximally-fragmented time-slicing approach: selected node slicing, copy-based per-expression slicing, in-place per-expression slicing, and idiomatic slicing, each of which reduces the number of constant periods over which the query is evaluated. While performance tradeoffs clearly depend on the underlying XQuery engine, we argue that there are queries that favor each of the five approaches.
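The notion of constant periods used above can be illustrated with a short sketch. It assumes half-open integer periods `(begin, end)` and takes a constant period to be the interval between two consecutive timestamp boundaries; it is not the paper's slicing algorithm, and the function name and sample timestamps are hypothetical.

```python
def constant_periods(periods):
    """Split the timeline at every begin/end boundary found in the timestamps.

    Between two consecutive boundaries no timestamped node starts or stops
    being valid, so a sequenced query can be evaluated once per such interval.
    """
    points = sorted({t for period in periods for t in period})
    return list(zip(points, points[1:]))

# Hypothetical timestamps attached to three nodes of a document.
all_nodes = [(2001, 2005), (2003, 2008), (2005, 2008)]
print(constant_periods(all_nodes))
# [(2001, 2003), (2003, 2005), (2005, 2008)]

# Slicing only on the nodes a query actually touches yields fewer periods.
selected_nodes = [(2001, 2005)]
print(constant_periods(selected_nodes))
# [(2001, 2005)]
```

The second call shows the intuition behind restricting slicing to selected nodes: fewer boundaries mean fewer constant periods and therefore fewer evaluations of the conventional query.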
TL;DR: In this paper, DEAF-core technology is used to convert inputs to outputs accessible to people with disabilities by using a data storage and transmission format that includes both semantic information and content.
Abstract: DEAF-core technology converts inputs to outputs accessible to people with disabilities. Communication is improved with DEAF-core technology by using a data storage and transmission format that includes both semantic information and content. User-defined input, responsible for conveying semantic information, and raw analog input, such as text, are converted into a unique XML format (“gh XML”). “gh XML” includes standard XML encoded with accessibility information that allows a user to communicate both verbal (text) and non-verbal (semantic) information as part of the input. “gh XML” is a temporary format which is further converted using XSLT (Extensible Stylesheet Language Transformations) into individual versions of XML specific to each output. After the “gh XML” is converted into the desired XML format, custom rendering engines specific to the desired output convert the individual version of XML into a viable analog format for display.
TL;DR: This paper motivates and presents critical requirements for the management of MPEG-7 media descriptions and the resulting consequences for XML database solutions and discusses current state-of-the-art database solutions for XML documents.
Abstract: MPEG-7 constitutes a promising standard for the description of multimedia content. It can be expected that a lot of applications based on MPEG-7 media descriptions will be set up in the near future. Therefore, means for the adequate management of large amounts of MPEG-7-compliant media descriptions are certainly desirable. Essentially, MPEG-7 media descriptions are XML documents following media description schemes defined with a variant of XML Schema. Thus, it is reasonable to investigate current database solutions for XML documents regarding their suitability for the management of these descriptions. In this paper, we motivate and present critical requirements for the management of MPEG-7 media descriptions and the resulting consequences for XML database solutions. Along these requirements, we discuss current state-of-the-art database solutions for XML documents. The analysis and comparison unveil the limitations of current database solutions with respect to the management of MPEG-7 media descriptions and point the way to the need for a new generation of XML database solutions.
TL;DR: This paper presents the Oracle XML DB solution for a flexible mapping of XML Schemas to an object-relational database, which preserves document fidelity, including ordering, namespaces, comments, processing instructions etc., and handles all the XML Schema semantics including cyclic definitions, derivations (extension and restriction), and wildcards.
Abstract: The W3C XML Schema language is becoming increasingly popular for expressing the data model for XML documents. It is a powerful language that incorporates both structural and datatype modeling features. There are many benefits to storing XML Schema compliant data in a database system, including better queryability, optimized updates and stronger validation. However, the fidelity of the XML document cannot be sacrificed. Thus, the fundamental problem facing database implementers is: how can XML Schemas be mapped to relational (and object-relational) databases without losing schema semantics or data-fidelity? In this paper, we present the Oracle XML DB solution for a flexible mapping of XML Schemas to an object-relational database. It preserves document fidelity, including ordering, namespaces, comments, processing instructions etc., and handles all the XML Schema semantics including cyclic definitions, derivations (extension and restriction), and wildcards. We also discuss various query and update optimizations that involve rewriting XPath operations to directly operate on the underlying relational data.
TL;DR: In this paper, a method and apparatus for composing XSL transformations with XML publishing views is presented, in which the view query that defines an XML view on a relational database is modified to account for the effect of an XSLT stylesheet.
Abstract: A method and apparatus are provided for composing XSL transformations with XML publishing views. XSL transformations are performed on XML documents defined as views of relational databases. A portion of a relational database can be exported to an XML document. An initial view query defines an XML view on the relational database and an XSLT stylesheet specifies at least one transformation. The initial view query is modified to account for an effect of the transformation and the modified view query is applied to the relational database to obtain the XML document. When the modified view query is evaluated on a relational database instance, the same XML document is obtained as would be obtained by evaluating the XSLT stylesheet on the original XML view.
TL;DR: In this article, the authors present a process that utilizes a notion of physical XML Schemas, i.e., P-Schemas; a P-Schema costing procedure; a set of P-Schema rewritings; and a search strategy to heuristically determine the P-Schema with the least cost.
Abstract: Extensible Markup Language (XML) data is mapped to be stored in an alternative database management system (DBMS) by generating a plurality of alternative mappings in response to a supplied XML document and corresponding XML schema; evaluating at least a prescribed attribute of each of the plurality of mappings with respect to an expected workload for the storage system; and selecting the one of the alternative mappings that, based on the prescribed attribute, is the most advantageous for the expected system workload. More specifically, applicants employ a unique process that utilizes a unique notion of physical XML Schemas, i.e., P-Schemas; a P-Schema costing procedure; a set of P-Schema rewritings; and a search strategy to heuristically determine the P-Schema with the least cost. Specifically, physical XML Schemas extend XML Schemas to contain data statistics; a P-Schema can be easily and uniquely mapped into a storage configuration for the target DBMS. The P-Schema costing procedure estimates the cost of evaluating the query workload on the corresponding unique storage configuration. The set of P-Schema rewritings, when successively applied to a P-Schema, yields a space of alternative P-Schemas. These alternative P-Schemas have the property that any XML document that is valid for the initial P-Schema is also valid for any of these alternative P-Schemas. The search strategy examines this space of alternative P-Schemas to heuristically determine the P-Schema with the least cost. The storage configuration derived from this least cost P-Schema is the desired storage configuration to be used to store the XML data in the target DBMS.
TL;DR: A new conditional access system architecture that uses XML digital signature and encryption to securely distribute audio, video, image, and data on the Web and supports payment transactions in a secure environment is proposed.
Abstract: A new conditional access system architecture is proposed. It uses XML digital signature and encryption to securely distribute audio, video, image, and data on the Web. It also supports payment transactions in a secure environment.