TL;DR: In this article, instructions are received to open an eXtensible Markup Language (XML) document and the XML document is searched to locate a processing instruction (PI) containing an entity.
Abstract: Instructions are received to open an eXtensible Markup Language (XML) document. The XML document is searched to locate a processing instruction (PI) containing an entity. The entity, by example, can be an href attribute, a URL, a name, or a character string identifying an application that created an HTML electronic form associated with the XML document. A solution is discovered using the entity. The XML document is opened with the solution. The solution includes an XSLT presentation application and an XML schema. The XML document can be inferred from the XML schema and portions of the XML document are logically coupled with fragments of the XML schema. The XSLT presentation application is executed to transform the coupled portions of the XML document into the HTML electronic form containing data-entry fields associated with the coupled portions. Data entered through the data-entry fields can be validated using the solution.
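The first step described above, searching a document for a processing instruction that carries an entity such as an href, can be sketched with Python's standard library. This is a minimal illustration, not the patented method: the PI target `solution-locator`, the sample URL, and both helper functions are hypothetical names chosen for the example.

```python
import re
from xml.dom import minidom

# Hypothetical document: the PI target and its pseudo-attributes are assumptions.
DOC = """<?xml version="1.0"?>
<?solution-locator href="http://example.com/forms/expense.xsn" name="ExpenseForm"?>
<expense><amount>42</amount></expense>"""


def find_processing_instructions(xml_text):
    """Return (target, data) pairs for every top-level processing instruction."""
    dom = minidom.parseString(xml_text)
    return [
        (node.target, node.data)
        for node in dom.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
    ]


def parse_pseudo_attributes(data):
    """Split PI data of the form key="value" key2="value2" into a dict."""
    return dict(re.findall(r'([\w:.-]+)\s*=\s*"([^"]*)"', data))


pis = find_processing_instructions(DOC)
entity = parse_pseudo_attributes(pis[0][1])
print(pis[0][0], entity.get("href"))
```

A real implementation would then use the extracted entity to look up the solution (schema plus XSLT); that lookup is outside the scope of this sketch.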
TL;DR: A technique is presented that represents the tree structure of an XML document efficiently by compressing it, while the functionality of basic tree operations, such as traversal along edges, is preserved under the compressed representation.
TL;DR: This paper proposes a data model for tracking historical information in an XML document and for recovering the state of the document as of any given time, and introduces a new class of summaries, denoted TSummary, that adds the time dimension to the well-known path summarization schemes.
Abstract: In this paper we address the problem of modeling and implementing temporal data in XML. We propose a data model for tracking historical information in an XML document and for recovering the state of the document as of any given time. We study the temporal constraints imposed by the data model, and present algorithms for validating a temporal XML document against these constraints, along with methods for fixing inconsistent documents. In addition, we discuss different ways of mapping the abstract representation into a temporal XML document, and introduce TXPath, a temporal XML query language that extends XPath 2.0. In the second part of the paper, we present our approach for summarizing and indexing temporal XML documents. In particular we show that by indexing continuous paths, i.e., paths that are valid continuously during a certain interval in a temporal XML graph, we can dramatically increase query performance. To achieve this, we introduce a new class of summaries, denoted TSummary, that adds the time dimension to the well-known path summarization schemes. Within this framework, we present two new summaries: LCP and Interval summaries. The indexing scheme, denoted TempIndex, integrates these summaries with additional data structures. We give a query processing strategy based on TempIndex and a type of ancestor-descendant encoding, denoted temporal interval encoding. We present a persistent implementation of TempIndex, and a comparison against a system based on a non-temporal path index, and one based on DOM. Finally, we sketch a language for updates, and show that the cost of updating the index is compatible with real-world requirements.
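To make the idea of recovering a document's state "as of any given time" concrete, here is a minimal sketch. It assumes one possible mapping in which each element carries hypothetical `from` and `to` attributes delimiting its validity interval; the attribute names, the `FOREVER` marker, and the `snapshot` function are illustrative and are not the paper's data model or TXPath.

```python
import xml.etree.ElementTree as ET

FOREVER = 10**9  # hypothetical marker for an interval that is still open

# Assumed encoding: "from"/"to" attributes give each element's validity interval.
DOC = (
    '<team from="0">'
    '<player from="0" to="5"><name>Ann</name></player>'
    '<player from="6"><name>Bob</name></player>'
    "</team>"
)


def snapshot(elem, t):
    """Return a copy of elem holding only nodes valid at time t, or None."""
    start = int(elem.get("from", 0))
    end = int(elem.get("to", FOREVER))
    if not (start <= t <= end):
        return None
    # Drop the temporal attributes so the result looks like a plain document.
    attrs = {k: v for k, v in elem.attrib.items() if k not in ("from", "to")}
    out = ET.Element(elem.tag, attrs)
    out.text = elem.text
    for child in elem:
        kept = snapshot(child, t)
        if kept is not None:
            out.append(kept)
    return out


root = ET.fromstring(DOC)
names_at_3 = [n.text for n in snapshot(root, 3).iter("name")]
names_at_7 = [n.text for n in snapshot(root, 7).iter("name")]
print(names_at_3, names_at_7)
```

This walks the whole tree for every snapshot; the indexing structures described in the abstract exist precisely to avoid that cost.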
TL;DR: It is demonstrated that XSPARQL provides concise and intuitive solutions for mapping between XML and RDF in either direction, addressing both the use cases of GRDDL and SAWSDL.
Abstract: With currently available tools and languages, translating between an existing XML format and RDF is a tedious and error-prone task. The importance of this problem is acknowledged by the W3C GRDDL working group, which faces the issue of extracting RDF data out of existing HTML or XML files, as well as by the Web service community around SAWSDL, which needs to perform lowering and lifting between RDF data from a semantic client and XML messages for a Web service. However, at the moment, both these groups rely solely on XSLT transformations between RDF/XML and the respective other XML format at hand. In this paper, we propose a more natural approach for such transformations based on merging XQuery and SPARQL into the novel language XSPARQL. We demonstrate that XSPARQL provides concise and intuitive solutions for mapping between XML and RDF in either direction, addressing both the use cases of GRDDL and SAWSDL. We also provide and describe an initial implementation of an XSPARQL engine, available for user evaluation.
TL;DR: A survey of four representative XML parsing models (DOM, SAX, StAX, and VTD) reveals their suitability for different types of applications.
Abstract: Parsing is an expensive operation that can degrade XML processing performance. A survey of four representative XML parsing models (DOM, SAX, StAX, and VTD) reveals their suitability for different types of applications.
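The difference between the parsing models can be illustrated with Python's standard library, which offers rough analogues of three of them: `xml.dom.minidom` (tree in memory, like DOM), `xml.sax` (push callbacks, like SAX), and `ElementTree.iterparse` (pull-style iteration, loosely comparable to StAX). The sketch below is an illustration under those assumptions, not the survey's benchmark; the standard library has no VTD counterpart, so that model is omitted.

```python
import io
import xml.sax
import xml.etree.ElementTree as ET
from xml.dom import minidom

DOC = "<library><book><title>A</title></book><book><title>B</title></book></library>"


def count_books_dom(xml_text):
    """Tree model: build the whole document in memory, then query it."""
    dom = minidom.parseString(xml_text)
    return len(dom.getElementsByTagName("book"))


class BookCounter(xml.sax.ContentHandler):
    """Push model: the parser calls back for each event as it streams."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "book":
            self.count += 1


def count_books_sax(xml_text):
    handler = BookCounter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.count


def count_books_pull(xml_text):
    """Pull model: the application asks for the next event in a loop."""
    source = io.BytesIO(xml_text.encode("utf-8"))
    return sum(
        1 for _event, elem in ET.iterparse(source, events=("start",))
        if elem.tag == "book"
    )


print(count_books_dom(DOC), count_books_sax(DOC), count_books_pull(DOC))
```

All three give the same answer; they differ in how much of the document is held in memory and in who drives the control flow, which is the trade-off the survey examines.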
TL;DR: This work presents XSugar, which makes it possible to manage dual syntax for XML languages, and statically checks that the transformations are reversible and that all XML documents generated from the alternative syntax are valid according to a given XML schema.
TL;DR: This article presents a method and system for analyzing the relationship between molecular structure and biological activity in one or more molecules by transforming molecular structure data into a hierarchical representation of chemical concepts and descriptors and detecting common tree-like patterns in the data.
Abstract: A method and system are described for analyzing the relationship between molecular structure and biological activity in one or more molecules by transforming molecular structure data into a hierarchical representation of chemical concepts and descriptors and detecting common tree-like patterns in the data.
TL;DR: A model combining the advantages of node filtering and query rewriting systems and overcoming their limitations is described, suitable as the basis of a standard technique for XML access control enforcement.
TL;DR: Through a review of the available literature, the authors draw several conclusions about the status quo of XML security and derive the current state and focus of research as well as the existing challenges.
TL;DR: The nested tree structure is proposed that makes it possible to use the dynamic interval-based labeling scheme, which supports XML data updates with almost no node relabeling as well as efficient structural join processing.
TL;DR: In this paper, a method and system for maintaining an XML index in response to piece-wise modifications on indexed XML documents are presented, where the database server that manages the XML index determines which nodes are involved in the piece-wise modifications and updates the index based on only those nodes.
Abstract: A method and system are provided for maintaining an XML index in response to piece-wise modifications on indexed XML documents. The database server that manages the XML index determines which nodes are involved in the piece-wise modifications, and updates the XML index based on only those nodes. Index entries for nodes not involved in the piece-wise modifications remain unchanged.
TL;DR: In this paper, a system and method for developing and enabling model-driven Extensible Markup Language (XML) transformation to XML Metadata Interchange (XMI) format incorporate a strong built-in validation capability.
Abstract: A system and method for developing and enabling model-driven Extensible Markup Language (XML) transformation to XML Metadata Interchange (XMI) format incorporate a strong built-in validation capability. A platform-independent framework applies multiple passes of transformation, where each pass performs specific operations on internal models. Different source models are then merged into a target model.
TL;DR: This paper compares the quality of the formed clusters with those of one of the latest XML clustering algorithms and shows that the proposed algorithm outperforms it in the case of both homogeneous and heterogeneous XML documents.
Abstract: In this paper we propose a unified clustering algorithm for both homogeneous and heterogeneous XML documents. Depending on the type of the XML documents, the proposed algorithm modifies its distance metric in order to properly adapt to the special structural characteristics of homogeneous and heterogeneous XML documents. We compare the quality of the formed clusters with those of one of the latest XML clustering algorithms and show that our algorithm outperforms it in the case of both homogeneous and heterogeneous XML documents.
TL;DR: The author introduces his own conceptual model for XML called XSEM that extends the Entity-Relationship model and takes into account the specifics identified in the previous parts of the text.
Abstract: XML is a popular format for data representation. As the amount of data represented in XML grows, it is necessary to concentrate on the process of modeling XML schemes of the XML representations. However, modeling the XML schemes on the level of XML schema languages, such as XML Schema, has some drawbacks. A natural idea to improve this situation is to model the XML schemes first on a conceptual level. It is motivated by the world of relational databases, where one also starts by modeling the data on a conceptual level. In this publication the focus lies on conceptual modeling for XML. The author starts with a motivating example to point out several problems that can arise when using only XML schema languages for modeling XML schemes. It is discussed how modeling the data on a conceptual level can help. Also, it is shown that conceptual modeling for XML has some specifics that should be taken into account by a conceptual model for XML. Mainly, this means that it is necessary to separate the conceptual modeling process into two parts. In the main part of the publication, the author introduces his own conceptual model for XML called XSEM that extends the Entity-Relationship model and takes into account the specifics identified in the previous parts of the text. The book is concluded with possible applications of the proposed model and the current and future work in the area.
TL;DR: In this article, a computer-implemented method for use with an extensible markup language (XML) document includes inputting a high-level mapping specification for a schema mapping; and generating a target XML document based on the mapping.
Abstract: A computer-implemented method for use with an extensible markup language (XML) document includes inputting a high-level mapping specification for a schema mapping; and generating a target XML document based on the mapping. The method may perform schema mapping-based XML transformation as a three-phase process comprising tuple extraction, XML-fragment generation, and data merging. The tuple extraction phase may be adapted to handle streamed XML data (as well as stored/indexed XML data). The data merging phase may use a hybrid method that can dynamically switch between main memory-based and disk-based algorithms based on the size of the XML data to be merged.
TL;DR: In this article, the authors describe programmatic access to persistent XML and relational data from applications based on explicit mappings between object classes, XML schema types, and relations. The mappings are used in data access, that is, they drive query and update processing.
Abstract: Described is programmatic access to persistent XML and relational data from applications based upon explicit mappings between object classes, XML schema types, and relations. The mappings are used in data access, that is, they drive query and update processing. A query may be processed into one query for accessing the XML data and another query for accessing the relational data. Mappings support strongly-typed classes and loosely-typed classes, and may be conditional upon other data, may decouple query and update translation performed at runtime from schema translation used at compile time, and/or may be compiled into transformations that produce objects from XML data and transformations that produce XML data from objects. Mappings may be generated automatically or provided by the developer.
TL;DR: This document describes an XML patch framework utilizing XML Path language (XPath) selectors; the selector values and the updated new data content constitute the basis of the patch operations it describes.
Abstract: Extensible Markup Language (XML) documents are widely used as containers for the exchange and storage of arbitrary data in today's systems. In order to send changes to an XML document, an entire copy of the new version must be sent, unless there is a means of indicating only the portions that have changed. This document describes an XML patch framework utilizing XML Path language (XPath) selectors. These selector values and updated new data content constitute the basis of patch operations described in this document. In addition to them, with basic <add>, <replace>, and <remove> directives a set of patches can then be applied to update an existing XML document.
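The idea of selector-driven patch operations can be sketched as follows. This is a simplified illustration, not a conforming implementation of the framework: it relies on the limited XPath subset supported by Python's `xml.etree.ElementTree`, the `apply_patch` function and its operation names are assumptions made for the example, and it cannot replace or remove the root element.

```python
import xml.etree.ElementTree as ET


def apply_patch(root, op, selector, payload=None):
    """Apply one operation to the first element matched by selector."""
    # ElementTree elements have no parent pointer, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    target = root.find(selector)
    if target is None:
        raise ValueError(f"selector matched nothing: {selector}")
    if op == "add":
        # Append the new content as the last child of the selected element.
        target.append(ET.fromstring(payload))
    elif op == "replace":
        parent = parents[target]
        index = list(parent).index(target)
        parent.remove(target)
        parent.insert(index, ET.fromstring(payload))
    elif op == "remove":
        parents[target].remove(target)
    else:
        raise ValueError(f"unknown operation: {op}")


doc = ET.fromstring("<doc><note id='1'>old</note><list/></doc>")
apply_patch(doc, "add", "list", "<item>x</item>")
apply_patch(doc, "replace", "note[@id='1']", "<note id='1'>new</note>")
apply_patch(doc, "remove", "list/item")
print(ET.tostring(doc, encoding="unicode"))
```

Only the selector, the operation, and the new content travel with each patch, which is the saving over sending a full copy of the updated document.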
TL;DR: This paper sketches the prototype native XML database system called XML Transaction Coordinator (XTC) and specifies the operations for accessing and modifying stored documents and introduces four XML lock protocols of growing sophistication and complexity, which are based on a tree-structured DOM storage model.
Abstract: Processing XML documents in multi-user database management environments requires a suitable storage model, support of typical XML document processing (XDP) interfaces, and concurrency control mechanisms tailored to the XML data model. In this paper, we sketch our prototype native XML database system called XML Transaction Coordinator (XTC) and specify the operations for accessing and modifying stored documents. The key contribution is the design and optimization of fine-grained lock protocols supporting collaborative processing of XML documents. For this reason, we introduce four XML lock protocols of growing sophistication and complexity, which are based on a tree-structured DOM storage model. The lock modes of these protocols, called taDOM* lock protocols, are tailor-made for the operations of the DOM API. Because of the protocols' complexity, their correctness is not obvious; hence, we present the ideas to prove the lock protocol correctness guaranteeing the specified data processing behavior of the given XDP operations. Finally, using XTC as our testbed system, we run extensive performance measurements to empirically evaluate our lock protocols and to compare their performance behavior against all known fine-grained competitor protocols under the same benchmark in an identical system setting. It turns out that tailor-made optimization pays off and that the taDOM* protocols are the clear winners in our lock protocol contest.
TL;DR: A mild condition on SPJ views is proposed, and under this condition the analysis of deletions on relational views becomes PTIME while the insertion analysis is NP-complete, and an efficient algorithm to process relational view deletions is developed.
Abstract: This paper investigates the view update problem for XML views published from relational data. We consider XML views defined in terms of mappings directed by possibly recursive DTDs compressed into DAGs and stored in relations. We provide new techniques to efficiently support XML view updates specified in terms of XPath expressions with recursion and complex filters. The interaction between XPath recursion and DAG compression of XML views makes the analysis of the XML view update problem rather intriguing. Furthermore, many issues are still open even for relational view updates, and need to be explored. In response to these, on the XML side, we revise the notion of side effects and update semantics based on the semantics of XML views, and present efficient algorithms to translate XML updates to relational view updates. On the relational side, we propose a mild condition on SPJ views, and show that under this condition the analysis of deletions on relational views becomes PTIME while the insertion analysis is NP-complete. We develop an efficient algorithm to process relational view deletions, and a heuristic algorithm to handle view insertions. Finally, we present an experimental study to verify the effectiveness of our techniques.
TL;DR: Both theoretical proof and experimental results reported in this paper demonstrate that the concise structure of Version Tree and the reduced input size make TwigVersion outperform the existing approaches.
Abstract: A common problem of XML query algorithms is that execution time and input size grow rapidly as the size of the XML document increases. In this paper, we propose a version-labeling scheme and the TwigVersion algorithm to address this problem. The version-labeling scheme is utilized to identify all repetitive structures in XML documents, and the Version Tree is constructed to hold such version information. To process a query, TwigVersion generates a filter through the created Version Tree, and the final answer to the query can be retrieved from the database easily through the filtering process. Both theoretical proof and experimental results reported in this paper demonstrate that the concise structure of Version Tree and the reduced input size make TwigVersion outperform the existing approaches.
TL;DR: A new model expressing Arden Syntax with the eXtensible Markup Language (XML) was developed to increase its portability and uses two syntax checking mechanisms, first an XML validation process, and second, a syntax check using an XSL style sheet.
TL;DR: The objective of this paper is to describe the XML model that abstracts the differences in the underlying heterogeneous client-server message formats and provides a common XML message interface.
Abstract: Applications that use directory services or relational databases operate in a client-server model, where a client requests information from a server, and the server returns a response to the client. These client-server applications typically have a specific message protocol that is unique to that application. Systems with multiple client-server applications require that there are separate client programs that individually communicate with their respective server programs. A need exists to access information from heterogeneous systems in a standard message request-response format. A generic eXtensible Markup Language (XML) model was developed to obtain data from diverse measurement systems. The objective of this paper is to describe the XML model that abstracts the differences in the underlying heterogeneous client-server message formats and provides a common XML message interface. The XML messages are parsed through a common XML gateway that decides to which application server to forward the messages. The generic XML messages are translated to the correct application server format before being sent to the application server.
TL;DR: This paper focuses on XML data update management in XEnDB, and proposes a generic update methodology that utilizes the proposed schema and uses the SQL/XML as a standard language.
TL;DR: This paper defines Visibly Pushdown Transducers (VPTs) that give the framework for solving the validation problem under two different semantics for edit operations on XML, and gives streaming algorithms that solve the problem under both the semantics.
Abstract: Visibly Pushdown Languages (VPLs), recognized by Visibly Pushdown Automata (VPAs), are a nicely behaved family of context-free languages. It has been shown that VPAs are equivalent to Extended Document Type Definitions (EDTDs), and thus, they provide means for elegantly solving various problems on XML. In particular, it has been shown that VPAs are the apt device for streaming XML.
One of the important problems about XML that can be addressed using VPAs is the validation problem, in which we need to decide whether an XML document conforms to the specification given by an EDTD. In this paper, we are interested in solving the approximate version of this problem, which is to decide whether an XML document can be modified by a tolerable number of edit operations to yield a valid one with respect to a given EDTD.
For this, we define Visibly Pushdown Transducers (VPTs) that give us the framework for solving this problem under two different semantics for edit operations on XML. While the first semantics is a generalization of edit operations on strings, the second semantics is new and motivated by the special nature of XML documents. Using VPTs, we give streaming algorithms that solve the problem under both semantics. These algorithms use storage space that depends only on the size of the EDTD and the number of tolerable errors. Furthermore, they can check approximate validity of an incoming XML document in a single pass over the document, using auxiliary stack space that is proportional to the depth of the XML document.
TL;DR: This paper defines the iStarML model interchange format as a practical solution to the problem of sharing models and results among tools and presents its motivation, objectives and current outcomes, the expected contributions and finally the ongoing and future work.
Abstract: There are several tools currently available in the i* community with different purposes. This situation poses both benefits and difficulties. Benefits, because different groups may be able to share their models and results among their tools, and even connect different tools in order to perform complex processes. Difficulties, because most of these tools differ either in the underlying metamodel of the language, or the format in which they store the models, or in both. To overcome the difficulties and exploit the benefits, we have defined the iStarML model interchange format as a practical solution to this problem. In this paper we present the research line which supports this outcome. We present its motivation, objectives and current outcomes, the expected contributions and finally our ongoing and future work.
TL;DR: The focus of the paper is on the verification of temporal properties of runs of Active XML systems, specified in a tree-pattern based temporal logic, Tree-LTL, that allows expressing a rich class of semantic properties of the application.
Abstract: Active XML is a high-level specification language tailored to data-intensive, distributed, dynamic Web services. Active XML is based on XML documents with embedded function calls. The state of a document evolves depending on the result of internal function calls (local computations) or external ones (interactions with users or other services). Function calls return documents that may be active, so may activate new subtasks. The focus of the paper is on the verification of temporal properties of runs of Active XML systems, specified in a tree-pattern based temporal logic, Tree-LTL, that allows expressing a rich class of semantic properties of the application. The main results establish the boundary of decidability and the complexity of automatic verification of Tree-LTL properties.
TL;DR: A hash-based structural join algorithm, HGJoin, is first proposed to handle reachability queries on graph-structured XML documents; it is then extended into algorithms that process structural queries in the form of bipartite graphs, and analysis and experiments show that all the algorithms have high performance.
Abstract: When XML documents are modeled as graphs, many research issues arise. In particular, there are many new challenges in query processing on graph-structured XML documents because traditional query processing techniques for tree-structured XML documents cannot be directly applied. This paper studies the problem of structural queries on graph-structured XML documents. A hash-based structural join algorithm, HGJoin, is first proposed to handle reachability queries on graph-structured XML documents. Then, it is extended to the algorithms to process structural queries in form of bipartite graphs. Finally, based on these algorithms, a strategy to process subgraph queries in form of general DAGs is proposed. Analysis and experiments show that all the algorithms have high performance. It is notable that all the algorithms above can be slightly modified to process structural queries in form of general graphs.
TL;DR: An XML search engine called CXLEngine is proposed as an improvement over OOXSearch: it adopts all the techniques of OOXSearch and adds new techniques that handle the types of XML trees that OOXSearch does not handle well.
Abstract: We proposed previously in [9] an XML semantic search engine called OOXSearch, which answers loosely structured queries. It takes into account the semantic relationships between data elements based on their contexts. The context of a data element is determined by its parent element. The framework of OOXSearch treats each parent-children set of elements as a single unified entity. OOXSearch works well for all types of XML trees, except when the tree contains a parent that has a child interior element whose type is the same as the type of its parent (e.g. the parent is "professor" and its child interior element is "student"; both professor and student belong to the "person" type). In this paper, we propose an XML search engine called CXLEngine, which is an improvement over OOXSearch. It adopts all the techniques of OOXSearch in addition to new techniques that handle the types of XML trees described above, which OOXSearch does not handle well. We evaluated CXLEngine by comparing it experimentally with OOXSearch and with two other proposed systems, XSEarch [5] and Schema-Free XQuery [8]. The results showed marked improvement.
TL;DR: This paper proposes an automatic strategy for the selection of XML materialized views that exploits a data mining technique, more precisely the clustering of the query workload, and demonstrates its efficiency, even when queries are complex.
Abstract: XML data warehouses form an interesting basis for decision-support applications that exploit complex data. However, native XML database management systems currently offer limited performance, and it is necessary to design strategies to optimize them. In this paper, we propose an automatic strategy for the selection of XML materialized views that exploits a data mining technique, more precisely the clustering of the query workload. To validate our strategy, we implemented an XML warehouse modeled along the XCube specifications. We executed a workload of XQuery decision-support queries on this warehouse, with and without using our strategy. Our experimental results demonstrate its efficiency, even when queries are complex.