Top 518 papers published on the topic of Streaming XML in 2007

Showing papers on "Streaming XML" published in 2007

Semantic Annotations for WSDL and XML Schema

[...]

J. Farrell

1 Jan 2007

562 citations

Journal Article•10.1109/TKDE.2007.1060•

Efficiently Querying Large XML Data Repositories: A Survey

[...]

Gang Gou¹, Rada Chirkova¹•Institutions (1)

North Carolina State University¹

01 Oct 2007-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This survey considers two major classes of XML query processing techniques, the relational approach and the native approach, and observes that integrating the two could yield higher query processing performance and significantly reduce system reengineering costs.


Abstract: Extensible markup language (XML) is emerging as a de facto standard for information exchange among various applications on the World Wide Web. There has been a growing need for developing high-performance techniques to query large XML data repositories efficiently. One important problem in XML query processing is twig pattern matching, that is, finding in an XML data tree D all matches that satisfy a specified twig (or path) query pattern Q. In this survey, we review, classify, and compare major techniques for twig pattern matching. Specifically, we consider two classes of major XML query processing techniques: the relational approach and the native approach. The relational approach directly utilizes existing relational database systems to store and query XML data, which enables the use of all important techniques that have been developed for relational databases, whereas in the native approach, specialized storage and query processing systems tailored for XML data are developed from scratch to further improve XML query performance. As implied by existing work, XML data querying and management are developing in the direction of integrating the relational approach with the native approach, which could result in higher query processing performance and also significantly reduce system reengineering costs.
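To make the twig pattern matching problem above concrete, here is a minimal sketch in Python. It is a naive in-memory matcher and not one of the surveyed relational or native techniques. The pattern encoding (a `(tag, [sub-patterns])` tuple with child edges only) and the function names are assumptions made for illustration, and the sketch reports only the data nodes at which a match is rooted, not every full match.

```python
import xml.etree.ElementTree as ET

# A twig pattern is (tag, [child patterns]); only parent-child edges are modelled.


def matches(elem, pattern):
    """Return True if the twig pattern can be embedded with its root at elem."""
    tag, branches = pattern
    if elem.tag != tag:
        return False
    # Every branch of the pattern must be matched by at least one child element.
    return all(any(matches(child, branch) for child in elem) for branch in branches)


def twig_roots(root, pattern):
    """Return, in document order, the elements at which the pattern matches."""
    return [elem for elem in root.iter() if matches(elem, pattern)]
```

For example, the pattern `("book", [("title", []), ("author", [])])` selects every `book` element that has both a `title` child and an `author` child.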


196 citations

Patent•

Enabling dynamic voiceXML in an X+V page of a multimodal application

[...]

Charles W. Cross¹, Hilary A. Pike¹, Lisa A. Seacat¹, Marc White¹•Institutions (1)

Nuance Communications¹

14 Mar 2007

TL;DR: Dynamic VoiceXML is enabled in an X+V page of a multimodal application that operates in a multimodal browser on a multimodal device supporting multiple modes of interaction, including a voice mode and one or more non-voice modes.


Abstract: Enabling dynamic VoiceXML in an X+V page of a multimodal application implemented with the multimodal application operating in a multimodal browser on a multimodal device supporting multiple modes of interaction including a voice mode and one or more non-voice modes, the multimodal application operatively coupled to a VoiceXML interpreter, including representing by the multimodal browser an XML element of a VoiceXML dialog of the X+V page as an ECMAScript object, the XML element comprising XML content; storing by the multimodal browser the XML content of the XML element in an attribute of the ECMAScript object; and accessing the XML content of the XML element in the attribute of the ECMAScript object from an ECMAScript script in the X+V page.


152 citations

Proceedings Article•

Inferring XML schema definitions from XML data

[...]

Geert Jan Bex¹, Frank Neven¹, Stijn Vansummeren¹•Institutions (1)

University of Hasselt¹

23 Sep 2007

TL;DR: A theoretically complete algorithm is provided that always infers the correct XSD when a sufficiently large corpus of XML documents is available and a variant of this algorithm is presented that works well on real-world data sets.


Abstract: Although the presence of a schema enables many optimizations for operations on XML documents, recent studies have shown that many XML documents in practice either do not refer to a schema, or refer to a syntactically incorrect one. It is therefore of utmost importance to provide tools and techniques that can automatically generate schemas from sets of sample documents. While previous work in this area has mostly focused on the inference of Document Type Definitions (DTDs for short), we will consider the inference of XML Schema Definitions (XSDs for short) --- the increasingly popular schema formalism that is turning DTDs obsolete. In contrast to DTDs where the content model of an element depends only on the element's name, the content model in an XSD can also depend on the context in which the element is used. Hence, while the inference of DTDs basically reduces to the inference of regular expressions from sets of sample strings, the inference of XSDs also entails identifying from a corpus of sample documents the contexts in which elements bear different content models. Since a seminal result by Gold implies that no inference algorithm can learn the complete class of XSDs from positive examples only, we focus on a class of XSDs that captures most XSDs occurring in practice. For this class, we provide a theoretically complete algorithm that always infers the correct XSD when a sufficiently large corpus of XML documents is available. In addition, we present a variant of this algorithm that works well on real-world (and therefore incomplete) data sets.


134 citations

Patent•

Device control system employing extensible markup language for defining information resources

[...]

Gang Wang, Matteo Contolini, Chengyi Zheng, Heinz-Werner Stiller

17 Oct 2007

TL;DR: A device control system is described that includes at least one device operable by the system, a processor, and two or more communication components, each including an XML parser for parsing the XML document and extracting the message data.


Abstract: A device control system including at least one device operable by the system, at least one processor, software executing on the at least one processor for receiving message data and determining a corresponding XML document type, software executing on the at least one processor for generating a XML document based on the XML document type, the XML document including the message data, software executing on the processor for packetizing the XML document, and two or more communication components, each communication component including an XML parser for parsing the XML document and extracting the message data.


113 citations

Journal Article•10.1016/J.DATAK.2005.11.008•

Node labeling schemes for dynamic XML documents reconsidered

[...]

Theo Härder¹, Michael Peter Haustein¹, Christian Mathis¹, Markus Wagner¹•Institutions (1)

Kaiserslautern University of Technology¹

1 Jan 2007

TL;DR: This paper evaluates existing range-based and prefix-based labeling schemes, proposes its own scheme based on DeweyIDs, experimentally explores it as a general and immutable node labeling mechanism, stresses its synergetic potential for query processing and locking, and shows how it can be implemented efficiently.


Abstract: We explore suitable node labeling schemes used in collaborative XML DBMSs (XDBMSs, for short) supporting typical XML document processing interfaces. Such schemes have to provide holistic support for essential XDBMS processing steps for declarative as well as navigational query processing and, with the same importance, lock management. In this paper, we evaluate existing range-based and prefix-based labeling schemes, before we propose our own scheme based on DeweyIDs. We experimentally explore its suitability as a general and immutable node labeling mechanism, stress its synergetic potential for query processing and locking, and show how it can be implemented efficiently. Various compression and optimization measures deliver surprising space reductions, frequently reducing the size of the storage representation (compared to an already space-efficient encoding scheme) to less than 20-30% on average, which underlines their practical relevance.
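The idea behind prefix-based node labeling can be illustrated with a short Python sketch. It uses plain Dewey-style labels (tuples of consecutive child ordinals), which is an assumption made for brevity. The DeweyIDs of the paper are designed to stay immutable under insertions and are stored compressed, and neither property is modelled here. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def assign_labels(root):
    """Label the root (1,) and the i-th child of a node with the parent's label + (i,)."""
    labels = {}

    def visit(elem, label):
        labels[label] = elem
        for ordinal, child in enumerate(elem, start=1):
            visit(child, label + (ordinal,))

    visit(root, (1,))
    return labels


def is_ancestor(a, b):
    """a is a proper ancestor of b exactly when a is a proper prefix of b."""
    return len(a) < len(b) and b[: len(a)] == a


def parent(label):
    """The parent label is obtained by dropping the last component."""
    return label[:-1] if len(label) > 1 else None


def precedes(a, b):
    """Lexicographic tuple order coincides with document (preorder) order."""
    return a < b
```

Because ancestry, parenthood, and document order are all decided from the labels alone, no access to the document is needed for these checks, which is what makes such labels attractive for both query processing and lock management.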


109 citations

Patent•

Health integration platform API

[...]

Sean Nolan¹, Jeffrey Dick Jones¹, Johnson T. Apacible¹, Vijay Varadan¹•Institutions (1)

Microsoft¹

1 Nov 2007

TL;DR: An application program interface (API) is provided for requesting, storing, and accessing data within a health integration network; it facilitates secure and seamless access to the centrally stored data by offering authentication/authorization and by receiving requests and returning results in an extensible language format such as XML.


Abstract: An application program interface (API) is provided for requesting, storing, and otherwise accessing data within a health integration network. The API facilitates secure and seamless access to the centrally-stored data by offering authentication/authorization, as well as the ability to receive requests in an extensible language format, such as XML, and returns resulting data in XML format. The data can also have transformation, style and/or schema information associated with it which can be returned in the resulting XML and/or applied to the data beforehand by the API. The API can be utilized in many environment architectures including XML over HTTP and a software development kit (SDK).


93 citations

Journal Article•10.1016/J.IS.2005.12.008•

Efficient schema-based XML-to-Relational data mapping

[...]

Mustafa Atay¹, Artem Chebotko¹, Dapeng Liu¹, Shiyong Lu¹, Farshad Fotouhi¹•Institutions (1)

Wayne State University¹

01 May 2007-Information Systems

TL;DR: A lossless schema mapping algorithm to generate a database schema from a DTD, which makes several improvements over existing algorithms, and two linear data mapping algorithms based on DOM and SAX, respectively, to map ordered XML data to relational data are proposed.


91 citations

Proceedings Article•10.1145/1247480.1247590•

An XML transaction processing benchmark

[...]

Matthias Nicola¹, Irina Kogan¹, Berni Schiefer¹•Institutions (1)

IBM¹

11 Jun 2007

TL;DR: This paper presents an application-oriented and domain-specific benchmark called "Transaction Processing over XML" (TPoX), which exercises all aspects of XML databases, including storage, indexing, logging, transaction processing, and concurrency control.


Abstract: XML database functionality has been emerging in "XML-only" databases as well as in the major relational database products. Yet, there is no industry standard XML database benchmark to evaluate alternative implementations. The research community has proposed several benchmarks which are all useful in their respective scope, such as evaluating XQuery processors. However, they do not aim to evaluate a database system in its entirety and do not represent all relevant characteristics of a real-world XML application. Often they only define read-only single-user tests on a single XML document. We have developed an application-oriented and domain-specific benchmark called "Transaction Processing over XML" (TPoX). It exercises all aspects of XML databases, including storage, indexing, logging, transaction processing, and concurrency control. Based on our analysis of real XML applications, TPoX simulates a financial multi-user workload with XML data conforming to the FIXML standard. In this paper we describe TPoX and present early performance results. We also make its implementation publicly available.


87 citations

Journal Article•10.1016/J.JCSS.2006.10.022•

Propagating XML constraints to relations

[...]

Susan B. Davidson¹, Wenfei Fan², Carmem S. Hara³•Institutions (3)

University of Pennsylvania¹, University of Edinburgh², Federal University of Paraná³

01 May 2007-Journal of Computer and System Sciences

TL;DR: The ability to compute XML key propagation is a first step toward establishing a connection between XML data and its relational representation at the semantic level.


80 citations

Proceedings Article•10.1145/1247480.1247512•

Efficient algorithms for evaluating xpath over streams

[...]

Gang Gou¹, Rada Chirkova¹•Institutions (1)

North Carolina State University¹

11 Jun 2007

TL;DR: This paper proposes two O(|D||Q|)-time stream-querying algorithms, LQ and EQ, based on the lazy strategy and the eager strategy, respectively; according to the authors, they are the first XPath stream-querying algorithms that achieve O(|D||Q|) time performance.


Abstract: In this paper we address the problem of evaluating XPath queries over streaming XML data. We consider a practical XPath fragment called Univariate XPath, which includes the commonly used '/' and '//' axes and allows *-node tests and arbitrarily nested predicates. It is well known that this XPath fragment can be efficiently evaluated in O(|D||Q|) time in the non-streaming environment, where |D| is the document size and |Q| is the query size. However, this is not necessarily true in the streaming environment, since streaming algorithms have to satisfy stricter requirements than non-streaming algorithms, in that all data must be read sequentially in one pass. Therefore, it is not surprising that state-of-the-art stream-querying algorithms have higher time complexity than O(|D||Q|). In this paper we revisit the XPath stream-querying problem, and show that Univariate XPath can be efficiently evaluated in O(|D||Q|) time in the streaming environment. Specifically, we propose two O(|D||Q|)-time stream-querying algorithms, LQ and EQ, which are based on the lazy strategy and on the eager strategy, respectively. To the best of our knowledge, LQ and EQ are the first XPath stream-querying algorithms that achieve O(|D||Q|) time performance. Further, our algorithms achieve O(|D||Q|) time performance without trading off space performance. Instead, they have better buffering-space performance than state-of-the-art stream-querying algorithms. In particular, EQ achieves optimal buffering-space performance. Our experimental results show that our algorithms have not only good theoretical complexity but also considerable practical performance advantages over existing algorithms.
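The one-pass constraint described above can be illustrated with a small Python sketch built on the standard library's `xml.etree.ElementTree.iterparse`. It is not LQ or EQ: it handles only absolute paths made of child steps and `*` node tests, with no `//` axis and no predicates, so it needs no buffering of candidate results. The function name `stream_match` and the list-of-steps query encoding are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET


def stream_match(source, steps):
    """Yield the text of each element whose root-to-node path matches steps.

    steps is a list of tag names, where "*" matches any tag. The input is
    read sequentially exactly once.
    """
    path = []  # tags of the currently open elements, outermost first
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            path.append(elem.tag)
        else:
            # At the end tag the element's text is complete, so test it now.
            if len(path) == len(steps) and all(
                step in ("*", tag) for step, tag in zip(steps, path)
            ):
                yield elem.text
            path.pop()
            # Drop the element's text and children once it has been handled.
            elem.clear()
```

A call such as `list(stream_match(io.BytesIO(data), ["site", "item", "name"]))` evaluates the path `/site/item/name` over the byte string `data`. Supporting predicates is what forces a streaming algorithm to buffer candidates until their predicates are decided, which is where the lazy and eager strategies of the paper differ.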


Journal Article•10.1016/J.DATAK.2006.06.015•

An efficient infrastructure for native transactional XML processing

[...]

Michael Peter Haustein¹, Theo Härder¹•Institutions (1)

Kaiserslautern University of Technology¹

1 Jun 2007

TL;DR: The core part of the paper describes the infrastructural services for XML document storage with compressed DeweyIDs, the principles and methods for navigational and declarative processing of queries, as well as the lock modes and protocols to enable efficient collaboration.


Abstract: Implementation techniques for relational database management systems (DBMSs) have proven their efficiency and robustness in many existing systems. However, many of these concepts and mechanisms cannot be used when implementing a native XML DBMS (XDBMS) because of substantial differences in the processing properties of natively stored XML documents as compared to relational tables. Therefore, we have to develop new and appropriate techniques with ACID transaction guarantees tailored to the processing characteristics of tree documents and the operations on them. For this reason, we want to provide for an efficient infrastructure of XDBMSs consisting of tree node addressing and indexing together with fine-grained locking of tree nodes. In this respect, our prime and novel contribution is to reveal the potential of our prefix-based node labeling called DeweyIDs supporting record addressing, indexing, and locking protocols. In this paper, we first sketch our version of prefix-based node labeling and summarize a quantitative study on them. An overview of our layered XDBMS architecture indicates the concepts and functionalities to be reused from relational DBMS implementations. The core part of the paper describes the infrastructural services for XML document storage with compressed DeweyIDs, the principles and methods for navigational and declarative processing of queries, as well as the lock modes and protocols to enable efficient collaboration. Selected empirical experiments evaluate the XTC system performance and support our system assessment.


Journal Article•10.1145/1276920.1276926•

Probabilistic interval XML

[...]

Edward Hung¹, Lise Getoor², V. S. Subrahmanian²•Institutions (2)

Hong Kong Polytechnic University¹, University of Maryland, College Park²

01 Aug 2007-ACM Transactions on Computational Logic

TL;DR: This paper proposes the Probabilistic Interval XML (PIXML for short) data model, and provides an operational semantics that may be used to compute answers to queries and that is correct for a large class of probabilistic instances.


Abstract: Interest in XML databases has been expanding rapidly over the last few years. In this paper, we study the problem of incorporating probabilistic information into XML databases. We propose the Probabilistic Interval XML (PIXML for short) data model in this paper. Using this data model, users can express probabilistic information within XML markups. In addition, we provide two alternative formal model-theoretic semantics for PIXML data. The first semantics is a “global” semantics which is relatively intuitive, but is not directly amenable to computation. The second semantics is a “local” semantics which supports efficient computation. We prove several correspondence results between the two semantics. To our knowledge, this is the first formal model theoretic semantics for probabilistic interval XML. We then provide an operational semantics that may be used to compute answers to queries and that is correct for a large class of probabilistic instances.


Patent•

Rewriting node reference-based XQuery using SQL/XML

[...]

Zhen Hua Liu¹, Hui Joe Chang¹, James W. Warner¹•Institutions (1)

Business International Corporation¹

13 Dec 2007

TL;DR: Instead of extracting copies of nodes from XML data, a reference-based SQL/XML operator returns a reference to a node, which is used to determine, for example, whether the corresponding node comes logically before, after, or is the same as another node.


Abstract: Techniques for processing reference-based SQL/XML operators are provided. Instead of extracting copies of one or more nodes from XML data, a reference-based operator returns a reference to a node. Such a reference is used to determine, for example, whether the corresponding node comes logically before, after, or is the same as another node. An SQL/XML query that includes a reference-based operator may be the original query, or may be generated (e.g., rewritten) from a non-SQL/XML query, such as an XQuery query. One or more physical rewrites may be performed on the SQL/XML query, depending on how the XML data is stored and/or whether an XML index exists for the XML data.


Patent•

WYSIWYG, browser-based XML editor

[...]

Daniel G. Zarzar¹, Alberto Swett¹•Institutions (1)

Microsoft¹

29 Jun 2007

TL;DR: Computer-implemented methods and computer-readable storage media are disclosed for facilitating browser-based, what-you-see-is-what-you-get (WYSIWYG) editing of an extensible markup language (XML) file.


Abstract: Computer-implemented methods and computer-readable storage media are disclosed for facilitating browser-based, what-you-see-is-what-you-get (WYSIWYG) editing of an extensible markup language (XML) file. A browser executing on a local computing system is used to access a hypertext markup language (HTML) representation of an extensible markup language (XML) file. The HTML representation includes a plurality of elements of the XML file formatted in accordance with an extensible stylesheet language (XSL) transform associated with the XML file. A plurality of editing handlers is inserted within the HTML representation to facilitate modifying the HTML representation and applying the changes to the XML file. A user is permitted to modify the HTML representation for purposes of applying the modifications to the XML file.


Proceedings Article•10.1145/1272457.1272462•

Parallel XML processing by work stealing

[...]

Wei Lu¹, Dennis Gannon¹•Institutions (1)

Indiana University¹

25 Jun 2007

TL;DR: The paper presents ThreadCrew, a stealing-based dynamic load-balancing mechanism by which multiple threads process disjoint parts of an XML document in parallel with balanced load distribution, together with a novel mechanism to trace the stealing actions.


Abstract: A language for semi-structured documents, XML has emerged as the core of the web services architecture, and is playing crucial roles in messaging systems, databases, and document processing. However, the processing of XML documents has been regarded as the performance bottleneck in most systems and applications. On the other side, the multicore processor, which emerged as a solution for the clock-speed limitation of modern CPUs, has become increasingly prevalent. Leveraging the parallelism provided by the multicore resource to speed up software execution is becoming the trend in software development. In this paper, we present a parallel processing model for the XML document. The model is not designed just for a specific XML processing task; instead, it is a general model, by which we are able to explore various kinds of parallel XML document processing. The kernel of the model is a stealing-based dynamic load-balancing mechanism, called ThreadCrew, by which multiple threads are able to process the disjoint parts of the XML document in parallel with balanced load distribution. The model also provides a novel mechanism to trace the stealing actions, so that the equivalent sequential result can be obtained by gluing the multiple parallel-running results together. To show the feasibility and effectiveness of our approaches, we present our C# implementation of parallel XML serialization in this paper. Our empirical study shows that our parallel XML serialization algorithm can improve XML serializing performance significantly on a multicore machine.


Patent•

Index Structure for Supporting Structural XML Queries

[...]

Wei Fan¹, Haixun Wang¹, Philip S. Yu¹•Institutions (1)

IBM¹

19 Jul 2007

TL;DR: ViST is a novel index structure for searching XML documents that uses tree structures as the basic unit of query to avoid expensive join operations and provides a unified index on both content and structure of the XML documents, giving it a performance advantage over methods that index only content or only structure.


Abstract: The present invention provides a ViST (or “virtual suffix tree”), which is a novel index structure for searching XML documents. By representing both XML documents and XML queries in structure-encoded sequences, it is shown that querying XML data is equivalent to finding (non-contiguous) subsequence matches. A variety of XML queries, including those with branches, or wild-cards (‘*’ and ‘//’), can be expressed by structure-encoded sequences. Unlike index methods that disassemble a query into multiple sub-queries, and then join the results of these sub-queries to provide the final answers, ViST uses tree structures as the basic unit of query to avoid expensive join operations. Furthermore, ViST provides a unified index on both content and structure of the XML documents, hence it has a performance advantage over methods indexing either just content or structure. ViST supports dynamic index update, and it relies solely on B+Trees without using any specialized data structures that are not well supported by common database management systems (hereinafter referred to as “DBMSs”).


Patent•

Method for loading large XML documents on demand

[...]

William D. Clarke¹, Tao Zhan¹•Institutions (1)

Pitney Bowes¹

23 Apr 2007

TL;DR: A Wrapper class for the XML Document and Element classes lets a user application access any element as if the whole document were loaded in memory, while external components are loaded only as required.


Abstract: Systems and methods for loading XML documents on demand are described. The system provides a Wrapper class for the XML Document class and the Element class. A user application then utilizes the Wrapper class in the same way that the Element class and Document class would be used to access any element in the XML Document. The Wrapper class loads external components as required. The external component retrieval is completely transparent to the user application and the user application is able to access the entire XML document as if it were completely loaded into a DOM object in memory. Accordingly, each element is accessible in a random manner. In one configuration, the XML document components or external components are stored in a database in a BLOB field as a Digital Document. The system uses external components to efficiently use resources as compared to systems using Xlink and external entities.
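The wrapper idea described above can be sketched in Python as follows. This is an illustrative sketch only, not the patented design: the `external` placeholder element, its `ref` attribute, the `LazyElement` class, and the dictionary standing in for the database BLOB store are all hypothetical names chosen for the example.

```python
import xml.etree.ElementTree as ET


class LazyElement:
    """Wrap an element and load externally stored children on first access."""

    def __init__(self, element, store):
        self._element = element
        # Maps a component key to serialized XML; a stand-in for a BLOB field.
        self._store = store

    @property
    def tag(self):
        return self._element.tag

    @property
    def text(self):
        return self._element.text

    def children(self):
        """Yield wrapped children, replacing placeholders with loaded content."""
        for index, child in enumerate(list(self._element)):
            if child.tag == "external":
                loaded = ET.fromstring(self._store[child.get("ref")])
                # Swap the placeholder for the real subtree so it loads only once.
                self._element[index] = loaded
                child = loaded
            yield LazyElement(child, self._store)
```

The calling code iterates over `children()` exactly as it would over an ordinary element, so the retrieval of external components stays transparent to it, and nothing is fetched for parts of the document that are never visited.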


Patent•

Method and system for providing XML-based asynchronous and interactive feeds for web applications

[...]

Alexander Kordun¹, Neil J. Schultz¹•Institutions (1)

IBM¹

26 Jun 2007

TL;DR: A system provides XML-based asynchronous and interactive feeds for Web applications through an efficient and extensible XML Javascript framework that allows easy insertion of a comment/news feed control into any Web page.


Abstract: A system for providing XML-based asynchronous and interactive feeds for Web applications that provides a highly efficient and extensible XML Javascript framework allowing easy insertion of a comment/news feed control into any Web page. The framework allows for reading of any XML format and provides a new and easy way for modifying the look-and-feel of the control via HTML templates with familiar XPath bindings. The rendering performed through the system supports both flat and indented (“threaded”) views for a comment thread. The system improves the parsing speed of incoming XML, and supports a flexible event model for others to develop plug-ins and mashups in the spirit of Web 2.0.


Proceedings Article•10.1145/1242572.1242715•

Mapping-driven XML transformation

[...]

Haifeng Jiang¹, Howard Ho¹, Lucian Popa¹, Wook-Shin Han²•Institutions (2)

IBM¹, Kyungpook National University²

8 May 2007

TL;DR: This paper presents a three-phase framework for high-performance XML-to-XML transformation based on schema mappings, and elaborates on novel techniques such as streamed extraction of mapped source values and scalable disk-based merging of overlapping data.


Abstract: Clio is an existing schema-mapping tool that provides user-friendly means to manage and facilitate the complex task of transformation and integration of heterogeneous data such as XML over the Web or in XML databases. By means of mappings from source to target schemas, Clio can help users conveniently establish the precise semantics of data transformation and integration. In this paper we study the problem of how to efficiently implement such data transformation (i.e., generating target data from the source data based on schema mappings). We present a three-phase framework for high-performance XML-to-XML transformation based on schema mappings, and discuss methodologies and algorithms for implementing these phases. In particular, we elaborate on novel techniques such as streamed extraction of mapped source values and scalable disk-based merging of overlapping data (including duplicate elimination). We compare our transformation framework with alternative methods such as using XQuery or SQL/XML provided by current commercial databases. The results demonstrate that the three-phase framework (although as simple as it is) is highly scalable and outperforms the alternative methods by orders of magnitude.


Proceedings Article•10.1145/1242572.1242717•

Querying and maintaining a compact XML storage

[...]

Raymond K. Wong, Franky Lam, William M. Shui

8 May 2007

TL;DR: This paper presents a new storage scheme for XML data that supports all navigational operations in near constant time, and features a small memory footprint that increases cache locality, whilst still supporting standard APIs and necessary database operations, such as queries and updates, efficiently.


Abstract: As XML database sizes grow, the amount of space used for storing the data and auxiliary data structures becomes a major factor in query and update performance. This paper presents a new storage scheme for XML data that supports all navigational operations in near constant time. In addition to supporting efficient queries, the space requirement of the proposed scheme is within a constant factor of the information theoretic minimum, while insertions and deletions can be performed in near constant time as well. As a result, the proposed structure features a small memory footprint that increases cache locality, whilst still supporting standard APIs, such as DOM, and necessary database operations, such as queries and updates, efficiently. Analysis and experiments show that the proposed structure is space and time efficient.


Patent•

System and method of xml based content fragmentation for rich media streaming

[...]

Vidya Setlur¹, Ramakrishna Vedantham¹•Institutions (1)

Nokia¹

12 Jul 2007

TL;DR: A system and method partition XML-based content into fragments, generate transport packets that encapsulate the fragments, and stream the encapsulated fragments to a receiver, such as a mobile device.


Abstract: A system and method for partitioning XML-based content into fragments, where transport packets are generated for encapsulating the fragments and streaming the encapsulated fragments to a receiver, such as a mobile device. Fragmentation of the XML-based content can be performed either with or without regard for any underlying XML syntax or structure. In either case, certain relevant fragmentation information is encapsulated with the fragmented XML-based content in the transport packets that allow for various reconstruction, error concealment, and retransmission schemes for presenting the streamed XML-based content on/to the receiver.
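The syntax-agnostic variant of fragmentation mentioned above can be sketched in a few lines of Python. The packet layout (a sequence number and a total count carried with each payload) is an assumption chosen for illustration, as are the names `Packet`, `fragment`, and `reassemble`. The abstract does not specify which fragmentation information is actually encapsulated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    seq: int        # position of this fragment in the original content
    total: int      # number of fragments the content was split into
    payload: bytes  # raw bytes, cut without regard to XML structure


def fragment(content, max_payload):
    """Split content into packets carrying at most max_payload bytes each."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    chunks = [
        content[start:start + max_payload]
        for start in range(0, len(content), max_payload)
    ] or [b""]
    return [Packet(seq, len(chunks), chunk) for seq, chunk in enumerate(chunks)]


def reassemble(packets):
    """Return (content, []) if complete, else (None, missing sequence numbers).

    packets must be a non-empty list and may arrive in any order.
    """
    by_seq = {packet.seq: packet for packet in packets}
    total = packets[0].total
    missing = [seq for seq in range(total) if seq not in by_seq]
    if missing:
        return None, missing
    return b"".join(by_seq[seq].payload for seq in range(total)), []
```

The list of missing sequence numbers returned by `reassemble` is the kind of information a receiver could use to request retransmission. A structure-aware variant would instead cut at element boundaries so that each fragment can be presented on its own.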


Proceedings Article•10.1145/1242572.1242841•

Preserving XML queries during schema evolution

[...]

Mirella M. Moro¹, Susan Malaika², Lipyeow Lim²•Institutions (2)

University of California, Riverside¹, IBM²

8 May 2007

TL;DR: A taxonomy of changes for XML schema evolution is described and guidelines for writing queries in such a way that they continue to operate as expected across evolving schemas are proposed.


Abstract: In XML databases, new schema versions may be released as frequently as once every two weeks. This poster describes a taxonomy of changes for XML schema evolution. It examines the impact of those changes on schema validation and query evaluation. Based on that study, it proposes guidelines for XML schema evolution and for writing queries in such a way that they continue to operate as expected across evolving schemas.


Journal Article•10.1016/J.DATAK.2006.01.002•

A space efficient XML DOM parser

Fangju Wang¹, Jing Li¹, Hooman Homayounfar¹•Institutions (1)

University of Guelph¹

1 Jan 2007

TL;DR: This research develops a space efficient DOM parser, called SEDOM, based on a new compression approach and a set of manipulation algorithms, which enable many DOM operations to be performed when the data are in the compressed format, and allow individual parts of a document to be compressed, decompressed and manipulated.

Abstract: In many XML applications, parsing is a key operation. When the processing involves modifying data, random access, and/or access in an order different from the one in which elements are stored, a DOM parser has to be used. A major problem with using a DOM parser is memory consumption. The size of a DOM tree created from an XML document may be as large as 10 times the size of the original document. Maintaining the tree of a big document requires a large amount of memory. It may cause costly swapping. In the worst cases, a DOM parser cannot handle a document at all because of its size. In this research, we develop a space efficient DOM parser, called SEDOM. It is based on a new compression approach and a set of manipulation algorithms, which enable many DOM operations to be performed when the data are in the compressed format, and allow individual parts of a document to be compressed, decompressed and manipulated. It can be used to efficiently manipulate very large XML documents. In this paper, we describe SEDOM, and compare its performance with three existing DOM parsers and an XML compressor.

Book Chapter•10.4018/978-1-59904-228-2.CH003•

An Overview of Similarity Measures for Clustering XML Documents

Giovanna Guerrini¹, Marco Mesiti², Ismael Sanz³•Institutions (3)

University of Genoa¹, University of Milan², James I University³

1 Jan 2007

TL;DR: This chapter discusses and compares the most relevant similarity measures for XML documents and their use in XML document clustering, including link-based similarity approaches adapted from Web data clustering.

Abstract: The large amount and heterogeneity of XML documents on the Web requires the development of clustering techniques to group together similar documents. Documents can be grouped together according to their content, their structure, and the links inside and among the documents. For instance, grouping together documents with similar structure has interesting applications in the context of information extraction, heterogeneous data integration, personalized content delivery, access-control definition, Web site structural analysis, and the comparison of RNA secondary structures. Many approaches have been proposed for evaluating the structural and content similarity between tree-based and vector-based representations of XML documents. Link-based similarity approaches developed for Web data clustering have been adapted for XML documents. This chapter discusses and compares the most relevant similarity measures and their employment for XML document clustering.

Patent•

Method and system for performing operations on data using XML streams

Arun T. Jacob

5 Oct 2007

TL;DR: A method and system for performing operations on data using XML streams, in which an XML schema defines a limited set of operations (addition, subtraction, multiplication, and division) that are placed in a conforming XML stream and applied to the data.

Abstract: The present invention provides a method and system for performing operations on data using XML streams. An XML schema defines a limited set of operations that may be performed on data. These operations include addition, subtraction, multiplication and division. The operations are placed in an XML stream that conforms to the XML schema. The XML stream may perform one or more of the defined operations on the data. The limited set of operations allows data to be validated and processed without excessive overhead.
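The idea of a stream restricted to a small, fixed set of operations can be sketched as follows. This is an illustration under stated assumptions, not the patent's format: the element names (`operations`, `add`, `subtract`, `multiply`, `divide`), the `value` attribute, and the function `apply_stream` are hypothetical, and a whitelist check stands in for validation against an XML schema.

```python
import io
import operator
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: the only operations the stream may contain.
OPS = {
    "add": operator.add,
    "subtract": operator.sub,
    "multiply": operator.mul,
    "divide": operator.truediv,
}


def apply_stream(xml_text: str, start: float) -> float:
    """Apply each operation element, in document order, to a running value."""
    result = start
    source = io.BytesIO(xml_text.encode("utf-8"))
    # iterparse yields each element as soon as its end tag has been read,
    # so the operations are processed incrementally.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "operations":  # assumed wrapper element
            continue
        if elem.tag not in OPS:
            raise ValueError(f"unsupported operation: {elem.tag}")
        operand = elem.get("value")
        if operand is None:
            raise ValueError(f"missing value attribute on <{elem.tag}>")
        result = OPS[elem.tag](result, float(operand))
        elem.clear()  # release the element's contents once it is applied
    return result
```

Because the set of allowed elements is closed, rejecting an unknown tag is a dictionary lookup, which reflects the abstract's point that a limited operation set keeps validation and processing cheap.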

Journal Article•10.1007/S00778-005-0169-1•

Attribute grammars for scalable query processing on XML streams

Christoph Koch¹, Stefanie Scherzinger¹•Institutions (1)

Saarland University¹

1 Jul 2007

TL;DR: XSAGs are the first scalable query language for XML streams that allows for actual data transformations rather than just document filtering and the XSAG formalism provides a strong intuition for which queries can or cannot be processed scalably on streams.

Abstract: We introduce the notion of XML Stream Attribute Grammars (XSAGs). XSAGs are the first scalable query language for XML streams (running strictly in linear time with bounded memory consumption independent of the size of the stream) that allows for actual data transformations rather than just document filtering. XSAGs are also relatively easy to use for humans. Moreover, the XSAG formalism provides a strong intuition for which queries can or cannot be processed scalably on streams. We introduce XSAGs together with the necessary language-theoretic machinery, study their theoretical properties such as expressiveness and complexity, and discuss their implementation.

Journal Article•

Coupled schema transformation and data conversion for XML and SQL

Pablo Berdaguer, Alcino Cunha, Hugo Pacheco, Joost Visser

01 Jan 2007-Lecture Notes in Computer Science

TL;DR: It is shown how the system can be used to tackle various two-level transformation scenarios, such as XML schema evolution coupled with document migration, and hierarchical-relational data mappings that convert between XML documents and SQL databases.

Abstract: A two-level data transformation consists of a type-level transformation of a data format coupled with value-level transformations of data instances corresponding to that format. We have implemented a system for performing two-level transformations on XML schemas and their corresponding documents, and on SQL schemas and the databases that they describe. The core of the system consists of a combinator library for composing type-changing rewrite rules that preserve structural information and referential constraints. We discuss the implementation of the system's core library, and of its SQL and XML front-ends in the functional language Haskell. We show how the system can be used to tackle various two-level transformation scenarios, such as XML schema evolution coupled with document migration, and hierarchical-relational data mappings that convert between XML documents and SQL databases.

Patent•

Generating a transition system for use with model checking

Ralf Huuck, Ansgar Fehnker, Patrick Jayet, Felix Rauch

12 Sep 2007

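TL;DR: A transition system and an extensible markup language (XML) representation of the data are generated; labels for the transition system are then produced by querying the XML representation with a markup query language and used, together with the structure of the transition system, as input to model checking of the software code.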

Abstract: The invention concerns model program analysis of software code using model checking. Initially, a transition system (22) and an extensible markup language (XML) (24) representation of the data is generated. Next, labels (26) for the transition system are generated by querying the XML representation of the data using (markup) query language. The labels and the structure of the transition system are then used as input to model checking techniques to analyse the software code (28). It is an advantage of the invention that the problem of labelling a transition system can be transformed into the XML domain so that detailed information about the software code can be extracted using queries in a format that can be run in the XML domain which are well known. At the same time the transformation to the XML domain does not prevent the use of efficient model checking technologies.

Book Chapter•10.1007/978-3-540-75987-4_4•

A methodology for coupling fragments of XPath with structural indexes for XML documents

George H. L. Fletcher¹, Dirk Van Gucht², Yuqing Wu², Marc Gyssens³, Sofia Brenes², Jan Paredaens⁴•Institutions (4)

Washington State University Vancouver¹, Indiana University², University of Hasselt³, University of Antwerp⁴

23 Sep 2007

TL;DR: In XPath query evaluation, indices similar to those used in relational database systems - namely, value indices on tags and text values - are first used, together with structural join algorithms, which turn out to be simple and efficient.

Abstract: Supporting efficient access to XML data using XPath [3] continues to be an important research problem [6, 12]. XPath queries are used to specify node-labeled trees which match portions of the hierarchical XML data. In XPath query evaluation, indices similar to those used in relational database systems - namely, value indices on tags and text values - are first used, together with structural join algorithms [1, 2, 19]. This approach turns out to be simple and efficient. However, the structural containment relationships native to XML data are not directly captured by value indices.
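The combination of a value index on tags with a structural join can be sketched as below. The interval (region) numbering, the function names, and the nested-loop join are assumptions made for illustration; they are one common textbook scheme and not necessarily the technique of this chapter or of the cited algorithms, which typically use merge-based or stack-based joins instead.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

Interval = tuple[int, int]


def build_tag_index(root: ET.Element) -> dict[str, list[Interval]]:
    """Value index on tags: tag name -> sorted (start, end) intervals.

    Each element receives a start number on entry and an end number on
    exit, so a descendant's interval is strictly nested in its ancestor's.
    """
    index: dict[str, list[Interval]] = defaultdict(list)
    counter = 0

    def visit(node: ET.Element) -> None:
        nonlocal counter
        start = counter
        counter += 1
        for child in node:
            visit(child)
        index[node.tag].append((start, counter))
        counter += 1

    visit(root)
    for intervals in index.values():
        intervals.sort()  # entries were appended in post-order
    return dict(index)


def structural_join(
    ancestors: list[Interval], descendants: list[Interval]
) -> list[tuple[Interval, Interval]]:
    """Return (ancestor, descendant) pairs by interval containment.

    Nested loops are used for clarity only.
    """
    return [
        (a, d)
        for a in ancestors
        for d in descendants
        if a[0] < d[0] and d[1] < a[1]
    ]
```

A query such as `//book//title` would then be answered by `structural_join(index["book"], index["title"])`. The join has to recompute containment from the intervals on every query, which illustrates the abstract's observation that value indices alone do not directly capture structural containment.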
