Top 415 papers published in the topic of Streaming XML in 2009

Showing papers on "Streaming XML" published in 2009

Proceedings Article•10.1145/1559845.1559921•

DDE: from dewey to a fully dynamic XML labeling scheme

Liang Xu¹, Tok Wang Ling¹, Huayu Wu¹, Zhifeng Bao¹•Institutions (1)

29 Jun 2009

TL;DR: A novel labeling scheme called DDE (for Dynamic DEwey), tailored for both static and dynamic XML documents, that completely avoids re-labeling and whose label quality is more resilient to the number and order of insertions than that of existing approaches.

Abstract: Labeling schemes lie at the core of query processing for many XML database management systems. Designing labeling schemes for dynamic XML documents is an important problem that has received a lot of research attention. Existing dynamic labeling schemes, however, often sacrifice query performance and introduce additional labeling cost to facilitate arbitrary updates even when the documents actually seldom get updated. Since the line between static and dynamic XML documents is often blurred in practice, we believe it is important to design a labeling scheme that is compact and efficient regardless of whether the documents are frequently updated or not. In this paper, we propose a novel labeling scheme called DDE (for Dynamic DEwey) which is tailored for both static and dynamic XML documents. For static documents, the labels of DDE are the same as those of Dewey, which yield compact size and high query performance. When updates take place, DDE can completely avoid re-labeling, and its label quality is more resilient to the number and order of insertions than that of existing approaches. In addition, we introduce Compact DDE (CDDE), which is designed to optimize the performance of DDE for insertions. Both DDE and CDDE can be incorporated into existing systems and applications that are based on the Dewey labeling scheme with minimal effort. Experimental results demonstrate the benefits of our proposed labeling schemes over the previous approaches.
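
The abstract states that DDE labels coincide with Dewey labels for static documents. As a rough illustration of that static case only, the sketch below assigns plain Dewey labels and tests the ancestor relationship by prefix comparison. It does not implement DDE's update rules; the function names `dewey_labels` and `is_ancestor` are hypothetical and chosen for this example.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

Label = Tuple[int, ...]


def dewey_labels(root: ET.Element) -> Dict[Label, str]:
    """Map each Dewey label to its element tag.

    The root gets (1,); the i-th child of a node appends i to the
    parent's label. This is plain Dewey, not the DDE update scheme.
    """
    labels: Dict[Label, str] = {}
    stack: List[Tuple[ET.Element, Label]] = [(root, (1,))]
    while stack:
        node, label = stack.pop()
        labels[label] = node.tag
        for i, child in enumerate(node, start=1):
            stack.append((child, label + (i,)))
    return labels


def is_ancestor(a: Label, b: Label) -> bool:
    """In Dewey, a is a proper ancestor of b iff a is a proper prefix of b."""
    return len(a) < len(b) and b[: len(a)] == a


if __name__ == "__main__":
    doc = ET.fromstring("<a><b><c/></b><d/></a>")
    for label, tag in sorted(dewey_labels(doc).items()):
        print(".".join(map(str, label)), tag)
```

Sorting the label tuples yields document order, because Python compares tuples lexicographically and a prefix sorts before its extensions.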

92 citations

Journal Article•10.1145/1620585.1620590•

Static analysis of active XML systems

Serge Abiteboul¹, Luc Segoufin², Victor Vianu³•Institutions (3)

French Institute for Research in Computer Science and Automation¹, École normale supérieure de Cachan², University of California, San Diego³

14 Dec 2009-ACM Transactions on Database Systems

TL;DR: This article focuses on the verification of temporal properties of runs of Active XML systems, specified in a tree-pattern-based temporal logic, Tree-LTL, which allows expressing a rich class of semantic properties of the application.

Abstract: Active XML is a high-level specification language tailored to data-intensive, distributed, dynamic Web services. Active XML is based on XML documents with embedded function calls. The state of a document evolves depending on the result of internal function calls (local computations) or external ones (interactions with users or other services). Function calls return documents that may be active, and so may activate new subtasks. The focus of this article is on the verification of temporal properties of runs of Active XML systems, specified in a tree-pattern-based temporal logic, Tree-LTL, which allows expressing a rich class of semantic properties of the application. The main results establish the boundary of decidability and the complexity of automatic verification of Tree-LTL properties.

59 citations

XML Metadata Interchange.

Michael Weiss

1 Jan 2009

TL;DR: In this article, a semiconductor device and method for fabricating the same, which can maintain a threshold voltage constant despite decreased channel width, is disclosed, and the device including a first, and a second conductive type wells in a substrate, a first gate electrode on the first gate insulating film, the second gate electrode being doped with a second conductive type except for edges of the first gate electrode in a channel width direction counter, and isolating regions formed between the first-and second-gate electrodes.

Abstract: Semiconductor device and method for fabricating the same, is disclosed, which can maintain a threshold voltage constant despite of decreased channel width, the device including a first, and a second conductive type wells in a substrate, a first, and a second gate insulating films on the first, and the second conductive type wells, a first gate electrode on the first gate insulating film, the first gate electrode being doped with a second conductive type except for edges of the first gate electrode in a channel width direction counter doped with a first conductive type, a second gate electrode on the second gate insulating film, the second gate electrode being doped with a first conductive type except for edges of the second gate electrode in a channel width direction counter doped with a second conductive type, and isolating regions formed between the first, and second conductive type wells, the first, and second gate insulating films, and the first, and second gate electrodes.

51 citations

Journal Article•10.6688/JISE.2009.25.5.7•

Measuring and Evaluating a Design Complexity Metric for XML Schema Documents

Dilek Basci, Sanjay Misra

01 Sep 2009-Journal of Information Science and Engineering

TL;DR: An attempt has been made to evaluate the quality of XML schema documents (XSD) written in W3C XML Schema language with a metric, which measures the complexity due to the internal architecture of XSD components, and due to recursion.

Abstract: The eXtensible Markup Language (XML) has been gaining extraordinary acceptance from many diverse enterprise software companies for their object repositories, data interchange, and development tools. Further, many different domains, organizations and content providers have been publishing and exchanging information via the Internet using XML and standard schemas. Efficient implementation of XML in these domains requires well-designed XML schemas. From this point of view, the design of XML schemas plays an extremely important role in the software development process and needs to be quantified for ease of maintainability. In this paper, an attempt has been made to evaluate the quality of XML schema documents (XSD) written in the W3C XML Schema language. We propose a metric that measures the complexity due to the internal architecture of XSD components and due to recursion. This single metric covers all major factors responsible for the complexity of XSD. The metric has been empirically and theoretically validated, demonstrated with examples, and supported by comparison with other well-known structure metrics applied to XML schema documents.

44 citations

Proceedings Article•

Boosting XML Filtering with a Scalable FPGA-based Architecture

Abhishek Mitra¹, Marcos R. Vieira¹, Petko Bakalov¹, Walid Najjar¹, Vassilis J. Tsotras¹•Institutions (1)

University of California, Riverside¹

1 Dec 2009

TL;DR: This work proposes a "pure hardware" based solution, which utilizes XPath query blocks on FPGA to solve the filtering problem and achieves drastically better throughput than the existing software or mixed (hardware/software) architectures.

Abstract: The growing amount of XML encoded data exchanged over the Internet increases the importance of XML based publish-subscribe (pub-sub) and content based routing systems. The input in such systems typically consists of a stream of XML documents and a set of user subscriptions expressed as XML queries. The pub-sub system then filters the published documents and passes them to the subscribers. Pub-sub systems are characterized by very high input ratios, therefore the processing time is critical. In this paper we propose a "pure hardware" based solution, which utilizes XPath query blocks on FPGA to solve the filtering problem. By utilizing the high throughput that an FPGA provides for parallel processing, our approach achieves drastically better throughput than the existing software or mixed (hardware/software) architectures. The XPath queries (subscriptions) are translated to regular expressions which are then mapped to FPGA devices. By introducing stacks within the FPGA we are able to express and process a wide range of path queries very efficiently, on a scalable environment. Moreover, the fact that the parser and the filter processing are performed on the same FPGA chip eliminates expensive communication costs (that a multi-core system would need), thus enabling very fast and efficient pipelining. Our experimental evaluation reveals more than one order of magnitude improvement compared to traditional pub/sub systems.
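
The translation of XPath subscriptions into regular expressions can be illustrated in software. The sketch below is an assumption-laden simplification, not the paper's hardware mapping: it handles only the child axis (`/`), the descendant axis (`//`), and the `*` wildcard, and it matches against root-to-node path strings such as `/a/b/c`. The function name `xpath_to_regex` is hypothetical.

```python
import re


def xpath_to_regex(xpath: str) -> re.Pattern:
    """Compile a simple XPath (/, //, *) into a regex over '/a/b/c' paths."""
    parts = []
    i = 0
    while i < len(xpath):
        # Read the axis that introduces this step.
        if xpath.startswith("//", i):
            axis, i = "//", i + 2
        elif xpath[i] == "/":
            axis, i = "/", i + 1
        else:
            raise ValueError(f"expected '/' at position {i}")
        # Read the name test up to the next slash.
        j = i
        while j < len(xpath) and xpath[j] != "/":
            j += 1
        name = xpath[i:j]
        if not name:
            raise ValueError("empty step")
        i = j
        step = "[^/]+" if name == "*" else re.escape(name)
        # '//' allows any number of intermediate elements before the step.
        prefix = "(?:/[^/]+)*/" if axis == "//" else "/"
        parts.append(prefix + step)
    if not parts:
        raise ValueError("empty XPath")
    return re.compile("".join(parts))


def matches(pattern: re.Pattern, path: str) -> bool:
    """True if the whole root-to-node path satisfies the subscription."""
    return pattern.fullmatch(path) is not None


if __name__ == "__main__":
    sub = xpath_to_regex("/news//title")
    print(matches(sub, "/news/item/title"))
    print(matches(sub, "/blog/item/title"))
```

In a filter, a document would be forwarded to a subscriber as soon as any element path in the stream satisfies that subscriber's pattern.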

38 citations

Patent•

Apparatus, system, and method for efficient content indexing of streaming XML document content

James Peter Branigan¹, David P. Charboneau¹, Simon K. Johnston¹•Institutions (1)

IBM¹

1 Jun 2009

TL;DR: An apparatus, system, and method for efficient content indexing of streaming XML document content is described in this paper, where a tree generator generates XML pattern forests from a set of structured index path expressions, the XML pattern forest includes trees and twigs generated from structured index expressions uniquely associated with a namespace indicator for an XML node.

Abstract: An apparatus, system, and method are disclosed for efficient content indexing of streaming XML document content. A forest generator generates an XML pattern forest from a set of structured index path expressions; the XML pattern forest includes trees and twigs generated from structured index path expressions uniquely associated with a namespace indicator for an XML node. The XML node is identified in a stream of at least one XML document. A comparison module compares the XML node to nodes of trees and twigs of the XML pattern forest. A determination module determines a match between the XML node and an index node in one of a tree and a twig of the XML pattern forest. The index node has a path from an ancestor node to the index node that matches the axis steps of at least one of the structured index path expressions. A storage module stores an index entry for the XML node in response to the determined match; the index entry includes an XML document identifier, an XML node name, a namespace indicator for the XML node, and XML node content.
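
To make the flow of the abstract concrete, here is a minimal sketch of indexing nodes while a document streams through a parser. It replaces the patent's pattern forest with a plain set of exact root-to-node paths, which is a deliberate simplification and an assumption of this example. The function name `index_stream` and the entry layout are hypothetical; namespaced tags use the `{uri}local` notation that Python's `xml.etree.ElementTree` produces.

```python
import io
import xml.etree.ElementTree as ET
from typing import List, Set, Tuple

# (document id, node name, namespace indicator, node content)
Entry = Tuple[str, str, str, str]


def index_stream(xml_bytes: bytes, doc_id: str, index_paths: Set[str]) -> List[Entry]:
    """Emit an index entry for every element whose path is in index_paths."""
    entries: List[Entry] = []
    path: List[str] = []  # tags from the root down to the current element
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end"))
    for event, elem in events:
        if event == "start":
            path.append(elem.tag)
            continue
        # On "end" the element's text is complete.
        if "/" + "/".join(path) in index_paths:
            if elem.tag.startswith("{"):
                namespace, _, local = elem.tag[1:].partition("}")
            else:
                namespace, local = "", elem.tag
            entries.append((doc_id, local, namespace, (elem.text or "").strip()))
        path.pop()
        elem.clear()  # drop processed content to keep memory use low
    return entries


if __name__ == "__main__":
    doc = b"<lib><book><title>Streams</title></book></lib>"
    print(index_stream(doc, "doc-1", {"/lib/book/title"}))
```

A fuller version would match each node against compiled path patterns rather than exact strings, so that descendant steps and wildcards are supported.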

36 citations

Journal Article•10.1016/J.JSS.2009.01.007•

Extending path summary and region encoding for efficient structural query processing in native XML databases

Su-Cheng Haw¹, Chien-Sing Lee¹•Institutions (1)

Multimedia University¹

01 Jun 2009-Journal of Systems and Software

TL;DR: TwigX-Guide is presented, a hybrid system, which takes advantage of the beautiful features of path summary in DataGuide and region encoding in TwigStack to improve complex query processing.

36 citations

Querying XML : benchmarks and recursion

L. Afanasiev

1 Jan 2009

TL;DR: This dissertation aims to provide a history of web exceptionalism from 1989 to 2002, a period chosen in order to explore its roots as well as specific cases up to and including the year in which descriptions of “Web 2.0” began to circulate.

34 citations

Journal Article•10.1016/J.IS.2008.09.003•

A methodology for coupling fragments of XPath with structural indexes for XML documents

George H. L. Fletcher¹, Dirk Van Gucht², Yuqing Wu², Marc Gyssens³, Sofia Brenes², Jan Paredaens⁴•Institutions (4)

Washington State University Vancouver¹, Indiana University², University of Hasselt³, University of Antwerp⁴

01 Nov 2009-Information Systems

TL;DR: This work identifies XPath fragments which are ideally coupled with the newly introduced P(k)-partition which has its definition grounded in the well-known A(k) structural index and its associated partition.

31 citations

Book Chapter•10.1007/978-3-642-00958-7_63•

Refining Keyword Queries for XML Retrieval by Combining Content and Structure

Desislava Petkova¹, W. Bruce Croft¹, Yanlei Diao¹•Institutions (1)

University of Massachusetts Amherst¹

18 Apr 2009

TL;DR: This work proposes an automatic query refinement method to transform a keyword query into structured XML queries that capture the original information need and conform to the underlying XML data.

Abstract: The structural heterogeneity and complexity of XML repositories makes query formulation challenging for users who have little knowledge of XML. To assist its users, an XML retrieval system can have a keyword-based interface, relegating the task of combining textual and structural clues to the retrieval algorithm. In this work, we propose an automatic query refinement method to transform a keyword query into structured XML queries that capture the original information need and conform to the underlying XML data. We formulate query generation as a search problem, and show the effectiveness of the method in generating accurate content-and-structure queries.

31 citations

Journal Article•10.1002/ASNA.200811233•

A standard transformation from XML to RDF via XSLT

Frank Breitling¹•Institutions (1)

Leibniz Institute for Astrophysics Potsdam¹

12 Jun 2009-arXiv: Instrumentation and Methods for Astrophysics

TL;DR: A generic transformation of XML data into the Resource Description Framework (RDF) and its implementation by XSLT transformations is presented to solve the problem of semantic computing.

Abstract: A generic transformation of XML data into the Resource Description Framework (RDF) and its implementation by XSLT transformations is presented. It was developed by the grid integration project for robotic telescopes of AstroGrid-D to provide network communication through the Remote Telescope Markup Language (RTML) to its RDF based information service. The transformation's generality is explained by this example. It automates the transformation of XML data into RDF and thus solves this problem of semantic computing. Its design also permits the inverse transformation but this is not yet implemented.

Journal Article•10.1109/TKDE.2009.26•

Locating XML Documents in a Peer-to-Peer Network Using Distributed Hash Tables

Praveen Rao¹, Bongki Moon²•Institutions (2)

University of Missouri–Kansas City¹, University of Arizona²

01 Dec 2009-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This paper addresses the problem of efficiently locating relevant XML documents in a P2P network, where a user poses queries in a language such as XPath, and develops a new system called psiX that runs on top of an existing distributed hashing framework.

Abstract: One of the key challenges in a peer-to-peer (P2P) network is to efficiently locate relevant data sources across a large number of participating peers. With the increasing popularity of the extensible markup language (XML) as a standard for information interchange on the Internet, XML is commonly used as an underlying data model for P2P applications to deal with the heterogeneity of data and enhance the expressiveness of queries. In this paper, we address the problem of efficiently locating relevant XML documents in a P2P network, where a user poses queries in a language such as XPath. We have developed a new system called psiX that runs on top of an existing distributed hashing framework. Under the psiX system, each XML document is mapped into an algebraic signature that captures the structural summary of the document. An XML query pattern is also mapped into a signature. The query's signature is used to locate relevant document signatures. Our signature scheme supports holistic processing of query patterns without breaking them into multiple path queries and processing them individually. The participating peers in the network collectively maintain a collection of distributed hierarchical indexes for the document signatures. Value indexes are built to handle numeric and textual values in XML documents. These indexes are used to process queries with value predicates. Our experimental study on PlanetLab demonstrates that psiX provides an efficient location service in a P2P network for a wide variety of XML documents.

Journal Article•10.1016/J.DATAK.2008.09.001•

S3: Evaluation of tree-pattern XML queries supported by structural summaries

Sayyed Kamyar Izadi¹, Theo Härder², Mostafa S. Haghjoo¹•Institutions (2)

Iran University of Science and Technology¹, Kaiserslautern University of Technology²

1 Jan 2009

TL;DR: In this paper, a novel method is proposed, called S^3, which can selectively process the document's nodes and substantially outperform previous QTP processing methods w.r.t. response time, I/O overhead, and memory consumption - critical parameters in any real multi-user environment.

Abstract: XML queries are frequently based on path expressions where their elements are connected to each other in a tree-pattern structure, called query tree pattern (QTP). Therefore, a key operation in XML query processing is finding those elements which match the given QTP. In this paper, we propose a novel method, called S^3, which can selectively process the document's nodes. In S^3, unlike all previous methods, path expressions are not directly executed on the XML document, but first they are evaluated against a guidance structure, called QueryGuide. Enriched by information extracted from the QueryGuide, a query execution plan, called SMP, is generated to provide focused pattern matching and avoid document access as far as possible. Moreover, our experimental results confirm that S^3 and its optimized version OS^3 substantially outperform previous QTP processing methods w.r.t. response time, I/O overhead, and memory consumption - critical parameters in any real multi-user environment.

Book Chapter•10.1007/978-1-4899-7993-3_1550-2•

XML Process Definition Language

Nathaniel Palmer

1 Jan 2009

TL;DR: A method for planarizing metal plugs for device interconnections by providing a semiconductor structure with at least one device thereon and planarized using a first chemical mechanical polishing process.

Proceedings Article•10.1109/ICME.2009.5202458•

Efficient XML Interchange for rich internet applications

Daniel Peintner¹, Harald Kosch¹, Jörg Heuer²•Institutions (2)

University of Passau¹, Siemens²

28 Jun 2009

TL;DR: This paper evaluates the efficient access of rich internet applications especially in the domain of resource limited embedded devices such as digital picture frames and proposes a generic adaption of the EXI format to even increase the efficiency.

Abstract: The tremendous acceptance of web applications, or more specifically, rich media applications are about to be extended to embedded devices such as mobile phones, digital picture frames or TV sets. The Extensible Markup Language (XML) is one important pillar when we deal with such internet applications. XML is known as the interchange language of the web. Besides its outstanding features, in the domain of embedded devices, XML is difficult to handle due to the processing overhead and the verbosity associated with its use. This paper evaluates the efficient access of rich internet applications especially in the domain of resource limited embedded devices such as digital picture frames. For this purpose, typical XML source information models for rich internet applications, such as Silverlight and SVG, are evaluated. In this context, the new Efficient XML Interchange (EXI) format is applied and studied. Finally a generic adaption of the EXI format is developed to even increase the efficiency. The paper concludes with the proposal for further studies on an integration of EXI-based typed interfaces to reduce the processing complexity for rich media applications on embedded devices.

Proceedings Article•10.1145/1655121.1655129•

The curse of namespaces in the domain of XML signature

Meiko Jensen¹, Lijun Liao¹, Jörg Schwenk¹•Institutions (1)

Ruhr University Bochum¹

13 Nov 2009

TL;DR: It is shown that the interplay of XML Signature, XPath, and the XML namespace concept has severe flaws that can be exploited for an attack, and that XML namespaces in general pose real troubles to digital signatures in the XML domain.

Abstract: The XML signature wrapping attack is one of the most discussed security issues of the Web Services security community during the last years. Until now, the issue has not been solved, and all countermeasure approaches proposed so far were shown to be insufficient. In this paper, we present yet another way to perform signature wrapping attacks by using the XML namespace injection technique. We show that the interplay of XML Signature, XPath, and the XML namespace concept has severe flaws that can be exploited for an attack, and that XML namespaces in general pose real troubles to digital signatures in the XML domain. Additionally, we present and discuss some new approaches in countering the proposed attack vector.

Patent•

Efficient XML Tree Indexing Structure Over XML Content

Anguel Novoselsky¹, Zhen Hua Liu¹, Thomas Baby¹•Institutions (1)

Business International Corporation¹

30 Oct 2009

TL;DR: In this paper, a method and apparatus for building and using a persistent XML tree index for navigating an XML document is described, which is stored separately from the XML document content, and thus is able to optimize performance through the use of fixed-sized index entries.

Abstract: A method and apparatus are provided for building and using a persistent XML tree index for navigating an XML document. The XML tree index is stored separately from the XML document content, and thus is able to optimize performance through the use of fixed-sized index entries. The XML document hierarchy need not be constructed in volatile memory, so creating and using the XML tree index scales even for large documents. To evaluate a path expression including descendent or ancestral syntax, navigation links can be read from persistent storage and used directly to find the nodes specified in the path expression. The use of an abstract navigational interface allows applications to be written that are independent of the storage implementation of the index and the content. Thus, the XML tree index can index documents stored at least in a database, a persistent file system, or as a sequence of in memory.
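
The idea of fixed-sized index entries holding navigation links can be sketched as follows. This is an illustration under stated assumptions rather than the patented layout: each entry is three 32-bit integers (parent, first child, next sibling, with -1 meaning "none"), stored in a byte buffer separate from the content, so entry `i` is found by offset arithmetic alone. For brevity the example builds the index from an in-memory tree, which the patent specifically avoids; the names `ENTRY`, `build_index`, and `children` are hypothetical.

```python
import struct
import xml.etree.ElementTree as ET
from typing import List, Tuple

# One fixed-size record per node: parent, first_child, next_sibling.
ENTRY = struct.Struct("<iii")


def build_index(root: ET.Element) -> Tuple[bytes, List[str]]:
    """Return the packed navigation index and a parallel list of tags."""
    rows: List[List[int]] = []
    tags: List[str] = []

    def visit(node: ET.Element, parent: int) -> int:
        me = len(rows)
        rows.append([parent, -1, -1])
        tags.append(node.tag)
        prev = -1
        for child in node:
            cid = visit(child, me)
            if prev == -1:
                rows[me][1] = cid    # link parent to its first child
            else:
                rows[prev][2] = cid  # link previous sibling to this one
            prev = cid
        return me

    visit(root, -1)
    return b"".join(ENTRY.pack(*row) for row in rows), tags


def entry(buf: bytes, i: int) -> Tuple[int, int, int]:
    """Read record i directly by offset; no tree is materialized."""
    return ENTRY.unpack_from(buf, i * ENTRY.size)


def children(buf: bytes, i: int) -> List[int]:
    """Follow first_child, then next_sibling links."""
    out: List[int] = []
    child = entry(buf, i)[1]
    while child != -1:
        out.append(child)
        child = entry(buf, child)[2]
    return out


if __name__ == "__main__":
    index, tags = build_index(ET.fromstring("<a><b><c/></b><d/></a>"))
    print([tags[c] for c in children(index, 0)])
```

Because every record has the same size, the same reads would work against a file or database column by seeking to `i * ENTRY.size`.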

Book Chapter•10.1007/978-3-642-03555-5_7•

A Data Parallel Algorithm for XML DOM Parsing

Bhavik Shah¹, Praveen Rao¹, Bongki Moon², Mohan Rajagopalan³•Institutions (3)

University of Missouri–Kansas City¹, University of Arizona², Intel³

21 Aug 2009

TL;DR: Through empirical evaluation, it is shown that ParDOM yields better scalability than PXP on commodity multicore processors, and can process a wide variety of XML datasets with complex structures which PXP fails to parse.

Abstract: The extensible markup language (XML) has become the de facto standard for information representation and interchange on the Internet. XML parsing is a core operation performed on an XML document for it to be accessed and manipulated. This operation is known to cause performance bottlenecks in applications and systems that process large volumes of XML data. We believe that parallelism is a natural way to boost performance. Leveraging multicore processors can offer a cost-effective solution, because future multicore processors will support hundreds of cores, and will offer a high degree of parallelism in hardware. We propose a data parallel algorithm called ParDOM for XML DOM parsing that builds an in-memory tree structure for an XML document. ParDOM has two phases. In the first phase, an XML document is partitioned into chunks and parsed in parallel. In the second phase, partial DOM node tree structures created during the first phase are linked together (in parallel) to build a complete DOM node tree. ParDOM offers fine-grained parallelism by adopting a flexible chunking scheme: each chunk can contain an arbitrary number of start and end XML tags that are not necessarily matched. ParDOM can be conveniently implemented using a data parallel programming model that supports map and sort operations. Through empirical evaluation, we show that ParDOM yields better scalability than PXP [23], a recently proposed parallel DOM parsing algorithm, on commodity multicore processors. Furthermore, ParDOM can process a wide variety of XML datasets with complex structures which PXP fails to parse.

Book Chapter•10.1007/978-3-642-00672-2_10•

Processing XML Keyword Search by Constructing Effective Structured Queries

Jianxin Li¹, Chengfei Liu¹, Rui Zhou¹, Bo Ning¹•Institutions (1)

Swinburne University of Technology¹

22 Mar 2009

TL;DR: This paper designs an adaptive XML keyword search approach, called XBridge, that can derive the semantics of a keyword query and generate a set of effective structured queries by analyzing the given keywords and the schemas of XML data sources.

Abstract: Recently, keyword search has attracted a great deal of attention in XML databases. It is hard to directly improve the relevancy of XML keyword search because many keyword-matched nodes may not contribute to the results. To address this challenge, in this paper we design an adaptive XML keyword search approach, called XBridge, that can derive the semantics of a keyword query and generate a set of effective structured queries by analyzing the given keyword query and the schemas of XML data sources. To efficiently answer a keyword query, we only need to evaluate the generated structured queries over the XML data sources with any existing XQuery search engine. In addition, we extend our approach to process top-k keyword search based on the execution plan to be proposed. The quality of the returned answers can be measured using the context of the keyword-matched nodes and the contents of the nodes together. The effectiveness and efficiency of XBridge are demonstrated with an experimental performance study on real XML data.

Proceedings Article•10.1109/GRID.2009.5353070•

Parallel and distributed approach for processing large-scale XML datasets

Zacharia Fadika¹, Michael R. Head¹, Madhusudhan Govindaraju¹•Institutions (1)

Binghamton University¹

11 Dec 2009

TL;DR: This work has adapted the Hadoop implementation to determine the threshold data sizes and computation work required per node, for a distributed solution to be effective and presents both a parallel and distributed approach to analyze how the scalability and performance requirements of large-scale XML-based data processing can be achieved.

Abstract: An emerging trend is the use of XML as the data format for many distributed scientific applications, with the size of these documents ranging from tens of megabytes to hundreds of megabytes. Our earlier benchmarking results revealed that most of the widely available XML processing toolkits do not scale well for large sized XML data. A significant transformation is necessary in the design of XML processing for scientific applications so that the overall application turn-around time is not negatively affected. We present both a parallel and distributed approach to analyze how the scalability and performance requirements of large-scale XML-based data processing can be achieved. We have adapted the Hadoop implementation to determine the threshold data sizes and computation work required per node, for a distributed solution to be effective. We also present an analysis of parallelism using our Piximal toolkit for processing large-scale XML datasets that utilizes the capabilities for parallelism that are available in the emerging multi-core architectures. Multi-core processors are expected to be widely available in research clusters and scientific desktops, and it is critical to harness the opportunities for parallelism in the middleware, instead of passing on the task to application programmers. Our parallelization approach for a multi-core node is to employ a DFA-based parser that recognizes a useful subset of the XML specification, and convert the DFA into an NFA that can be applied to an arbitrary subset of the input. Speculative NFAs are scheduled on available cores in a node to effectively utilize the processing capabilities and achieve overall performance gains. We evaluate the efficacy of this approach in terms of potential speedup that can be achieved for representative XML data sets.

Proceedings Article•10.1145/1526709.1526857•

Interactive search in XML data

Guoliang Li¹, Jianhua Feng¹, Lizhu Zhou¹•Institutions (1)

Tsinghua University¹

20 Apr 2009

TL;DR: This paper proposes a new information-access paradigm for XML data, called "Inks," in which the system searches on the underlying data "on the fly" as the user types in query keywords, and implemented the algorithm.

Abstract: In a traditional keyword-search system over XML data, a user composes a keyword query, submits it to the system, and retrieves relevant subtrees. In the case where the user has limited knowledge about the data, often the user feels "left in the dark" when issuing queries, and has to use a try-and-see approach for finding information. In this paper, we study a new information-access paradigm for XML data, called "Inks," in which the system searches on the underlying data "on the fly" as the user types in query keywords. Inks extends existing XML keyword search methods by interactively answering queries. We propose effective indices, early-termination techniques, and efficient search algorithms to achieve a high interactive speed. We have implemented our algorithm, and the experimental results show that our method achieves high search efficiency and result quality.
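
Searching "on the fly" as the user types amounts to answering prefix queries quickly. The sketch below shows one simple way to do that, assuming a sorted vocabulary probed with binary search; it is not the index structure or the early-termination technique from the paper. The names `build_index` and `search_prefix` are hypothetical, and the `[i]` suffix in a node path records the child's position among all siblings, not an XPath positional predicate.

```python
import bisect
import re
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def build_index(root: ET.Element) -> Tuple[List[str], Dict[str, Set[str]]]:
    """Map each lowercase token to the paths of elements whose text contains it."""
    postings: Dict[str, Set[str]] = defaultdict(set)

    def visit(node: ET.Element, path: str) -> None:
        for token in re.findall(r"\w+", (node.text or "").lower()):
            postings[token].add(path)
        for i, child in enumerate(node, start=1):
            visit(child, f"{path}/{child.tag}[{i}]")

    visit(root, "/" + root.tag)
    return sorted(postings), postings


def search_prefix(vocab: List[str], postings: Dict[str, Set[str]], prefix: str) -> List[str]:
    """Return paths of elements containing any token that starts with prefix."""
    prefix = prefix.lower()
    hits: Set[str] = set()
    # Tokens sharing a prefix are contiguous in the sorted vocabulary.
    start = bisect.bisect_left(vocab, prefix)
    for token in vocab[start:]:
        if not token.startswith(prefix):
            break
        hits |= postings[token]
    return sorted(hits)


if __name__ == "__main__":
    doc = ET.fromstring(
        "<db><paper><title>XML streaming</title></paper>"
        "<paper><title>Stream joins</title></paper></db>"
    )
    vocab, postings = build_index(doc)
    for typed in ("s", "st", "stre", "streami"):
        print(typed, search_prefix(vocab, postings, typed))
```

Each additional keystroke narrows the contiguous token range, so an interactive front end could reuse the previous range instead of searching again from scratch.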

Journal Article•10.1177/0165551509104231•

A relational data harmonization approach to XML

Timo Niemi¹, Turkka Näppilä¹, Kalervo Järvelin¹•Institutions (1)

University of Tampere¹

01 Oct 2009-Journal of Information Science

TL;DR: This work proposes that semantically similar data are harmonized when extracting data from XML-based data sources and introduces a constructor algebra, which is a powerful tool in the harmonization of XML data.

Abstract: There are numerous approaches for integrating data from heterogeneous data sources. A common background assumption is that the data sources remain quite stable and are known in advance. Hence an integration system can be built to manipulate them. In practice there is, however, often a demand for supporting ad hoc information needs concerning unexpected autonomous data sources containing volatile data. A different approach is therefore needed. We propose that semantically similar data are harmonized when extracting data from XML-based data sources. We introduce a constructor algebra, which is a powerful tool in the harmonization of XML data. This algebra is able to form for any XML data source a unique relational representation, called an XML relation. We demonstrate that the XML relation representation supports grouping and aggregation of data needed, for example, in OLAP (online analytical processing) style applications.

Journal Article•10.1145/1558334.1558338•

An X-ray on web-available XML schemas

Alberto H. F. Laender¹, Mirella M. Moro¹, Cristiano Nascimento¹, Patrícia S. Martins¹•Institutions (1)

Universidade Federal de Minas Gerais¹

24 Jun 2009

TL;DR: A general view, an X-Ray, on Web-available XSD files is provided by identifying which XSD constructs are more and less frequently used, together with an evolution perspective showing results from XSD files collected in 2005 and 2008.

Abstract: XML has conquered its place as the most used standard for representing Web data. An XML schema may be employed for purposes similar to those of database schemas. There are different languages to write an XML schema, such as DTD and XSD. In this paper, we provide a general view, an X-Ray, on Web-available XSD files by identifying which XSD constructs are more and less frequently used. Furthermore, we provide an evolution perspective, showing results from XSD files collected in 2005 and 2008. Hence, we can also draw some conclusions on what trends seem to exist in XSD usage. The results of this study provide relevant information for developers of XML applications, tools and algorithms in which the schema has a distinguished role.

Journal Article•10.1145/1516539.1516542•

Retrieving XML data from heterogeneous sources through vague querying

Bettina Fazzinga¹, Sergio Flesca¹, Andrea Pugliese¹•Institutions (1)

University of Calabria¹

11 May 2009-ACM Transactions on Internet Technology

TL;DR: The framework ensures high autonomy to participating sources as it does not rely on a global schema or on semantic mappings between schemas, and defines a query language and its associated semantics that allow collecting as much information as possible from several heterogeneous XML sources.

Abstract: We propose a framework for querying heterogeneous XML data sources. The framework ensures high autonomy to participating sources as it does not rely on a global schema or on semantic mappings between schemas. The basic intuition is that of extending traditional approaches for approximate query evaluation, by providing techniques for combining partial answers coming from different sources, possibly on the basis of limited knowledge about the local schemas (i.e., key constraints). We define a query language and its associated semantics, that allows us to collect as much information as possible from several heterogeneous XML sources. We provide algorithms for query evaluation and characterize the complexity of the query language. Finally, we validate the approach in a medical application scenario.

Journal Article•

Two-Way Mapping between Object-Oriented Databases and XML

Taher Naser, Reda Alhajj, Mick Ridley

01 Jan 2009-Informatica (Lithuanian Academy of Sciences)

TL;DR: A novel approach for mapping an existing object-oriented database into XML and vice versa, where the object graph is derived based on characteristics of the XML schema and the links are simulated in terms of nesting to get a simulated object graph.

Abstract: This paper presents a novel approach for mapping an existing object-oriented database into XML and vice versa. The major motivation to carry out this study is the fact that it is necessary to facilitate platform independent exchange of the content of object-oriented databases and the need to store XML in a structured database. There are more common features between the object-oriented model and XML and thus the two-way mapping from object-oriented databases into XML (and vice versa) should be less problematic. To achieve the mapping, what we call the object graph is derived based on characteristics of the schema to be mapped. For object-oriented schema, the object graph simply summarizes and includes all nesting and inheritance links, which are the basics of the object-oriented model. Then, the inheritance is simulated in terms of nesting to get a simulated object graph. This way, everything in a simulated object graph is directly representable in XML format. Finally, we handle the mapping of the actual data from the object-oriented database into corresponding XML document(s). On the other hand, the common features between the object-oriented model and XML make it more attractive to map from XML into object-oriented database; such mapping preserves database specifics. To achieve the mapping, the object graph is derived based on characteristics of the XML schema; it simply summarizes and includes all complex and simple elements and the links, which are the basics of the XML schema. Then, the links are simulated in terms of nesting to get a simulated object graph. This way, everything in a simulated object graph is directly representable in object-oriented database. Finally, we handle the mapping of the actual data from XML document(s) into the corresponding object-oriented database. Summary: The paper presents an original two-way mapping between object databases and XML.

Proceedings Article•10.1145/1529282.1529752•

Runtime monitoring of web service choreographies using streaming XML

Sylvain Hallé¹, Roger Villemaire²•Institutions (2)

University of California, Santa Barbara¹, Université du Québec à Montréal²

08 Mar 2009

TL;DR: It is shown that, given a suitable translation of LTL formulæ into XQuery expressions, such runtime monitoring of choreography constraints is possible by feeding the trace of messages to a streaming XQuery processor.

Abstract: A wide range of web service choreography constraints on the content and sequentiality of messages can be translated into Linear Temporal Logic (LTL). Although they can be checked statically on abstractions of actual services, it is desirable that violations of these specifications be also detected at runtime. In this paper, we show that, given a suitable translation of LTL formulae into XQuery expressions, such runtime monitoring of choreography constraints is possible by feeding the trace of messages to a streaming XQuery processor. The forward-only fragment of LTL is introduced; it represents the fragment of LTL supported by available streaming engines.
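Forward-only runtime monitoring can be sketched in a few lines. The sketch below does not use XQuery or the paper's LTL translation; it swaps in plain Python and hard-codes a single precedence property ("no `blocked` message before the first `trigger` message"), reading each message once and keeping only two flags of state. The message names `login` and `cartAdd` and the class `PrecedenceMonitor` are hypothetical.

```python
import xml.etree.ElementTree as ET


class PrecedenceMonitor:
    """Checks, one message at a time, that `blocked` never precedes `trigger`."""

    def __init__(self, trigger, blocked):
        self.trigger = trigger
        self.blocked = blocked
        self.trigger_seen = False
        self.violated = False

    def feed(self, message_xml):
        """Consume one XML message; return False once the property is violated."""
        name = ET.fromstring(message_xml).tag  # root element names the message
        if name == self.trigger:
            self.trigger_seen = True
        elif name == self.blocked and not self.trigger_seen:
            self.violated = True
        return not self.violated


def check_trace(trace, trigger="login", blocked="cartAdd"):
    """Return the index of the first violating message, or None if there is none."""
    monitor = PrecedenceMonitor(trigger, blocked)
    for index, message in enumerate(trace):
        if not monitor.feed(message):
            return index
    return None
```

Because the monitor never looks back at earlier messages, it can run alongside the service and report a violation at the message that causes it.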

Journal Article•10.4018/JDM.2009040104•

Efficient Filtering of Branch Queries for High-Performance XML Data Services

Ryan H. Choi¹, Raymond K. Wong¹•Institutions (1)

University of New South Wales¹

01 Apr 2009-Journal of Database Management

TL;DR: This article considers the problem of filtering streaming XML data efficiently against a large number of branch XPath queries, and presents how to efficiently return all matching elements for each matching branch query.

Abstract: Efficient XML filtering has been a fundamental technique in recent Web service and XML publish/subscribe applications. In this article, we consider the problem of filtering streaming XML data efficiently against a large number of branch XPath queries. To improve the performance of XML filtering, branch queries are grouped into similar queries, and the common paths between queries in the same group are identified. After performing structural matching of queries, queries are organized in a way that multiple queries can be evaluated simultaneously in the post-processing phase. In the post-processing phase, join operations are executed in a pipeline fashion, and intermediate join results are shared amongst the queries in the same group. As a result, the total number of join operations performed in the post-processing phase is significantly reduced. In addition, we also present how to efficiently return all matching elements for each matching branch query. Experiments show that our proposal is efficient and scalable compared to previous work.
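The benefit of identifying common paths between queries can be seen in a much simpler setting than the one the abstract addresses. The sketch below handles only linear, child-axis paths (no branches, no joins, no post-processing phase), so it is not the authors' method: it stores hypothetical queries in a trie so that shared prefixes are matched once, then walks the document with the standard library's streaming parser. The query identifiers and paths are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def build_trie(queries):
    """Merge paths such as '/feed/item/title' into a trie keyed by element name."""
    root = {"children": {}, "ids": []}
    for query_id, path in queries.items():
        node = root
        for step in path.strip("/").split("/"):
            node = node["children"].setdefault(step, {"children": {}, "ids": []})
        node["ids"].append(query_id)  # query ends at this trie node
    return root


def filter_stream(xml_bytes, queries):
    """Return the ids of all queries matched by the streamed document."""
    stack = [build_trie(queries)]  # trie node for each open element, or None
    matched = set()
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end"))
    for event, element in events:
        if event == "start":
            parent = stack[-1]
            node = parent["children"].get(element.tag) if parent is not None else None
            if node is not None:
                matched.update(node["ids"])
            stack.append(node)
        else:
            stack.pop()
            element.clear()  # free memory for elements already processed
    return matched


queries = {
    "q1": "/feed/item/title",
    "q2": "/feed/item/price",
    "q3": "/feed/author",
}
```

Here `q1` and `q2` share the prefix `/feed/item`, so each `item` element advances both queries with a single dictionary lookup.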

Proceedings Article•10.1109/ICDE.2009.18•

A Decade of XML Data Management: An Industrial Experience Report from Oracle

Zhen Hua Liu¹, Ravi Murthy¹•Institutions (1)

Oracle Corporation¹

29 Mar 2009

TL;DR: Shows the value of managing XML in databases, along with the current challenges and improvements that will hopefully promote future research directions, and provides a timely checkpoint of XML data management from an industrial perspective, drawing on experience of developing and supporting Oracle XML products.

Abstract: XML and its related technologies have now been in use for almost a decade. There has been a considerable amount of effort from both research and industry focusing on XML, XQuery/XPath, XSLT and SQL/XML processing in the database. Many research prototypes and industrial products have been built to satisfy the XML use cases. This paper reviews several use cases where XML databases are leveraged to build real-world XML applications. We discuss the lessons learnt in supporting both data-centric and document-centric XMLDB applications within a single database system and the need for the implementation of different XML storage, index and query optimisation techniques for different XML use cases. We show the value of managing XML in databases, the current challenges and improvements that will hopefully promote future research directions. This paper also provides a timely checkpoint of XML data management from an industrial perspective with experience of developing and supporting Oracle XML products.

Journal Article•10.1145/1519103.1519115•

Modeling and querying probabilistic XML data

Benny Kimelfeld¹, Yehoshua Sagiv²•Institutions (2)

IBM¹, Hebrew University of Jerusalem²

20 Mar 2009

TL;DR: It turns out that efficient evaluation of a large class of queries is realizable in models where distributional nodes are probabilistically independent; allowing approximate query answers makes the evaluation of twig patterns with projection tractable in the most expressive family of p-documents, among those considered.

Abstract: We survey recent results on modeling and querying probabilistic XML data. The literature contains a plethora of probabilistic XML models [2, 13, 14, 18, 21, 24, 27], and most of them can be represented by means of p-documents [18] that have, in addition to ordinary nodes, distributional nodes that specify the probabilistic process of generating a random document. The above models are families of p-documents that differ in the types of distributional nodes in use. The focus of this survey is on the tradeoff between the ability to express real-world probabilistic data (in particular, by taking correlations between atomic events into account) and the efficiency of query evaluation. We concentrate on two important issues. The first is the ability to efficiently translate a p-document of one family into that of another. The second is the complexity of query evaluation over p-documents (under the usual semantics of querying probabilistic data, e.g., [4, 9, 10]). It turns out that efficient evaluation of a large class of queries (i.e., twig patterns with projection and aggregate functions) is realizable in models where distributional nodes are probabilistically independent. In other models, the evaluation of a query with projection is very often intractable. In comparison, very simple conjunctive queries are intractable over probabilistic models of relational databases, even when the tuples are probabilistically independent [9, 10]. To handle the limitation exhibited by the above tradeoff, various approaches have been proposed. The first is to allow query answers to be approximate [18], which makes the evaluation of twig patterns with projection tractable in the most expressive family of p-documents, among those considered. This tractability, however, does not carry over to nonmonotonic queries, such as twig patterns with negation or aggregation. The approach presented in [7]

Reverse Engineering from an XML Document into an Extended DTD Graph.

Herbert Shiu¹, Joseph Fong¹•Institutions (1)

City University of Hong Kong¹

01 Jan 2009

TL;DR: A systematic approach to reverse engineer arbitrary XML documents to their conceptual schema, an extended DTD graph (a DTD graph with data semantics), which not only determines the structure of the XML document, but also derives candidate data semantics from the XML element instances.

Abstract: Extensible markup language (XML) has become a standard for persistent storage and data interchange via the Internet due to its openness, self-descriptiveness, and flexibility. This article proposes a systematic approach to reverse engineer arbitrary XML documents to their conceptual schema, an extended DTD graph, which is a DTD graph with data semantics. The proposed approach not only determines the structure of the XML document, but also derives candidate data semantics from the XML element instances by treating each XML element instance as a record in a table of a relational database. One application of the determined data semantics is to verify the linkages among elements. Implicit and explicit referential linkages among XML elements are modeled by the parent-children structure and ID/IDREF(S), respectively. As a result, an arbitrary XML document can be reverse engineered into its conceptual schema in an extended DTD graph format.
