Top 462 papers published on the topic of Efficient XML Interchange in 2009

Showing papers on "Efficient XML Interchange" published in 2009

Proceedings Article•

Comparison of JSON and XML Data Interchange Formats: A Case Study.

Nurzhan Nurseitov, Michael Paulson, Randall Reynolds, Clemente Izurieta¹•Institutions (1)

1 Jan 2009

TL;DR: This paper compares two data interchange formats currently used by industry applications, XML and JSON, finds that JSON is significantly faster than XML, and further records other resource-related metrics in the results.

Abstract: This paper compares two data interchange formats currently used by industry applications; XML and JSON. The choice of an adequate data interchange format can have significant consequences on data transmission rates and performance. We describe the language specifications and their respective setting of use. A case study is then conducted to compare the resource utilization and the relative performance of applications that use the interchange formats. We find that JSON is significantly faster than XML and we further record other resource-related metrics in our results.
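
As a rough illustration of the kind of comparison this abstract describes, the following minimal sketch serializes one record in both formats and round-trips it. The sample record and the helper names (`to_json`, `to_xml`, `from_xml`) are assumptions made for this example and are not taken from the paper's case study; it compares only encoded size, not the timing and resource metrics the paper reports.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample record; not taken from the paper's case study.
record = {"id": "42", "name": "Ada", "city": "Bozeman"}


def to_json(rec):
    """Serialize the record as a compact JSON object."""
    return json.dumps(rec, separators=(",", ":"))


def to_xml(rec):
    """Serialize the same record as one XML element with a child per field."""
    root = ET.Element("record")
    for key, value in rec.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Parse the XML form back into a flat dict of tag -> text."""
    return {child.tag: child.text for child in ET.fromstring(text)}


json_text = to_json(record)
xml_text = to_xml(record)

# Both encodings carry the same data; only the markup overhead differs.
print("JSON characters:", len(json_text))
print("XML characters: ", len(xml_text))
```

A timing comparison could be layered on top with the standard `timeit` module, but the numbers would depend on the parser, the data, and the machine.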

347 citations

Proceedings Article•10.1109/ICDE.2009.16•

Effective XML Keyword Search with Relevance Oriented Ranking

Zhifeng Bao¹, Tok Wang Ling¹, Bo Chen¹, Jiaheng Lu²•Institutions (2)

National University of Singapore¹, Renmin University of China²

29 Mar 2009

TL;DR: This paper designs novel formulae to identify the search for nodes and search via nodes of a query, and presents a novel XML TF*IDF ranking strategy to rank the individual matches of all possible search intentions.

Abstract: Inspired by the great success of information retrieval (IR) style keyword search on the web, keyword search on XML has emerged recently. The difference between text databases and XML databases results in three new challenges: (1) Identify the user search intention, i.e. identify the XML node types that the user wants to search for and search via. (2) Resolve keyword ambiguity problems: a keyword can appear as both a tag name and a text value of some node; a keyword can appear as the text values of different XML node types and carry different meanings. (3) As the search results are sub-trees of the XML document, a new scoring function is needed to estimate their relevance to a given query. However, existing methods cannot resolve these challenges, and thus return low result quality in terms of query relevance. In this paper, we propose an IR-style approach which basically utilizes the statistics of underlying XML data to address these challenges. We first propose specific guidelines that a search engine should meet in both search intention identification and relevance oriented ranking for search results. Then based on these guidelines, we design novel formulae to identify the search for nodes and search via nodes of a query, and present a novel XML TF*IDF ranking strategy to rank the individual matches of all possible search intentions. Lastly, the proposed techniques are implemented in an XML keyword search engine called XReal, and extensive experiments show the effectiveness of our approach.

197 citations

Book Chapter•10.1007/978-3-642-04985-9_26•

Rules and Norms: Requirements for Rule Interchange Languages in the Legal Domain

Thomas F. Gordon¹, Guido Governatori², Antonino Rotolo³•Institutions (3)

Fokus¹, NICTA², University of Bologna³

4 Nov 2009

TL;DR: The requirements for rule interchange languages for applications in the legal domain are summarized and used to evaluate RuleML, SBVR, SWRL and RIF, and the Legal Knowledge Interchange Format (LKIF) is presented.

Abstract: In this survey paper we summarize the requirements for rule interchange languages for applications in the legal domain and use these requirements to evaluate RuleML, SBVR, SWRL and RIF. We also present the Legal Knowledge Interchange Format (LKIF), a new rule interchange format developed specifically for applications in the legal domain.

122 citations

Journal Article•10.1109/SURV.2009.090302•

XML and Web Services Security Standards

Nils Agne Nordbotten

01 Jul 2009-IEEE Communications Surveys and Tutorials

TL;DR: This paper provides a tutorial on current security standards for XML and Web services and discusses standards including XML Signature, XML Encryption, the XML Key Management Specification (XKMS), WS-Security, WS-Trust, WS-SecureConversation, Web Services Policy, and the Security Assertion Markup Language (SAML).

Abstract: XML and Web services are widely used in current distributed systems. The security of the XML based communication, and the Web services themselves, is of great importance to the overall security of these systems. Furthermore, in order to facilitate interoperability, the security mechanisms should preferably be based on established standards. In this paper we provide a tutorial on current security standards for XML and Web services. The discussed standards include XML Signature, XML Encryption, the XML Key Management Specification (XKMS), WS-Security, WS-Trust, WS-SecureConversation, Web Services Policy, WS-SecurityPolicy, the eXtensible Access Control Markup Language (XACML), and the Security Assertion Markup Language (SAML).

106 citations

Journal Article•10.1016/J.COSREV.2009.03.001•

Survey: An overview on XML similarity: Background, current trends and future directions

Joe Tekli¹, Richard Chbeir¹, Kokou Yetongnon¹•Institutions (1)

Centre national de la recherche scientifique¹

01 Aug 2009-Computer Science Review

TL;DR: This paper provides an overview of XML similarity/comparison by presenting existing research related to XML similarity and detailing the possible applications of XML comparison processes in various fields, ranging over data warehousing, data integration, classification/clustering and XML querying.

94 citations

Proceedings Article•10.1145/1559845.1559921•

DDE: from dewey to a fully dynamic XML labeling scheme

Liang Xu¹, Tok Wang Ling¹, Huayu Wu¹, Zhifeng Bao¹•Institutions (1)

National University of Singapore¹

29 Jun 2009

TL;DR: This paper proposes a novel labeling scheme called DDE (for Dynamic DEwey), tailored for both static and dynamic XML documents, which can completely avoid re-labeling and whose label quality is the most resilient to the number and order of insertions compared to the existing approaches.

Abstract: Labeling schemes lie at the core of query processing for many XML database management systems. Designing labeling schemes for dynamic XML documents is an important problem that has received a lot of research attention. Existing dynamic labeling schemes, however, often sacrifice query performance and introduce additional labeling cost to facilitate arbitrary updates even when the documents actually seldom get updated. Since the line between static and dynamic XML documents is often blurred in practice, we believe it is important to design a labeling scheme that is compact and efficient regardless of whether the documents are frequently updated or not. In this paper, we propose a novel labeling scheme called DDE (for Dynamic DEwey) which is tailored for both static and dynamic XML documents. For static documents, the labels of DDE are the same as those of dewey which yield compact size and high query performance. When updates take place, DDE can completely avoid re-labeling and its label quality is most resilient to the number and order of insertions compared to the existing approaches. In addition, we introduce Compact DDE (CDDE) which is designed to optimize the performance of DDE for insertions. Both DDE and CDDE can be incorporated into existing systems and applications that are based on dewey labeling scheme with minimum efforts. Experiment results demonstrate the benefits of our proposed labeling schemes over the previous approaches.
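
The abstract notes that, for static documents, DDE labels coincide with Dewey labels. The sketch below illustrates only that static Dewey baseline: each node's label is its parent's label plus its sibling position, so structural relationships reduce to prefix and ordering tests on the labels. The function names and the sample document are assumptions for this example, and DDE's dynamic insertion rules are deliberately not implemented here.

```python
import xml.etree.ElementTree as ET


def dewey_labels(root):
    """Map Dewey labels to tags: the root is (1,), and the k-th child of a
    node appends k to its parent's label."""
    labels = {}

    def visit(node, label):
        labels[label] = node.tag
        for k, child in enumerate(node, start=1):
            visit(child, label + (k,))

    visit(root, (1,))
    return labels


def is_ancestor(a, b):
    """a is a proper ancestor of b iff a is a proper prefix of b."""
    return len(a) < len(b) and b[:len(a)] == a


def is_parent(a, b):
    """a is the parent of b iff a is b with its last component removed."""
    return len(a) + 1 == len(b) and b[:len(a)] == a


def precedes(a, b):
    """Document (preorder) order is lexicographic order on the labels."""
    return a < b


# Hypothetical sample document used only for illustration.
doc = ET.fromstring("<lib><book><title/><author/></book><journal/></lib>")
for label, tag in sorted(dewey_labels(doc).items()):
    print(".".join(map(str, label)), tag)
```

With plain Dewey labels, inserting a new sibling between two existing ones can force later siblings and their subtrees to be relabeled, which is the cost the abstract says DDE avoids.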

92 citations

Journal Article•10.1145/1620585.1620590•

Static analysis of active XML systems

Serge Abiteboul¹, Luc Segoufin², Victor Vianu³•Institutions (3)

French Institute for Research in Computer Science and Automation¹, École normale supérieure de Cachan², University of California, San Diego³

14 Dec 2009-ACM Transactions on Database Systems

TL;DR: This article focuses on the verification of temporal properties of runs of Active XML systems, specified in a tree-pattern-based temporal logic, Tree-LTL, which allows expressing a rich class of semantic properties of the application.

Abstract: Active XML is a high-level specification language tailored to data-intensive, distributed, dynamic Web services. Active XML is based on XML documents with embedded function calls. The state of a document evolves depending on the result of internal function calls (local computations) or external ones (interactions with users or other services). Function calls return documents that may be active, and so may activate new subtasks. The focus of this article is on the verification of temporal properties of runs of Active XML systems, specified in a tree-pattern-based temporal logic, Tree-LTL, which allows expressing a rich class of semantic properties of the application. The main results establish the boundary of decidability and the complexity of automatic verification of Tree-LTL properties.

59 citations

XML Metadata Interchange.

Michael Weiss

1 Jan 2009

TL;DR: A semiconductor device and a method for fabricating it are disclosed that can keep the threshold voltage constant despite decreased channel width; the device includes first and second conductive type wells in a substrate, gate electrodes on the gate insulating films that are each doped with the conductive type opposite to their well except at their edges in the channel width direction, which are counter doped, and isolating regions formed between the wells, the gate insulating films, and the gate electrodes.

Abstract: Semiconductor device and method for fabricating the same, is disclosed, which can maintain a threshold voltage constant despite of decreased channel width, the device including a first, and a second conductive type wells in a substrate, a first, and a second gate insulating films on the first, and the second conductive type wells, a first gate electrode on the first gate insulating film, the first gate electrode being doped with a second conductive type except for edges of the first gate electrode in a channel width direction counter doped with a first conductive type, a second gate electrode on the second gate insulating film, the second gate electrode being doped with a first conductive type except for edges of the second gate electrode in a channel width direction counter doped with a second conductive type, and isolating regions formed between the first, and second conductive type wells, the first, and second gate insulating films, and the first, and second gate electrodes.

51 citations

Journal Article•10.6688/JISE.2009.25.5.7•

Measuring and Evaluating a Design Complexity Metric for XML Schema Documents

Dilek Basci, Sanjay Misra

01 Sep 2009-Journal of Information Science and Engineering

TL;DR: An attempt has been made to evaluate the quality of XML schema documents (XSD) written in W3C XML Schema language with a metric, which measures the complexity due to the internal architecture of XSD components, and due to recursion.

Abstract: The eXtensible Markup Language (XML) has been gaining extraordinary acceptance from many diverse enterprise software companies for their object repositories, data interchange, and development tools. Further, many different domains, organizations and content providers have been publishing and exchanging information via the internet using XML and standard schemas. Efficient implementation of XML in these domains requires well designed XML schemas. From this point of view, the design of XML schemas plays an extremely important role in the software development process and needs to be quantified for ease of maintainability. In this paper, an attempt has been made to evaluate the quality of XML schema documents (XSD) written in W3C XML Schema language. We propose a metric, which measures the complexity due to the internal architecture of XSD components, and due to recursion. This is a single metric that covers all major factors responsible for the complexity of XSD. The metric has been empirically and theoretically validated, demonstrated with examples and supported by comparison with other well known structure metrics applied on XML schema documents.

44 citations

Patent•

XML-based web feed for web access of remote resources

Kevin Scott London¹, Ido Ben-Shachar¹, Ray Reskusich¹, Erdogan Ersev Samim¹, Howe Travis¹•Institutions (1)

Microsoft¹

30 Jan 2009

TL;DR: Techniques for XML (Extensible Markup Language) web feeds for web access of remote resources are described, including a method that obtains information regarding one or more available resources from one or more resource hosts, renders that information into an XML document, and provides the XML document to a user device.

Abstract: Techniques for XML (Extensible Markup Language) web feeds for web access of remote resources are described. In one embodiment, a method includes obtaining information regarding one or more available resources from one or more resource hosts, rendering the information regarding one or more available resources into an Extensible Markup Language (XML) document, and providing the XML document to a user device.

41 citations

Journal Article•10.1145/1815918.1815924•

XML: some papers in a haystack

Mirella M. Moro¹, Vanessa Braganholo², Carina F. Dorneles³, Denio Duarte, Renata Galante⁴, Ronaldo dos Santos Mello⁵•Institutions (5)

Universidade Federal de Minas Gerais¹, Federal University of Rio de Janeiro², Universidade de Passo Fundo³, Universidade Federal do Rio Grande do Sul⁴, Universidade Federal de Santa Catarina⁵

26 Oct 2009

TL;DR: This paper presents some of the research topics on XML, namely: XML on relational databases, query processing, views, data matching, and schema evolution, and summarizes some (some!) of the most relevant or traditional papers on those subjects.

Abstract: XML has been explored by both research and industry communities. More than 5500 papers were published on different aspects of XML. With so many publications, it is hard for someone to decide where to start. Hence, this paper presents some of the research topics on XML, namely: XML on relational databases, query processing, views, data matching, and schema evolution. It then summarizes some (some!) of the most relevant or traditional papers on those subjects.

Proceedings Article•

Boosting XML Filtering with a Scalable FPGA-based Architecture

Abhishek Mitra¹, Marcos R. Vieira¹, Petko Bakalov¹, Walid Najjar¹, Vassilis J. Tsotras¹•Institutions (1)

University of California, Riverside¹

1 Dec 2009

TL;DR: This work proposes a "pure hardware" based solution, which utilizes XPath query blocks on FPGA to solve the filtering problem and achieves drastically better throughput than the existing software or mixed (hardware/software) architectures.

Abstract: The growing amount of XML encoded data exchanged over the Internet increases the importance of XML based publish-subscribe (pub-sub) and content based routing systems. The input in such systems typically consists of a stream of XML documents and a set of user subscriptions expressed as XML queries. The pub-sub system then filters the published documents and passes them to the subscribers. Pub-sub systems are characterized by very high input ratios, therefore the processing time is critical. In this paper we propose a "pure hardware" based solution, which utilizes XPath query blocks on FPGA to solve the filtering problem. By utilizing the high throughput that an FPGA provides for parallel processing, our approach achieves drastically better throughput than the existing software or mixed (hardware/software) architectures. The XPath queries (subscriptions) are translated to regular expressions which are then mapped to FPGA devices. By introducing stacks within the FPGA we are able to express and process a wide range of path queries very efficiently, on a scalable environment. Moreover, the fact that the parser and the filter processing are performed on the same FPGA chip eliminates expensive communication costs (that a multi-core system would need), thus enabling very fast and efficient pipelining. Our experimental evaluation reveals more than one order of magnitude improvement compared to traditional pub/sub systems.
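
The abstract mentions translating XPath subscriptions into regular expressions. The following software-only sketch shows one plausible way such a translation can work for a very small XPath subset (child `/` and descendant `//` steps with element names or `*`), matching the regex against root-to-element tag paths. The function names, the supported subset, and the sample document are assumptions for this example; it is not the paper's FPGA design and says nothing about how the hardware mapping is done.

```python
import re
import xml.etree.ElementTree as ET

# One location step: an axis ('/' or '//') followed by a name test or '*'.
STEP = re.compile(r"(//|/)([A-Za-z_][\w.-]*|\*)")


def xpath_to_regex(xpath):
    """Translate a simple XPath into a regex over tag paths like '/a/b/c'."""
    steps = STEP.findall(xpath)
    # Reject anything outside the tiny subset (predicates, attributes, ...).
    if not steps or "".join(axis + name for axis, name in steps) != xpath:
        raise ValueError(f"unsupported XPath for this sketch: {xpath!r}")
    pattern = ""
    for axis, name in steps:
        if axis == "//":
            pattern += r"(?:/[^/]+)*"  # zero or more intermediate elements
        pattern += "/" + (r"[^/]+" if name == "*" else re.escape(name))
    return re.compile(pattern)


def element_paths(root):
    """Yield the root-to-element tag path of every element in the tree."""
    stack = [(root, "/" + root.tag)]
    while stack:
        node, path = stack.pop()
        yield path
        for child in node:
            stack.append((child, path + "/" + child.tag))


def matches(xml_text, xpath):
    """True if at least one element of the document satisfies the XPath."""
    regex = xpath_to_regex(xpath)
    root = ET.fromstring(xml_text)
    return any(regex.fullmatch(path) for path in element_paths(root))


# Hypothetical published document and subscriptions.
doc = "<feed><item><price>3</price></item></feed>"
for subscription in ["/feed//price", "/feed/price", "//item/*"]:
    print(subscription, matches(doc, subscription))
```

A filter built this way would forward a document to every subscriber whose pattern matches at least one of its element paths.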

Journal Article•10.1016/J.JSS.2009.01.007•

Extending path summary and region encoding for efficient structural query processing in native XML databases

Su-Cheng Haw¹, Chien-Sing Lee¹•Institutions (1)

Multimedia University¹

01 Jun 2009-Journal of Systems and Software

TL;DR: TwigX-Guide is presented, a hybrid system, which takes advantage of the beautiful features of path summary in DataGuide and region encoding in TwigStack to improve complex query processing.

Journal Article•10.4103/0256-4602.49086•

Node Labeling Schemes in XML Query Optimization: A Survey and Trends

Haw Su-Cheng¹, Lee Chien-Sing¹•Institutions (1)

Multimedia University¹

01 Jan 2009-IETE Technical Review

TL;DR: This paper analyzes how each approach to labeling schemes works, as well as its advantages and disadvantages, and discusses some of the current trends in labeling methods, which indicate a clear shift towards hybrid approaches.

Abstract: With the rapid emergence of XML as a data exchange and data transfer medium over the Web, querying XML data has become a major concern. Labeling schemes have been developed to optimize query retrieval, since they provide a quick way to determine the type of relationships that are present among the nodes. In this paper, we analyze how each approach works, as well as its advantages and disadvantages. In addition, we discuss some of the current trends in labeling methods, which indicate a clear shift towards hybrid approaches. Hybrid systems open the possibility of balancing one technology’s weakness with another technology’s strengths.

Journal Article•10.1016/J.INS.2009.06.025•

SAIL: Structure-aware indexing for effective and progressive top-k keyword search over XML documents

Guoliang Li¹, Chen Li², Jianhua Feng¹, Lizhu Zhou¹•Institutions (2)

Tsinghua University¹, University of California, Irvine²

01 Oct 2009-Information Sciences

TL;DR: A novel method called SAIL is developed to index the structural relationships embedded in XML documents to facilitate the processing of keyword queries, with structure-aware indices devised to maintain these relationships for efficiently identifying the minimal-cost trees.

Book Chapter•10.1007/978-3-642-00887-0_55•

Materialized View Selection in XML Databases

Nan Tang, Jeffrey Xu Yu¹, Hao Tang², M. Tamer Özsu³, Peter Boncz•Institutions (3)

The Chinese University of Hong Kong¹, Renmin University of China², University of Waterloo³

16 Mar 2009

TL;DR: To facilitate intuitionistic view selection, a view graph is presented to structurally maintain all generated views and two view selection strategies are proposed, targeting at space-optimized and space-time tradeoff, respectively.

Abstract: Materialized views, an RDBMS silver bullet, demonstrate their efficacy in many applications, especially as a data warehousing/decision support system tool. The pivot of playing materialized views efficiently is view selection. Though studied for over thirty years in RDBMS, the selection is hard to make in the context of XML databases, where both the semi-structured data and the expressiveness of XML query languages add challenges to the view selection problem. We start our discussion on producing minimal XML views (in terms of size) as candidates for a given workload (a query set). To facilitate intuitionistic view selection, we present a view graph (called vcube) to structurally maintain all generated views. By basing our selection on vcube for materialization, we propose two view selection strategies, targeting at space-optimized and space-time tradeoff, respectively. We built our implementation on top of Berkeley DB XML, demonstrating that significant performance improvement could be obtained using our proposed approaches.

Book Chapter•10.1007/978-3-642-00958-7_63•

Refining Keyword Queries for XML Retrieval by Combining Content and Structure

Desislava Petkova¹, W. Bruce Croft¹, Yanlei Diao¹•Institutions (1)

University of Massachusetts Amherst¹

18 Apr 2009

TL;DR: This work proposes an automatic query refinement method to transform a keyword query into structured XML queries that capture the original information need and conform to the underlying XML data.

Abstract: The structural heterogeneity and complexity of XML repositories makes query formulation challenging for users who have little knowledge of XML. To assist its users, an XML retrieval system can have a keyword-based interface, relegating the task of combining textual and structural clues to the retrieval algorithm. In this work, we propose an automatic query refinement method to transform a keyword query into structured XML queries that capture the original information need and conform to the underlying XML data. We formulate query generation as a search problem, and show the effectiveness of the method in generating accurate content-and-structure queries.

Journal Article•10.1109/TKDE.2009.26•

Locating XML Documents in a Peer-to-Peer Network Using Distributed Hash Tables

Praveen Rao¹, Bongki Moon²•Institutions (2)

University of Missouri–Kansas City¹, University of Arizona²

01 Dec 2009-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This paper addresses the problem of efficiently locating relevant XML documents in a P2P network, where a user poses queries in a language such as XPath, and develops a new system called psiX that runs on top of an existing distributed hashing framework.

Abstract: One of the key challenges in a peer-to-peer (P2P) network is to efficiently locate relevant data sources across a large number of participating peers. With the increasing popularity of the extensible markup language (XML) as a standard for information interchange on the Internet, XML is commonly used as an underlying data model for P2P applications to deal with the heterogeneity of data and enhance the expressiveness of queries. In this paper, we address the problem of efficiently locating relevant XML documents in a P2P network, where a user poses queries in a language such as XPath. We have developed a new system called psiX that runs on top of an existing distributed hashing framework. Under the psiX system, each XML document is mapped into an algebraic signature that captures the structural summary of the document. An XML query pattern is also mapped into a signature. The query's signature is used to locate relevant document signatures. Our signature scheme supports holistic processing of query patterns without breaking them into multiple path queries and processing them individually. The participating peers in the network collectively maintain a collection of distributed hierarchical indexes for the document signatures. Value indexes are built to handle numeric and textual values in XML documents. These indexes are used to process queries with value predicates. Our experimental study on PlanetLab demonstrates that psiX provides an efficient location service in a P2P network for a wide variety of XML documents.

Journal Article•10.1016/J.DATAK.2008.09.001•

S3: Evaluation of tree-pattern XML queries supported by structural summaries

Sayyed Kamyar Izadi¹, Theo Härder², Mostafa S. Haghjoo¹•Institutions (2)

Iran University of Science and Technology¹, Kaiserslautern University of Technology²

1 Jan 2009

TL;DR: In this paper, a novel method is proposed, called S^3, which can selectively process the document's nodes and substantially outperform previous QTP processing methods w.r.t. response time, I/O overhead, and memory consumption - critical parameters in any real multi-user environment.

Abstract: XML queries are frequently based on path expressions where their elements are connected to each other in a tree-pattern structure, called query tree pattern (QTP). Therefore, a key operation in XML query processing is finding those elements which match the given QTP. In this paper, we propose a novel method, called S^3, which can selectively process the document's nodes. In S^3, unlike all previous methods, path expressions are not directly executed on the XML document, but first they are evaluated against a guidance structure, called QueryGuide. Enriched by information extracted from the QueryGuide, a query execution plan, called SMP, is generated to provide focused pattern matching and avoid document access as far as possible. Moreover, our experimental results confirm that S^3 and its optimized version OS^3 substantially outperform previous QTP processing methods w.r.t. response time, I/O overhead, and memory consumption - critical parameters in any real multi-user environment.

Publishing XBRL as Linked Open Data

Roberto García, Rosa Gil

1 Jan 2009

TL;DR: This work maps the XBRL filings available from the SEC's EDGAR program to Resource Description Framework (RDF), and the XML Schema taxonomies these filings are based on to Web Ontology Language (OWL).

Abstract: The eXtensible Business Reporting Language (XBRL) is a standard for business and financial information reporting. It is based on XML, so instance documents based on XBRL, e.g. a quarterly report, are highly constrained by the XML document-oriented nature. This makes it more difficult to perform queries that mix information from filings from different dates, companies, or accounting principles than with a formalism based on a graph model instead of a tree model. Semantic Web technologies provide a graph model that facilitates mashing-up different XBRL sources. We have put this approach into practice by mapping the XBRL filings available from the SEC's EDGAR program to Resource Description Framework (RDF) and the XML Schema taxonomies these filings are based on to Web Ontology Language (OWL). The resulting semantic metadata, though highly tied to the XML structure it is mapped from, benefits from Semantic Web technologies and tools in order to facilitate integration and cross-querying, even together with other parts of the Web of Linked Data.

Proceedings Article•10.1109/ICME.2009.5202458•

Efficient XML Interchange for rich internet applications

Daniel Peintner¹, Harald Kosch¹, Jörg Heuer²•Institutions (2)

University of Passau¹, Siemens²

28 Jun 2009

TL;DR: This paper evaluates the efficient access of rich internet applications, especially in the domain of resource-limited embedded devices such as digital picture frames, and proposes a generic adaptation of the EXI format to further increase efficiency.

Abstract: The tremendous acceptance of web applications, or more specifically, rich media applications, is about to be extended to embedded devices such as mobile phones, digital picture frames or TV sets. The Extensible Markup Language (XML) is one important pillar when we deal with such internet applications. XML is known as the interchange language of the web. Besides its outstanding features, in the domain of embedded devices, XML is difficult to handle due to the processing overhead and the verbosity associated with its use. This paper evaluates the efficient access of rich internet applications, especially in the domain of resource-limited embedded devices such as digital picture frames. For this purpose, typical XML source information models for rich internet applications, such as Silverlight and SVG, are evaluated. In this context, the new Efficient XML Interchange (EXI) format is applied and studied. Finally, a generic adaptation of the EXI format is developed to further increase efficiency. The paper concludes with a proposal for further studies on an integration of EXI-based typed interfaces to reduce the processing complexity for rich media applications on embedded devices.

Mining tree-based association rules from XML documents.

Mirjana Mazuran¹, Elisa Quintarelli¹, Letizia Tanca¹•Institutions (1)

Polytechnic University of Milan¹

1 Jan 2009

TL;DR: This work describes an approach to extract tree-based association rules from XML documents; such rules provide approximate, intensional information on both the structure and the content of XML documents and can be stored in XML format to be queried later on.

Abstract: The increasing amount of very large XML datasets available to casual users is a most challenging problem for our community, and calls for an appropriate support to efficiently gather knowledge from these data. Data mining, already widely applied to extract frequent correlations of values from both structured and semi-structured datasets, is the appropriate tool for knowledge elicitation. In this work we describe an approach to extract Tree-based association rules from XML documents. Such rules provide approximate, intensional information on both the structure and the content of XML documents, and can be stored in XML format to be queried later on. The mined knowledge is used to provide: (i) quick, approximate answers to queries and (ii) information about structural regularities. A prototype system demonstrates the effectiveness of the approach.

Book Chapter•10.1007/978-3-642-00887-0_44•

Hash-Search: An Efficient SLCA-Based Keyword Search Algorithm on XML Documents

Weiyan Wang¹, Xiaoling Wang², Aoying Zhou²•Institutions (2)

Fudan University¹, East China Normal University²

16 Mar 2009

TL;DR: This paper designs an efficient index and proposes a hash-based method to answer SLCA-based keyword search queries; the approach outperforms the Incremental Multiway-SLCA approach, described as the most efficient algorithm in the literature.

Abstract: XML is a de-facto standard for exchanging and presenting information, and keyword search over XML documents has become an interesting topic. However, semi-structured XML data give rise to many challenges for conventional information retrieval technologies. In order to return highly-related data nodes and improve the quality of keyword search results, SLCA (Smallest Lowest Common Ancestor)-based keyword search on XML data has recently been attracting more and more attention in the database community. In this paper, we design an efficient index and propose a hash-based method to answer SLCA-based keyword search queries. Our approach outperforms the Incremental Multiway-SLCA approach, which is the most efficient algorithm in the literature. We demonstrate the effectiveness of our algorithms analytically and experimentally.
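
To make the SLCA notion in this abstract concrete, here is a brute-force sketch: nodes are identified by Dewey-style label tuples, the lowest common ancestor of a set of nodes is the longest common prefix of their labels, and an SLCA is such an ancestor with no proper descendant that also covers every keyword. This is a naive reference computation under those assumptions, with hypothetical function names and sample labels; it is not the paper's hash-based algorithm or its index.

```python
from itertools import product


def lca(labels):
    """Lowest common ancestor = longest common prefix of the Dewey labels."""
    prefix = []
    for parts in zip(*labels):
        if len(set(parts)) != 1:
            break
        prefix.append(parts[0])
    return tuple(prefix)


def slca(keyword_lists):
    """Brute-force SLCA over one list of matching node labels per keyword.

    Every combination that picks one node per keyword yields a candidate
    LCA; a candidate is kept only if no other candidate lies strictly
    below it in the tree.
    """
    candidates = {lca(combo) for combo in product(*keyword_lists)}

    def has_descendant(a):
        return any(len(b) > len(a) and b[:len(a)] == a for b in candidates)

    return sorted(c for c in candidates if not has_descendant(c))


# Hypothetical match lists: labels of nodes containing each keyword.
xml_matches = [(1, 1, 1), (1, 2, 1)]
search_matches = [(1, 1, 2), (1, 3)]
print(slca([xml_matches, search_matches]))
```

The combination step is exponential in the number of keywords, which is the kind of cost that index-based SLCA algorithms are designed to avoid.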

Proceedings Article•10.1145/1655121.1655129•

The curse of namespaces in the domain of XML signature

Meiko Jensen¹, Lijun Liao¹, Jörg Schwenk¹•Institutions (1)

Ruhr University Bochum¹

13 Nov 2009

TL;DR: It is shown that the interplay of XML Signature, XPath, and the XML namespace concept has severe flaws that can be exploited for an attack, and that XML namespaces in general pose real troubles to digital signatures in the XML domain.

Abstract: The XML signature wrapping attack has been one of the most discussed security issues in the Web Services security community in recent years. Until now, the issue has not been solved, and all countermeasure approaches proposed so far were shown to be insufficient. In this paper, we present yet another way to perform signature wrapping attacks by using the XML namespace injection technique. We show that the interplay of XML Signature, XPath, and the XML namespace concept has severe flaws that can be exploited for an attack, and that XML namespaces in general pose real troubles to digital signatures in the XML domain. Additionally, we present and discuss some new approaches in countering the proposed attack vector.

Patent•

Efficient XML Tree Indexing Structure Over XML Content

Anguel Novoselsky¹, Zhen Hua Liu¹, Thomas Baby¹•Institutions (1)

Business International Corporation¹

30 Oct 2009

TL;DR: A method and apparatus are described for building and using a persistent XML tree index for navigating an XML document; the index is stored separately from the XML document content and can thus optimize performance through the use of fixed-sized index entries.

Abstract: A method and apparatus are provided for building and using a persistent XML tree index for navigating an XML document. The XML tree index is stored separately from the XML document content, and thus is able to optimize performance through the use of fixed-sized index entries. The XML document hierarchy need not be constructed in volatile memory, so creating and using the XML tree index scales even for large documents. To evaluate a path expression including descendent or ancestral syntax, navigation links can be read from persistent storage and used directly to find the nodes specified in the path expression. The use of an abstract navigational interface allows applications to be written that are independent of the storage implementation of the index and the content. Thus, the XML tree index can index documents stored at least in a database, a persistent file system, or as a sequence of in memory.

Book Chapter•10.1007/978-3-642-03555-5_7•

A Data Parallel Algorithm for XML DOM Parsing

Bhavik Shah¹, Praveen Rao¹, Bongki Moon², Mohan Rajagopalan³•Institutions (3)

University of Missouri–Kansas City¹, University of Arizona², Intel³

21 Aug 2009

TL;DR: Through empirical evaluation, it is shown that ParDOM yields better scalability than PXP on commodity multicore processors, and can process a wide variety of XML datasets with complex structures which PXP fails to parse.

Abstract: The extensible markup language XML has become the de facto standard for information representation and interchange on the Internet. XML parsing is a core operation performed on an XML document for it to be accessed and manipulated. This operation is known to cause performance bottlenecks in applications and systems that process large volumes of XML data. We believe that parallelism is a natural way to boost performance. Leveraging multicore processors can offer a cost-effective solution, because future multicore processors will support hundreds of cores, and will offer a high degree of parallelism in hardware. We propose a data parallel algorithm called ParDOM for XML DOM parsing, which builds an in-memory tree structure for an XML document. ParDOM has two phases. In the first phase, an XML document is partitioned into chunks and parsed in parallel. In the second phase, partial DOM node tree structures created during the first phase are linked together (in parallel) to build a complete DOM node tree. ParDOM offers fine-grained parallelism by adopting a flexible chunking scheme: each chunk can contain an arbitrary number of start and end XML tags that are not necessarily matched. ParDOM can be conveniently implemented using a data parallel programming model that supports map and sort operations. Through empirical evaluation, we show that ParDOM yields better scalability than PXP [23], a recently proposed parallel DOM parsing algorithm, on commodity multicore processors. Furthermore, ParDOM can process a wide variety of XML datasets with complex structures which PXP fails to parse.
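
The two-phase idea in this abstract (scan arbitrary chunks independently, then link the pieces) can be illustrated with a heavily simplified Python sketch. This is not ParDOM: phase one here only produces tag events rather than partial DOM trees, and phase two links them sequentially rather than in parallel. It assumes input with no XML declaration, comments, CDATA, or `<`/`>` inside attribute values. All names (`split_chunks`, `scan`, `link`, `parse_chunked`) are hypothetical.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# A start/end/self-closing tag, or a run of character data.
TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^<>]*?(/?)>|([^<]+)")


def split_chunks(xml, n):
    """Cut xml into at most n pieces, moving each cut forward to the next '<'."""
    cuts = [0]
    for i in range(1, n):
        pos = xml.find("<", i * len(xml) // n)
        if pos > cuts[-1]:
            cuts.append(pos)
    cuts.append(len(xml))
    return [xml[a:b] for a, b in zip(cuts, cuts[1:])]


def scan(chunk):
    """Phase 1: turn one chunk into events; tags need not be matched within it."""
    events = []
    for m in TOKEN.finditer(chunk):
        closing, name, self_closing, text = m.groups()
        if text is not None:
            if text.strip():
                events.append(("text", text.strip()))
        elif closing:
            events.append(("end", name))
        else:
            events.append(("start", name))
            if self_closing:
                events.append(("end", name))
    return events


def link(event_lists):
    """Phase 2 (sequential here): replay events in chunk order to build one tree."""
    root = {"tag": None, "children": []}
    stack = [root]
    for events in event_lists:
        for kind, value in events:
            if kind == "start":
                node = {"tag": value, "children": []}
                stack[-1]["children"].append(node)
                stack.append(node)
            elif kind == "end":
                if len(stack) == 1 or stack[-1]["tag"] != value:
                    raise ValueError(f"unmatched end tag: {value}")
                stack.pop()
            else:
                stack[-1]["children"].append(value)
    if len(stack) != 1 or not root["children"]:
        raise ValueError("document is empty or has unclosed elements")
    return root["children"][0]


def parse_chunked(xml, n=4):
    """Scan chunks concurrently, then link the per-chunk event lists."""
    chunks = split_chunks(xml, n)
    with ThreadPoolExecutor() as pool:
        event_lists = list(pool.map(scan, chunks))
    return link(event_lists)


tree = parse_chunked("<a><b>hi</b><c/><d><e>x</e></d></a>", 3)
```

In CPython a thread pool will not speed up this CPU-bound scan because of the global interpreter lock; it is used only to show that chunks can be scanned without knowing their surrounding context.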

Book Chapter•10.1007/978-3-642-00672-2_10•

Processing XML Keyword Search by Constructing Effective Structured Queries

Jianxin Li¹, Chengfei Liu¹, Rui Zhou¹, Bo Ning¹•Institutions (1)

Swinburne University of Technology¹

22 Mar 2009

TL;DR: This paper designs an adaptive XML keyword search approach, called XBridge, that can derive the semantics of a keyword query and generate a set of effective structured queries by analyzing the given keywords and the schemas of XML data sources.

Abstract: Recently, keyword search has attracted a great deal of attention in XML databases. It is hard to directly improve the relevancy of XML keyword search because many keyword-matched nodes may not contribute to the results. To address this challenge, in this paper we design an adaptive XML keyword search approach, called XBridge, that can derive the semantics of a keyword query and generate a set of effective structured queries by analyzing the given keyword query and the schemas of XML data sources. To efficiently answer a keyword query, we only need to evaluate the generated structured queries over the XML data sources with any existing XQuery search engine. In addition, we extend our approach to process top-k keyword search based on the execution plan to be proposed. The quality of the returned answers can be measured using the context of the keyword-matched nodes and the contents of the nodes together. The effectiveness and efficiency of XBridge are demonstrated with an experimental performance study on real XML data.

Proceedings Article•10.1145/1516360.1516402•

Expressive, yet tractable XML keys

Sven Hartmann¹, Sebastian Link²•Institutions (2)

Clausthal University of Technology¹, Victoria University of Wellington²

24 Mar 2009

TL;DR: This work investigates XML keys which uniquely identify XML elements based on a very general notion of value-equality: isomorphic subtrees with the identity on data values, and establishes a sound and complete set of inference rules for this expressive fragment of XML keys.

Abstract: Constraints are important for a variety of XML recommendations and applications. Consequently, there are numerous opportunities for advancing the treatment of XML semantics. In particular, suitable notions of keys will enhance XML's capabilities of modeling, managing and processing native XML data. However, the different ways of accessing and comparing XML elements make it challenging to balance expressiveness and tractability. We investigate XML keys which uniquely identify XML elements based on a very general notion of value-equality: isomorphic subtrees with the identity on data values. Previously, an XML key fragment has been recognised that is robust in the sense that its implication problem can be expressed as the reachability problem in a suitable digraph. We analyse the impact of extending this fragment by structural keys that uniquely identify XML elements independently of any data. We establish a sound and complete set of inference rules for this expressive fragment of XML keys, and encode these rules in an algorithm that decides the associated implication problem in time quadratic in the size of the input keys. Consequently, we gain significant expressiveness without any loss of efficiency in comparison to less expressive XML key fragments.

Proceedings Article•10.1109/GRID.2009.5353070•

Parallel and distributed approach for processing large-scale XML datasets

Zacharia Fadika¹, Michael R. Head¹, Madhusudhan Govindaraju¹•Institutions (1)

Binghamton University¹

11 Dec 2009

TL;DR: This work adapts the Hadoop implementation to determine the threshold data sizes and per-node computation work required for a distributed solution to be effective, and presents both a parallel and a distributed approach to analyze how the scalability and performance requirements of large-scale XML-based data processing can be achieved.

Abstract: An emerging trend is the use of XML as the data format for many distributed scientific applications, with the size of these documents ranging from tens of megabytes to hundreds of megabytes. Our earlier benchmarking results revealed that most of the widely available XML processing toolkits do not scale well for large sized XML data. A significant transformation is necessary in the design of XML processing for scientific applications so that the overall application turn-around time is not negatively affected. We present both a parallel and distributed approach to analyze how the scalability and performance requirements of large-scale XML-based data processing can be achieved. We have adapted the Hadoop implementation to determine the threshold data sizes and computation work required per node, for a distributed solution to be effective. We also present an analysis of parallelism using our Piximal toolkit for processing large-scale XML datasets that utilizes the capabilities for parallelism that are available in the emerging multi-core architectures. Multi-core processors are expected to be widely available in research clusters and scientific desktops, and it is critical to harness the opportunities for parallelism in the middleware, instead of passing on the task to application programmers. Our parallelization approach for a multi-core node is to employ a DFA-based parser that recognizes a useful subset of the XML specification, and convert the DFA into an NFA that can be applied to an arbitrary subset of the input. Speculative NFAs are scheduled on available cores in a node to effectively utilize the processing capabilities and achieve overall performance gains. We evaluate the efficacy of this approach in terms of potential speedup that can be achieved for representative XML data sets.

Proceedings Article•10.1145/1526709.1526857•

Interactive search in XML data

Guoliang Li¹, Jianhua Feng¹, Lizhu Zhou¹•Institutions (1)

Tsinghua University¹

20 Apr 2009

TL;DR: This paper proposes a new information-access paradigm for XML data, called "Inks," in which the system searches on the underlying data "on the fly" as the user types in query keywords, and implements the proposed search algorithm.

Abstract: In a traditional keyword-search system over XML data, a user composes a keyword query, submits it to the system, and retrieves relevant subtrees. In the case where the user has limited knowledge about the data, often the user feels "left in the dark" when issuing queries, and has to use a try-and-see approach for finding information. In this paper, we study a new information-access paradigm for XML data, called "Inks," in which the system searches on the underlying data "on the fly" as the user types in query keywords. Inks extends existing XML keyword search methods by interactively answering queries. We propose effective indices, early-termination techniques, and efficient search algorithms to achieve a high interactive speed. We have implemented our algorithm, and the experimental results show that our method achieves high search efficiency and result quality.
