TL;DR: An extensible programming framework to separate platform-specific optimizations from application codes, and to incrementally improve the performance of an existing application without messing up the code is proposed.
Abstract: This paper proposes an extensible programming framework to separate platform-specific optimizations from application codes. The framework allows programmers to define their own code translation rules for special demands of individual systems, compilers, libraries, and applications. Code translation rules associated with user-defined compiler directives are defined in an external file, and the application code is just annotated by the directives. For code transformations based on the rules, the framework exposes the abstract syntax tree (AST) of an application code as an XML document to expert programmers. Hence, the XML document of an AST can be transformed using any XML-based technologies. Our case studies using real applications demonstrate that the framework is effective to separate platform-specific optimizations from application codes, and to incrementally improve the performance of an existing application without messing up the code.
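As a rough illustration of exposing an AST as an XML document, the sketch below uses Python's standard `ast` and `xml.etree.ElementTree` modules as stand-ins. The framework in the paper targets application codes and user-defined directives that are not described here; `ast_to_xml` is a hypothetical helper name, and the element layout is an assumption made for this example only.

```python
import ast
import xml.etree.ElementTree as ET


def ast_to_xml(node):
    """Convert a Python ast node into an ElementTree element (illustrative only)."""
    elem = ET.Element(type(node).__name__)
    for field, value in ast.iter_fields(node):
        if isinstance(value, ast.AST):
            # A single child node is wrapped in an element named after the field.
            ET.SubElement(elem, field).append(ast_to_xml(value))
        elif isinstance(value, list):
            wrapper = ET.SubElement(elem, field)
            for item in value:
                if isinstance(item, ast.AST):
                    wrapper.append(ast_to_xml(item))
        elif value is not None:
            # Scalar fields (identifiers, constants) become XML attributes.
            elem.set(field, str(value))
    return elem


root = ast_to_xml(ast.parse("y = x * 2"))

# Once the AST is XML, ordinary XML tooling can query it.
names = [n.get("id") for n in root.iter("Name")]
has_mult = root.find("./body/Assign/value/BinOp/op/Mult") is not None
```

Any XML-based technology (XPath queries, XSLT, and so on) could then be applied to `root`; the path expression above only works for the element layout chosen in this sketch.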
TL;DR: A bidirectional XML update language called BIFLUX (BIdirectional FunctionaL Updates for XML), inspired by the FLUX XML update language, is proposed, with a clear and well-behaved bidirectional semantics and a decidable static type system based on regular expression types.
Abstract: Different XML formats are widely used for data exchange and processing, and it is often necessary to convert between them. Standard XML transformation languages, like XSLT or XQuery, are unsatisfactory for this purpose since they require writing a separate transformation for each direction. Existing bidirectional transformation languages aim to cover this gap by allowing programmers to write a single program that denotes both transformations. However, they often 1) induce a more cumbersome programming style than their traditionally unidirectional relatives, to establish the link between source and target formats, and 2) offer limited configurability, by making implicit assumptions about how modifications to both formats should be translated that may not be easy to predict. This paper proposes a bidirectional XML update language called BIFLUX (BIdirectional FunctionaL Updates for XML), inspired by the FLUX XML update language. Our language adopts a novel bidirectional programming by update paradigm, where a program succinctly and precisely describes how to update a source document with a target document, in an intuitive way, such that there is a unique "inverse" source query for each update program. BIFLUX extends FLUX with bidirectional actions that describe the connection between source and target formats. We introduce a core BIFLUX language, with a clear and well-behaved bidirectional semantics and a decidable static type system based on regular expression types.
TL;DR: A novel two-tier index structure is proposed to facilitate access to XML documents in an on-demand broadcast system; it provides the clients with an overall image of all the XML documents available at the server side and hence enables the clients to locate complete result sets accordingly.
Abstract: XML data broadcast is an efficient way to disseminate semistructured information in wireless mobile environments. In this paper, we propose a novel two-tier index structure to facilitate access to XML documents in an on-demand broadcast system. It provides the clients with an overall image of all the XML documents available at the server side and hence enables the clients to locate complete result sets accordingly. A pruning strategy is developed to cut down the index size and a two-tier structure is proposed to further remove any redundant information. In addition, two index distribution strategies, namely naive distribution and partial distribution, have been designed to interleave the index information with the XML documents in the wireless channels. Theoretical analysis and simulation experiments are also put forward to show the benefits of our indexing methods.
TL;DR: This paper presents how to shift from XML encryption to JSON encryption; JSON is a lightweight data format that is interchangeable with a programming language's built-in data structures, which eliminates translation time and reduces complexity and processing time.
Abstract: JavaScript Object Notation (JSON) is a lightweight data-interchange format. It is easy for humans to read and write. Its data format is interchangeable with a programming language's built-in data structures, which eliminates translation time and reduces complexity and processing time. Moreover, JSON has the same strengths as XML. Therefore, it is better to shift from XML security to JSON security. In this paper, we present how to shift from XML encryption to JSON encryption.
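The claim that JSON maps onto a language's built-in data structures can be seen in a small comparison. This is a sketch using Python's standard `json` and `xml.etree.ElementTree` modules; the sample payloads and field names are invented for illustration and do not come from the paper.

```python
import json
import xml.etree.ElementTree as ET

# JSON parses straight into dicts, lists, ints, and strings.
json_text = '{"account": {"id": 42, "tags": ["a", "b"]}}'
data = json.loads(json_text)
json_id = data["account"]["id"]
json_tags = data["account"]["tags"]

# The equivalent XML yields a tree of elements whose text must be
# located and converted by hand.
xml_text = "<account><id>42</id><tags><tag>a</tag><tag>b</tag></tags></account>"
root = ET.fromstring(xml_text)
xml_id = int(root.findtext("id"))
xml_tags = [tag.text for tag in root.find("tags")]
```

Both paths reach the same values, but the XML path needs an explicit type conversion and a traversal step for each field.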
TL;DR: A comparative analysis of the various schemes available to efficiently store and query the temporal and multi-versioned XML documents based on temporal, change management, versioning, and querying support is provided.
Abstract: Extensible Markup Language (XML) documents are associated with time in two ways: (1) XML documents evolve over time and (2) XML documents contain temporal information. The efficient management of the temporal and multi-versioned XML documents requires optimized use of storage and efficient processing of complex historical queries. This paper provides a comparative analysis of the various schemes available to efficiently store and query the temporal and multi-versioned XML documents based on temporal, change management, versioning, and querying support. Firstly, the paper studies the multi-versioning control schemes to detect, manage, and query change in dynamic XML documents. Secondly, it describes the storage structures used to efficiently store and retrieve XML documents. Thirdly, it provides a comparative analysis of the various commercial tools based on change management, versioning, collaborative editing, and validation support. Finally, the paper presents some future research and development directions for the multi-versioned XML documents.
TL;DR: An optimization approach that takes into consideration the semantics of the dataset in order to deal with the complexity of multi-disciplinary domains in Big Data, in particular when the data is represented as XML documents is adopted.
TL;DR: This chapter aims to give a reasonably comprehensive definition and motivation for the various aspects of the generic XML language and also to illustrate these aspects with some existing XML dialects or vocabularies.
Abstract: This chapter aims to give a reasonably comprehensive definition and motivation for the various aspects of the generic XML language and also to illustrate these aspects with some existing XML dialects or vocabularies. We describe elements, attributes, child elements, and the hierarchical structure of XML. We talk about “well-formedness” of an XML document and how to identify errors in a document’s structure. We discuss the use of namespaces and end with a brief discussion of validating documents with respect to DTDs and XML Schema. Readers already familiar with all aspects of XML can skip this chapter and read about the functions used to work with XML in R, which are the subject of each of Chapters 3, 4, 5, and 6.
TL;DR: This paper adopts a novel dynamic encoding scheme which is tailored for both static and dynamic possibilistic XML documents to effectively avoid re-labeling after updates, and proposes an efficient algorithm to handle the problem of dynamic twig queries in possibilistic XML documents.
TL;DR: This paper gives formalized representations of XML data sources, including Document Type Definitions (DTDs), XML Schemas, and XML documents, proposes formal approaches for transforming the XML data sources into ontologies, discusses the correctness of the transformations, and provides several transformation examples.
Abstract: The eXtensible Markup Language (XML) has reached wide acceptance as the relevant standard for representing and exchanging data on the Web. Unfortunately, XML covers the syntactic level but lacks semantics, and thus cannot be directly used for the Semantic Web. Currently, finding a way to utilize XML data for the Semantic Web is a challenging research problem. Ontologies are known to formally represent shared domain knowledge and enable semantic interoperability. Therefore, in this paper, we investigate how to represent and reason about XML with ontologies. Firstly, we give formalized representations of XML data sources, including Document Type Definitions (DTDs), XML Schemas, and XML documents. On this basis, we propose formal approaches for transforming the XML data sources into ontologies, and we also discuss the correctness of the transformations and provide several transformation examples. Furthermore, following the proposed approaches, we implement a prototype tool that can automatically transform XML into ontologies. Finally, we apply the transformed ontologies for reasoning about XML, so that some reasoning problems of XML may be checked by the existing ontology reasoners.
TL;DR: This paper surveys the existing XML fragmentation approaches in the literature, comparing their features and highlighting their drawbacks, and establishes a map of an area in which the database community has no consensus as to what an XML fragment is.
Abstract: Efficient document processing is a must when large volumes of XML data are involved. In such critical scenarios, a well-known solution to this problem is to distribute (map) the data among several processing nodes, and then distribute the processing accordingly, taking advantage of parallelism. This is the approach taken by distributed databases and MapReduce environments. Fragmentation techniques play an important role in these scenarios. They provide a way to "cut" the database into pieces and distribute the pieces over a network. This way, queries can also be "cut" into sub-queries that run in parallel, thus achieving better performance when compared to the centralized environment. However, there is no consensus in the database community as to what an XML fragment is. In fact, several approaches in the literature present definitions of XML fragments. In addition to query processing, using XML fragmentation techniques may also be helpful when managing XML documents distributed along the web or clouds. This paper surveys the existing XML fragmentation approaches in the literature, comparing their features and highlighting their drawbacks. Our contribution resides in establishing a map of the area.
TL;DR: ReLab, a subtree-based labeling scheme which generates labels using depth-first traversal, is introduced, and the evaluation indicates that ReLab outperformed the Dietz and region numbering schemes in terms of the time taken to generate labels for each XML node.
Abstract: XML has become the de facto standard for real-world applications over the WWW. Thus, data or query processing is critical to ensure fast response times for user queries. Response time is often influenced by the complexity of the labeling scheme, which is used not only for unique identification of XML nodes but also for determining structural relationships. The labeling scheme adopted is vital to ensure query processing is done correctly and promptly. In this paper, we introduce ReLab, a subtree-based labeling scheme which generates labels using depth-first traversal. Our experimental evaluation indicated that ReLab outperformed the Dietz and region numbering schemes in terms of the time taken to generate labels for each XML node.
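To make the idea of labels generated by depth-first traversal concrete, here is a minimal sketch of a generic region (start/end) numbering, one of the baseline families the abstract mentions. It is not ReLab itself, whose label format is not given here; `region_labels` and `is_ancestor` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET


def region_labels(root):
    """Assign each element a (start, end) pair during one depth-first pass."""
    labels = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        start = counter          # number assigned on entering the node
        for child in node:
            visit(child)
        counter += 1
        labels[node] = (start, counter)  # number assigned on leaving the node

    visit(root)
    return labels


def is_ancestor(labels, a, d):
    """a is an ancestor of d exactly when a's interval strictly contains d's."""
    (a_start, a_end), (d_start, d_end) = labels[a], labels[d]
    return a_start < d_start and d_end < a_end


doc = ET.fromstring("<a><b><c/></b><d/></a>")
labels = region_labels(doc)
b, d = doc.find("b"), doc.find("d")
c = b.find("c")
```

With such labels, an ancestor-descendant test is two integer comparisons and needs no tree traversal, which is why the labeling scheme affects query response time.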
TL;DR: A novel Nearest Common Object Node semantics (NCON), which includes not just common object ancestors but also common object descendants is introduced, which outperforms the state-of-the-art approaches in terms of both effectiveness and efficiency.
Abstract: It is well known that some XML elements correspond to objects (in the sense of object-orientation) and others do not. The question we consider in this paper is what benefits we can derive from paying attention to such object semantics, particularly for the problem of keyword queries. Keyword queries against XML data have been studied extensively in recent years, with several lowest-common-ancestor based schemes proposed for this purpose, including SLCA, MLCA, VLCA, and ELCA. It can be seen that identifying objects can help these techniques return more meaningful answers than just the LCA node (or subtree) by returning objects instead of nodes. It is more interesting to see that object semantics can also be used to benefit the search itself. For this purpose, we introduce a novel Nearest Common Object Node semantics (NCON), which includes not just common object ancestors but also common object descendants. We have developed XRich, a system for our NCON-based approach, and used it in our extensive experimental evaluation. The experimental results show that our proposed approach outperforms the state-of-the-art approaches in terms of both effectiveness and efficiency.
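The lowest-common-ancestor step that SLCA, ELCA, and related schemes build on can be sketched with Dewey-style labels, where a node's label is the path of child positions from the root. This shows only the plain LCA computation, not NCON or XRich; `dewey_lca` is a hypothetical helper and the labels are made up for the example.

```python
def dewey_lca(*labels):
    """Return the longest common prefix of Dewey labels, i.e. the LCA's label."""
    prefix = []
    for parts in zip(*labels):
        if all(p == parts[0] for p in parts):
            prefix.append(parts[0])
        else:
            break
    return tuple(prefix)


# Two nodes matching different keywords, both under node (1, 2).
match_a = (1, 2, 1)
match_b = (1, 2, 3)
lca = dewey_lca(match_a, match_b)
```

An object-aware approach would then map the resulting node to a meaningful enclosing object rather than returning the raw LCA node.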
TL;DR: An XML transformation that focuses on transforming each XML construct into a class diagram is described, and it can be used as an alternative solution to show a complete reverse XML schema.
Abstract: XML reverse engineering is a research area that focuses on obtaining a conceptual model from an XML schema. For integration purposes, previous XML reverse engineering research applies a reverse method to an XML schema or document in order to generate a class diagram. However, XML constructs are not used in their entirety, so the generated class diagram is incomplete. Therefore, this paper describes an XML transformation that focuses on transforming each XML construct into a class diagram. In order to generate a complete class diagram, a formal method is used. There are several steps involved in constructing and transforming each XML construct into a class diagram. To ensure that the formalization is complete, the ebXML case study is used, and the results obtained show that this method can be used as an alternative solution to show a complete reverse XML schema.
TL;DR: A novel XML labeling scheme is proposed that helps quick determination of structural relationship among XML nodes and supports dynamic updates without relabeling nodes in case of update occurrences.
Abstract: Rapid development of XML technology over the World Wide Web has motivated the need for query optimization, especially in a dynamic environment. As such, a good XML labeling scheme to ensure fast query processing is crucial. Although many labeling schemes were proposed in the past, only a few support structural relationships efficiently. Therefore, in this paper, we propose a novel XML labeling scheme that helps quick determination of structural relationships among XML nodes and supports dynamic updates without relabeling nodes when updates occur.
TL;DR: An intensive experimental evaluation on real-world benchmark XML corpora reveals a higher effectiveness of XML co-clustering in comparison with state-of-the-art approaches to XML clustering, by viewing the task as parametric with respect to the XML features.
Abstract: XML co-clustering is a promising method to improve on the effectiveness of traditional XML clustering approaches, due to the exploitation of the mutual relationships between XML documents and their respective XML features while clustering both simultaneously. To shed light on this so far unexplored research direction, we conduct a systematic study of the effectiveness of XML co-clustering, by viewing the task as parametric with respect to the XML features. Thus, the definition and exploitation of three distinct types of XML features, which are respectively informative of the content, structure and both aspects of the XML documents, allows an in-depth investigation of all three different instances of the XML co-clustering task, i.e., XML co-clustering by content alone, structure alone as well as both structure and content. XML co-clustering relies on a non-negative matrix trifactorization technique that efficiently processes large-scale input data, which is especially useful with large corpora of text-centric XML documents. The relevance of the structural and content features of the XML documents is assessed through a new weighting scheme. An intensive experimental evaluation on real-world benchmark XML corpora reveals a higher effectiveness of XML co-clustering in comparison with state-of-the-art approaches to XML clustering. Insights are also provided on the effectiveness of XML feature clustering.
TL;DR: A secure and efficient XML labeling scheme called Secure Dewey Coding (SDC) is proposed that prevents information leakage and minimizes memory space and time; label size is reduced and generation time also decreases significantly.
Abstract: XML is the commonly utilized content specification format for data interchange over the Internet. In the Publish/Subscribe model, the producer is the source of an XML document and disseminates the XML content to the consumer using a mediator called the publisher. The producer labels the XML document and defines access control policies for the consumers. Securely labeled XML documents are encrypted and sent to the publisher with the consumers' access details. Encryption is used to provide confidentiality and integrity for XML content dissemination. The consumer queries the publisher for their accessible content. Here, the XML label plays a vital role, since it locates the XML content uniquely. The objective is to design a secure label that identifies each XML tag uniquely and does not reveal any additional information about the source XML document. Also, the XML label size should be optimal, with low label generation time. We propose a secure and efficient XML labeling scheme called Secure Dewey Coding (SDC) that prevents information leakage and assures minimal memory space and time. The implementation results of the proposed XML labeling scheme showed that the XML label size has been reduced to a maximum and an average of 68% and 59%, respectively, and the generation time also decreased significantly.
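For context on why a label might reveal additional information, the sketch below assigns plain (non-secure) Dewey labels to a small tree. It does not implement SDC, whose construction is not described here; `dewey_labels` is a hypothetical helper and the sample document is invented.

```python
import xml.etree.ElementTree as ET


def dewey_labels(root):
    """Map each plain Dewey label (tuple of child positions) to its tag."""
    labels = {}

    def visit(node, label):
        labels[label] = node.tag
        for position, child in enumerate(node, start=1):
            visit(child, label + (position,))

    visit(root, (1,))
    return labels


doc = ET.fromstring("<order><item><price/></item><customer/></order>")
labels = dewey_labels(doc)
```

A plain label such as `(1, 1, 1)` exposes the node's depth and its position among its siblings, which is the kind of structural detail a secure labeling scheme would aim to hide from consumers.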
TL;DR: This work proposes a method to integrate XPath with keyword search so that users can express their search demands in more specific ways.
Abstract: Recently, a great deal of attention has been focused on processing keyword search over static XML data and XML streams. Keyword search is becoming more popular for its simplicity and its user-friendliness in querying XML databases. However, it is hard to express real search intention with keyword search alone. There are many cases where the combination of a path-based query and keyword search can deal with this issue. To address this problem, we propose a method to integrate XPath with keyword search so that users can express their search demands in more specific ways.
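One simple way to combine a path expression with keywords is to let the path restrict the candidate elements and then filter them by text content. The sketch below does this over a static document with the limited XPath subset supported by Python's `xml.etree.ElementTree`; it is not the paper's method, which is not detailed here, and `xpath_keyword_search` and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def xpath_keyword_search(root, path, keyword):
    """Return elements selected by `path` whose text content contains `keyword`."""
    needle = keyword.lower()
    return [
        element
        for element in root.findall(path)
        if needle in "".join(element.itertext()).lower()
    ]


library = ET.fromstring(
    "<library>"
    "<book><title>XML Streams</title><author>Lee</author></book>"
    "<book><title>Graph Theory</title><author>Kim</author></book>"
    "<article><title>XML Indexing</title></article>"
    "</library>"
)

hits = xpath_keyword_search(library, "./book", "xml")
titles = [hit.findtext("title") for hit in hits]
```

A keyword-only search for "xml" would also return the article; the path component narrows the answer to books.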
TL;DR: A new keyword search approach is proposed which utilizes the statistics of the underlying XML data to decide the promising result types and then quickly retrieves the corresponding results with the help of the selected promising result types.
Abstract: Keyword search enables inexperienced users to easily search XML databases with no specific knowledge of complex structured query languages and XML data schemas. Existing work has addressed the problem of selecting data nodes that match keywords and connecting them in a meaningful way, e.g., SLCA and ELCA. However, it is time-consuming and unnecessary to serve all the connected subtrees to the users because in general the users are only interested in part of the relevant results. In this paper, we propose a new keyword search approach which utilizes the statistics of the underlying XML data to decide the promising result types and then quickly retrieves the corresponding results with the help of the selected promising result types. To guarantee the quality of the selected promising result types, we measure the correlations between result types and a keyword query by analyzing the distribution of relevant keywords and their structures within the XML data to be searched. In addition, relevant result types can be efficiently computed without keyword query evaluation and any schema information. To directly return top-k keyword search results that conform to the suggested promising result types, we design two new algorithms to adapt to the structural sensitivity of the keyword nodes over the keyword search results. Lastly, we implement all proposed approaches and present the relevant experimental results to show the effectiveness of our approach.
TL;DR: This paper presents a method of full-text search that can be integrated with an XML database, with performance improved by a technique based on structure and text indexes.
Abstract: Semistructured data can be represented as a tree structure, which is an efficient tool for describing different kinds of data. This paper presents a method of full-text search that can be integrated with an XML database, with performance improved by a technique based on structure and text indexes. The search key of the structure index should be a node identifier, whereas the search key of the text index should be a content fragment of an XML document. The experimental results exhibit improved document retrieval performance.
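The split between a structure index keyed by node identifier and a text index keyed by content fragment can be sketched as two dictionaries. This is an illustrative assumption about the layout, not the paper's implementation: node identifiers here are simply document-order positions, content fragments are lowercased whitespace-separated tokens, and `build_indexes` is a hypothetical helper.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET


def build_indexes(root):
    """Build a structure index (node id -> element) and a text index (token -> node ids)."""
    structure_index = {}
    text_index = defaultdict(set)
    for node_id, element in enumerate(root.iter()):
        structure_index[node_id] = element
        for token in (element.text or "").lower().split():
            text_index[token].add(node_id)
    return structure_index, text_index


doc = ET.fromstring(
    "<doc><title>XML search</title><body>full text search</body></doc>"
)
structure_index, text_index = build_indexes(doc)

# Look up a content fragment, then resolve the node ids through the structure index.
matching_tags = sorted(structure_index[i].tag for i in text_index["search"])
```

A text query first hits the text index and then follows node identifiers into the structure index, so neither lookup has to scan the document.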
TL;DR: A novel privacy protection model for XML, and an algorithm for implementing this model, and a new privacy property, δ-dependency, which can be applied to both relational and XML data, and that takes into consideration the hierarchical nature of sensitive data.
TL;DR: This paper proposes a novel structure for streaming XML data called PS+Pre/Post by integrating the path summary technique and the pre/post labeling scheme to efficiently process different types of XML queries over the broadcast stream.
TL;DR: An approach based on mined Tree-Based Association Rules (TARs) is illustrated, which gives approximate information on both the content and the structure of XML documents.
Abstract: We illustrate an approach based on mined Tree-Based Association Rules (TARs), which gives approximate information on both the content and the structure of XML documents. XML documents can be accessed in two ways: keyword-based search and query answering. The main idea of association rules is to offer concise representations of XML documents; this has been explored in several proposals, either using languages such as jQuery, XQuery, etc. and techniques developed in the XML context, or by implementing graph- or tree-based algorithms. We therefore present a proposal for storing mined TARs as a means of representing intensional knowledge in native XML.
TL;DR: An XML privacy protection model is proposed that separates structure from content, using cloud storage to save the content information and a Trusted Third Party (TTP) to help manage the structure information.
TL;DR: Experimental results show that the use of the SecNode structure for secure XML data broadcast improves the performance of XML query processing in terms of tuning time and therefore reduces the power consumption at mobile clients.
TL;DR: This paper first explores the XML Rewriting Attack that can take place in Web Service communication, then investigates detection techniques and describes their limitations, and discusses general countermeasures for prevention and mitigation of XML Rewriting attacks.
Abstract: Making Web Services secure means making SOAP messages secure and keeping them secure wherever they go. Several security standards of Web Service Security (WS-Security), such as XML Digital Signature, are used to secure SOAP message exchange in a Web Service environment. However, the content of a SOAP message, protected with XML Digital Signature, can be changed without invalidating the signature. In this paper, we present a study on detection techniques of XML Rewriting attacks in Web Services. We first explore the XML Rewriting Attack that can take place in Web Service communication. We further investigate detection techniques and describe their limitations. Finally, we discuss general countermeasures for prevention and mitigation of XML Rewriting attacks.
TL;DR: An open Hive schema approach to XML data placement in Hadoop is proposed; placing XML data in Hive with this generic schema to build a column-oriented, OLAP-focused XML data warehouse of heterogeneous content has benefits for data access, maintenance and scalability.
Abstract: Big data in XML format poses a challenge to distributed data systems. This paper proposes an open Hive schema approach to XML data placement in Hadoop. Placing XML data in Hive with this generic schema to build a column-oriented and OLAP-focused XML data warehouse of heterogeneous content has benefits for data access, maintenance and scalability.
TL;DR: This paper presents the architecture of a new XML Full Text Index and XQuery compile-time and run-time enhancements to efficiently support XQFT in SQL/XML, and presents the design rationale on how to exploit Information Retrieval (IR) techniques for XQFT support in RDBMS.
Abstract: There has been more than a decade of effort to support storing, querying, and updating XML documents in RDBMS. An XML-enabled RDBMS supports the SQL/XML standard, which defines XMLType as a SQL data type and allows XQuery/XPath to be embedded in XMLQuery(), XMLExists() and XMLTABLE() in SQL. In an XML-enabled RDBMS, both relational data and XML documents can be managed in one system and queried using the SQL/XML language. However, the use case of managing document-centric XML is not well addressed due to the lack of full text query constructs in XQuery. Recently, XQuery Full Text (XQFT) became a W3C Recommendation. In this paper, we show how XQFT can be supported efficiently in SQL/XML for full text search of XML documents managed by an XML-enabled RDBMS, such as Oracle XMLDB. We present the architecture of a new XML Full Text Index and XQuery compile-time and run-time enhancements to efficiently support XQFT in SQL/XML. We present our design rationale on how to exploit Information Retrieval (IR) techniques for XQFT support in RDBMS. The new XML Full Text Index can index common XML physical storage forms, such as text XML, binary XML, and relational decomposition of the XML. Although our work is built within Oracle XMLDB, all of the principles and techniques presented in this paper are valuable to any RDBMS that needs to support XQFT effectively and efficiently over persisted XML documents.
TL;DR: A tool that aids schema developers and standard groups to track XML schema changes, log them, and help in the enhancement of a particular schema version is developed, called XSM, which efficiently stores and retrieves versioned XSDs and evaluates them based on the quality indicators defined for this purpose.
Abstract: The Extensible Markup Language (XML) is a meta-language that is widely used to provide a non-proprietary universal format for sharing hierarchical data among different software systems and application domains. Moreover, many organizations and content providers have been publishing and sharing their information through XML and its standard schemas. In this context, it is extremely important that, when designing new schemas or enhancing current ones, there is a mechanism to ensure that the schemas will be well-designed versions. In this paper, we develop a tool that aids schema developers and standard groups to track XML schema changes, log them, and help in the enhancement of a particular schema version. We develop a schema monitoring tool called XSM, which efficiently stores and retrieves versioned XSDs and evaluates them based on the quality indicators defined for this purpose. The quality of delta changes in the schema versions is examined through a set of synthetic XSDs.
TL;DR: This paper presents an inference-based XML evolution approach using Prolog to deal with this problem, and composes multiple syntactic changes, which usually have a common purpose, to infer semantic changes.
Abstract: Applications are increasingly using XML to represent semi-structured data and, consequently, a large number of XML documents are available worldwide. As XML documents evolve over time, comparing XML documents to understand their evolution becomes fundamental. The main focus of existing research for comparing XML documents resides in identifying syntactic changes. However, a deeper notion of the meaning of a change is usually desired. This paper presents an inference-based XML evolution approach using Prolog to deal with this problem. Unlike existing XML diff approaches, our approach composes multiple syntactic changes, which usually have a common purpose, to infer semantic changes. We evaluated our approach through ten versions of an employment XML document. In this evaluation, we could observe that each new version introduced syntactic changes that could be summarized into semantic changes.
TL;DR: This chapter presents a novel approach for securing financial XML transactions using an effective and intelligent fuzzy classification technique and verified tangible enhancements in encryption efficiency, processing-time reduction, and resulting XML message sizes.
Abstract: In this chapter we present a novel approach for securing financial XML transactions using an effective and intelligent fuzzy classification technique. Our approach defines the process of classifying XML content using a set of fuzzy variables. After the fuzzy classification phase, a unique value is assigned to a defined attribute named "ImportanceLevel." The assigned value indicates the data sensitivity of each XML tag. The model also defines the process of securing classified financial XML message content by performing element-wise XML encryption on selected parts defined in the fuzzy classification phase. Element-wise encryption is performed using symmetric encryption with the AES algorithm and different key sizes. A 128-bit key is used on tags classified with "Medium" importance level; a 256-bit key is used on tags classified with "High" importance level. An implementation has been performed in a real-life environment using an online banking system to demonstrate system efficiency. Our experimental results verified tangible enhancements in encryption efficiency, processing-time reduction, and resulting XML message sizes.
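The mapping from importance level to key size can be sketched as a planning step that walks the message and picks a key length per tag. This sketch stops short of encrypting anything, since AES is not part of Python's standard library. It assumes the "ImportanceLevel" attribute has already been set by the classification phase and that tags at any other level are left unencrypted; neither assumption is stated in the chapter, and `plan_encryption` and the sample message are hypothetical.

```python
import secrets
import xml.etree.ElementTree as ET

# Key sizes from the abstract: Medium -> 128-bit, High -> 256-bit.
KEY_BITS = {"Medium": 128, "High": 256}


def plan_encryption(root):
    """Map each tag that needs encryption to (key size in bits, random key bytes)."""
    plan = {}
    for element in root.iter():
        bits = KEY_BITS.get(element.get("ImportanceLevel"))
        if bits is not None:
            # Keyed by tag name for brevity; assumes tag names are unique.
            plan[element.tag] = (bits, secrets.token_bytes(bits // 8))
    return plan


message = ET.fromstring(
    "<transfer>"
    '<amount ImportanceLevel="High">100</amount>'
    '<memo ImportanceLevel="Low">rent</memo>'
    '<payee ImportanceLevel="Medium">Bob</payee>'
    "</transfer>"
)
plan = plan_encryption(message)
```

Encrypting only the selected elements, and with a shorter key where sensitivity is lower, is what the chapter credits for the reported gains in processing time and message size.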