TL;DR: The SPARQL2XQuery Framework provides a mapping model for expressing OWL-RDF/S to XML Schema mappings, as well as a method for SPARQL to XQuery translation.
Abstract: In the context of the emergent Web of Data, a large number of organizations, institutes and companies (e.g., DBpedia, Data.gov, GeoNames, PubMed) adopt the Linked Data practices. Utilizing the Semantic Web (SW) technologies, they publish their data and offer SPARQL endpoints (i.e., SPARQL-based search services). On the other hand, the dominant standard for information exchange in the Web today is XML. Additionally, many international standards (e.g., Dublin Core, MPEG-7, METS, TEI, IEEE LOM) in several domains (e.g., Digital Libraries, GIS, Multimedia, e-Learning) have been expressed in XML Schema. The aforementioned have led to an increasing emphasis on XML data, accessed using the XQuery query language. The SW and XML worlds and their developed infrastructures are based on different data models, semantics and query languages. Thus, it is crucial to develop interoperability mechanisms that allow the Web of Data users to access XML datasets, using SPARQL, from their own working environments. It is unrealistic to expect that all the existing legacy data (e.g., Relational, XML, etc.) will be transformed into SW data. Therefore, publishing legacy data as Linked Data and providing SPARQL endpoints over them has become a major research challenge. In this direction, we introduce the SPARQL2XQuery Framework which creates an interoperable environment, where SPARQL queries are automatically translated to XQuery queries, in order to access XML data across the Web. The SPARQL2XQuery Framework provides a mapping model for the expression of OWL-RDF/S to XML Schema mappings as well as a method for SPARQL to XQuery translation. To this end, our Framework supports both manual and automatic mapping specification between ontologies and XML Schemas. In the automatic mapping specification scenario, the SPARQL2XQuery Framework exploits the XS2OWL component, which transforms XML Schemas into OWL ontologies.
Finally, extensive experiments have been conducted in order to evaluate the schema transformation, mapping generation, query translation and query evaluation efficiency, using both real and synthetic datasets.
TL;DR: The GroupBased labelling scheme proposed in this thesis processes dynamic XML data updates with high performance, behaves uniformly irrespective of whether the document is static or dynamic, can determine all structural relationships between nodes, and improves query performance in both types of document.
Abstract: Documents that comply with the XML standard are characterised by inherent ordering and their modelling usually takes the form of a tree. Nowadays, applications generate massive amounts of XML data, which requires accurate and efficient queryable XML database systems. XML querying depends on XML labelling in much the same way as relational databases rely on indexes. Document order and structural information are encoded by labelling schemes, thus facilitating their use by queries without having to access the original XML document. Dynamic XML data, that is, data which changes, complicates the labelling scheme. As demonstrated by many research efforts, it is difficult to allocate unique labels to nodes in a dynamic XML tree so that all structural relationships between the nodes are encoded by the labels.
Static XML documents are generally managed with labelling schemes that use simple labels. By contrast, dynamic labelling schemes have extra labelling costs and lower query performance to allow random updates irrespective of the document update frequency. Given that static and dynamic XML documents are often not clearly distinguished, a labelling scheme whose efficiency does not depend on updating frequency would be useful.
The GroupBased labelling scheme proposed in this thesis is compatible with static as well as dynamic XML documents. In particular, this scheme has a high performance in processing dynamic XML data updates. What differentiates it from other dynamic labelling schemes is its uniform behaviour irrespective of whether the document is static or dynamic, ability to determine all structural relationships between nodes, and the improved query performance in both types of document. The advantages of the GroupBased scheme in comparison to earlier schemes are highlighted by the experiment results.
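As an illustration of how labels alone can answer structural queries without touching the document, the sketch below uses classic Dewey (prefix) labels. This is a swapped-in, well-known static scheme chosen for brevity, not the GroupBased scheme of the thesis; the function names are hypothetical. Each label is a tuple giving the path of child positions from the root.

```python
# Dewey labels: the root is (1,), its second child is (1, 2), and so on.
# NOTE: illustrative only; this is NOT the GroupBased scheme.

def is_ancestor(a, d):
    """True if the node labelled a is a proper ancestor of the node labelled d."""
    return len(a) < len(d) and d[:len(a)] == a

def is_parent(p, c):
    """True if p is the parent of c (ancestor exactly one level up)."""
    return len(c) == len(p) + 1 and c[:len(p)] == p

def is_sibling(x, y):
    """True if x and y are distinct nodes sharing the same parent label."""
    return x != y and len(x) == len(y) and x[:-1] == y[:-1]

def document_order(labels):
    """Tuple comparison on Dewey labels yields document (pre-order) order."""
    return sorted(labels)

if __name__ == "__main__":
    root, chapter, section, appendix = (1,), (1, 1), (1, 1, 2), (1, 2)
    print(is_ancestor(root, section))   # ancestor test from labels only
    print(is_parent(chapter, section))  # parent test from labels only
    print(is_sibling(chapter, appendix))
```

A plain Dewey scheme has to relabel following siblings and their subtrees when a node is inserted between two existing siblings, which is the kind of update cost that dynamic labelling schemes aim to avoid.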
TL;DR: This paper designs and builds an XACS (XML Access Control System), which is capable of fine-grained access control, and suggests an empirical telemedicine application to confirm the adequacy and validity of the proposed method.
Abstract: XML can serve as a standard data format for exchanging the large volumes of data generated by a company's databases and application programs, because it can describe meaningful information directly. As the need to manage massive volumes of XML data efficiently and to secure telemedicine data grows, a secure access control mechanism for XML is necessary. Existing access control has not taken information structures and semantics into full consideration due to the fundamental limitations of HTML. In addition, access control for XML documents allows read operations only, and the complex authorization evaluation process slows down system performance. To resolve these problems, this paper designs and builds an XACS (XML Access Control System), which is capable of fine-grained access control. It provides users only with the data corresponding to their authority levels, by authorizing them to access only specific items of XML documents when they search XML documents in telemedicine. To accomplish this, XACS eliminates the parts of the documents that are inaccessible and transmits the accessible parts, depending on the users' authority levels. In addition, it can be extended to existing web servers because XML documents are used on ordinary web sites. Telemedicine security and guidelines are provided to enable quick and precise understanding of the information, which improves safety. Ultimately, this paper suggests an empirical telemedicine application to confirm the adequacy and validity of the proposed method.
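The idea of eliminating inaccessible parts of a document before transmission can be sketched as follows. This is a minimal illustration under stated assumptions, not the XACS implementation: the `level` attribute, the integer authority levels and the function name `filter_for_user` are all hypothetical.

```python
import xml.etree.ElementTree as ET

def _prune(elem, user_level):
    """Remove every child subtree whose required level exceeds user_level."""
    for child in list(elem):  # copy the child list, since we mutate elem
        required = int(child.get("level", "0"))  # hypothetical attribute
        if required > user_level:
            elem.remove(child)
        else:
            _prune(child, user_level)

def filter_for_user(xml_text, user_level):
    """Return the document with all inaccessible elements stripped out."""
    root = ET.fromstring(xml_text)
    _prune(root, user_level)
    return ET.tostring(root, encoding="unicode")

RECORD = (
    '<record>'
    '<name>Ann</name>'
    '<diagnosis level="2">flu</diagnosis>'
    '<billing level="3"><card level="1">x</card></billing>'
    '</record>'
)

if __name__ == "__main__":
    # A level-2 user sees the diagnosis but not the billing subtree.
    print(filter_for_user(RECORD, 2))
```

Pruning a whole subtree as soon as its root is inaccessible means descendants are never evaluated, which keeps the check to a single pass over the accessible part of the tree.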
TL;DR: This work introduces an XML to RDF transformation approach, which is based on mappings comprising RDF triple templates that employ simple XPath expressions and shows that the time complexity of the mapping algorithm is linear in the size of the XML input and proves its practical efficiency with an evaluation on large real-world data.
Abstract: The Extensible Markup Language (XML) has become a widely adopted data interchange format. With the rise of Linked Data published using the Resource Description Framework (RDF), a number of tools for transforming XML to RDF have been developed. Specifying XML→RDF mappings for these tools often requires skills in programming languages such as XSLT or XQuery. Moreover, these tools are rarely able to deal with large XML inputs. We introduce an XML to RDF transformation approach, which is based on mappings comprising RDF triple templates that employ simple XPath expressions. Thanks to the restricted XPath expressions, which can be evaluated against a stream of XML data, our implementation can handle extremely large input XML files. To process the XML input efficiently, we employ XML filtering techniques and a strategy for selecting relevant XML nodes to generate RDF triples from. We show that the time complexity of our mapping algorithm is linear in the size of the XML input and also prove its practical efficiency with an evaluation on large real-world data.
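A mapping built from triple templates with simple path expressions might look like the sketch below. The template layout, the example IRIs and the function name `xml_to_triples` are assumptions made for illustration and do not reproduce the paper's mapping syntax. Unlike the streaming implementation described in the abstract, this sketch loads the whole document into memory with the standard library's ElementTree, which supports only a limited XPath subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical triple templates:
# (node path, subject IRI prefix, id attribute, predicate IRI, object path)
TEMPLATES = [
    ("book", "http://example.org/book/", "id",
     "http://purl.org/dc/terms/title", "title"),
    ("book", "http://example.org/book/", "id",
     "http://purl.org/dc/terms/creator", "author"),
]

def xml_to_triples(xml_text, templates=TEMPLATES):
    """Instantiate each template once per matching node and object value."""
    root = ET.fromstring(xml_text)
    triples = []
    for node_path, prefix, id_attr, predicate, object_path in templates:
        for node in root.findall(node_path):
            subject = prefix + node.get(id_attr, "")
            for obj in node.findall(object_path):
                triples.append((subject, predicate, (obj.text or "").strip()))
    return triples

SAMPLE = """<library>
  <book id="b1"><title>XML Basics</title><author>A. Writer</author></book>
  <book id="b2"><title>RDF Primer</title></book>
</library>"""

if __name__ == "__main__":
    for s, p, o in xml_to_triples(SAMPLE):
        print(f'<{s}> <{p}> "{o}" .')
```

Because each template touches only a node, one attribute and one child path, the work per input node is bounded, which is consistent with the linear-time behaviour claimed in the abstract.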
TL;DR: A new approach to storing and querying an XML document using relational databases is presented, which decomposes an XML document into three tables without using any XML schema or DTD and achieves lower storage consumption.
Abstract: Due to its simplicity, its flexibility and its extensibility, XML can be adapted to multiple domains. Its self-describing structure and nesting have allowed XML to become the dominant standard for storing and transferring data through the World Wide Web. In addition, relational database systems are mature and extremely powerful. Therefore, much research has been done to propose an efficient approach to storing and querying an XML document in a relational database. In this paper we present a new approach to store and query an XML document using relational databases, which decomposes an XML document into three tables without using any XML schema or DTD. Our approach efficiently supports structural modifications to the XML tree and achieves lower storage consumption. We also propose two powerful algorithms for mapping XML data to relational databases and from relational databases to XML data.
TL;DR: This paper systematically analyzes the chosen-ciphertext attacks on XML Encryption and designs an algorithm that performs a vulnerability scan on arbitrary encrypted XML messages, automatically detecting a vulnerability and exploiting it to retrieve the plaintext of a message protected by XML Encryption.
Abstract: In the recent years, XML Encryption became a target of several new attacks [18, 17, 16]. These attacks belong to the family of adaptive chosen-ciphertext attacks, and allow an adversary to decrypt symmetric and asymmetric XML ciphertexts, without knowing the secret keys. In order to protect XML Encryption implementations, the World Wide Web Consortium (W3C) published an updated version of the standard.
Unfortunately, most of the current XML Encryption implementations do not support the newest XML Encryption specification and offer different XML Security configurations to protect confidentiality of the exchanged messages. Resulting from the attack complexity, evaluation of the security configuration correctness becomes tedious and error prone. Validation of the applied countermeasures can typically be made with numerous XML messages provoking incorrect behavior by decrypting XML content. Up to now, this validation was only manually possible.
In this paper, we systematically analyze the chosen-ciphertext attacks on XML Encryption and design an algorithm to perform a vulnerability scan on arbitrary encrypted XML messages. The algorithm can automatically detect a vulnerability and exploit it to retrieve the plaintext of a message protected by XML Encryption. To assess the practicability of our approach, we implemented an open source attack plugin for the Web Service attacking tool WS-Attacker. With the plugin, we discovered new security problems in four out of five analyzed Web Service implementations, including IBM Datapower or Apache CXF.
TL;DR: This paper proposes an original method for measuring the structural similarity between an XML document and an XML grammar (DTD or XSD), considering their most common operators that designate constraints on the existence, repeatability and alternativeness of XML elements/attributes.
TL;DR: Dynamic XDAS is developed as a hybrid labeling scheme that combines the original XDAS with another labeling scheme called IBSL, and is shown to still identify the ancestor-descendant (A-D), parent-child (P-C) and sibling relationships using logical operators with efficient label size and storage space.
TL;DR: This paper presents a technique, so-called structural bulk updates, that works in concert with the XQuery Update Facility to support efficient updates on the Pre/Dist/Size encoding and demonstrates the benefits in a detailed performance evaluation based on the XMark benchmark.
Abstract: In order to manage XML documents, native XML databases use specific encodings that map the hierarchical structure of a document to a flat representation. Several encodings have been proposed that differ in terms of their support for certain query workloads. While some encodings are optimized for query processing, others focus on data manipulation. For example, the Pre/Dist/Size XML encoding has been designed to support queries over all XPath axes efficiently, but processing atomic updates in XML documents can be costly. In this paper, we present a technique, so-called structural bulk updates, that works in concert with the XQuery Update Facility to support efficient updates on the Pre/Dist/Size encoding. We demonstrate the benefits of our technique in a detailed performance evaluation based on the XMark benchmark.
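To make the encoding concrete, the sketch below builds a small Pre/Dist/Size table, under the following assumptions: `pre` is the pre-order position of a node, `dist` is the difference between a node's `pre` and its parent's `pre`, and `size` is the number of nodes in the subtree including the node itself. Only element nodes are encoded and the root element is given `pre` 0 with `dist` 0; real systems also encode document, text and attribute nodes, so these conventions and the function names are illustrative rather than the paper's.

```python
import xml.etree.ElementTree as ET

def encode(xml_text):
    """Return rows (pre, dist, size, tag) in document order."""
    root = ET.fromstring(xml_text)
    table = []

    def visit(elem, parent_pre):
        pre = len(table)  # next free pre-order position
        dist = pre - parent_pre if parent_pre is not None else 0
        row = [pre, dist, 1, elem.tag]
        table.append(row)
        for child in elem:
            row[2] += visit(child, pre)  # accumulate subtree size
        return row[2]

    visit(root, None)
    return [tuple(r) for r in table]

def parent(table, pre):
    """Parent position from dist alone; None for the root."""
    dist = table[pre][1]
    return pre - dist if dist else None

def is_descendant(table, d, a):
    """d is a descendant of a iff it lies inside a's pre/size interval."""
    pre_a, _, size_a, _ = table[a]
    return pre_a < d < pre_a + size_a

if __name__ == "__main__":
    for row in encode("<a><b><c/></b><d/></a>"):
        print(row)
```

The table also shows why atomic updates are costly in this encoding: inserting one node shifts the `pre` value of every following node and changes the `size` of every ancestor, which is the overhead that bulk processing of updates is meant to amortize.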
TL;DR: GKS (Generic Keyword Search) enables discovery of deeper insights (DI) in the XML data, found in the context of the search results, thus enabling the navigation of complex XML repositories with ease.
Abstract: Classical XML keyword search based on the Lowest Common Ancestor (LCA) framework requires users to be well versed with data and semantic relationships between the query keywords to extract meaningful response, restricting its applicability. GKS (Generic Keyword Search), on the other hand, allows users to browse and navigate XML data without such constraints. GKS enables discovery of deeper insights (DI) in the XML data, found in the context of the search results. Such insights not only expose patterns hidden in the search results but also help users tune their queries, thus enabling the navigation of complex XML repositories with ease. We further show how, for a search query, different insights can be discovered from the data by varying a single parameter.
TL;DR: HyXAC integrates the two most popular categories of XML access control enforcement mechanisms, gains the benefits of both, and improves query processing efficiency while optimizing the use of system resources.
Abstract: With the increasing usage of XML for information sharing over the Internet, a mechanism for defining and enforcing XML access control is demanded, such that only authorized entities can access the sets of XML data that they are allowed to. Research interest in this area has grown significantly in recent years. Various access control enforcement solutions have been proposed, each with its inherent advantages and disadvantages. Yet, there is still no solution that can provide superior performance in all situations. In this paper, we present HyXAC, a hybrid approach to enforce XML access control. HyXAC integrates the two most popular categories of XML access control enforcement mechanisms, and gains the benefits of both. In particular, HyXAC first preprocesses user queries by rewriting queries and removing parts violating access control rules, and evaluates the re-written queries using sub-views, if they are available. In HyXAC, views are not defined on a per-role basis. Instead, a sub-view is defined for each access control rule, and roles sharing identical rules will share sub-views. Moreover, HyXAC dynamically allocates memory and secondary storage resources to materialize and cache sub-views to improve query performance. We have conducted extensive experiments, and the results show that HyXAC improves query processing efficiency while optimizing the use of system resources.
TL;DR: A reversible anonymisation scheme for XML messages that supports fine-grained enforcement of XACML-based privacy policies and supports a shared secret based scheme, where stakeholders need to agree to disclose confidential information.
TL;DR: A new robust technique can be used that generates the watermark with a strong technique based on semantic and syntactic rules, secures the watermark with a strong cryptographic method along with the use of SALT, and makes it invisible.
Abstract: With the growth of the internet and the ease of using it, copyright protection and authentication of content have become very important issues. Digital watermarking has long been used for this purpose. This article gives a brief analysis of different watermarking techniques used for web pages. Both HTML and XML techniques are discussed along with their merits and demerits. A new robust technique can be used that generates the watermark with a strong technique based on semantic and syntactic rules, secures the watermark with a strong cryptographic method along with the use of SALT, and makes it invisible. The CA will determine the authorized author/user of the web content in case of dispute, as the created watermark will be registered with the CA in a secure fashion. This technique can also be applied to any web language such as HTML, and to web service formats such as XML, JSON, etc.
TL;DR: In this study, the most cited and the latest model-mapping approaches are reviewed in terms of the description, the technique used and the RDB schema produced using each approach, and a solution to these limitations is proposed.
Abstract: XML has become the dominant standard for data exchange and representation on the Web. The relational database (RDB) is widely used as a storage and retrieval medium in the business field. With the expanding utilization of XML data on the Web, the size of this data type has increased rapidly, and more complicated queries are issued by users over this data. This expansion has prompted numerous researchers to propose various approaches to managing XML data through RDB. In this study, the most cited and the latest model-mapping approaches are reviewed in terms of the description, the technique used and the RDB schema produced using each approach. The limitations of these approaches are discussed in terms of storage space and query response time. At the end of this study, a solution to these limitations is proposed. It is hoped that this paper will give some insight into storing XML documents in an RDB schema and contribute to the XML community.
TL;DR: The authors present an approach to efficiently compress XML OLAP cubes using a multidimensional snowflake schema of the cube as the basic physical configuration and apply a new compression technique named XCC to the three physical configurations.
Abstract: In this paper, the authors present an approach to efficiently compress XML OLAP cubes. They propose a multidimensional snowflake schema of the cube as the basic physical configuration. The cube is then composed of one XML fact document and as many XML documents as the dimension hierarchy members. The basic configuration is reorganized into two ways by adding data redundancy on purpose in order to achieve a better compression ratio on the one hand and to improve query response time on the other hand. In the second configuration, all the documents of the cube are merged into one single XML document. In the third configuration, each reference between the fact and the dimensions or between the members of a dimension hierarchy is replaced by the whole XML referenced fragments. To the three physical configurations of the cube, the authors apply a new compression technique named XCC. They demonstrate the efficiency of the third configuration before and after compression and they also show the efficiency of their compression technique when applied to XML OLAP cubes.
TL;DR: This paper provides a comprehensive comparative analysis of various control schemes for change detection and querying of dynamic XML documents.
Abstract: The efficient management of dynamic XML documents is a complex area of research. The changes to and the size of an XML document throughout its lifetime are unbounded. Change detection is an important part of version management, identifying the differences between successive versions of a document. Document content is continuously evolving. Users want to be able to query previous versions, query changes in documents, and retrieve a particular document version efficiently. In this paper we provide a comprehensive comparative analysis of various control schemes for change detection and querying of dynamic XML documents.
TL;DR: The results indicate that the subscriber-centric XML filtering architecture is a viable approach for disseminating semi-structured data streams to the various consuming applications.
Abstract: The vast amounts of data generated in near real-time due to prolific use of sensors, pervasive usage of mobile Internet, and popularity of social media platforms, necessitates the efficient dissemination of the semi-structured streaming data to the consuming applications. Towards this end, we introduce the subscriber-centric XML filtering approach for seamless and efficient XML stream replication/distribution mechanism. The subscriber-centric filtering architecture can be configured to support different topologies in order to support efficient message filtering for a large number of concurrent subscribers. It allows selective filtering on the various nodes that improves efficiency and provides applications with data on a need-to-know basis. Moreover, it supports inter-operability and allows semi-structured streams generated from multiple sources to be filtered. Our XML filtering network consists of decoupled data producers, message transformation agents and XML brokers that can be deployed in conventional data centers as well as in the public cloud environment. We provide detailed performance results of processing filtering queries in several use case scenarios with varying XML message loads and number of nodes involved in the replication/dissemination process. Our results indicate that the subscriber-centric XML filtering architecture is a viable approach for disseminating semi-structured data streams to the various consuming applications.
TL;DR: This paper investigates the problem of processing a large amount of encrypted documents in XML-like formats where a user may wish to search or compute based on certain elements in the XML tree and proposes a solution that makes use of index tables to allow for fast keyword and location queries.
Abstract: The need for privacy-protected searching has garnered increasing interest as industries continue to adopt cloud technologies. Much of the recent efforts have been towards incorporating more advanced searching techniques. Although many have proposed solutions for search and computations in unencrypted data, developing efficient solutions over encrypted documents remains difficult. In this paper, we investigate the problem of processing a large amount of encrypted documents in XML-like formats where a user may wish to search or compute based on certain elements in the XML tree. Our solution makes use of index tables to allow for fast keyword and location queries. To allow computations to be performed on an untrusted server, homomorphic encryption is proposed and used in conjunction with symmetric encryption to reduce computational and storage cost.
TL;DR: A parameter-free prototypical approach to XML partitioning, which projects the XML documents into a space of XML features representing fixed-length sequences of adjacent textual items in the context of root-to-leaf paths, and reveals a higher effectiveness than several state-of-the-art competitors.
Abstract: Conventional approaches to XML clustering by content and structure are generally affected by a limitation due to the adoption of the bag-of-word model for the representation of their textual contents. This choice may lead to consider structure-constrained textual items of separate XML documents as related, even though the actual meaning of such items in their respective contexts is different. To overcome such a limitation, we propose XML clustering by structure-constrained phrases. The latter is a previously unexplored method relying on the more accurate bag-of-phrase model of the XML textual content, with which to better preserve the meaning of the structure-constrained content items for improved clustering effectiveness. In order to conduct an in-depth and systematic study of the effectiveness of the proposed method, we develop a parameter-free prototypical approach to XML partitioning, which projects the XML documents into a space of XML features representing fixed-length sequences of adjacent textual items in the context of root-to-leaf paths. Feature selection without any tunable threshold is used to choose a subset of the XML features on the basis of their relevance to clustering, which is assessed through a new scoring scheme. A comparative experimentation on real-world benchmark XML corpora reveals a higher effectiveness than several state-of-the-art competitors.
TL;DR: The challenges involved in processing XML data in a critical context are explained, the choices in designing a secure XML validator are described, and how features of functional languages were used to enforce security requirements are detailed.
Abstract: While the use of XML is pervading all areas of IT, security challenges arise when XML files are used to transfer security data such as security policies. To tackle this issue, we have developed a lightweight secure XML validator and have chosen to base the development on the strongly typed functional language OCaml. The initial development took place as part of the LaFoSec Study which aimed at investigating the impact of using functional languages for security. We then turned the validator into an industrial application, which was successfully evaluated at EAL4+ level by independent assessors. In this paper, we explain the challenges involved in processing XML data in a critical context, we describe our choices in designing a secure XML validator, and we detail how we used features of functional languages to enforce security requirements.
TL;DR: This paper proposes a visual XQuery specification language called VXQ, which is easier to use and more expressive than previous proposals, and is also suitable for mobile devices where typing is not desired.
Abstract: XML is the standard way of representing and storing rapidly-growing semi-structured data on the Internet. While XQuery has been proposed by W3C as the standard query language for XML data, the complexity of the language is the major overhead for users to express the queries and for software to process the queries efficiently. Considering mobile devices are more popular than desktop computers, expressing and/or processing XQuery becomes even more cumbersome on mobile devices. This paper proposes a visual XQuery specification language called VXQ. By intuitive abstractions of XML and XQuery, the proposed system can generate XQuery queries for users with little knowledge about XML and the language. The proposed visual language is easier to use and more expressive than previous proposals, and is also suitable for mobile devices where typing is not desired. Furthermore, we extend our proposed visual XQuery to support query rewriting and optimization for multiple XQuery systems. Experiments show that, in practice, our query rewriting reduces the query execution time significantly.
TL;DR: An empirical analysis of various parsers (DOM, SAX, Pull Parser, VTD, etc.) for an Android-based application is performed, and a new parser, SRDOM, based on structure recurrence is proposed; SRDOM is shown to be 9 times faster than DOM in the presence of redundant structure.
Abstract: In various domains ranging from the web to desktop applications, XML has become the standard format for data representation and transfer. However, wide adoption of XML is hampered by inefficient document-parsing methods. An XML parser is a tool that reads an XML document and provides an interface for the user to access its content and structure; it should be an integral part of every application that processes information from XML documents. Parsing is a core operation performed before an XML document can be navigated, queried or manipulated. Parsing is a costly operation that may degrade XML processing performance. In this paper we perform an empirical analysis of various parsers (DOM, SAX, Pull Parser, VTD, etc.) for an Android-based application. In addition, we propose a new parser, SRDOM, based on structure recurrence. Evaluation results of our implementation show that SRDOM is 9 times faster than DOM in the presence of redundant structure. The second application of multitasking indicates that the best parser for databases is DOM. We implemented our algorithm and present the performance results, which support the validity of our approach.
TL;DR: In this article, a JSON call via an Extensible Markup Language (XML) Hypertext Transfer Protocol (HTTP) object is made against a data warehouse data item stored in a back end server.
Abstract: Aspects provide for automatic verification of JavaScript Object Notation (JSON) data by making a JSON call via an Extensible Markup Language (XML) Hypertext Transfer Protocol (HTTP) object against a data warehouse data item stored in a back end server. JSON response data returned from the back end server in response to the JSON call is converted into actual XML result data that includes a first plurality of XML statements. A Structured Query Language (SQL) query is executed against the data warehouse data item, and expected XML result data is generated in response thereto that includes a different (second) plurality of XML statements. The JSON response data returned from the back end server is thereby verified in response to matching the actual XML result data to the expected XML result data.
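The comparison step described above can be sketched as follows. This is an illustrative reading, not the claimed method: the JSON-to-XML conversion rules, the element names `result` and `item`, and the representation of SQL results as a list of dictionaries are all assumptions. The HTTP call and the SQL query themselves are left out; the sketch starts from their outputs.

```python
import json
import xml.etree.ElementTree as ET

def to_xml(value, tag="result"):
    """Convert a JSON-like value to an element tree (hypothetical mapping)."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key in sorted(value):  # sort keys so field order does not matter
            elem.append(to_xml(value[key], key))
    elif isinstance(value, list):
        for item in value:
            elem.append(to_xml(item, "item"))
    else:
        elem.text = "" if value is None else str(value)
    return elem

def verify(json_text, expected_rows):
    """Match actual XML (from the JSON response) to expected XML (from SQL rows)."""
    actual = ET.tostring(to_xml(json.loads(json_text)), encoding="unicode")
    expected = ET.tostring(to_xml(expected_rows), encoding="unicode")
    return actual == expected

if __name__ == "__main__":
    response = '[{"id": 1, "name": "a"}]'     # stands in for the JSON response
    rows = [{"name": "a", "id": 1}]           # stands in for the SQL result
    print(verify(response, rows))
```

Serializing both sides through the same conversion gives a canonical form to compare, so differences in key order between the service and the database do not cause false mismatches, while differences in values or row order still do.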
TL;DR: Four procedures to realize data synchronization between different databases are adopted to get incremental data of source by triggers, convert incremental data into XML file via XML mapping layer, send XML file to destination in a message format and parse data by XML parser.
Abstract: With the rapid development of information technology, many enterprises are faced with resource sharing and information exchange in a global heterogeneous environment. MOM and XML technology can provide a better solution for information exchange and resource sharing in a distributed heterogeneous environment. In this paper, we adopt four procedures to realize data synchronization between different databases: 1) get incremental data of the source by triggers; 2) convert incremental data into an XML file via an XML mapping layer; 3) send the XML file to the destination in a message format; 4) parse the data by an XML parser.
Keywords: MOM; XML; heterogeneous environment; information exchange; data synchronization; trigger
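Steps 2 and 4 of the procedure above, converting incremental data to XML and parsing it at the destination, can be sketched as below. The message layout (`changes`, `row`, `col` elements and the `op` attribute) and the function names are hypothetical; capturing changes with triggers (step 1) and transport over message-oriented middleware (step 3) are omitted.

```python
import xml.etree.ElementTree as ET

def changes_to_xml(table, changes):
    """Step 2: serialize captured changes, given as (operation, row dict) pairs."""
    root = ET.Element("changes", {"table": table})
    for op, row in changes:
        rec = ET.SubElement(root, "row", {"op": op})
        for col, val in row.items():
            ET.SubElement(rec, "col", {"name": col}).text = str(val)
    return ET.tostring(root, encoding="unicode")

def xml_to_changes(message):
    """Step 4: parse the message back into a table name and change list."""
    root = ET.fromstring(message)
    changes = []
    for rec in root.findall("row"):
        row = {c.get("name"): c.text or "" for c in rec.findall("col")}
        changes.append((rec.get("op"), row))
    return root.get("table"), changes

if __name__ == "__main__":
    msg = changes_to_xml("customer", [("insert", {"id": 1, "name": "Ann"})])
    print(msg)
    print(xml_to_changes(msg))
```

All values come back as strings after the round trip, so a real mapping layer would also need to carry column types for the destination database to restore them.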
TL;DR: This paper surveys different approaches for summarizing XML documents with regard to both their structure and content.
Abstract: eXtensible Markup Language (XML) is one of the standard data representations nowadays. It is used in various applications because of its flexibility and ease of use, so summarizing XML documents has become an increasingly important topic for saving time and cost. For these reasons, there is growing interest in developing tools for summarizing XML documents. This paper surveys different approaches for summarizing XML documents with regard to both their structure and content.
TL;DR: This paper proposes a generator that produces synthetic XML data with regard to a given set of XML queries expressed in XPath, enabling versatility of the output, its applicability to a particular situation, and, at the same time, simplicity of input parameters.
Abstract: A number of tools currently exist for generating synthetic XML data. In this paper, however, we approach the problem from a completely new direction. We propose a generator that produces synthetic XML data with regard to a given set of XML queries expressed in XPath. Thus we enable versatility of the output, its applicability to a particular situation, and, at the same time, simplicity of input parameters. We have implemented a prototype of the generator, further optimized its performance, and experimentally demonstrated its properties.
TL;DR: This paper uses the technique of storing XML Schema versions in a relational database where the detection and storage of delta changes are employed on relational tables and provides a more meaningful description of the detected changes.
Abstract: Detecting changes in XML data has emerged as an important research issue in the last decade, but the majority of change detection algorithms focus on XML documents rather than on their schemas, because documents that contain data are deemed more significant than the schema itself. However, an XML schema change detection tool is essential, especially in situations where we need to maintain related XML documents with an evolving schema, sustain a relational schema generated by a schema-conscious approach for storing XML data, and provide support for XML versioning. This paper focuses on XML Schema (XSD) changes and provides a more meaningful description of the detected changes. Our proposed algorithm, XS-Diff, stores XML Schema versions in a relational database, where the detection and storage of delta changes are performed on relational tables. We demonstrate the correctness of the proposed algorithm on both synthetic and real data sets without deteriorating the execution time.
TL;DR: This paper proposes a labelling scheme that supports dynamic updates without relabelling the existing nodes and determines structural relationships efficiently by looking at the labels.
Abstract: The increasing number of XML documents on the internet motivated us to develop indexing techniques to retrieve XML data efficiently. Assigning a unique label to each node and determining the structural relationships between nodes is a critical problem in XML query processing. Labelling schemes designed for static XML documents do not support dynamic updates on XML documents. Some dynamic labelling schemes do support dynamic updates, but with high cost and complexity. In this paper we propose a labelling scheme that supports dynamic updates without relabelling the existing nodes. It also determines the structural relationships efficiently by looking at the labels. A set of performance tests is carried out to measure the time required to generate unique labels.
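To make "determining structural relationships by looking at the labels" concrete, here is a minimal sketch that uses plain prefix-based (Dewey-style) labels such as `1.2.3`. This is an assumed stand-in, not the scheme proposed in the paper: plain Dewey labels answer the relationship queries from labels alone, but they may need relabelling on insertion, which is exactly the cost the paper aims to avoid. All function names are hypothetical.

```python
def parse(label):
    """Split a label such as '1.2.3' into a tuple of integers."""
    return tuple(int(part) for part in label.split("."))


def is_ancestor(a, b):
    """True if node a is a proper ancestor of node b (a is a strict prefix of b)."""
    a, b = parse(a), parse(b)
    return len(a) < len(b) and b[:len(a)] == a


def is_parent(a, b):
    """True if node a is the parent of node b (ancestor exactly one level up)."""
    return is_ancestor(a, b) and len(parse(b)) == len(parse(a)) + 1


def is_sibling(a, b):
    """True if a and b are distinct nodes that share the same parent."""
    a, b = parse(a), parse(b)
    return a != b and len(a) == len(b) and a[:-1] == b[:-1]


def precedes(a, b):
    """True if a comes before b in document order (component-wise comparison)."""
    return parse(a) < parse(b)
```

Comparing integer components rather than raw strings matters: `is_ancestor("1.2", "1.20.1")` is false even though `"1.2"` is a textual prefix of `"1.20.1"`.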
TL;DR: This paper proposes an optimistic approach to re-cluster multi-version XML documents that change over time by reassessing the distance between them, using knowledge from the initial clustering solution and the changes stored in a compressed delta.
Abstract: With the standardization of XML as an information exchange format on the web, a huge amount of information is now formatted as XML documents. XML documents are large: the amount of information that has to be transmitted, processed, stored, and queried is often larger than with other data formats. In real-world applications XML documents are also dynamic in nature. The versatile applicability of XML documents in different fields of information maintenance and management is increasing the demand to store different versions of XML documents over time. However, storing all versions of an XML document may introduce redundancy. The self-describing nature of XML also makes it verbose, which results in large documents. This paper proposes an optimistic approach to re-cluster multi-version XML documents that change over time by reassessing the distance between them, using knowledge from the initial clustering solution and the changes stored in a compressed delta. The growing size of the XML documents is reduced by applying homomorphic compression, which retains the original structure, before clustering them. The compressed delta stores the changes responsible for the document versions, without decompressing them. Test results show that our approach performs much better than full pair-wise document comparison.
TL;DR: Time-efficiency results show that this replacement strategy, based on the semantic cache contribution value, supports the XML algebra query environment and has better time efficiency than both least frequently used (LFU) and least recently used (LRU).
Abstract: Because traditional cache replacement strategies are not tailored to the semantic cache used in extensible markup language (XML) algebra query processing, a replacement strategy based on the semantic cache contribution value is proposed. First, pattern matching rules for XML algebra queries and semantic caches are given. Second, a method for calculating the semantic cache contribution value is proposed. On XML documents of four different sizes, the experimental time-efficiency results show that this strategy supports the XML algebra query environment and has better time efficiency than both least frequently used (LFU) and least recently used (LRU).