TL;DR: This paper classifies the approaches used to convert XML documents into OWL ontologies and describes the advantages and drawbacks of each method.
Abstract: The aims of XML data conversion to ontologies are the indexing, integration and enrichment of existing ontologies with knowledge acquired from these sources. The contribution of this paper consists in providing a classification of the approaches used for the conversion of XML documents into OWL ontologies. This classification underlines the usage profile of each conversion method, providing a clear description of the advantages and drawbacks belonging to each method. Hence, this paper focuses on two main processes, which are ontology enrichment and ontology population using XML data. Ontology enrichment is related to the schema of the ontology (TBox), and ontology population is related to the individuals (ABox). In addition, the ontologies described in these methods are based on formal languages of the Semantic Web such as OWL (Web Ontology Language) or RDF (Resource Description Framework). These languages are formal because their semantics are formally defined and take advantage of Description Logics. In contrast, XML data sources have no formal semantics. The XML language is used to store, export and share data between processes able to process the specific data structure. However, even if the semantics is not explicitly expressed, the data structure captures the universe of discourse by using a qualified vocabulary based on a consensual agreement. In order to formalize this semantics, the OWL language provides rich logical constraints. Therefore, these logical constraints are involved in the transformation of XML documents into OWL documents, allowing the enrichment and the population of the target ontology. To design such a transformation, the current research field establishes connections between OWL constructs (classes, predicates, simple or complex data types, etc.) and XML constructs (elements, attributes, element lists, etc.). Two different approaches for the transformation process are presented.
The instance approaches are based on XML documents without any associated schema. The validation approaches are based on an XML schema and documents validated against that schema. The latter approaches benefit from the schema definition to provide automated transformations with logic constraints. Both approaches are discussed in the text.
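Illustrative note: the instance approach described above can be sketched in a few lines of Python. This is only a minimal sketch under assumed mapping rules, not one of the surveyed methods: elements with children or attributes become OWL classes (TBox) and individuals (ABox), while text-only leaf elements and attributes become data values. The function name `xml_to_triples`, the `http://example.org/` namespace and the `has...` property naming are hypothetical choices made for the example.

```python
import xml.etree.ElementTree as ET


def xml_to_triples(xml_text, base="http://example.org/"):
    """Instance-style mapping (no schema consulted). All rules are assumptions."""
    root = ET.fromstring(xml_text)
    triples = []
    seen_classes = set()
    counter = 0

    def class_name(tag):
        # Hypothetical naming rule: upper-case the first letter of the tag.
        return base + tag[:1].upper() + tag[1:]

    def visit(elem, parent_iri):
        nonlocal counter
        if len(elem) == 0 and not elem.attrib:
            # Text-only leaf: treated as a data value attached to the parent.
            if parent_iri is not None:
                value = (elem.text or "").strip()
                triples.append((parent_iri, base + elem.tag, value))
            return
        counter += 1
        iri = f"{base}{elem.tag}_{counter}"
        cls = class_name(elem.tag)
        if cls not in seen_classes:
            # Enrichment step (TBox): declare the class once.
            seen_classes.add(cls)
            triples.append((cls, "rdf:type", "owl:Class"))
        # Population step (ABox): declare the individual.
        triples.append((iri, "rdf:type", cls))
        if parent_iri is not None:
            prop = base + "has" + elem.tag[:1].upper() + elem.tag[1:]
            triples.append((parent_iri, prop, iri))
        for name, value in elem.attrib.items():
            triples.append((iri, base + name, value))
        for child in elem:
            visit(child, iri)

    visit(root, None)
    return triples


if __name__ == "__main__":
    doc = '<library><book id="b1"><title>XML</title></book></library>'
    for triple in xml_to_triples(doc):
        print(triple)
```

A validation approach would instead read the class and property declarations from the XML schema, so the logical constraints would not have to be guessed from a single document.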
TL;DR: The SPARQL2XQuery Framework provides a mapping model for the expression of OWL-RDF/S to XML Schema mappings as well as a method for SPARQL to XQuery translation.
Abstract: In the context of the emergent Web of Data, a large number of organizations, institutes and companies (e.g., DBpedia, Data.gov, GeoNames, PubMed) adopt the Linked Data practices. Utilizing the Semantic Web (SW) technologies, they publish their data and offer SPARQL endpoints (i.e., SPARQL-based search services). On the other hand, the dominant standard for information exchange in the Web today is XML. Additionally, many international standards (e.g., Dublin Core, MPEG-7, METS, TEI, IEEE LOM) in several domains (e.g., Digital Libraries, GIS, Multimedia, e-Learning) have been expressed in XML Schema. The aforementioned have led to an increasing emphasis on XML data, accessed using the XQuery query language. The SW and XML worlds and their developed infrastructures are based on different data models, semantics and query languages. Thus, it is crucial to develop interoperability mechanisms that allow the Web of Data users to access XML datasets, using SPARQL, from their own working environments. It is unrealistic to expect that all the existing legacy data (e.g., Relational, XML, etc.) will be transformed into SW data. Therefore, publishing legacy data as Linked Data and providing SPARQL endpoints over them has become a major research challenge. In this direction, we introduce the SPARQL2XQuery Framework which creates an interoperable environment, where SPARQL queries are automatically translated to XQuery queries, in order to access XML data across the Web. The SPARQL2XQuery Framework provides a mapping model for the expression of OWL-RDF/S to XML Schema mappings as well as a method for SPARQL to XQuery translation. To this end, our Framework supports both manual and automatic mapping specification between ontologies and XML Schemas. In the automatic mapping specification scenario, the SPARQL2XQuery exploits the XS2OWL component which transforms XML Schemas into OWL ontologies.
Finally, extensive experiments have been conducted in order to evaluate the schema transformation, mapping generation, query translation and query evaluation efficiency, using both real and synthetic datasets.
TL;DR: The GroupBased labelling scheme proposed in this thesis offers high performance when processing dynamic XML data updates, uniform behaviour irrespective of whether the document is static or dynamic, the ability to determine all structural relationships between nodes, and improved query performance in both types of document.
Abstract: Documents that comply with the XML standard are characterised by inherent ordering and their modelling usually takes the form of a tree. Nowadays, applications generate massive amounts of XML data, which requires accurate and efficient queryable XML database systems. XML querying depends on XML labelling in much the same way as relational databases rely on indexes. Document order and structural information are encoded by labelling schemes, thus facilitating their use by queries without having to access the original XML document. Dynamic XML data, that is, data which changes, complicates the labelling scheme. As demonstrated by many research efforts, it is difficult to allocate unique labels to nodes in a dynamic XML tree so that all structural relationships between the nodes are encoded by the labels.
Static XML documents are generally managed with labelling schemes that use simple labels. By contrast, dynamic labelling schemes have extra labelling costs and lower query performance to allow random updates irrespective of the document update frequency. Given that static and dynamic XML documents are often not clearly distinguished, a labelling scheme whose efficiency does not depend on updating frequency would be useful.
The GroupBased labelling scheme proposed in this thesis is compatible with static as well as dynamic XML documents. In particular, this scheme has a high performance in processing dynamic XML data updates. What differentiates it from other dynamic labelling schemes is its uniform behaviour irrespective of whether the document is static or dynamic, ability to determine all structural relationships between nodes, and the improved query performance in both types of document. The advantages of the GroupBased scheme in comparison to earlier schemes are highlighted by the experiment results.
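Illustrative note: to make concrete what "labels encode structural relationships" means, the sketch below implements a classic static interval (containment) labelling. It is not the GroupBased scheme, whose details are not given in the abstract; the function names and the `(start, end, level)` label layout are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def label_tree(root):
    """Assign a (start, end, level) label to every element in one DFS pass."""
    labels = {}
    counter = 0

    def visit(node, level):
        nonlocal counter
        counter += 1
        start = counter
        for child in node:
            visit(child, level + 1)
        counter += 1
        labels[node] = (start, counter, level)

    visit(root, 0)
    return labels


def is_ancestor(a, d):
    # a is an ancestor of d when a's interval strictly contains d's interval.
    return a[0] < d[0] and d[1] < a[1]


def is_parent(a, d):
    return is_ancestor(a, d) and d[2] == a[2] + 1


def precedes(a, b):
    # Document order follows the start positions.
    return a[0] < b[0]


if __name__ == "__main__":
    root = ET.fromstring("<a><b><c/></b><d/></a>")
    labels = label_tree(root)
    for node, label in labels.items():
        print(node.tag, label)
```

With such labels, ancestor, parent and document-order tests need only the labels, not the document. The weakness is visible too: inserting a new node between existing ones leaves no free positions, so labels must be reassigned, which is the cost that dynamic labelling schemes try to avoid.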
TL;DR: The goal of BonXai is to provide a simpler DTD-like alternative to schema designers that do not need the explicit use of types, and can be seen as a practical front-end for XML Schema.
Abstract: While the migration from DTD to XML Schema was driven by a need for increased expressivity and flexibility, the latter was also significantly more complex to use and understand. Whereas DTDs are characterized by their simplicity, XML Schema Definitions (XSDs) are notoriously difficult. In this paper, we introduce the XML specification language BonXai which possesses most features of XSDs, including its expressivity, while retaining the simplicity of DTDs. In brief, the latter is achieved by sacrificing the explicit use of types in favor of simple patterns expressing contexts for elements. The goal of BonXai is by no means to replace XML Schema, but rather to provide a simpler DTD-like alternative to schema designers that do not need the explicit use of types. Therefore, BonXai can be seen as a practical front-end for XML Schema. A particular strong point of BonXai is its solid foundation rooted in a decade of theoretical work around pattern-based schemas. We present in detail the formal model for BonXai and discuss translation algorithms to and from XML Schema.
TL;DR: The results show that the approach enables bridging XMLware with modelware and grammarware in several ways going beyond existing approaches and allows the automated generation of editors that are at least equivalent to editors manually built for XML-based languages.
Abstract: A multitude of Domain-Specific Languages (DSLs) have been implemented with XML Schemas. While such DSLs are well adopted and flexible, they miss modern DSL editor functionality. Moreover, since XML is primarily designed as a machine-processible format, artifacts defined with XML-based DSLs lack comprehensibility and, therefore, maintainability. In order to tackle these shortcomings, we propose a bridge between the XML Schema Definition (XSD) language and text-based metamodeling languages. This bridge exploits existing seams between the technical spaces XMLware, modelware, and grammarware as well as closes identified gaps. The resulting approach is able to generate Xtext-based editors from XSDs providing powerful editor functionality, customization options for the textual concrete syntax style, and round-trip transformations enabling the exchange of data between the involved technical spaces. We evaluate our approach by a case study on TOSCA, which is an XML-based standard for defining Cloud deployments. The results show that our approach enables bridging XMLware with modelware and grammarware in several ways going beyond existing approaches and allows the automated generation of editors that are at least equivalent to editors manually built for XML-based languages.
TL;DR: This work introduces an XML to RDF transformation approach, which is based on mappings comprising RDF triple templates that employ simple XPath expressions and shows that the time complexity of the mapping algorithm is linear in the size of the XML input and proves its practical efficiency with an evaluation on large real-world data.
Abstract: The Extensible Markup Language (XML) has become a widely adopted data interchange format. With the rise of Linked Data published using the Resource Description Framework (RDF), a number of tools for transforming XML to RDF have been developed. Specifying XML→RDF mappings for these tools often requires skills in programming languages such as XSLT or XQuery. Moreover, these tools are rarely able to deal with large XML inputs. We introduce an XML to RDF transformation approach, which is based on mappings comprising RDF triple templates that employ simple XPath expressions. Thanks to the restricted XPath expressions, which can be evaluated against a stream of XML data, our implementation can handle extremely large input XML files. To process the XML input efficiently, we employ XML filtering techniques and a strategy for selecting relevant XML nodes to generate RDF triples from. We show that the time complexity of our mapping algorithm is linear in the size of the XML input and also prove its practical efficiency with an evaluation on large real-world data.
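Illustrative note: the idea of triple templates evaluated against a stream of XML can be sketched with the standard-library `iterparse` function. This is a simplified sketch, not the authors' implementation or mapping language: the `TEMPLATE` dictionary layout, the `stream_triples` function and the example.org subject pattern are hypothetical, and the "restricted XPath" is reduced here to plain child element names.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical template: one subject per <book>, one triple per listed child.
TEMPLATE = {
    "match": "book",
    # {id} is filled from the element's id attribute (assumed to be present).
    "subject": "http://example.org/book/{id}",
    "properties": {
        "title": "http://purl.org/dc/terms/title",
        "author": "http://purl.org/dc/terms/creator",
    },
}


def stream_triples(source, template):
    """Yield (subject, predicate, literal) triples while parsing incrementally."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != template["match"]:
            continue
        subject = template["subject"].format(**elem.attrib)
        for child_tag, predicate in template["properties"].items():
            for child in elem.findall(child_tag):
                yield (subject, predicate, (child.text or "").strip())
        # Release the contents of the processed subtree.
        elem.clear()


if __name__ == "__main__":
    data = io.BytesIO(
        b'<catalog><book id="1"><title>A</title><author>X</author></book></catalog>'
    )
    for triple in stream_triples(data, TEMPLATE):
        print(triple)
```

Each element is visited once as its end tag is read, which matches the intuition behind a running time linear in the input size, although this sketch makes no claim about the paper's actual algorithm.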
TL;DR: A new approach to store and query an XML document using relational databases is presented, which decomposes an XML document into three tables without using any XML schema or DTD and achieves lower storage consumption.
Abstract: Due to its simplicity, flexibility and extensibility, XML can be adapted to multiple domains. Its self-describing, nested structure has allowed XML to become the dominant standard for storing and transferring data through the World Wide Web. In addition, relational database systems are mature and extremely powerful. Therefore, much research has been done to propose an efficient approach to store and query an XML document in a relational database. In this paper we present a new approach to store and query an XML document using relational databases, which decomposes an XML document into three tables without using any XML schema or DTD. Our approach efficiently supports structural modifications to the XML tree and achieves lower storage consumption. We also propose two powerful algorithms for mapping XML data to relational databases and from relational databases to XML data.
TL;DR: This paper systematically analyzes the chosen-ciphertext attacks on XML Encryption and designs an algorithm that performs a vulnerability scan on arbitrary encrypted XML messages, automatically detecting a vulnerability and exploiting it to retrieve the plaintext of a message protected by XML Encryption.
Abstract: In the recent years, XML Encryption became a target of several new attacks [18, 17, 16]. These attacks belong to the family of adaptive chosen-ciphertext attacks, and allow an adversary to decrypt symmetric and asymmetric XML ciphertexts, without knowing the secret keys. In order to protect XML Encryption implementations, the World Wide Web Consortium (W3C) published an updated version of the standard.
Unfortunately, most of the current XML Encryption implementations do not support the newest XML Encryption specification and offer different XML Security configurations to protect confidentiality of the exchanged messages. Resulting from the attack complexity, evaluation of the security configuration correctness becomes tedious and error prone. Validation of the applied countermeasures can typically be made with numerous XML messages provoking incorrect behavior by decrypting XML content. Up to now, this validation was only manually possible.
In this paper, we systematically analyze the chosen-ciphertext attacks on XML Encryption and design an algorithm to perform a vulnerability scan on arbitrary encrypted XML messages. The algorithm can automatically detect a vulnerability and exploit it to retrieve the plaintext of a message protected by XML Encryption. To assess the practicability of our approach, we implemented an open source attack plugin for the Web Service attacking tool WS-Attacker. With the plugin, we discovered new security problems in four out of five analyzed Web Service implementations, including IBM Datapower and Apache CXF.
TL;DR: This paper proposes an original method for measuring the structural similarity between an XML document and an XML grammar (DTD or XSD), considering their most common operators that designate constraints on the existence, repeatability and alternativeness of XML elements/attributes.
TL;DR: Through XML, the exchange of data during application and documentation processes, such as agricultural supports, traceability, and quality assurance, is simplified and automated, so the efficiency of data processing is improved for all involved in agricultural production.
Abstract: At present, large quantities of domain-specific agricultural databases are available and farmers are accessing these data; in addition, many agricultural information systems are exchanging data. These data could be used more broadly if databases and computer programs were user-friendly and accessible through the Internet on any platform. XML (Extensible Markup Language) documents are widely used for the storage and exchange of data, and XSLT (Extensible Stylesheet Language Transformations) documents are being developed for transforming XML documents. Through XML, the exchange of data during application and documentation processes, such as agricultural supports, traceability, and quality assurance, is simplified and automated, so data, once entered into the system, are available for all data exchanges, and the efficiency of data processing is improved for all involved in agricultural production.
TL;DR: A set of closely related C++ software tools for manipulating XML schemas and XML instance files and translating them into OWL (Web Ontology Language) class files and OWL instance files is described.
Abstract: This paper describes a set of closely related C++ software tools for manipulating XML (eXtensible Markup Language) schemas and XML instance files and translating them into OWL (Web Ontology Language) class files and OWL instance files. They include: (1) an XML schema parser, (2) an XML instance file parser generator, (3) the instance file parsers generated by the XML instance file parser generator, (4) an XML schema to OWL class generator, (5) a domain instance XML to OWL translator generator, and (6) the domain instance XML to OWL translators generated by the domain instance XML to OWL translator generator. These tools have been applied to information models for kitting environments and kitting plans. The main focus is on the last three tools, which differ significantly from existing resources. The paper also discusses differences between OWL and XML schema that make translation difficult, and how the tools overcome the difficulties. The tools were built at the National Institute of Standards and Technology in support of the Agility Performance of Robotic Systems.
TL;DR: This paper sets up a framework based on continuous integration and automated deployment in order to perform conversions between XML formats and from XML to RDF, and discusses the benefits and how this framework contributes to improve both data quality and development.
Abstract: At Oxford University Press we build large amounts of XML and RDF data as if it were software. However, established software development techniques like continuous integration, unit testing, and automated deployment are not always applied when converting XML and RDF since these formats are treated as data rather than software. In this paper we describe how we set up a framework based on continuous integration and automated deployment in order to perform conversions between XML formats and from XML to RDF. We discuss the benefits of this approach as well as how this framework contributes to improving both data quality and development.
TL;DR: This paper presents a technique, so-called structural bulk updates, that works in concert with the XQuery Update Facility to support efficient updates on the Pre/Dist/Size encoding and demonstrates the benefits in a detailed performance evaluation based on the XMark benchmark.
Abstract: In order to manage XML documents, native XML databases use specific encodings that map the hierarchical structure of a document to a flat representation. Several encodings have been proposed that differ in terms of their support for certain query workloads. While some encodings are optimized for query processing, others focus on data manipulation. For example, the Pre/Dist/Size XML encoding has been designed to support queries over all XPath axes efficiently, but processing atomic updates in XML documents can be costly. In this paper, we present a technique, so-called structural bulk updates, that works in concert with the XQuery Update Facility to support efficient updates on the Pre/Dist/Size encoding. We demonstrate the benefits of our technique in a detailed performance evaluation based on the XMark benchmark.
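Illustrative note: the flat representation behind the Pre/Dist/Size encoding can be sketched as follows. This is a simplified sketch under stated assumptions, not the database's storage layer: only element nodes are counted (a real encoding also stores text and attribute nodes), `size` is taken to include the node itself, and the root's `dist` is set to 0. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def pre_dist_size(root):
    """Return one (pre, dist, size, tag) row per element, in document order."""
    rows = []  # the list index equals the pre value

    def visit(node, parent_pre):
        pre = len(rows)
        rows.append(None)  # reserve the slot; size is known only after the subtree
        for child in node:
            visit(child, pre)
        size = len(rows) - pre  # the node itself plus all descendants (assumption)
        dist = pre - parent_pre if parent_pre is not None else 0
        rows[pre] = (pre, dist, size, node.tag)

    visit(root, None)
    return rows


def parent_of(row):
    # The parent's pre value is recovered by subtracting dist.
    return row[0] - row[1]


def descendants_of(row):
    # Descendants occupy the contiguous pre range right after the node.
    return range(row[0] + 1, row[0] + row[2])


if __name__ == "__main__":
    for row in pre_dist_size(ET.fromstring("<a><b><c/></b><d/></a>")):
        print(row)
```

The sketch also hints at why single updates are expensive: inserting one node shifts the pre value of every later row and changes the size of every ancestor, which is the kind of repeated work that applying updates in bulk aims to reduce.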
TL;DR: XForms, a language for describing interfaces to data, was originally designed for improving the handling of forms on the web, but has since been generalised to more general applications; version 2.0 is currently in preparation.
Abstract: XForms is a language for describing interfaces to data, designed at W3C by researchers from industry and academia. It is a declarative language, meaning it describes what has to be done, but largely not how. The interface it describes does not have to run locally on the machine producing the data, but can be run remotely over the network. Since Internet of Things (IoT) computers typically have little memory and are low-powered, this makes XForms ideally suited for the task.
One of the unexpected successes of HTML was its adoption for controlling devices with embedded computers, such as home Wi-Fi routers. To make an adjustment to such a device, the user directs the browser to the IP address from which it is running and a small web server on the device serves up web pages that allow the user to fill in and submit values to change the working of the device.
However, the tiny embedded computers that form part of the IoT typically have memory in kilobytes, not megabytes, and lack the power to run a web server that can serve and interpret web pages. This calls for a different approach.
One approach is for the devices to serve up only the data of the parameters, so that those values can then be injected into an interface served from elsewhere. XForms [1], a standard that we have helped develop at W3C, is designed for exactly this type of scenario: although it is a technology originally designed for improving the handling of forms on the web, it has since been generalised to more general applications; version 2.0 is currently in preparation [2].
TL;DR: GKS (Generic Keyword Search) enables discovery of deeper insights (DI) in the XML data, found in the context of the search results, thus enabling the navigation of complex XML repositories with ease.
Abstract: Classical XML keyword search based on the Lowest Common Ancestor (LCA) framework requires users to be well versed with data and semantic relationships between the query keywords to extract meaningful response, restricting its applicability. GKS (Generic Keyword Search), on the other hand, allows users to browse and navigate XML data without such constraints. GKS enables discovery of deeper insights (DI) in the XML data, found in the context of the search results. Such insights not only expose patterns hidden in the search results but also help users tune their queries, thus enabling the navigation of complex XML repositories with ease. We further show how, for a search query, different insights can be discovered from the data by varying a single parameter.
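Illustrative note: the classical LCA-style baseline that the abstract contrasts with GKS can be sketched with Dewey labels. This is a brute-force sketch of the smallest-LCA idea, not GKS and not an optimized algorithm; the function names are hypothetical and keyword matching is reduced to a case-insensitive substring test on element text.

```python
import itertools
import xml.etree.ElementTree as ET


def dewey_index(root):
    """Map each Dewey label (tuple of child positions) to its element."""
    index = {}

    def visit(node, label):
        index[label] = node
        for i, child in enumerate(node):
            visit(child, label + (i,))

    visit(root, ())
    return index


def common_prefix(labels):
    """The LCA of several nodes is the longest common prefix of their labels."""
    prefix = []
    for parts in zip(*labels):
        if len(set(parts)) != 1:
            break
        prefix.append(parts[0])
    return tuple(prefix)


def slca(root, keywords):
    index = dewey_index(root)
    matches = [
        [label for label, node in index.items()
         if kw.lower() in (node.text or "").lower()]
        for kw in keywords
    ]
    # One LCA per combination of matches (exponential; fine for a sketch only).
    candidates = {common_prefix(combo) for combo in itertools.product(*matches)}
    # Keep only the lowest candidates: drop any that is an ancestor of another.
    return sorted(
        c for c in candidates
        if not any(len(o) > len(c) and o[:len(c)] == c for o in candidates)
    )


if __name__ == "__main__":
    doc = ET.fromstring(
        "<bib><paper><title>XML search</title><author>Smith</author></paper>"
        "<paper><title>RDF stores</title><author>Smith</author></paper></bib>"
    )
    print(slca(doc, ["xml", "smith"]))
```

The result is only meaningful when the user already knows which keywords co-occur under a common element, which is the restriction the abstract attributes to LCA-based search.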
TL;DR: An enhanced Tree based Association Rule (TAR) is used to retrieve the results for the queries posed by the users on the XML document and a method to dynamically update the TAR files when the dataset changes is proposed.
Abstract: The database research field has concentrated on the Extensible Markup Language (XML) due to its flexible hierarchical nature, which can be used to represent huge amounts of data. XML doesn’t have a fixed schema and may have an irregular and incomplete structure. Recently, a great deal of digital information has become available on the web, such as e-business transactions, e-shopping, e-learning, etc. These XML documents are enormous, so the datasets returned as answers to a query are also huge, which in turn complicates the retrieval of interpretable knowledge. In this paper, we use an enhanced Tree-based Association Rule (TAR) to retrieve the results for the queries posed by the users on the XML document. TAR files provide intensional information on the structure and content of the XML document. In addition, we propose a method to dynamically update the TAR files when the dataset changes. Keywords: XML, Query-answering, Data mining, Intensional information, Tree-based association
TL;DR: HyXAC integrates the two most popular categories of XML access control enforcement mechanisms and gains the benefits of both, improving query processing efficiency while optimizing the use of system resources.
Abstract: With the increasing usage of XML for information sharing over the Internet, a mechanism for defining and enforcing XML access control is demanded, such that only authorized entities can access the sets of XML data that they are allowed to. Research interest in this area has grown significantly in recent years. Various access control enforcement solutions have been proposed, each with its inherent advantages and disadvantages. Yet, there is still no solution that can provide superior performance in all situations. In this paper, we present HyXAC, a hybrid approach to enforce XML access control. HyXAC integrates the two most popular categories of XML access control enforcement mechanisms, and gains the benefits of both. In particular, HyXAC first preprocesses user queries by rewriting queries and removing parts violating access control rules, and evaluates the rewritten queries using sub-views, if they are available. In HyXAC, views are not defined on a per-role basis. Instead, a sub-view is defined for each access control rule, and roles sharing identical rules will share sub-views. Moreover, HyXAC dynamically allocates memory and secondary storage resources to materialize and cache sub-views to improve query performance. We have conducted extensive experiments, and the results show that HyXAC improves query processing efficiency while optimizing the use of system resources.
TL;DR: In this study, the most cited and the latest model-mapping approaches are reviewed in terms of the description, the technique used and the RDB schema produced using each approach, and a solution to these limitations is proposed.
Abstract: XML has become the dominant standard for data exchange and representation on the Web. The relational database (RDB) is widely used as a storage and retrieval medium in the business field. With the expanding utilization of XML data on the Web, the size of this data type has increased rapidly, and more complicated queries are issued by users over this data. This expansion has prompted numerous researchers to propose various approaches to managing XML data through RDB. In this study, the most cited and the latest model-mapping approaches are reviewed in terms of the description, the technique used and the RDB schema produced using each approach. The limitations of these approaches are discussed in terms of the storage space and query response time. At the end of this study, a solution to these limitations is proposed. It is hoped that this paper will give some insight into storing XML documents in RDB schema and contribute to the XML community.
TL;DR: This paper provides comprehensive comparative analysis of various control schemes for change detection and querying dynamic XML documents.
Abstract: The efficient management of dynamic XML documents is a complex area of research. The changes to, and the size of, XML documents throughout their lifetime are limitless. Change detection is an important part of version management, identifying the differences between successive versions of a document. Document content is continuously evolving. Users want to be able to query previous versions, query changes in documents, as well as retrieve a particular document version efficiently. In this paper we provide a comprehensive comparative analysis of various control schemes for change detection and querying dynamic XML documents.
TL;DR: The results indicate that the subscriber-centric XML filtering architecture is a viable approach for disseminating semi-structured data streams to the various consuming applications.
Abstract: The vast amounts of data generated in near real-time due to prolific use of sensors, pervasive usage of mobile Internet, and popularity of social media platforms, necessitates the efficient dissemination of the semi-structured streaming data to the consuming applications. Towards this end, we introduce the subscriber-centric XML filtering approach for seamless and efficient XML stream replication/distribution mechanism. The subscriber-centric filtering architecture can be configured to support different topologies in order to support efficient message filtering for a large number of concurrent subscribers. It allows selective filtering on the various nodes that improves efficiency and provides applications with data on a need-to-know basis. Moreover, it supports inter-operability and allows semi-structured streams generated from multiple sources to be filtered. Our XML filtering network consists of decoupled data producers, message transformation agents and XML brokers that can be deployed in conventional data centers as well as in the public cloud environment. We provide detailed performance results of processing filtering queries in several use case scenarios with varying XML message loads and number of nodes involved in the replication/dissemination process. Our results indicate that the subscriber-centric XML filtering architecture is a viable approach for disseminating semi-structured data streams to the various consuming applications.
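Illustrative note: the core filtering step, matching each incoming XML message against subscriber filters so that applications receive data on a need-to-know basis, can be sketched as below. This is a single-node toy, not the paper's distributed architecture; the `Broker` class and its methods are hypothetical, and filters are limited to the XPath subset supported by Python's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET


class Broker:
    """Toy XML broker: routes a message to subscribers whose filter matches."""

    def __init__(self):
        self.subscriptions = {}  # subscriber name -> list of XPath filters

    def subscribe(self, subscriber, xpath):
        self.subscriptions.setdefault(subscriber, []).append(xpath)

    def publish(self, message_xml):
        """Return the sorted names of subscribers that should receive the message."""
        root = ET.fromstring(message_xml)
        return sorted(
            name
            for name, filters in self.subscriptions.items()
            if any(root.find(f) is not None for f in filters)
        )


if __name__ == "__main__":
    broker = Broker()
    broker.subscribe("thermostat", "./reading[@type='temp']")
    broker.subscribe("dashboard", ".//reading")
    print(broker.publish('<msg><reading type="temp">21</reading></msg>'))
```

Evaluating every filter against every message is the naive strategy; sharing work across many concurrent subscriptions is where dedicated filtering architectures differ from this sketch.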
TL;DR: This paper investigates the problem of processing a large amount of encrypted documents in XML-like formats where a user may wish to search or compute based on certain elements in the XML tree and proposes a solution that makes use of index tables to allow for fast keyword and location queries.
Abstract: The need for privacy-protected searching has garnered increasing interest as industries continue to adopt cloud technologies. Much of the recent efforts have been towards incorporating more advanced searching techniques. Although many have proposed solutions for search and computations in unencrypted data, developing efficient solutions over encrypted documents remains difficult. In this paper, we investigate the problem of processing a large amount of encrypted documents in XML-like formats where a user may wish to search or compute based on certain elements in the XML tree. Our solution makes use of index tables to allow for fast keyword and location queries. To allow computations to be performed on an untrusted server, homomorphic encryption is proposed and used in conjunction with symmetric encryption to reduce computational and storage cost.
TL;DR: A parameter-free prototypical approach to XML partitioning, which projects the XML documents into a space of XML features representing fixed-length sequences of adjacent textual items in the context of root-to-leaf paths, and reveals a higher effectiveness than several state-of-the-art competitors.
Abstract: Conventional approaches to XML clustering by content and structure are generally affected by a limitation due to the adoption of the bag-of-word model for the representation of their textual contents. This choice may lead to consider structure-constrained textual items of separate XML documents as related, even though the actual meaning of such items in their respective contexts is different. To overcome such a limitation, we propose XML clustering by structure-constrained phrases. The latter is a previously unexplored method relying on the more accurate bag-of-phrase model of the XML textual content, with which to better preserve the meaning of the structure-constrained content items for improved clustering effectiveness. In order to conduct an in-depth and systematic study of the effectiveness of the proposed method, we develop a parameter-free prototypical approach to XML partitioning, which projects the XML documents into a space of XML features representing fixed-length sequences of adjacent textual items in the context of root-to-leaf paths. Feature selection without any tunable threshold is used to choose a subset of the XML features on the basis of their relevance to clustering, which is assessed through a new scoring scheme. A comparative experimentation on real-world benchmark XML corpora reveals a higher effectiveness than several state-of-the-art competitors.
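Illustrative note: the difference between a bag-of-words and a structure-constrained bag-of-phrases representation can be sketched as follows. This is a minimal sketch of the feature space only, not the proposed clustering method or its scoring scheme: the function name `phrase_features` is hypothetical, and the phrase length `n` is an illustrative argument rather than a value taken from the paper.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def phrase_features(xml_text, n=2):
    """Count (root-to-leaf path, n adjacent tokens) features of one document."""
    root = ET.fromstring(xml_text)
    features = Counter()

    def visit(node, path):
        path = path + (node.tag,)
        if len(node) == 0:  # leaf element: its text is the content item
            tokens = (node.text or "").lower().split()
            for i in range(len(tokens) - n + 1):
                features[("/".join(path), tuple(tokens[i:i + n]))] += 1
        for child in node:
            visit(child, path)

    visit(root, ())
    return features


if __name__ == "__main__":
    doc = "<doc><sec><p>new york city</p></sec><sec><p>york new</p></sec></doc>"
    for feature, count in phrase_features(doc).items():
        print(feature, count)
```

In the example, a bag-of-words model would see "new" and "york" in both paragraphs and treat them as related, whereas the phrase features ("new", "york") and ("york", "new") remain distinct under the same path.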
TL;DR: The challenges involved in processing XML data in a critical context are explained, the choices in designing a secure XML validator are described, and how features of functional languages were used to enforce security requirements are detailed.
Abstract: While the use of XML is pervading all areas of IT, security challenges arise when XML files are used to transfer security data such as security policies. To tackle this issue, we have developed a lightweight secure XML validator and have chosen to base the development on the strongly typed functional language OCaml. The initial development took place as part of the LaFoSec Study which aimed at investigating the impact of using functional languages for security. We then turned the validator into an industrial application, which was successfully evaluated at EAL4+ level by independent assessors. In this paper, we explain the challenges involved in processing XML data in a critical context, we describe our choices in designing a secure XML validator, and we detail how we used features of functional languages to enforce security requirements.
TL;DR: This paper proposes a visual XQuery specification language called VXQ, which is easier to use and more expressive than previous proposals, and is also suitable for mobile devices where typing is not desired.
Abstract: XML is the standard way of representing and storing rapidly growing semi-structured data on the Internet. While XQuery has been proposed by W3C as the standard query language for XML data, the complexity of the language is the major overhead for users expressing queries and for software processing them efficiently. Considering that mobile devices are more popular than desktop computers, expressing and/or processing XQuery becomes even more cumbersome on mobile devices. This paper proposes a visual XQuery specification language called VXQ. Through intuitive abstractions of XML and XQuery, the proposed system can generate XQuery queries for users with little knowledge of XML and the language. The proposed visual language is easier to use and more expressive than previous proposals, and is also suitable for mobile devices where typing is not desired. Furthermore, we extend our proposed visual XQuery to support query rewriting and optimization for multiple XQuery systems. Experiments show that, in practice, our query rewriting reduces the query execution time significantly.
TL;DR: A kind of parallel XML keyword search algorithm is proposed and realized on a MapReduce programming model and the results show that the proposed algorithm is applicable to keyword search of massive XML data.
Abstract: Keyword search for smallest lowest common ancestors (SLCAs) is an important approach to identifying interesting data nodes in XML documents. With the rapid growth of XML data on the Internet, how to process massive XML data effectively has become an interesting topic. Hadoop, an open-source cloud computing platform developed in recent years, is increasingly used to process large-scale data, which makes massive storage and efficient search of XML data possible. In this paper, we first present two properties to improve the classical ILE algorithm. Then, a parallel XML keyword search algorithm is proposed and realized on a MapReduce programming model. Two experiments on 4 datasets of different sizes are performed on a cluster. The results show that our proposed algorithm is applicable to keyword search of massive XML data.
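To make the SLCA notion concrete, here is a brute-force sketch over Dewey labels. It is neither the ILE algorithm nor the MapReduce version from the paper. It assumes every node is identified by a tuple such as `(1, 2, 1)`, so that one node is an ancestor-or-self of another exactly when its label is a prefix of the other's. The function name `slca` is hypothetical.

```python
def slca(keyword_nodes):
    """Return the SLCA nodes for a keyword query.

    keyword_nodes holds one list of Dewey labels (tuples) per keyword:
    the nodes whose text contains that keyword.
    """
    def covers_all(node):
        # node covers a keyword if some matching label has node as a prefix.
        return all(
            any(label[:len(node)] == node for label in labels)
            for labels in keyword_nodes
        )

    # Any covering node must be an ancestor-or-self of a first-keyword match.
    candidates = {
        label[:k]
        for label in keyword_nodes[0]
        for k in range(1, len(label) + 1)
    }
    covering = {c for c in candidates if covers_all(c)}
    # Keep only covering nodes with no covering proper descendant.
    return sorted(
        c for c in covering
        if not any(d != c and d[:len(c)] == c for d in covering)
    )


xml_matches = [(1, 1, 1), (1, 2, 1)]   # nodes containing "xml"
search_matches = [(1, 1, 2), (1, 3)]   # nodes containing "search"
result = slca([xml_matches, search_matches])
```

Both `(1,)` and `(1, 1)` contain every keyword, but only `(1, 1)` is returned because it is the smallest such subtree.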
TL;DR: An empirical analysis of various parsers (DOM, SAX, Pull Parser, VTD, etc.) for an Android-based application is performed, and a new parser, SRDOM, based on structure recurrence is proposed; evaluation shows that SRDOM is 9 times faster than DOM in the presence of redundant structure.
Abstract: In domains ranging from the web to desktop applications, XML has become the standard format for data representation and transfer. However, wide adoption of XML is marred by inefficient document-parsing methods. An XML parser is a tool that reads an XML document and provides an interface for the user to access its content and structure, and it should be an integral part of every application that processes information from XML documents. Parsing is a core operation performed before an XML document can be navigated, queried or manipulated, and it is a costly operation that may degrade XML processing performance. In this paper we perform an empirical analysis of various parsers (DOM, SAX, Pull Parser, VTD, etc.) for an Android-based application. In addition, we propose a new parser, SRDOM, based on structure recurrence. Evaluation results of our implementation show that SRDOM is 9 times faster than DOM in the presence of redundant structure. The second application of multitasking indicates that the best parser for databases is DOM. We implemented our algorithm and present the performance results, which prove the validity of our approach.
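The difference between the two best-known parsing styles compared above can be shown with a small sketch. It uses Python's standard library rather than the Android parsers evaluated in the paper, and the sample document is made up. DOM builds the whole tree in memory before any navigation, while SAX streams events to a handler and keeps only what the handler stores.

```python
import xml.sax
from xml.dom import minidom

XML = "<library><book id='1'>A</book><book id='2'>B</book></library>"

# DOM: parse the full document into a tree, then query the tree.
dom = minidom.parseString(XML)
dom_ids = [book.getAttribute("id") for book in dom.getElementsByTagName("book")]


class BookHandler(xml.sax.ContentHandler):
    """SAX handler that records the id of every <book> start tag."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def startElement(self, name, attrs):
        if name == "book":
            self.ids.append(attrs.getValue("id"))


# SAX: the parser pushes events to the handler; no tree is built.
handler = BookHandler()
xml.sax.parseString(XML.encode("utf-8"), handler)
sax_ids = handler.ids
```

Both approaches extract the same ids. The trade-off is random access to the tree (DOM) versus a memory footprint that does not grow with document size (SAX).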
TL;DR: In this article, a JSON call via an Extensible Markup Language (XML) Hypertext Transfer Protocol (HTTP) object is made against a data warehouse data item stored in a back end server.
Abstract: Aspects provide for automatic verification of JavaScript Object Notation (JSON) data by making a JSON call via an Extensible Markup Language (XML) Hypertext Transfer Protocol (HTTP) object against a data warehouse data item stored in a back end server. JSON response data returned from the back end server in response to the JSON call is converted into actual XML result data that includes a first plurality of XML statements. A Structured Query Language (SQL) query is executed against the data warehouse data item, and expected XML result data is generated in response thereto that includes a different (second) plurality of XML statements. The JSON response data returned from the back end server is thereby verified in response to matching the actual XML result data to the expected XML result data.
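The verification flow can be sketched as follows, under several assumptions that are not in the source: the JSON response is given as a string, the SQL result is given as a list of row dictionaries, and both are normalized into XML by the same hypothetical helper `json_to_xml` with keys sorted so that field order does not matter. No HTTP call or database is involved in this sketch.

```python
import json
import xml.etree.ElementTree as ET


def json_to_xml(obj, tag="result"):
    """Convert a JSON-like value into an XML element (assumed mapping)."""
    elem = ET.Element(tag)
    if isinstance(obj, dict):
        for key in sorted(obj):          # canonical field order
            elem.append(json_to_xml(obj[key], key))
    elif isinstance(obj, list):
        for item in obj:
            elem.append(json_to_xml(item, "item"))
    else:
        elem.text = str(obj)
    return elem


def verify(json_text, sql_rows):
    """Return True when actual XML (from JSON) matches expected XML (from SQL)."""
    actual = ET.tostring(json_to_xml(json.loads(json_text)), encoding="unicode")
    expected = ET.tostring(json_to_xml(sql_rows), encoding="unicode")
    return actual == expected


response = '[{"id": 1, "name": "a"}]'
ok = verify(response, [{"name": "a", "id": 1}])
mismatch = verify(response, [{"name": "a", "id": 2}])
```

Comparing serialized XML keeps the check simple. A production comparison would likely need type-aware normalization, for example of numbers and nulls.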
TL;DR: A dynamic prefix encoding scheme based on fractions (DPESF) is proposed, which uses the unlimited extensibility of fractions to support dynamic updates of XML documents without re-encoding, while retaining the excellent characteristics of Dewey encoding.
Abstract: In order to improve the efficiency of XML document queries and to support dynamic updates of XML documents, this paper proposes a dynamic prefix encoding scheme based on fractions (DPESF), which uses the unlimited extensibility of fractions to support dynamic updates of XML documents without re-encoding, while retaining the excellent characteristics of Dewey encoding. Finally, experimental results show that DPESF has better time and space performance than existing dynamic prefix encoding schemes.
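The idea of using fractions to avoid re-encoding can be illustrated with a sketch. The abstract does not spell out the details of DPESF, so this is an assumption: labels are Dewey-style tuples whose components are fractions, and a node inserted between two siblings receives the midpoint of their last components. The names `between` and `child_label` are hypothetical.

```python
from fractions import Fraction


def between(left, right):
    """Return a component strictly between two sibling components.

    None means there is no sibling on that side.
    """
    if left is None and right is None:
        return Fraction(1)
    if left is None:
        return right / 2
    if right is None:
        return left + 1
    return (left + right) / 2


def child_label(parent, left=None, right=None):
    """Label a new child of parent placed between siblings left and right."""
    left_c = left[-1] if left is not None else None
    right_c = right[-1] if right is not None else None
    return parent + (between(left_c, right_c),)


root = (Fraction(1),)
a = child_label(root)                    # first child
c = child_label(root, left=a)            # appended after a
b = child_label(root, left=a, right=c)   # inserted between a and c
```

Existing labels `a` and `c` are unchanged by the insertion of `b`. Document order is still the lexicographic order of the tuples, and ancestry is still a prefix test, as with plain Dewey labels.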