TL;DR: This paper compares twelve object serialization libraries from qualitative and quantitative aspects to show that there is no single best solution and that each library performs well in the context for which it was developed.
Abstract: This paper compares twelve object serialization libraries from qualitative and quantitative aspects. The libraries serialize objects in XML, JSON, and binary formats. Using each library, a common example is serialized to a file. The size of the serialized file and the processing time are measured during execution to compare the libraries. Some libraries show a performance penalty, but it is clear that there is no single best solution: each library performs well in the context for which it was developed.
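The measurement described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's benchmark: the sample record, the XML layout in `to_xml`, and the use of the standard-library `json`, `pickle`, and `xml.etree.ElementTree` modules as stand-ins for the twelve libraries are all assumptions made for illustration.

```python
import json
import pickle
import time
import xml.etree.ElementTree as ET

# Hypothetical sample object; the paper's common example is not reproduced here.
RECORD = {"id": 42, "name": "sample", "tags": ["a", "b", "c"]}


def to_xml(record):
    """Serialize the sample record to XML bytes (illustrative mapping only)."""
    root = ET.Element("record")
    ET.SubElement(root, "id").text = str(record["id"])
    ET.SubElement(root, "name").text = record["name"]
    tags = ET.SubElement(root, "tags")
    for tag in record["tags"]:
        ET.SubElement(tags, "tag").text = tag
    return ET.tostring(root, encoding="utf-8")


# One serializer per format family named in the abstract (stand-ins, not the paper's libraries).
SERIALIZERS = {
    "xml": to_xml,
    "json": lambda r: json.dumps(r).encode("utf-8"),
    "binary": lambda r: pickle.dumps(r, protocol=pickle.HIGHEST_PROTOCOL),
}


def measure(record, repeats=1000):
    """Return {format: (size_in_bytes, seconds_for_all_repeats)}."""
    results = {}
    for name, serialize in SERIALIZERS.items():
        start = time.perf_counter()
        for _ in range(repeats):
            payload = serialize(record)
        elapsed = time.perf_counter() - start
        results[name] = (len(payload), elapsed)
    return results


if __name__ == "__main__":
    for name, (size, seconds) in measure(RECORD).items():
        print(f"{name}: {size} bytes, {seconds:.4f} s")
```

Timings from such a loop depend heavily on the machine and the object being serialized, so they say nothing about the results reported in the paper.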
TL;DR: A computer-implemented method obtains an XML document template object in which a subset of fields of the XML document is designated by placeholders, and processes those fields in an instance of the XML document.
Abstract: A computer-implemented method includes obtaining an XML document template object in which a subset of fields of the XML document is designated by placeholders. The XML document template object is prepared based on a prior instance of the XML document. The method further involves processing the subset of fields in an instance of the XML document that are designated by placeholders in XML document template object.
TL;DR: This paper introduces a technique based on the principles of Model-Driven Development that ensures semi-automatic coherent propagation to all affected XML schemas (and vice versa) and provides a formal model of possible evolution changes and their propagation mechanism.
TL;DR: An approach based on Tree-Based Association Rules (TARs): mined rules, which provide approximate, intensional information on both the structure and the contents of Extensible Markup Language (XML) documents, and can be stored in XML format as well.
Abstract: Extracting information from semistructured documents is a very hard task, and is going to become more and more critical as the amount of digital information available on the Internet grows. Indeed, documents are often so large that the data set returned as answer to a query may be too big to convey interpretable knowledge. In this paper, we describe an approach based on Tree-Based Association Rules (TARs): mined rules, which provide approximate, intensional information on both the structure and the contents of Extensible Markup Language (XML) documents, and can be stored in XML format as well. This mined knowledge is later used to provide: 1) a concise idea (the gist) of both the structure and the content of the XML document and 2) quick, approximate answers to queries. In this paper, we focus on the second feature. A prototype system and experimental results demonstrate the effectiveness of the approach.
TL;DR: An extensible framework based on the concept of tree edit distance as an optimal technique to consider XML structure is proposed, integrating different matching criteria to capture all basic XML grammar characteristics, ranging over element semantic and syntactic similarities, cardinality and alternativeness constraints, as well as data-type correspondences and relative ordering.
TL;DR: SMPTE-TT XML files are embedded into the ID3 tag, with user-defined languages and text information stored in multiple frames, to provide subtitles in different languages in HTTP Live Streaming output streams.
Abstract: This document describes how subtitles in different languages can be provided in HTTP Live Streaming output streams. To achieve this, SMPTE-TT XML files are embedded into the ID3 tag, with user-defined languages and text information stored in multiple frames.
TL;DR: This study presents a mechanism to ease the interpretation and automate the semantic transformation of XML healthcare data into the OWL ontology (S-Trans), which allows an easier and better semantic communication among hospital information systems.
Abstract: Most healthcare data are available in XML format, which mainly focuses on the structure level and lacks support for data representation. Therefore, a variety of medical applications and medical semantic search engines have difficulty understanding and integrating healthcare data in a highly heterogeneous environment. OWL (Web Ontology Language) and Semantic Web technologies provide an infrastructure that can solve these problems. The aim of our study is to present a mechanism to ease the interpretation and automate the semantic transformation of XML healthcare data into the OWL ontology (S-Trans), which allows an easier and better semantic communication among hospital information systems. On the basis of the XML schemas (XSD or DTD), we extract the document structure and add more descriptions for XML elements. Moreover, to classify the semantic level of duplicate elements in an XML schema, we propose novel metrics to measure the similarity between them. Experimental results show that the proposed method reliably predicts semantic similarity of duplicates and produces a better-quality OWL ontology.
TL;DR: Experimental results indicate that s-XML is robust in terms of database storage and data loading, and is able to support large and skew-structured datasets compared to the relational DTD, Attribute and Edge approaches.
Abstract: XML has recently emerged as the leading medium for data storage and data transfer over the World Wide Web due to its adaptable structure and flexibility in defining tags. Many organizations have adopted XML as a principal component of their online business applications. On the other hand, relational databases are still widely used as the back-end database in most organizations. The differences between these models need to be taken into account to ensure transparent and seamless integration. In this paper, we propose s-XML, an effective mapping scheme to bridge XML and relational databases. Experimental results indicate that (1) s-XML is robust in terms of database storage and data loading; (2) s-XML processes complex chain and twig queries efficiently; and (3) s-XML is able to support large and skew-structured datasets compared to the relational DTD, Attribute and Edge approaches.
TL;DR: The experimental results show that NCIM is suitable for cloud computing environments, and the potential applications of NCIM to the fast query processing of enormous Internet documents are highlighted.
Abstract: With data increasing at an incredible rate, the development of cloud computing technologies is of critical importance to the advance of research. Apache Hadoop has become a widely used open source cloud computing framework that provides a distributed file system for large-scale data processing. In this paper, we present a cloud computing implementation of an XML indexing method called NCIM (Node Clustering Indexing Method), which was developed by our research team, for indexing and querying a large number of big XML documents using MapReduce. The experimental results show that NCIM is suitable for cloud computing environments. A throughput of 1200 queries per second for a huge number of queries on a 15-node cluster indicates the potential of NCIM for the fast query processing of enormous Internet documents.
TL;DR: The authors show that XML-less EXI is highly efficient in RAM usage regardless of the size of an EXI stream and more compact in ROM size than other implementations.
Abstract: XML is widely used as a message serialization format in web-based open and heterogeneous systems because of its flexible data model. The Internet of Things (IoT), or a network of constrained nodes, is expected to be heterogeneous, and the flexibility and expressiveness of XML are also good for IoT. However, RAM and bandwidth constraints on such nodes make handling XML difficult. The authors are developing XML-less EXI to solve the problem. The approach adopts Efficient XML Interchange (EXI) as an alternative serialization form of XML, which solves the bandwidth problem of XML. At the same time, the authors apply code generation techniques to encode/decode EXI streams without XML data models on constrained nodes. Static state machines from a schema-informed EXI grammar enable constrained nodes to convert EXI data directly from/to their internal data. The authors show that XML-less EXI is highly efficient in RAM usage regardless of the size of an EXI stream and more compact in ROM size than other implementations. The authors also provide code size estimations for a set of schema-informed EXI grammars and insights on how to make the grammars compact.
TL;DR: A novel mapping of XML data into one wide table whose columns are sparsely populated is proposed that provides good performance for document types and queries that are observed in enterprise applications but are not supported efficiently by existing work.
Abstract: XML is commonly supported by SQL database systems. However, existing mappings of XML to tables can only deliver satisfactory query performance for limited use cases. In this paper, we propose a novel mapping of XML data into one wide table whose columns are sparsely populated. This mapping provides good performance for document types and queries that are observed in enterprise applications but are not supported efficiently by existing work. XML queries are evaluated by translating them into SQL queries over the wide sparsely-populated table. We show how to translate full XPath 1.0 into SQL. Based on the characteristics of the new mapping, we present rewriting optimizations that minimize the number of joins. Experiments demonstrate that query evaluation over the new mapping delivers considerable improvements over existing techniques for the target use cases.
TL;DR: A scalable store for managing large corpora of XML documents built on top of off-the-shelf cloud infrastructure is presented, and different indexing strategies are implemented to evaluate a query workload over the stored documents in the cloud.
Abstract: It is by now widely accepted that an increasing part of the world's interesting data is either shared through the Web or directly produced through and for Web platforms using formats like XML (structured documents). We present a scalable store for managing large corpora of XML documents built on top of off-the-shelf cloud infrastructure. We implement different indexing strategies to evaluate a query workload over the stored documents in the cloud. Moreover, each strategy presents different trade-offs between efficiency in query answering and cost for storing the index.
TL;DR: This paper focuses on integrating XML data based on multiple related XML schemas into equivalent data warehouse schemas based on relational online analytical processing (ROLAP); a new data structure, the Schema Graph, is proposed in the process.
Abstract: A data warehouse is one of the most common ways of analyzing large data for decision-based systems. These data are often sourced from online transactional systems, and the transactional data are represented in different formats. XML is one of the worldwide standards for representing data in web-based systems. A number of organizations use XML for e-commerce and Internet-based applications. Integration of XML and data warehouses for the innovation of business logic and to enhance decision making has therefore emerged as a demanding area of research interest. This paper focuses on integrating XML data based on multiple related XML schemas into equivalent data warehouse schemas based on relational online analytical processing (ROLAP). This work is highly relevant to standardizing the ETL (Extraction, Transformation, and Loading) phase of OLAP projects. The novelty of the work is that more than one data warehouse schema can be identified from a single related XML schema, and each of them can be categorized as a star schema or a snowflake schema. Moreover, if the individual schemas are found to be related according to the analysis, a fact constellation can be identified. A new data structure, the Schema Graph, is proposed in the process.
TL;DR: This paper proposes a novel structure-preserving way for representing tree-structured document instances as records in a standard flat data structure to enable applicability of a wider range of data analysis techniques.
Abstract: Mining of semi-structured data such as XML is a popular research topic due to many useful applications. The initial work focused mainly on values associated with tags, while most of recent developments focus on discovering association rules among tree structured data objects to preserve the structural information. Other data mining techniques have had limited use in tree-structured data analysis as they were mainly designed to process flat data format with no need to capture the structural properties of data objects. This paper proposes a novel structure-preserving way for representing tree-structured document instances as records in a standard flat data structure to enable applicability of a wider range of data analysis techniques. The experiments using synthetic and real world data demonstrate the effectiveness of the proposed approach.
TL;DR: This paper presents a comprehensive approach for privacy preserving access control based on the notion of purpose, which relies on usage access control models as well as the components that are based on the notions of the purpose information used in subjects and objects.
TL;DR: The obtained evaluation results show that the developed concept and library provide the targeted robustness against all kinds of known XML Signature Wrapping attacks, and that these security merits are obtained at low efficiency and performance costs as well as remain compliant with the underlying standards.
Abstract: XML Encryption and XML Signature are fundamental security standards forming the core for many applications that need to process XML-based data. Due to the increased usage of XML in distributed systems and platforms such as in SOA and Cloud settings, the demand for robust and effective security mechanisms increased as well. Recent research work discovered, however, substantial vulnerabilities in these standards as well as in the vast majority of the available implementations. Amongst them, the so-called XML Signature Wrapping attack belongs to the most relevant ones. With the many possible instances of this attack type, it is feasible to annul security systems relying on XML Signature and to gain access to protected resources as has been successfully demonstrated lately for various Cloud infrastructures and services. This paper contributes a comprehensive approach to robust and effective XML Signatures for SOAP-based Web Services. An architecture is proposed, which integrates the required enhancements to ensure a fail-safe and robust signature generation and verification. Following this architecture, a hardened XML Signature library has been implemented. The obtained evaluation results show that the developed concept and library provide the targeted robustness against all kinds of known XML Signature Wrapping attacks. Furthermore, the empirical results underline that these security merits are obtained at low efficiency and performance costs as well as remain compliant with the underlying standards.
TL;DR: XML appliances/routers may be organized to implement one or more XML distribution rings, which may be logical or physical, to enable XML documents/messages to be distributed efficiently.
Abstract: XML appliances/routers may be organized to implement one or more XML distribution rings to enable XML documents/messages to be distributed efficiently. The rings may be logical or physical. The XML distribution rings enable the XML documents/messages to be exchanged without requiring the XML appliances/routers to run a routing protocol to determine how XML documents/messages should be distributed through the network. Documents may be transmitted in one way on the ring or may be transmitted in both directions around the ring to enable the ring to tolerate failure of an XML appliance/router. Each XML appliance/router will receive all XML documents/messages and will make routing decisions for those clients that have provided the XML appliance/router with XML subscriptions. The subscriptions may be formed according to the XPath standard or in another manner.
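The ring behaviour described in this abstract can be sketched in Python. This is a simplified, hypothetical model rather than the patented design: the `RingNode` class, the `send_around_ring` helper, and the use of the limited XPath subset supported by `xml.etree.ElementTree` are assumptions, and only one-way transmission is shown (failure handling and bidirectional transmission are not modeled).

```python
import xml.etree.ElementTree as ET


class RingNode:
    """One XML appliance/router on the ring (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.subscriptions = {}  # client name -> XPath-like expression
        self.delivered = []      # (client, document text) pairs

    def subscribe(self, client, path):
        self.subscriptions[client] = path

    def receive(self, xml_text):
        """Deliver the document to every local client whose subscription matches."""
        root = ET.fromstring(xml_text)
        for client, path in self.subscriptions.items():
            if root.findall(path):
                self.delivered.append((client, xml_text))


def send_around_ring(ring, origin, xml_text):
    """Pass the document once around the ring in one direction.

    Every node other than the origin receives the document; no routing
    protocol is consulted, the ring order alone decides the next hop.
    """
    count = len(ring)
    for step in range(1, count):
        ring[(origin + step) % count].receive(xml_text)
```

For example, with three nodes where only the second has a client subscribed to `.//price`, sending `<order><price>10</price></order>` from the first node results in a single delivery at the second node, while the third node sees the document but delivers nothing.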
TL;DR: The basic idea is that LotusX proposes "position-aware" and "auto-completion" features to help users to create tree-modeled queries (twig pattern) by providing the possible candidates on-the-fly.
Abstract: The existing query languages for XML (e.g., XQuery) require professional programming skills to formulate queries, and such complex query languages burden query processing. In addition, when issuing an XML query, users are required to be familiar with the content (including the structural and textual information) of the hierarchical XML, which is difficult for common users. The need for user-friendly interfaces that reduce the burden of query formulation is fundamental to the growth of the XML community. We present a twig-based XML graphical search system, called LotusX, that provides a graphical interface to simplify query processing without the need to learn a query language and data schemas or to know the content of the XML document. The basic idea is that LotusX proposes "position-aware" and "auto-completion" features to help users to create tree-modeled queries (twig pattern) by providing the possible candidates on-the-fly. In addition, complex twig queries (including order sensitive queries) are supported in LotusX. Furthermore, a new ranking strategy and a query rewriting solution are implemented to rank and rewrite the query effectively. We provide an online demo for LotusX system: http://datasearch.ruc.edu.cn:8080/LotusX.
TL;DR: Experimental results show that ViP2P, a platform for the distributed, parallel dissemination of XML data among peers, outperforms related systems by orders of magnitude in terms of data volumes, network size and data dissemination throughput.
Abstract: We consider the problem of efficiently sharing large volumes of XML data based on distributed hash table overlay networks. Over the last three years, we have built ViP2P (standing for Views in Peer-to-Peer), a platform for the distributed, parallel dissemination of XML data among peers. At the core of ViP2P stand distributed materialized XML views, defined as XML queries, filled in with data published anywhere in the network, and exploited to efficiently answer queries issued by any network peer. ViP2P is one of the very few fully implemented P2P platforms for XML sharing, deployed on hundreds of peers in a WAN. This paper describes the system architecture and modules, and the engineering lessons learned. We present experimental results showing that our choices outperform related systems by orders of magnitude in terms of data volumes, network size and data dissemination throughput.
TL;DR: This paper presents the OrderBased labeling scheme, which is dynamic, simple and compact yet able to identify structural relationships among nodes; a set of performance tests shows promising labeling, querying, and update performance and optimum label size.
Abstract: The need for robust and high-performance XML database systems has increased due to the growing volume of XML data produced by today’s applications. Like indexes in relational databases, XML labeling is the key to XML querying. Assigning unique labels to nodes of a dynamic XML tree in which the labels encode all structural relationships between the nodes is a challenging problem. Early labeling schemes designed for static XML documents generate short labels; however, their performance degrades in update-intensive environments due to the need for relabeling. On the other hand, dynamic labeling schemes achieve dynamicity at the cost of large label size or complexity, which results in poor query performance. This paper presents the OrderBased labeling scheme, which is dynamic, simple and compact yet able to identify structural relationships among nodes. A set of performance tests shows promising labeling, querying, and update performance and optimum label size.
TL;DR: This work designs and implements FoXtrot, a system for filtering XML data that combines the strengths of automata for efficient filtering and distributed hash tables for building a fully distributed system, and performs an extensive experimental evaluation of it.
Abstract: Publish/subscribe systems have emerged in recent years as a promising paradigm for offering various popular notification services. In this context, many XML filtering systems have been proposed to efficiently identify XML data that matches user interests expressed as queries in an XML query language like XPath. However, in order to offer XML filtering functionality on an Internet-scale, we need to deploy such a service in a distributed environment, avoiding bottlenecks that can deteriorate performance. In this work, we design and implement FoXtrot, a system for filtering XML data that combines the strengths of automata for efficient filtering and distributed hash tables for building a fully distributed system. Apart from structural-matching, performed using automata, we also discuss different methods for evaluating value-based predicates. We perform an extensive experimental evaluation of our system, FoXtrot, on a local cluster and on the PlanetLab network and demonstrate that it can index millions of user queries, achieving a high indexing and filtering throughput. At the same time, FoXtrot exhibits very good load-balancing properties and improves its performance as we increase the size of the network.
TL;DR: This work tries to leverage Hadoop to solve the storage problem of massive electronic pedigrees, by the optimization of storing and accessing massive small XML files in HDFS.
Abstract: Because it allows trustworthy tracking of processes in the production, processing, storage, transportation and sale phases, the electronic pedigree system has become an important technology of the Internet of Things. In an electronic pedigree system, a huge volume of small electronic pedigrees in XML format is generated, stored, and retrieved. Unfortunately, the storage of these massive electronic pedigrees, which take the form of small XML files, has rarely been studied. We therefore try to leverage Hadoop to solve the storage problem of massive electronic pedigrees by optimizing the storing and accessing of massive small XML files in HDFS. First, all correlated small XML files of the same envelope are merged into a larger file to reduce the metadata occupation at the NameNode. Second, a prefetching mechanism and a remerging mechanism are used to improve the efficiency of accessing small XML files. Finally, we implement a prototype to evaluate its effectiveness and efficiency compared with the original HDFS. The results show that the optimized approach is able to reduce the memory consumption of NameNodes by up to 50%, improve performance of storing by up to 91%, and accelerate accessing by up to 88% in Hadoop.
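The first step in this abstract, merging correlated small XML files into one larger file, can be illustrated with a minimal Python sketch. It uses plain in-memory bytes rather than HDFS, and the `merge_files` and `read_file` helpers and the offset/length index are assumptions made for illustration; the prefetching and remerging mechanisms are not modeled.

```python
def merge_files(files):
    """Concatenate small files into one blob and record (offset, length) per name.

    `files` maps a file name to its bytes content. The single merged blob
    stands in for the larger file; the index stands in for the lookup data
    needed to find each small file inside it.
    """
    blob = bytearray()
    index = {}
    for name, content in files.items():
        index[name] = (len(blob), len(content))
        blob.extend(content)
    return bytes(blob), index


def read_file(blob, index, name):
    """Recover one small file from the merged blob using its index entry."""
    offset, length = index[name]
    return blob[offset:offset + length]
```

The point of the design is that a file system would then track one large object plus a compact index instead of one metadata entry per small file.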
TL;DR: An efficient algorithm is proposed (XRel_Change_SQL) for detecting unordered changes between two XML data files stored in XRel as the underlying relational data model, using Structured Query Language (SQL).
Abstract: The dramatic increase in the evolution of XML data available on the Internet requires a change detection system to keep track of important changes occurring during their life time. In this paper, we introduce a novel approach of detecting changes between two versions of unordered XML data stored in a traditional relational database using approaches like XRel. Most of the existing work in the area of XML change detection is mainly focused on detecting changes between two versions of XML data by constructing their Document Object Model (DOM) trees and then comparing these two tree structures based on Longest Common Sequence (LCS) using minimum edit distances. The basic tree comparison approach is not efficient in handling large XML files due to the fact that (1) an equivalent XML DOM tree will be twice as large as the original document and (2) the entire trees of both versions have to be memory resident during the comparison process. These two issues are constrained by the available main memory. In addition, existing approaches fail to detect changes among versions of XML data stored in relational databases as reverse mapping is not loss-less. We propose an efficient algorithm (XRel_Change_SQL) for detecting unordered changes between two XML data files stored in XRel as the underlying relational data model, using Structured Query Language (SQL). We compare the efficiency and quality of our change detection algorithm with existing XML change detection tools like X-Diff, DeltaXML and XANDY. We provide an experimental evaluation of the results obtained from the benchmark datasets as well as some synthetic datasets to show that our approach is highly scalable, and results in a much better efficiency and delta quality than the aforementioned approaches and tools.
TL;DR: This paper proposes an algorithm schema named XRecursive that translates XML documents to a relational database according to a proposed storing structure; the steps and algorithm are given in detail to describe how to use the storing structure to store and query XML documents in a relational database.
Abstract: Storing XML documents in a relational database is a promising solution because relational databases are mature and scale very well, and because XML data and structured data can coexist in a relational database, making it possible to build applications that involve both kinds of data with little extra effort. In this paper, we propose an algorithm schema named XRecursive that translates XML documents to a relational database according to the proposed storing structure. The steps and algorithm are given in detail to describe how to use the storing structure to store and query XML documents in a relational database. We then report experimental results on a real database to show the performance of our method in some features.
TL;DR: This paper proposes a new storage strategy based on a modified tree model that has three pre-defined tables to store the main structural information of the tree structure and labels nodes with their parent id and position so that relationships between nodes such as ancestors and parents can be kept.
Abstract: Due to its self-defined labels and flexible structure, XML has a great advantage for storing data over the Internet. How to effectively map XML documents to relational databases, which are the mainstream databases in many domains, has therefore become a hot topic for researchers. This paper proposes a new storage strategy based on a modified tree model. Our approach has three pre-defined tables that store the main structural information of the tree structure. By labeling nodes with their parent id and position, relationships between nodes such as ancestors and parents can be kept to support the query and reconstruction of the XML document.
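The labeling idea in this abstract (parent id plus position) can be sketched in Python. The abstract does not describe the layout of the three tables, so this sketch collapses them into a single list of rows; the row shape and the `shred`, `ancestors`, and `rebuild` helpers are assumptions made for illustration, and attributes and mixed content are ignored.

```python
import xml.etree.ElementTree as ET


def shred(xml_text):
    """Flatten an XML document into rows of (id, parent_id, position, tag, text)."""
    rows = []

    def visit(element, parent_id, position):
        node_id = len(rows) + 1  # ids are assigned in document (preorder) order
        text = (element.text or "").strip()
        rows.append((node_id, parent_id, position, element.tag, text))
        for child_position, child in enumerate(element, start=1):
            visit(child, node_id, child_position)

    visit(ET.fromstring(xml_text), None, 1)
    return rows


def ancestors(rows, node_id):
    """Follow parent ids upward and return ancestor tags, nearest first."""
    by_id = {row[0]: row for row in rows}
    result = []
    parent_id = by_id[node_id][1]
    while parent_id is not None:
        result.append(by_id[parent_id][3])
        parent_id = by_id[parent_id][1]
    return result


def rebuild(rows):
    """Reconstruct an element tree from the rows, ordering siblings by position."""
    elements = {}
    root = None
    # Parents have smaller ids than their children, so sorting by parent id
    # guarantees each parent element exists before its children are attached.
    for node_id, parent_id, position, tag, text in sorted(
        rows, key=lambda r: (r[1] or 0, r[2])
    ):
        element = ET.Element(tag)
        element.text = text or None
        elements[node_id] = element
        if parent_id is None:
            root = element
        else:
            elements[parent_id].append(element)
    return root
```

In a relational setting the same rows would live in tables, and the ancestor walk would become a chain of self-joins or a recursive query on the parent id column.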
TL;DR: The SEML interpreter is a solution for relational databases similar to what XQuery is for XML databases, and can be used as a generic tool for extract, transform, and load (ETL) purposes.
Abstract: Almost all enterprises use relational databases to handle real-time business operations, and most need to generate various XML documents for data exchange internally among departments and externally with business partners. Exporting data in a relational database to an XML document can be considered a data conversion process. Building on the four approaches to data conversion (customized program, interpretive transformer, translator generator, and logical level translation), this paper proposes a new interpretive approach that uses a Structured Export Markup Language (SEML) interpreter to convert relational data into XML documents. The frameworks and languages proposed by other researchers are neither generic nor able to generate arbitrary XML documents. The SEML interpreter is therefore offered as a simple, user-friendly, and complete solution with a new markup language, SEML, for data conversion. The solution can be used as a generic tool for extract, transform, and load (ETL) purposes. In other words, the SEML interpreter is a solution for relational databases similar to what XQuery is for XML databases.
TL;DR: This paper presents an efficient mining algorithm, namely ebXMiner, to discover the frequent XML query patterns for ebXML applications, and proposes a new idea of collecting the equivalent XML queries and then enumerating the candidates from infrequent XML queries in ebXMiner.
Abstract: Providing efficient query to XML data for ebXML applications in e-commerce is crucial, as XML has become the most important technique to exchange data over the Internet. ebXML is a set of specifications for companies to exchange their data in e-commerce. Following the ebXML specifications, companies have a standard method to exchange business messages, communicate data, and business rules in e-commerce. Due to its tree-structure paradigm, XML is superior for its capability of storing and querying complex data for ebXML applications. Therefore, discovering frequent XML query patterns has become an interesting topic for XML data management in ebXML applications. In this paper, we present an efficient mining algorithm, namely ebXMiner, to discover the frequent XML query patterns for ebXML applications. Unlike the existing algorithms, we propose a new idea by collecting the equivalent XML queries and then enumerating the candidates from infrequent XML queries in our ebXMiner. Furthermore, our simulation results show that ebXMiner outperforms other algorithms in its execution time.
TL;DR: An analytical system composed by LMDQL, an analytical query language, is proposed to process XML data that contains XLink and the XLDM metamodel is given to deal with syntactic, semantic and structural heterogeneities commonly found in XML documents.
Abstract: Current commercial and academic OLAP tools do not process XML data that contains XLink. Aiming at overcoming this issue, this paper proposes an analytical system composed by LMDQL, an analytical query language. Also, the XLDM metamodel is given to model cubes of XML documents with XLink and to deal with syntactic, semantic and structural heterogeneities commonly found in XML documents. As current W3C query languages for navigating in XML documents do not support XLink, XLPath is discussed in this article to provide features for the LMDQL query processing. A prototype system enabling the analytical processing of XML documents that use XLink is also detailed. This prototype includes a driver, named sql2xquery, which performs the mapping of SQL queries into XQuery. To validate the proposed system, a case study and its performance evaluation are presented to analyze the impact of analytical processing over XML/XLink documents.
TL;DR: A method for storing XML data in a relational database comprises the following steps: splitting an XML Schema into one or more mapping configuration files, each corresponding to a relational table; parsing an XML text and, according to the associative relationship in the mapping configuration files, inserting the data in the XML text into the multiple relational database tables; and accessing the database to read the data.
Abstract: A method for storing XML data into a relational database, comprising the following steps: splitting an XML Schema into one or more mapping configuration files, each mapping configuration file corresponding to a relational database table; parsing an XML text, and according to the associative relationship in the mapping configuration files, inserting the data in the XML text into the multiple relational database tables; and accessing the database to read the data in the XML text. The method of the present invention stores XML file data into a relational database, and accelerates data reading and access speed.
TL;DR: A model mapping approach for storing XML data in a relational database that uses two tables: a Node table, which stores all node ids along with node names, and a Data table, which stores the corresponding node values.
Abstract: The Extensible Markup Language (XML) is used for representing data over the web. Two kinds of approaches are used for storing XML documents in relational databases: model mapping and structured mapping. This paper explores a model mapping approach for storing XML data in a relational database that uses two tables: a Node table and a Data table. The Node table stores all node ids along with node names. The Data table stores the corresponding node values. We also propose an algorithm that shows how the nodes of the XML document are stored in terms of tables in the database.
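The two-table layout described in this abstract can be sketched with Python's `sqlite3` module. The column names, the document-order numbering of node ids, and the `store` and `values_of` helpers are assumptions made for illustration and are not the paper's algorithm; parent links and attributes are not stored in this sketch.

```python
import sqlite3
import xml.etree.ElementTree as ET


def store(xml_text):
    """Load an XML document into an in-memory database with node and data tables."""
    conn = sqlite3.connect(":memory:")
    # Node table: node ids with node names.
    conn.execute("CREATE TABLE node (node_id INTEGER PRIMARY KEY, name TEXT)")
    # Data table: values keyed by the id of the node that holds them.
    conn.execute(
        "CREATE TABLE data (node_id INTEGER REFERENCES node(node_id), value TEXT)"
    )
    # Element.iter() walks the tree in document order, so ids follow that order.
    for node_id, element in enumerate(ET.fromstring(xml_text).iter(), start=1):
        conn.execute("INSERT INTO node VALUES (?, ?)", (node_id, element.tag))
        value = (element.text or "").strip()
        if value:
            conn.execute("INSERT INTO data VALUES (?, ?)", (node_id, value))
    return conn


def values_of(conn, name):
    """Return the stored values of all nodes with the given name, in document order."""
    rows = conn.execute(
        "SELECT d.value FROM node n JOIN data d ON d.node_id = n.node_id "
        "WHERE n.name = ? ORDER BY n.node_id",
        (name,),
    )
    return [value for (value,) in rows]
```

Splitting names from values this way keeps the Node table small for structure-only lookups, while value lookups need one join between the two tables.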