TL;DR: A three-way XML merge algorithm that is faster, uses less memory, and is more precise than previous algorithms; it relies on a specialized versioning tree data structure that supports node identity and change detection.
Abstract: XML has become the standard document representation for many popular tools in various domains. When multiple authors collaborate to produce a document, they must be able to work in parallel and periodically merge their efforts into a single work. While there exist a small number of three-way XML merging tools, their performance could be improved in several areas. We present a three-way XML merge algorithm that is faster, uses less memory and is more precise than previous algorithms. It uses a specialized versioning tree data structure that supports node identity and change detection. The algorithm applies the traditional three-way merge found in GNU diff3 to the children of changed nodes. The editing operations it supports are addition, deletion, update, and move. The algorithm is evaluated by comparing its performance to that of the previous algorithms, using synthetically generated XML documents of a range of sizes and modified by varying numbers of random editing operations. The prototype merge tool used in these tests also includes a simple graphical interface for visualizing and resolving conflicts.
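The three-way decision rule at the heart of such a merge can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes, purely for illustration, that each version is a flat mapping from a stable node id to that node's text, and it ignores child order and moves. The names `Version` and `merge_three_way` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: node id -> node text.
# An id that is absent from a version means the node does not exist there.
Version = Dict[str, str]


def merge_three_way(
    base: Version, ours: Version, theirs: Version
) -> Tuple[Version, List[str]]:
    """Merge two edited versions against their common ancestor, node by node."""
    merged: Version = {}
    conflicts: List[str] = []
    for node_id in sorted(set(base) | set(ours) | set(theirs)):
        b: Optional[str] = base.get(node_id)
        o: Optional[str] = ours.get(node_id)
        t: Optional[str] = theirs.get(node_id)
        if o == t:
            result = o  # both sides agree (same edit, or both deleted)
        elif o == b:
            result = t  # only "theirs" changed this node
        elif t == b:
            result = o  # only "ours" changed this node
        else:
            conflicts.append(node_id)  # both changed it differently
            result = o  # keep "ours" as a placeholder until resolved
        if result is not None:
            merged[node_id] = result
    return merged, conflicts


base = {"title": "Draft", "intro": "Hello", "old": "x"}
ours = {"title": "Final", "intro": "Hello", "old": "x"}
theirs = {"title": "Draft", "intro": "Hi", "new": "y"}
merged, conflicts = merge_three_way(base, ours, theirs)
```

Here one side renames the title while the other edits the intro, deletes `old`, and adds `new`; the edits do not overlap, so the merge takes all of them and reports no conflicts.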
TL;DR: This paper proposes the (1, Xm) method, which broadcasts XML data over multiple channels and sends the indexed data on two separate channels: an index channel and a data channel.
Abstract: This document explores the utilization of Apache Hive external tables for efficient XML data processing. XML (eXtensible Markup Language) is a widely used format for data interchange, and processing XML data efficiently poses challenges, especially when dealing with large datasets. Apache Hive, a data warehousing infrastructure built on top of Hadoop, offers a solution for processing structured data by providing a SQL-like interface. By leveraging Apache Hive external tables, XML data can be efficiently processed and queried in a distributed environment. This paper discusses the benefits of using external tables for XML data processing, provides a step-by-step guide for setting up and querying XML data in Apache Hive, and presents performance benchmarks demonstrating the efficiency of this approach.
Abstract: XML (Extensible Markup Language) is emerging as a tool for representing and exchanging data over the internet. XML data can be stored and queried with one of two approaches: native XML databases or XML-enabled databases. This paper deals with XML-enabled databases, using relational databases to store XML documents, and focuses on mapping an XML DTD into relations. The mapping takes three steps: 1) simplify complex DTDs, 2) build a DTD graph from the simplified DTDs, and 3) generate the relational schema. We present an inlining algorithm for generating relational schemas from available DTDs. This algorithm also handles recursion in an XML document.
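The schema-generation step can be sketched roughly as follows. This is a simplified illustration of the general inlining idea, not the paper's algorithm: it assumes the DTD has already been simplified into a mapping from each element to its children with an occurrence marker (`"1"`, `"?"`, `"*"` or `"+"`), and the names `Dtd`, `reaches` and `generate_schema`, as well as the `id`, `parent_id` and `value` columns, are hypothetical. Repeated or recursive elements get their own table; every other child is inlined as a column of its nearest ancestor table.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical simplified DTD: element -> [(child, occurrence)].
# Elements that do not appear as keys are treated as text-only leaves.
Dtd = Dict[str, List[Tuple[str, str]]]


def reaches(dtd: Dtd, start: str, target: str,
            seen: Optional[Set[str]] = None) -> bool:
    """True if `target` is reachable from `start` along child edges."""
    seen = set() if seen is None else seen
    for child, _ in dtd.get(start, []):
        if child == target:
            return True
        if child not in seen:
            seen.add(child)
            if reaches(dtd, child, target, seen):
                return True
    return False


def generate_schema(dtd: Dtd, root: str) -> Dict[str, List[str]]:
    """Return table name -> column names for a simplified DTD."""
    is_child = {c for kids in dtd.values() for c, _ in kids}
    # Own table: the root, repeated children, and elements on a cycle.
    own_table = {root}
    for kids in dtd.values():
        for child, occ in kids:
            if occ in ("*", "+") or reaches(dtd, child, child):
                own_table.add(child)

    def inlined(element: str, prefix: str) -> List[str]:
        cols: List[str] = []
        for child, _ in dtd.get(element, []):
            if child in own_table:
                continue  # linked via parent_id in the child's own table
            if dtd.get(child):
                cols.extend(inlined(child, prefix + child + "_"))
            else:
                cols.append(prefix + child)
        return cols

    schema: Dict[str, List[str]] = {}
    for element in sorted(own_table):
        cols = ["id"]
        if element in is_child:
            cols.append("parent_id")
        if not dtd.get(element):
            cols.append("value")  # text content of a leaf element
        schema[element] = cols + inlined(element, "")
    return schema


dtd = {
    "book": [("title", "1"), ("author", "*"),
             ("publisher", "?"), ("section", "*")],
    "publisher": [("name", "1"), ("city", "?")],
    "section": [("heading", "1"), ("section", "*")],
}
schema = generate_schema(dtd, "book")
```

In this example `publisher` is inlined into `book` as `publisher_name` and `publisher_city`, while the repeated `author` and the recursive `section` each become separate tables. A single `parent_id` column is a simplification; an element with several possible parent types would need more than that.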
Abstract: Very little research has been done on XML security over relational databases, even though XML has become the de facto standard for data representation and exchange on the internet and many XML documents are stored in RDBMSs. In [14], the author proposed an access control model for schema-based storage of XML documents in relational storage, which translates XML access control rules into relational access control rules. However, the proposed algorithms had performance drawbacks. In this paper, we use the same access control model as [14] and try to overcome its drawbacks by proposing an efficient technique for storing the XML access control rules in a relational storage of the XML DTD. The mapping of the XML DTD to a relational schema is proposed in [7]. We also propose an algorithm to translate XPath queries to SQL queries based on the mapping algorithm in [7].
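To give a feel for what XPath-to-SQL translation involves, here is a minimal sketch under strong assumptions that are not taken from the paper or from the mapping in [7]: every element is stored in its own table with `id` and `parent_id` columns, the table for the first step holds only document roots, and only absolute child-axis paths such as `/book/section/heading` are supported. Access control rules are not modeled. The function name `xpath_to_sql` is hypothetical.

```python
import re

# Accept only plain element names, so that steps are safe to use as table names.
_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def xpath_to_sql(xpath: str) -> str:
    """Translate an absolute child-axis XPath into a chain of SQL joins."""
    if not xpath.startswith("/") or xpath.startswith("//"):
        raise ValueError("only absolute child-axis paths are supported")
    steps = xpath[1:].split("/")
    for step in steps:
        if not _NAME.match(step):
            raise ValueError(f"unsupported step: {step!r}")
    # Select the rows of the last step; each earlier step becomes a join
    # that ties a child row to its parent row.
    parts = [f"SELECT t{len(steps) - 1}.* FROM {steps[0]} AS t0"]
    for i in range(1, len(steps)):
        parts.append(
            f"JOIN {steps[i]} AS t{i} ON t{i}.parent_id = t{i - 1}.id"
        )
    return " ".join(parts)


sql = xpath_to_sql("/book/section/heading")
```

Each location step adds one join, so the cost of the generated query grows with path depth. A mapping that inlines child elements as columns of the parent table would need fewer joins for the same path.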
Abstract: A large volume of information is stored in XML format on the Web, and clustering is a method for managing these documents. Most current methods for clustering XML documents consider only one of two aspects, content or structure. In this paper, we propose SCEM (Expectation Maximization Structure and Content), which clusters XML documents effectively by combining content and structural features. The other contribution of this paper is the use of probabilistic distributions whose probability parameters each correspond to one cluster. In this way, we obtained better effectiveness than other clustering methods, owing to the generality of the approach. Experimental results on real datasets show the effectiveness of the proposed method, particularly when it is applied to large XML documents without a schema. It can also be used to improve the accuracy and effectiveness of XML information retrieval.