TL;DR: This paper proposes an extensible binary mark-up language for XML-based data storage and communications that is compatible with existing XML standards and significantly more efficient, particularly over narrow, low-bandwidth communication media.
Abstract: An extensible binary mark-up language is disclosed that is compatible with existing XML standards yet provides significantly improved efficiencies for XML-based data storage and communications, particularly for narrow and low bandwidth communication media. A corresponding extensible non-binary mark-up language is also disclosed that is compatible with the XML standard. This dual-representation common message format (CMF) allows standard XML tools to be utilized in viewing and editing XML-based data and allows a CMF parser to be utilized to convert the XML formatted information into an extensible binary representation for actual communication through a medium or storage on a wide range of media. Advantages include a very compact, yet flexible and extensible binary data representation (CMF-B) for a corresponding extensible mark-up language (CMF-X), a data packaging scheme that allows for the effective transport of XML-based data over existing data channels, including narrow-bandwidth channels that utilize existing network protocols, and a CMF parser that allows for seamless conversion between CMF-B and CMF-X.
TL;DR: The semantics of query answering in this setting is defined and an algorithm for inferring precise mapping rules from informal schema correspondences is developed, which handles an expressive fragment of XQuery and works both along and against the direction of mapping rules.
Abstract: Peers in a peer-to-peer data management system often have heterogeneous schemas and no mediated global schema. To translate queries across peers, we assume each peer provides correspondences between its schema and a small number of other peer schemas. We focus on query reformulation in the presence of heterogeneous XML schemas, including data-metadata conflicts. We develop an algorithm for inferring precise mapping rules from informal schema correspondences. We define the semantics of query answering in this setting and develop a query translation algorithm. Our translation handles an expressive fragment of XQuery and works both along and against the direction of mapping rules. We describe the HePToX heterogeneous P2P XML data management system, which incorporates our results. We report the results of extensive experiments on HePToX on both synthetic and real datasets. We demonstrate our system's utility and scalability on different P2P distributions.
TL;DR: In this article, horizontal and vertical fragmentation techniques are generalised from the relational datamodel to XML, splitting is introduced as a third kind of fragmentation, and it is shown how relational techniques for defining reasonable fragments can be applied to the case of XML.
Abstract: The world-wide web (WWW) is often considered to be the world's largest database and the eXtensible Markup Language (XML) is then considered to provide its datamodel. Adopting this view we have to deal with a distributed database. This raises the question of how to obtain a suitable distribution design for XML documents. In this paper horizontal and vertical fragmentation techniques are generalised from the relational datamodel to XML. Furthermore, splitting will be introduced as a third kind of fragmentation. Then it is shown how relational techniques for defining reasonable fragments can be applied to the case of XML.
TL;DR: The design of the first complete field programmable gate array (FPGA) accelerator capable of XML well-formed checking, schema validation, and tree construction at a throughput of 1 cycle per byte (CPB) is detailed.
Abstract: Extensible Markup Language (XML) is playing an increasingly important role in web services and database systems. However, the task of XML parsing is often the bottleneck and, as a result, the target of acceleration using custom hardware or multicore CPUs. In this paper, we detail the design of the first complete field programmable gate array (FPGA) accelerator capable of XML well-formed checking, schema validation, and tree construction at a throughput of 1 cycle per byte (CPB). This is a significant advancement from 40 CPB, the best previously reported commercial result. We demonstrate our design on a Xilinx Virtex-5 board, which successfully saturates a 1 Gbps Ethernet link.
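To put the cycles-per-byte figures in perspective, the short calculation below estimates the clock rate a parser would need to keep up with a 1 Gbps link. It is only back-of-the-envelope arithmetic: the function name is made up for illustration, and Ethernet framing overhead is ignored.

```python
def required_clock_hz(link_bits_per_second, cycles_per_byte):
    """Clock frequency needed to consume the link's byte rate at a given CPB."""
    bytes_per_second = link_bits_per_second / 8
    return bytes_per_second * cycles_per_byte

# 1 Gbps carries 125 million bytes per second (framing overhead ignored).
clock_at_1_cpb = required_clock_hz(1e9, 1)    # 125 MHz
clock_at_40_cpb = required_clock_hz(1e9, 40)  # 5 GHz
print(clock_at_1_cpb / 1e6, "MHz;", clock_at_40_cpb / 1e9, "GHz")
```

Under these assumptions, 1 CPB needs a clock of about 125 MHz to saturate the link, whereas 40 CPB would need about 5 GHz on a single processing pipeline.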
TL;DR: This article proposes a novel distributed index structure and a clustering strategy for streaming XML data that enable energy- and latency-efficient broadcasting of XML data, and demonstrates that the approach is effective for wireless broadcasting of XML data and outperforms previous methods.
Abstract: In this article, we address the problem of delayed query processing raised by tree-based index structures in wireless broadcast environments, which increases the access time of mobile clients. We propose a novel distributed index structure and a clustering strategy for streaming XML data that enable energy- and latency-efficient broadcasting of XML data. We first define the DIX node structure to implement a fully distributed index structure which contains the tag name, attributes, and text content of an element, as well as its corresponding indices. By exploiting the index information in the DIX node stream, a mobile client can access the stream with shorter latency. We also suggest a method of clustering DIX nodes in the stream, which can further enhance the performance of query processing in the mobile clients. Through extensive experiments, we demonstrate that our approach is effective for wireless broadcasting of XML data and outperforms the previous methods.
TL;DR: This thesis describes the design of a full-fledged XML storage and query architecture, which represents the core of the Open Source database system BASEX, and introduces a survey on state-of-the-art XML query languages.
Abstract: After its introduction in 1998, XML quickly emerged as the de facto exchange format for textual data. Only ten years later, the amount of information that is being processed day by day, locally and globally, has virtually exploded, and no end is in sight. Correspondingly, many XML documents and collections have become much too large to be retrieved in their raw form – and this is where database technology gets into the game. This thesis describes the design of a full-fledged XML storage and query architecture, which represents the core of the Open Source database system BASEX. In contrast to numerous other works on XML processing, which either focus on theoretical aspects or practical implementation details, we have tried to bring the two worlds together: well-established and novel concepts from database technology and compiler construction are consolidated into a powerful and extensible software architecture that is supposed to both withstand the demands of complex real-life applications and comply with all the intricacies of the W3C Recommendations. In the Storage chapter, existing tree encodings are explored, which allow XML documents to be mapped to a database. The Pre/Dist/Size triple is chosen as the most suitable encoding and further optimized by merging all XML node properties into a single tuple, compactifying redundant information, and inlining attributes and numeric values. The address ranges of numerous large-scale and real-life XML instances are analyzed to find an optimal tradeoff between maximum document and minimum database size. The process of building a database is described in detail, including the import of tree data other than XML and the creation of main memory database instances. As one of the distinguishing features, the resulting storage is enriched by light-weight structural, value and full-text indexes, which speed up query processing by orders of magnitude.
The Querying chapter opens with a survey of state-of-the-art XML query languages. We give some insight into the design of an XQuery processor and then focus on the optimization of queries. Besides classical concepts, such as constant folding or static typing, many optimizations are specific to XML: location paths are rewritten to access fewer XML nodes, and FLWOR expressions are reorganized to reduce the algorithmic com-
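The Pre/Dist/Size encoding named in the thesis abstract above can be illustrated with a small sketch. It is not the BaseX implementation. It assumes that pre is the preorder position, dist is the distance in pre values to the parent, and size counts the node itself plus its descendants. Only element nodes are encoded here, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

def pre_dist_size(xml_text):
    """Return (pre, dist, size, tag) tuples for the element nodes of a document."""
    root = ET.fromstring(xml_text)
    rows = []

    def visit(elem, parent_pre):
        pre = len(rows)                    # preorder position of this node
        dist = 0 if parent_pre is None else pre - parent_pre
        rows.append(None)                  # reserve the slot; size is not known yet
        for child in elem:
            visit(child, pre)
        size = len(rows) - pre             # the node itself plus all descendants
        rows[pre] = (pre, dist, size, elem.tag)

    visit(root, None)
    return rows

for row in pre_dist_size("<a><b><c/></b><d/></a>"):
    print(row)
```

With such a table, the parent of a node is found at pre - dist, and its descendants occupy the pre range from pre + 1 to pre + size - 1, so both axes can be evaluated without pointers.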
TL;DR: A new relational schema named XLight is introduced for storing and processing XML data, and it is compared with some similar methods.
Abstract: Because of increasing use of XML data on the internet, the need for an efficient method of storing and querying XML data is vital. So far, two major types of system for XML data management have been introduced: XML Enabled systems and XML native systems. The former uses relational database system for storing and querying XML data and the latter is a special XML database system which is based on XML data model. Since relational database systems are more mature than XML native systems, it seems that the use of abilities and efficiencies of relational systems is more economical. In this article, we have introduced a new relational schema named XLight for storing and processing XML data, and we have compared it with some similar methods.
TL;DR: An overview of the recent research in dynamic XML labelling schemes is provided and a set of properties are defined that represent a more holistic dynamic labelling scheme and presented through an evaluation matrix for most of the existing schemes that provide update functionality.
Abstract: The adoption of XML as the default data interchange format and the standardisation of the XPath and XQuery languages has resulted in significant research in the development and implementation of XML databases capable of processing queries efficiently. The ever-increasing deployment of XML in industry and the real-world requirement to support efficient updates to XML documents has more recently prompted research in dynamic XML labelling schemes. In this paper, we provide an overview of the recent research in dynamic XML labelling schemes. Our motivation is to define a set of properties that represent a more holistic dynamic labelling scheme and present our findings through an evaluation matrix for most of the existing schemes that provide update functionality.
TL;DR: Techniques for fast and scalable generation and aggregation of XML data are described, in which each XML query result is represented by a data structure that stores locators pointing to fragments in the XML documents rather than the tags or the fragments themselves, and is serialized only on demand.
Abstract: Techniques for fast and scalable generation and aggregation of XML data are described. In an example embodiment, an XML query that requests data from XML documents is received. The XML query is evaluated to determine one or more XML results. For each particular XML result, evaluating the XML query comprises: instantiating a particular data structure that represents the particular XML result, where the particular data structure is encoded in accordance with tags specified in the XML query but does not store the tags; and storing, in the particular data structure, one or more locators that respectively point to one or more fragments in the XML documents, where the particular data structure stores the one or more locators but does not store the one or more fragments. On demand, in response to a request indicating the particular XML result, a serialized representation of the particular XML result is generated based at least on the particular data structure.
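The idea of keeping locators instead of fragments can be sketched roughly as follows. This is an illustrative toy, not the described embodiment. The template layout, the `(doc_id, start, end)` locator format and both function names are assumptions made for the example.

```python
# Source documents, addressed by a hypothetical document id.
documents = {"d1": "<book><title>XML</title><author>Ann</author></book>"}

# Tags requested by the (hypothetical) query: a wrapper tag and its child tags.
# They live in the query template, not in the per-result structure.
TEMPLATE = ("result", ("name", "by"))

def locator(doc_id, text):
    """Build a (doc_id, start, end) locator for the first occurrence of text."""
    start = documents[doc_id].index(text)
    return (doc_id, start, start + len(text))

def serialize(template, locators):
    """Materialize one result on demand from the template and its locators."""
    wrapper, child_tags = template
    parts = []
    for tag, (doc_id, start, end) in zip(child_tags, locators):
        parts.append(f"<{tag}>{documents[doc_id][start:end]}</{tag}>")
    return f"<{wrapper}>{''.join(parts)}</{wrapper}>"

# The per-result structure holds only locators, aligned with the template order.
result = (locator("d1", "XML"), locator("d1", "Ann"))
print(serialize(TEMPLATE, result))
```

Each result stays small because it holds only offsets. The tags and the fragment text are combined when a serialized form is actually requested.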
TL;DR: In this article, the authors propose an architecture that extends conventional computer programming languages that compile into an instance of an extensible markup language (XML) document object model (DOM) to provide support for XML literals in the underlying programming language.
Abstract: An architecture that extends conventional computer programming languages that compile into an instance of an extensible markup language (XML) document object model (DOM) to provide support for XML literals in the underlying programming language. This architecture facilitates a convenient short cut by replacing the complex explicit construction required by conventional systems to create an instance of a DOM with a concise XML literal which conventional compilers can translate into the appropriate code. The architecture allows these XML literals to be embedded with expressions, statement blocks or namespaces to further enrich the power and versatility. In accordance therewith, context information describing the position and data types that an XML DOM can accept can be provided to the programmer via, for example, an integrated development environment. Additionally, the architecture supports escaping XML identifiers, a reification mechanism, and a conversion mechanism to convert between collections and singletons.
TL;DR: This article presents a framework for editing, publishing and sharing XML content directly from within the browser, targeted at non XML speaking end users, since it preserves end users from XML syntax during editing.
Abstract: This article presents a framework for editing, publishing and sharing XML content directly from within the browser. It comes in two parts: XTiger XML and AXEL. XTiger XML is a document template specification language for creating document models. AXEL is a client-side Javascript library that turns the document template into a document editing application running in the browser. This framework is targeted at non XML speaking end users, since it preserves end users from XML syntax during editing. Its current implementation proposes a pseudo-WYSIWYG user interface where the document template provides a document-oriented editing metaphor, or a more form-oriented metaphor, depending on the template.
TL;DR: An implementation of a three-way XML merge algorithm that is faster, uses less memory and is more precise than existing tools is presented and a graphical interface for visualizing and resolving conflicts is provided.
Abstract: XML has become the standard document representation for many popular tools in various domains. When multiple authors collaborate to produce a document, they must be able to work in parallel and periodically merge their efforts into a single work. While there exist a small number of three-way XML merging tools, their performance could be improved in several areas and they lack any form of user interface for resolving conflicts. In this paper, we present an implementation of a three-way XML merge algorithm that is faster, uses less memory and is more precise than existing tools. It uses a specialized versioning tree data structure that supports node identity and change detection. The algorithm applies the traditional three-way merge found in GNU diff3 to the children of changed nodes. The editing operations it supports are addition, deletion, update, and move. A graphical interface for visualizing and resolving conflicts is also provided. An evaluation experiment was conducted comparing the proposed algorithm with three other tools on randomly generated XML data.
TL;DR: This paper presents the methodology for XQuery query processing over distributed XML databases, which comprises the steps of query decomposition, data localization, and global optimization.
Abstract: The increasing volume of data stored as XML documents makes fragmentation techniques an alternative to the performance issues in query processing. Fragmented databases are feasible only if there is a transparent way to query the distributed database. Fragments allow for intra-query parallel processing and data reduction. This paper presents our methodology for XQuery query processing over distributed XML databases. The methodology comprises the steps of query decomposition, data localization, and global optimization. This methodology can be used in an XML database or in a system that publishes homogeneous views of semi-autonomous databases. The methodology has been implemented, and experimental results show performance improvements of up to 95% when compared to the centralized environment.
TL;DR: This paper formulates XML signature and encryption as the core of web services security technology, and describes how to create and verify XML signatures and how to encrypt and decrypt XML data.
Abstract: With the development of web services applications, some issues of web services security are increasingly prominent. As a platform-independent language, XML is widely used for its high extensibility. After analyzing traditional web services security technology, this paper formulates XML signature and encryption as the core of web services security technology, and describes how to create and verify XML signatures and how to encrypt and decrypt XML data. The application of XML signature and encryption in web services security is illustrated.
TL;DR: A set of transformation rules are specified that are able to automatically generate not only the corresponding XML structure of the DW from secure conceptual DW models, but also the security rules specified within the DW XML structure, thus allowing us to implement both aspects simultaneously.
Abstract: Data Warehouses (DWs) are currently considered to be the cornerstone of Business Intelligence (BI) systems. Security is a key issue in DWs since the business information that they manage is crucial and highly sensitive, and should be carefully protected. However, the increasing amount of data available on the Web signifies that more and more DW systems are considering the Web as the primary data source through which to populate their DWs. XML is therefore widely accepted as being the principal means through which to provide easier data and metadata interchange among heterogeneous data sources from the Web and the DW systems. Although security issues have been considered during the whole development process of traditional DWs, current research lacks approaches with which to consider security when the target platform is based on the Web and XML technologies. The idiosyncrasy of the unstructured and semi-structured data available on the Web definitely requires particular security rules that are specifically tailored to these systems in order to permit their particularities to be captured correctly. In order to tackle this situation, in this paper, we propose a methodological approach based on the Model Driven Architecture (MDA) for the development of Secure XML DWs. We therefore specify a set of transformation rules that are able to automatically generate not only the corresponding XML structure of the DW from secure conceptual DW models, but also the security rules specified within the DW XML structure, thus allowing us to implement both aspects simultaneously. A case study is provided at the end of the paper to show the benefits of our approach.
TL;DR: In this article, an efficient linear algorithm for mapping XML data to relational data is proposed, which is based on the authors' previously proposed inlining algorithm and can be easily adapted to other inlining algorithms.
Abstract: XML has emerged as the standard for representing and exchanging data on the World Wide Web. It is critical to have efficient mechanisms to store and query XML data to exploit the full power of this new technology. Several researchers have proposed to use relational databases to store and query XML data. While several algorithms of schema mapping and query mapping have been proposed, the problem of mapping XML data to relational data, i.e., mapping an XML INSERT statement to a sequence of SQL INSERT statements, has not been addressed thoroughly in the literature. In this paper, we propose an efficient linear algorithm for mapping XML data to relational data. This algorithm is based on our previously proposed inlining algorithm for mapping DTDs to relational schemas and can be easily adapted to other inlining algorithms.
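To make the data-mapping problem above concrete, the following sketch turns one XML document into a sequence of parameterized SQL INSERT statements. It is not the paper's algorithm. The `TABLES` dictionary stands in for the output of some inlining step, and the table names, column names and `id`/`parent_id` key scheme are all assumptions.

```python
import xml.etree.ElementTree as ET
from itertools import count

# Hypothetical inlining result: elements that get their own table, each with
# the child elements that are inlined as columns of that table.
TABLES = {"book": ["title", "year"], "author": ["name"]}

def xml_to_inserts(xml_text):
    """Return (sql, params) pairs, parents before children."""
    ids = count(1)
    statements = []

    def visit(elem, parent_id):
        if elem.tag not in TABLES:
            return
        row_id = next(ids)
        cols, vals = ["id", "parent_id"], [row_id, parent_id]
        for child in elem:
            if child.tag in TABLES[elem.tag]:   # inlined child becomes a column
                cols.append(child.tag)
                vals.append(child.text)
        marks = ", ".join("?" for _ in cols)
        sql = f"INSERT INTO {elem.tag} ({', '.join(cols)}) VALUES ({marks})"
        statements.append((sql, tuple(vals)))
        for child in elem:
            if child.tag in TABLES:             # child with its own table
                visit(child, row_id)

    visit(ET.fromstring(xml_text), None)
    return statements

doc = ("<book><title>T</title><year>2004</year>"
       "<author><name>A</name></author><author><name>B</name></author></book>")
for sql, params in xml_to_inserts(doc):
    print(sql, params)
```

Each element is examined a constant number of times, so the work grows linearly with document size. The parent row is emitted before its children, so foreign keys can be satisfied in order.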
TL;DR: A quadruple model based on XML node index coding is proposed as a spatio-temporal data storage model, and a Native XML Database (NXD) is used to store the spatio-temporal data.
Abstract: This article uses the XML/GML language to describe a feature-based spatio-temporal data model (FBSTDM), but traditional relational or object-oriented databases do not actually support this semi-structured data storage. This paper proposes a quadruple model based on XML node index coding as the spatio-temporal data storage model and uses a Native XML Database (NXD) to store the spatio-temporal data.
TL;DR: This book aims to provide a single account of current studies in soft computing approaches to XML data management to provide the state of the art information to researchers, practitioners, and graduate students of the Web intelligence.
Abstract: This book covers in great depth the fast-growing topic of techniques, tools and applications of soft computing in XML data management. It is shown how XML data management (like model, query, integration) can be covered with a soft computing focus. This book aims to provide a single account of current studies in soft computing approaches to XML data management. The objective of the book is to provide state-of-the-art information to researchers, practitioners, and graduate students of Web intelligence, and at the same time to serve the information technology professional faced with non-traditional applications that make the application of conventional approaches difficult or impossible.
TL;DR: This paper proposes a general mediation framework to facilitate the storage of the new incoming data in XML format into the relational databases of the legacy health information systems and vice versa and has the capacity to preserve the integrity constraints of the relational schema.
Abstract: Providing a transparent and automatic communication between health information systems for the purpose of exchanging patients' data among healthcare professionals is deemed one of the most challenging problems in eHealth. Indeed, data storage in health information systems is mainly performed in relational databases, whereas eXtensible Markup Language (XML) is seen as the de facto standard for exchanging data among health organizations. Automating data interchange between relational databases and XML documents remains, however, a challenge. In this paper, we propose a general mediation framework to facilitate the storage of the new incoming data in XML format into the relational databases of the legacy health information systems and vice versa. The proposed mediation architecture is based on the XML technology and its related languages and derivatives (XML Schema, eXtensible Stylesheet Language Transformations (XSLT)…), which provide powerful tools for sharing, converting and exchanging information. The adopted methodology consists in converting the database model into an XML schema and in performing an automatic, reliable and efficient mapping between the schemas representing the exchanged source and target data by means of the XSLT language. Our approach has the capacity to preserve the integrity constraints of the relational schema, which makes it possible to check the XML infosets for anomalies or incoherencies before updating the relational database from the XML document. It also captures the hierarchy of the tables in the target database, which guarantees that the automatically generated Structured Query Language (SQL) queries will be correctly performed. Moreover, our mediator includes a rule base allowing a coherent and secure mapping between the exchanged data sources for ensuring the database integrity.
TL;DR: VIREX provides an interactive approach for querying and integrating relational databases to produce XML documents and the corresponding schemas and supports VRXQuery, which is a visual naive-users-oriented query language that allows users to specify queries and define views directly on the interactive diagram as a sequence of mouse clicks with minimum keyboard input.
Abstract: VIREX provides an interactive approach for querying and integrating relational databases to produce XML documents and the corresponding schemas. VIREX connects to each database specified by the user; analyzes the catalogue to derive an interactive diagram equivalent to the extended entity-relationship diagram; allows the user to display sample records from the tables in the database; allows the user to rename columns and relations by modifying directly the interactive diagram; facilitates the conversion of the relational database into XML; and derives the XML schema. VIREX works even when the catalogue of the relational database is missing; it extracts the required catalogue information by analyzing the database content. Further, VIREX supports VRXQuery, which is a visual naive-users-oriented query language that allows users to specify queries and define views directly on the interactive diagram as a sequence of mouse clicks with minimum keyboard input. The user is expected to interactively decide on certain factors to be considered in producing the XML result. Such factors include: 1) selecting the relations/attributes to be converted into XML; 2) specifying a predicate to be satisfied by the information to be converted into XML; 3) deciding on the order of nesting between the relations to be converted into XML; 4) ordering for the result. VRXQuery supports selection, projection, nesting/join, union, difference, and order-by. As the result of a query, VIREX displays on the screen the XML schema that satisfies the specified characteristics and generates colored (easy to read) XML document(s). Further, VIREX allows the user to display and review the SQL and XQuery equivalent to each query expressed in VRXQuery.
TL;DR: In this paper, column values that are to be stored for shredded XML documents are separately analyzed for a XML document to determine whether to store a particular column in column-major format or row major format, and what compression technique to use.
Abstract: A database server exploits the power of compression and a form of storing relational data referred to as column-major format, to store XML documents in shredded form. The column values that are to be stored for shredded XML documents are separately analyzed for a XML document to determine whether to store a particular column in column-major format or row-major format, and what compression technique to use, if any.
TL;DR: This paper addresses SLCA-based keyword search over continuous XML documents with a Map-Reduce mechanism, using parallel algorithms to process large numbers of XML documents in a Hadoop environment, and demonstrates the efficiency of the algorithms analytically and experimentally.
Abstract: Large volumes of XML information arrive continually from new Web applications, and SLCA (Smallest Lowest Common Ancestor)-based XML keyword search is one of the most important information retrieval approaches. Previous approaches focus on building an index for XML documents. However, in an information dissemination scenario, it is impossible to build an index in advance for continuous XML document streams. This paper addresses SLCA-based keyword search for continuous XML documents with a Map-Reduce mechanism. We use parallel algorithms to process large numbers of XML documents in a Hadoop environment. A distributed SLCA computation method is designed, where each network node computes SLCAs independently and only a little information needs to be transmitted. A real Hadoop environment is built and we demonstrate the efficiency of our algorithms analytically and experimentally.
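For readers unfamiliar with SLCA semantics, the sketch below computes the SLCA nodes of a single document by brute force. It does not reproduce the paper's distributed Map-Reduce method. It assumes nodes are identified by Dewey labels (tuples of child positions from the root), and the function names are hypothetical.

```python
from itertools import product

def lca(labels):
    """Lowest common ancestor of Dewey labels = their longest common prefix."""
    prefix = []
    for parts in zip(*labels):
        if len(set(parts)) != 1:
            break
        prefix.append(parts[0])
    return tuple(prefix)

def is_proper_ancestor(a, b):
    """True if label a is a strict prefix of label b."""
    return len(a) < len(b) and b[:len(a)] == a

def slca(keyword_lists):
    """keyword_lists: one list of Dewey labels per keyword.

    Returns the nodes whose subtree contains every keyword while no
    descendant's subtree does.
    """
    candidates = {lca(combo) for combo in product(*keyword_lists)}
    return sorted(c for c in candidates
                  if not any(is_proper_ancestor(c, other) for other in candidates))

# Nodes matching "xml" and nodes matching "search" in a toy document.
xml_hits = [(0, 0, 1), (0, 1, 0)]
search_hits = [(0, 0, 2), (0, 2)]
print(slca([xml_hits, search_hits]))
```

In this toy document the root `(0,)` also covers both keywords, but it is discarded because its descendant `(0, 0)` already does. Enumerating all combinations is exponential in the number of keywords, so this is suitable only for illustration.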
TL;DR: This work believes that the key contribution of this system is an improved schema-based clustering storage strategy efficient for both XML querying and updating, and powered by a novel memory management technique.
Abstract: We present a native XML database management system, Sedna, which is implemented from scratch as a full-featured database management system for storing large amounts of XML data. We believe that the key contribution of this system is an improved schema-based clustering storage strategy efficient for both XML querying and updating, and powered by a novel memory management technique. We position our approach with respect to state-of-the-art methods.
TL;DR: This paper integrates the XACML attribute model with an OWL ontology and describes a practical privacy filtering application able to filter out information from XML documents, according to a set of XACML semantic privacy policies.
Abstract: The OASIS eXtensible Access Control Markup Language (XACML) provides an interoperable tool for writing and enforcing access control policies based on attributes, i.e. characteristics of the entities that take part in the access, such as subjects or actions. Unfortunately, the attribute-based approach starts to show its limits when entities exhibit complex relationships, such as semantic relations, which would be easily captured using ontologies instead of attributes. This paper integrates the XACML attribute model with an OWL ontology and describes a practical privacy filtering application able to filter out information from XML documents, according to a set of XACML semantic privacy policies.
TL;DR: This work proposes object-level matching semantics called Interested Single Object (ISO) and Interested Related Object (IRO) to capture single object and multiple objects as user’s search targets respectively, and design a novel relevance oriented ranking framework for the matching results.
Abstract: Keyword search is widely recognized as a convenient way to retrieve information from XML data. In order to precisely meet users’ search concerns, we study how to effectively return the targets that users intend to search for. We model XML document as a set of interconnected object-trees, where each object contains a subtree to represent a concept in the real world. Based on this model, we propose object-level matching semantics called Interested Single Object (ISO) and Interested Related Object (IRO) to capture single object and multiple objects as user’s search targets respectively, and design a novel relevance oriented ranking framework for the matching results. We propose efficient algorithms to compute and rank the query results in one phase. Finally, comprehensive experiments show the efficiency and effectiveness of our approach, and an online demo of our system on DBLP data is available at http://xmldb.ddns.comp.nus.edu.sg.
TL;DR: CluX uses a grammar for sharing similar substructures within the XML tree structure and a cluster-based heuristics for greedily selecting the best compression options in the grammar, which makes CluX a promising technique for XML data exchange whenever the exchanged data volume is a bottleneck in enterprise information systems.
Abstract: XML has become the de facto standard for data exchange in enterprise information systems. But whenever XML data is stored or processed, e.g. in form of a DOM tree representation, the XML markup causes a huge blow-up of the memory consumption compared to the data, i.e., text and attribute values, contained in the XML document. In this paper, we present CluX, an XML compression approach based on clustering XML sub-trees. CluX uses a grammar for sharing similar substructures within the XML tree structure and a cluster-based heuristics for greedily selecting the best compression options in the grammar. Thereby, CluX allows for storing and exchanging XML data in a space efficient and still queryable way. We evaluate different strategies for XML structure sharing, and we show that CluX often compresses better than XMill, Gzip, and Bzip2, which makes CluX a promising technique for XML data exchange whenever the exchanged data volume is a bottleneck in enterprise information systems.
TL;DR: This paper proposes a technique whereby no parsing of the subtrees involved in XPath processing is needed at all unless they contain the nodes of the final query result, and proves that the correctness of XPath processing is guaranteed with this technique.
Abstract: The state-of-the-art techniques of storing XML data, modeled as an XML tree, are node-based in the sense that they are centered around XML node labeling and the storage unit is an XML node. In this paper, we propose a generalization of such techniques so that the storage unit is an XML subtree that consists of one or more nodes. Despite several advantages with such generalization, a major problem would be inefficiency in XPath processing where the stored subtrees are to be parsed on the fly in order for the nodes inside them to be accessed. We solve this problem, proposing a technique whereby no parsing of the subtrees involved in XPath processing is needed at all unless they contain the nodes of the final query result. We prove that the correctness of XPath processing is guaranteed with our technique. Through implementation and experiments, we also show that the overhead of our technique is acceptable.
TL;DR: This study shows how only XML files pass from client side to the server side through a central gateway with the help of web service applications and indicates that the XML file is delivered securely to its destination in the secured e-Commerce architecture which is mandatory for organizations like banking, insurance etc.
Abstract: Security is one of the main issues that must be taken into consideration before implementing e-Commerce architecture. The architecture can be developed by using some security factors namely confidentiality, integrity, authentication, and non-repudiation for XML web services. This paper has examined these factors for implementing the secured system by using suitable security services like XML Encryption, XML Decryption, XML Signatures, and XML Validations. Various algorithms, implementations, and coding have been developed for security services and web services for creating the secured system. The most important part of the system is the gateway or web service which is implemented with suitable technologies for passing only the XML file throughout the whole system. This study shows how only XML files pass from client side to the server side through a central gateway with the help of web service applications. The result indicates that the XML file is delivered securely to its destination in the secured e-Commerce architecture which is mandatory for organizations like banking, insurance etc.
TL;DR: An automatic method for the design of data mart schemas from XML documents that has the merit of automatically identifying all multidimensional elements, classifying them in terms of their analytical potential, and tracing them to the source which facilitates the definition of ETL procedures.
Abstract: Today’s international nature of commerce forced the opening of corporal information systems (IS) to accept data required for both transactional and decisional processes. To ensure the interoperabil...