TL;DR: The goal of BonXai is not to replace XML Schema but rather to provide a simpler alternative for users who want to go beyond the expressiveness and features of DTD but do not need the explicit use of types.
Abstract: While the migration from DTD to XML Schema was driven by a need for increased expressivity and flexibility, the latter was also significantly more complex to use and understand. Whereas DTDs are characterized by their simplicity, XML Schema Documents are notoriously difficult. In this article, we introduce the XML specification language BonXai, which incorporates many features of XML Schema but is arguably almost as easy to use as DTDs. In brief, the latter is achieved by sacrificing the explicit use of types in favor of simple patterns expressing contexts for elements. The goal of BonXai is not to replace XML Schema but rather to provide a simpler alternative for users who want to go beyond the expressiveness and features of DTD but do not need the explicit use of types. Furthermore, XML Schema processing tools can be used as a back-end for BonXai, since BonXai can be automatically converted into XML Schema. A particularly strong point of BonXai is its solid foundation rooted in a decade of theoretical work around pattern-based schemas. We present a formal model for a core fragment of BonXai and the translation algorithms to and from a core fragment of XML Schema. We prove that BonXai and XML Schema can be converted back-and-forth on the level of tree languages and we formally study the size trade-offs between the two languages.
TL;DR: This paper proposes an efficient approach, mini-XML, for mapping XML into a relational database, in which a path technique and position information are used to indicate complex node relationships.
Abstract: In recent years, XML technology has won wide attention from both industry and academia. It can be used to mark up data, define data types, and define custom markup languages. It is a cross-platform, context-dependent technology in the Internet environment and an effective tool for today's distributed structured information. S-XML is a new approach for storing semi-structured data; it supports querying the nodes of an XML document with SQL statements and has shown impressive performance on many classic data sets. However, it is difficult to store XML data in a relational database, and S-XML spends much more time and space to store the data. In this paper, we propose an efficient approach, mini-XML, for mapping XML into a relational database. In addition, a path technique and position information are used to indicate complex node relationships. Finally, two experiments show that the proposed method reduces both storage time and storage space, especially when dealing with large amounts of data.
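The abstract does not give the mini-XML table layout, so the following is only a minimal sketch of the general idea of storing XML nodes relationally with a path and a sibling position. The table name `node`, its columns, the function `shred`, and the sample document are all assumptions made for illustration; they are not taken from the paper.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(xml_text, conn):
    """Store every element as one row: (id, parent_id, path, position, text).

    Hypothetical layout: 'path' is the root-to-node tag path and 'position'
    is the 1-based index of the node among its siblings.
    """
    conn.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, "
        "path TEXT, position INTEGER, text TEXT)"
    )
    next_id = 0

    def visit(elem, parent_id, parent_path, position):
        nonlocal next_id
        next_id += 1
        node_id = next_id
        path = f"{parent_path}/{elem.tag}"
        text = (elem.text or "").strip()
        conn.execute(
            "INSERT INTO node VALUES (?, ?, ?, ?, ?)",
            (node_id, parent_id, path, position, text),
        )
        for i, child in enumerate(elem, start=1):
            visit(child, node_id, path, i)

    visit(ET.fromstring(xml_text), None, "", 1)


# Illustrative document (an assumption, not a data set from the paper).
DOC = (
    "<lib>"
    "<book><title>A</title><year>2001</year></book>"
    "<book><title>B</title><year>2015</year></book>"
    "</lib>"
)

conn = sqlite3.connect(":memory:")
shred(DOC, conn)

# A path lookup replaces a tree traversal with a plain SQL predicate.
titles = [r[0] for r in conn.execute(
    "SELECT text FROM node WHERE path = '/lib/book/title' ORDER BY id")]

# Sibling relationships are recovered through the shared parent_id.
recent = [r[0] for r in conn.execute(
    "SELECT t.text FROM node t JOIN node y ON t.parent_id = y.parent_id "
    "WHERE t.path = '/lib/book/title' AND y.path = '/lib/book/year' "
    "AND CAST(y.text AS INTEGER) > 2010")]
```

With a layout like this, path queries become string predicates and parent-child or sibling relationships become joins on `parent_id`, which is the kind of SQL-based node querying the abstract refers to.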
TL;DR: A novel Prime-based Middle Fraction Labeling Scheme (PMFLS) is designed, with a series of algorithms that obtain the structural relationships among nodes and support updates; the results show that PMFLS handles updates efficiently and significantly improves query processing performance with good scalability.
Abstract: XML data can be represented by a tree or graph, and query processing for XML data requires structural information about the nodes. Designing an efficient labeling scheme for the nodes of order-sensitive XML trees is one of the important ways to manage XML data well. Previous labeling schemes, such as region and prefix schemes, often sacrifice update performance and suffer from growing label space when new nodes are inserted. To overcome these limitations, in this paper we propose a new labeling idea: separating structure from order. Based on this idea, a novel Prime-based Middle Fraction Labeling Scheme (PMFLS) is designed, in which a series of algorithms are proposed to obtain the structural relationships among nodes and to support updates. PMFLS combines the advantages of both prefix and region schemes, expressing structural information and sequential information separately. PMFLS also supports order-sensitive updates without relabeling or recalculation, and its labeling space is stable. Experiments and analysis on several benchmarks show that PMFLS is efficient in handling updates and also significantly improves the performance of query processing with good scalability.
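The abstract does not spell out the PMFLS algorithms, so the sketch below is not PMFLS itself. It only illustrates the stated idea of separating structure from order, using two well-known generic techniques as stand-ins: a prime-product label for structure (ancestor tests by divisibility) and a fractional key for sibling order (insertion by taking a midpoint). The class `Labeler` and all of its method names are hypothetical.

```python
from fractions import Fraction
from itertools import count


def primes():
    """Yield 2, 3, 5, 7, ... by trial division (fine for a small sketch)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n


class Labeler:
    """Keeps a structural label and an order key per node, independently."""

    def __init__(self):
        self._primes = primes()
        self.structure = {}  # node -> product of the primes on its root path
        self.order = {}      # node -> Fraction giving its place among siblings

    def add(self, name, parent=None, order=Fraction(1)):
        base = self.structure[parent] if parent is not None else 1
        self.structure[name] = base * next(self._primes)
        self.order[name] = order

    def is_ancestor(self, a, d):
        # a is an ancestor of d iff a's label divides d's label.
        return a != d and self.structure[d] % self.structure[a] == 0

    def insert_between(self, name, parent, left, right):
        # The new order key is the midpoint; no existing label changes.
        self.add(name, parent, (self.order[left] + self.order[right]) / 2)


lab = Labeler()
lab.add("root")
lab.add("a", "root", Fraction(1))
lab.add("b", "root", Fraction(2))
lab.add("c", "a", Fraction(1))
before = dict(lab.structure)
lab.insert_between("m", "root", "a", "b")
```

In this stand-in, inserting `m` between `a` and `b` assigns it a fresh prime and a midpoint order key, so no existing node is relabeled. The property being illustrated is the one the abstract claims for PMFLS (order-sensitive updates without relabeling), not the scheme's actual encoding.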
TL;DR: Two new algorithms and the associated indexing structures are developed and shown to perform correctly in processing both independent and/or inter-linked XML documents.
TL;DR: A new structure for streaming XML data is proposed that guarantees the confidentiality of the XML data over the wireless stream, and an access mechanism is proposed to efficiently process XML queries over the encrypted XML stream.
TL;DR: In this study, three categories of XML node labelling are analysed to address the open problems of each category, and the execution time and storage space required for labelling an XML tree are compared.
Abstract: The flexible nature of XML documents has motivated researchers to use them for data transmission and storage in different domains. The hierarchical structure of XML documents makes them an attractive subject for research on processing user queries based on labelling, where each label describes the node structure in the tree. In this study, three categories of XML node labelling are analysed to address the open problems of each category. A number of experiments are executed to compare the execution time and storage space required for labelling an XML tree.
TL;DR: This work aims to define a system to extract data regardless of the nature of their model and make one query enough to retrieve data from different models, which are XML and relational in this case.
TL;DR: This article proposes a RESTful web service system built on open source APIs (application programming interfaces). The work focuses on using open source APIs to invoke web services as RESTful web services, apply XML signature and XML encryption, parse the XML result after the reverse process at the receiver side, and extract the required nodes according to a user-given condition.
Abstract: The aim of this system is to use open source APIs to invoke Web services and process their results, and to apply confidentiality and integrity protection to those results. In this system, Web services are requested using a URL (Uniform Resource Locator) query to the WWW instead of a SOAP message. When web services are invoked using JAX-RPC (Java API for XML-Based Remote Procedure Call) with a SOAP request, the returned result may not be received properly due to de-serialization problems with complex XML data sets. Therefore, such services must be invoked by submitting the query as a URL query and receiving the results as an XML data set. This XML data set may be tampered with by a man-in-the-middle attack during transfer. Hence this system uses open source APIs to provide XML signature and XML encryption. At the receiver, after decryption and integrity check, the required node is extracted from the XML data set and presented to the user. This article focuses on using open source APIs (application programming interfaces) to invoke web services as RESTful web services, apply XML signature and XML encryption, parse the XML result after the reverse process at the receiver side, and extract the required nodes according to a user-given condition. The problem of de-serialization when receiving a huge XML result is resolved in the proposed system.
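As a rough illustration of the first and last steps described above (building a URL query and extracting nodes from the XML result by a user-given condition), here is a minimal sketch using only the Python standard library. The endpoint `https://example.org/quotes`, the parameter names, and the sample XML are hypothetical. No network request is made, and the XML signature and XML encryption steps are deliberately left out.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def build_query_url(base_url, params):
    """Encode the request as a URL query string instead of a SOAP message."""
    return f"{base_url}?{urlencode(params)}"


def extract_nodes(xml_text, tag, predicate):
    """Return every <tag> element in the result that satisfies the condition."""
    root = ET.fromstring(xml_text)
    return [elem for elem in root.iter(tag) if predicate(elem)]


# Hypothetical endpoint and parameters, for illustration only.
url = build_query_url("https://example.org/quotes",
                      {"symbol": "ABC", "format": "xml"})

# Stand-in for the (already decrypted and verified) XML result.
RESULT = (
    "<quotes>"
    "<quote symbol='ABC'><price>10.5</price></quote>"
    "<quote symbol='XYZ'><price>99.0</price></quote>"
    "</quotes>"
)

expensive = extract_nodes(
    RESULT, "quote", lambda q: float(q.findtext("price")) > 50)
symbols = [q.get("symbol") for q in expensive]
```

Passing the condition as a predicate keeps the extraction step independent of any particular service schema.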
TL;DR: This paper reviews various XML keyword query processing techniques and highlights some of the important issues associated with each technique, along with the improvements made to address those issues and thereby improve the overall efficiency of XML keyword search query processing.
Abstract: Keyword search is gaining popularity for querying XML data nowadays, as it relieves users from understanding the complex schemas of XML documents and query languages such as XQuery and XPath. Various query processing techniques and efficient algorithms have been proposed recently to address keyword search over XML data. The most popular techniques for XML keyword search today use the query semantics ELCA (Exclusive LCA) and SLCA (Smallest LCA), both based on LCA (Lowest Common Ancestor). Among these, ELCA captures more meaningful results compared with LCA and SLCA. However, these techniques can result in redundant computation due to problems like common-ancestor-repetition (CAR) and visiting-useless-node (VUN). Irregular schemas of a given XML document and missing elements in it are also problems to be considered in keyword query processing over XML data. In this paper, we review various XML keyword query processing techniques. We also highlight some of the important issues associated with each technique and the improvements made to address those issues, thereby improving the overall efficiency of XML keyword search query processing.
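To make the SLCA semantics mentioned above concrete, the following is a deliberately naive sketch over Dewey labels, where each node is a tuple of child indices from the root. It enumerates every combination of keyword matches, which is exactly the kind of redundant computation the surveyed techniques try to avoid, so it should be read as a definition in code and not as an efficient algorithm. The function names and sample labels are assumptions.

```python
from itertools import product


def lca(labels):
    """Lowest common ancestor of Dewey labels = their longest common prefix."""
    prefix = []
    for parts in zip(*labels):
        if len(set(parts)) != 1:
            break
        prefix.append(parts[0])
    return tuple(prefix)


def slca(keyword_lists):
    """Smallest LCAs: LCAs of keyword matches with no descendant LCA."""
    candidates = {lca(combo) for combo in product(*keyword_lists)}

    def has_descendant(v):
        return any(len(c) > len(v) and c[:len(v)] == v for c in candidates)

    return sorted(c for c in candidates if not has_descendant(c))


# Hypothetical match lists: Dewey labels of the nodes containing each keyword.
matches_xml = [(0, 0, 1), (0, 1, 0)]
matches_index = [(0, 0, 2), (0, 2)]

answers = slca([matches_xml, matches_index])
```

Here the root `(0,)` also covers both keywords, but it is discarded because its descendant `(0, 0)` already does, leaving `(0, 0)` as the only SLCA.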
TL;DR: The graph modeling, storage and processing possibilities of XML data are analysed, and it is shown that modeling XML data as a graph and processing it with graph processors is beneficial in many contexts.
Abstract: XML is a standard format for data exchange over the internet. A huge amount of information is also tagged and stored in XML format. Processing XML data has its difficulties due to the schema-centric and semi-structured nature of the major portion of existing XML data. The tree structure embedded in the data makes it more complicated to process. XML processing using RDBMS systems and native XML databases like BaseX and eXist-DB has its own limitations. Native XML databases are not suitable for distributed processing, so they are bound to the resources of a single system, which are not enough for big data processing. Graph databases and graph database technologies have been emerging in the recent past. They are also suitable for processing big data due to the extension of parallel processing features in graph data processors. Modeling XML data as a graph and processing it with graph processors is beneficial in many contexts. In this paper, the graph modeling, storage and processing possibilities of XML data are analysed. The major graph database Neo4j and the GraphX graph processor extension embedded in the Apache Spark distributed in-memory processing system are utilized for querying XML data.
TL;DR: This paper surveys state-of-the-art XML indices and discusses the main issues, tradeoffs and future trends in XML indexing, and presents an index that is specifically designed for the particular architecture of XML data warehouses.
Abstract: With XML becoming a standard for business information representation and exchange, storing, indexing, and querying XML documents have rapidly become major issues in database research. In this context, query processing and optimization are primordial, native-XML databases not being mature yet. Data structures such as indices, which help enhance performance substantially, are extensively researched, especially since XML data bear numerous specificities with respect to relational data. In this paper, we survey state-of-the-art XML indices and discuss the main issues, tradeoffs and future trends in XML indexing. We also present an index that we specifically designed for the particular architecture of XML data warehouses.
TL;DR: This investigation has resulted in the proposal of a novel prefix-encoding method named “Elias-Fibonacci of order 3”, which has achieved the fastest encoding time of all prefix-encoding methods studied in this thesis, whereas Fibonacci encoding was found to require the minimum storage.
Abstract: The flexibility and self-describing nature of XML has made it the most common mark-up language used for data representation over the Web. XML data is naturally modelled as a tree, where the structural tree information can be encoded into labels via XML labelling scheme in order to permit answers to queries without the need to access original XML files. As the transmission of XML data over the Internet has become vibrant, it has also become necessary to have an XML labelling scheme that supports dynamic XML data. For a large-scale and frequently updated XML document, existing dynamic XML labelling schemes still suffer from high growth rates in terms of their label size, which can result in overflow problems and/or ambiguous data/query retrievals.
This thesis considers the compression of XML labels. A novel XML labelling scheme, named “Base-9”, has been developed to generate labels that are as compact as possible and yet provide efficient support for queries to both static and dynamic XML data. A Fibonacci prefix-encoding method has been used for the first time to store Base-9’s XML labels in a compressed format, with the intention of minimising the storage space without degrading XML querying performance. The thesis also investigates the compression of XML labels using various existing prefix-encoding methods. This investigation has resulted in the proposal of a novel prefix-encoding method named “Elias-Fibonacci of order 3”, which has achieved the fastest encoding time of all prefix-encoding methods studied in this thesis, whereas Fibonacci encoding was found to require the minimum storage.
Unlike current XML labelling schemes, the new Base-9 labelling scheme ensures the generation of short labels even after large, frequent, skewed insertions. The advantages of such short labels as those generated by the combination of applying the Base-9 scheme and the use of Fibonacci encoding in terms of storing, updating, retrieving and querying XML data are supported by the experimental results reported herein.
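For readers unfamiliar with the Fibonacci prefix code mentioned above, here is a minimal sketch of the standard encoding of positive integers: the Zeckendorf representation written least significant term first and terminated by an extra `1`. How Base-9 labels are mapped to integers is not described in the abstract, so treating label components as positive integers is an assumption here. Neither the Base-9 scheme nor “Elias-Fibonacci of order 3” is implemented.

```python
def fib_encode(n):
    """Return the Fibonacci code of a positive integer as a bit string."""
    if n < 1:
        raise ValueError("Fibonacci coding is defined for n >= 1")
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    fibs.pop()  # drop the first Fibonacci number greater than n
    bits = []
    for f in reversed(fibs):  # greedy choice yields the Zeckendorf sum
        if f <= n:
            bits.append("1")
            n -= f
        else:
            bits.append("0")
    # Emit the smallest term first, then the terminating '1'.
    return "".join(reversed(bits)) + "1"


def fib_decode_stream(bits):
    """Decode a concatenation of Fibonacci codes back into integers."""
    values = []
    cur, nxt, total, prev = 1, 2, 0, "0"
    for bit in bits:
        if bit == "1" and prev == "1":
            # Two consecutive ones mark the end of a codeword.
            values.append(total)
            cur, nxt, total, prev = 1, 2, 0, "0"
            continue
        if bit == "1":
            total += cur
        cur, nxt = nxt, cur + nxt
        prev = bit
    return values


# Hypothetical label components, concatenated without any separators.
components = [1, 2, 11]
stream = "".join(fib_encode(c) for c in components)
decoded = fib_decode_stream(stream)
```

Because a Zeckendorf representation never contains two adjacent ones, the pair `11` can only occur at the end of a codeword. This is why a sequence of variable-length codes can be stored back to back and still be split unambiguously.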
TL;DR: A new index structure is proposed that combines the siblings of terminal nodes into one path, so that twig queries are processed efficiently with fewer lookups and joins.
Abstract: Querying nested data has become one of the most challenging issues in retrieving desired information from the Web. Today diverse applications generate a tremendous amount of data in different formats. The data and information exchanged on the Web are commonly expressed in nested representations such as XML and JSON. Unlike traditional database systems, they do not have a rigid schema. In general, nested data is managed by storing the data and its structure separately, which significantly reduces the performance of data retrieval. Ensuring efficient processing of queries that locate the exact positions of elements has become a major challenge. Different indexing structures have been proposed in the literature to improve query processing performance on nested structures. Most past research on nested structures concentrates on the structure alone. This paper proposes a new index structure that combines the siblings of terminal nodes into one path, which efficiently processes twig queries with fewer lookups and joins. The proposed approach is compared with some of the existing approaches. The results show that queries are processed with better performance than with the existing approaches.
TL;DR: SECRET is the first tool that allows the encryption of whole documents or arbitrary sub-parts thereof, uses a novel combination of tree-based OT with a structure preserving encryption, and requires only a modern browser without any extra software installation or browser extension.
Abstract: Real-time editing tools like Google Docs, Microsoft Office Online, or Etherpad have changed the way of collaboration. Many of these tools are based on Operational Transforms (OT), which guarantee that the views of different clients onto a document remain consistent over time. Usually, documents and operations are exposed to the server in plaintext, and thus to administrators, governments, and potentially cyber criminals. Therefore, it is highly desirable to work collaboratively on encrypted documents. Previous implementations do not unleash the full potential of this idea: They either require large storage, network, and computation overhead, are not real-time collaborative, or do not take the structure of the document into account. The latter simplifies the approach since only OT algorithms for byte sequences are required, but the resulting ciphertexts are almost four times the size of the corresponding plaintexts. We present SECRET, the first secure, efficient, and collaborative real-time editor. In contrast to all previous works, SECRET is the first tool that (1.) allows the encryption of whole documents or arbitrary sub-parts thereof, (2.) uses a novel combination of tree-based OT with a structure preserving encryption, and (3.) requires only a modern browser without any extra software installation or browser extension. We evaluate our implementation and show that its encryption overhead is three times smaller in comparison to all previous approaches. SECRET can even be used by multiple users in a low-bandwidth scenario. The source code of SECRET is published on GitHub as an open-source project: https://github.com/RUB-NDS/SECRET/
TL;DR: A semi-automatic solution is proposed to introduce semantics into XML databases and to enrich the answers to queries; applied to pharmaceutical databases, it gave very encouraging preliminary results.
Abstract: The introduction of data semantics in various fields of science by referring to ontological databases is becoming more and more necessary. With the proliferation of domain ontologies and the large volume of data to be processed, it has become necessary to have data management systems based on ontological systems. Such a system can be exploited via the web, as is the case with XML databases, which will allow us to (i) use semantic databases via the Internet and (ii) enrich responses to XML queries by using a domain terminology ontology. Also, XML presents a flexible hierarchical model suitable for representing huge amounts of data with no absolute and fixed schema. In order to highlight the usefulness of introducing semantics into XML databases and enriching the answers to queries, we have proposed a semi-automatic solution that we applied to pharmaceutical databases and that gave very encouraging preliminary results.
TL;DR: A computer-implemented method for offloading extensible markup language (XML) data to a distributed file system may include receiving a command to populate a distributed file system with an XML table of a database.
Abstract: A computer-implemented method for offloading extensible markup language (XML) data to a distributed file system may include receiving a command to populate a distributed file system with an XML table of a database. The XML table may be queried in response to the command. The source data in the XML table may be offloaded, by a computer processor, to the distributed file system in response to the querying. The offloading may include converting the source data to a string version of the source data and converting the string version of the source data back into XML format.
TL;DR: Today, digital watermarking technology has emerged as an effective tool for relational databases and eXtensible Markup Language (XML) data in order to protect copyright, detect tampering, trace traitors, and maintain the integrity of the data.
Abstract: Today, digital watermarking technology has emerged as an effective tool for relational databases and eXtensible Markup Language (XML) data in order to protect copyright, detect tampering, trace traitors, and maintain the integrity of the data.
TL;DR: A query semantics Entity-Relationship Graph (ERG), which adopts the RDF subject-predicate-object semantics to capture the information of search entities along with associated attributes and the relationships between entities, is proposed.
Abstract: Keyword search in XML has gained popularity as it enables users to easily access XML data without the need to learn query languages or study complex data schemas. In XML keyword search, query semantics is based on the concept of Lowest Common Ancestor (LCA), e.g., SLCA and ELCA. However, LCA-based search methods depend heavily on the hierarchical structure of XML data, which may result in meaningless answers. To obtain desired answers, a successful system should be able to (i) match a semantic entity for each keyword, (ii) discover the relationships of the matched entities, (iii) support efficient query processing, (iv) release users from needing knowledge of the XML content, and (v) visualize the search results. None of the existing XML keyword search systems completely meets the above requirements. In this paper, we design a system called SpiderX to completely solve the above challenges. We propose a query semantics, Entity-Relationship Graph (ERG), which adopts the RDF subject-predicate-object semantics to capture the information of search entities along with associated attributes and the relationships between entities. SpiderX proposes a novel index structure, which has a small space cost by combining the optimizations of column databases and data compression schemes. In addition, SpiderX processes queries in a bottom-up way to achieve high performance, which is about 100X faster than the state-of-the-art algorithms. To demonstrate the high performance of SpiderX, we implement an online demo for SpiderX, which operates on three real-life datasets. The demo also provides (1) query auto-completion to guide users in formulating queries; and (2) a visualization panel to display the query answers, which interacts with users by providing zoom-in and zoom-out exploration features. Demo link: http://chunbinlin.com/spiderx.
TL;DR: A materialized view is built from an original document for each query, and auxiliary structures such as T-Bitmap and indexes are built to further accelerate query processing.
Abstract: With the widespread use of the eXtensible Markup Language (XML), more and more applications store and query XML documents in XML database systems. Thus, how to efficiently process a query and find the specified patterns conforming to the query in XML documents is a crucial issue. In this paper, some processing methods are employed on XML documents to improve document retrieval. First, a materialized view is built from an original document for each query. Then, on each materialized view, auxiliary structures such as T-Bitmap and indexes are also built to further accelerate query processing. Finally, four experiments are conducted to show the superiority of the proposed approach.
TL;DR: This paper proposes an efficient storage scheme for XML documents called XQUICK, which exploits the high regularity of XML documents to compress the tree structure and describes a novel path-based querying approach that supports fast querying.
Abstract: Due to the inherent flexibility in both structure and semantics, XML documents are massive in nature. The ratio of the size of the XML document to the size of the text data in it is usually large. Apart from data values, the tree structure contributes to the huge size of the XML document. Because the structure of the XML document is tightly bound to the data, the original form of XML is less efficient in terms of both time and space. The problem of designing a compressor for XML documents that facilitates both update and query operations has attracted the attention of many. In this paper, we propose an efficient storage scheme for XML documents called XQUICK. XQUICK exploits the high regularity of XML documents to compress the tree structure. It also handles updates in an efficient manner with minimum space and time overhead. This paper also describes a novel path-based querying approach that supports fast querying. Additional mechanisms such as indexing are provided to enable faster query processing. XQUICK can also be used in conjunction with standard parsers like DOM, SAX etc. Experimental results confirm the capabilities of the proposed scheme.
TL;DR: By studying the internal organizational structure of Word 2007 documents, the significant component documents are extracted, and Java and XML technologies are adopted to extract the format marks and element properties of the documents in order to further analyze the document format.
Abstract: By studying the internal organizational structure of Word 2007 documents, the significant component documents are extracted, including document.xml, style.xml, header.xml and footer.xml. The composition structure and the interaction of the common document formats in each XML file are analyzed. On that basis, Java and XML technologies are adopted to extract the format marks and element properties of the documents in order to further analyze the document format. This combined technology can be adopted to check the format of degree papers and thereby further advance the informatization of education management in colleges and universities.
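The paper uses Java; the sketch below uses Python only to illustrate the general idea of opening the document archive and reading format marks from its main XML part. It assumes the usual packaging of a Word 2007 file as a ZIP archive with the main part at `word/document.xml` and the WordprocessingML `w:` namespace. To stay self-contained, it builds a tiny stand-in archive in memory, which is not a complete, valid .docx file. The function and variable names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def paragraph_formats(docx_bytes):
    """List text, paragraph style and font size for each paragraph."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    result = []
    for p in root.iterfind(".//w:p", NS):
        style = p.find("w:pPr/w:pStyle", NS)
        size = p.find("w:r/w:rPr/w:sz", NS)  # w:sz is given in half-points
        text = "".join(t.text or "" for t in p.iterfind(".//w:t", NS))
        result.append({
            "text": text,
            "style": style.get(f"{{{W}}}val") if style is not None else None,
            "half_points": int(size.get(f"{{{W}}}val")) if size is not None else None,
        })
    return result


# Minimal stand-in for a Word archive (assumption: only the main part).
SAMPLE_XML = (
    f'<w:document xmlns:w="{W}"><w:body>'
    '<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr>'
    '<w:r><w:rPr><w:sz w:val="32"/></w:rPr><w:t>Abstract</w:t></w:r></w:p>'
    '<w:p><w:r><w:t>Body text.</w:t></w:r></w:p>'
    '</w:body></w:document>'
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", SAMPLE_XML)

formats = paragraph_formats(buffer.getvalue())
```

A format checker of the kind described could then compare each extracted style and size against the rules required for a degree paper.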
TL;DR: This study proposes an improved XML digital signature with RSA algorithm, as a novel algorithmic framework that improves the authentication strength of XML digital signature in B2C e-commerce in a cloud-based environment.
Abstract: The reliance of e-commerce infrastructure on cloud computing environments has undoubtedly increased the security challenges in web-based e-commerce portals. This has created the need for a built-in security feature, essentially to improve the authentication mechanism, during the execution of its dependent transactions. Comparative analysis of the existing works and studies on XML-based authentication and non-XML signature-based security mechanisms for authentication in Business to Consumer (B2C) e-commerce showed the advantage of using XML-based authentication, as well as its inherent weaknesses and limitations. It is against this background that this study, based on review and meta-analysis of previous works, proposes an improved XML digital signature with RSA algorithm, as a novel algorithmic framework that improves the authentication strength of XML digital signature in B2C e-commerce in a cloud-based environment. Our future work includes testing, validation, and simulation of the proposed authentication framework in Cisco’s XML Management Interface with the inbuilt feature of NETCONF. The evaluation will be done in conformity with international standards and guidelines, such as those of W3C and NIST.
TL;DR: This article introduces in detail a technique for printing report forms in application systems with B/S architecture based on XML documents; to solve the printing problem, data exchange using XML is explored.
Abstract: Printing large, complicated, practical report forms is a difficult problem in application systems with B/S architecture. To solve this problem, performing data exchange using XML to implement printing is explored. A Windows Form in the .NET framework is embedded in FrontPage. Custom code parses the XML document and controls client-side printing. The fundamental technique for implementing report form printing in application systems with B/S architecture based on XML documents is introduced in detail.
TL;DR: This paper presents the basic ideas for a mapping from XML Schema to object-oriented concepts; this mapping is a prerequisite for a future approach to measuring XML schema quality with object-oriented metrics.
Abstract: Measuring the quality of IT solutions is a priority in software engineering. Although numerous metrics for measuring object-oriented code already exist, measuring the quality of UML models or XML Schemas is still developing. One of the research questions in the overall research guided by the ideas described in this paper is whether we can apply already defined object-oriented design metrics to XML schemas based on predefined mappings. In this paper, the basic ideas for this mapping are presented. This mapping is a prerequisite for setting up the future approach to measuring XML schema quality with object-oriented metrics.
TL;DR: An XML encryption component, an XML signature component and an access control component are combined to form an encrypted communication transmission mechanism for the Simple Object Access Protocol, using XML encryption and XML signature.
Abstract: The invention discloses an XML-based network encryption transmission method and system. According to the method, an XML encryption component, an XML signature component and an access control component are combined to form an encrypted communication transmission mechanism for the Simple Object Access Protocol; widely supported XML standard features are utilized, in combination with XML encryption and XML signature, to carry out encryption and signature processing on message transmission. The advantages of the method are that end-to-end message protection can be realized; message confidentiality, message integrity, non-repudiation, identity verification and authorization are guaranteed; access control protection for messages can be realized; encryption efficiency is high; and system performance requirements are low.