TL;DR: This article discusses developments in multi-signature schemes and their application areas, and argues for the need to study such schemes.
Abstract: Extensible Markup Language (XML) is a widely used data exchange format for transmitting data over the web in many fields. Because of security risks, however, the privacy of XML documents cannot be ensured, and sharing data has therefore become a major challenge. Web vulnerabilities threaten the confidentiality and integrity of XML data. A set of security mechanisms, such as encryption, decryption, digital signatures, and validation, can be applied to XML documents. Signing a document is the most commonly used method for ensuring authentication, non-repudiation, and integrity, and traditionally a single signer is responsible for signing a document. Nowadays, however, the responsibility for signing a document is often shared among multiple signers, which has increased the popularity of multi-signature schemes. A document may therefore carry multiple signatures, and the integrity of the entire document depends on them. This gives rise to the need to study various multi-signature schemes. This paper discusses developments in multi-signature schemes as well as their application areas.
TL;DR: This article presents a parallel approach to XML parsing based on NEM-XML; the experimental results show that the parallel parsing algorithm improves XML parsing performance significantly and scales well.
Abstract: As the de facto standard for data representation and data exchange, XML has become very popular on the Internet. Improving XML parsing performance is key to promoting its further development and application. Parallel computing is a key technology for solving problems that require a huge amount of computation. This paper presents a parallel approach to XML parsing based on NEM-XML. The experimental results show that our parallel XML parsing algorithm improves XML parsing performance significantly and scales well.
TL;DR: This paper describes "dbxlink", an XLink extension that allows interlinked XML instances to be modeled as integrated views in which XLinks are resolved transparently.
Abstract: XML (eXtensible Markup Language) is the de facto standard for exchanging information and for representing data on the World Wide Web. In contrast to the document-centric perspective of the well-known language HTML, which defines the human-readable content and the layout of web pages, XML offers more flexibility and expressiveness. XML documents are not required to be self-contained but may instead contain links to other XML resources. For expressing such links between XML documents, the W3C (World Wide Web Consortium) proposed XLink, but mainly for browsing purposes. When the linked documents are considered from a data-centric viewpoint, it turns out that XLink does not specify how the referenced instances should be handled. In particular, it is not possible to query along links, although the W3C XML Query (XQuery) Requirements explicitly state that this has to be guaranteed. To cope with these issues, an XLink extension called "dbxlink" has been proposed. It allows interlinked XML instances to be modeled as integrated views in which XLinks are resolved transparently. In particular, these instances can be queried with XPath and XQuery. This work describes the dbxlink model and investigates how to query distributed XML instances that are interlinked with a simple kind of XLink according to this approach. Different strategies are analyzed, and emerging problems such as the handling of cyclic instances are treated. It is shown how to extend XPath-based query systems so that they can handle queries with respect to dbxlink. Furthermore, optimization techniques such as special caching strategies are proposed. The results of these investigations have been used in a proof-of-concept implementation of the dbxlink approach as an extension to the open source XML database system eXist.
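The idea of resolving links into one integrated view, including the cyclic case mentioned above, can be illustrated with a small sketch. This is not the dbxlink model or its eXist implementation: it assumes that the documents live in an in-memory dictionary, that each `xlink:href` value is simply a key into that dictionary (real XLinks carry URIs), and that a link element is replaced wholesale by the root of the referenced document. The function names `resolve` and `_expand` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Attribute name of xlink:href in ElementTree's {namespace}local notation.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def resolve(name, docs, active=()):
    """Parse docs[name] and return its tree with all link elements expanded.

    `active` holds the documents currently being expanded, so a link back
    into that chain is reported as a cycle instead of recursing forever.
    """
    if name in active:
        raise ValueError("cyclic link back to " + name)
    root = ET.fromstring(docs[name])
    _expand(root, docs, active + (name,))
    return root


def _expand(node, docs, active):
    # Iterate over a snapshot of the children; replacing a child by index
    # keeps the number of children unchanged, so the indices stay valid.
    for index, child in enumerate(list(node)):
        href = child.get(XLINK_HREF)
        if href is not None:
            # Assumption: the link element is replaced by the referenced root.
            node[index] = resolve(href, docs, active)
        else:
            _expand(child, docs, active)


docs = {
    "a": '<a xmlns:xlink="http://www.w3.org/1999/xlink"><ref xlink:href="b"/></a>',
    "b": "<b><x/></b>",
}
view = resolve("a", docs)
# A path query can now cross the former link boundary.
crossed = view.find("b/x") is not None
```

Once the view is built, an ordinary path expression such as `b/x` is evaluated as if the two documents were one. Raising an error on a cycle is only one possible policy, chosen here for brevity.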
TL;DR: This article proposes an efficient prefix-based labeling scheme that uses a hexagonal pattern. The scheme avoids node relabeling when XML documents are updated at random locations, avoids duplicate labels by creating a new label for every inserted node, and reduces the size and time costs of the updated labels.
Abstract: To improve XML query processing, XML documents must be labeled efficiently for indexing, because labels preserve the structural relationships between XML nodes without requiring access to the original document. However, XML data on the Web is updated over time, so dynamic updating is an issue that may need to be handled by an XML labeling scheme designed specifically for dynamic updates. Previous XML labeling schemes have limitations when updates take place: many nodes need to be relabeled, many duplicate labels occur during relabeling, and the size and time costs of the updated labels are high. This paper therefore proposes an efficient prefix-based labeling scheme that uses a hexagonal pattern. The proposed labeling scheme has three main advantages: (i) it avoids the need for node relabeling when XML documents are updated at random locations, (ii) it avoids duplicate labels by creating a new label for every inserted node, and (iii) it reduces the size and time costs of the updated labels. The proposed scheme is evaluated against the three most recent prefix-based labeling schemes in terms of the size and time costs of the updated labels. In addition, the ability of the proposed labeling scheme to handle several kinds of updates (such as insertions) in XML documents is evaluated. The evaluations show that the proposed labeling scheme outperforms previously developed prefix-based labeling schemes in both size and time costs, particularly for large-scale XML datasets, resulting in improved query processing performance. Moreover, the proposed scheme efficiently supports frequent updates at arbitrary positions. The paper concludes with several suggestions for further research.
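To make concrete how prefix-based labels preserve structural relationships without access to the document, the following minimal sketch uses plain Dewey-style labels, where each node's label is its parent's label plus its sibling position. This is a generic illustration and not the hexagonal scheme proposed in the paper; the function names are hypothetical.

```python
from typing import Tuple

# A label is the path of sibling positions from the root, e.g. (1, 2, 3).
Label = Tuple[int, ...]


def child_label(parent: Label, position: int) -> Label:
    """Label of the child at the given sibling position under `parent`."""
    return parent + (position,)


def is_ancestor(a: Label, d: Label) -> bool:
    """True if `a` is a proper prefix of `d`, i.e. a is an ancestor of d."""
    return len(a) < len(d) and d[:len(a)] == a


def is_parent(a: Label, d: Label) -> bool:
    """True if `d` extends `a` by exactly one component."""
    return len(d) == len(a) + 1 and d[:len(a)] == a


def precedes(a: Label, b: Label) -> bool:
    """Document order: tuple comparison is lexicographic, so an ancestor
    sorts before its descendants and earlier siblings before later ones."""
    return a < b


root = (1,)
section = child_label(root, 2)      # (1, 2)
para = child_label(section, 3)      # (1, 2, 3)
```

With integer positions like these, inserting a node between siblings `(1, 2)` and `(1, 3)` leaves no free label, so the following siblings and their subtrees would have to be relabeled. That is the kind of update cost that dynamic labeling schemes aim to avoid.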
TL;DR: This article proposes two methods, a vectorized representation of XML documents and further feature extraction. The experimental results show that all-path feature extraction represents the main features of an XML document effectively, which supports efficient handling of power grid XML documents.
Abstract: XML documents store the source-load interaction information of the new power system. XML is self-describing and extensible and captures both structure and content, which makes it widely used. Improving the method of extracting elements from XML documents is very helpful for solving the problem of distributed object operation measurement in the power grid. To classify and analyze XML documents better, and based on a theoretical analysis of principal component analysis and a study of the text representation model, this paper proposes two methods: a vectorized representation of XML documents and further feature extraction. The experimental results show that all-path feature extraction can represent the main features of an XML document effectively, which is important groundwork for the efficient later handling of power grid XML documents.
TL;DR: This study focuses on constructing a separate XML document validator and validating XML documents against defined XSD rules; it also covers the critical differences between XSD and DTD.
Abstract: Extensible Markup Language (XML) is a markup language that is developed to organize the structure of information in a text file. The data in XML formatted documents are represented by specifying a number of tags and determining the structural relationship between those tags. It has a simple structure and can be handled by any text editor. Therefore, XML formatted data is being commonly used to transfer and share data between different applications and organizations without having to convert the format of the data (Yang, 2019).
In the XML world, “well-formed” and “valid” are the two most frequently used terms. A well-formed XML document is free from errors that would prevent it from being parsed, such as spelling, punctuation, grammar, and syntax errors. In addition to being well-formed, a valid XML document must conform to a document type definition; that is, the document must be semantically correct and match a described standard of schemas and relationships (Appel, 2020). There are two standards that can be used to validate an XML document. One is DTD, or Document Type Definition, which identifies the legal structure and names the legal elements of an XML document (Dykes and Tittel, 2011). The other is XSD, or XML Schema Definition. XSD is a diagrammatic representation that defines the valid structure of an XML document; it enables specifying the building blocks of an XML data set, such as elements and attributes and their data types, the number of child elements, and the fixed and default values of the elements and attributes that can appear in the documents (XML Schema Tutorial, 2020). In some applications, validating an XML document is combined with parsing it. In other cases, however, parsing and validation need to be separated. This study focuses on constructing a separate XML document validator and validating XML documents against the defined XSD rules. A Java program is used to perform this experiment. Furthermore, the critical differences between XSD and DTD are discussed.
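The distinction between the two terms can be illustrated on the well-formedness side with a short sketch. The study itself uses a Java program; the example below instead uses Python's standard `xml.etree.ElementTree` parser as a stand-in, and the helper name `is_well_formed` is hypothetical. It checks only whether a document parses. Checking validity against an XSD requires a schema-aware library, which the Python standard library does not provide, so that step is not shown.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False on a syntax error.

    This is a well-formedness check only; it says nothing about whether
    the document conforms to a DTD or an XSD.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


balanced = is_well_formed("<order><item>pen</item></order>")
# The <item> element is never closed, so the parser rejects the document.
unbalanced = is_well_formed("<order><item>pen</order>")
```

A document that passes this check may still be invalid, for example if it contains an element that the schema does not allow at that position.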
TL;DR: This work presents a formal description of Simple Link and Extended Link semantics, based on a specification as an abstract data type (ADT), and provides Extended Links with a 3rd Party Link semantics.
Abstract: XML (short for eXtensible Markup Language) is a meta-language for the representation of digital data. XML has had an enormous impact on modern computer science and the IT industry since its advent in 1997, for several reasons. XML is simple and easily accessible. Because it uses Unicode as its encoding, XML can be viewed, authored, and edited with common text editors, and because XML document types have a context-free and well-formed structure, it is easy to provide efficient parsers for processing XML documents. In addition, XML's concept of definable document types allows a structured representation of almost arbitrary digital data, with the document type modeling the domain of the data. This makes XML a very powerful and flexible standard for data representation, particularly on the Web. The XLink standard is an extension to XML for defining references between XML documents, inspired by the hyperlink concept from hypertext. XLink defines two types of links: Simple Links are unidirectional links from one document to another, similar to HTML hyperlinks. Extended Links create graph-based relationships (arcs) between portions of XML (resources) across multiple XML documents. Within the LinXIS project, models and query evaluation for XLink have been investigated: in a logical data model, a Simple Link is given the semantics of an embedded view that "imports" the referenced data from a remote document into the link-defining document. The participating XML data, together with the Simple Links, define a virtual instance (a single-document view on the distributed data) according to the logical data model. Extended Links define relations between XML resources, but in contrast to Simple Links, they are not defined inside the participating resources but apart from them.
This makes it possible to define a semantics for Extended Links in which an Extended Link defines views that combine and extend the participating resources from a 3rd party perspective, without needing write access to them, thus extending the logical data model for Simple Links. The logical data model described above provides a semantics for the evaluation of XPath queries over distributed XML data: a query may be evaluated not on a (physical) XML document, but on the virtual instance defined by the given Simple and Extended Links. The query evaluation may "follow" a Simple Link, continuing the evaluation process on the referenced, physically remote data. For Extended Links, queries can be evaluated on the integrated view combining the sources referenced by an Extended Link, based on the 3rd party semantics of the link. A previous PhD thesis, which also emerged from the LinXIS project, introduced the data model for Simple Links and investigated techniques and algorithms for XPath query evaluation on the linked XML data. As part of that work, the data model was implemented on the basis of the open source XML database system eXist, creating a Simple-Link-enhanced XML database prototype. The present work extends the focus from Simple to Extended Links: it includes a formal description of both Simple Link and Extended Link semantics, based on a specification as an abstract data type (ADT), and provides Extended Links with a 3rd Party Link semantics. The basic concepts for query evaluation with respect to 3rd Party Links are also investigated. The algorithms and the logical data model for 3rd Party Links are implemented by further enhancing the eXist-based prototype, providing the query evaluation unit with that semantics. The prototype is tested in a case study that evaluates the prototype's functional behavior and performance.
The case study is followed by a discussion of the proposed 3rd Party Link approach, addressing its applicability in terms of its design, its performance, and its relevance within a rapidly evolving Web infrastructure. The work closes with a conclusion that addresses the previously discussed issues and gives an overview of related research as well as perspectives for further work.
TL;DR: The experimental results show that the scheme not only protects sensitive XML data effectively but also reduces the storage pressure on the server side; the measured response times also indicate that it benefits rapid search and information positioning.
Abstract: In order to protect the sensitive data represented as XML documents in a trusted collaborative system where sensitive data are not shared, an XML privacy-preserving data disclosure decision scheme is proposed under the assumption of a trusted server. The scheme is inspired by the idea of separating storage structure from content. A temporary access matrix represents the structure authorization, and a vector represents the content authorization of the leaf nodes. According to the conversion rules, the access matrix not only represents the access authorization of all nodes but also keeps the main structure of the XML document. Combining the vector and the matrix provides different access views for different user groups with different purposes. In addition, start-end encoding is used to encode all nodes so that the nodes and their content can be located, and a privilege matrix solves the problem of synchronizing privacy changes for all users. At the same time, authentication polynomials are used to verify different users and improve the security level. The experimental results show that the scheme not only protects sensitive XML data effectively but also reduces the storage pressure on the server side; the measured response times also indicate that it benefits rapid search and information positioning.
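The start-end encoding mentioned above can be sketched as a common region (interval) encoding, in which each node receives a pair of numbers from one depth-first traversal and containment of intervals corresponds to the ancestor relationship. The abstract does not give the paper's exact numbering rules, so the counter-based numbering and the function names `start_end_encode` and `contains` below are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

Interval = Tuple[int, int]


def start_end_encode(root: ET.Element) -> Dict[ET.Element, Interval]:
    """Assign (start, end) to every element in one depth-first pass.

    The counter is incremented when a node is entered and again when it is
    left, so the interval of a node encloses those of all its descendants.
    """
    labels: Dict[ET.Element, Interval] = {}
    counter = 0

    def visit(node: ET.Element) -> None:
        nonlocal counter
        counter += 1
        start = counter
        for child in node:
            visit(child)
        counter += 1
        labels[node] = (start, counter)

    visit(root)
    return labels


def contains(outer: Interval, inner: Interval) -> bool:
    """True if `outer` strictly encloses `inner` (ancestor of descendant)."""
    return outer[0] < inner[0] and inner[1] < outer[1]


doc = ET.fromstring("<r><a><b/></a><c/></r>")
labels = start_end_encode(doc)
a, c = doc[0], doc[1]
b = a[0]
```

With such labels, locating all nodes inside a given subtree reduces to a comparison of numbers, which is what makes this kind of encoding convenient for positioning nodes and their content.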
Abstract: A distributed XML document is an XML document that spans several machines. We assume that a distribution design of the document tree is given, consisting of an XML kernel-document T[f1,...,fn] where some leaves are "docking points" for external resources providing XML subtrees (f1,...,fn, standing, e.g., for Web services or peers at remote locations). The top-down design problem consists in, given a type (a schema document that may vary from a DTD to a tree automaton) for the distributed document, "propagating" locally this type into a collection of types, that we call typing, while preserving desirable properties. We also consider the bottom-up design which consists in, given a type for each external resource, exhibiting a global type that is enforced by the local types, again with natural desirable properties. In the article, we lay out the fundamentals of a theory of distributed XML design, analyze problems concerning typing issues in this setting, and study their complexity.