TL;DR: A large-scale analysis of 30 different XML parsers of six different programming languages and an evaluation framework that applies different variants of 17 XML parser attacks to provide a valuable insight into a parser's configuration is conducted.
Abstract: The Extensible Markup Language (XML) has become a widely used data structure for web services, Single Sign-On, and various desktop applications. The core of the entire XML processing is the XML parser. Attacks on XML parsers, such as the Billion Laughs and the XML External Entity (XXE) attack, have been known since 2002. Nevertheless, even experienced companies such as Google and Facebook were recently affected by such vulnerabilities.
In this paper we systematically analyze known attacks on XML parsers and deal with challenges and solutions of them. Moreover, as a result of our in-depth analysis we found three novel attacks.
We conducted a large-scale analysis of 30 different XML parsers of six different programming languages. We created an evaluation framework that applies different variants of 17 XML parser attacks and executed a total of 1459 attack vectors to provide a valuable insight into a parser's configuration. We found vulnerabilities in 66 % of the default configurations of all tested parsers. In addition, we comprehensively inspected parser features to prevent the attacks, show their unexpected side effects, and propose secure configurations.
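One common hardening step against DTD-based attacks such as Billion Laughs and XXE is to refuse any document that carries a DOCTYPE declaration. The following is a minimal sketch of that idea using Python's standard `xml.parsers.expat` module. It is not the evaluation framework from the paper, and the names `DTDForbidden`, `check_no_dtd`, `SAFE`, and `BOMB` are hypothetical, introduced only for illustration.

```python
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Raised when a document carries a DOCTYPE declaration (hypothetical name)."""


def check_no_dtd(xml_bytes: bytes) -> None:
    """Parse xml_bytes with expat and abort at the first DOCTYPE declaration."""
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called when the DOCTYPE is encountered, before entity declarations
        # in the internal subset are processed, so nothing gets expanded.
        raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_bytes, True)  # True marks this chunk as the final one


# A harmless document and a tiny stand-in for an entity-expansion payload.
SAFE = b"<order><item>book</item></order>"
BOMB = (
    b'<?xml version="1.0"?>'
    b'<!DOCTYPE lolz [<!ENTITY a "lol"><!ENTITY b "&a;&a;&a;">]>'
    b"<lolz>&b;</lolz>"
)
```

Calling `check_no_dtd(SAFE)` returns normally, while `check_no_dtd(BOMB)` raises `DTDForbidden`. Rejecting every DTD is a blunt policy: it also blocks legitimate documents that rely on a DTD, which is the kind of side effect the abstract says secure configurations can have.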
TL;DR: A new mapping approach, known as XAncestor, which consists of two algorithms: an XML mapping algorithm (XtoDB) and a query mapping algorithm that translates XPath queries into corresponding SQL queries based on the constructed RDB in order to reduce the query response time.
Abstract: XML has become a common language for data exchange on the Web, so it needs to be managed effectively. There are four central problems in XML data management: capture, storage, retrieval, and exchange. Even though numerous database systems are available, the relational database (RDB) is often used to store and query the content of XML documents. Therefore the processes of mapping from XML to RDB and vice versa occur frequently. Numerous researchers have proposed approaches to map hierarchically structured XML documents into the tabular format of a RDB. However, the previously developed approaches have faced problems in terms of storage and query response time. If the design of a RDB is inefficient, the number of join operations between tables increases when a query is executed, which affects the query response time. To overcome this limitation, this paper proposes a new mapping approach, known as XAncestor, which consists of two algorithms: an XML mapping algorithm (XtoDB) and a query mapping algorithm (XtoSQL). XtoDB maps XML documents to a fixed RDB with less storage space. XtoSQL translates XPath queries into corresponding SQL queries based on the constructed RDB in order to reduce the query response time i.e., the time taken to execute the translated SQL query. XAncestor is then developed as a prototype in order to test its effectiveness. The results of XAncestor are compared with those produced by five similar approaches. The comparison proves that XAncestor performs better than the previously developed approaches in terms of effectiveness and scalability. The correctness of XAncestor is also verified. The paper concludes with some recommendations for further work.
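The general idea of storing XML in a fixed relational schema and answering an XPath query with few joins can be sketched as follows. This is an illustrative path-table design, not the actual XtoDB or XtoSQL algorithm; the table names `path` and `node` and the functions `shred` and `xpath_to_sql` are assumptions made for this example, and only child-axis XPath expressions without predicates are handled.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(xml_text: str) -> sqlite3.Connection:
    """Store every element as (path_id, value); each distinct root-to-node path is stored once."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE path (id INTEGER PRIMARY KEY, expr TEXT UNIQUE)")
    db.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, path_id INTEGER, value TEXT)")

    def visit(elem, parent_path):
        expr = f"{parent_path}/{elem.tag}"
        db.execute("INSERT OR IGNORE INTO path (expr) VALUES (?)", (expr,))
        (path_id,) = db.execute("SELECT id FROM path WHERE expr = ?", (expr,)).fetchone()
        db.execute(
            "INSERT INTO node (path_id, value) VALUES (?, ?)",
            (path_id, (elem.text or "").strip()),
        )
        for child in elem:  # pre-order traversal, so node ids follow document order
            visit(child, expr)

    visit(ET.fromstring(xml_text), "")
    return db


def xpath_to_sql(xpath: str):
    """Translate a child-axis-only XPath such as /lib/book/title into a single-join query."""
    sql = (
        "SELECT node.value FROM node JOIN path ON node.path_id = path.id "
        "WHERE path.expr = ? ORDER BY node.id"
    )
    return sql, (xpath,)


db = shred("<lib><book><title>A</title></book><book><title>B</title></book></lib>")
sql, params = xpath_to_sql("/lib/book/title")
titles = [row[0] for row in db.execute(sql, params)]
```

Because the whole ancestor path is stored as one string, a query of any depth needs a single join between `node` and `path` rather than one join per step, which illustrates why the number of joins matters for query response time.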
TL;DR: Two algorithms that are based on either traditional inverted lists or newly proposed LLists to improve the overall performance and several algorithms based on hash search to simplify the operation of finding CA nodes from all involved LLists are proposed.
Abstract: Efficiently answering XML keyword queries has attracted much research effort in the last decade. The key factors resulting in the inefficiency of existing methods are the common-ancestor-repetition (CAR) and visiting-useless-nodes (VUN) problems. To address the CAR problem, we propose a generic top-down processing strategy to answer a given keyword query w.r.t. LCA/SLCA/ELCA semantics. By “top-down”, we mean that we visit all common ancestor (CA) nodes in a depth-first, left-to-right order; by “generic”, we mean that our method is independent of the query semantics. To address the VUN problem, we propose to use child nodes, rather than descendant nodes, to test the satisfiability of a node $v$ w.r.t. the given semantics. We propose two algorithms that are based on either traditional inverted lists or our newly proposed LLists to improve the overall performance. We further propose several algorithms that are based on hash search to simplify the operation of finding CA nodes from all involved LLists. The experimental results verify the benefits of our methods according to various evaluation metrics.
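The top-down idea for SLCA semantics can be illustrated with a small in-memory sketch: visit only common-ancestor (CA) nodes depth-first and left to right, and decide whether a CA node is an SLCA by looking at its children only. This is not the paper's inverted-list or LList algorithm; the function `slca` and its helpers are hypothetical, and keywords are matched against element text only (tail text and attributes are ignored).

```python
import xml.etree.ElementTree as ET


def slca(root, keywords):
    """Return elements that contain all keywords and have no child that also does."""
    wanted = {k.lower() for k in keywords}
    covered = {}  # element -> subset of wanted keywords found in its subtree

    def collect(elem):
        # Bottom-up pass: record which keywords occur anywhere below elem.
        found = wanted & set((elem.text or "").lower().split())
        for child in elem:
            found |= collect(child)
        covered[elem] = found
        return found

    collect(root)
    results = []

    def descend(elem):
        # Only ever called on CA nodes, in depth-first, left-to-right order.
        ca_children = [c for c in elem if covered[c] == wanted]
        if not ca_children:
            results.append(elem)  # no child is a CA, so elem is an SLCA
        for child in ca_children:
            descend(child)

    if covered[root] == wanted:
        descend(root)
    return results


doc = ET.fromstring(
    "<bib>"
    "<paper><title>xml keyword search</title><author>li</author></paper>"
    "<paper><title>xml parsing</title><author>chen</author></paper>"
    "</bib>"
)
hits = slca(doc, ["xml", "li"])
```

Checking children is sufficient because being a CA is inherited upward: if any descendant covered all keywords, the child containing it would too. Here `hits` holds only the first `paper` element, since it covers both keywords while neither of its children does.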
TL;DR: The proposed technique proved that using xml entities and XSLT transforms is more efficient in terms of coding effort and deployment complexity when compared to mapping the schema using object oriented scripting language such as C#.
Abstract: This paper proposed xml entities based architectural implementation to improve integration between multiple third party vendor software systems with incompatible xml schema. The xml entity architecture implementation showed that the lines of code change required for mapping the schema between in house software and three other vendor schema, decreased by 5.2%, indicating an improvement in quality. The schema mapping development time decreased by 3.8% and overall release time decreased by 5.3%, indicating an improvement in productivity. The proposed technique proved that using xml entities and XSLT transforms is more efficient in terms of coding effort and deployment complexity when compared to mapping the schema using object oriented scripting language such as C#.
TL;DR: A security specification model (SecFHIR) is proposed to support the development of intuitive policy schemes that are mapping directly to the healthcare environment and efficiently simplify the security administration and achieve fine-grained access control.
Abstract: Patients taking medical treatment in distinct healthcare institutions have their information deeply fragmented between very different locations. All this information, probably in different formats, may be used or exchanged to deliver professional healthcare services. As the exchange of information (interoperability) is a key requirement for the success of the healthcare process, various predefined e-health standards have been developed. Such standards are designed to facilitate information interoperability in common formats. Fast Healthcare Interoperability Resources (FHIR) is a new open healthcare data standard that aims to provide electronic healthcare interoperability. FHIR was coined in 2014 to address limitations caused by the ad-hoc implementation and the distributed nature of modern medical care information systems. Patients' data or resources are structured and standardized in FHIR through a highly readable format such as XML or JSON. However, despite the unique features of FHIR, it is not a security protocol, nor does it provide any security-related functionality. In this paper, we propose a security specification model (SecFHIR) to support the development of intuitive policy schemes that map directly to the healthcare environment. The formal semantics for SecFHIR are based on the well-established typing and the platform-independent properties of XML. Specifically, patients' data are modeled in FHIR using XML documents. In our model, we assume that these XML resources are defined by a set of schemes. Since an XML Schema is a well-formed XML document, the permission specification can be easily integrated into the schema itself; the specified permissions are then applied to instance objects without any change. In other words, our security model (SecFHIR) defines permissions at the XML scheme level, which implicitly specifies the permissions on XML resources.
Using these schemes, SecFHIR can combine them to support complex constraints over XML resources. This will result in reusable permissions, which efficiently simplify the security administration and achieve fine-grained access control. We also discuss the core elements of the proposed model, as well as the integration with the FHIR framework.
TL;DR: This paper presents a model driven approach for creating preliminary GUI source code of Android application from windows navigation diagrams using XML and Java files for model transformation to promote the reusability due to the use of UI layout template.
Abstract: In recent years, the growth of smartphone market has led to the increasing development of mobile application. The rapid approach of mobile application development would respond to the market growth. This paper presents a model driven approach for creating preliminary GUI source code of Android application from windows navigation diagrams. The input diagram is converted to XML used as the metadata for model transformation. The final results of XML and Java files will be obtained for each UI layout where the XML file contains UI element information, and the Java file contains the actions. The proposed methodology would promote the reusability due to the use of UI layout template. The automation with model transformation would also ensure the integrity of interfaces generated from the design with windows navigation diagrams.
TL;DR: An SDN labeling algorithm and a distributed hierarchical index using DHTs are introduced and the efficiency and effectiveness of the proposed parallel XML data approach using Hadoop are shown.
Abstract: MapReduce is a widely adopted computing framework for data-intensive applications running on clusters. This paper proposes an approach to exploit data parallelism in XML processing using MapReduce in Hadoop. The authors' solution seamlessly integrates data storage, labeling, indexing, and parallel queries to process a massive amount of XML data. Specifically, the authors introduce an SDN labeling algorithm and a distributed hierarchical index using DHTs. More importantly, an advanced two-phase MapReduce solution is designed that is able to efficiently address the issues of labeling, indexing, and query processing on big XML data. The experimental results show the efficiency and effectiveness of the proposed parallel XML data approach using Hadoop.
TL;DR: A survey on keyword search over XML document mainly focuses on the topics of defining semantics for XML keyword search and the corresponding algorithms to find answers based on these semantics.
Abstract: Since XML has become a standard for information exchange over the Internet, more and more data are represented as XML. XML keyword search has attracted a lot of interest because it provides a simple and user-friendly interface to query XML documents. This paper provides a survey on keyword search over XML documents. We mainly focus on the topics of defining semantics for XML keyword search and the corresponding algorithms to find answers based on these semantics. We classify existing works on XML keyword search into three main types: tree-based approaches, graph-based approaches and semantics-based approaches. For each type of approach, we further classify works into sub-classes; in particular, we summarize them, compare them and point out the relationships among sub-classes. In addition, for each type of approach, we point out the common problems they suffer
TL;DR: This book describes the popular XML and JSON data-interchange languages and teaches how to parse/create XML-based documents and parse JSON-based documents via various Java APIs.
Abstract: This book describes the popular XML and JSON data-interchange languages. You'll explore each language and learn how to parse/create XML-based documents and parse JSON-based documents via various Java APIs. You will also learn how XML and JSON are applied and used in AJAX (and AJAJ), Android, Big Data, and Web Services contexts, all from the Java perspective. Each chapter ends with select exercises designed to challenge your grasp of the chapter's content. An appendix provides the answers to these exercises. A second appendix presents a list of developer questions about XML and JSON along with my answers to these questions. What you'll learn: How to use Java, JSON and XML together to build services, big data. How to use XML: parse XML documents with SAX, DOM, StAX; select nodes with XPath; and transform XML documents with XSLT. What JSON is and how to explore parsing JSON content with Google GSON, Jackson, Quick JSON. How to roll your own JSON APIs. How to use XML and JSON with Ajax, Android, big data and web services. Who this book is for: This book is for intermediate or advanced Java programmers/developers.
TL;DR: An XML file format for storing data from computations in algebra and geometry is described and a formal specification based on a RELAX-NG schema is presented.
Abstract: We describe an XML file format for storing data from computations in algebra and geometry. We also present a formal specification based on a RELAX-NG schema.
TL;DR: This chapter discusses the basic concepts of XML and ontologies, as well as their clinical applications, and the creation of these standardized repositories greatly facilitates clinical research in related fields.
Abstract: The development of information technology has resulted in its penetration into every area of clinical research. Various clinical systems have been developed, which produce increasing volumes of clinical data. However, saving, exchanging, querying, and exploiting these data are challenging issues. The development of Extensible Markup Language (XML) has allowed the generation of flexible information formats to facilitate the electronic sharing of structured data via networks, and it has been used widely for clinical data processing. In particular, XML is very useful in the fields of data standardization, data exchange, and data integration. Moreover, ontologies have been attracting increased attention in various clinical fields in recent years. An ontology is the basic level of a knowledge representation scheme, and various ontology repositories have been developed, such as Gene Ontology and BioPortal. The creation of these standardized repositories greatly facilitates clinical research in related fields. In this chapter, we discuss the basic concepts of XML and ontologies, as well as their clinical applications.
TL;DR: This paper argues that the LCA based techniques still require users to be well versed with the XML schema and also the data to be able to obtain meaningful query results, and presents a novel system, Generic Keyword Search (GKS), which returns ‘meaningful’ information from any XML node, which contains a subset of keywords in the search query Q.
Abstract: XML and JSON have become the default formats to exchange information for web applications or within enterprises. Keyword search over XML data has been motivated by the need to relieve users from writing difficult XQueries, since otherwise users are required to know the complex XML schema. In existing XML keyword search techniques, the XML nodes returned for a keyword query are the Lowest Common Ancestor (LCA) nodes for the query keywords. In this paper, we argue that the LCA-based techniques still require users to be well versed with the XML schema and also the data to be able to obtain meaningful query results. To address these shortcomings, we present a novel system, Generic Keyword Search (GKS). For a given keyword query Q, instead of identifying (and returning information) only from LCA nodes, GKS returns ‘meaningful’ information from any XML node that contains a subset of the keywords in the search query Q. The GKS response includes LCA nodes, if any, that would have been returned by LCA-based techniques. GKS is also able to find highly relevant keywords and XML schema elements, deeper analytical insights called DI in the XML data in the context of the user query. DI enables users to navigate the XML data and to refine their queries even if they are not familiar with the data and the schema. Our experiments on real data sets show that GKS is able to return highly relevant responses to keyword queries efficiently.
TL;DR: This paper identifies the problems of existing keyword search methods and points out that the main reason of these problems is due to the unawareness of the Object-Relationship-Attribute (ORA) semantics in XML/RDB, and proposes an ORA-Semantics based keyword search in XML and RDB.
Abstract: Keyword search in XML and relational databases (RDB) has gained popularity as it provides a user-friendly way to explore structured data. Existing works on XML and RDB keyword search only rely on the structures of XML/RDB data and/or schemas, and this causes serious problems of returning incomplete answers, meaningless answers and overwhelming answers. In this paper, we identify the problems of existing keyword search methods and point out that the main reason for these problems is the unawareness of the Object-Relationship-Attribute (ORA) semantics in XML/RDB. We exploit the ORA semantics in XML and RDB, and capture these semantics by constructing the Object tree for XML, and the Object-Relationship-Mixed (ORM) data graph for RDB, respectively. Based on the Object tree and the ORM data graph, we propose an ORA-Semantics based keyword search in XML and RDB. Our semantic approach can avoid the problems of existing methods and improves the completeness and correctness of keyword search. In addition, we extend the keyword query language to include keywords that match the metadata, i.e., the names of tags in XML and the names of relations and attributes in RDB. These keywords reduce the ambiguities of queries and enable us to infer users' search intention more precisely. Finally, we incorporate aggregate functions and GROUPBY into keyword queries to retrieve statistical information from XML and RDB.
TL;DR: This paper presents the architecture of the querying process for spatiotemporal data, illustrates how to query spatiotemporal data using XQuery, and investigates query result processing by listing three query examples to show the practicality and compatibility of using XQuery.
Abstract: With the rapid development of the Internet, XML has rapidly emerged as the de facto standard for representing and exchanging data on the Web due to its simplicity, readability, and portability. Research on spatiotemporal data based on XML has received much attention, since a considerable amount of data is emerging in spatiotemporal applications from both academia and industry. However, although XML has been employed to model and handle spatiotemporal data, the study of spatiotemporal XML data has only recently started and still merits further attention. In this paper, we study spatiotemporal operations using XQuery. After presenting the architecture of the querying process for spatiotemporal data, we illustrate how to query spatiotemporal data using XQuery. Furthermore, we investigate query result processing by listing three query examples to show the practicality and compatibility of using XQuery.
TL;DR: This paper proposes a methodology to convert a XML schema respecting a DTD (Document Type Definition) into a schema of object-relational model, which is reversible so that the result of conversion can be used to rebuild the initial XML schema.
Abstract: XML is a standard for data exchange between sites and heterogeneous applications. To exploit these data in database systems based on the relational model, conversion algorithms and methods have been developed. To do the same with object-relational systems, which are an extension of relational systems, we propose in this paper a methodology to convert an XML schema respecting a DTD (Document Type Definition) into a schema of the object-relational model. This methodology is reversible, so that the result of the conversion can be used to rebuild the initial XML schema.
TL;DR: S2CX is presented, an approach that allows efficient evaluation of SQL/XML queries on any relational database system, no matter whether it supports SQL/XML or not, and whose approach to query evaluation scales better, i.e., the larger the dataset, the faster the approach is compared to SQL/XML query evaluation in Oracle 11g and in DB2.
TL;DR: It turns out that projection can speed up the evaluation of navigational XPath queries on XML streams by a factor of 4 on average on the usual XPath benchmarks.
Abstract: We present an evaluator for navigational XPath on XML streams with projection. The idea is to project away those parts of an XML stream that are irrelevant for evaluating a given XPath query. This task is relevant for processing XML streams in general since all XML standard languages are based on XPath. The best existing streaming algorithm for navigational XPath queries runs nested word automata. Therefore, we develop a projection algorithm for nested word automata, for the first time to the best of our knowledge. It turns out that projection can speed up the evaluation of navigational XPath queries on XML streams by a factor of 4 on average on the usual XPath benchmarks.
TL;DR: This paper presents TwigStack-MR, which simultaneously processes several twig pattern queries for a massive volume of XML data based on the MapReduce framework, and uses the MapReduce framework, full characteristics of distributed environments, to process twig queries efficiently.
Abstract: Twig pattern query is the core operation of XML process, which directly affects the efficiency of XML data query. It is a challenge to manipulate massive XML data, especially on distributed cluster, such as how to effectively ensure the completeness and correctness of the query results, and minimize communication costs between the various machines. In this paper, we present TwigStack-MR, which simultaneously processes several twig pattern queries for a massive volume of XML data based on MapReduce framework. We first split the large scale XML data file into file-splits as input to the distributed storage system. Then we present the distributed twig algorithm, processing different subtrees of the document tree in parallel. Finally we use the MapReduce framework, full characteristics of distributed environments, to process twig query efficiently. The experimental results show that our approach is efficient and scalable on this issue.
TL;DR: A model and algebra containing the logical structure of a spatiotemporal database, a data type system, and querying operations are proposed, showing that the model and algebra lay a firm foundation for managing spatiotemporal XML data.
Abstract: A formal algebra is essential for applying standard database-style query optimization to XML queries. We propose a spatiotemporal XML data model and develop such an algebra based on Native XML, for manipulating spatiotemporal XML data. After studying NXD spatiotemporal database and query framework, formal representation of spatiotemporal query algebra is investigated, containing logical structure of spatiotemporal database, data type system, and querying operations. It shows that the model and algebra lay a firm foundation for managing spatiotemporal XML data.
TL;DR: XML-based publish/subscribe (pub/sub) systems have been receiving a great deal of attention from the academic community and the industry as mentioned in this paper, however, not much research has considered using the system or the communication model in the context of XML publication messages delivery.
TL;DR: This paper proposes an approach for managing changes to XML namespaces defined in XML Schemas, and their effects on XML documents that are valid to these schemas, while keeping track of all XML schema and XML instance versions.
Abstract: In XML databases, several works have dealt with changes of basic components of XML Schemas: element and attribute declarations, simple types, and complex type definitions. However, there is no work that has dealt with changes to advanced concepts of XML Schemas like XML namespaces, local/global qualified/unqualified declarations, and schema definition styles. In this paper, we deal with XML namespace evolution. To the best of our knowledge, we are the first to study such a topic (and in an environment that supports schema versioning). More precisely, we propose an approach for managing changes to XML namespaces defined in XML Schemas, and their effects on XML documents that are valid to these schemas, while keeping track of all XML schema and XML instance versions.
TL;DR: A temporal extension of the W3C XQuery Update Facility (XUF) language, named tauXUF (Temporal XUF), which allows manipulating temporal XML data in tauXSchema, and in which both the syntax and the semantics of the update expressions of the XUF language are extended to support temporal aspects.
Abstract: Although temporal XML data are being stored and manipulated by several XML-based applications in different domains (e.g., e-commerce, e-health), there is neither a temporal XML update language proposed by researchers nor built-in support provided by existing XML DBMSs and tools, for maintaining such data. Furthermore, in the well known temporal XML framework tauXSchema, there are no features for inserting, deleting or updating temporal XML instances. In this paper, we bridge these gaps by proposing a temporal extension of the W3C XQuery Update Facility (XUF) language, named tauXUF (Temporal XUF), which allows manipulating temporal XML data in tauXSchema. With tauXUF both the syntax and the semantics of the update expressions of the XUF language are extended to support temporal aspects. Examples are also provided to motivate and illustrate our proposal.
TL;DR: A flexible method to enrich and populate an existing OWL ontology from XML data based on RDF rules that allows users to reuse rules for other conversions and populations.
Abstract: The paper presents a flexible method to enrich and populate an existing OWL ontology from XML data based on RDF rules. These rules are defined in order to automatically populate the new version of the OWL ontology. Basic rules are defined to identify elements in XML schemas and an OWL schema. Advanced mapping rules build on the basic rules in order to define the mapping between XML schema elements and OWL schema elements. In addition, this flexible method allows users to reuse rules for other conversions and populations.
TL;DR: This work introduces an SDN labelling algorithm and a distributed hierarchical index using DHTs, and develops an efficient data retrieval approach called B-SLCA, an advanced two-phase MapReduce solution that is able to efficiently address the issues of labelling, indexing, and query processing on big XML data.
Abstract: MapReduce is a widely adopted computing framework for data-intensive applications running on clusters. We propose an approach to exploit data parallelism in XML processing using MapReduce in Hadoop. Our solution seamlessly integrates data storage, labelling, indexing, and parallel queries to process a massive amount of XML data. Specifically, we introduce an SDN labelling algorithm and a distributed hierarchical index using DHTs, and we develop an efficient data retrieval approach called B-SLCA. More importantly, we design an advanced two-phase MapReduce solution that is able to efficiently address the issues of labelling, indexing, and query processing on big XML data. We implemented our solution on a real-world Hadoop cluster processing real-world datasets. Our experimental results show that SDN outperforms NCIM by up to a factor of 1.36 with an average of 1.17, and that our B-SLCA outperforms BwdSLCA by up to a factor of 1.96 with an average of 1.2.
TL;DR: This paper implements the method to identify DO and sibling relationships using EDC and SDC labels for various real-time XML documents, and results show that identifying DO and sibling relationships using SDC labels performs better than using EDC labels for processing XML queries.
Abstract: XML emerged as a de facto standard for data representation and information exchange over the World Wide Web. By utilizing the document object model (DOM), an XML document can be viewed as an XML DOM tree. Nodes of an XML tree are labeled to uniquely identify every node by following a labeling scheme. This paper proposes a method to efficiently identify two structural relationships, namely document order (DO) and the sibling relationship, that exist between XML nodes, using two secure labeling schemes: enhanced Dewey coding (EDC) and secure Dewey coding (SDC). These structural relationships influence the performance of XML queries, so they need to be identified efficiently. This paper implements the method to identify DO and sibling relationships using EDC and SDC labels for various real-time XML documents. Experimental results show that identifying DO and sibling relationships using SDC labels performs better than using EDC labels for processing XML queries.
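How a prefix-based label lets both relationships be decided from the labels alone can be shown with plain Dewey labels. The sketch below deliberately uses ordinary Dewey coding, not EDC or SDC, whose encodings are not described here; the function names `dewey_labels`, `precedes`, and `are_siblings` are hypothetical.

```python
import xml.etree.ElementTree as ET


def dewey_labels(root):
    """Assign plain Dewey labels: the root is (1,), its k-th child is (1, k), and so on."""
    labels = {}

    def visit(elem, label):
        labels[elem] = label
        for position, child in enumerate(elem, start=1):
            visit(child, label + (position,))

    visit(root, (1,))
    return labels


def precedes(a, b):
    """True if label a comes before label b in document order."""
    # Tuple comparison is lexicographic and a prefix sorts before its
    # extensions, so ancestors precede their descendants.
    return a < b


def are_siblings(a, b):
    """True if two distinct labels share the same parent label."""
    return a != b and len(a) == len(b) and len(a) > 1 and a[:-1] == b[:-1]


root = ET.fromstring("<a><b><d/></b><c/></a>")
lab = dewey_labels(root)
b, c = root[0], root[1]
d = b[0]
```

In this example `d` is labeled `(1, 1, 1)` and `c` is labeled `(1, 2)`, so `precedes(lab[d], lab[c])` holds, and `are_siblings(lab[b], lab[c])` holds because both labels extend the parent label `(1,)` by one component. Neither check needs to touch the tree once the labels exist.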
TL;DR: The workflow implemented to convert a dictionary saved as a PDF file into an XML document and subsequently import it into an XML-aware database, and the process to edit, add and delete entries, are described.
Abstract: In this article we describe the workflow implemented to convert a dictionary saved as a PDF file into an XML document and subsequently import it into an XML-aware database, and the process to edit, add and delete entries. The conversion process was challenging given the format of the PDF file and the fine-grained detail of the XML schema that was used. For that, an iterative filtering approach was used. To store the dictionary we decided to use an XML-aware database (eXist-DB) that stores each dictionary entry as a separate resource. It can be queried using a web interface developed using XQuery. The lexicographers can edit entries using the oXygen XML editor, reading and storing them directly in the database. In order to guarantee incremental backups, a mechanism was defined to import the XML database into a Git repository. Finally, a couple of programs were created in order to prepare regular reports on the dictionary revision process, as well as to back it up in a Git repository.
TL;DR: This work proposes two new approaches for XML data compression and compares their solutions with three algorithms: WAP Binary Extensible Markup Language (WBXML), Xmill and Efficient XML Interchange (EXI).
Abstract: Integration of information systems is essential to organizations. Therefore, it is necessary to make different technologies interoperate. Extensible Markup Language (XML) is often used for data exchange because it is self-descriptive and platform-independent. However, XML is a verbose language which may bring problems related to the size of documents. This work proposes two new approaches for XML data compression and compares our solutions with three algorithms: WAP Binary Extensible Markup Language (WBXML), Xmill and Efficient XML Interchange (EXI). The comparison is based on compression rate and compression time for files with different sizes.
TL;DR: The paper advocates against a direct approach based on XQuery, and proposes a more powerful strategy that first extracts a structured representation of music notation from score encodings, and then manipulates this representation in closed form with dedicated operators.
Abstract: The paper addresses issues related to the design of query languages for searching and restructuring collections of XML-encoded music scores. We advocate against a direct approach based on XQuery, and propose a more powerful strategy that first extracts a structured representation of music notation from score encodings, and then manipulates this representation in closed form with dedicated operators. The paper exposes the content model, the resulting language, and describes our implementation on top of a large Digital Score Library (DSL).