Top 25 papers published on the topic of Streaming XML in 2022

Showing papers on "Streaming XML" published in 2022

Proceedings Article•10.4242/balisagevol27.kay01•

XSLT Extensions for JSON Processing

30 Jul 2022

TL;DR: XSLT 3.0 contains basic facilities for transforming JSON as well as XML, but actual use cases show that some things are a lot harder than they need to be.

Abstract: XSLT 3.0 contains basic facilities for transforming JSON as well as XML. But looking at actual use cases, it’s clear that some things are a lot harder than they need to be. How could we extend XSLT to make JSON transformations as easy as XML transformations, using the same rule-based tree-walking paradigm? Some of these extensions are already implemented in current Saxon releases, so we are starting to get user feedback.

2 citations

Proceedings Article•10.1117/12.2644557•

The data detection platform based on CIM/XML power grid model standard

Shan-Guo Li, Wei Zhang, Yanke Dong, Zhidu Huang

10 Nov 2022

TL;DR: In this article, a method of analyzing and detecting power grid model files compiled according to the CIM/XML model standard is studied, using the structure of Resource Description Framework (RDF) data.

Abstract: The Common Information Model (CIM) is an important part of the IEC61970/IEC61968 series of standards and of grid data standards. In this paper, a method of analyzing and detecting power grid model files compiled according to the CIM/XML model standard is studied. Data mining of CIM files is carried out using the structure of Resource Description Framework (RDF) data. Based on the XML files, the CIM files are completely expressed and the CIM/XML model file is parsed. According to the IEC61970/IEC61968 standard, the input CIM model data streams are checked for whether they conform to the grid standard and interoperate smoothly.

2 citations

Journal Article•10.34028/iajit/19/4/2•

XAPP: An Implementation of SAX-Based Method for Mapping XML Document to and from a Relational Database

Yetunde Akinwumi, Joshua Ayobami Ayeni, S. A. Arekete, Mba Obasi Odim, Adewale Opeoluwa Ogunde, Bosede Oyenike Oguntunde

01 Jan 2022-The International Arab Journal of Information Technology

TL;DR: In this research, a new lightweight application that adopts a novel model mapping approach was developed using Simple API for XML (SAX) parser and proves to perform significantly better than the DOM-based algorithm in terms of mapping and reconstruction time, and memory efficiency.

Abstract: Extensible Markup Language (XML) is the standard medium for data exchange among businesses over the Internet, hence the need for effective management. However, since XML was not designed for storage and retrieval, its management has become an open research area in the database community. Existing mapping techniques for XML-to-relational database adopt either the structural mapping or the model mapping. Though numerous mapping approaches have been developed, mapping and reconstruction time had been problematic, especially when the document size is large and can hardly fit into main memory. In this research, an application codenamed XAPP, a new lightweight application that adopts a novel model mapping approach was developed using Simple API for XML (SAX) parser. XAPP accepts a document with or without Document Type Definition (DTD). It implements two algorithms: one maps XML data to a relational database and improves mapping time, and the other reconstructs an XML document from a relational database to improve reconstruction time and minimise memory usage. The performance of XAPP was analysed and compared with the Document Object Model (DOM) algorithm. XAPP proves to perform significantly better than the DOM-based algorithm in terms of mapping and reconstruction time, and memory efficiency. The correctness of XAPP was also verified.
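
To make the SAX-based idea concrete, the following is a minimal Python sketch of streaming an XML document into a relational table. It is not XAPP's algorithm, which the abstract does not specify; the single `node` table, its columns, and the names `NodeLoader` and `load_xml` are assumptions made for illustration. The point it shows is that a SAX handler only keeps the stack of currently open elements in memory, rather than the whole tree as a DOM parser would.

```python
import sqlite3
import xml.sax


class NodeLoader(xml.sax.ContentHandler):
    """Streams SAX events into a 'node' table (hypothetical schema)."""

    def __init__(self, conn):
        super().__init__()
        self.conn = conn
        self.stack = []    # ids of the elements that are currently open
        self.text = []     # text fragments collected for each open element
        self.next_id = 1

    def startElement(self, name, attrs):
        node_id = self.next_id
        self.next_id += 1
        parent_id = self.stack[-1] if self.stack else None
        self.conn.execute(
            "INSERT INTO node (id, parent_id, tag, text) VALUES (?, ?, ?, '')",
            (node_id, parent_id, name),
        )
        self.stack.append(node_id)
        self.text.append([])

    def characters(self, content):
        # Text may arrive in several chunks; attach each to the innermost element.
        if self.text:
            self.text[-1].append(content)

    def endElement(self, name):
        node_id = self.stack.pop()
        text = "".join(self.text.pop()).strip()
        self.conn.execute("UPDATE node SET text = ? WHERE id = ?", (text, node_id))


def load_xml(xml_bytes):
    """Parse xml_bytes with SAX and return an in-memory SQLite connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, tag TEXT, text TEXT)"
    )
    xml.sax.parseString(xml_bytes, NodeLoader(conn))
    return conn
```

Calling `load_xml(b"<a><b>hi</b><c/></a>")` yields one row per element, with `parent_id` recording the tree structure so that the document could later be reconstructed by walking the table.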

1 citation

Journal Article•10.14716/ijtech.v13i5.5871•

Performance Evaluation of XML Dynamic Labeling Schemes on Relational Database

Su-Cheng Haw, Aisyah Amin, Palanichamy Naveen, Kok-Why Ng

19 Oct 2022-International Journal of Technology: IJ Tech

TL;DR: The dynamic labeling schemes such as ORDPath, ME Labeling, and ORD-GAP are reviewed and the XML annotated labeling schemes are transformed into RDB storage to determine which labeling scheme is more robust and efficient to support storage and query retrieval.

Abstract: eXtensible Markup Language (XML), in its semi-structured format, has been employed for data exchange over the Internet due to its expressivity, flexibility, and capability to accommodate both structured and unstructured data. Due to the vast amount of data being transacted and updated frequently, it is essential to have a solution that can efficiently store and query the data. Hence, a robust and persistent labeling scheme that avoids the need to relabel the entire document is desirable. Relational Database (RDB) technology emerged in the 1970s and has been widely used as back-end storage in most industries. Since XML and RDB are in different formats, an efficient mapping technique is required. Several labeling and mapping schemes have been proposed, yet there is no comparison of the performance of these schemes implemented in RDB storage. In this paper, we first review dynamic labeling schemes such as ORDPath, ME Labeling, and ORD-GAP in addressing these two needs. Second, the XML annotated labeling schemes are transformed into RDB storage. Finally, performance evaluations are carried out to determine which labeling scheme is more robust and efficient in supporting storage and query retrieval.

1 citation

Proceedings Article•10.4242/balisagevol27.udeshani01•

Getting Useful XML out of Microsoft Excel

A. Riechokainen

30 Jul 2022

TL;DR: In this article, a solution that transforms a Microsoft Excel Open XML Spreadsheet (XLSX) file into a shallow-structured XML file used at Typefi, called Content XML (CXML), using XSLT and XProc is presented.

Abstract: This paper presents a solution that transforms a Microsoft Excel Open XML Spreadsheet (XLSX) file into a shallow-structured XML file used at Typefi, called Content XML (CXML), using XSLT and XProc. This solution has three main research areas: (1) the XProc pipeline to read the Excel file content; (2) transforming Excel tables to a CALS table using XSLT functions; (3) transforming Excel charts and embedded images. Significant information is read from the chart.xml and converted to a Scalable Vector Graphics (SVG) file, and then referenced as an image in the output XML. The XLSX file can contain various elements such as tables, charts, and graphics. This solution does not yet use all the features available within the Excel XML but is a work in progress, and future improvements will be guided by customer requests.

1 citation

Proceedings Article•10.1109/icnisc57059.2022.00092•

A Parallel XML Parsing Algorithm Based on NEM-XML

Mohammad Sabuj¹

Suzhou Industrial Park Institute of Services Outsourcing¹

1 Sep 2022

TL;DR: In this article, a parallel approach to XML parsing based on NEM-XML is presented, and the experimental results show that the parallel XML parsing algorithm improves XML parsing performance significantly and scales well.

Abstract: As the de facto data representation and data exchange standard, XML has become very popular over the Internet. Improving XML parsing performance is the key to promoting its further development and application. Parallel computing is a key technology for solving problems that involve huge amounts of computation. This paper presents a parallel approach to XML parsing based on NEM-XML. The experimental results show that our parallel XML parsing algorithm improves XML parsing performance significantly and scales well.

1 citation

Dissertation•10.53846/goediss-2472•

Evaluation of Queries on Linked Distributed XML Data

Bengkel Mecca Medina

20 Feb 2022

TL;DR: In this paper, an XLink extension "dbxlink" is proposed, which allows for modeling interlinked XML instances as integrated views where XLinks are resolved in a transparent way.

Abstract: XML (eXtensible Markup Language) is the de-facto standard for exchanging information and for representing data in the World Wide Web. In contrast to the document-centric perspective given by the well-known language HTML which defines the human-readable content and the layout of web pages, XML offers more flexibility and expressiveness. XML documents are not required to be self-contained but may rather have links to other XML resources. For expressing such links between XML documents, the W3C (World Wide Web Consortium) proposed XLink - but mainly for browsing purposes. If the linked documents are considered from the data-centric viewpoint, it shows that XLink does not specify how the referenced instances should be handled. Especially, it is not possible to query along links though the W3C XML Query (XQuery) Requirements explicitly state that this has to be guaranteed. In order to cope with these issues, an XLink extension "dbxlink" has been proposed. It allows for modeling interlinked XML instances as integrated views where XLinks are resolved in a transparent way. In particular, it is possible to query these instances with XPath and XQuery. In this work, the dbxlink model is described and it is investigated how to query distributed XML instances interlinked with a simple kind of XLinks according to this approach. Different strategies are analyzed and emerging problems like the handling of cyclic instances are treated. It is shown how to extend XPath-based query systems in order to be able to handle queries wrt. dbxlink. Furthermore, optimizing techniques like special caching strategies are proposed. The results of these investigations have been used to conduct a proof-of-concept implementation of the dbxlink approach as an extension to the open source XML database system eXist.

1 citation

Journal Article•10.1109/access.2022.3178438•

An Efficient Prefix-Based Labeling Scheme for XML Dynamic Updates Using Hexagonal Pattern

01 Jan 2022-IEEE Access

TL;DR: In this article, the authors propose an efficient prefix-based labeling scheme that uses a hexagonal pattern, which avoids the need for node relabeling when XML documents are updated at random locations, avoids duplicated labels by creating a new label for every inserted node, and reduces the size and time costs of the updated labels.

Abstract: To improve XML query processing, it is necessary to label XML documents efficiently for the indexing process because it allows the structural relationships between the XML nodes to be preserved without having to access the original document. However, XML data on the Web is updated as time passes, which means that the dynamic updating of XML data is an issue that may need to be handled by an XML labeling scheme specifically designed for dynamic updates. Previous XML labeling schemes have limitations when updates take place. For example, a lot of node labels need to be relabeled, a lot of duplicate labels occur during this relabeling process, and the size and time costs of the updated labels are high. Therefore, this paper proposes an efficient prefix-based labeling scheme that uses a hexagonal pattern. The proposed labeling scheme has three main advantages: (i) it avoids the need for node relabeling when XML documents are updated at random locations, (ii) it avoids duplicated labels by creating a new label for every inserted node, and (iii) it reduces the size and time costs of the updated labels. The proposed scheme is evaluated against the three most recent prefix-based labeling schemes in terms of the size and time costs of the updated labels. In addition, the ability of the proposed labeling scheme to handle several updates (such as insertions) in XML documents is also evaluated. The evaluations show that the proposed labeling scheme outperforms previously developed prefix-based labeling schemes in terms of both size and time costs, particularly for large-scale XML datasets, resulting in improved query processing performance. Moreover, the proposed scheme efficiently supports frequent updates at arbitrary positions. The paper concludes with several suggestions for further research.
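
As background for what "prefix-based" means, here is a minimal Python sketch of the classic Dewey-style prefix labeling, not the hexagonal scheme proposed in the paper (whose details the abstract does not give). Each node's label is its parent's label plus its sibling position, so an ancestor test becomes a prefix comparison that never touches the document. The function names `dewey_labels` and `is_ancestor` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def dewey_labels(root):
    """Return (label, tag) pairs in document order; a label is a tuple such as (1, 2)."""
    out = []

    def walk(elem, label):
        out.append((label, elem.tag))
        for position, child in enumerate(elem, start=1):
            # A child's label extends the parent's label with its sibling position.
            walk(child, label + (position,))

    walk(root, (1,))
    return out


def is_ancestor(a, b):
    """True when label a is a proper prefix of label b."""
    return len(a) < len(b) and b[:len(a)] == a
```

With plain Dewey labels, inserting a new node before existing siblings forces those siblings and their descendants to be relabeled, which is the kind of cost that dynamic labeling schemes aim to avoid.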

1 citation

Journal Article•10.1088/1742-6596/2384/1/012028•

A feature extraction method for XML documents based on PCA

Zhiwen Yu, Ruifeng Zhao, Kaiwen Zeng, Wenjie Zheng, Hao Liu, Ting Yang

01 Dec 2022-Journal of Physics: Conference Series

TL;DR: In this article, the authors propose two methods, vectorization representation of XML documents and further feature extraction; the experimental results show that all-path feature extraction can represent the main features of an XML document effectively, which supports efficient handling of power grid XML documents.

Abstract: XML documents store the information of the new power system source-load interaction. XML has the characteristics of self-description, extensibility, structure, and content, which make it widely used. Improving the method of extracting elements from XML documents is very helpful in solving the problem of distributed object operation measurement in the power grid. To classify or analyze XML documents better, based on the theoretical analysis of principal component analysis and the study of the text representation model, this paper proposes two methods: vectorization representation of XML documents and further feature extraction. The experimental results show that the method of all-path feature extraction for XML documents can represent the main features of an XML document effectively, and is important groundwork for later handling power grid XML documents efficiently.

1 citation

Proceedings Article•10.4242/balisagevol27.lenz01•

XML in an AsciiDoc World: SaxonJS to the Rescue

Zheng Qiao

30 Jul 2022

TL;DR: Antora is a static site generator for software documentation that runs on Node.js and uses AsciiDoc for its source content, but it does not natively handle complex content generation needs.

Abstract: Static website generation has long been an effective use case for XML and XSLT. Today, static site generators remain popular, but they rarely use XML. Antora is a static site generator for software documentation. It runs on Node.js and uses AsciiDoc for its source content. It has desirable features including git integration, site versioning, and pluggable modern UI bundles. However, Antora doesn't natively handle complex content generation needs. Now, thanks to SaxonJS and Antora's new extension mechanism, we can weave in the power of XML and XSLT. The docca project generates reference documentation for Boost C++ libraries via an Antora extension that invokes SaxonJS, seamlessly integrating auto-generated and manually-authored content into the result. This presentation introduces key project components (Doxygen, Antora, AsciiDoc, and SaxonJS running on Node.js) and includes sample code, a demo, and a brief discussion of other ways XML and SaxonJS might complement AsciiDoc and Antora.

1 citation

Journal Article•10.25079/ukhjse.v6n1y2022.pp33-41•

XML Schema Validation Using Java API for XML Processing

Shene Jalil Jamal, Chnoor Meheadeen Rahman, Mzhda Sabir Abdulkarim

30 Jun 2022

TL;DR: This study focuses on constructing a separate XML document validator and validating XML documents against the defined XSD rules; it also notes the critical differences between XSD and DTD.

Abstract: Extensible Markup Language (XML) is a markup language that is developed to organize the structure of information in a text file. The data in XML formatted documents are represented by specifying a number of tags and determining the structural relationship between those tags. It has a simple structure and can be handled by any text editor. Therefore, XML formatted data is commonly used to transfer and share data between different applications and organizations without having to convert the format of the data (Yang, 2019). In the XML world, “well-formed” and “valid” are the two most frequently used terms. A well-formed XML document is free from errors that can cause the document to not parse, such as spelling, punctuation, grammar, and syntax errors. In addition to having well-formed markup, a valid XML document must conform to a document type definition; this means the document must be semantically correct and match a described standard of schemas and relationships (Appel, 2020). There are two standards of document type definition that can be used to validate an XML document: one is DTD or Document Type Definition, which is used to identify the legal structure and name the legal elements of an XML document (Dykes and Tittel, 2011), and the other is XSD or XML Schema Definition. XSD is a diagrammatic representation that defines the valid structure of an XML document; it enables specifying the building blocks of an XML data set such as elements and attributes and their data types, number of child elements, and fixed and default values of the elements and attributes that can appear in the documents (XML Schema Tutorial, 2020). In some applications the process of validating XML documents is combined with parsing the document. However, in some other cases the processes of parsing and validating XML documents need to be separated. This study focuses on constructing a separate XML document validator and validating XML documents against the defined XSD rules. A Java program is used to perform this experiment. Furthermore, the critical differences between XSD and DTD are also mentioned.
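
The distinction between "well-formed" and "valid" can be illustrated with a minimal Python sketch. Note the assumptions: the study itself uses a Java program with the Java API for XML Processing, whereas this sketch uses Python's standard library, which can check well-formedness but does not include an XSD validator. The function name `is_well_formed` is hypothetical, and the check says nothing about whether a document conforms to a schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML; this does not test validity."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # Raised for syntax problems such as mismatched or unclosed tags.
        return False
    return True
```

For instance, `is_well_formed("<a><b/></a>")` succeeds, while `is_well_formed("<a><b></a>")` fails because `<b>` is never closed. A document that passes this check could still be invalid against a given XSD or DTD, which is why validation is treated as a separate step.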

Journal Article•10.32014/2022.2518-1726.132•

Modern methods of processing xml data in relational and temporary xml databases

29 Jun 2022-Izvestiâ Nacionalʹnoj akademii nauk Respubliki Kazahstan

Journal Article•10.15849/ijasca.221128.07•

Framework to Mine XML Format Event Logs

Ang Sheng, Jastini Mohd Jamil, Izwan Nizal Mohd Shaharanee

28 Nov 2022-International journal of advances in soft computing and its applications

TL;DR: In this article, a framework is proposed that flattens and converts tree-structured data into structured data while maintaining the architecture and composition information of the XML format, in order to gain more information from event logs.

Abstract: Many applications, including event logs and web pages, use the XML format for utilizing, keeping, transferring and displaying data. Thus, the volume of data expressed in XML has increased rapidly. Much research has been done to extract and mine information from XML documents. Mining XML documents allows an understanding of the architecture and composition of XML documents. Generally, frequent subtree mining is one of the methods used to mine XML documents. Frequent subtree mining searches for relations between data in a tree-structured database. Due to the architecture and composition of the XML format, normal data mining and statistical analysis are difficult to perform. This paper suggests a framework that flattens and converts tree-structured data into structured data, while maintaining the information about the architecture and composition of the XML format. To gain more information from event logs, converting from the semi-structured format into structured data makes it possible to perform a variety of data mining techniques and statistical tests. Keywords: Flatten Sequential Structure Model, XML Format Event Logs, Data Mining, Statistical Analysis.
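
A rough idea of what "flattening" tree-structured XML into rows can look like is sketched below in Python. This is not the paper's Flatten Sequential Structure Model, which the abstract does not define; it is a simple path/value flattening, and the function name `flatten` and the `@` notation for attributes are assumptions. Keeping the full path in each row is one way to retain structural information after the tree is gone.

```python
import xml.etree.ElementTree as ET


def flatten(xml_text):
    """Return (path, value) rows for every attribute and every leaf element."""
    rows = []

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        for name, value in elem.attrib.items():
            rows.append((f"{here}/@{name}", value))
        children = list(elem)
        if not children:
            # Leaf element: record its text content (empty string if none).
            rows.append((here, (elem.text or "").strip()))
        for child in children:
            walk(child, here)

    walk(ET.fromstring(xml_text), "")
    return rows
```

The resulting rows can be loaded into a table or data frame for ordinary statistical analysis. This sketch does not distinguish repeated siblings with the same tag; a fuller version would need a position index in the path.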

Dissertation•10.53846/goediss-3620•

Modeling and Querying of Distributed XML Data in Presence of 3rd Party Links

20 Feb 2022

TL;DR: In this article, the authors present a formal description of Simple Link and Extended Link semantics, based on a specification as an abstract data type (ADT), and provide Extended Links with a 3rd Party Link semantics.

Abstract: XML (short for eXtensible Markup Language) is a meta-language for the representation of digital data. XML has had an enormous impact on modern computer science and IT industry since its advent in 1997, for several reasons: XML is simple and easily accessible. Using Unicode as encoding, XML can be viewed and authored/edited with common text editors, and due to the context-free and well-formed structure of XML document types, it is easy to provide efficient parsers for processing XML documents. Also, XML's concept of definable document types enables a structured representation of almost arbitrary digital data, with the document type modeling the domain of the data, which makes XML a very powerful and flexible standard for data representation, particularly regarding the Web. The XLink standard is an extension to XML for defining references between XML documents, inspired by the hyperlink concept from hypertext. XLink defines two types of links: Simple Links are unidirectional links from one document to another, similar to HTML hyperlinks. Extended Links create graph-based relationships (arcs) between portions of XML (resources) over multiple XML documents. Within the LinXIS project, models and query evaluation for XLink have been investigated: in a logical data model, a Simple Link is given the semantics of an embedded view that "imports" the referenced data from a remote document into the link-defining document. The participating XML data, together with the Simple Links define a virtual instance (a single-document view on the distributed data) according to the logical data model. Extended Links define relations between XML resources, but in contrast to Simple Links, they are not defined inside the participating resources but apart from them.
This allows defining a semantics for Extended Links, with an Extended Link defining views that combine and extend the participating resources from a 3rd party perspective, without need for write access to them, and thus extending the Simple Links logical data model. The above described logical data model provides a semantics for the evaluation of XPath queries over distributed XML data: A query may be evaluated not on a (physical) XML document, but on the virtual instance defined by the given Simple and Extended Links. The query evaluation may "follow" along a Simple Link, continuing the evaluation process on the referenced, physically remote data. For Extended Links, queries can be evaluated on the integrated view combining the sources referenced by an Extended Link, based on the 3rd party semantics of the link. A previous PhD thesis, which also emerged from the LinXIS project, introduced the data model for Simple Links and investigated techniques and algorithms for XPath query evaluation on the linked XML data. As part of the work, the data model was implemented on the basis of the Open Source XML database system eXist, thus creating a Simple-Link-enhanced XML database prototype. The present work extends the focus from Simple to Extended Links: The work includes a formal description of both Simple Link and Extended Link semantics, based on a specification as an abstract data type (ADT), and providing Extended Links with a 3rd Party Link semantics. Also, the basic concepts for query evaluation with respect to 3rd Party Links are investigated. The algorithms as well as the logical data model for 3rd Party Links are implemented by further enhancement of the eXist-based prototype, providing the query evaluation unit with that semantics. The prototype is tested within a case study, evaluating the prototype's functional behavior and performance.
The case study is followed by a discussion of the proposed 3rd Party Link approach, addressing its applicability in terms of its design, performance and its relevance within a rapidly evolving Web infrastructure. The work is completed by a conclusion addressing the previously discussed issues, and giving an overview over related research as well as over perspectives and further work.

Journal Article•10.1063/5.0092006•

An algorithm for automated transformation of the information from relational databases to JSON and XML

Nikolay Nikolov

01 Jan 2022-Nucleation and Atmospheric Aerosols

TL;DR: In this paper, the authors describe a universal solution for automated transformation from relational databases into semi-structured data (JSON or XML): a tool that can transform data retrieved from a relational database with an arbitrary SQL query into JSON or XML.

Abstract: The paper describes the authors' own development of a universal solution for automated transformation from relational databases into semi-structured data (JSON or XML). The aim is to create a tool that can transform data retrieved from a relational database with an arbitrary SQL query into semi-structured data (JSON or XML). This tool must be able not only to “pack” data into the required output format, but also to recognize the hidden “hierarchical nature” of the “flat” relational data and automatically transform this kind of data into JSON or XML format, preserving the existing relationships within the data. This transformation must be done without (or with very little) intervention from the user.
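
The gap between "packing" flat rows and recovering their hierarchy can be shown with a small Python sketch. It is not the paper's algorithm: here the caller names the parent and child columns explicitly, whereas the tool described in the abstract aims to detect the hierarchy automatically. The tables `author` and `paper` and the functions `nest_rows` and `demo` are hypothetical.

```python
import json
import sqlite3


def nest_rows(rows, parent_key, child_key):
    """Group flat join rows into one object per parent with a list of children."""
    parents = {}
    for row in rows:
        parent = parents.setdefault(
            row[parent_key], {parent_key: row[parent_key], child_key: []}
        )
        # A LEFT JOIN yields NULL for parents without children; skip those.
        if row[child_key] is not None:
            parent[child_key].append(row[child_key])
    return list(parents.values())


def demo():
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # allows access to columns by name
    conn.executescript("""
        CREATE TABLE author (name TEXT);
        CREATE TABLE paper (author TEXT, title TEXT);
        INSERT INTO author VALUES ('Ann'), ('Bob');
        INSERT INTO paper VALUES ('Ann', 'P1'), ('Ann', 'P2');
    """)
    rows = conn.execute(
        "SELECT a.name AS author, p.title AS title "
        "FROM author a LEFT JOIN paper p ON p.author = a.name "
        "ORDER BY a.name, p.title"
    ).fetchall()
    return json.dumps(nest_rows(rows, "author", "title"))
```

The join returns three flat rows (two for Ann, one for Bob with a NULL title), and `nest_rows` folds them into two JSON objects, so the one-to-many relationship appears as nesting instead of repeated parent values.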

Book Chapter•10.1007/978-3-031-16947-2_3•

Designing XML Schema Inference Algorithm for Intra-enterprise Use

Dmitry Uraev, Eduard Babkin

1 Jan 2022

Repository•10.48550/arxiv.1501.02033•

XQOWL: An Extension of XQuery for OWL Querying and Reasoning

8 Mar 2022

Abstract: One of the main aims of the so-called Web of Data is to be able to handle heterogeneous resources where data can be expressed in either XML or RDF. The design of programming languages able to handle both XML and RDF data is a key target in this context. In this paper we present a framework called XQOWL that makes it possible to handle XML and RDF/OWL data with XQuery. XQOWL can be considered as an extension of the XQuery language that connects XQuery with SPARQL and OWL reasoners. XQOWL embeds SPARQL queries (via Jena SPARQL engine) in XQuery and enables calls to OWL reasoners (HermiT, Pellet and FaCT++) from XQuery. It permits combining queries against XML and RDF/OWL resources as well as reasoning with RDF/OWL data. Therefore input data can be either XML or RDF/OWL and output data can be formatted in XML (also using RDF/OWL XML serialization).

Repository•10.48550/arxiv.1007.2671•

XML Reconstruction View Selection in XML Databases: Complexity Analysis and Approximation Scheme

Bin Fu

14 Mar 2022

Abstract: Query evaluation in an XML database requires reconstructing XML subtrees rooted at nodes found by an XML query. Since XML subtree reconstruction can be expensive, one approach to improve query response time is to use reconstruction views - materialized XML subtrees of an XML document, whose nodes are frequently accessed by XML queries. For this approach to be efficient, the principal requirement is a framework for view selection. In this work, we are the first to formalize and study the problem of XML reconstruction view selection. The input is a tree $T$, in which every node $i$ has a size $c_i$ and profit $p_i$, and the size limitation $C$. The target is to find a subset of subtrees rooted at nodes $i_1,\cdots, i_k$ respectively such that $c_{i_1}+\cdots +c_{i_k}\le C$, and $p_{i_1}+\cdots +p_{i_k}$ is maximal. Furthermore, there is no overlap between any two subtrees selected in the solution. We prove that this problem is NP-hard and present a fully polynomial-time approximation scheme (FPTAS) as a solution.
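
The selection problem stated in the abstract can be made concrete with a brute-force Python sketch. This is emphatically not the paper's FPTAS: it enumerates every subset of nodes and therefore takes exponential time, which is only workable for tiny trees. The representation (a `parent` dictionary mapping each node to its parent, plus `size` and `profit` dictionaries) and the name `best_views` are assumptions for illustration. Two subtrees overlap exactly when one root is an ancestor of the other, so the sketch rejects any such pair.

```python
from itertools import combinations


def best_views(parent, size, profit, capacity):
    """Exhaustively pick non-overlapping subtree roots with total size <= capacity.

    parent[i] is the parent of node i, or None for the root.
    Returns (best_profit, chosen_roots).
    """
    def ancestors(i):
        while parent[i] is not None:
            i = parent[i]
            yield i

    nodes = list(parent)
    best = (0, ())
    for k in range(1, len(nodes) + 1):
        for chosen in combinations(nodes, k):
            picked = set(chosen)
            if any(a in picked for i in chosen for a in ancestors(i)):
                continue  # one chosen subtree would contain another
            if sum(size[i] for i in chosen) > capacity:
                continue  # violates the size limit C
            total = sum(profit[i] for i in chosen)
            if total > best[0]:
                best = (total, chosen)
    return best
```

For a root with two children where the root's subtree is too large for the limit, the sketch falls back to selecting both children, which matches the intuition that materializing several small, frequently accessed subtrees can beat one large one.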

Journal Article•10.48550/arxiv.cs/0011041•

EquiX---A Search and Query Language for XML

Sara Cohen, Yaron Kanza, Werner Nutt, Yehoshua Sagiv, Alexander Serebrenik

19 Mar 2022

Abstract: EquiX is a search language for XML that combines the power of querying with the simplicity of searching. Requirements for such languages are discussed and it is shown that EquiX meets the necessary criteria. Both a graphical abstract syntax and a formal concrete syntax are presented for EquiX queries. In addition, the semantics is defined and an evaluation algorithm is presented. The evaluation algorithm is polynomial under combined complexity. EquiX combines pattern matching, quantification and logical expressions to query both the data and meta-data of XML documents. The result of a query in EquiX is a set of XML documents. A DTD describing the result documents is derived automatically from the query.

Repository•10.48550/arxiv.1012.2648•

Distributed XML Design

13 Mar 2022

Abstract: A distributed XML document is an XML document that spans several machines. We assume that a distribution design of the document tree is given, consisting of an XML kernel-document T[f1,...,fn] where some leaves are "docking points" for external resources providing XML subtrees (f1,...,fn, standing, e.g., for Web services or peers at remote locations). The top-down design problem consists in, given a type (a schema document that may vary from a DTD to a tree automaton) for the distributed document, "propagating" locally this type into a collection of types, that we call typing, while preserving desirable properties. We also consider the bottom-up design which consists in, given a type for each external resource, exhibiting a global type that is enforced by the local types, again with natural desirable properties. In the article, we lay out the fundamentals of a theory of distributed XML design, analyze problems concerning typing issues in this setting, and study their complexity.

Repository•10.48550/arxiv.2102.02246•

The Forgotten Document-Oriented Database Management Systems: An Overview and Benchmark of Native XML DODBMSes in Comparison with JSON DODBMSes

Ciprian-Octavian Truica, Elena-Simona Apostol, Jérôme Darmont, Torben Bach Pedersen

23 Feb 2022

Abstract: In the current context of Big Data, a multitude of new NoSQL solutions for storing, managing, and extracting information and patterns from semi-structured data have been proposed and implemented. These solutions were developed to relieve the issue of rigid data structures present in relational databases, by introducing semi-structured and flexible schema design. As current data generated by different sources and devices, especially from IoT sensors and actuators, use either XML or JSON format, depending on the application, database technologies that store and query semi-structured data in XML format are needed. Thus, Native XML Databases, which were initially designed to manipulate XML data using standardized querying languages, i.e., XQuery and XPath, were rebranded as NoSQL Document-Oriented Databases Systems. Currently, the majority of these solutions have been replaced with the more modern JSON based Database Management Systems. However, we believe that XML-based solutions can still deliver performance in executing complex queries on heterogeneous collections. Unfortunately nowadays, research lacks a clear comparison of the scalability and performance for database technologies that store and query documents in XML versus the more modern JSON format. Moreover, to the best of our knowledge, there are no Big Data-compliant benchmarks for such database technologies. In this paper, we present a comparison for selected Document-Oriented Database Systems that either use the XML format to encode documents, i.e., BaseX, eXist-db, and Sedna, or the JSON format, i.e., MongoDB, CouchDB, and Couchbase. To underline the performance differences we also propose a benchmark that uses a heterogeneous complex schema on a large DBLP corpus.

Repository•10.48550/arxiv.1407.2845•

XML Matchers: approaches and challenges

Pasquale De Meo, Emilio Ferrara, Domenico Ursino

9 Mar 2022

Abstract: Schema Matching, i.e. the process of discovering semantic correspondences between concepts adopted in different data source schemas, has been a key topic in Database and Artificial Intelligence research areas for many years. In the past, it was investigated mainly for classical database models (e.g., E/R schemas, relational databases, etc.). However, in recent years, the widespread adoption of XML in the most disparate application fields has pushed a growing number of researchers to design XML-specific Schema Matching approaches, called XML Matchers, aiming at finding semantic matchings between concepts defined in DTDs and XSDs. XML Matchers do not just take well-known techniques originally designed for other data models and apply them to DTDs/XSDs, but they exploit specific XML features (e.g., the hierarchical structure of a DTD/XSD) to improve the performance of the Schema Matching process. The design of XML Matchers is currently a well-established research area. The main goal of this paper is to provide a detailed description and classification of XML Matchers. We first describe to what extent the specificities of DTDs/XSDs affect the Schema Matching task. Then we introduce a template, called XML Matcher Template, that describes the main components of an XML Matcher, their role and behavior. We illustrate how each of these components has been implemented in some popular XML Matchers. We consider our XML Matcher Template as the baseline for objectively comparing approaches that, at first glance, might appear as unrelated. The introduction of this template can be useful in the design of future XML Matchers. Finally, we analyze commercial tools implementing XML Matchers and introduce two challenging issues strictly related to this topic, namely XML source clustering and uncertainty management in XML Matchers.

Journal Article•10.1002/spe.3074•

Semantics to the rescue of document‐based XML diff: A JATS case study

Milos Cuculovic, Frédéric Fondement, Maxime Devanne, J. Weber, Michel Hassenforder

12 Feb 2022 - Software: Practice and Experience

TL;DR: This paper proposes a new XML diff algorithm called jats‐diff, which supports a bijection between higher‐level modifications made by the authors, such as structural changes and restyling, and the changes detected between XML documents.

Abstract: The writing of digital text documents has become a longer process that usually goes through revision rounds. Document comparison is important for the human reader interested in changes made by the authors. These documents contain structural data using text‐centric XML as one of their main storage systems. Current XML diff algorithms are able to represent differences with a limited number of edit operations: insert, delete, move and update. This approach does not fit the scope of digital text document comparison where the human reader needs to understand actual modifications made by the author. With JATS being a text‐centric XML vocabulary, we propose in this paper a new XML diff algorithm called jats‐diff, able to support a bijection between higher‐level modifications made by the authors, such as structural changes and restyling, and the changes detected between XML documents. In addition, jats‐diff provides similarity information between different nodes in order to measure the impact of the text changes on the XML tree.

Repository•10.48550/arxiv.1311.4040•

Enhanced XML Validation using SRML

Miklós Kálmán, Ferenc Havasi

10 Mar 2022

Abstract: Data validation is becoming more and more important with the ever-growing amount of data being consumed and transmitted by systems over the Internet. It is important to ensure that the data being sent is valid, as it may contain entry errors which, when consumed by different systems, cause further errors. XML has become the de facto standard for data transfer. The XML Schema Definition language (XSD) was created to help XML structural validation and provide a schema for data type restrictions; however, it does not allow for more complex situations. In this article we introduce a way to provide rule-based XML validation and correction through the extension and improvement of our SRML metalanguage. We also explore the option of applying it in a database as a trigger for CRUD operations, enabling more granular dataset validation at an atomic level and more complex record validation rules.

Repository•10.25949/19444103.v1•

Semantic transformations for XML queries

29 Mar 2022

Abstract: The ever-increasing adoption of XML has created a need to ensure that XML query languages perform efficiently. Query optimization and transformation for XML query languages, both syntactically and semantically, have received much attention from research communities in recent years. However, due to the fast progress of the application of XML data management solutions, XML-Enabled Database Management Systems still face several challenges. Among these challenges is query processing, especially the processing of XML queries specified with XPath axes and redundancies that may exist in predicates used in XML queries. Semantic query optimization utilizes constraints in XML schemas to directly optimize a given query with a set of optimization rules. Due to the current complexity of the XML data structure which is enabled by rich semantics in XML Schemas, semantic query optimization should be performed in a more systematic manner. For a complete solution, this research proposes a series of semantic transformations to transform given XML queries to semantically equivalent, but more efficient, XML queries for optimization purposes, by using the semantics provided in XML Schemas. The proposed semantic transformations are grouped into three categories: (1) Semantic Path Transformations, (2) Semantic Transformations for XPath Queries Specified with Predicates, and (3) Semantic Transformations for XPath Queries Specified with XPath Axes. After a semantic transformation is applied to an XML query, the equivalent semantic XML query can be processed more efficiently by an XML data management system and returns the same result set. The proposed semantic transformations are then translated into a series of algorithms which are implemented and empirically evaluated for their efficiency and effectiveness. The experimental studies were carried out by using both real data sets (DBLP) and benchmark data sets (Michigan) to illustrate that the majority of semantic transformations achieved significantly improved performance in XML query processing; this also enabled the research presented here to identify semantic transformations as optimization devices.
