TL;DR: In this article, the authors propose an approach to securely parse XML documents and prevent a large number of XML and XXE attacks using an incremental genetic algorithm (IGA) and a self-evolving feature set.
Abstract: XML documents provide a platform-independent data representation and transport facility that enables communication among heterogeneous Web applications and Web and cloud computing services. The wide use of XML documents and their parsing makes them prone to cyberattacks. Attackers exploit hidden vulnerabilities in document parsers and inject ever-changing malicious payloads, threatening the confidentiality, integrity, and availability of Web resources. In this paper, the authors propose an approach to securely parse XML documents and prevent a large number of XML and XXE attacks. The detection rules for blocking malicious documents are self-evolving and update their feature set using the incremental genetic algorithm. The secure XML parsing pattern provides a security guideline and supplements existing parsers by facilitating detection of malicious payloads.
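To make the XXE threat concrete, here is a minimal sketch of one static hardening rule: refuse any document that carries a DOCTYPE declaration, since external entities can only be declared there. This is a fixed rule written for illustration, not the self-evolving IGA-based detector the abstract describes. The names `ForbiddenXML` and `element_names_rejecting_dtd` are hypothetical.

```python
import xml.parsers.expat
from typing import List


class ForbiddenXML(ValueError):
    """Raised when a document uses a construct this sketch refuses to parse."""


def element_names_rejecting_dtd(data: bytes) -> List[str]:
    """Parse `data` and return element names in document order.

    Any DOCTYPE declaration aborts parsing, which blocks entity
    declarations (and therefore XXE payloads) before they are read.
    """
    parser = xml.parsers.expat.ParserCreate()
    names: List[str] = []

    def on_doctype(name, sysid, pubid, has_internal_subset):
        # Assumption of this sketch: legitimate input never needs a DTD.
        raise ForbiddenXML("DOCTYPE declarations are not allowed")

    def on_start(name, attrs):
        names.append(name)

    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(data, True)  # exceptions raised in handlers propagate from here
    return names
```

A payload such as `<!DOCTYPE r [<!ENTITY x SYSTEM "file:///etc/passwd">]>` is rejected as soon as the declaration is seen, while ordinary documents parse normally. A rule like this cannot adapt to new payload shapes, which is the gap an evolving feature set is meant to address.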
TL;DR: An XML-first CMS dream is realized through the implementation of an XML-centric portal for an automotive client.
Abstract: An XML-first content management system — XML technologies handling XML content in an XML database — has been the author's dream for the last decade and a half, ever since he first found a way to break free from the shackles of non-XML technologies limiting what he could do. A portal for an automotive client provided both an interesting, and XML-centric, case study using quite a few “X” technologies, but just as importantly a way forward to finally implementing that system.
Reza Samavi, Mariano P. Consens, Shahan Khatchadourian, Thodoros Topaloglou
13 Sep 2023
TL;DR: DescribeX is a novel visualization technique for analyzing PSI-MI XML collections at the instance level, providing insights into schema usage, common patterns, and evolution.
Abstract: PSI-MI has been endorsed by the protein informatics community as a standard XML data exchange format for protein-protein interaction datasets. While many public databases support the standard, there is a degree of heterogeneity in the way the proposed XML schema is interpreted and instantiated by different data providers. Analysis of schema instantiation in large collections of XML data is a challenging task that is unsupported by existing tools. In this study we use DescribeX, a novel visualization technique of (semi-)structured XML formats, to quantitatively and qualitatively analyze PSI-MI XML collections at the instance level with the goal of gaining insights about schema usage and to study specific questions such as: adequacy of controlled vocabularies, detection of common instance patterns, and evolution of different data collections. Our analysis shows DescribeX enhances understanding the instance-level structure of PSI-MI data sources and is a useful tool for standards designers, software developers, and PSI-MI data providers.
TL;DR: The XML binding technology covered in Chapter 22 introduced a valuable data exchange format, simplifying and standardizing communication in heterogeneous and widespread computer networks. But with all those additions, XML documents became more and more complex, and in order to send a few bytes formatted as XML, a lot of boilerplate data had to be transmitted.
Abstract: The XML binding technology covered in Chapter 22 introduced a valuable data exchange format, simplifying and standardizing the communication in heterogeneous and wide-spread computer networks. By using schemas and namespaces for elements and attributes, this enabled semantics and version consistency in XML, thus allowing for XML document validity checks. Unfortunately, with all those additions, XML documents became more and more complex, and in order to send a few bytes formatted as XML, a lot of boilerplate data had to be transmitted, too. Especially with browser-to-server communication, XML became overkill. For this reason, web application developers switched to the leaner JSON (JavaScript Object Notation) data protocol, making it necessary for the Java world to develop tools and libraries that could handle JSON.
TL;DR: XML binding in Java maps a Java object tree, which is a set of Java objects connected via class member relations, to XML documents. The technology behind XML binding is called JAXB (originally Java Architecture for XML Binding, now Jakarta XML Binding).
Abstract: XML binding in Java maps a Java object tree, which is a set of Java objects connected via class member relations, to XML documents. The technology behind XML binding is called JAXB (originally Java Architecture for XML Binding, now Jakarta XML Binding), and for Jakarta EE 10, the JAXB specification is at version 4.0.
TL;DR: In this article, a new XML stream structure is proposed for broadcasting XML data by compressing and summarizing the structural information about how XML nodes are put together; it can be used to answer a wide range of XML queries.
Abstract: XML data warehouses give decision-support systems a chance to use complicated data by giving them a place to start. But native-XML database management systems are slow right now, so it is important to look into ways to make them faster. This study presents two strategies to think about. To start, we suggest using a link index that was made with the fact that XML warehouses have a lot of dimensions in mind. The join procedures are taken out, but the data from the first warehouse stays the same. Second, we show how to choose XML materialized views by grouping the query load to show how this can be done. To prove that these ideas work, we built a set of decision support XQuery tools and ran them against an XML data warehouse. We compared the results obtained with and without our optimization methods. Our tests show that it works, even though the queries themselves are a little hard to understand and the datasets themselves are very big. XML has emerged as the widely accepted standard for transmitting data over mobile wireless networks. In these kinds of networks, mobile clients can use a wireless broadcast channel to send queries to get the XML data they need. Because mobile devices are so small and have so little storage space and a short battery life, it may be hard for customers to download the whole XML data set on one of these devices. To solve this problem, you need to index XML data so that mobile clients only have to download the parts of the file they need. Users who want to access only certain parts of the XML content in an XML stream could use one of several indexing methods. Still, the indexing methods that are used now add more data to an XML stream that is already very large. This research comes up with a new XML stream structure for broadcasting XML data by compressing and summarizing the information about how XML nodes are put together. This study was conducted in the United Kingdom. 
When data is summarized before being sent, the time it takes to receive it in XML format over a wireless broadcast channel can be reduced. The recommended XML stream structure also has indexes that help clients skip over any data that is not relevant. This can reduce the battery drain on mobile devices while XML queries are being processed. We also found that our suggested XML stream design was better than its predecessors in terms of access and tuning times for processing XML queries over the XML data stream. Our architecture can therefore be used to answer a wide range of XML queries.
TL;DR: The data exchanging technology based on Landmark EDM is mainly focused on the data storage and exchange using XML file named EDM.XML. The data storage is implemented in SQL Server, while data exchange utilizes XML file. The data structure of EDM.XML is organized as a node tree, allowing for a comprehensive overview of data storage and management of various applications.
Abstract: The Landmark Software Suite is widely used in oil-industry engineering, so learning and using its data is necessary for our research. The Engineer's Data Model (EDM), its data management solution, plays a very important role in this software suite. EDM consists of two parts: data storage and data exchange. EDM uses SQL Server as its storage medium, which makes data storage secure. Data is stored in a set of well-organized SQL Server tables, and EDM covers data storage for all applications. XML is a widely used technology for data exchange, so to share data between customers EDM uses an XML file, which we call EDM.XML. In this paper we mainly discuss this data exchange technology, since this XML file is a direct reflection of EDM's data structure. From it we can get an overview of the data storage and also learn much about the management of Survey, Geodetic, Coordinate Reference System, Magnetic, Survey Tool, etc. EDM.XML is organized as a node tree, so we introduce EDM.XML from the tree root to the leaves and traverse the primary leaf nodes in detail. This gives a good understanding of the principles of its data storage design and some basic aspects of well design.
TL;DR: In this paper, the authors present a case study examining the curricula of Hungarian and foreign universities and educational organizations and how they teach XML and related technologies, and summarize their efforts in teaching XML, using XML in education, and other methodological innovations.
Abstract: In my dissertation, I summarized my efforts in teaching XML, using XML in education, and other methodological innovations. The paper starts with a literature review and a study of information technology as a profession. This is followed by the section on XML education, in which I first wrote a case study examining the curricula of Hungarian and foreign universities and educational organizations and how they teach XML and related technologies. Next I presented my subject „Adatkezelés – XML” („Data Management – XML”), which is taught at ELTE, and I formulated my first thesis: the basics of XML technologies can be presented at the early stages of current university information technology training with my curriculum on XML and related technologies, so that students can build on this during their later studies. In the next part of my dissertation I wrote about the use of XML in education. I introduced the curriculum I developed and tested, in which I used XML to teach text processing. My second thesis concerns this: XML provides a good example for training text processing and project work. I studied the creation of a new markup language by extending XML, in order not only to use XML in some tasks but also to apply it. With the presented algorithm markup language, the syntax of coding can be taught, where the possible native-language keywords and their related documentation help students understand algorithmic thinking and the structure of source code. My third thesis was drawn from this: the education of beginner programmers can be helped by Algorithm Markup Language (AML), which is an extension of XML that defines its own types. I implemented and tested with my students a web application for visualizing and editing algorithms defined in this new markup language. My fourth thesis is: the effectiveness of learning can be enhanced by my XML-based application for visualizing and editing algorithms.
I continued the development in order to manage not only algorithms but also specification (data, types, pre- and postconditions), testing, and documentation. At first I built an Excel-based toolkit to lead my students through the most coordinated steps of the coding process. Then I combined this and the previous application into a new web application. I published this application, which supports students' work from design through coding to documentation. So my fifth thesis is: my XML-based application, built by analogy with the tools used by the methodologies widespread in industry, supports the teaching of systematic programming. In the paper I wrote about the application methodology of the presented software. Based on experience from lessons and feedback, my theses have been verified and my students have successfully learned the required knowledge.
TL;DR: In this article, the authors recommend the use of multiple channels for XML data in wireless broadcasting and propose a protocol that allows mobile users to access the wireless XML stream generated with their method.
Abstract: In this paper, we recommend the use of multiple channels for XML data in wireless broadcasting. First, we divide XML data into information units called buckets, then extract path information (XPath) for each unit and build an index tree from the data paths. Finally, we make a wireless data stream by merging parts of the index tree and parts of the XML data in multichannel XML. Then, we create a protocol that allows mobile users to access the wireless XML stream generated with our method. We study 11 channels on the server side and three orthogonal channels on the client side.
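The bucket-and-path-index step can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' stream layout: each element becomes one bucket, the index is a flat map from path to bucket positions rather than a tree, and channel allocation is left out because the abstract does not specify it. The function name `build_path_index` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Tuple


def build_path_index(xml_text: str) -> Tuple[List[str], Dict[str, List[int]]]:
    """Split a document into buckets and index them by element path.

    Returns (buckets, index): buckets[i] holds the text of the i-th
    element in document order, and index maps a path such as
    "/lib/book/title" to the positions of the buckets on that path.
    """
    root = ET.fromstring(xml_text)
    buckets: List[str] = []
    index: Dict[str, List[int]] = defaultdict(list)

    def visit(elem: ET.Element, parent_path: str) -> None:
        path = parent_path + "/" + elem.tag
        index[path].append(len(buckets))  # position this bucket will occupy
        buckets.append((elem.text or "").strip())
        for child in elem:
            visit(child, path)

    visit(root, "")
    return buckets, dict(index)
```

A client that wants only `/lib/book/title` can look up the bucket positions in the index and skip every other bucket, which is the selective-tuning idea behind indexing a broadcast stream.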
TL;DR: The ZooPARK discovery service as mentioned in this paper is a distributed system of autonomous search engines (ZooPARK server) with a dedicated coordination center and its own balanced API, including a description of the main systems and subsystems of the discovery service.
Abstract: The paper considers a discovery service for XML data, which is a distributed system of autonomous search engines (ZooPARK server) with a dedicated coordination center and its own balanced API. A description of the main systems and subsystems of the discovery service is given, including a description of the document object model that underlies the ETL processes (data collection, preparation, loading, and indexing). As part of the work, various approaches suitable for uniform distribution of data over a finite set of computing nodes were studied.
TL;DR: In this paper, a method is proposed to convert XML documents into a structured format without discarding the structural information, which allows more data mining techniques and statistical tests to be applied to extract information from business process log data.
Abstract: The volume of Extensible Markup Language (XML) documents is increasing every day due to the development of the internet and the use of the XML format in business process log files. Storing business process log data in XML is preferable because the format is extensible and stores data irrespective of how it will be represented. However, mining XML data poses challenges due to its complex data structure and dimensions. This paper proposes a method to convert XML documents into a structured format without discarding the structural information. Converting semi-structured business process log data into a structured format allows more data mining techniques and statistical tests to be applied to extract information from the business process log data. The experiment in this study performs a t-test on a set of synthetic data and a set of real-world data to show that information in a business process log can be extracted through ordinary statistical tests. Empirical results show that statistical analysis can be conducted on business process log data, especially in XML format, after the flatten sequential structure model (FSSM) is applied.
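The general idea of flattening an XML log into rows while keeping the sequence information can be sketched as below. This is a generic illustration, not the FSSM method itself, whose details the abstract does not give. The element names `trace` and `event`, the `id` attribute, and the function `flatten_log` are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Union

Row = Dict[str, Union[str, int]]


def flatten_log(xml_text: str) -> List[Row]:
    """Turn <trace>/<event> elements into one flat row per event.

    The parent trace id and the event's position inside its trace are
    kept as columns, so the nesting and ordering are not lost.
    """
    root = ET.fromstring(xml_text)
    rows: List[Row] = []
    for trace in root.findall("trace"):
        for position, event in enumerate(trace.findall("event")):
            row: Row = {"trace_id": trace.get("id", ""), "position": position}
            row.update(event.attrib)  # event attributes become columns
            rows.append(row)
    return rows
```

Once the log is a list of uniform rows, numeric columns can be handed to ordinary statistical routines such as a t-test, which is the kind of analysis the abstract reports.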
TL;DR: In this article, a cross-platform information transmission method for the industrial internet of things based on XML technology is proposed, which targets the large data feature extraction error and high congestion of traditional information transmission methods.
Abstract: Aiming at the problems of large error in data feature extraction and high congestion in traditional information transmission methods, this paper proposes a cross-platform information transmission method for the industrial internet of things based on XML technology. Based on the extracted networked information features, the SUM function is used to complete the feature fusion. Then, XML technology is used to obtain the optimal segmentation of the tree, and fitting training of the tree data is carried out to realise the safe storage of information. Next, the information distribution probability is obtained according to the nature of the XML file, so as to realise the cross-platform transmission of information. The simulation results show that the data feature extraction error of this method is at least 2.1%, the sample data transmission time is always lower than 6 s, and the transmission process congestion is low, which demonstrates its effectiveness.
TL;DR: In this paper, a web-based server application is developed as an XML records editor that provides display forms for the creation and editing of XML documents and is able to adapt to the internal resources of the system used.
Abstract: End users can work effectively with large amounts of data from many heterogeneous sources through graphical web interfaces built on XML technologies, which allow any file structure presented in XML format to be displayed. As a data exchange method between applications on the Web, XML still lacks capabilities for identifying web resources and the systems that use them, and capabilities to express the knowledge provided by XML documents. In this study, a web interface has been developed (a web-based server application) as an XML records editor that provides display forms for the creation and editing of XML documents and is able to adapt to the internal resources of the system used. The technology is based on transforming the XSD data set schema by means of XSLT transformations. Screen forms are generated on the server side and are provided to the user with all the necessary tools for correct input and/or editing of heterogeneous data. A distinguishing characteristic of this technology is the ability to display both properly and improperly formed XML data. The developed graphical interface allows any application to automatically exchange and read information from other applications without human intervention, which significantly improves performance and ease of use. This software solution could be used both as an independent module for building and editing data presented in the XML format, and as a built-in module plugged into various server software for heterogeneous information management systems.
TL;DR: This paper proposes a new XML labelling scheme to improve query performance and efficiency, addressing limitations of existing schemes, and evaluates its effectiveness through four experiments, achieving improved results in labelling XML documents.
Abstract: Extensible Markup Language (XML) is now a dominant technology for formatting and exchanging data across the Internet. Updating and retrieving massive amounts of XML data is an interesting and active research area. In addition, indexing XML data is a significant task for improving the efficiency of XML queries. Node labelling is a commonly used technique for indexing XML data efficiently. Many labelling schemes have been proposed; however, these schemes have many limitations and shortcomings. Therefore, this paper proposes a new XML labelling scheme that addresses the efficiency of XML query performance. Four experiments were designed to evaluate the scheme. The results of these experiments suggest that the proposed scheme achieved the target results and showed an improvement in the performance and efficiency of labelling XML documents.
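For readers unfamiliar with node labelling, the sketch below shows Dewey-style prefix labels, a classic existing scheme and not the new scheme this paper proposes. Each node's label is its parent's label plus its position among its siblings, so structural relationships can be tested on labels alone without touching the document. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

Label = Tuple[int, ...]


def dewey_labels(xml_text: str) -> Dict[Label, str]:
    """Assign a Dewey label to every element and map it to the tag name."""
    root = ET.fromstring(xml_text)
    labels: Dict[Label, str] = {}

    def visit(elem: ET.Element, label: Label) -> None:
        labels[label] = elem.tag
        # Children are numbered 1, 2, 3, ... in document order.
        for i, child in enumerate(elem, start=1):
            visit(child, label + (i,))

    visit(root, (1,))
    return labels


def is_ancestor(a: Label, b: Label) -> bool:
    """True if the node labelled `a` is a proper ancestor of `b`."""
    return len(a) < len(b) and b[:len(a)] == a
```

For `<a><b><c/></b><d/></a>` the labels are 1, 1.1, 1.1.1, and 1.2. A known weakness of plain prefix labels is that inserting a node between existing siblings can force relabelling, which is one of the limitations that motivates newer schemes.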