Abstract: XML has become the standard for publishing and exchanging data on the Web. However, most business data is managed, and will continue to be managed, by relational database management systems. As such, there is an increasing need to efficiently and accurately publish relational data as XML documents for Internet-based applications. One way to publish relational data is to provide virtual XML documents over it via an XML schema that is transformed from the underlying relational database schema, so that users can access the relational database through the XML schema. In this paper, we discuss issues in transforming a relational database schema into the corresponding XML schema. We aim to preserve all integrity constraints defined in the relational database schema, to achieve a high level of nesting, and to avoid introducing data redundancy in the transformed XML schema. We first propose a basic transformation algorithm that introduces no data redundancy, and then improve the algorithm by exploring further nesting of the transformed XML schema.
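The nesting idea can be illustrated with a minimal sketch. It is not the paper's transformation algorithm: the two tables, the foreign key `dept_id`, and the element names are all hypothetical, and the sketch works on rows rather than on schemas. It only shows why nesting a child table under the parent it references avoids repeating parent data.

```python
import xml.etree.ElementTree as ET

# Hypothetical relational data: employees.dept_id is a foreign key to departments.dept_id.
departments = [{"dept_id": 1, "name": "Sales"}, {"dept_id": 2, "name": "R&D"}]
employees = [
    {"emp_id": 10, "name": "Ann", "dept_id": 1},
    {"emp_id": 11, "name": "Bob", "dept_id": 1},
    {"emp_id": 12, "name": "Eve", "dept_id": 2},
]

def publish(departments, employees):
    """Nest each employee under the department it references."""
    root = ET.Element("company")
    by_id = {}
    for d in departments:
        node = ET.SubElement(root, "department", id=str(d["dept_id"]))
        ET.SubElement(node, "name").text = d["name"]
        by_id[d["dept_id"]] = node
    for e in employees:
        # A KeyError here would signal a violated foreign key constraint.
        parent = by_id[e["dept_id"]]
        node = ET.SubElement(parent, "employee", id=str(e["emp_id"]))
        ET.SubElement(node, "name").text = e["name"]
    return root

company = publish(departments, employees)
print(ET.tostring(company, encoding="unicode"))
```

Each department appears once, and the foreign key is expressed by containment rather than by a repeated value on every employee.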
Abstract: Most Web applications use relational databases to store and retrieve information. However, given the growing acceptance of XML technologies for documents, it is logical that security should be integrated with XML solutions. In a Web application, improper user input is a main cause of a wide variety of attacks. The XML Path (XPath) language is used for querying information from the nodes of an XML document. XPath injection is an attack technique that, much like SQL injection, arises when a malicious user can insert arbitrary XPath code into form fields and URL query parameters in order to inject this code directly into the XPath query evaluation engine. Through crafted input, a malicious user can bypass authentication or access restricted data from the XML data source. Hence, we propose an approach to detect XPath injection attacks in XML databases at runtime. Our approach intercepts the XPath expression and parses the XQuery expression to find the inputs to be placed in the expression. The identified inputs are used to build an XML file, which is then validated against a schema.
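A minimal sketch of the attack and of input checking follows. The query template, the element names, and the allowed-character pattern are assumptions made for illustration. The regular expression is a stand-in for the schema validation described in the abstract, not the authors' implementation.

```python
import re

# Hypothetical login query; element names are assumptions.
TEMPLATE = "//user[name/text()='{name}' and password/text()='{password}']"

# Stand-in for schema validation: every input must match a restrictive pattern.
ALLOWED = re.compile(r"[A-Za-z0-9_.@-]{1,64}")

def build_query_unsafe(name, password):
    """Concatenate user input straight into the XPath expression."""
    return TEMPLATE.format(name=name, password=password)

def inputs_are_valid(inputs):
    """Return True only if every input value matches the allowed pattern."""
    return all(ALLOWED.fullmatch(value) is not None for value in inputs.values())

def build_query_checked(name, password):
    """Validate the inputs before they are placed in the expression."""
    inputs = {"name": name, "password": password}
    if not inputs_are_valid(inputs):
        raise ValueError("rejected: input does not match the allowed pattern")
    return build_query_unsafe(name, password)

crafted = "' or '1'='1"
unsafe_query = build_query_unsafe("alice", crafted)
print(unsafe_query)
```

With the crafted password, the predicate becomes `... and password/text()='' or '1'='1'`, which is true for every `user` node, so the unchecked query no longer tests the password at all. The checked variant rejects the same input before the query is built.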
Abstract: Most current solutions for Web Based Instruction (WBI) use a centralized management model and a proprietary internal representation. The AVANTE Architecture is a WBI environment assembled using CORBA distributed components, implementing core services such as course management, user authentication, collaborative work, database access, presentation, and others. The AVANTE components conform to a 4-tiered model, with Client, Presentation, Management, and Low-Level Services component sublayers. Emergent XML standards for WBI describe all metadata definitions. Components at the Management layer manipulate JDBC-SQL data from the Low-Level Services Layer, and combine it with corresponding XML Schemas, instantiating course objects as new XML descriptions and component services. A filter-mapping service in the Presentation layer produces the dynamic HTML web pages needed for user interaction, processing these XML descriptions by applying one or more previously defined XSL stylesheets. A similar mechanism implements interface customization and remote service administration. The developed WBI system was deployed with open source software. Adding CORBA components easily achieves on-demand scalability. Future services include auditing, adaptive interfaces, grading, content development, and integration with existing systems.
TL;DR: This paper extends Cetus, a source-to-source compiler, with XML capabilities, enabling XPath-based searching and extensibility, and presents Sirius, an XML-to-C converter, reducing code complexity and improving maintainability.
Abstract: This paper presents an extension that adds XML capabilities to Cetus, a source-to-source compiler developed by Purdue University. In this work, the Cetus Intermediate Representation is converted into an XML DOM tree that, in turn, enables XML capabilities, such as searching for specific code features through XPath expressions. As an example, we write XPath code to find private and shared variables for parallel execution in C source code. Loopest is a Java program with embedded XPath expressions. While Cetus needs 2573 lines of internal Java code to locate private variables in an input code, Loopest needs a total of only 425 lines of code to determine the same private variables in the equivalent XML representation. Using XPath as the search method provides a second advantage over Cetus: extensibility. Changes in Cetus require a deep knowledge of Java, the Cetus internal structure, and its Intermediate Representation. In contrast, changes in Loopest are easier because it depends only on XPath to generate reports. Finally, we present Sirius, an XML DOM tree-to-C converter, which generates the new output C code based on the annotations made in the XML tree.
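The kind of XPath search described above can be sketched as follows. The XML shape (`Program`, `ForLoop`, `Declaration`, `Assignment`) is a hypothetical stand-in, not the actual Cetus or Loopest representation, and the rule used (declared inside the loop means private candidate) is a deliberate simplification of real privatization analysis. The sketch uses Python rather than Java, relying on the XPath subset supported by `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of a small C loop nest; tag names are assumptions.
SOURCE_XML = """<Program>
  <Declaration name="sum"/>
  <ForLoop index="i">
    <Declaration name="tmp"/>
    <Assignment target="tmp"/>
    <Assignment target="sum"/>
  </ForLoop>
</Program>"""

root = ET.fromstring(SOURCE_XML)

# Variables declared inside a loop body: candidates for private variables.
declared_in_loop = {d.get("name") for d in root.findall(".//ForLoop//Declaration")}

# Variables written inside a loop body, wherever they are declared.
assigned_in_loop = {a.get("target") for a in root.findall(".//ForLoop//Assignment")}

private_candidates = sorted(declared_in_loop)
shared_candidates = sorted(assigned_in_loop - declared_in_loop)

print("private candidates:", private_candidates)
print("shared candidates:", shared_candidates)
```

Changing what is reported only requires changing the path expressions, which is the extensibility argument made in the abstract.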
TL;DR: This study introduces a method to discover XML conditional dependencies for data quality issues, using pattern tableaus and mining algorithms to enhance XML document consistency and detect inconsistencies in datasets.
Abstract: Extensible Markup Language (XML) is emerging as the primary standard for representing and exchanging data. Accounting for more than 60% of the total, XML is considered the most dominant document type on the web; nevertheless, the quality of XML documents is not as expected. XML integrity constraints, especially XFDs, play an important role in keeping XML datasets as consistent as possible, but their ability to solve data quality issues remains limited. The main reason is that traditional data dependencies were introduced to maintain the consistency of the schema rather than that of the data. The purpose of this study is to introduce a method for discovering pattern tableaus for XML conditional dependencies, to be used for enhancing XML document consistency as part of the data quality improvement phases. Conditional dependencies, as new rules, are designed mainly to improve data instances, and they extend traditional XML dependencies by enforcing pattern tableaus of semantically related constants. Subsequently, a set of minimal approximate conditional dependencies (XCFD, XCIND) is discovered and learned from the XML tree using a set of mining algorithms. The discovered patterns can be used as master data to detect inconsistencies that do not conform to the majority of the dataset.
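As an illustration of how a pattern tableau row can flag inconsistent data, the sketch below checks a single hand-written conditional rule against a small document. The document, the element names, and the rule are hypothetical, and the sketch covers only the checking step, not the mining algorithms of the study.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; element names and values are assumptions.
DOC = """<customers>
  <customer id="1"><country>UK</country><areaCode>131</areaCode><city>Edinburgh</city></customer>
  <customer id="2"><country>UK</country><areaCode>131</areaCode><city>London</city></customer>
  <customer id="3"><country>US</country><areaCode>131</areaCode><city>Springfield</city></customer>
</customers>"""

# One tableau row: when the condition constants match, the implied constant must hold.
PATTERN = {
    "condition": {"country": "UK", "areaCode": "131"},
    "implies": {"city": "Edinburgh"},
}

def violations(root, path, pattern):
    """Return the ids of nodes that match the condition but break the implication."""
    bad = []
    for node in root.findall(path):
        matches = all(node.findtext(k) == v for k, v in pattern["condition"].items())
        if matches and any(node.findtext(k) != v for k, v in pattern["implies"].items()):
            bad.append(node.get("id"))
    return bad

violating_ids = violations(ET.fromstring(DOC), "./customer", PATTERN)
print(violating_ids)
```

Customer 3 shares the area code but not the country, so the rule does not apply to it. This conditional scope is what distinguishes such a rule from a plain functional dependency over the whole document.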
Abstract: Problem statement: In order to facilitate XML query processing, labeling schemes are used to determine the structural relationships between XML nodes. However, labeling schemes have to relabel the existing nodes or recalculate the label values when a new node is inserted into the XML document during the XML update process. EXEL, as a labeling scheme, is able to remove relabeling for existing nodes during the XML update process. It is also able to compute the structural relationship between nodes effectively. However, in the case of skewed insertions, where nodes are always inserted at a fixed place, the label size of the EXEL scheme increases very fast. Approach: This study discusses how to control the increment of label size for the EXEL scheme. In addition, EXEL does not consider the process of deleting labels, so we also study how to reuse deleted labels for future label insertions. Results: We propose an algorithm that is able to control the label size increment. Conclusion: The proposed algorithm requires less storage to store the inserted binary bit strings and can thus improve query performance.
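To make the relabeling problem concrete, the sketch below uses a generic interval (start, end) labeling, not EXEL, whose bit-string encoding is not described in this abstract. The function names and the sample document are assumptions. Each node receives a pair of counters from a depth-first walk, and an ancestor test reduces to interval containment.

```python
import xml.etree.ElementTree as ET

def label(root):
    """Assign each element a (start, end) pair from a depth-first traversal."""
    labels = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        start = counter
        for child in node:
            visit(child)
        counter += 1
        labels[node] = (start, counter)

    visit(root)
    return labels

def is_ancestor(labels, a, d):
    """a is an ancestor of d exactly when a's interval strictly contains d's."""
    (a_start, a_end), (d_start, d_end) = labels[a], labels[d]
    return a_start < d_start and d_end < a_end

doc = ET.fromstring("<a><b><c/></b><d/></a>")
b, d = doc[0], doc[1]
c = b[0]
labels = label(doc)
print({node.tag: labels[node] for node in doc.iter()})
```

Because the counters are consecutive integers, inserting a new child under `b` leaves no free value between existing labels, and the nodes that follow must be renumbered. That is the update cost that schemes such as EXEL are designed to avoid.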
Abstract: This document explores the utilization of Apache Hive external tables for efficient XML data processing. XML (eXtensible Markup Language) is a widely used format for data interchange, and processing XML data efficiently poses challenges, especially when dealing with large datasets. Apache Hive, a data warehousing infrastructure built on top of Hadoop, offers a solution for processing structured data by providing a SQL-like interface. By leveraging Apache Hive external tables, XML data can be efficiently processed and queried in a distributed environment. This paper discusses the benefits of using external tables for XML data processing, provides a step-by-step guide for setting up and querying XML data in Apache Hive, and presents performance benchmarks demonstrating the efficiency of this approach.