TL;DR: The main result of the paper is that typechecking for k-pebble transducers is decidable, and therefore, typechecking can be performed for a broad range of XML transformation languages, including XML-QL and a fragment of XSLT.
Abstract: We study the typechecking problem for XML transformers: given an XML transformation program and a DTD for the input XML documents, check whether every result of the program conforms to a specified output DTD. We model XML transformers using a novel device called a k-pebble transducer, that can express most queries without data-value joins in XML-QL, XSLT, and other XML query languages. Types are modeled by regular tree languages, a robust extension of DTDs. The main result of the paper is that typechecking for k-pebble transducers is decidable. Consequently, typechecking can be performed for a broad range of XML transformation languages, including XML-QL and a fragment of XSLT.
TL;DR: This work proposes an extension to XML query languages that enables keyword search at the granularity of XML elements, that helps novice users formulate queries, and also yields new optimization opportunities for the query processor.
Abstract: Due to the popularity of the XML data format, several query languages for XML have been proposed, specially devised to handle data of which the structure is unknown, loose, or absent. While these languages are rich enough to allow for querying the content and structure of an XML document, a varying or unknown structure can make formulating queries a very difficult task. We propose an extension to XML query languages that enables keyword search at the granularity of XML elements, that helps novice users formulate queries, and also yields new optimization opportunities for the query processor. We present an implementation of this extension on top of a commercial RDBMS; we then discuss implementation choices and performance results.
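To make "keyword search at the granularity of XML elements" concrete, the following is a minimal sketch, not the implementation described in the abstract (which runs on top of a commercial RDBMS). It uses Python's standard `xml.etree.ElementTree`; the function name `elements_containing` and the sample document are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

def elements_containing(root, keyword):
    """Return the tags of elements whose own text mentions the keyword.

    Illustrative only: the match is a case-insensitive substring test on
    each element's direct text, so the answer is a list of elements rather
    than whole documents.
    """
    keyword = keyword.lower()
    hits = []
    for elem in root.iter():  # document order, root included
        own_text = (elem.text or "").lower()
        if keyword in own_text:
            hits.append(elem.tag)
    return hits

# Hypothetical sample document.
doc = ET.fromstring(
    "<bib><book><title>XML Basics</title><author>Lee</author></book>"
    "<article><title>Query Optimization</title><note>uses XML</note></article></bib>"
)
matches = elements_containing(doc, "xml")
print(matches)
```

The user does not need to know that the keyword appears under `title` in one place and under `note` in another; the search reports the enclosing elements itself.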
TL;DR: NATIX is introduced, an efficient, native repository for storing, retrieving and managing tree-structured large objects, preferably XML documents, that takes the semantics of the underlying tree structure of XML documents into account.
Abstract: We introduce NATIX, an efficient, native repository for storing, retrieving and managing tree-structured large objects, preferably XML documents. In contrast to traditional large object (LOB) managers, we do not split at arbitrary byte positions but take the semantics of the underlying tree structure of XML documents into account. Our parameterizable split algorithm dynamically maintains physical records of size smaller than a page which contain sets of connected tree nodes. This not only improves efficiency by clustering subtrees but also facilitates their compact representation. Existing approaches to store XML documents either use flat files or map every single tree node onto a separate physical record. The increased flexibility of our approach results in higher efficiency. Performance measurements validate this claim.
TL;DR: The results of the experiments with real-life and synthetic DTDs demonstrate the effectiveness of XTRACT's approach in inferring concise and semantically meaningful DTD schemas for XML databases.
Abstract: XML is rapidly emerging as the new standard for data representation and exchange on the Web. An XML document can be accompanied by a Document Type Descriptor (DTD) which plays the role of a schema for an XML data collection. DTDs contain valuable information on the structure of documents and thus have a crucial role in the efficient storage of XML data, as well as the effective formulation and optimization of XML queries. In this paper, we propose XTRACT, a novel system for inferring a DTD schema for a database of XML documents. Since the DTD syntax incorporates the full expressive power of regular expressions, naive approaches typically fail to produce concise and intuitive DTDs. Instead, the XTRACT inference algorithms employ a sequence of sophisticated steps that involve: (1) finding patterns in the input sequences and replacing them with regular expressions to generate “general” candidate DTDs, (2) factoring candidate DTDs using adaptations of algorithms from the logic optimization literature, and (3) applying the Minimum Description Length (MDL) principle to find the best DTD among the candidates. The results of our experiments with real-life and synthetic DTDs demonstrate the effectiveness of XTRACT's approach in inferring concise and semantically meaningful DTD schemas for XML databases.
TL;DR: This article proposes an extension to XML query languages that enables keyword search at the granularity of XML elements, helps novice users formulate queries, and yields new optimization opportunities for the query processor.
Abstract: Due to the popularity of the XML data format, several query languages for XML have been proposed, specially devised to handle data of which the structure is unknown, loose, or absent. While these languages are rich enough to allow for querying the content and structure of an XML document, a varying or unknown structure can make formulating queries a very difficult task. We propose an extension to XML query languages that enables keyword search at the granularity of XML elements, that helps novice users formulate queries, and also yields new optimization opportunities for the query processor. We present an implementation of this extension on top of a commercial RDBMS; we then discuss implementation choices and performance results.
TL;DR: In this paper, a data and an execution model that allow for efficient storage and retrieval of XML documents in a relational database are presented. The data model is strictly based on the notion of binary associations.
Abstract: In this paper, we present a data and an execution model that allow for efficient storage and retrieval of XML documents in a relational database. The data model is strictly based on the notion of binary associations: by decomposing XML documents into small, flexible and semantically homogeneous units we are able to exploit the performance potential of vertical fragmentation. Moreover, our approach provides clear and intuitive semantics, which facilitates the definition of a declarative query algebra. Our experimental results with large collections of XML documents demonstrate the effectiveness of the techniques proposed.
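The idea of decomposing a document into binary associations can be sketched as follows. This is an illustrative assumption about what such a decomposition might look like, not the paper's actual data model: every distinct root-to-node path gets its own two-column relation, holding either (parent id, child id) pairs or (node id, text) pairs. The function name `decompose`, the path naming scheme, and the object ids are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import count

def decompose(root):
    """Split a document into binary relations, one relation per path."""
    relations = defaultdict(list)
    oids = count(1)  # hypothetical object identifiers: 1, 2, 3, ...

    def visit(elem, path, oid):
        text = (elem.text or "").strip()
        if text:
            # Association between a node and its character data.
            relations[path + "/text()"].append((oid, text))
        for child in elem:
            child_oid = next(oids)
            child_path = path + "/" + child.tag
            # Association between a parent node and a child node.
            relations[child_path].append((oid, child_oid))
            visit(child, child_path, child_oid)

    visit(root, root.tag, next(oids))
    return dict(relations)

doc = ET.fromstring(
    "<bib><book><title>A</title></book><book><title>B</title></book></bib>"
)
tables = decompose(doc)
for path, pairs in tables.items():
    print(path, pairs)
```

Because all pairs reached by the same path land in the same small relation, a query over `bib/book/title` only touches that relation, which is the kind of vertical fragmentation the abstract refers to.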
TL;DR: In this paper, a comparison of five representative query languages for XML, highlighting their common features and differences, is presented. No standard for an XML query language has yet been decided, but discussion is ongoing within the World Wide Web Consortium and within many academic institutions and major Internet-related companies.
Abstract: XML is becoming the most relevant new standard for data representation and exchange on the WWW. Novel languages for extracting and restructuring the XML content have been proposed, some in the tradition of database query languages (i.e. SQL, OQL), others more closely inspired by XML. No standard for XML query language has yet been decided, but the discussion is ongoing within the World Wide Web Consortium and within many academic institutions and Internet-related major companies. We present a comparison of five representative query languages for XML, highlighting their common features and differences.
TL;DR: An Access Control System for XML is described allowing for definition and enforcement of access restrictions directly on the structure and content of XML documents, thus providing a simple and effective way for users to protect information at the same granularity level provided by the language itself.
Abstract: More and more information is distributed in XML format, both on corporate Intranets and on the global Net. In this paper an Access Control System for XML is described allowing for definition and enforcement of access restrictions directly on the structure and content of XML documents, thus providing a simple and effective way for users to protect information at the same granularity level provided by the language itself.
TL;DR: In this article, an enterprise integration system is coupled to a number of legacy data sources, each of which uses different data formats and different access methods, and the integration system includes a back-end interface configured to convert input data source information to input XML documents and to convert output XML documents to output data source information.
Abstract: An enterprise integration system is coupled to a number of legacy data sources. The data sources each use different data formats and different access methods. The integration system includes a back-end interface configured to convert input data source information to input XML documents and to convert output XML documents to output data source information. A front-end interface converts the output XML documents to output HTML forms and the input HTML forms to the XML documents. A middle tier includes a rules engine and a rules database. Design tools are used to define the conversion and the XML documents. A network couples the back-end interface, the front-end interface, the middle tier, the design tools, and the data sources. Mobile agents are configured to communicate the XML documents over the network and to process the XML documents according to the rules.
TL;DR: A language to describe a mapping between an existing XML DTD and an existing relational schema is introduced and some of the interesting issues arising from such a mapping are discussed.
Abstract: XML is rapidly gaining momentum in e-commerce and Internet-based information exchange, where its simplicity and custom-defined tags make it usable as a semantics-preserving data exchange format. However, to realize this potential it is necessary to be able to extract structured data from XML documents and store it in a database, as well as to generate XML documents from data extracted from a database. Although many DBMS vendors are scrambling to extend their products to handle XML, there is a need for a lightweight, DBMS- and platform-independent load/extract utility as well. In this paper, we describe such a utility that solves the following problems: (1) loading data from XML documents into relational tables with a known schema, (2) creating XML documents according to a known document type definition (DTD) from data extracted from a database, (3) generating relational schemas from XML DTDs for on-the-fly storage of XML documents, and (4) generating XML DTDs from relational schemas for on-the-fly extraction of relational data. We introduce a language to describe a mapping between an existing XML DTD and an existing relational schema and discuss some of the interesting issues arising from such a mapping.
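Problem (1) above, loading XML data into relational tables with a known schema, can be sketched with a declarative mapping. The abstract does not show the paper's mapping language, so the `MAPPING` dictionary below is a hypothetical stand-in, and the table, element, and column names are assumptions. The sketch uses only Python's standard `sqlite3` and `xml.etree.ElementTree` modules.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mapping: each <book> element becomes one row of table "book";
# each column is filled from the text of the child element named on the right.
MAPPING = {
    "table": "book",
    "row": "book",
    "columns": {"title": "title", "year": "year"},
}

def load(xml_text, conn, mapping):
    """Insert one row per matching element, driven entirely by the mapping."""
    cols = list(mapping["columns"])
    placeholders = ", ".join("?" for _ in cols)
    # The mapping is assumed to be trusted; only the values are parameterized.
    sql = f"INSERT INTO {mapping['table']} ({', '.join(cols)}) VALUES ({placeholders})"
    root = ET.fromstring(xml_text)
    for row_elem in root.iter(mapping["row"]):
        values = [row_elem.findtext(mapping["columns"][c]) for c in cols]
        conn.execute(sql, values)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (title TEXT, year TEXT)")
load(
    "<bib><book><title>A</title><year>1999</year></book>"
    "<book><title>B</title><year>2000</year></book></bib>",
    conn,
    MAPPING,
)
rows = conn.execute("SELECT title, year FROM book ORDER BY title").fetchall()
print(rows)
```

Keeping the mapping as data rather than code is what makes such a utility independent of any particular DTD or schema: a different document type only needs a different mapping.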
TL;DR: This paper instantiated the CES as an XML application called XCES, based on the same data architecture comprised of a primary encoded text and "standoff" annotation in separate documents, and demonstrated how XML mechanisms can be used to select from and manipulate annotated corpora encoded according to XCES specifications.
Abstract: The Corpus Encoding Standard (CES) is a part of the EAGLES Guidelines developed by the Expert Advisory Group on Language Engineering Standards (EAGLES) that provides a set of encoding standards for corpus-based work in natural language processing applications. We have instantiated the CES as an XML application called XCES, based on the same data architecture comprised of a primary encoded text and "standoff" annotation in separate documents. Conversion to XML enables use of some of the more powerful mechanisms provided in the XML framework, including the XSLT Transformation Language, XML Schemas, and support for inter-resource reference together with an extensive path syntax for pointers. In this paper, we describe the differences between the CES and XCES DTDs and demonstrate how XML mechanisms can be used to select from and manipulate annotated corpora encoded according to XCES specifications. We also provide a general overview of XML and the XML mechanisms that are most relevant to language engineering research and applications.
TL;DR: In this article, the authors describe how to transform the static part of UML, i.e. class diagrams, into XML Document Type Definitions (DTDs) by defining a suitable mapping reflecting the semantics of a UML specification in a DTD correctly.
Abstract: The eXtensible Markup Language (XML) is increasingly finding acceptance as a standard for storing and exchanging structured and semi-structured information. With its expressive power, XML enables a great variety of applications relying on such structures - notably product catalogs, digital libraries, and electronic data interchange (EDI). As the data schema, an XML Document Type Definition (DTD) is a means by which documents and objects can be structured. Currently, there is no suitable way to model DTDs conceptually. Our approach is to model DTDs and thus classes of documents on the basis of UML (Unified Modeling Language). We consider UML to be the connecting link between software engineering and document design, i.e., it is possible to design object-oriented software together with the necessary XML structures. For this reason, we describe how to transform the static part of UML, i.e. class diagrams, into XML DTDs. The major challenge for the transformation is to define a suitable mapping reflecting the semantics of a UML specification in a DTD correctly. Because of XML's specific properties, we slightly extend the UML language in a UML-compliant way. Our approach provides the stepping stone to bridge the gap between object-oriented software design and the development of XML data schemata.
TL;DR: This book teaches you all you need to know about XML - what it is, how it works, what technologies surround it, and how it can best be used in a variety of situations, from simple data transfer to using XML in your web pages.
Abstract: From the Publisher:
Extensible Markup Language (XML) is a rapidly maturing technology with powerful real-world applications, particularly for the management, display, and organization of data. Together with its many related technologies it is an essential technology for anyone using markup languages on the web or internally.
This book teaches you all you need to know about XML - what it is, how it works, what technologies surround it, and how it can best be used in a variety of situations, from simple data transfer to using XML in your web pages. It builds on the strengths of the first edition, and provides new material to reflect the changes in the XML landscape - notably SOAP and Web Services, and the publication of the XML Schemas Recommendation by the W3C.
This book covers:
XML syntax and writing well-formed XML
Using XML Namespaces
Transforming XML into other formats with XSLT
XPath and XPointer for locating specific XML data
XML Validation using DTDs and XML Schemas
Manipulating XML documents with the DOM and SAX 2.0
SOAP and Web Services
Displaying XML using CSS and XSL
Incorporating XML into traditional databases and n-tier architectures
XLink and XPointer for linking XML and non-XML resources
Beginning XML 2nd Edition is for any developer who is interested in learning to use XML in web, e-commerce or data-storage applications. Some knowledge of markup, scripting, and/or object-oriented programming languages is advantageous, but not essential, as the basics of these techniques are explained as required.
TL;DR: XXL is a flexible XML search language that combines XML graph pattern matching with relevance estimations and produces ranked lists of XML subgraphs as search results for web search.
Abstract: XML query languages proposed so far are limited to Boolean retrieval in the sense that query results are sets of qualifying XML elements or subgraphs. This search paradigm is intriguing for closed collections of XML documents such as e-commerce catalogs, but we argue that it is inadequate for searching the Web where we would prefer ranked lists of results based on relevance estimation. IR-style Web search engines, on the other hand, are incapable of exploiting the additional information made explicit in the structure, element names, and attributes of XML documents. In this paper we present a compact query language, coined XXL for flexible XML search language, that reconciles both search paradigms by combining XML graph pattern matching with relevance estimations and producing ranked lists of XML subgraphs as search results. The paper describes the language design, sketches implementation issues, and presents preliminary experimental results.
TL;DR: In this article, a workflow server system is described that uses an XML namespace designed to execute various workflow server services, allowing users to modify the user interface and how content is handled without redesigning the server.
Abstract: A workflow server system is provided which uses an XML namespace designed to execute various workflow server services. The workflow server may include an XML Execution Engine, which uses the XML namespace to execute commands issued by the user from a web browser. The use of the XML namespace allows users to easily modify the user interface and how content is handled without needing to contact the manufacturer of the workflow server or engage in a massive redesign of the server. The Workflow Server passes a user command to an XML Execution Engine, accesses an XML namespace to determine how to execute said command, executes said command, accessing a database if necessary, and returns an XML document back to user for display on the user's web browser, said XML document containing a reference to an XSL file.
TL;DR: This paper presents a compact query language, coined XXL for flexible XML search language, that reconciles both search paradigms by combining XML graph pattern matching with relevance estimations and producing ranked lists of XML subgraphs as search results.
Abstract: XML query languages proposed so far are limited to Boolean retrieval in the sense that query results are sets of qualifying XML elements or subgraphs. This search paradigm is intriguing for “closed” collections of XML documents such as e-commerce catalogs, but we argue that it is inadequate for searching the Web where we would prefer ranked lists of results based on relevance estimation. IR-style Web search engines, on the other hand, are incapable of exploiting the additional information made explicit in the structure, element names, and attributes of XML documents. In this paper we present a compact query language, coined XXL for “flexible XML search language”, that reconciles both search paradigms by combining XML graph pattern matching with relevance estimations and producing ranked lists of XML subgraphs as search results. The paper describes the language design, sketches implementation issues, and presents preliminary experimental results.
TL;DR: This paper presents the semantic knowledge that needs to be captured during the transformation to ensure a correct relational schema and shows a simple algorithm that can derive such semantic knowledge from the given XML Document Type Definition and preserve the knowledge by representing them in terms of semantic constraints in relational database terms.
Abstract: As Extensible Markup Language (XML) [5] is emerging as the data format of the internet era, there are increasing needs to efficiently store and query XML data. One way towards this goal is using a relational database by transforming XML data into relational format. In this paper, we argue that existing transformation algorithms are not complete in the sense that they focus only on structural aspects and ignore semantic aspects. We present the semantic knowledge that needs to be captured during the transformation to ensure a correct relational schema. Further, we show a simple algorithm that can 1) derive such semantic knowledge from the given XML Document Type Definition (DTD) and 2) preserve the knowledge by representing them in terms of semantic constraints in relational database terms. By combining the existing transformation algorithms and our constraints-preserving algorithm, one can transform XML DTD to relational schema where correct semantics and behaviors are guaranteed by the preserved constraints. Experimental results are also presented.
TL;DR: This paper discusses how XML data can be stored, managed and queried in the Oracle8i database, and presents Oracle's XML-enabling database technology.
Abstract: XML is here as the Internet standard for information exchange among e-businesses and applications. With its dramatic adoption and its ability to model structured, unstructured and semi-structured data, XML has the potential of becoming the data model for Internet data. In recent years, Oracle has evolved its DBMS to support complex, structured, and un-structured data. Oracle has now extended that technology to enable the storage and querying of XML data by evolving its DBMS to an XML enabled DBMS, Oracle8i. We present Oracle's XML-enabling database technology. In particular, we discuss how XML data can be stored, managed and queried in the Oracle8i database.
TL;DR: In this paper, an application server executes voice-enabled web applications by runtime execution of extensible markup language (XML) documents that define the voice-enabled web application to be executed.
Abstract: An application server executes voice-enabled web applications by runtime execution of extensible markup language (XML) documents that define the voice-enabled web application to be executed. The application server includes a hypertext markup language (HTML) conversion module configured for translating information present during runtime execution of an XML document into an HTML document. The system converts the XML document into an HTML document in a manner that is reversible, where all the information from the original XML document is preserved such that the HTML document can be converted back to the original XML document. In addition, the system supplies HTML-compliant formatting information to specifically identify formatting specifications for XML tags having implied formatting characteristics during runtime execution of the XML document. Moreover, the system generates HTML-compliant reference tags for each XML tag that refers to another XML object, based on the context of the XML tag during the runtime execution of the XML document. Hence, the generated HTML document includes all information used during runtime execution of the XML document, enabling the use of web analysis tools to analyze XML-defined applications by analyzing the HTML document for the structure of the XML document relative to other XML documents used to define the XML-defined application.
TL;DR: In this article, a translation mechanism is proposed to translate between a word processing document and an XML file, which is performed automatically by a computer system or other electronic device and eliminates the need for users to be familiar with the syntax of XML.
Abstract: A translation mechanism translates between a word processing document and an XML file. The translation facility may translate the word processing document into the XML file and, conversely, may translate the XML file into the word processing document. The mechanism may be partially integrated into a word processing package so that the translation from word processing document to XML file may be performed via the user interface provided by the word processing package. The translation mechanism is extensible and flexible so as to be able to translate different varieties of document types. The translation is performed automatically by a computer system or other electronic device and eliminates the need for the user to be familiar with the syntax of XML.
TL;DR: The proposed model extends the XPath data model, and is capable of representing change histories of XML documents, and various alternative approaches to the physical implementation of the model are presented.
Abstract: XML is expected to become the next generation standard language for exchanging data over the Internet. In general, the contents of XML documents may change as time goes by, and then, it is important to capture entire histories of those documents. In this paper, we propose a logical data model for representing histories of XML documents. The proposed model extends the XPath data model, and is capable of representing change histories of XML documents. Various alternative approaches to the physical implementation of the model are also presented.
TL;DR: In this paper, an XML import tool is used to import data from an XML file into a target repository by receiving user input for selecting data structures within the target repository, for selecting a set of fields that belong to the selected set of data structures, and for mapping fields in the selected set of fields to tags associated with data within the XML file.
Abstract: A system allows exchange of information by converting it to/from proprietary formats from/to XML. An XML import tool may be used to import data from an XML file into a target repository by receiving user input for selecting data structures within the target repository, for selecting a set of fields that belong to the selected set of data structures, and for mapping fields in the selected set of fields to tags associated with data within the XML file. A set of commands is generated based on the user inputs for populating the one or more fields that are mapped to tags with the data in the XML file. The set of commands causes the one or more fields that are mapped to tags to be populated with the data in the XML file.
TL;DR: DB2 UDB XML Extender not only serves as a repository for both XML documents and their Document Type Definitions (DTDs), but also provides data management functionalities such as data integrity, security, recoverability and manageability.
Abstract: The eXtensible Markup Language (XML) is a key technology that facilitates both information exchange and e-business transactions. Starting with DB2 UDB Net.Data VI, an application can generate XML documents from SQL queries against DB2 or any ODBC compliant databases. Today DB2 UDB XML Extender not only serves as a repository for both XML documents and their Document Type Definitions (DTDs), but also provides data management functionalities such as data integrity, security, recoverability and manageability. The user has the option to store the entire document as an XML user-defined column or to decompose the document into multiple tables and columns. Fast search via indices is provided for both XML elements and attributes. Section search can be done against the content of the document. Query syntax adheres to W3C standards such as Extensible Stylesheet Language Transformations (XSLT) and XML Path Language (XPath) specifications. The user can retrieve the entire document or extract XML elements and attributes dynamically in an SQL query. In addition, XML Extender provides a stored procedure to generate XML documents from existing data. Together with Net.Data, one can browse the content of the XML documents via the Internet.
TL;DR: XML in a Nutshell covers the fundamental rules that all XML documents and authors must adhere to, detailing the grammar that specifies where tags may be placed, what they must look like, which element names are legal, how attributes attach to elements, and much more.
Abstract: From the Publisher:
XML, the Extensible Markup Language, is a W3C endorsed standard for document markup. Because of its ability to deliver portable data, XML is positioned to be a key web application technology.
Given the complexity and incredible potential of this powerful markup language, it is clear that every serious developer using XML for data or text formatting and transformation will need a comprehensive, easy-to-access desktop reference in order to take advantage of XML's full potential. XML in a Nutshell will assist developers in formatting files and data structures correctly for use in XML documents.
XML defines a basic syntax used to mark up data with simple, human-readable tags, and provides a standard format for computer documents. This format is flexible enough to be customized for transforming data between applications as diverse as web sites, electronic data interchange, voice mail systems, and wireless devices, to name a few.
Developers can either write their own programs that interact with, massage, and manipulate the data in XML documents, or they can use off-the-shelf software like web browsers and text editors to work with XML documents. Either choice gives them access to a wide range of free libraries in a variety of languages that can read and write XML.
The XML specification defines the exact syntax this markup must follow: how elements are delimited by tags, what a tag looks like, what names are acceptable for elements, where attributes are placed, and so forth. XML doesn't have a fixed set of tags and elements that are supposed to work for everybody in all areas of interest for all time. It allows developers and writers to define the elements they need as they need them.
Although XML is quite flexible in the elements it allows to be defined, it is quite strict in many other respects. XML in a Nutshell covers the fundamental rules that all XML documents and authors must adhere to, detailing the grammar that specifies where tags may be placed, what they must look like, which element names are legal, how attributes attach to elements, and much more.
About the Authors:
Elliotte Rusty Harold is a noted writer and programmer, both on and off the Internet. His Cafe au Lait website has become one of the most popular independent Java sites on the internet, and his spin-off site Cafe con Leche for XML News and Resources has become one of the most popular XML sites on the internet. Elliotte is the author of O'Reilly's Java Network Programming.
W. Scott Means has been a professional software developer since 1988, when he joined Microsoft Corporation at the age of 17. He was one of the original developers of OS/2 1.1 and Windows NT, and did some of the early work on the Microsoft Network for the Advanced Technology and Business Development group. Most recently he is serving as the CEO of Industrial Web Machines, a new Internet venture based in Columbia, South Carolina.
TL;DR: In this paper, a logical data model for representing histories of XML documents is proposed, which extends the XPath data model and is capable of representing change histories of XML documents.
Abstract: XML is expected to become the next generation standard language for exchanging data over the Internet. In general, the contents of XML documents may change as time goes by, and then, it is important to capture entire histories of those documents. In this paper, we propose a logical data model for representing histories of XML documents. The proposed model extends the XPath data model, and is capable of representing change histories of XML documents. Various alternative approaches to the physical implementation of the model are also presented.
TL;DR: A novel feature of the algebra is the use of regular-expression types, similar in power to DTDs or XML Schemas, and closely related to Hosoya, Pierce, and Vouillon's work on XDuce.
Abstract: This document proposes an algebra for XML Query. The algebra has been submitted to the W3C XML Query Working Group. A novel feature of the algebra is the use of regular-expression types, similar in power to DTDs or XML Schemas, and closely related to Hosoya, Pierce, and Vouillon's work on XDuce. The iteration construct involves novel typing rules not encountered elsewhere (even in XDuce).
TL;DR: X-Ray is presented, a generic approach for integrating XML with relational database systems that proposes a meta schema and meta knowledge for resolving data model heterogeneity and schema heterogeneity and provides the basis for X-Ray to automatically compose XML documents out of the relational database when requested and decompose them when they have to be stored.
Abstract: Relational databases are increasingly employed to store the content of a web site. At the same time, XML is fast emerging as the dominant standard at the hypertext level of web site management describing pages and links between them. Thus, the integration of XML with relational database systems to enable the storage, retrieval and update of XML documents is of major importance. This paper presents X-Ray, a generic approach for integrating XML with relational database systems. The key idea is that mappings may be defined between XML DTDs and relational schemata while preserving their autonomy. This is made possible by introducing a meta schema and meta knowledge for resolving data model heterogeneity and schema heterogeneity. Since the mapping knowledge is not hard-coded but rather reified within the meta schema, maintainability and changeability are enhanced. The meta schema provides the basis for X-Ray to automatically compose XML documents out of the relational database when requested and decompose them when they have to be stored.
TL;DR: An XML Document Type Definition (DTD) is developed for representing the schema of a Role-based Access Control (RBAC) Model and a conforming XML document containing the actual RBAC-based access control data for a commercial banking application.
Abstract: The use of Extensible Markup Language (XML) and its associated APIs for information modeling and information interchange applications is being actively explored by the research community. In this paper we develop an XML Document Type Definition (DTD) for representing the schema of a Role-based Access Control (RBAC) Model and a conforming XML document containing the actual RBAC-based access control data for a commercial banking application. Based on this DTD, the XML document and the methods in the Document Object Model (DOM) API Level 1.0 standards, we describe three application tasks related to enterprise-wide implementation of RBAC. They are: (a) implementing an RBAC model for a database application, (b) implementing RBAC models with identical data on two different database servers, and (c) transforming data under an RBAC model to a different but structurally similar model, such as a Group-based Access Control model. Other potential Access Control Service applications exploiting the capabilities of some commercial XML processors are also outlined.
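As a rough illustration of reading RBAC data through DOM methods, the sketch below uses Python's standard `xml.dom.minidom`, which exposes DOM calls such as `getElementsByTagName` and `getAttribute`. The abstract does not reproduce the paper's DTD, so the element and attribute names (`rbac`, `role`, `name`, `permission`) and the sample roles are hypothetical.

```python
from xml.dom.minidom import parseString

# Hypothetical RBAC document; the real DTD from the paper is not shown here.
RBAC_XML = """<rbac>
  <role name="teller"><permission>read_account</permission></role>
  <role name="manager"><permission>read_account</permission><permission>approve_loan</permission></role>
</rbac>"""

def permissions_by_role(xml_text):
    """Map each role name to the list of permissions nested under it."""
    dom = parseString(xml_text)
    result = {}
    for role in dom.getElementsByTagName("role"):
        # Each <permission> holds a single text node with the permission name.
        perms = [p.firstChild.data for p in role.getElementsByTagName("permission")]
        result[role.getAttribute("name")] = perms
    return result

roles = permissions_by_role(RBAC_XML)
print(roles)
```

A dictionary like this could then drive task (a), for example by generating grant statements for a database application, though that step is outside what the abstract specifies.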