TL;DR: The main result of the paper is that typechecking for k-pebble transducers is decidable, and therefore, typechecking can be performed for a broad range of XML transformation languages, including XML-QL and a fragment of XSLT.
Abstract: We study the typechecking problem for XML transformers: given an XML transformation program and a DTD for the input XML documents, check whether every result of the program conforms to a specified output DTD. We model XML transformers using a novel device called a k-pebble transducer, which can express most queries without data-value joins in XML-QL, XSLT, and other XML query languages. Types are modeled by regular tree languages, a robust extension of DTDs. The main result of the paper is that typechecking for k-pebble transducers is decidable. Consequently, typechecking can be performed for a broad range of XML transformation languages, including XML-QL and a fragment of XSLT.
TL;DR: The subtyping algorithm developed here is a variant of Aiken and Murphy's set-inclusion constraint solver, to which are added several optimizations and two new properties: the algorithm is provably complete, and it allows a useful "subtagging" relation between nodes with different labels in XML trees.
Abstract: We propose regular expression types as a foundation for XML processing languages. Regular expression types are a natural generalization of Document Type Definitions (DTDs), describing structures in XML documents using regular expression operators (i.e., *, ?, |, etc.) and supporting a simple but powerful notion of subtyping. The decision problem for the subtype relation is EXPTIME-hard, but it can be checked quite efficiently in many cases of practical interest. The subtyping algorithm developed here is a variant of Aiken and Murphy's set-inclusion constraint solver, to which are added several optimizations and two new properties: (1) our algorithm is provably complete, and (2) it allows a useful "subtagging" relation between nodes with different labels in XML trees.
TL;DR: This work proposes an extension to XML query languages that enables keyword search at the granularity of XML elements, that helps novice users formulate queries, and also yields new optimization opportunities for the query processor.
Abstract: Due to the popularity of the XML data format, several query languages for XML have been proposed, specially devised to handle data of which the structure is unknown, loose, or absent. While these languages are rich enough to allow for querying the content and structure of an XML document, a varying or unknown structure can make formulating queries a very difficult task. We propose an extension to XML query languages that enables keyword search at the granularity of XML elements, that helps novice users formulate queries, and also yields new optimization opportunities for the query processor. We present an implementation of this extension on top of a commercial RDBMS; we then discuss implementation choices and performance results.
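The idea of keyword search at element granularity can be illustrated with a minimal Python sketch. It is not the extension proposed in the paper (which is implemented on top of a commercial RDBMS); the function name `keyword_search`, the sample document, and the choice to match only an element's direct text are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def keyword_search(root, keyword):
    """Return (path, element) pairs whose own text contains the keyword."""
    keyword = keyword.lower()
    hits = []

    def visit(elem, path):
        here = f"{path}/{elem.tag}"
        # Only the element's direct text is checked (tail text is ignored),
        # so the most specific element is reported, not every ancestor.
        if elem.text and keyword in elem.text.lower():
            hits.append((here, elem))
        for child in elem:
            visit(child, here)

    visit(root, "")
    return hits

# Hypothetical sample document; the user does not need to know its structure.
doc = ET.fromstring(
    "<bib><book><title>Data on the Web</title><author>Abiteboul</author></book>"
    "<article><title>Querying XML</title><note>web data</note></article></bib>"
)
paths = [path for path, _ in keyword_search(doc, "web")]
print(paths)
```

The returned paths show where the keyword occurs, which is the kind of structural hint that could help a user who does not know the schema write a more precise query.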
TL;DR: NATIX is introduced, an efficient, native repository for storing, retrieving and managing tree-structured large objects, preferably XML documents, that takes the semantics of the underlying tree structure of XML documents into account.
Abstract: We introduce NATIX, an efficient, native repository for storing, retrieving and managing tree-structured large objects, preferably XML documents. In contrast to traditional large object (LOB) managers, we do not split at arbitrary byte positions but take the semantics of the underlying tree structure of XML documents into account. Our parameterizable split algorithm dynamically maintains physical records of size smaller than a page which contain sets of connected tree nodes. This not only improves efficiency by clustering subtrees but also facilitates their compact representation. Existing approaches to store XML documents either use flat files or map every single tree node onto a separate physical record. The increased flexibility of our approach results in higher efficiency. Performance measurements validate this claim.
TL;DR: This work presents an access control model to protect information distributed on the Web that, by exploiting XML's own capabilities, allows the definition and enforcement of access restrictions directly on the structure and content of XML documents.
Abstract: Web-based applications greatly increase information availability and ease of access, which is optimal for public information. The distribution and sharing by the Web of information that must be accessed in a selective way requires the definition and enforcement of security controls, ensuring that information will be accessible only to authorized entities. Approaches proposed to this end level, independently from the semantics of the data to be protected and for this reason result limited. The eXtensible Markup Language (XML), a markup language promoted by the World Wide Web Consortium (W3C), represents an important opportunity to solve this problem. We present an access control model to protect information distributed on the Web that, by exploiting XML's own capabilities, allows the definition and enforcement of access restrictions directly on the structure and content of XML documents. We also present a language for the specification of access restrictions that uses standard notations and concepts and briefly describe a system architecture for access control enforcement based on existing technology.
TL;DR: The results of the experiments with real-life and synthetic DTDs demonstrate the effectiveness of XTRACT's approach in inferring concise and semantically meaningful DTD schemas for XML databases.
Abstract: XML is rapidly emerging as the new standard for data representation and exchange on the Web. An XML document can be accompanied by a Document Type Descriptor (DTD) which plays the role of a schema for an XML data collection. DTDs contain valuable information on the structure of documents and thus have a crucial role in the efficient storage of XML data, as well as the effective formulation and optimization of XML queries. In this paper, we propose XTRACT, a novel system for inferring a DTD schema for a database of XML documents. Since the DTD syntax incorporates the full expressive power of regular expressions, naive approaches typically fail to produce concise and intuitive DTDs. Instead, the XTRACT inference algorithms employ a sequence of sophisticated steps that involve: (1) finding patterns in the input sequences and replacing them with regular expressions to generate “general” candidate DTDs, (2) factoring candidate DTDs using adaptations of algorithms from the logic optimization literature, and (3) applying the Minimum Description Length (MDL) principle to find the best DTD among the candidates. The results of our experiments with real-life and synthetic DTDs demonstrate the effectiveness of XTRACT's approach in inferring concise and semantically meaningful DTD schemas for XML databases.
TL;DR: This paper proposes access control policies and an associated model for XML documents, addressing peculiar protection requirements posed by XML, and allows the Security Administrator to choose different policies for documents not covered or only partially covered by the existing access control policies for document types.
Abstract: The Web is becoming the main information dissemination means in private and public organizations. As a consequence, several applications at both internet and intranet level need mechanisms to support a selective access to data available over the Web. In this context, developing an access control model, and related mechanisms, in terms of XML (eXtensible Markup Language) is an important step, because XML is increasingly used as the language for representing information exchanged over the Web. In this paper, we propose access control policies and an associated model for XML documents, addressing peculiar protection requirements posed by XML. A first requirement is that varying protection granularity levels should be supported to guarantee a differentiated protection of document contents. A second requirement arises from the fact that XML documents do not always conform to a predefined document type. To cope with these requirements, the proposed model supports varying protection granularity levels, ranging from a set of documents, to a single document or specific document portion(s). Moreover, it allows the Security Administrator to choose different policies for documents not covered or only partially covered by the existing access control policies for document types. An access control mechanism for the enforcement of the proposed model is finally described.
TL;DR: In this paper, a data model and an execution model that allow for efficient storage and retrieval of XML documents in a relational database are presented. The data model is strictly based on the notion of binary associations.
Abstract: In this paper, we present a data and an execution model that allow for efficient storage and retrieval of XML documents in a relational database. The data model is strictly based on the notion of binary associations: by decomposing XML documents into small, flexible and semantically homogeneous units we are able to exploit the performance potential of vertical fragmentation. Moreover, our approach provides clear and intuitive semantics, which facilitates the definition of a declarative query algebra. Our experimental results with large collections of XML documents demonstrate the effectiveness of the techniques proposed.
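Decomposition into binary associations can be sketched as follows. This is an illustrative Python approximation, not the paper's data model: the function `decompose`, the object identifiers, and the convention of keying each binary relation by its path from the root are assumptions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import count

def decompose(root):
    """Split a tree into binary relations, one relation per distinct path."""
    relations = defaultdict(list)  # path -> list of (parent_oid, child_oid or value)
    oids = count(1)                # hypothetical object identifiers

    def visit(elem, path, oid):
        for name, value in elem.attrib.items():
            relations[f"{path}/@{name}"].append((oid, value))
        if elem.text and elem.text.strip():
            relations[f"{path}/text()"].append((oid, elem.text.strip()))
        for child in elem:
            child_oid = next(oids)
            child_path = f"{path}/{child.tag}"
            relations[child_path].append((oid, child_oid))
            visit(child, child_path, child_oid)

    visit(root, root.tag, next(oids))
    return dict(relations)

doc = ET.fromstring(
    '<bib><book year="1999"><title>A</title></book>'
    '<book year="2000"><title>B</title></book></bib>'
)
tables = decompose(doc)
for path, pairs in tables.items():
    print(path, pairs)
```

Each relation holds pairs of one kind only, so a query that touches only `bib/book/@year` never reads title data. That is the benefit of vertical fragmentation the abstract refers to.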
TL;DR: An Access Control System for XML is described allowing for definition and enforcement of access restrictions directly on the structure and content of XML documents, thus providing a simple and effective way for users to protect information at the same granularity level provided by the language itself.
Abstract: More and more information is distributed in XML format, both on corporate Intranets and on the global Net. In this paper an Access Control System for XML is described allowing for definition and enforcement of access restrictions directly on the structure and content of XML documents, thus providing a simple and effective way for users to protect information at the same granularity level provided by the language itself.
TL;DR: In this article, an enterprise integration system is coupled to a number of legacy data sources, each of which uses different data formats and different access methods, and the integration system includes a back-end interface configured to convert input data source information to input XML documents and to convert output XML documents to output data source information.
Abstract: An enterprise integration system is coupled to a number of legacy data sources. The data sources each use different data formats and different access methods. The integration system includes a back-end interface configured to convert input data source information to input XML documents and to convert output XML documents to output data source information. A front-end interface converts the output XML documents to output HTML forms and the input HTML forms to the XML documents. A middle tier includes a rules engine and a rules database. Design tools are used to define the conversion and the XML documents. A network couples the back-end interface, the front-end interface, the middle tier, the design tools, and the data sources. Mobile agents are configured to communicate the XML documents over the network and to process the XML documents according to the rules.
TL;DR: A language to describe a mapping between an existing XML DTD and an existing relational schema is introduced and some of the interesting issues arising from such a mapping are discussed.
Abstract: XML is rapidly gaining momentum in e-commerce and Internet-based information exchange, where its simplicity and custom-defined tags make it usable as a semantics-preserving data exchange format. However, to realize this potential it is necessary to be able to extract structured data from XML documents and store it in a database, as well as to generate XML documents from data extracted from a database. Although many DBMS vendors are scrambling to extend their products to handle XML, there is a need for a lightweight, DBMS- and platform-independent load/extract utility as well. In this paper, we describe such a utility that solves the following problems: (1) loading data from XML documents into relational tables with a known schema, (2) creating XML documents according to a known document type definition (DTD) from data extracted from a database, (3) generating relational schemas from XML DTDs for on-the-fly storage of XML documents, and (4) generating XML DTDs from relational schemas for on-the-fly extraction of relational data. We introduce a language to describe a mapping between an existing XML DTD and an existing relational schema and discuss some of the interesting issues arising from such a mapping.
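Problem (1), loading XML data into a relational table with a known schema, can be sketched with Python's standard library. The `MAPPING` dictionary below is a hypothetical stand-in for a mapping description, not the mapping language introduced in the paper; the table, column, and element names are likewise assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mapping: each <book> element becomes one row; a column is
# filled from an attribute ("@name") or from a child element's text.
MAPPING = {
    "row_element": "book",
    "table": "books",
    "columns": {"isbn": "@isbn", "title": "title", "price": "price"},
}

def load(xml_text, conn, mapping):
    """Insert one row per matching element (the mapping is trusted input)."""
    cols = list(mapping["columns"])
    table = mapping["table"]
    conn.execute(f'CREATE TABLE IF NOT EXISTS {table} ({", ".join(cols)})')
    placeholders = ", ".join("?" for _ in cols)
    for elem in ET.fromstring(xml_text).iter(mapping["row_element"]):
        row = []
        for col in cols:
            source = mapping["columns"][col]
            if source.startswith("@"):
                row.append(elem.get(source[1:]))    # attribute value or None
            else:
                row.append(elem.findtext(source))   # child text or None
        conn.execute(f"INSERT INTO {table} VALUES ({placeholders})", row)

conn = sqlite3.connect(":memory:")
load(
    '<catalog><book isbn="1"><title>XML</title><price>30</price></book>'
    '<book isbn="2"><title>SQL</title></book></catalog>',
    conn,
    MAPPING,
)
rows = conn.execute("SELECT isbn, title, price FROM books ORDER BY isbn").fetchall()
print(rows)
```

Missing optional elements become NULL, which is one of the simpler design decisions a mapping between a DTD and a relational schema has to settle.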
TL;DR: In this article, the authors describe how to transform the static part of UML, i.e. class diagrams, into XML Document Type Definitions (DTDs) by defining a suitable mapping that correctly reflects the semantics of a UML specification in a DTD.
Abstract: The eXtensible Markup Language (XML) is increasingly finding acceptance as a standard for storing and exchanging structured and semi-structured information. With its expressive power, XML enables a great variety of applications relying on such structures - notably product catalogs, digital libraries, and electronic data interchange (EDI). As the data schema, an XML Document Type Definition (DTD) is a means by which documents and objects can be structured. Currently, there is no suitable way to model DTDs conceptually. Our approach is to model DTDs and thus classes of documents on the basis of UML (Unified Modeling Language). We consider UML to be the connecting link between software engineering and document design, i.e., it is possible to design object-oriented software together with the necessary XML structures. For this reason, we describe how to transform the static part of UML, i.e. class diagrams, into XML DTDs. The major challenge for the transformation is to define a suitable mapping reflecting the semantics of a UML specification in a DTD correctly. Because of XML's specific properties, we slightly extend the UML language in a UML-compliant way. Our approach provides the stepping stone to bridge the gap between object-oriented software design and the development of XML data schemata.
TL;DR: This book teaches you all you need to know about XML - what it is, how it works, what technologies surround it, and how it can best be used in a variety of situations, from simple data transfer to using XML in your web pages.
Abstract: From the Publisher:
Extensible Markup Language (XML) is a rapidly maturing technology with powerful real-world applications, particularly for the management, display, and organization of data. Together with its many related technologies it is an essential technology for anyone using markup languages on the web or internally.
This book teaches you all you need to know about XML - what it is, how it works, what technologies surround it, and how it can best be used in a variety of situations, from simple data transfer to using XML in your web pages. It builds on the strengths of the first edition, and provides new material to reflect the changes in the XML landscape - notably SOAP and Web Services, and the publication of the XML Schemas Recommendation by the W3C.
This book covers:
XML syntax and writing well-formed XML
Using XML Namespaces
Transforming XML into other formats with XSLT
XPath and XPointer for locating specific XML data
XML Validation using DTDs and XML Schemas
Manipulating XML documents with the DOM and SAX 2.0
SOAP and Web Services
Displaying XML using CSS and XSL
Incorporating XML into traditional databases and n-tier architectures
XLink and XPointer for linking XML and non-XML resources
Beginning XML 2nd Edition is for any developer who is interested in learning to use XML in web, e-commerce or data-storage applications. Some knowledge of markup, scripting, and/or object oriented programming languages is advantageous, but not essential, as the basics of these techniques are explained as required.
TL;DR: In this article, a workflow server system is described that uses an XML namespace designed to execute various workflow server services, which lets users modify the user interface and how content is handled; each command returns an XML document that references an XSL file.
Abstract: A workflow server system is provided which uses an XML namespace designed to execute various workflow server services. The workflow server may include an XML Execution Engine, which uses the XML namespace to execute commands issued by the user from a web browser. The use of the XML namespace allows users to easily modify the user interface and how content is handled without needing to contact the manufacturer of the workflow server or engage in a massive redesign of the server. The Workflow Server passes a user command to an XML Execution Engine, accesses an XML namespace to determine how to execute said command, executes said command, accessing a database if necessary, and returns an XML document back to user for display on the user's web browser, said XML document containing a reference to an XSL file.
TL;DR: This paper presents a compact query language, coined XXL for flexible XML search language, that reconciles both search paradigms by combining XML graph pattern matching with relevance estimations and producing ranked lists of XML subgraphs as search results.
Abstract: XML query languages proposed so far are limited to Boolean retrieval in the sense that query results are sets of qualifying XML elements or subgraphs. This search paradigm is intriguing for “closed” collections of XML documents such as e-commerce catalogs, but we argue that it is inadequate for searching the Web where we would prefer ranked lists of results based on relevance estimation. IR-style Web search engines, on the other hand, are incapable of exploiting the additional information made explicit in the structure, element names, and attributes of XML documents. In this paper we present a compact query language, coined XXL for “flexible XML search language”, that reconciles both search paradigms by combining XML graph pattern matching with relevance estimations and producing ranked lists of XML subgraphs as search results. The paper describes the language design, sketches implementation issues, and presents preliminary experimental results.
TL;DR: In this paper, the authors present an algorithm that finds a type of optimal mapping based on the XML Document Type Definition (DTD) and statistics, which are derived from sample XML document sets and some knowledge about queries on XML document collections.
Abstract: XML becomes the standard for the representation of structured and semi-structured data on the Web. Relational and object-relational database systems are a well understood technique for managing and querying such large sets of structured data. Using an object-relational data model and an XML datatype, we show how a relevant subset of XML documents and their implied structure can be mapped onto database structures. Besides straight-forward mappings, there are some XML structures that cannot be easily mapped onto database structures. These structures would sometimes result in large database schemas and sparsely populated databases. As a consequence, such XML document fragments should be mapped onto database attributes of type XML and kept as is. The XML datatype implementation should support evaluating path expressions and fulltext operations. We present an algorithm that finds a type of optimal mapping based on the XML Document Type Definition (DTD) and statistics. The statistics are derived from sample XML document sets and some knowledge about queries on XML document collections.
TL;DR: This paper presents the semantic knowledge that needs to be captured during the transformation to ensure a correct relational schema and shows a simple algorithm that can derive such semantic knowledge from the given XML Document Type Definition and preserve the knowledge by representing them in terms of semantic constraints in relational database terms.
Abstract: As Extensible Markup Language (XML) [5] is emerging as the data format of the internet era, there are increasing needs to efficiently store and query XML data. One way towards this goal is using a relational database by transforming XML data into relational format. In this paper, we argue that existing transformation algorithms are not complete in the sense that they focus only on structural aspects and ignore semantic aspects. We present the semantic knowledge that needs to be captured during the transformation to ensure a correct relational schema. Further, we show a simple algorithm that can 1) derive such semantic knowledge from the given XML Document Type Definition (DTD) and 2) preserve the knowledge by representing it in terms of semantic constraints in relational database terms. By combining the existing transformation algorithms and our constraints-preserving algorithm, one can transform XML DTD to relational schema where correct semantics and behaviors are guaranteed by the preserved constraints. Experimental results are also presented.
TL;DR: This paper discusses how XML data can be stored, managed and queried in the Oracle8i database, and presents Oracle's XML-enabling database technology.
Abstract: XML is here as the Internet standard for information exchange among e-businesses and applications. With its dramatic adoption and its ability to model structured, unstructured and semi-structured data, XML has the potential of becoming the data model for Internet data. In recent years, Oracle has evolved its DBMS to support complex, structured, and un-structured data. Oracle has now extended that technology to enable the storage and querying of XML data by evolving its DBMS to an XML enabled DBMS, Oracle8i. We present Oracle's XML-enabling database technology. In particular, we discuss how XML data can be stored, managed and queried in the Oracle8i database.
TL;DR: Java and XML shows how to combine Java, a platform-independent programming language, with XML, a platform-independent language for interchanging data, to build real-world applications in which both the code and the data are truly portable.
Abstract: Java revolutionized the programming world by providing a platform-independent programming language. XML takes the revolution a step further with a platform-independent language for interchanging data. Java and XML shows how to put the two together, building real-world applications in which both the code and the data are truly portable.
TL;DR: The proposed model extends the XPath data model, and is capable of representing change histories of XML documents, and various alternative approaches to the physical implementation of the model are presented.
Abstract: XML is expected to become the next generation standard language for exchanging data over the Internet. In general, the contents of XML documents may change as time goes by, and then, it is important to capture entire histories of those documents. In this paper, we propose a logical data model for representing histories of XML documents. The proposed model extends the XPath data model, and is capable of representing change histories of XML documents. Various alternative approaches to the physical implementation of the model are also presented.
TL;DR: In this paper, an XML import tool is used to import data from an XML file into a target repository by receiving user input for selecting data structures within the target repository, for selecting a set of fields that belong to the selected data structures, and for mapping fields in the selected set of fields to tags associated with data within the XML file.
Abstract: A system allows exchange of information by converting it to/from proprietary formats from/to XML. An XML import tool may be used to import data from an XML file into a target repository by receiving user input for selecting data structures within the target repository, for selecting a set of fields that belong to the selected set of data structures, and for mapping fields in the selected set of fields to tags associated with data within the XML file. A set of commands is generated based on the user inputs for populating the one or more fields that are mapped to tags with the data in the XML file. The set of commands causes the one or more fields that are mapped to tags to be populated with the data in the XML file.
TL;DR: DB2 UDB XML Extender not only serves as a repository for both XML documents and their Document Type Definitions (DTDs), but also provides data management functionalities such as data integrity, security, recoverability and manageability.
Abstract: The eXtensible Markup Language (XML) is a key technology that facilitates both information exchange and e-business transactions. Starting with DB2 UDB Net.Data VI, an application can generate XML documents from SQL queries against DB2 or any ODBC compliant databases. Today DB2 UDB XML Extender not only serves as a repository for both XML documents and their Document Type Definitions (DTDs), but also provides data management functionalities such as data integrity, security, recoverability and manageability. The user has the option to store the entire document as an XML user-defined column or to decompose the document into multiple tables and columns. Fast search via indices is provided for both XML elements and attributes. Section search can be done against the content of the document. Query syntax adheres to W3C standards such as Extensible Stylesheet Language Transformations (XSLT) and XML Path Language (XPath) specifications. The user can retrieve the entire document or extract XML elements and attributes dynamically in an SQL query. In addition, XML Extender provides a stored procedure to generate XML documents from existing data. Together with Net.Data, one can browse the content of the XML documents via the Internet.
TL;DR: XML in a Nutshell covers the fundamental rules that all XML documents and authors must adhere to, detailing the grammar that specifies where tags may be placed, what they must look like, which element names are legal, how attributes attach to elements, and much more.
Abstract: From the Publisher:
XML, the Extensible Markup Language, is a W3C endorsed standard for document markup. Because of its ability to deliver portable data, XML is positioned to be a key web application technology.
Given the complexity and incredible potential of this powerful markup language, it is clear that every serious developer using XML for data or text formatting and transformation will need a comprehensive, easy-to-access desktop reference in order to take advantage of XML's full potential. XML in a Nutshell will assist developers in formatting files and data structures correctly for use in XML documents.
XML defines a basic syntax used to mark up data with simple, human-readable tags, and provides a standard format for computer documents. This format is flexible enough to be customized for transforming data between applications as diverse as web sites, electronic data interchange, voice mail systems, and wireless devices, to name a few.
Developers can either write their own programs that interact with, massage, and manipulate the data in XML documents, or they can use off-the-shelf software like web browsers and text editors to work with XML documents. Either choice gives them access to a wide range of free libraries in a variety of languages that can read and write XML.
The XML specification defines the exact syntax this markup must follow: how elements are delimited by tags, what a tag looks like, what names are acceptable for elements, where attributes are placed, and so forth. XML doesn't have a fixed set of tags and elements that are supposed to work for everybody in all areas of interest for all time. It allows developers and writers to define the elements they need as they need them.
Although XML is quite flexible in the elements it allows to be defined, it is quite strict in many other respects. XML in a Nutshell covers the fundamental rules that all XML documents and authors must adhere to, detailing the grammar that specifies where tags may be placed, what they must look like, which element names are legal, how attributes attach to elements, and much more.
About the Authors:
Elliotte Rusty Harold is a noted writer and programmer, both on and off the Internet. His Cafe au Lait website has become one of the most popular independent Java sites on the internet, and his spin-off site Cafe con Leche for XML News and Resources has become one of the most popular XML sites on the internet. Elliotte is the author of O'Reilly's Java Network Programming.
W. Scott Means has been a professional software developer since 1988, when he joined Microsoft Corporation at the age of 17. He was one of the original developers of OS/2 1.1 and Windows NT, and did some of the early work on the Microsoft Network for the Advanced Technology and Business Development group. Most recently he is serving as the CEO of Industrial Web Machines, a new Internet venture based in Columbia, South Carolina.
TL;DR: A novel feature of the algebra is the use of regular-expression types, similar in power to DTDs or XML Schemas, and closely related to Hosoya, Pierce, and Vouillon's work on XDuce.
Abstract: This document proposes an algebra for XML Query. The algebra has been submitted to the W3C XML Query Working Group. A novel feature of the algebra is the use of regular-expression types, similar in power to DTDs or XML Schemas, and closely related to Hosoya, Pierce, and Vouillon's work on XDuce. The iteration construct involves novel typing rules not encountered elsewhere (even in XDuce).
TL;DR: An XML Document Type Definition (DTD) is developed for representing the schema of a Role-based Access Control (RBAC) Model and a conforming XML document containing the actual RBAC-based access control data for a commercial banking application.
Abstract: The use of Extensible Markup Language (XML) and its associated APIs for information modeling and information interchange applications is being actively explored by the research community. In this paper we develop an XML Document Type Definition (DTD) for representing the schema of a Role-based Access Control (RBAC) Model and a conforming XML document containing the actual RBAC-based access control data for a commercial banking application. Based on this DTD, the XML document and the methods in the Document Object Model (DOM) API Level 1.0 standards, we describe three application tasks related to enterprise-wide implementation of RBAC. They are: (a) implementing an RBAC model for a database application, (b) implementing RBAC models with identical data on two different database servers, and (c) transforming data under an RBAC model to a different, but structurally similar model like a Group-based Access Control model. Other potential Access Control Service applications exploiting the capabilities of some commercial XML processors are also outlined.
TL;DR: Author-χ implements a discretionary access control model specifically tailored to the characteristics of XML documents, which allows a set-oriented and single-oriented document protection and a differentiated protection of document/document type contents.
Abstract: Author-χ is a Java-based system for access control to XML documents. Author-χ implements a discretionary access control model specifically tailored to the characteristics of XML documents. In particular, our system allows (i) a set-oriented and single-oriented document protection, by supporting authorizations both at document type and document level; (ii) a differentiated protection of document/document type contents by supporting multi-granularity protection objects and positive/negative authorizations; (iii) a controlled propagation of authorizations among protection objects, by enforcing multiple propagation options.
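Positive/negative authorizations with propagation can be illustrated by a small Python sketch that computes the view of a document a user is allowed to see. It is not Author-χ's algorithm (the system itself is Java-based): the function `compute_view`, the path-keyed `rules` dictionary, and the policy "the most specific rule wins, otherwise the sign propagates from the parent" are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

def compute_view(root, rules, default=False):
    """Return a pruned copy of the tree; rules map element paths to True/False."""
    def visit(elem, path, inherited):
        here = f"{path}/{elem.tag}"
        # A rule on this exact path overrides the sign propagated from above.
        allowed = rules.get(here, inherited)
        kept = [v for v in (visit(c, here, allowed) for c in elem) if v is not None]
        if not allowed and not kept:
            return None  # nothing visible in this subtree
        # A denied element with visible descendants is kept as an empty shell.
        copy = ET.Element(elem.tag, dict(elem.attrib) if allowed else {})
        copy.text = elem.text if allowed else None
        copy.extend(kept)
        return copy

    return visit(root, "", default)

doc = ET.fromstring(
    "<hospital><patient><name>Ann</name>"
    "<diagnosis>flu</diagnosis></patient></hospital>"
)
rules = {
    "/hospital": True,                     # positive authorization, propagates down
    "/hospital/patient/diagnosis": False,  # negative authorization on one portion
}
view = ET.tostring(compute_view(doc, rules), encoding="unicode")
print(view)
```

Keeping denied ancestors as empty shells preserves the document structure around permitted content; a real system would choose among several propagation options rather than hard-coding one.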
TL;DR: The use of UML class diagrams for modeling XML vocabularies and generating XML Schemas from the UML is described.
Abstract: The tools and processes used to design XML vocabularies (DTD or XML Schema) are generally different from those used for application design using UML. In addition, large XML vocabularies are often difficult to understand and communicate with business users. This research summary describes the use of UML class diagrams for modeling XML vocabularies and generating XML Schemas from the UML.
TL;DR: In this paper, an XML query is parsed and transformed into a language-neutral intermediate representation, which is then translated into an SQL query over the underlying relational tables and into instructions for a tagger.
Abstract: A method for publishing relational data as XML by translating XML queries into queries against a relational database. Conversion of the relational database into an XML database is not required. Each relational table is mapped to a virtual XML document, and XML queries are issued over these virtual documents. An XML query is parsed and transformed into a language-neutral intermediate representation, which is a sequence of operations describing how the output document is derived from the underlying relational tables. The intermediate representation is then translated into an SQL query over the underlying relational tables and into instructions for a tagger. The SQL query is executed, and the SQL query results are then fed into the tagger, which follows tagger instructions to generate the marked up output.
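The last two steps of this pipeline, executing the SQL query and feeding its results to a tagger, can be sketched in Python. The query parsing and intermediate representation are omitted; the hand-written SQL statement stands in for a translated XML query, and the function `tag_rows`, the table `emp`, and the tag names are assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

def tag_rows(rows, columns, root_tag, row_tag):
    """Tagger: wrap each result row in markup, one child element per column."""
    root = ET.Element(root_tag)
    for row in rows:
        item = ET.SubElement(root, row_tag)
        for name, value in zip(columns, row):
            if value is not None:          # NULL columns produce no element
                ET.SubElement(item, name).text = str(value)
    return ET.tostring(root, encoding="unicode")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("Ann", "R&D", 120), ("Bob", "Ops", 90)],
)

# Stand-in for the SQL produced from an XML query over the virtual document.
cursor = conn.execute(
    "SELECT name, dept FROM emp WHERE salary > ? ORDER BY name", (100,)
)
columns = [d[0] for d in cursor.description]
xml_out = tag_rows(cursor.fetchall(), columns, "employees", "employee")
print(xml_out)
```

The relational data is never materialized as XML; only the rows selected by the query pass through the tagger, which also takes care of escaping characters such as `&`.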