TL;DR: This survey reviews two major classes of XML query processing techniques for twig pattern matching, the relational approach and the native approach, and observes that integrating the two could yield higher query processing performance and significantly reduce system reengineering costs.
Abstract: Extensible markup language (XML) is emerging as a de facto standard for information exchange among various applications on the World Wide Web. There has been a growing need for developing high-performance techniques to query large XML data repositories efficiently. One important problem in XML query processing is twig pattern matching, that is, finding in an XML data tree D all matches that satisfy a specified twig (or path) query pattern Q. In this survey, we review, classify, and compare major techniques for twig pattern matching. Specifically, we consider two classes of major XML query processing techniques: the relational approach and the native approach. The relational approach directly utilizes existing relational database systems to store and query XML data, which enables the use of all important techniques that have been developed for relational databases, whereas in the native approach, specialized storage and query processing systems tailored for XML data are developed from scratch to further improve XML query performance. As implied by existing work, XML data querying and management are developing in the direction of integrating the relational approach with the native approach, which could result in higher query processing performance and also significantly reduce system reengineering costs.
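To make the twig pattern matching problem concrete, here is a minimal sketch in Python. It assumes a region (start, end, level) labeling of the data tree and a naive nested-loop containment test; the names `Region`, `region_labels`, `match_path`, and `match_twig` are hypothetical, and this is not one of the surveyed algorithms, which avoid the quadratic scan used here.

```python
import xml.etree.ElementTree as ET
from itertools import count
from typing import NamedTuple


class Region(NamedTuple):
    tag: str
    start: int   # counter value when the node is entered
    end: int     # counter value when the node is left
    level: int   # depth in the tree, root = 0


def region_labels(root):
    """Label every element with a (start, end, level) region, in document order."""
    counter = count(1)
    out = []

    def visit(node, level):
        start = next(counter)
        idx = len(out)
        out.append(None)                 # reserve the preorder slot
        for child in node:
            visit(child, level + 1)
        out[idx] = Region(node.tag, start, next(counter), level)

    visit(root, 0)
    return out


def contains(a, d):
    """True if region a is a proper ancestor of region d."""
    return a.start < d.start and d.end < a.end


def match_path(labels, anc_tag, desc_tag):
    """All (ancestor.start, descendant.start) pairs matching anc_tag//desc_tag."""
    return [(a.start, d.start)
            for a in labels if a.tag == anc_tag
            for d in labels if d.tag == desc_tag and contains(a, d)]


def match_twig(labels, root_tag, branch_tags):
    """Start positions of root_tag nodes having a descendant for every branch tag."""
    hits = []
    for r in labels:
        if r.tag != root_tag:
            continue
        if all(any(d.tag == t and contains(r, d) for d in labels)
               for t in branch_tags):
            hits.append(r.start)
    return hits
```

With labels of this kind, an ancestor-descendant test reduces to two integer comparisons, which is why such labels can be stored in ordinary relational tables as well as in native stores.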
TL;DR: A device control system comprising at least one device operable by the system, a processor, and two or more communication components, each of which includes an XML parser for parsing the XML document and extracting the message data.
Abstract: A device control system including at least one device operable by the system, at least one processor, software executing on the at least one processor for receiving message data and determining a corresponding XML document type, software executing on the at least one processor for generating a XML document based on the XML document type, the XML document including the message data, software executing on the processor for packetizing the XML document, and two or more communication components, each communication component including an XML parser for parsing the XML document and extracting the message data.
TL;DR: This paper evaluates existing range-based and prefix-based labeling schemes and then proposes its own scheme based on DeweyIDs; it experimentally explores the scheme's suitability as a general and immutable node labeling mechanism, stresses its synergetic potential for query processing and locking, and shows how it can be implemented efficiently.
Abstract: We explore suitable node labeling schemes used in collaborative XML DBMSs (XDBMSs, for short) supporting typical XML document processing interfaces. Such schemes have to provide holistic support for essential XDBMS processing steps for declarative as well as navigational query processing and, with the same importance, lock management. In this paper, we evaluate existing range-based and prefix-based labeling schemes, before we propose our own scheme based on DeweyIDs. We experimentally explore its suitability as a general and immutable node labeling mechanism, stress its synergetic potential for query processing and locking, and show how it can be implemented efficiently. Various compression and optimization measures deliver surprising space reductions, frequently reducing the size of the storage representation (compared to an already space-efficient encoding scheme) to less than 20-30% on average, which underlines their practical relevance.
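The appeal of prefix-based labels for both query processing and locking can be illustrated with a small Python sketch. It assumes plain Dewey labels built from child ordinals; the gap and compression techniques that make the paper's DeweyIDs immutable under insertion are deliberately omitted, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def dewey_labels(root):
    """Map each element to a Dewey label: the tuple of child ordinals from the root."""
    out = {}

    def visit(node, label):
        out[label] = node.tag
        for i, child in enumerate(node, start=1):
            visit(child, label + (i,))

    visit(root, (1,))
    return out


def is_ancestor(a, d):
    """a is a proper ancestor of d exactly when a is a proper prefix of d."""
    return len(a) < len(d) and d[:len(a)] == a


def ancestors(label):
    """All ancestor labels, root first, derived from the label alone.

    A lock manager could use this path to place intention locks
    without touching the stored document.
    """
    return [label[:k] for k in range(1, len(label))]


def document_order(labels):
    """Tuple comparison on Dewey labels yields document (preorder) order."""
    return sorted(labels)
```

Because the parent, the level, and the whole ancestor path are all encoded in the label itself, the checks above need no access to the document.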
TL;DR: In this article, an application program interface (API) is provided for requesting, storing, and accessing data within a health integration network, which facilitates secure and seamless access to the centrally-stored data by offering authentication/authorization, as well as the ability to receive requests in an extensible language format, such as XML, and returns resulting data in XML format.
Abstract: An application program interface (API) is provided for requesting, storing, and otherwise accessing data within a health integration network. The API facilitates secure and seamless access to the centrally-stored data by offering authentication/authorization, as well as the ability to receive requests in an extensible language format, such as XML, and returns resulting data in XML format. The data can also have transformation, style and/or schema information associated with it which can be returned in the resulting XML and/or applied to the data beforehand by the API. The API can be utilized in many environment architectures including XML over HTTP and a software development kit (SDK).
TL;DR: The ability to compute XML key propagation is a first step toward establishing a connection between XML data and its relational representation at the semantic level.
TL;DR: The core part of the paper describes the infrastructural services for XML document storage with compressed DeweyIDs, the principles and methods for navigational and declarative processing of queries, as well as the lock modes and protocols to enable efficient collaboration.
Abstract: Implementation techniques for relational database management systems (DBMSs) have proven their efficiency and robustness in many existing systems. However, many of these concepts and mechanisms cannot be used when implementing a native XML DBMS (XDBMS) because of substantial differences in the processing properties of natively stored XML documents as compared to relational tables. Therefore, we have to develop new and appropriate techniques with ACID transaction guarantees tailored to the processing characteristics of tree documents and the operations on them. For this reason, we want to provide for an efficient infrastructure of XDBMSs consisting of tree node addressing and indexing together with fine-grained locking of tree nodes. In this respect, our prime and novel contribution is to reveal the potential of our prefix-based node labeling called DeweyIDs supporting record addressing, indexing, and locking protocols. In this paper, we first sketch our version of prefix-based node labeling and summarize a quantitative study on them. An overview of our layered XDBMS architecture indicates the concepts and functionalities to be reused from relational DBMS implementations. The core part of the paper describes the infrastructural services for XML document storage with compressed DeweyIDs, the principles and methods for navigational and declarative processing of queries, as well as the lock modes and protocols to enable efficient collaboration. Selected empirical experiments evaluate the XTC system performance and support our system assessment.
TL;DR: This paper proposes the Probabilistic Interval XML (PIXML for short) data model, and provides an operational semantics that may be used to compute answers to queries and that is correct for a large class of probabilistic instances.
Abstract: Interest in XML databases has been expanding rapidly over the last few years. In this paper, we study the problem of incorporating probabilistic information into XML databases. We propose the Probabilistic Interval XML (PIXML for short) data model in this paper. Using this data model, users can express probabilistic information within XML markups. In addition, we provide two alternative formal model-theoretic semantics for PIXML data. The first semantics is a “global” semantics which is relatively intuitive, but is not directly amenable to computation. The second semantics is a “local” semantics which supports efficient computation. We prove several correspondence results between the two semantics. To our knowledge, this is the first formal model theoretic semantics for probabilistic interval XML. We then provide an operational semantics that may be used to compute answers to queries and that is correct for a large class of probabilistic instances.
TL;DR: Reference-based SQL/XML operators return a reference to a node instead of a copy; the reference is used to determine whether the corresponding node comes logically before, after, or is the same as another node.
Abstract: Techniques for processing reference-based SQL/XML operators are provided. Instead of extracting copies of one or more nodes from XML data, a reference-based operator returns a reference to a node. Such a reference is used to determine, for example, whether the corresponding node comes logically before, after, or is the same as another node. An SQL/XML query that includes a reference-based operator may be the original query, or may be generated (e.g., rewritten) from a non-SQL/XML query, such as an XQuery query. One or more physical rewrites may be performed on the SQL/XML query, depending on how the XML data is stored and/or whether an XML index exists for the XML data.
TL;DR: In this article, computer-implemented methods and computer-readable storage media are disclosed for facilitating browser-based, what-you-see-is-what-you-get (WYSIWYG) editing of an extensible markup language (XML) file.
Abstract: Computer-implemented methods and computer-readable storage media are disclosed for facilitating browser-based, what-you-see-is-what-you-get (WYSIWYG) editing of an extensible markup language (XML) file. A browser executing on a local computing system is used to access a hypertext markup language (HTML) representation of an extensible markup language (XML) file. The HTML representation includes a plurality of elements of the XML file formatted in accordance with an extensible stylesheet language (XSL) transform associated with the XML file. A plurality of editing handlers is inserted within the HTML representation to facilitate modifying the HTML representation and applying the changes to the XML file. A user is permitted to modify the HTML representation for purposes of applying the modifications to the XML file.
TL;DR: A stealing-based dynamic load-balancing mechanism, called ThreadCrew, by which multiple threads are able to process disjoint parts of the XML document in parallel with balanced load distribution, and a novel mechanism to trace the stealing actions is provided.
Abstract: A language for semi-structured documents, XML has emerged as the core of the web services architecture, and is playing crucial roles in messaging systems, databases, and document processing. However, the processing of XML documents has been regarded as the performance bottleneck in most systems and applications. On the other hand, the multicore processor, which emerged as a solution to the clock-speed limitation of modern CPUs, has become increasingly prevalent. Leveraging the parallelism provided by multicore resources to speed up software execution is becoming the trend in software development. In this paper, we present a parallel processing model for XML documents. The model is not designed for one specific XML processing task; instead, it is a general model with which we can explore various kinds of parallel XML document processing. The kernel of the model is a stealing-based dynamic load-balancing mechanism, called ThreadCrew, by which multiple threads are able to process disjoint parts of the XML document in parallel with balanced load distribution. The model also provides a novel mechanism to trace the stealing actions, so that the equivalent sequential result can be obtained by gluing the multiple parallel-running results together. To show the feasibility and effectiveness of our approach, we present our C# implementation of parallel XML serialization. Our empirical study shows that our parallel XML serialization algorithm can significantly improve XML serialization performance on a multicore machine.
TL;DR: ViST as mentioned in this paper is a novel index structure for searching XML documents that uses tree structures as the basic unit of query to avoid expensive join operations, and provides a unified index on both content and structure of the XML documents, hence it has a performance advantage over indexing either just content or structure.
Abstract: The present invention provides a ViST (or “virtual suffix tree”), which is a novel index structure for searching XML documents. By representing both XML documents and XML queries in structure-encoded sequences, it is shown that querying XML data is equivalent to finding (non-contiguous) subsequence matches. A variety of XML queries, including those with branches, or wild-cards (‘*’ and ‘//’), can be expressed by structure-encoded sequences. Unlike index methods that disassemble a query into multiple sub-queries, and then join the results of these sub-queries to provide the final answers, ViST uses tree structures as the basic unit of query to avoid expensive join operations. Furthermore, ViST provides a unified index on both content and structure of the XML documents, hence it has a performance advantage over methods indexing either just content or structure. ViST supports dynamic index update, and it relies solely on B+Trees without using any specialized data structures that are not well supported by common database management systems (hereinafter referred to as “DBMSs”).
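The idea of reducing a structural query to a subsequence match can be sketched as follows. The abstract does not spell out the encoding, so this Python sketch assumes one: a preorder list of (tag, root-to-parent path) pairs. The names `encode` and `is_subsequence` are hypothetical, wildcards are not handled, and a plain subsequence test like this one can report false positives on branching queries, so it illustrates the principle only and is not the ViST index.

```python
import xml.etree.ElementTree as ET


def encode(root):
    """Assumed encoding: preorder sequence of (tag, path of ancestor tags)."""
    seq = []

    def visit(node, prefix):
        seq.append((node.tag, prefix))
        for child in node:
            visit(child, prefix + (node.tag,))

    visit(root, ())
    return seq


def is_subsequence(query, doc):
    """True if query occurs in doc as a (possibly non-contiguous) subsequence."""
    it = iter(doc)
    # `item in it` advances the iterator, so matches must appear in order.
    return all(item in it for item in query)
```

A document and a query are encoded the same way, and the query matches when `is_subsequence(encode(query_root), encode(doc_root))` holds; no per-path sub-queries are joined.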
TL;DR: A system for providing XML-based asynchronous and interactive feeds for Web applications that provides a highly efficient and extensible XML Javascript framework allowing easy insertion of a comment/news feed control into any Web page as discussed by the authors.
Abstract: A system for providing XML-based asynchronous and interactive feeds for Web applications that provides a highly efficient and extensible XML Javascript framework allowing easy insertion of a comment/news feed control into any Web page. The framework allows for reading of any XML format and provides a new and easy way for modifying the look-and-feel of the control via HTML templates with familiar XPath bindings. The rendering performed through the system supports both flat and indented (“threaded”) views for a comment thread. The system improves the parsing speed of incoming XML, and supports a flexible event model for others to develop plug-ins and mashups in the spirit of Web 2.0.
TL;DR: This paper presents a three-phase framework for high-performance XML-to-XML transformation based on schema mappings, and elaborate on novel techniques such as streamed extraction of mapped source values and scalable disk-based merging of overlapping data.
Abstract: Clio is an existing schema-mapping tool that provides user-friendly means to manage and facilitate the complex task of transformation and integration of heterogeneous data such as XML over the Web or in XML databases. By means of mappings from source to target schemas, Clio can help users conveniently establish the precise semantics of data transformation and integration. In this paper we study the problem of how to efficiently implement such data transformation (i.e., generating target data from the source data based on schema mappings). We present a three-phase framework for high-performance XML-to-XML transformation based on schema mappings, and discuss methodologies and algorithms for implementing these phases. In particular, we elaborate on novel techniques such as streamed extraction of mapped source values and scalable disk-based merging of overlapping data (including duplicate elimination). We compare our transformation framework with alternative methods such as using XQuery or SQL/XML provided by current commercial databases. The results demonstrate that the three-phase framework (although as simple as it is) is highly scalable and outperforms the alternative methods by orders of magnitude.
TL;DR: This paper presents a new storage scheme for XML data that supports all navigational operations in near constant time, and features a small memory footprint that increases cache locality, whilst still supporting standard APIs and necessary database operations, such as queries and updates, efficiently.
Abstract: As XML database sizes grow, the amount of space used for storing the data and auxiliary data structures becomes a major factor in query and update performance. This paper presents a new storage scheme for XML data that supports all navigational operations in near constant time. In addition to supporting efficient queries, the space requirement of the proposed scheme is within a constant factor of the information theoretic minimum, while insertions and deletions can be performed in near constant time as well. As a result, the proposed structure features a small memory footprint that increases cache locality, whilst still supporting standard APIs, such as DOM, and necessary database operations, such as queries and updates, efficiently. Analysis and experiments show that the proposed structure is space and time efficient.
TL;DR: A taxonomy of changes for XML schema evolution is described and guidelines for writing queries in such a way that they continue to operate as expected across evolving schemas are proposed.
Abstract: In XML databases, new schema versions may be released as frequently as once every two weeks. This poster describes a taxonomy of changes for XML schema evolution. It examines the impact of those changes on schema validation and query evaluation. Based on that study, it proposes guidelines for XML schema evolution and for writing queries in such a way that they continue to operate as expected across evolving schemas.
TL;DR: XACU, a language for specifying access control on XML data in the presence of update operations, is proposed and a formal access control model is defined that allows the study of properties of XACU access policies.
Abstract: Several languages have been proposed over the past years which support the specification of access control on XML data. Most of these languages consider read-access restrictions only and do not deal with access rights for updates (such as add, delete, or modify operations). Fine-grained XML update operations are the subject of current research. This paper proposes XACU, a language for specifying access control on XML data in the presence of update operations. The update operations used in XACU are based on the W3C XQuery Update Facility working draft. A formal access control model is defined which allows the study of properties of XACU access policies. One essential property is consistency: the policy should not allow the execution of a sequence of updates which has the same total effect as an update forbidden by the policy. Since XACU is a rich language with inherent ambiguities, checking consistency of a set of XACU rules is difficult, and undecidable in general.
TL;DR: XSAGs are the first scalable query language for XML streams that allows for actual data transformations rather than just document filtering and the XSAG formalism provides a strong intuition for which queries can or cannot be processed scalably on streams.
Abstract: We introduce the notion of XML Stream Attribute Grammars (XSAGs). XSAGs are the first scalable query language for XML streams (running strictly in linear time with bounded memory consumption independent of the size of the stream) that allows for actual data transformations rather than just document filtering. XSAGs are also relatively easy to use for humans. Moreover, the XSAG formalism provides a strong intuition for which queries can or cannot be processed scalably on streams. We introduce XSAGs together with the necessary language-theoretic machinery, study their theoretical properties such as expressiveness and complexity, and discuss their implementation.
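What "transformation rather than filtering" means on a stream can be shown with a single-pass SAX handler in Python. This is not an XSAG and makes no claim about the XSAG formalism; it is a hand-written handler, with the hypothetical name `TitleUpper`, that keeps one boolean of state and rewrites the `title` elements of an assumed input format as they go by.

```python
import xml.sax


class TitleUpper(xml.sax.ContentHandler):
    """Rewrite each <title>text</title> as <t>TEXT</t>; drop everything else."""

    def __init__(self):
        super().__init__()
        self.in_title = False   # the only parser-side state: one flag
        self.out = []           # stands in for an output sink such as a socket

    def startElement(self, name, attrs):
        if name == "title":
            self.in_title = True
            self.out.append("<t>")

    def characters(self, content):
        # May be called several times per text node; each chunk is emitted at once.
        if self.in_title:
            self.out.append(content.upper())

    def endElement(self, name):
        if name == "title":
            self.in_title = False
            self.out.append("</t>")


def transform(xml_bytes):
    """Run the handler over a byte string and return the emitted output."""
    handler = TitleUpper()
    xml.sax.parseString(xml_bytes, handler)
    return "".join(handler.out)
```

The handler never buffers the input, so its working state does not grow with the length of the stream; only the collected output does, and a real deployment would write that to a sink instead of a list.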
TL;DR: It is shown how the system can be used to tackle various two-level transformation scenarios, such as XML schema evolution coupled with document migration, and hierarchical-relational data mappings that convert between XML documents and SQL databases.
Abstract: A two-level data transformation consists of a type-level transformation of a data format coupled with value-level transformations of data instances corresponding to that format. We have implemented a system for performing two-level transformations on XML schemas and their corresponding documents, and on SQL schemas and the databases that they describe. The core of the system consists of a combinator library for composing type-changing rewrite rules that preserve structural information and referential constraints. We discuss the implementation of the system's core library, and of its SQL and XML front-ends in the functional language Haskell. We show how the system can be used to tackle various two-level transformation scenarios, such as XML schema evolution coupled with document migration, and hierarchical-relational data mappings that convert between XML documents and SQL databases.
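TL;DR: A transition system and an extensible markup language (XML) representation of the data are generated, and labels for the transition system are produced by querying the XML representation with a (markup) query language; the labels and the transition system structure then serve as input to model checking.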
Abstract: The invention concerns model program analysis of software code using model checking. Initially, a transition system (22) and an extensible markup language (XML) (24) representation of the data is generated. Next, labels (26) for the transition system are generated by querying the XML representation of the data using (markup) query language. The labels and the structure of the transition system are then used as input to model checking techniques to analyse the software code (28). It is an advantage of the invention that the problem of labelling a transition system can be transformed into the XML domain so that detailed information about the software code can be extracted using queries in a format that can be run in the XML domain which are well known. At the same time the transformation to the XML domain does not prevent the use of efficient model checking technologies.
TL;DR: This paper presents a hybrid schema match algorithm, QMatch, that provides a unique path-based framework for harnessing traditional structural and semantic information, while exploiting the constraints inherent in XML documents such as the order of XML elements, to provide improved levels of matching between two given XML schemata.
Abstract: Integration of multiple heterogeneous data sources continues to be a critical problem for many application domains and a challenge for researchers world-wide. With the increasing popularity of the XML model and the proliferation of XML documents on-line, automated matching of XML documents and databases has become a critical issue. In this paper, we present a hybrid schema match algorithm, QMatch, that provides a unique path-based framework for harnessing traditional structural and semantic information, while exploiting the constraints inherent in XML documents such as the order of XML elements, to provide improved levels of matching between two given XML schemata. QMatch is based on the measurement of a unique quality of match metric, QoM, and a set of classifiers which together provide not only an effective basis for the development of a new schema match algorithm, but also a useful tool for tuning existing schema match algorithms to output at desired levels of matching. In this paper, we show via a set of experiments the benefits of the path-based QMatch over existing structural, linguistic, and hybrid algorithms such as Cupid, and provide an empirical measure of the accuracy of QMatch in terms of the true matches discovered by the algorithm.
TL;DR: This paper proposes a mild condition on SPJ views, and shows that under this condition the analysis of deletions on relational views becomes PTIME while the insertion analysis is NP-complete, and presents efficient algorithms to translate XML updates to relational view updates.
Abstract: This paper investigates the view update problem for XML views published from relational data. We consider (possibly) recursively defined XML views, compressed into DAGs and stored in relations. We provide new techniques to efficiently support XML view updates specified in terms of XPath expressions with recursion and complex filters. The interaction between XPath recursion and DAG compression of XML views makes the analysis of XML view updates intriguing. Furthermore, many issues are still open even for relational view updates, and need to be explored. In response to these, we revise the update semantics to accommodate XML side effects based on the semantics of XML views, and present efficient algorithms to translate XML updates to relational view updates. Moreover, we propose a mild condition on SPJ views, and show that under this condition the analysis of deletions on relational views becomes PTIME while the insertion analysis is NP-complete. Finally, we present an experimental study to verify the effectiveness of our techniques.
TL;DR: This paper proposes a wireless XML streaming method designed to provide energy-efficient access to a wireless stream, and designs three data/index replication strategies (PP, TT, and TP) in the streaming method.
TL;DR: This seminar summarizes the main ideas and results of the research, which investigates the use of several structural features, namely markup and (derived) metadata, for effective XML retrieval.
Abstract: The structure of documents provides a new source of information that retrieval systems may exploit to improve their search effectiveness. This seminar summarizes the main ideas and results of our research, which investigates the use of several structural features, namely markup and (derived) metadata, for effective XML retrieval. Our retrieval framework is based on the principle of polyrepresentation (Ingwersen, 1994) and makes use of the available evidence collected from documents, queries, and contextual features to rank components of XML documents. We will present our approaches on three main topics: (1) new retrieval strategies that use structural information, (2) the use of relevance feedback techniques to refine the structural information given a user need, and (3) the study of the relationships between user search tasks and contextual factors and the structural characteristics of the relevant information. We evaluate these approaches using the INEX benchmark and show that structural information can be further exploited to improve retrieval effectiveness.
TL;DR: This paper shows how this fine-grained access control mechanism for XML data can be integrated with a next-of-kin (NoK) XML query processor to provide efficient, secure query evaluation.
Abstract: Fine-grained access controls for XML define access privileges at the granularity of individual XML nodes. In this paper, we present a fine-grained access control mechanism for XML data. This mechanism exploits the structural locality of access rights as well as correlations among the access rights of different users to produce a compact physical encoding of the access control data. This encoding can be constructed using a single pass over a labeled XML database. It is block-oriented and suitable for use in secondary storage. We show how this access control mechanism can be integrated with a next-of-kin (NoK) XML query processor to provide efficient, secure query evaluation. The key idea is that the structural information of the nodes and their encoded access controls are stored together, allowing the access privileges to be checked efficiently. Our evaluation shows that the access control mechanism introduces little overhead into the query evaluation process.
TL;DR: ‘tree tuple’ and ‘closest node’ XFDs both capture the semantics of FDs when a complete relation is mapped to an XML document via arbitrary nesting, and so there is essentially a common definition of an XFD in complete XML documents.
Abstract: With the growing use of the eXtensible Markup Language (XML) in database technology as a format for the permanent storage of data, the topic of functional dependencies in XML (XFDs) has assumed increased importance because of its central role in database design. Recently, two different approaches have been proposed for defining an XFD. The first uses the concept of a ‘tree tuple’, whereas the second uses the concept of a ‘closest node’. In general, the two approaches are not comparable, but are comparable when a Document Type Definition is present and there is no missing information in the XML document. The first contribution of this article shows that when the two XFD definitions are comparable, the definitions are equivalent, and so there is essentially a common definition of an XFD in complete XML documents. The second contribution is to provide justification for the definition of a ‘closest node’ XFD. We show that if a complete flat relation is mapped to an XML document by an arbitrary sequence of nest operations, the XML document satisfies a ‘closest node’ XFD if and only if the relation satisfies the corresponding functional dependency. The class of XML documents generated in this fashion is a subset of the class of XML documents for which the two definitions of XFDs coincide. Hence ‘tree tuple’ and ‘closest node’ XFDs both capture the semantics of FDs when a complete relation is mapped to an XML document via arbitrary nesting.
TL;DR: An algorithm for relocating partitioned XML data based on the CPU load of query processing and it is found that there is a performance advantage in the approach for executing distributed query processing of large XML data.
Abstract: We propose an efficient distributed query processing method for large XML data by partitioning and distributing XML data to multiple computation nodes. There are several steps involved in this method; however, we focused particularly on XML data partitioning and dynamic relocation of partitioned XML data in our research. Since the efficiency of query processing depends on both XML data size and its structure, these factors should be considered when XML data is partitioned. Each partitioned XML data is distributed to computation nodes so that the CPU load can be balanced. In addition, it is important to take account of the query workload among each of the computation nodes because it is closely related to the query processing cost in distributed environments. In case of load skew among computation nodes, partitioned XML data should be relocated to balance the CPU load. Thus, we implemented an algorithm for relocating partitioned XML data based on the CPU load of query processing. From our experiments, we found that there is a performance advantage in our approach for executing distributed query processing of large XML data.
TL;DR: This paper presents a novel algorithm called DTD-Diff to detect changes to the DTDs that define the structure of a set of XML documents, and shows that converting DTD to XML schema (XSD) and detecting the changes using existing XML change detection algorithms is not a feasible option.
Abstract: The DTD of a set of XML documents may change for many reasons, such as changes to real-world events, changes to the user's requirements, and mistakes in the initial design. In this paper, we present a novel algorithm called DTD-Diff to detect changes to the DTDs that define the structure of a set of XML documents. Such a change detection tool can be useful in several ways, such as maintenance of XML documents, incremental maintenance of relational schema for storing XML data, and XML schema integration. We compare DTD-Diff with existing XML change detection approaches and show that converting DTD to XML schema (XSD) (which is in XML document format) and detecting the changes using existing XML change detection algorithms is not a feasible option. Our experimental results show that DTD-Diff is 5-325 times faster than X-Diff when it detects the changes to the XSD files. Compared to XyDiff, DTD-Diff is up to 38 times faster. We also study the result quality of detected deltas.
TL;DR: An algorithm for measuring the structural similarity between an XML document and a Document Type Definition (DTD) considered as the simplest way for specifying structural constraints on XML documents is proposed.
Abstract: The automatic processing and management of XML-based data are ever more popular research issues due to the increasingly abundant use of XML, especially on the Web. Nonetheless, several operations based on the structure of XML data have not yet received strong attention. Among these is the process of matching XML documents with XML grammars, useful in various applications such as document classification, retrieval and selective dissemination of information. In this paper, we propose an algorithm for measuring the structural similarity between an XML document and a Document Type Definition (DTD), considered as the simplest way of specifying structural constraints on XML documents. We consider the various DTD operators that designate constraints on the existence, repeatability and alternativeness of XML elements/attributes. Our approach is based on the concept of tree edit distance, as an effective and efficient means for comparing tree structures, XML documents and DTDs being modeled as ordered labeled trees. It is of polynomial complexity, in comparison with existing exponential algorithms. Classification experiments, conducted on large sets of real and synthetic XML documents, underline our approach's effectiveness, as well as its applicability to large XML repositories and databases.
TL;DR: ReXSA proposes candidate database schemas given an information model of the enterprise data that has the advantage of considering qualitative properties of the information model such as reuse, evolution and performance profiles for deciding how to persist the data.
Abstract: In response to the widespread use of the XML format for document representation and message exchange, major database vendors support XML in terms of persistence, querying and indexing. Specifically, the recently released IBM DB2 9 (for Linux, Unix and Windows) is a hybrid data server with optimized management of both XML and relational data. With the new option of storing and querying XML in a relational DBMS, data architects face the decision of what portion of their data to persist as XML and what portion as relational data. This problem has not been addressed yet and represents a serious need in the industry. Hence, this paper describes ReXSA, a schema advisor tool that is being prototyped for IBM DB2 9. ReXSA proposes candidate database schemas given an information model of the enterprise data. It has the advantage of considering qualitative properties of the information model such as reuse, evolution and performance profiles for deciding how to persist the data. Finally, we show the viability and practicality of ReXSA by applying it to custom and real use cases.
TL;DR: This paper presents the logic-programming language XCentric, discusses design issues, and shows its adequacy for XML processing; its form of unification combined with types yields a substantial degree of flexibility in programming.
Abstract: Here we present the logic-programming language XCentric, discuss design issues, and show its adequacy for XML processing. Distinctive features of XCentric are a powerful unification algorithm for terms with functors of arbitrary arity (which correspond closely to XML documents) and a rich type language that uses operators such as repetition (*), alternation, etc, as types allowing a compact representation of terms with functors with an arbitrary number of arguments (closely related to standard type languages for XML). This new form of unification together with an appropriate use of types yields a substantial degree of flexibility in programming.