TL;DR: This paper compares two data interchange formats currently used by industry applications, XML and JSON, and finds that JSON is significantly faster than XML; it also records other resource-related metrics in the results.
Abstract: This paper compares two data interchange formats currently used by industry applications: XML and JSON. The choice of an adequate data interchange format can have significant consequences on data transmission rates and performance. We describe the language specifications and their respective setting of use. A case study is then conducted to compare the resource utilization and the relative performance of applications that use the interchange formats. We find that JSON is significantly faster than XML and we further record other resource-related metrics in our results.
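As a rough illustration of the kind of comparison described above, the following sketch serializes one record in both formats and times parsing with the Python standard library. The record, its field names, and the iteration count are assumptions made for this example; it is not the case study from the paper, and timings will vary by machine and parser.

```python
import json
import timeit
import xml.etree.ElementTree as ET

# Hypothetical record, used only for illustration.
record = {"id": 42, "name": "sensor-a", "reading": 21.5}

def to_json(rec):
    return json.dumps(rec)

def to_xml(rec):
    # One child element per field; values are written as text.
    root = ET.Element("record")
    for key, value in rec.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")

def from_json(text):
    return json.loads(text)

def from_xml(text):
    # This XML encoding carries no type information, so values come back as strings.
    return {child.tag: child.text for child in ET.fromstring(text)}

json_text = to_json(record)
xml_text = to_xml(record)

if __name__ == "__main__":
    print("bytes:", len(json_text), "(JSON)", len(xml_text), "(XML)")
    print("JSON parse s:", timeit.timeit(lambda: from_json(json_text), number=1000))
    print("XML parse s:", timeit.timeit(lambda: from_xml(xml_text), number=1000))
```

The sketch measures only message size and parse time for a single tiny record; the resource metrics mentioned in the abstract would need a fuller harness.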
TL;DR: This paper designs novel formulae to identify the search for nodes and search via nodes of a query, and presents a novel XML TF*IDF ranking strategy to rank the individual matches of all possible search intentions.
Abstract: Inspired by the great success of information retrieval (IR) style keyword search on the web, keyword search on XML has emerged recently. The differences between text databases and XML databases result in three new challenges: (1) Identify the user search intention, i.e. identify the XML node types that the user wants to search for and search via. (2) Resolve keyword ambiguity problems: a keyword can appear as both a tag name and a text value of some node; a keyword can appear as the text values of different XML node types and carry different meanings. (3) As the search results are sub-trees of the XML document, a new scoring function is needed to estimate their relevance to a given query. However, existing methods cannot resolve these challenges, and thus return low-quality results in terms of query relevance. In this paper, we propose an IR-style approach which utilizes the statistics of the underlying XML data to address these challenges. We first propose specific guidelines that a search engine should meet in both search intention identification and relevance oriented ranking for search results. Then, based on these guidelines, we design novel formulae to identify the search for nodes and search via nodes of a query, and present a novel XML TF*IDF ranking strategy to rank the individual matches of all possible search intentions. Lastly, the proposed techniques are implemented in an XML keyword search engine called XReal, and extensive experiments show the effectiveness of our approach.
TL;DR: This survey summarizes the requirements for rule interchange languages for applications in the legal domain, uses these requirements to evaluate RuleML, SBVR, SWRL and RIF, and presents the Legal Knowledge Interchange Format (LKIF).
Abstract: In this survey paper we summarize the requirements for rule interchange languages for applications in the legal domain and use these requirements to evaluate RuleML, SBVR, SWRL and RIF. We also present the Legal Knowledge Interchange Format (LKIF), a new rule interchange format developed specifically for applications in the legal domain.
TL;DR: This paper provides a tutorial on current security standards for XML and Web services and discusses standards including XML Signature, XML Encryption, the XML Key Management Specification (XKMS), WS-Security, WS-Trust, WS-SecureConversation, Web Services Policy, and the Security Assertion Markup Language (SAML).
Abstract: XML and Web services are widely used in current distributed systems. The security of the XML based communication, and the Web services themselves, is of great importance to the overall security of these systems. Furthermore, in order to facilitate interoperability, the security mechanisms should preferably be based on established standards. In this paper we provide a tutorial on current security standards for XML and Web services. The discussed standards include XML Signature, XML Encryption, the XML Key Management Specification (XKMS), WS-Security, WS-Trust, WS-SecureConversation, Web Services Policy, WS-SecurityPolicy, the eXtensible Access Control Markup Language (XACML), and the Security Assertion Markup Language (SAML).
TL;DR: This paper provides an overview of XML similarity/comparison, presenting existing research on XML similarity and detailing the possible applications of XML comparison processes in various fields, ranging over data warehousing, data integration, classification/clustering and XML querying.
TL;DR: This paper proposes a novel labeling scheme called DDE (for Dynamic DEwey), tailored for both static and dynamic XML documents, which can completely avoid re-labeling and whose label quality is most resilient to the number and order of insertions compared to the existing approaches.
Abstract: Labeling schemes lie at the core of query processing for many XML database management systems. Designing labeling schemes for dynamic XML documents is an important problem that has received a lot of research attention. Existing dynamic labeling schemes, however, often sacrifice query performance and introduce additional labeling cost to facilitate arbitrary updates even when the documents actually seldom get updated. Since the line between static and dynamic XML documents is often blurred in practice, we believe it is important to design a labeling scheme that is compact and efficient regardless of whether the documents are frequently updated or not. In this paper, we propose a novel labeling scheme called DDE (for Dynamic DEwey) which is tailored for both static and dynamic XML documents. For static documents, the labels of DDE are the same as those of dewey which yield compact size and high query performance. When updates take place, DDE can completely avoid re-labeling and its label quality is most resilient to the number and order of insertions compared to the existing approaches. In addition, we introduce Compact DDE (CDDE) which is designed to optimize the performance of DDE for insertions. Both DDE and CDDE can be incorporated into existing systems and applications that are based on dewey labeling scheme with minimum efforts. Experiment results demonstrate the benefits of our proposed labeling schemes over the previous approaches.
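The abstract notes that, for static documents, DDE labels coincide with plain Dewey labels. The sketch below illustrates only that static Dewey case: each node's label is its parent's label plus its sibling position, so ancestor tests reduce to prefix checks. The function names and the sample document are assumptions for this example, and the update handling that distinguishes DDE is not implemented here.

```python
import xml.etree.ElementTree as ET

def dewey_labels(root):
    """Map Dewey labels (tuples of ints) to tag names; the root is labeled (1,)."""
    labels = {}

    def visit(node, label):
        labels[label] = node.tag
        # The k-th child extends its parent's label with k.
        for k, child in enumerate(node, start=1):
            visit(child, label + (k,))

    visit(root, (1,))
    return labels

def is_ancestor(a, b):
    """a is a proper ancestor of b exactly when a is a proper prefix of b."""
    return len(a) < len(b) and b[:len(a)] == a

# Hypothetical sample document.
doc = ET.fromstring("<book><title/><chapter><section/><section/></chapter></book>")
labels = dewey_labels(doc)
```

Because the labels are tuples, sorting them lexicographically also yields document order, which is one reason Dewey-style labels are convenient for query processing.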
TL;DR: This article focuses on the verification of temporal properties of runs of Active XML systems, specified in a tree-pattern-based temporal logic, Tree-LTL, which allows expressing a rich class of semantic properties of the application.
Abstract: Active XML is a high-level specification language tailored to data-intensive, distributed, dynamic Web services. Active XML is based on XML documents with embedded function calls. The state of a document evolves depending on the result of internal function calls (local computations) or external ones (interactions with users or other services). Function calls return documents that may be active, and so may activate new subtasks. The focus of this article is on the verification of temporal properties of runs of Active XML systems, specified in a tree-pattern-based temporal logic, Tree-LTL, which allows expressing a rich class of semantic properties of the application. The main results establish the boundary of decidability and the complexity of automatic verification of Tree-LTL properties.
TL;DR: In this article, a semiconductor device and a method for fabricating the same, which can maintain a constant threshold voltage despite decreased channel width, are disclosed. The device includes first and second conductive type wells in a substrate, first and second gate insulating films on the wells, a first gate electrode doped with a second conductive type except for edges in a channel width direction that are counter doped with a first conductive type, a second gate electrode doped with a first conductive type except for edges that are counter doped with a second conductive type, and isolating regions formed between the wells, the gate insulating films, and the gate electrodes.
Abstract: A semiconductor device and a method for fabricating the same are disclosed, which can maintain a constant threshold voltage despite decreased channel width, the device including first and second conductive type wells in a substrate, first and second gate insulating films on the first and the second conductive type wells, a first gate electrode on the first gate insulating film, the first gate electrode being doped with a second conductive type except for edges of the first gate electrode in a channel width direction, which are counter doped with a first conductive type, a second gate electrode on the second gate insulating film, the second gate electrode being doped with a first conductive type except for edges of the second gate electrode in a channel width direction, which are counter doped with a second conductive type, and isolating regions formed between the first and second conductive type wells, the first and second gate insulating films, and the first and second gate electrodes.
TL;DR: An attempt has been made to evaluate the quality of XML schema documents (XSD) written in W3C XML Schema language with a metric, which measures the complexity due to the internal architecture of XSD components, and due to recursion.
Abstract: The eXtensible Markup Language (XML) has been gaining extraordinary acceptance from many diverse enterprise software companies for their object repositories, data interchange, and development tools. Further, many different domains, organizations and content providers have been publishing and exchanging information via the internet using XML and standard schemas. Efficient implementation of XML in these domains requires well designed XML schemas. From this point of view, the design of XML schemas plays an extremely important role in the software development process and needs to be quantified for ease of maintainability. In this paper, an attempt has been made to evaluate the quality of XML schema documents (XSD) written in the W3C XML Schema language. We propose a metric which measures the complexity due to the internal architecture of XSD components, and due to recursion. This single metric covers all major factors responsible for the complexity of XSD. The metric has been empirically and theoretically validated, demonstrated with examples, and supported by comparison with other well known structure metrics applied to XML schema documents.
TL;DR: In this article, the authors describe techniques for XML (Extensible Markup Language) web feeds for web access of remote resources and present a method for obtaining information regarding one or more available resources from one or multiple resource hosts, rendering the information regarding available resources into an XML document, and providing the XML document to a user device.
Abstract: Techniques for XML (Extensible Markup Language) web feeds for web access of remote resources are described. In one embodiment, a method includes obtaining information regarding one or more available resources from one or more resource hosts, rendering the information regarding one or more available resources into an Extensible Markup Language (XML) document, and providing the XML document to a user device.
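The "render resource information into an XML document" step can be pictured with a small sketch. The element names (`resources`, `resource`, `name`, `type`), the `host` attribute, and the sample data are all hypothetical choices for this example; the abstract does not specify a schema.

```python
import xml.etree.ElementTree as ET

def render_feed(resources):
    """Render a list of resource descriptions into an XML document string."""
    feed = ET.Element("resources")
    for res in resources:
        # Each resource records the host that serves it as an attribute.
        item = ET.SubElement(feed, "resource", host=res["host"])
        ET.SubElement(item, "name").text = res["name"]
        ET.SubElement(item, "type").text = res["type"]
    return ET.tostring(feed, encoding="unicode")

# Hypothetical resource data obtained from two hosts.
available = [
    {"host": "host-1.example", "name": "Spreadsheet", "type": "application"},
    {"host": "host-2.example", "name": "Desktop", "type": "desktop"},
]
feed_xml = render_feed(available)
```

A user device would then fetch `feed_xml` and parse it to list the available resources.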
TL;DR: This paper presents some of the research topics on XML, namely: XML on relational databases, query processing, views, data matching, and schema evolution, and summarizes some (some!) of the most relevant or traditional papers on those subjects.
Abstract: XML has been explored by both research and industry communities. More than 5500 papers were published on different aspects of XML. With so many publications, it is hard for someone to decide where to start. Hence, this paper presents some of the research topics on XML, namely: XML on relational databases, query processing, views, data matching, and schema evolution. It then summarizes some (some!) of the most relevant or traditional papers on those subjects.
TL;DR: This work proposes a "pure hardware" based solution, which utilizes XPath query blocks on FPGA to solve the filtering problem and achieves drastically better throughput than the existing software or mixed (hardware/software) architectures.
Abstract: The growing amount of XML encoded data exchanged over the Internet increases the importance of XML based publish-subscribe (pub-sub) and content based routing systems. The input in such systems typically consists of a stream of XML documents and a set of user subscriptions expressed as XML queries. The pub-sub system then filters the published documents and passes them to the subscribers. Pub-sub systems are characterized by very high input ratios, therefore the processing time is critical. In this paper we propose a "pure hardware" based solution, which utilizes XPath query blocks on FPGA to solve the filtering problem. By utilizing the high throughput that an FPGA provides for parallel processing, our approach achieves drastically better throughput than the existing software or mixed (hardware/software) architectures. The XPath queries (subscriptions) are translated to regular expressions which are then mapped to FPGA devices. By introducing stacks within the FPGA we are able to express and process a wide range of path queries very efficiently, on a scalable environment. Moreover, the fact that the parser and the filter processing are performed on the same FPGA chip eliminates expensive communication costs (that a multi-core system would need), thus enabling very fast and efficient pipelining. Our experimental evaluation reveals more than one order of magnitude improvement compared to traditional pub/sub systems.
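The idea of translating XPath subscriptions to regular expressions can be sketched in software. The example below handles only a restricted XPath subset (child `/`, descendant `//`, name tests, and `*`, with no predicates or namespaces) and matches the resulting regex against root-to-node path strings. This is an assumption-laden illustration of the translation step only; it does not model the FPGA mapping or the stack-based processing described in the abstract.

```python
import re
import xml.etree.ElementTree as ET

def xpath_to_regex(xpath):
    """Translate a restricted XPath into a regex over paths such as '/a/b/c'."""
    parts = []
    for tok in re.findall(r"//|/|[^/]+", xpath):
        if tok == "//":
            parts.append(r"(?:/[^/]+)*/")   # zero or more intermediate steps
        elif tok == "/":
            parts.append("/")
        elif tok == "*":
            parts.append(r"[^/]+")          # any single element name
        else:
            parts.append(re.escape(tok))
    return re.compile("".join(parts))

def element_paths(root):
    """List the root-to-node path string of every element in the document."""
    paths = []

    def visit(node, prefix):
        path = prefix + "/" + node.tag
        paths.append(path)
        for child in node:
            visit(child, path)

    visit(root, "")
    return paths

def matches(doc_text, xpath):
    """Report whether any element of the document satisfies the subscription."""
    pattern = xpath_to_regex(xpath)
    return any(pattern.fullmatch(p) for p in element_paths(ET.fromstring(doc_text)))
```

A filter built this way would forward a published document to a subscriber whenever `matches(document, subscription)` is true.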
TL;DR: TwigX-Guide is presented, a hybrid system, which takes advantage of the beautiful features of path summary in DataGuide and region encoding in TwigStack to improve complex query processing.
TL;DR: This paper analyzes how each approach to labeling schemes works, as well as its advantages and disadvantages, and discusses some of the current trends in labeling methods, which indicate a clear shift towards hybrid approaches.
Abstract: With the rapid emergence of XML as a data exchange and data transfer medium over the Web, querying XML data has become a major concern. Labeling schemes have been developed to optimize query retrieval, since they provide a quick way to determine the type of relationships that are present among the nodes. In this paper, we analyze how each approach works, as well as its advantages and disadvantages. In addition, we discuss some of the current trends in labeling methods, which indicate a clear shift towards hybrid approaches. Hybrid systems open the possibility of balancing one technology’s weakness with another technology’s strengths.
TL;DR: A novel method is developed, called SAIL, to index such structural relationships embedded in XML documents to facilitate the processing of keyword queries and devise structure-aware indices to maintain the structural relationships for efficiently identifying the minimal-cost trees.
TL;DR: To facilitate intuitionistic view selection, a view graph is presented to structurally maintain all generated views and two view selection strategies are proposed, targeting at space-optimized and space-time tradeoff, respectively.
Abstract: Materialized views, an RDBMS silver bullet, demonstrate their efficacy in many applications, especially as a data warehousing/decision support system tool. The pivot of using materialized views efficiently is view selection. Though studied for over thirty years in RDBMS, the selection is hard to make in the context of XML databases, where both the semi-structured data and the expressiveness of XML query languages add challenges to the view selection problem. We start our discussion on producing minimal XML views (in terms of size) as candidates for a given workload (a query set). To facilitate intuitionistic view selection, we present a view graph (called vcube) to structurally maintain all generated views. By basing our selection on vcube for materialization, we propose two view selection strategies, targeting space-optimized and space-time tradeoff, respectively. We built our implementation on top of Berkeley DB XML, demonstrating that significant performance improvement could be obtained using our proposed approaches.
TL;DR: This work proposes an automatic query refinement method to transform a keyword query into structured XML queries that capture the original information need and conform to the underlying XML data.
Abstract: The structural heterogeneity and complexity of XML repositories makes query formulation challenging for users who have little knowledge of XML. To assist its users, an XML retrieval system can have a keyword-based interface, relegating the task of combining textual and structural clues to the retrieval algorithm. In this work, we propose an automatic query refinement method to transform a keyword query into structured XML queries that capture the original information need and conform to the underlying XML data. We formulate query generation as a search problem, and show the effectiveness of the method in generating accurate content-and-structure queries.
TL;DR: This paper addresses the problem of efficiently locating relevant XML documents in a P2P network, where a user poses queries in a language such as XPath, and develops a new system called psiX that runs on top of an existing distributed hashing framework.
Abstract: One of the key challenges in a peer-to-peer (P2P) network is to efficiently locate relevant data sources across a large number of participating peers. With the increasing popularity of the extensible markup language (XML) as a standard for information interchange on the Internet, XML is commonly used as an underlying data model for P2P applications to deal with the heterogeneity of data and enhance the expressiveness of queries. In this paper, we address the problem of efficiently locating relevant XML documents in a P2P network, where a user poses queries in a language such as XPath. We have developed a new system called psiX that runs on top of an existing distributed hashing framework. Under the psiX system, each XML document is mapped into an algebraic signature that captures the structural summary of the document. An XML query pattern is also mapped into a signature. The query's signature is used to locate relevant document signatures. Our signature scheme supports holistic processing of query patterns without breaking them into multiple path queries and processing them individually. The participating peers in the network collectively maintain a collection of distributed hierarchical indexes for the document signatures. Value indexes are built to handle numeric and textual values in XML documents. These indexes are used to process queries with value predicates. Our experimental study on PlanetLab demonstrates that psiX provides an efficient location service in a P2P network for a wide variety of XML documents.
TL;DR: In this paper, a novel method is proposed, called S^3, which can selectively process the document's nodes and substantially outperform previous QTP processing methods w.r.t. response time, I/O overhead, and memory consumption - critical parameters in any real multi-user environment.
Abstract: XML queries are frequently based on path expressions where their elements are connected to each other in a tree-pattern structure, called query tree pattern (QTP). Therefore, a key operation in XML query processing is finding those elements which match the given QTP. In this paper, we propose a novel method, called S^3, which can selectively process the document's nodes. In S^3, unlike all previous methods, path expressions are not directly executed on the XML document, but first they are evaluated against a guidance structure, called QueryGuide. Enriched by information extracted from the QueryGuide, a query execution plan, called SMP, is generated to provide focused pattern matching and avoid document access as far as possible. Moreover, our experimental results confirm that S^3 and its optimized version OS^3 substantially outperform previous QTP processing methods w.r.t. response time, I/O overhead, and memory consumption - critical parameters in any real multi-user environment.
TL;DR: This work has put into practice this approach mapping the XBRL filings available from the SEC’s EDGAR program to Resource Description Framework (RDF) and the XML Schema taxonomies these filings are based on to Web Ontology Language (OWL).
Abstract: The XML Business Reporting Language (XBRL) is a standard for business and financial information reporting. It is based on XML, so instance documents based on XBRL, e.g. a quarterly report, are highly constrained by the XML document-oriented nature. This makes it more difficult to perform queries that mix information from filings from different dates, companies, or accounting principles than with a formalism based on a graph model instead of a tree model. Semantic Web technologies provide a graph model that facilitates mashing up different XBRL sources. We have put this approach into practice by mapping the XBRL filings available from the SEC’s EDGAR program to Resource Description Framework (RDF) and the XML Schema taxonomies these filings are based on to Web Ontology Language (OWL). The resulting semantic metadata, though highly tied to the XML structure it is mapped from, benefits from Semantic Web technologies and tools in order to facilitate integration and cross-querying, even together with other parts of the Web of Linked Data.
TL;DR: This paper evaluates the efficient access of rich internet applications, especially in the domain of resource limited embedded devices such as digital picture frames, and proposes a generic adaption of the EXI format to further increase the efficiency.
Abstract: The tremendous acceptance of web applications, or more specifically, rich media applications, is about to be extended to embedded devices such as mobile phones, digital picture frames or TV sets. The Extensible Markup Language (XML) is one important pillar when we deal with such internet applications. XML is known as the interchange language of the web. Besides its outstanding features, in the domain of embedded devices, XML is difficult to handle due to the processing overhead and the verbosity associated with its use. This paper evaluates the efficient access of rich internet applications, especially in the domain of resource limited embedded devices such as digital picture frames. For this purpose, typical XML source information models for rich internet applications, such as Silverlight and SVG, are evaluated. In this context, the new Efficient XML Interchange (EXI) format is applied and studied. Finally, a generic adaption of the EXI format is developed to further increase the efficiency. The paper concludes with the proposal for further studies on an integration of EXI-based typed interfaces to reduce the processing complexity for rich media applications on embedded devices.
TL;DR: An approach to extract Tree-based association rules from XML documents that provide approximate, intensional information on both the structure and the content of XML documents, and can be stored in XML format to be queried later on is described.
Abstract: The increasing amount of very large XML datasets available to casual users is a most challenging problem for our community, and calls for an appropriate support to efficiently gather knowledge from these data. Data mining, already widely applied to extract frequent correlations of values from both structured and semi-structured datasets, is the appropriate tool for knowledge elicitation. In this work we describe an approach to extract Tree-based association rules from XML documents. Such rules provide approximate, intensional information on both the structure and the content of XML documents, and can be stored in XML format to be queried later on. The mined knowledge is used to provide: (i) quick, approximate answers to queries and (ii) information about structural regularities. A prototype system demonstrates the effectiveness of the approach.
TL;DR: This paper designs an efficient index and proposes a hash-based method to answer SLCA-based keyword search queries; the approach outperforms the Incremental Multiway-SLCA approach, which is the most efficient algorithm in the literature.
Abstract: XML is a de-facto standard for exchanging and presenting information, and keyword search over XML documents has become an interesting topic. However, semi-structured XML data give rise to many challenges for conventional information retrieval technologies. In order to return highly-related data nodes and improve the quality of keyword search results, SLCA (Smallest Lowest Common Ancestor)-based keyword search on XML data has recently been attracting more and more attention in the database community. In this paper, we design an efficient index and propose a hash-based method to answer SLCA-based keyword search queries. Our approach outperforms the Incremental Multiway-SLCA approach, which is the most efficient algorithm in the literature. We demonstrate the effectiveness of our algorithms analytically and experimentally.
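To make the SLCA notion concrete: an SLCA node is one whose subtree contains every query keyword while none of its descendants' subtrees does. The sketch below computes SLCA nodes by brute-force traversal and reports them as Dewey-style labels. It is a definitional illustration only, not the index or hash-based method from the paper; matching keywords against tag names and whitespace-split element text is an assumption of this example.

```python
import xml.etree.ElementTree as ET

def slca(root, keywords):
    """Return labels of SLCA nodes for a non-empty keyword list (brute force)."""
    wanted = {k.lower() for k in keywords}
    result = []

    def visit(node, label):
        # Keywords matched at this node: its tag name and its text tokens.
        found = ({node.tag.lower()} | set((node.text or "").lower().split())) & wanted
        child_covers = False
        for k, child in enumerate(node, start=1):
            child_found, covered = visit(child, label + (k,))
            found |= child_found
            child_covers = child_covers or covered
        covers = found == wanted
        # Keep the node only if no child subtree already covers all keywords.
        if covers and not child_covers:
            result.append(label)
        return found, covers

    visit(root, (1,))
    return result

# Hypothetical sample data.
bib = ET.fromstring(
    "<bib>"
    "<book><title>XML search</title><author>Lee</author></book>"
    "<book><title>JSON basics</title><author>Lee</author></book>"
    "</bib>"
)
```

For example, `slca(bib, ["xml", "lee"])` selects the first `book` element, while `slca(bib, ["xml", "json"])` falls back to the root because no single book mentions both.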
TL;DR: It is shown that the interplay of XML Signature, XPath, and the XML namespace concept has severe flaws that can be exploited for an attack, and that XML namespaces in general pose real troubles to digital signatures in the XML domain.
Abstract: The XML signature wrapping attack is one of the most discussed security issues of the Web Services security community during the last years. Until now, the issue has not been solved, and all countermeasure approaches proposed so far were shown to be insufficient. In this paper, we present yet another way to perform signature wrapping attacks by using the XML namespace injection technique. We show that the interplay of XML Signature, XPath, and the XML namespace concept has severe flaws that can be exploited for an attack, and that XML namespaces in general pose real troubles to digital signatures in the XML domain. Additionally, we present and discuss some new approaches in countering the proposed attack vector.
TL;DR: In this paper, a method and apparatus for building and using a persistent XML tree index for navigating an XML document is described, which is stored separately from the XML document content, and thus is able to optimize performance through the use of fixed-sized index entries.
Abstract: A method and apparatus are provided for building and using a persistent XML tree index for navigating an XML document. The XML tree index is stored separately from the XML document content, and thus is able to optimize performance through the use of fixed-sized index entries. The XML document hierarchy need not be constructed in volatile memory, so creating and using the XML tree index scales even for large documents. To evaluate a path expression including descendent or ancestral syntax, navigation links can be read from persistent storage and used directly to find the nodes specified in the path expression. The use of an abstract navigational interface allows applications to be written that are independent of the storage implementation of the index and the content. Thus, the XML tree index can index documents stored at least in a database, a persistent file system, or as a sequence of in memory.
TL;DR: Through empirical evaluation, it is shown that ParDOM yields better scalability than PXP on commodity multicore processors, and can process a wide-variety of XML datasets with complex structures which PXP fails to parse.
Abstract: The extensible markup language XML has become the de facto standard for information representation and interchange on the Internet. XML parsing is a core operation performed on an XML document for it to be accessed and manipulated. This operation is known to cause performance bottlenecks in applications and systems that process large volumes of XML data. We believe that parallelism is a natural way to boost performance. Leveraging multicore processors can offer a cost-effective solution, because future multicore processors will support hundreds of cores, and will offer a high degree of parallelism in hardware. We propose a data parallel algorithm called ParDOM for XML DOM parsing, that builds an in-memory tree structure for an XML document. ParDOM has two phases. In the first phase, an XML document is partitioned into chunks and parsed in parallel. In the second phase, partial DOM node tree structures created during the first phase, are linked together (in parallel) to build a complete DOM node tree. ParDOM offers fine-grained parallelism by adopting a flexible chunking scheme --- each chunk can contain an arbitrary number of start and end XML tags that are not necessarily matched. ParDOM can be conveniently implemented using a data parallel programming model that supports map and sort operations. Through empirical evaluation, we show that ParDOM yields better scalability than PXP [23] --- a recently proposed parallel DOM parsing algorithm --- on commodity multicore processors. Furthermore, ParDOM can process a wide-variety of XML datasets with complex structures which PXP fails to parse.
TL;DR: This paper designs an adaptive XML keyword search approach, called XBridge, that can derive the semantics of a keyword query and generate a set of effective structured queries by analyzing the given keywords and the schemas of XML data sources.
Abstract: Recently, keyword search has attracted a great deal of attention in XML databases. It is hard to directly improve the relevancy of XML keyword search because lots of keyword-matched nodes may not contribute to the results. To address this challenge, in this paper we design an adaptive XML keyword search approach, called XBridge, that can derive the semantics of a keyword query and generate a set of effective structured queries by analyzing the given keyword query and the schemas of XML data sources. To efficiently answer a keyword query, we only need to evaluate the generated structured queries over the XML data sources with any existing XQuery search engine. In addition, we extend our approach to process top-k keyword search based on the execution plan to be proposed. The quality of the returned answers can be measured using the context of the keyword-matched nodes and the contents of the nodes together. The effectiveness and efficiency of XBridge are demonstrated with an experimental performance study on real XML data.
TL;DR: This work investigates XML keys which uniquely identify XML elements based on a very general notion of value-equality: isomorphic subtrees with the identity on data values, and establishes a sound and complete set of inference rules for this expressive fragment of XML keys.
Abstract: Constraints are important for a variety of XML recommendations and applications. Consequently, there are numerous opportunities for advancing the treatment of XML semantics. In particular, suitable notions of keys will enhance XML's capabilities of modeling, managing and processing native XML data. However, the different ways of accessing and comparing XML elements make it challenging to balance expressiveness and tractability. We investigate XML keys which uniquely identify XML elements based on a very general notion of value-equality: isomorphic subtrees with the identity on data values. Previously, an XML key fragment has been recognised that is robust in the sense that its implication problem can be expressed as the reachability problem in a suitable digraph. We analyse the impact of extending this fragment by structural keys that uniquely identify XML elements independently of any data. We establish a sound and complete set of inference rules for this expressive fragment of XML keys, and encode these rules in an algorithm that decides the associated implication problem in time quadratic in the size of the input keys. Consequently, we gain significant expressiveness without any loss of efficiency in comparison to less expressive XML key fragments.
TL;DR: This work has adapted the Hadoop implementation to determine the threshold data sizes and computation work required per node, for a distributed solution to be effective and presents both a parallel and distributed approach to analyze how the scalability and performance requirements of large-scale XML-based data processing can be achieved.
Abstract: An emerging trend is the use of XML as the data format for many distributed scientific applications, with the size of these documents ranging from tens of megabytes to hundreds of megabytes. Our earlier benchmarking results revealed that most of the widely available XML processing toolkits do not scale well for large sized XML data. A significant transformation is necessary in the design of XML processing for scientific applications so that the overall application turn-around time is not negatively affected. We present both a parallel and distributed approach to analyze how the scalability and performance requirements of large-scale XML-based data processing can be achieved. We have adapted the Hadoop implementation to determine the threshold data sizes and computation work required per node, for a distributed solution to be effective. We also present an analysis of parallelism using our Piximal toolkit for processing large-scale XML datasets that utilizes the capabilities for parallelism that are available in the emerging multi-core architectures. Multi-core processors are expected to be widely available in research clusters and scientific desktops, and it is critical to harness the opportunities for parallelism in the middleware, instead of passing on the task to application programmers. Our parallelization approach for a multi-core node is to employ a DFA-based parser that recognizes a useful subset of the XML specification, and convert the DFA into an NFA that can be applied to an arbitrary subset of the input. Speculative NFAs are scheduled on available cores in a node to effectively utilize the processing capabilities and achieve overall performance gains. We evaluate the efficacy of this approach in terms of potential speedup that can be achieved for representative XML data sets.
TL;DR: This paper studies a new information-access paradigm for XML data, called "Inks," in which the system searches the underlying data "on the fly" as the user types in query keywords, and proposes indices, early-termination techniques, and search algorithms to achieve a high interactive speed.
Abstract: In a traditional keyword-search system over XML data, a user composes a keyword query, submits it to the system, and retrieves relevant subtrees. In the case where the user has limited knowledge about the data, often the user feels "left in the dark" when issuing queries, and has to use a try-and-see approach for finding information. In this paper, we study a new information-access paradigm for XML data, called "Inks," in which the system searches on the underlying data "on the fly" as the user types in query keywords. Inks extends existing XML keyword search methods by interactively answering queries. We propose effective indices, early-termination techniques, and efficient search algorithms to achieve a high interactive speed. We have implemented our algorithm, and the experimental results show that our method achieves high search efficiency and result quality.
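The "search as the user types" behavior can be pictured as prefix lookup over indexed text tokens. The sketch below builds a sorted token list from element text and uses binary search to find all tokens that start with the partial keyword typed so far. This is a deliberately simple illustration under assumed names and sample data; it does not reproduce the indices, early-termination techniques, or ranking proposed in the paper.

```python
import bisect
import xml.etree.ElementTree as ET

def build_index(root):
    """Return (sorted tokens, token -> set of Dewey-style labels of matching elements)."""
    index = {}

    def visit(node, label):
        for token in (node.text or "").lower().split():
            index.setdefault(token, set()).add(label)
        for k, child in enumerate(node, start=1):
            visit(child, label + (k,))

    visit(root, (1,))
    return sorted(index), index

def prefix_lookup(tokens, index, prefix):
    """Collect elements containing any token that starts with the typed prefix."""
    prefix = prefix.lower()
    hits = set()
    # Tokens sharing a prefix are contiguous in sorted order, starting here.
    start = bisect.bisect_left(tokens, prefix)
    for token in tokens[start:]:
        if not token.startswith(prefix):
            break
        hits |= index[token]
    return hits

# Hypothetical sample data.
library = ET.fromstring(
    "<bib>"
    "<book><title>XML search</title></book>"
    "<book><title>XQuery basics</title></book>"
    "</bib>"
)
tokens, index = build_index(library)
```

Each additional keystroke narrows the result: typing "x" matches both titles, while "xq" matches only the second.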