TL;DR: In this paper, the user enters a query and the system processes the query to generate an alternative representation, which includes conceptual-level abstraction and representations based on complex nominals (CNs), proper nouns (PNs), single terms, text structure, and logical make-up of the query, including mandatory terms.
Abstract: Techniques for generating sophisticated representations of the contents of both queries and documents in a retrieval system by using natural language processing (NLP) techniques to represent, index, and retrieve texts at the multiple levels (e.g., the morphological, lexical, syntactic, semantic, discourse, and pragmatic levels) at which humans construe meaning in writing. The user enters a query and the system processes the query to generate an alternative representation, which includes conceptual-level abstraction and representations based on complex nominals (CNs), proper nouns (PNs), single terms, text structure, and logical make-up of the query, including mandatory terms. After processing the query, the system displays query information to the user, indicating the system's interpretation and representation of the content of the query. The user is then given an opportunity to provide input, in response to which the system modifies the alternative representation of the query. Once the user has provided desired input, the possibly modified representation of the query is matched to the relevant document database, and measures of relevance generated for the documents. A set of documents is presented to the user, who is given an opportunity to select some or all of the documents, typically on the basis of such documents being of particular relevance. The user then initiates the generation of a query representation based on the alternative representations of the selected document(s).
TL;DR: In this paper, a document retrieval system (20) where a user can enter a query, including a natural query, in a desired one of a plurality of supported languages, and retrieve documents from a database (60) that includes documents in at least one other language of the supported languages.
Abstract: A document retrieval system (20) where a user can enter a query, including a natural query, in a desired one of a plurality of supported languages, and retrieve documents from a database (60) that includes documents in at least one other language of the plurality of supported languages. The user need not have any knowledge of the other languages. Each document in the database is subjected to a set of processing steps to generate a language-independent conceptual representation of the subject content of the document. The query is also subjected to a (possibly different) set of processing steps to generate a language-independent conceptual representation of the subject content of the query. Documents are matched to queries based on the conceptual-level contents of the document and query, and, optionally, on the basis of the term-based representation.
TL;DR: In this article, the user enters a query and the system processes the query to generate an alternative representation, which includes conceptual-level abstraction and representations based on complex nominals (CNs), proper nouns (PNs), single terms, text structure, and logical make-up of the query, including mandatory terms.
Abstract: Techniques for generating sophisticated representations of the contents of both queries and documents in a retrieval system by using natural language processing (NLP) techniques to represent, index, and retrieve texts at the multiple levels (e.g., the morphological, lexical, syntactic, semantic, discourse, and pragmatic levels) at which humans construe meaning in writing. The user enters a query and the system processes the query to generate an alternative representation, which includes conceptual-level abstraction and representations based on complex nominals (CNs), proper nouns (PNs), single terms, text structure, and logical make-up of the query, including mandatory terms. After processing the query, the system displays query information to the user, indicating the system's interpretation and representation of the content of the query. The user is then given an opportunity to provide input, in response to which the system modifies the alternative representation of the query. Once the user has provided desired input, the possibly modified representation of the query is matched to the relevant document database, and measures of relevance generated for the documents. A set of documents is presented to the user, who is given an opportunity to select some or all of the documents, typically on the basis of such documents being of particular relevance. The user then initiates the generation of a query representation based on the alternative representations of the selected document(s).
TL;DR: In this paper, a system, method, and various software products provide improved information retrieval performance from multiple document databases by retrieving from the multiple document databases, in response to a user query, a set of documents that globally satisfy the query, even though each database maintains independent document indices, term frequency information, and scoring functions.
Abstract: A system, method, and various software products provide improved information retrieval performance from multiple document databases by retrieving from the multiple document databases in response to a user query, a set of documents that globally satisfy the query, even though each database maintains independent document indices, term frequency information, and scoring functions. The global search result approximates, to any desired degree of error, the search results that would have been obtained had the multiple document databases been globally indexed. This is done by sharing at the time the query is executed, a small subset of information about the local relative significance of terms related to the user's query, and from this information, determining a global relative significance of such terms. From the global relative significance, the individual document databases determine their query results, which are then merged into a global set of documents satisfying the query. The shared local relative significance information may be the inverse document frequency of each of a number of terms related to the query, or it may be the total frequency of each of such terms. The global relative significance may correspondingly be a global inverse document frequency, or a global term frequency from which the global inverse document frequency is calculated.
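The sharing step described above can be sketched roughly as follows: each database reports its collection size and the document frequency of the query terms, and these are merged into a global inverse document frequency. The function name `global_idf`, the data layout, and the formula log(N / df) are assumptions made for illustration; the abstract does not specify them.

```python
import math

def global_idf(local_stats, terms):
    """Combine per-database statistics into a global IDF for each query term.

    local_stats: list of (num_docs, {term: doc_freq}) pairs, one per database
    (a hypothetical layout). Uses idf = log(N / df), one common IDF variant.
    """
    total_docs = sum(n for n, _ in local_stats)
    idf = {}
    for term in terms:
        df = sum(freqs.get(term, 0) for _, freqs in local_stats)
        # Terms that occur in no database get no entry.
        if df > 0:
            idf[term] = math.log(total_docs / df)
    return idf

# Two databases with independent statistics (made-up numbers).
stats = [(1000, {"query": 100, "sequence": 5}), (3000, {"query": 300})]
weights = global_idf(stats, ["query", "sequence", "missing"])
```

Each database could then score its own documents with the shared `weights`, so that locally computed scores become comparable when the result lists are merged.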
TL;DR: In this article, a computer system identifies web pages of interest to a client, which includes a cataloging function which defines a hierarchy of subject categories, logically arranges a multitude of web pages in the categories and periodically adds web pages to the categories.
Abstract: A computer system identifies web pages of interest to a client. The system comprises a cataloging function which defines a hierarchy of subject categories, logically arranges a multitude of web pages in the categories and periodically adds web pages to the categories. The system also comprises a profile building function which receives selections of categories from a user, records the selections and responds with an identification of subcategories of each selected category. The user can make further selections from the subcategories. Alternatively, the user makes a key word search and then selects data web pages from the results of the search. The profile building function records the categories of the data web pages selected by the user. Next, the user requests a list of recently added web pages of interest to the user. In response, the system identifies recently added web pages from the categories previously selected by the user and from the categories of the data web pages previously selected by the user.
TL;DR: In this article, a method for facilitating World Wide Web searches and like database searches by combining search result documents, as provided by separate search engines in response to a query, into one single integrated list so as to produce a single document with a ranked list of pages, by forming a set of selected queries, the queries including respective terms, for which selected queries relevance data from past data is known.
Abstract: A computer-implemented method for facilitating World Wide Web Searches and like database searches by combining search result documents, as provided by separate search engines in response to a query, into one single integrated list so as to produce a single document with a ranked list of pages, by forming a set of selected queries, the queries including respective terms, for which selected queries relevance data from past data is known, herein referred to as training queries, in a vector space comprising all training queries, the relevance data comprising judgments by a user as to whether a page is appropriate for a query which retrieved it. Further steps in the method are identifying a set of k most similar training queries to current query q, computing an average relevant document distribution of the k queries within the training queries' search results for each of the search engines, using the computed relevant document distributions, finding an optimal number of pages to select from the result set of each search engine when N total pages are to be retrieved, and creating a final retrieved set by forming the union of the top λs pages from each search engine.
TL;DR: The issues that were addressed when building the SEQ sequence database system, a database system with support for sequence data, are described and a novel nested design paradigm used in PREDATOR to combine sequence and relational data is presented.
Abstract: This paper discusses the design and implementation of SEQ, a database system with support for sequence data. SEQ models a sequence as an ordered collection of records, and supports a declarative sequence query language based on an algebra of query operators, thereby permitting algebraic query optimization and evaluation. SEQ has been built as a component of the PREDATOR database system that provides support for relational and other kinds of complex data as well. [...] that could describe a wide variety of sequence data, and a query algebra that could be used to represent queries over sequences [SLR95]. We had also observed that sequence query evaluation could benefit greatly from algebraic optimizations that exploited the order information [SLR94]. This paper describes the issues that were addressed when building the SEQ sequence database system based on these ideas. There are three distinct contributions made in this paper. (1) We describe the specification of sequence queries using the SEQUIN query language. (2) We quantitatively demonstrate the importance of various storage and optimization techniques by studying their effect on performance. (3) We present a novel nested design paradigm used in PREDATOR to combine sequence and relational data. SEQ is a component of the PREDATOR multi-threaded, client-server database system which supports sequences, as well as relations and other kinds of complex data. The system uses the SHORE storage manager library [CDF+94] for low-level database functionality like buffer management, concurrency control and recovery. A novel design paradigm provides query processing support for multiple data types, including both sequences and relations. The system implementation has been in progress for more than a year and is currently at approximately 35,000 lines of C++ code (excluding the SHORE libraries).
In this paper, the focus is on the SEQ component which provides the SEQUIN language to specify declarative sequence queries, and an optimization and execution engine to process them. The PREDATOR system is described in detail in [Ses96], and only a high-level overview is presented here.
TL;DR: In this paper, a method implemented on a computer for facilitating World Wide Web Searches and like database searches by combining search result documents, as provided by separate search engines in response to a query, into one single integrated list so as to produce a single document with a ranked list of pages, includes the steps of: (a) training the computer for each search engine by clustering training queries and building cluster centroids; (b) assigning weights to each cluster reflecting the number of relevant pages expected to be obtained by this search engine for queries similar to those in that cluster;
Abstract: A method implemented on a computer for facilitating World Wide Web Searches and like database searches by combining search result documents, as provided by separate search engines in response to a query, into one single integrated list so as to produce a single document with a ranked list of pages, includes the steps of: (a) training the computer for each search engine by clustering training queries and building cluster centroids; (b) assigning weights to each cluster reflecting the number of relevant pages expected to be obtained by this search engine for queries similar to those in that cluster; (c) processing an incoming query by selecting, for each search engine, that cluster centroid that is most similar to the incoming query and returning the weight associated with the selected cluster as the weight of the current search engine; and (d) apportioning the N slots in the retrieved set according to the weights returned by each search engine.
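Steps (c) and (d) above can be sketched as follows. The cosine similarity measure, the sparse-dictionary vectors, and the largest-remainder rounding are assumptions made for illustration; the abstract does not say how similarity is measured or how fractional slots are rounded. All names and numbers are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def engine_weight(query, clusters):
    """Step (c): return the weight of the centroid most similar to the query.

    clusters: list of (centroid_vector, weight) pairs for one search engine.
    """
    _, weight = max(clusters, key=lambda c: cosine(query, c[0]))
    return weight

def apportion(weights, n_slots):
    """Step (d): split n_slots among engines in proportion to their weights."""
    total = sum(weights.values())
    if total == 0:
        return {engine: 0 for engine in weights}
    exact = {e: n_slots * w / total for e, w in weights.items()}
    slots = {e: int(x) for e, x in exact.items()}
    leftover = n_slots - sum(slots.values())
    # Hand remaining slots to the engines with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda e: exact[e] - slots[e], reverse=True)
    for e in by_remainder[:leftover]:
        slots[e] += 1
    return slots

# Hypothetical trained clusters for two engines, "A" and "B".
clusters_a = [({"web": 1.0}, 6.0), ({"database": 1.0}, 1.0)]
clusters_b = [({"web": 1.0}, 3.0), ({"database": 1.0}, 4.0)]
query = {"web": 1.0, "search": 1.0}
weights = {"A": engine_weight(query, clusters_a),
           "B": engine_weight(query, clusters_b)}
slots = apportion(weights, 10)
```

With these made-up weights, engine A contributes 7 of the 10 pages and engine B contributes 3.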
TL;DR: FACT takes a query-centered view of knowledge discovery, in which a discovery request is viewed as a query over the implicit set of possible results supported by a collection of documents, and where background knowledge is used to specify constraints on the desired results of this query process.
Abstract: This paper describes the FACT system for knowledge discovery from text. It discovers associations (patterns of co-occurrence) amongst keywords labeling the items in a collection of textual documents. In addition, FACT is able to use background knowledge about the keywords labeling the documents in its discovery process. FACT takes a query-centered view of knowledge discovery, in which a discovery request is viewed as a query over the implicit set of possible results supported by a collection of documents, and where background knowledge is used to specify constraints on the desired results of this query process. Execution of a knowledge-discovery query is structured so that these background-knowledge constraints can be exploited in the search for possible results. Finally, rather than requiring a user to specify an explicit query expression in the knowledge-discovery query language, FACT presents the user with a simple-to-use graphical interface to the query language, with the language providing a well-defined semantics for the discovery actions performed by a user through the interface.
TL;DR: In this article, a system and method provides for indexing and retrieval of stored documents using a decomposition of words in the documents in n-grams, or linear word subunits.
Abstract: A system and method provides for indexing and retrieval of stored documents using a decomposition of words in the documents into n-grams, or linear word subunits. The documents are indexed as pages in a number of banks. For each bank there is a bank index. The individual n-grams identified for each page are stored in the bank index. Each bank index further contains an entry map that indicates whether a given n-gram is present in any of the pages of the bank, and then provides an index to a page map that further indicates which page in the bank contains the n-gram. When a search query is input, the query words are decomposed into their n-grams. The query word n-grams are compared first with entry maps to determine if the query word n-grams appear on any page in the bank. If so, the associated page map is traversed to determine which page in the bank contains the query word n-grams. The n-grams on the page are compared with the query word n-grams to determine the presence of a match therebetween. Matching pages are flagged. When all pages in all banks have been processed, the pages are consolidated with respect to the documents to which they belong, resulting in a list of documents that match the search query. The results are displayed to a user.
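The bank, entry map, and page map arrangement above can be sketched in a few lines. This is a simplified illustration, not the patented design: it assumes character trigrams, uses a dictionary whose keys stand in for the entry map and whose values stand in for the page maps, and all class and function names are hypothetical.

```python
from collections import defaultdict

def ngrams(word, n=3):
    """Return the set of character n-grams of a word (the word itself if short)."""
    word = word.lower()
    if len(word) <= n:
        return {word}
    return {word[i:i + n] for i in range(len(word) - n + 1)}

class Bank:
    """One bank of pages with its own index."""

    def __init__(self):
        self.page_map = defaultdict(set)  # n-gram -> ids of pages containing it
        self.page_doc = {}                # page id -> owning document id

    def add_page(self, page_id, doc_id, text):
        self.page_doc[page_id] = doc_id
        for word in text.split():
            for gram in ngrams(word):
                self.page_map[gram].add(page_id)

    def matching_pages(self, query_word):
        grams = ngrams(query_word)
        # Entry-map check: skip the whole bank if any query n-gram is absent.
        if any(g not in self.page_map for g in grams):
            return set()
        # Page-map traversal: keep pages that contain every query n-gram.
        return set.intersection(*(self.page_map[g] for g in grams))

def search(banks, query_word):
    """Consolidate matching pages from all banks into a sorted document list."""
    docs = set()
    for bank in banks:
        docs.update(bank.page_doc[p] for p in bank.matching_pages(query_word))
    return sorted(docs)

bank1, bank2 = Bank(), Bank()
bank1.add_page("p1", "d1", "retrieval of documents")
bank1.add_page("p2", "d2", "sequence database")
bank2.add_page("p3", "d1", "document indexing")
hits = search([bank1, bank2], "document")
```

Because the sketch only checks that every query n-gram occurs somewhere on a page, it can flag a page whose n-grams come from different words; a fuller version would add a per-word verification step.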
TL;DR: This study shows that knowledge discovery substantially broadens the spectrum of intelligent query answering and may have deep implications on query answering in data- and knowledge-base systems.
Abstract: Knowledge discovery facilitates querying database knowledge and intelligent query answering in database systems. We investigate the application of discovered knowledge, concept hierarchies, and knowledge discovery tools for intelligent query answering in database systems. A knowledge-rich data model is constructed to incorporate discovered knowledge and knowledge discovery tools. Queries are classified into data queries and knowledge queries. Both types of queries can be answered directly by simple retrieval or intelligently by analyzing the intent of the query and providing generalized, neighborhood or associated information using stored or discovered knowledge. Techniques have been developed for intelligent query answering using discovered knowledge and/or knowledge discovery tools, which include generalization, data summarization, concept clustering, rule discovery, query rewriting, deduction, lazy evaluation, application of multiple-layered databases, etc. Our study shows that knowledge discovery substantially broadens the spectrum of intelligent query answering and may have deep implications on query answering in data- and knowledge-base systems.
TL;DR: While most conceptual query languages are based on the Entity-Relationship approach, ConQuer is based on Object-Role Modeling (ORM), which exposes semantic domains as conceptual object types, thus allowing queries to be formulated in terms of paths through the information space.
Abstract: Relational query languages such as SQL and QBE are less than ideal for end user queries since they require users to work explicitly with structures at the relational level, rather than at the conceptual level where they naturally communicate. ConQuer is a new conceptual query language that allows users to formulate queries naturally in terms of elementary relationships, and operators such as “and”, “not” and “maybe”, thus avoiding the need to deal explicitly with implementation details such as relational tables, null values, and outer joins. While most conceptual query languages are based on the Entity-Relationship approach, ConQuer is based on Object-Role Modeling (ORM), which exposes semantic domains as conceptual object types, thus allowing queries to be formulated in terms of paths through the information space. This paper provides an overview of the ConQuer language.
TL;DR: A differential re-evaluation algorithm (DRA) is proposed, which exploits the structure and information contained in both the query expressions and the database update operations to support efficient processing of continual queries.
Abstract: We define continual queries as a useful tool for monitoring of updated information. Continual queries are standing queries that monitor the source data and notify the users whenever new data matches the query. In addition to periodic refresh, continual queries include Epsilon Transaction concepts to allow users to specify query refresh based on the magnitude of updates. To support efficient processing of continual queries, we propose a differential re-evaluation algorithm (DRA), which exploits the structure and information contained in both the query expressions and the database update operations. The DRA design can be seen as a synthesis of previous research on differential files, incremental view maintenance, and active databases.
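The idea of re-evaluating a standing query against the updates rather than the whole source can be sketched for the simplest case, a selection over one table. The function below is an assumption-laden illustration and not the DRA itself, which the abstract says also exploits the structure of the query expressions; the names and the row format are hypothetical.

```python
def differential_refresh(result, predicate, inserted, deleted):
    """Refresh a cached selection result from update deltas, without a rescan.

    result: set of rows (hashable tuples) that matched at the last refresh.
    inserted, deleted: rows added to or removed from the source since then;
    a modified row is represented as one deletion plus one insertion.
    Returns (new_result, rows_to_notify).
    """
    added = {row for row in inserted if predicate(row)}
    removed = {row for row in deleted if predicate(row)}
    new_result = (result - removed) | added
    # Users are notified only about rows that are new to the result.
    return new_result, added - result

# Standing query: stocks priced above 100 (made-up rows of (symbol, price)).
def expensive(row):
    return row[1] > 100

cached = {("AAA", 120)}
new_result, notify = differential_refresh(
    cached,
    expensive,
    inserted=[("BBB", 130), ("CCC", 90)],
    deleted=[("AAA", 120)],
)
```

The work done is proportional to the size of the deltas, which is the efficiency argument behind differential re-evaluation.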
TL;DR: In this paper, a method for modeling an enterprise so that its policy changes as well as its current components and operations are represented in a database and a method of using a computer to query the database is presented.
Abstract: A method of modeling an enterprise so that its policy changes as well as its current components and operations are represented in a database (11), and a method of using a computer to query the database. The enterprise is modeled using classes of objects and associated methods. During operation, a query about data in the database (11) is received from a user, with the query calling for the use of at least one method to answer the query. The database (11) is accessed to determine whether the method is affected by a policy change, where different policies are represented by policy objects. If so, the user is provided with policy choices. A policy selection is received, and the query is answered, using an implementation of the method based on the policy selection (FIG. 2).
TL;DR: This article develops a strategy to keep users from becoming overwhelmed when formulating ad hoc queries over large conceptual schemata, based on ideas from the information retrieval world, in particular the query by navigation mechanism and the stratified hypermedia architecture.
Abstract: Query formulation in the context of large conceptual schemata is known to be a hard problem. When formulating ad hoc queries users may become overwhelmed by the vast amount of information that is stored in the information system, leading to a feeling of being lost in conceptual space. In this article we develop a strategy to cope with this problem. This strategy is based on ideas from the information retrieval world, in particular the query by navigation mechanism and the stratified hypermedia architecture. The stratified hypermedia architecture is used to describe the information contained in the information system on multiple levels of abstraction. When using our approach to the formulation of queries, a user will first formulate a number of simple queries corresponding to linear paths through the information structure. The formulation of the linear paths is the result of the explorative phase of query formulation. Once users have specified a number of these linear paths, they may combine them to form more complex queries. This last process is referred to as query by construction and corresponds to the constructive phase of the query formulation process.
TL;DR: In a networked information system (such as the NASA Earth Observing System-Data Information System (EOS-DIS)), there are three major obstacles facing users in a querying process: network performance, data volume and data complexity.
Abstract: In a networked information system (such as the NASA Earth Observing System-Data Information System (EOS-DIS)), there are three major obstacles facing users in a querying process: network performance, data volume and data complexity. In order to overcome these obstacles, we propose a two phase approach to query formulation. The two phases are the Query Preview and the Query Refinement. In the Query Preview phase, users formulate an initial query by selecting rough attribute values. The estimated number of matching data sets is shown graphically on preview bars, which allows users to rapidly focus on a manageable number of relevant data sets. Query previews also prevent wasted steps by eliminating zero hit queries. When the estimated number of data sets is low enough, the initial query is submitted to the network, which returns the metadata of the data sets for further refinement in the Query Refinement phase. The two phase approach to query formulation overcomes slow network performance, and reduces the data volume and data complexity problems. This approach is especially appropriate for users who do not have extensive knowledge about the data and who prefer an exploratory method to discover data patterns and exceptions. Using this approach, we have developed dynamic query user interfaces to allow users to formulate their queries across a networked environment.
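The Query Preview phase can be sketched as a lookup in a small table of precomputed counts, so that estimates are available without contacting the network. The table layout, the function names, and the sample attributes below are assumptions for illustration only.

```python
from collections import Counter

def build_preview_table(datasets, attributes):
    """Precompute how many data sets fall under each combination of values."""
    return Counter(tuple(d[a] for a in attributes) for d in datasets)

def preview_count(table, attributes, selection):
    """Estimate the number of matches for a partial selection.

    selection maps an attribute to the set of values the user picked;
    attributes the user has not constrained match every value.
    """
    total = 0
    for key, count in table.items():
        if all(value in selection.get(attr, {value})
               for attr, value in zip(attributes, key)):
            total += count
    return total

# Made-up metadata for three data sets.
attributes = ["year", "region"]
datasets = [
    {"year": 1990, "region": "Africa"},
    {"year": 1990, "region": "Asia"},
    {"year": 1995, "region": "Asia"},
]
table = build_preview_table(datasets, attributes)
asia = preview_count(table, attributes, {"region": {"Asia"}})
asia_1990 = preview_count(table, attributes, {"year": {1990}, "region": {"Asia"}})
zero_hit = preview_count(table, attributes, {"year": {2000}})
```

An interface built on this would draw the counts as preview bars and submit the query for refinement only when the count is nonzero and below some chosen threshold.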
TL;DR: In this article, a method and an apparatus may be implemented in a digital computer to query a set of arbitrarily structured records, which are structured differently from each other, and a query engine, query structure, operators of conventional and non-conventional types may be used in formulating a query.
Abstract: A method and apparatus disclosed may be implemented in a digital computer to query a set of arbitrarily structured records. Arbitrarily structured records are structured differently from each other. A query engine, query structure, operators of conventional and non-conventional types may be used in formulating a query. The apparatus may evaluate records having missing fields, repeating fields, or an UNKNOWN value arising from a missing field, division by zero, modulo by zero, or the like. New aggregator (e.g. universal quantifier and existential quantifier) and selector operators (e.g., first, last, nth) may distill multiple values to return a single value. To evaluate a query, the search engine may implement filtered indices, alternate-key indices, compound alternate-key indices, hybrid queries having both full-text and non-full text operands, and joinder of records. Certain of these features may be implemented for evaluating records from both prior art databases and heterogeneous databases of arbitrarily structured records.
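A few of the behaviors listed above (missing fields, repeating fields, UNKNOWN values, selectors, and quantifiers) can be sketched over records stored as dictionaries. Everything here is a hypothetical illustration: the `UNKNOWN` sentinel, the helper names, and the choice to treat an empty repeating field as vacuously satisfying a universal quantifier are assumptions, not details taken from the abstract.

```python
UNKNOWN = object()  # sentinel for a value that cannot be determined

def field(record, name):
    """Return the values of a field as a list; a missing field gives []."""
    value = record.get(name, [])
    return value if isinstance(value, list) else [value]

def nth(values, n):
    """Selector: distill multiple values to one, or UNKNOWN if there is none."""
    return values[n] if -len(values) <= n < len(values) else UNKNOWN

def divide(a, b):
    """Division that yields UNKNOWN for unknown operands or a zero divisor."""
    if a is UNKNOWN or b is UNKNOWN or b == 0:
        return UNKNOWN
    return a / b

def exists(values, test):
    """Existential quantifier over a repeating field."""
    return any(test(v) for v in values)

def for_all(values, test):
    """Universal quantifier over a repeating field (true when empty)."""
    return all(test(v) for v in values)

# Two records with different structure (made-up data).
rec_a = {"title": "report", "tag": ["draft", "internal"], "pages": 10, "parts": 0}
rec_b = {"title": "memo"}

ratio_a = divide(nth(field(rec_a, "pages"), 0), nth(field(rec_a, "parts"), 0))
ratio_b = divide(nth(field(rec_b, "pages"), 0), 2)
last_tag = nth(field(rec_a, "tag"), -1)
has_draft = exists(field(rec_a, "tag"), lambda t: t == "draft")
```

Here `ratio_a` is UNKNOWN because of division by zero and `ratio_b` is UNKNOWN because the field is missing, so both records can be evaluated by the same query without raising an error.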
TL;DR: In this article, the authors propose a method to evaluate the content of a set of data to determine whether the data set satisfies one or more queries. The method remains fast even when the number of queries is large and/or the queries are complex.
Abstract: The invention enables evaluation of the content of a set of data to determine whether the data set satisfies one or more queries. The invention enables evaluation of large numbers of data sets much more rapidly than has previously been possible, even when the number of queries is large and/or the queries are complex. The queries are evaluated using an execution plan of query terms that is constructed from one or more specified queries by translating each query term of each query into one or more evidence descriptors and one or more combination operators, and operably relating each of the combination operators to at least one of the evidence descriptors or other combination operators, such that each query is defined by one or more of the evidence descriptors and one or more of the combination operators that are operably related to each other. Preferably, none of the evidence descriptors or combination operators are duplicated in the execution plan. The invention can be used to evaluate data sets of a variety of types, such as text documents and databases. The invention can be further optimized to achieve rapid evaluation of a data set with respect to the queries in two steps. First, one or more candidate queries that may be satisfied by the data set are identified by approximately evaluating each query. Second, each of the candidate queries is fully evaluated to determine whether the candidate query is satisfied by the data set.
TL;DR: This work presents a technique for query decomposition, under which the query is shipped exactly once to every site, computed locally, then the local results are shipped to the client, and assembled here into the final result.
Abstract: Recently, several query languages have been proposed for querying information sources whose data is not constrained by a schema, or whose schema is unknown. Examples include LOREL (for querying data combined from several heterogeneous sources), W3QS (for querying the World Wide Web), and UnQL (for querying unstructured data). The natural data model for such languages is that of a rooted, labeled graph. Their main novelty is the ability to express queries which traverse arbitrarily long paths in the graph, typically described by a regular expression. Such queries, however, may prove difficult to evaluate in the case when the data is distributed on several sites, with many edges going between sites. A typical case is that of a collection of WWW sites, with links pointing freely from one site to another (even forming cycles). A naive query shipping strategy may force the query to migrate back and forth between the various sites, leading to poor performance (or even non-termination). We present a technique for query decomposition, under which the query is shipped exactly once to every site, computed locally, then the local results are shipped to the client, and assembled here into the final result. This technique is efficient, in that (a) only data which is part of the final result is shipped from the data sites to the client site, and (b) the total work done locally at all sites does not exceed that needed for computing the (unoptimized) query on a centralized version of the database. We also show that the query decomposition technique can be adapted to derive a simple view maintenance method, for two forms of updates which we introduce for the graph data model.
Proceedings of the 22nd VLDB Conference, Mumbai (Bombay), India, 1996.
TL;DR: The use of an algebraic source code query technique that blends expressive power with query compactness is demonstrated and a case study where SCA expressions are used to query a program in terms of program organization, resource flow, control flow, metrics and syntactic structure is presented.
Abstract: Querying source code is an essential aspect of a variety of software engineering tasks such as program understanding, reverse engineering, program structure analysis and program flow analysis. In this paper, we present and demonstrate the use of an algebraic source code query technique that blends expressive power with query compactness. The query framework of Source Code Algebra (SCA) permits users to express complex source code queries and views as algebraic expressions. Queries are expressed on an extensible, object-oriented database that stores program source code. The SCA algebraic approach offers multiple benefits such as an applicative query language, high expressive power, seamless handling of structural and flow information, clean formalism and potential for query optimization. We present a case study where SCA expressions are used to query a program in terms of program organization, resource flow, control flow, metrics and syntactic structure. Our experience with an SCA-based prototype query processor indicates that an algebraic approach to source code queries combines the benefits of expressive power and compact query formulation.
TL;DR: This article develops a new approach, a “model-assisted global query system,” that utilizes an on-line repository of enterprise metadata—the Metadatabase—to facilitate global query formulation and processing with certain desirable properties such as adaptiveness and open-systems architecture.
Abstract: Today's enterprises typically employ multiple information systems, which are independently developed, locally administered, and different in logical or physical designs. Therefore, a fundamental challenge in enterprise information management is the sharing of information for enterprise users across organizational boundaries; this requires a global query system capable of providing on-line intelligent assistance to users. Conventional technologies, such as schema-based query languages and hard-coded schema integration, are not sufficient to solve this problem. This article develops a new approach, a “model-assisted global query system,” that utilizes an on-line repository of enterprise metadata—the Metadatabase—to facilitate global query formulation and processing with certain desirable properties such as adaptiveness and open-systems architecture. A definitional model characterizing the various classes and roles of the required metadata as knowledge for the system is presented. The significance of possessing this knowledge (via a Metadatabase) toward improving the global query capabilities available previously is analyzed. On this basis, a direct method using model traversal and a query language using global model constructs are developed along with other new methods required for this approach. It is then tested through a prototype system in a computer-integrated manufacturing (CIM) setting.
TL;DR: A method and apparatus, implementable in a digital computer, for querying a set of arbitrarily structured records, that is, records structured differently from each other; a query engine, a query structure, and operators of conventional and non-conventional types may be used in formulating a query.
Abstract: A method and apparatus disclosed may be implemented in a digital computer to query a set of arbitrarily structured records. Arbitrarily structured records are structured differently from each other. A query engine, query structure, operators of conventional and non-conventional types may be used in formulating a query. The apparatus may evaluate records having missing fields, repeating fields, or an UNKNOWN value arising from a missing field, division by zero, modulo by zero, or the like. New aggregator (e.g. universal quantifier and existential quantifier) and selector operators (e.g., first, last, nth) may distill multiple values to return a single value. To evaluate a query, the search engine may implement filtered indices, alternate-key indices, compound alternate-key indices, hybrid queries having both full-text and non-full text operands, and joinder of records. Certain of these features may be implemented for evaluating records from both prior art databases and heterogeneous databases of arbitrarily structured records.
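The UNKNOWN handling and the quantifier and selector operators described in this abstract can be illustrated with a small sketch. It is not the patented apparatus: the three-valued rules below are ordinary Kleene logic, chosen here as an assumption, and every name (`UNKNOWN`, `values`, `safe_div`, `exists`, `for_all`, `nth`, the sample records) is hypothetical.

```python
UNKNOWN = None  # hypothetical sentinel; the abstract only names an "UNKNOWN value"


def values(record, name):
    """Return every value of a field as a list; a missing field yields [UNKNOWN]."""
    if name not in record:
        return [UNKNOWN]
    value = record[name]
    return list(value) if isinstance(value, list) else [value]


def safe_div(a, b):
    """Division that yields UNKNOWN instead of raising on zero or UNKNOWN input."""
    if a is UNKNOWN or b is UNKNOWN or b == 0:
        return UNKNOWN
    return a / b


def greater_than(limit):
    """Build a predicate that propagates UNKNOWN instead of comparing it."""
    def test(v):
        return UNKNOWN if v is UNKNOWN else v > limit
    return test


def exists(vals, pred):
    """Existential quantifier: True wins, then UNKNOWN, else False."""
    results = [pred(v) for v in vals]
    if any(r is True for r in results):
        return True
    if any(r is UNKNOWN for r in results):
        return UNKNOWN
    return False


def for_all(vals, pred):
    """Universal quantifier: False wins, then UNKNOWN, else True."""
    results = [pred(v) for v in vals]
    if any(r is False for r in results):
        return False
    if any(r is UNKNOWN for r in results):
        return UNKNOWN
    return True


def nth(vals, n):
    """Selector: the n-th value (0-based), or UNKNOWN when it does not exist."""
    return vals[n] if 0 <= n < len(vals) else UNKNOWN


# Records with a repeating field, a missing field, and a single-valued field.
records = [
    {"name": "a", "price": [10, 20]},
    {"name": "b"},
    {"name": "c", "price": 5},
]
over_8 = greater_than(8)

# A record is selected only when the predicate is definitely True.
matches = [r["name"] for r in records if exists(values(r, "price"), over_8) is True]
```

In this sketch a record whose predicate evaluates to UNKNOWN is excluded from the result rather than causing an error, which is one plausible reading of how missing or repeating fields could be evaluated.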
TL;DR: This work proposes an adaptive approach to interoperability that lets information consumers express their queries against a customized personal view rather than a system-defined integrated view.
Abstract: The authors propose a query mediation framework to support customizable information gathering across heterogeneous and autonomous information sources. Instead of an integrated (and static) global schema, they propose an adaptive approach to interoperability that lets information consumers express their queries against a customized personal view rather than a system-defined integrated view. The query mediation framework consists of five steps: query routing, query decomposition, parallel access plan generation, subquery translation and execution, and query result assembly. Concrete examples illustrate the challenges arising from heterogeneity in these five steps and how the framework scales up as the information sources grow in number and evolve.
TL;DR: The area of query processing is described, and an approach to the management of late bound functions is presented that allows optimization of invertible late bound functions, so that available indexes are utilized even though the function is late bound.
Abstract: To support new application areas for database systems such as mechanical engineering applications or office automation applications a powerful data model is required that supports the modelling of complex data, e.g. the object-oriented model. The object-oriented model supports subtyping, inheritance, operator overloading and overriding. These are features to assist the programmer in managing the complexity of the data being modelled. Another desirable feature of a powerful data model is the ability to use inverted functions in the query language, i.e. for an arbitrary function call fn(x)=y, retrieve the arguments x for a given result y. Optimization of database queries is important in a large database system since query optimization can reduce the execution cost dramatically. The optimization considered here is a cost-based global optimization where all operations are assigned a cost and a way of a priori estimating the number of objects in the result. To utilize available indexes the optimizer has full access to all operations used by the query, i.e. its implementation. The object-oriented data modelling features lead to the requirement of having late bound functions in queries which require special query processing strategies to achieve good performance. This is so because late bound functions obstruct global optimization since the implementation of a late bound function cannot be accessed by the optimizer and available indexes remain hidden within the function body. In this thesis the area of query processing is described and an approach to the management of late bound functions is presented which allows optimization of invertible late bound functions where available indexes are utilized even though the function is late bound. This ability provides a system with support for the modelling of complex relations and efficient execution of queries over such complex relations.
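The idea of an inverted function call, retrieving the arguments x for a given result y of fn(x)=y, can be sketched as follows. This is only an illustration of the concept, not the thesis's optimizer: the class hierarchy, `inverse_by_scan`, and `FunctionIndex` are hypothetical names, and Python's method dispatch stands in for late binding.

```python
from collections import defaultdict


class Shape:
    def area(self):
        raise NotImplementedError


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):  # overrides Shape.area; resolved at call time (late bound)
        return self.side * self.side


class Rect(Shape):
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


def inverse_by_scan(objects, fn, y):
    """Naive inverse: call fn on every object and keep those with fn(x) == y."""
    return [obj for obj in objects if fn(obj) == y]


class FunctionIndex:
    """Index from function result to arguments, so the inverse is a lookup."""

    def __init__(self, fn):
        self.fn = fn
        self._by_result = defaultdict(list)

    def insert(self, obj):
        # fn is evaluated once per object, whichever subtype implements it.
        self._by_result[self.fn(obj)].append(obj)

    def inverse(self, y):
        return list(self._by_result.get(y, []))


shapes = [Square(2), Rect(1, 4), Rect(2, 3)]
area = lambda shape: shape.area()  # the concrete method is chosen per object

index = FunctionIndex(area)
for shape in shapes:
    index.insert(shape)

by_scan = inverse_by_scan(shapes, area, 4)
by_index = index.inverse(4)
```

Both calls return the same two shapes; the indexed version avoids evaluating the function on every object at query time, which is the kind of saving the abstract attributes to utilizing available indexes.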
TL;DR: In this article, a system and method for accepting and responding to queries based on information stored on multiple heterogeneous information sources is presented, and a query plan for answering the query is formulated from descriptions of the contents and capabilities of the available information sources.
Abstract: A system and method for accepting and responding to queries based on information stored on multiple heterogeneous information sources. A uniform query interface to large collections of structured information sources is provided to a user to pose queries using a uniform schema of the available information. A query plan for answering the query is formulated from descriptions of the contents and capabilities of the available information sources. Based on these descriptions, logical solutions that are subsets of the complete solution to the query are derived. An order for executing these solutions is determined based on the input requirements and other capabilities of the relevant information sources.
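Ordering by input requirements can be illustrated with a small greedy sketch. It assumes, as a simplification not stated in the abstract, that each source is described only by the attributes it needs as input and the attributes it returns; `order_sources` and the sample source names are hypothetical.

```python
def order_sources(sources, initially_bound):
    """Order sources so that each one's required inputs are already available.

    sources: dict mapping a source name to (required_inputs, outputs), both sets.
    initially_bound: attributes supplied by the query itself.
    """
    bound = set(initially_bound)
    remaining = dict(sources)
    order = []
    while remaining:
        # A source is ready when all of its required inputs are bound.
        ready = sorted(
            name for name, (needs, _) in remaining.items() if needs <= bound
        )
        if not ready:
            raise ValueError("no executable order: some inputs can never be bound")
        name = ready[0]
        order.append(name)
        _, outputs = remaining.pop(name)
        bound |= outputs  # its outputs become inputs for later sources
    return order


sources = {
    "billing": ({"phone"}, {"balance"}),
    "directory": (set(), {"name"}),
    "phone_book": ({"name"}, {"phone"}),
}
plan = order_sources(sources, initially_bound=set())
```

Here `directory` needs no input and runs first, its names feed `phone_book`, and the resulting phone numbers feed `billing`. A real planner would also weigh the other source capabilities the abstract mentions.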
TL;DR: This work presents the hybrid query language HQL/EER for an Extended Entity-Relationship model and demonstrates the look-and-feel of this query language, and shows how syntax and semantics of this language are formally defined using programmed graph rewriting systems.
Abstract: We present the hybrid query language HQL/EER for an Extended Entity-Relationship model. As its main characteristic, this language allows a user to use both graphical and textual elements in the formulation of one and the same query. We demonstrate the look-and-feel of this query language by means of examples, and show how syntax and semantics of this language are formally defined using programmed graph rewriting systems. Although we present the language in the context of the EER model, the concept of hybrid languages is applicable in the context of other database models as well. We illustrate this claim by discussing a prototype implementation of a Hybrid Query Tool based on an object-oriented approach, namely the Object Modeling Technique (OMT).
TL;DR: A variant of the membership query model is considered in which the learning algorithm is given as input the number of relevant variables of the target function. Using a number-theoretic coloring technique, it is shown that in this model, any class of functions that can be learned in polynomial time can be learned attribute-efficiently in polynomial time.
Abstract: We consider the problem of attribute-efficient learning in query and mistake-bound models. Attribute-efficient algorithms make a number of queries or mistakes that is polynomial in the number of relevant variables in the target function, but only sublinear in the number of irrelevant variables. We consider a variant of the membership query model in which the learning algorithm is given as input the number of relevant variables of the target function. We show that in this model, any projection and embedding closed class of functions (including parity) that can be learned in polynomial time can be learned attribute-efficiently in polynomial time. We show that this does not hold in the randomized membership query model. In the mistake-bound model, we consider the problem of learning attribute-efficiently using hypotheses that are formulas of small depth. Our results extend the work of A. Blum, L. Hellerstein, and N. Littlestone (J. Comput. System Sci. 50 (1995), 32-40) and N. Bshouty, R. Cleve, S. Kannan, and C. Tamon (in “Proceedings, 7th Annu. ACM Workshop on Comput. Learning Theory,” pp. 130-139, ACM Press, New York, 1994).
TL;DR: The results of an empirical performance study carried out in the application domain of market research are presented, which substantiate the practical importance of such work and show that query response time can be shortened by an order of magnitude if a proper data aggregation concept is used.
Abstract: Although most state-of-the-art database systems have no inherent limitations w.r.t. the amount of data they can handle, the huge data quantities typically found in scientific database applications often exceed the feasibility level from a practical point of view when query performance is the issue. One theoretically well-known concept for improving query response time in scientific database applications is to use the categorization and classification facilities often found in scientific computing domains to store data aggregations, so that expensive access to raw data can be replaced by the use of stored aggregated values. The results of an empirical performance study carried out in the application domain of market research are presented, which substantiate the practical importance of such work. Using real market research data, it is shown that query response time can be shortened by an order of magnitude if a proper data aggregation concept is used. If the data aggregates are designed properly, the overhead of generating and managing materializations of data aggregates is by far outweighed by the improved query performance in realistic scenarios.
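Replacing raw-data access with stored aggregated values can be shown with a minimal sketch. The data, the category scheme, and the function names are invented for illustration and do not come from the study; the sketch demonstrates only the mechanism and makes no claim about measured speedups.

```python
from collections import defaultdict

# Hypothetical raw data: (product, category, sales amount).
raw_sales = [
    ("apples", "fruit", 3.0),
    ("pears", "fruit", 2.5),
    ("carrots", "vegetable", 4.0),
]


def total_from_raw(rows, category):
    """Answer the query by scanning every raw row."""
    return sum(amount for _, cat, amount in rows if cat == category)


def build_aggregate(rows):
    """Materialize one total per category, computed once ahead of query time."""
    totals = defaultdict(float)
    for _, cat, amount in rows:
        totals[cat] += amount
    return dict(totals)


def total_from_aggregate(aggregate, category):
    """Answer the same query with a single lookup in the stored aggregate."""
    return aggregate.get(category, 0.0)


aggregate = build_aggregate(raw_sales)
fruit_total = total_from_aggregate(aggregate, "fruit")
```

The aggregate must be rebuilt or updated whenever the raw data changes, which corresponds to the overhead of generating and managing materializations that the abstract weighs against the faster queries.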
TL;DR: This paper presents a dynamic reordering strategy that can be exploited to match execution order to the optimal data fetch order, in all parts of the plan-tree, and reports on a prototype implementation based on Postgres.
Abstract: In the relational model the order of fetching data does not affect query correctness. This flexibility is exploited in query optimization by statically reordering data accesses. However, once a query is optimized, it is executed in a fixed order in most systems, with the result that data requests are made in a fixed order. Only limited forms of runtime reordering can be provided by low-level device managers. More aggressive reordering strategies are essential in scenarios where the latency of access to data objects varies widely and dynamically, as in tertiary devices. This paper presents such a strategy. Our key innovation is to exploit dynamic reordering to match execution order to the optimal data fetch order, in all parts of the plan-tree. To demonstrate the practicality of our approach and the impact of our optimizations, we report on a prototype implementation based on Postgres. Using our system, typical I/O cost for queries on tertiary memory databases is as much as an order of magnitude smaller than with conventional query processing techniques.
TL;DR: Query by Review, with its more constrained user interface, performed somewhat better than AccessMed, a more general tool, which points to the difficulty of formulating a query for a clinical database and the need for further work.