TL;DR: In this paper, the authors propose new principles for visual information seeking (VIS) that support browsing, which is distinguished from familiar query composition and information retrieval by its emphasis on rapid filtering to reduce result sets, progressive refinement of search parameters, continuous reformulation of goals, and visual scanning to identify results.
Abstract: This paper offers new principles for visual information seeking (VIS). A key concept is to support browsing, which is distinguished from familiar query composition and information retrieval because of its emphasis on rapid filtering to reduce result sets, progressive refinement of search parameters, continuous reformulation of goals, and visual scanning to identify results. VIS principles developed include: dynamic query filters (query parameters are rapidly adjusted with sliders, buttons, maps, etc.), starfield displays (two-dimensional scatterplots to structure result sets and zooming to reduce clutter), and tight coupling (interrelating query components to preserve display invariants and support progressive refinement combined with an emphasis on using search output to foster search input). A FilmFinder prototype using a movie database demonstrates these principles in a VIS environment.
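The dynamic query filters, starfield display, and tight coupling named in the abstract above can be illustrated with a minimal sketch. The film records, field names, and helper functions here are hypothetical and are not taken from the FilmFinder prototype; a real interface would bind the filter arguments to sliders and buttons and redraw the display on every change.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Film:
    title: str
    year: int
    rating: float
    genre: str

# Hypothetical sample data, invented for illustration only.
FILMS = [
    Film("Alpha", 1960, 7.5, "drama"),
    Film("Beta", 1975, 6.0, "comedy"),
    Film("Gamma", 1988, 8.2, "drama"),
    Film("Delta", 1992, 5.5, "action"),
]

def apply_filters(films, year_range, min_rating, genres=None):
    """Return the films passing every filter; each argument stands in for one widget."""
    lo, hi = year_range
    return [
        f for f in films
        if lo <= f.year <= hi
        and f.rating >= min_rating
        and (genres is None or f.genre in genres)
    ]

def starfield_points(films):
    """Map each result to an (x, y) point: year on the x axis, rating on the y axis."""
    return [(f.year, f.rating) for f in films]

def remaining_genres(films):
    """Tight coupling: the current results decide which genre buttons stay enabled."""
    return sorted({f.genre for f in films})

result = apply_filters(FILMS, year_range=(1970, 1995), min_rating=5.8)
print(starfield_points(result))   # [(1975, 6.0), (1988, 8.2)]
print(remaining_genres(result))   # ['comedy', 'drama']
```

Feeding `remaining_genres` back into the genre control is one simple way to let search output shape the next search input.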
TL;DR: In this article, a method of identifying, retrieving, or sorting documents by language or topic is presented. It involves creating an n-gram array for each document in a database, parsing an unidentified document or query into n-grams, assigning a weight to each n-gram, removing the commonality from the n-grams, comparing each unidentified document or query to each database document, scoring the unidentified document or query against each database document for similarity, and, based on the similarity score, identifying, retrieving, or sorting the document or query with respect to language or topic.
Abstract: A method of identifying, retrieving, or sorting documents by language or topic involving the steps of creating an n-gram array for each document in a database, parsing an unidentified document or query into n-grams, assigning a weight to each n-gram, removing the commonality from the n-grams, comparing each unidentified document or query to each database document, scoring the unidentified document or query against each database document for similarity, and based on the similarity score, identifying, retrieving, or sorting the document or query with respect to language or topic.
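The steps listed in the abstract above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented method: weights are taken to be normalized character n-gram frequencies, "removing the commonality" is interpreted as subtracting each n-gram's mean weight across the database, and similarity is scored with the cosine measure. All function names are hypothetical.

```python
from collections import Counter
from math import sqrt

def ngram_weights(text, n=3):
    """Parse text into character n-grams and weight each by relative frequency."""
    text = text.lower()
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values()) or 1
    return {g: c / total for g, c in counts.items()}

def remove_commonality(vectors):
    """Assumed interpretation: subtract each n-gram's mean weight over all documents."""
    keys = set().union(*vectors.values())
    mean = {g: sum(v.get(g, 0.0) for v in vectors.values()) / len(vectors) for g in keys}
    centered = {
        name: {g: v.get(g, 0.0) - mean[g] for g in keys}
        for name, v in vectors.items()
    }
    return centered, mean

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * b.get(g, 0.0) for g, x in a.items())
    norm_a = sqrt(sum(x * x for x in a.values()))
    norm_b = sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def identify(query, database, n=3):
    """Score the query against every database document and return the best label."""
    vectors = {name: ngram_weights(text, n) for name, text in database.items()}
    centered, mean = remove_commonality(vectors)
    q = ngram_weights(query, n)
    q_centered = {g: q.get(g, 0.0) - m for g, m in mean.items()}
    scores = {name: cosine(q_centered, vec) for name, vec in centered.items()}
    return max(scores, key=scores.get), scores

# Hypothetical two-document database keyed by language label.
database = {
    "english": "the quick brown fox jumps over the lazy dog and the cat",
    "spanish": "el rapido zorro marron salta sobre el perro perezoso y el gato",
}
best, scores = identify("the dog and the fox", database)
print(best)
```

Sorting `scores` by value instead of taking the maximum gives the retrieval or sorting variant of the same procedure.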
TL;DR: In this article, a computer controlled display system providing for graphical representation of a query to a database and for creation and traversal of a search history is presented. A database search is typically performed by a sequence of narrowing queries, each of which is performed in a query window.
Abstract: A computer controlled display system providing for graphical representation of a query to a database and creation and traversal through a search history. A database search is typically performed by a sequence of narrowing queries. Each narrowing query is performed in a query window. A query window is comprised of an input area for entering query expressions, a query results display area, an indicator of a search scope associated with the query window and a history indicator area. A suitable information visualization technique is used to graphically display the search results in the query results display area. From these visualizations, new search scopes and query windows are created. The query windows that make up the current search path are displayed at any instant of the search. A history mechanism provides for ready traversal through the search history.
TL;DR: In this paper, an extensible query architecture based on an abstract base class of query nodes, or code objects that retrieve records from the database, is presented, where specific subclasses for particular query models are derived from the base class.
Abstract: An information retrieval system incorporates an extensible query architecture allowing an applications programmer to integrate new query models into the system as desired. The query architecture is based on an abstract base class of query nodes, or code objects that retrieve records from the database. Specific subclasses for particular query models are derived from the base class. Each query node class includes a search function that iteratively searches the database for matching records. Query node objects are instantiated by associated node creator class objects. A parser is used to parse a search query into its components, including nested search queries used to combine various query models. The parser determines the particular search operator keywords, and the node creator object for instantiating the appropriate query node object for each search operator. The node creator objects return pointers to the created query nodes, allowing the parser to assemble complex hierarchical query nodes that combine multiple query models.
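The architecture described in the abstract above (an abstract query node class, per-model subclasses, node creators, and a parser that assembles nested nodes) might look something like the following. The parenthesized prefix syntax, the class names, and the `NODE_CREATORS` registry are assumptions made for this sketch; here the node classes themselves double as node creators.

```python
from abc import ABC, abstractmethod

class QueryNode(ABC):
    """Abstract base class: every query model implements search()."""

    @abstractmethod
    def search(self, records):
        """Yield the ids of matching records one at a time."""

class TermNode(QueryNode):
    def __init__(self, term):
        self.term = term

    def search(self, records):
        for rid, text in records.items():
            if self.term in text.split():
                yield rid

class AndNode(QueryNode):
    def __init__(self, children):
        self.children = children

    def search(self, records):
        hits = [set(child.search(records)) for child in self.children]
        for rid in records:
            if all(rid in h for h in hits):
                yield rid

class OrNode(QueryNode):
    def __init__(self, children):
        self.children = children

    def search(self, records):
        hits = [set(child.search(records)) for child in self.children]
        for rid in records:
            if any(rid in h for h in hits):
                yield rid

# Node creators keyed by operator keyword; a new query model registers itself here.
NODE_CREATORS = {"AND": AndNode, "OR": OrNode}

def parse(tokens):
    """Recursively build a node tree from a token list, consuming tokens as it goes."""
    token = tokens.pop(0)
    if token != "(":
        return TermNode(token)
    keyword = tokens.pop(0)
    children = []
    while tokens[0] != ")":
        children.append(parse(tokens))
    tokens.pop(0)  # discard the closing parenthesis
    return NODE_CREATORS[keyword](children)

def run_query(query, records):
    """Tokenize the query, parse it into nodes, and collect the matching record ids."""
    tokens = query.replace("(", " ( ").replace(")", " ) ").split()
    return list(parse(tokens).search(records))

records = {1: "cat dog", 2: "cat fish", 3: "dog fish", 4: "cat"}
print(run_query("(AND cat (OR dog fish))", records))  # [1, 2]
```

Adding a query model then amounts to writing one more `QueryNode` subclass and adding one entry to the registry; the parser itself does not change.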
TL;DR: Search system and method for retrieving relevant documents from a text data base collection comprised of patents, medical and legal documents, journals, news stories and the like.
TL;DR: The experiments show that on average a current generation natural language system provides better retrieval performance than expert searchers using a Boolean retrieval system when searching full-text legal materials.
Abstract: The results of experiments comparing the relative performance of natural language and Boolean query formulations are presented. The experiments show that on average a current generation natural language system provides better retrieval performance than expert searchers using a Boolean retrieval system when searching full-text legal materials. Methodological issues are reviewed and the effect of database size on query formulation strategy is discussed.
TL;DR: This work approaches the subsumption problem in the setting of object-oriented databases, and finds that reasoning techniques from Artificial Intelligence can be applied and yield efficient algorithms.
Abstract: Subsumption between queries is valuable information, e.g., for semantic query optimization. We approach the subsumption problem in the setting of object-oriented databases, and find that reasoning techniques from Artificial Intelligence can be applied and yield efficient algorithms.
TL;DR: In this article, the authors present techniques for optimizing queries in a system in which executing the query requires retrieval of information from a number of different data bases which are accessible via a network.
Abstract: Techniques for optimizing queries in a system in which executing the query requires retrieval of information from a number of different data bases which are accessible via a network. In the techniques, a query results in a query plan which includes subplans for querying the data bases which contain the required information. When a subplan is executed in one of the data bases, the data base returns not only the information which results from the execution of the subplan, but also source and constraint information about the data in the data base. The source and constraint information is then used to optimize the query plan by pruning redundant subplans. An embodiment is disclosed in which queries are made to a domain model implemented using a knowledge base system. The domain model includes a world view of the data, a set of descriptions of the data bases, and a set of descriptions of how to access the data. The information in the domain model is used to formulate the query plan.
TL;DR: Preliminary experimental results reveal that the proposed type abstraction hierarchy provides an organized structure representing concepts at different knowledge levels in various domains, and provides a systematic and efficient method for cooperative query answering.
Abstract: This paper proposes the use of a type abstraction hierarchy as a framework for deriving cooperative query answers. The type abstraction hierarchy integrates the abstraction view with the subsumption (is-a) and composition (part-of) views of a type hierarchy. Such a framework provides multilevel object representation, which is an important aspect of cooperative query answering. The concept of pattern that specifies one or more conditions on an object is also proposed. Patterns have smaller granularity than types, and thus provide more specific semantic information. Cooperative query answering consists of query relaxation, generalization, specialization, and association on patterns. Query relaxation can be explicitly specified by the user or implicitly performed by the system. The implicit and explicit relaxations can also be combined and performed interactively by both the system and the user. CSQL, an extension of SQL for cooperative query answering, is also proposed. Preliminary experimental results reveal that the proposed type abstraction hierarchy provides an organized structure representing concepts at different knowledge levels in various domains, and provides a systematic and efficient method for cooperative query answering.
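Query relaxation by generalization and specialization, as described in the abstract above, can be sketched with a toy hierarchy. The vehicle hierarchy, the record layout, and the `relaxed_query` helper are hypothetical; the sketch covers only implicit relaxation on a single attribute and does not model patterns or CSQL.

```python
# Hypothetical type abstraction hierarchy: each value maps to its more abstract parent.
PARENT = {
    "sedan": "car", "hatchback": "car",
    "pickup": "truck", "van": "truck",
    "car": "vehicle", "truck": "vehicle",
}

def children(node):
    """Direct specializations of a node in the hierarchy."""
    return [c for c, p in PARENT.items() if p == node]

def specialize(node):
    """All leaf values beneath a node (the node itself if it is a leaf)."""
    kids = children(node)
    if not kids:
        return [node]
    leaves = []
    for kid in kids:
        leaves.extend(specialize(kid))
    return leaves

def relaxed_query(records, value, max_levels=2):
    """Try the exact value first; on an empty answer, generalize one level and retry."""
    node = value
    for level in range(max_levels + 1):
        candidates = set(specialize(node))
        hits = [r for r in records if r["type"] in candidates]
        if hits or node not in PARENT:
            return level, hits
        node = PARENT[node]  # generalize to the parent concept
    return max_levels, []

records = [{"id": 1, "type": "hatchback"}, {"id": 2, "type": "van"}]
level, hits = relaxed_query(records, "sedan")
print(level, [r["id"] for r in hits])  # 1 [1]
```

The returned level tells the caller how far the answer has drifted from the original request, which a cooperative system could report back to the user.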
TL;DR: In this article, a method and apparatus for improving the efficiency and security of a database management system (DBMS) is disclosed, where a plurality of query packages are stored at a host DBMS.
Abstract: A method and apparatus for improving the efficiency and security of a database management system (DBMS) is disclosed. A plurality of query packages are stored at a host DBMS. Generation of the query packages is limited only to those users that have authorization, such as the database administrator of the DBMS. The query packages include a plurality of procedures. Each procedure is a single SQL statement that has been pre-compiled. Each query package also includes an authorization table that enumerates each individual user, or category of user, that can invoke any procedure within the query package. By formulating a plurality of query packages, each package tailored to a particular category of user, security of the data on the DBMS is enhanced. Further, by having static statements that are pre-compiled, access time to the data on the DBMS is significantly reduced. Moreover, the database administrator and users can interactively generate and use the query packages in a user friendly environment.
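The query package idea in the abstract above (fixed statements plus an authorization table) can be approximated in a few lines. This is only a sketch using Python's standard `sqlite3` module; the `QueryPackage` class, the table, and the user names are hypothetical, and SQLite's per-connection statement cache only loosely stands in for the pre-compilation the abstract describes.

```python
import sqlite3

class QueryPackage:
    """A named set of fixed SQL statements plus the users allowed to invoke them."""

    def __init__(self, name, procedures, authorized):
        self.name = name
        self.procedures = dict(procedures)  # procedure name -> one static SQL statement
        self.authorized = set(authorized)   # stands in for the authorization table

    def invoke(self, conn, user, procedure, params=()):
        """Run one stored procedure if the user is authorized for this package."""
        if user not in self.authorized:
            raise PermissionError(f"{user} may not use package {self.name}")
        # Only the parameters vary; the statement text itself is never user-supplied.
        return conn.execute(self.procedures[procedure], params).fetchall()

# Hypothetical in-memory database for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (owner TEXT, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("bob", 50.0), ("carol", 75.0)])

pkg = QueryPackage(
    "teller",
    {"balance_for": "SELECT balance FROM accounts WHERE owner = ?"},
    authorized={"alice"},
)
print(pkg.invoke(conn, "alice", "balance_for", ("bob",)))  # [(50.0,)]
```

Because users can only select a procedure by name, the set of statements that can reach the database is fixed when the package is created.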
TL;DR: The system allows one to represent the largest amount of data that can be visualized on current display technology, provides valuable feedback in querying the database, and allows the user to find results which would otherwise remain hidden in the database.
Abstract: Describes a query system that provides visual relevance feedback in querying large databases. The goal is to support the process of data mining by representing as many data items as possible on the display. By arranging and coloring the data items as pixels according to their relevance for the query, the user gets a visual impression of the resulting data set. Using an interactive query interface, the user may change the query dynamically and receives immediate feedback by the visual representation of the resulting data set. Furthermore, by using multiple windows for different parts of a complex query, the user gets visual feedback for each part of the query and, therefore, may more easily understand the overall result. The system allows one to represent the largest amount of data that can be visualized on current display technology, provides valuable feedback in querying the database, and allows the user to find results which would otherwise remain hidden in the database.
TL;DR: This paper outlines a number of further differences between sort-merge-join and hybrid hash join that traditionally have been ignored in such comparisons and that render sort-merge-join mostly obsolete.
Abstract: Matching two sets of data items is a fundamental operation required in relational, extensible, and object-oriented database systems alike. However, the pros and cons of sort- and hash-based query evaluation techniques in modern query processing systems are still not fully understood. After our earlier research clarified strengths and weaknesses of sort- and hash-based query processing techniques and suggested remedies for the shortcomings of hash-based algorithms, the present paper outlines a number of further differences between sort-merge-join and hybrid hash join that traditionally have been ignored in such comparisons and render sort-merge-join mostly obsolete. We consolidate old and raise new issues pertinent to the comparison of sort- and hash-based query evaluation techniques and stir some thought and discussion among both academic and industrial database system builders.
TL;DR: The results show that the most effective sources were the user's written question statement, user terms derived during the interaction, and terms selected from particular database fields.
Abstract: To improve information retrieval effectiveness, research in both the algorithmic and human approach to query expansion is required. This paper uses the human approach to examine the selection and effectiveness of search term sources for query expansion. The results show that the most effective sources were the user's written question statement, user terms derived during the interaction, and terms selected from particular database fields. These findings indicate the need for the design and testing of automatic relevance feedback techniques that place greater emphasis on these sources.
TL;DR: This paper contains a complete classification of oo nested queries and appropriate unnesting optimization strategies based on algebraic rewriting and introduces two new and powerful grouping operators which will form the basis for the unnesting techniques.
Abstract: Many declarative query languages for object-oriented (oo) databases allow nested subqueries. This paper contains a complete classification of oo nested queries and appropriate unnesting optimization strategies based on algebraic rewriting. We adapt some known relational techniques and introduce new ones that use and are concerned with features specific to object-oriented queries. In particular, we introduce two new and powerful grouping operators which will form the basis for our unnesting techniques.
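TL;DR: In this article, a computer system and method for rapid, multi-dimensional word searches is presented, where the search space identifies a plurality of objects, whether directly or through means of an index, each object comprising a plurality of words, and the search query comprises words and attributes, the attributes defining the conditions imposed on the search.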
Abstract: A computer system and method for performing rapid and multi-dimensional word searches upon specification of a search space and specification of a search query. The search space identifies a plurality of objects, whether directly or through means of an index, each object comprising a plurality of words. The search query comprises a plurality of words and a plurality of attributes, the attributes defining the conditions imposed on the search. The search query is processed in two steps. In the first step, a parser evaluates the search query and creates a data structure based on the words and the attributes in the search query and the scope of an index, if the search space includes an index. The parser allows a rich syntax of attributes as well as complex (multi-dimensional) combinations of attributes. In the second step, an evaluator generates a list of objects in the search space which satisfy the search query by scanning the search space with the data structure. The evaluator scans object data where the search space identifies at least one object; scans index data where the search space identifies at least one index of objects and the indexes are sufficient to resolve the search query; and scans complex combinations of object data and index data where index data alone is insufficient to resolve the search query.
TL;DR: It is proved that it is NP-complete to eliminate as many unnecessary joins as possible for various types of acyclic queries with the exception of the closure chain queries whose query graphs are chains and all equi-join attributes are distinct.
Abstract: Semantic query optimization, or knowledge-based query optimization, has received increasing interest in recent years. The authors provide an effective and systematic approach to optimizing queries by appropriately choosing semantically equivalent transformations. Basically, there are two different types of transformations: transformations by eliminating unnecessary joins, and transformations by adding/eliminating redundant beneficial/nonbeneficial selection operations (restrictions). A necessary and sufficient condition to eliminate a single unnecessary join is provided. We prove that it is NP-complete to eliminate as many unnecessary joins as possible for various types of acyclic queries with the exception of the closure chain queries whose query graphs are chains and all equi-join attributes are distinct. An algorithm is provided to minimize the number of joins in tree queries. This algorithm has an important property that, when applied to a closure chain query, it will yield an optimal solution with the time complexity O(n*m), where n is the number of relations referenced in the chain query, and m is the time complexity of a restriction closure computation.
TL;DR: This paper departs from the deductive paradigm with object-oriented extensions, then relaxes the overly strict modus ponens of classical propositional logic with inference rules that capture the relevance of information in the document to the information needed by the user.
Abstract: In information retrieval, the relevance of retrieved documents to a query is a judgement of the user rather than material implication in the sense of logic. In this paper we depart from the deductive paradigm with object-oriented extensions, then relax the overly strict modus ponens of classical propositional logic by appropriate inference rules that would capture the relevance of information in the document to the information needed by the user. In such a framework, a document is relevant to a query if the latter can be deduced from the set of axioms associated with the document using inference rules. As various kinds of inference rules will be used in the deduction, we distinguish between logical, strict, and plausible rules. Answering a query in such a framework can be done either by a special query processor that supports different kinds of inference mechanisms, or by relaxing the original query so that it can be evaluated by an ordinary query processor. Instead of suggesting a new logic model, we make use of the query answering machinery of deductive and object-oriented databases in this approach.
TL;DR: Experimental results demonstrate that this approach can learn sufficient background knowledge to reformulate queries and provide a 57 percent average performance improvement.
Abstract: Semantic query optimization can dramatically speed up database query answering by knowledge intensive reformulation. But the problem of how to learn required semantic rules has not previously been solved. This paper describes an approach using an inductive learning algorithm to solve the problem. In our approach, learning is triggered by user queries and then the system induces semantic rules from the information in databases. The inductive learning algorithm used in this approach can select an appropriate set of relevant attributes from a potentially huge number of attributes in real-world databases. Experimental results demonstrate that this approach can learn sufficient background knowledge to reformulate queries and provide a 57 percent average performance improvement.
TL;DR: This work presents a general technique to push query constraints into database views and (constraint) logic programs, which can be executed on systems, such as database query evaluation systems, that do not handle full constraint solving.
Abstract: We present a general technique to push query constraints (such as length≤1000) into database views and (constraint) logic programs. We introduce the notion of parametrized constraints, which help us push constraints with argument values that are known only at run time, and develop techniques for pushing parametrized constraints into predicate/view definitions. Our technique provides a way of compiling programs with constraint queries into programs with parametrized constraints compiled in, and which can be executed on systems, such as database query evaluation systems, that do not handle full constraint solving. Thereby our technique can push constraint selections that earlier constraint query rewriting techniques could not. Our technique is independent of the actual constraint domain, and we illustrate its use with equality constraints on structures (which are useful in object-oriented query languages) and linear arithmetic constraints.
TL;DR: In this article, a database system that provides independence between the query and physical structure of the database tables by captioning each database table with a partial query reflecting the contents of that table is presented.
Abstract: A database system that provides independence between the query and physical structure of the database tables by captioning each database table with a partial query reflecting the contents of that table. In particular, the partial query is a query that if applied to a larger database of a standard configuration would produce the data of the table. Relevant tables for a particular query may be identified by piecing together the partial queries until the user query is obtained. The database system may be integrated with an optimizer by comparing each of the identified tables against the others for the amount of overlap their sub-queries have with the user query and the cost of accessing the table and then repeating this process as the tables are joined in various combinations.
TL;DR: A framework for database querying providing the user with several interaction paradigms based on different visual representations of the database, in order to maintain the same query consistently in any representation is proposed.
Abstract: We propose a framework for database querying providing the user with several interaction paradigms based on different (i.e., form-based, diagrammatic, iconic, and hybrid) visual representations of the database. A unified model, namely the Graph Model, is used as the common underlying model, in terms of which databases expressed in the most common data models can be easily converted. Graph Model databases can be queried by means of the multiparadigmatic interface. The semantics of the query operations is formally defined in terms of the Graphical Primitives. Such a formal approach enables the query manager to maintain the same query consistently in any representation. In the proposed multiparadigmatic environment, the user can switch from one interaction paradigm to another during query formulation, so that the most suitable query representation can be found.
TL;DR: Several approaches that soften the classical Boolean Information Retrieval model have been formalized within different mathematical frameworks, with aims that include improving the description of documents' information content.
Abstract: In recent years, several approaches have been defined to soften the classical Boolean Information Retrieval model. These approaches were formalized within different mathematical frameworks with the following aims: to improve the description of documents' information content;
TL;DR: A new full-text document retrieval model that is based on comparing occurrence frequency rank numbers of terms in queries and documents is introduced.
Abstract: This paper introduces a new full-text document retrieval model that is based on comparing occurrence frequency rank numbers of terms in queries and documents.
TL;DR: A query language for extended entity relationship schemas is proposed that includes quantifiers and aggregates to allow complex queries to be expressed and that allows derived subtypes, attributes, and relationships to be defined and used in queries.
Abstract: We present a proposed query language for extended entity relationship schemas. The language improves on previous proposals by using only concepts explicitly in a given schema. It includes quantifiers and aggregates to allow complex queries to be expressed, and it allows derived subtypes, attributes and relationships to be defined and used in queries. Further extensions are discussed.
TL;DR: The parallel OO query processing algorithms analyzed in this study are based on a query graph approach rather than the traditional query tree approach, and avoid the execution of time-consuming join operations by making use of the object references among the objects.
Abstract: Advanced application domains such as computer-aided design, computer-aided software engineering, and office automation are characterized by their need to store, retrieve, and manage large quantities of data having complex structures. A number of object-oriented database management systems (OODBMS) are currently available that can effectively capture and process the complex data. The existing implementations of OODBMS outperform relational systems by maintaining and querying cross-references among related objects. However, the existing OODBMS still do not meet the efficiency requirements of advanced applications that require the execution of complex queries involving the retrieval of a large number of data objects and relationships among them. Parallel execution can significantly improve the performance of complex OO queries. In this paper, we analyze the performance of parallel OO query processing algorithms for various benchmark application domains. The application domains are characterized by specific mixes of queries of different semantic complexities. The performance of the application domains has been analyzed for various system and data parameters by running parallel programs on a 32-node transputer based parallel machine developed at the IBM Research Center at Yorktown Heights. The parallel processing algorithms, data routing techniques, and query management and control strategies have been implemented to obtain accurate estimation of controlling and processing overheads. However, generation of large complex databases for the study was impractical. Hence, the data used in the simulation have been parameterized. The parallel OO query processing algorithms analyzed in this study are based on a query graph approach rather than the traditional query tree approach. Using the query graph approach, a query is processed by simultaneously initiating the execution at several object classes, thereby improving the parallelism. During processing, the algorithms avoid the execution of time-consuming join operations by making use of the object references among the objects. Further, the algorithms do not generate any temporary data, thereby reducing disk accesses. This is accomplished by marking the selected objects and by employing a two-phase query processing strategy.
TL;DR: The purpose of this paper is to prepare a formal framework for studying “polynomial-time” query learnability; it introduces necessary notation and clarifies notions that are necessary for discussing polynomial-time query learning.
Abstract: Query learning is to learn a concept (i.e., a representation of some language) through communication with a teacher (i.e., someone who knows the concept). The purpose of this paper is to prepare a formal framework for studying “polynomial-time” query learnability. We introduce necessary notation and, by using several examples, clarify notions that are necessary for discussing polynomial-time query learning.
TL;DR: This paper studies the construction of MLDBs using generalization and knowledge discovery techniques and the application of MLDBs to cooperative/intelligent query answering in database systems.
Abstract: How can a real-estate agent respond to inquiries quickly and intelligently? The 'trick' could be using a simple table to briefly outline the general information and a complete book to reference the details. Such a method can be generalized to the construction of a multiple layered database (MLDB), a useful database organization technique for cooperative query answering, database browsing, query optimization and querying cooperative information systems. In this paper, we study the construction of MLDBs using generalization and knowledge discovery techniques and the application of MLDBs to cooperative/intelligent query answering in database systems.
TL;DR: This paper describes the cql data description and query language and its query optimizations, and provides comparisons with other tools.
Abstract: cql is a UNIX system tool that applies C style query expressions to flat file databases. In some respects it is yet another addition to the toolbox of programmable file filters: grep [Hume88], sh [Bour78] [BK89], awk [AKW88], and perl [Wall]. However, by restricting its problem domain, cql takes advantage of optimizations not available to these more general purpose tools.
This paper describes the cql data description and query language and its query optimizations, and provides comparisons with other tools.
TL;DR: In this paper, the query statements and their results are graphically presented as a tree (108), wherein the query statements and query results are nodes (106,107) and each query statement result (107) is a child of the query statement (106) which was run to create it.
Abstract: A search facility having a user interface (100) including three windows: a query window (101), a graph window (102) and a history window (103), presented simultaneously in the graphical user interface (100). The query window (101) displays the text of the most recently input query statement (104) which is searched in a database stored in a computer system. The graph window (102) graphically displays the current results (105) of the most recent query statement (104). The history window (103) presents the query statements and their results during the current query session. In one preferred embodiment, the query statements and their results are graphically presented as a tree (108), wherein the query statements and query results are nodes (106,107) and each query statement result (107) is a child of the query statement (106) which was run to create it. Input to any of the windows will change the presentation of data within the other two windows.
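The history tree described in the abstract above, in which each result node is a child of the query statement that produced it, can be sketched as a small data structure. The `Node` and `History` classes are hypothetical, and attaching a follow-up query beneath an earlier result is an assumption of this sketch rather than something the abstract specifies.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str    # "session", "query", or "result"
    label: str
    children: list = field(default_factory=list)

class History:
    """Records query statements and their results for the current session."""

    def __init__(self):
        self.root = Node("session", "current session")

    def run(self, statement, execute, parent=None):
        """Add a query node, run it, and attach the result as the query's child."""
        query = Node("query", statement)
        anchor = parent if parent is not None else self.root
        anchor.children.append(query)
        result = Node("result", execute(statement))
        query.children.append(result)
        return result

def render(node, depth=0):
    """Return the tree as indented text lines, one node per line."""
    lines = ["  " * depth + f"{node.kind}: {node.label}"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines

history = History()
# The lambdas stand in for a real database search and return canned summaries.
first = history.run("author = Smith", lambda s: "12 hits")
history.run("year > 1990", lambda s: "3 hits", parent=first)
print("\n".join(render(history.root)))
```

A graphical history window would draw the same structure as a node-link tree instead of indented text.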
TL;DR: The concept of a path dictionary is introduced. The paper argues that, because most queries in object-oriented databases require traversing from one object to another in the aggregation hierarchy, the connections between objects should be represented separately from the database.
Abstract: This paper argues that most queries in object-oriented databases require traversing from one object to another in the aggregation hierarchy. Thus, the connections between objects through object identifiers are essential to the efficiency of query processing and should be represented separately from the database. We introduce the concept of path dictionary and describe how it supports queries of different types. We evaluate the storage overhead, query and update costs of the path dictionary. Compared to the path index, the path dictionary has better overall query and update performance and lower storage overhead.