TL;DR: G is proposed as a complementary language in which recursive queries are simple to formulate, and the authors hope that this graphical query language can exploit well-known graph algorithms to evaluate recursive queries efficiently, a topic that has received widespread attention recently.
Abstract: We define a language G for querying data represented as a labeled graph G. By considering G as a relation, this graphical query language can be viewed as a relational query language, and its expressive power can be compared to that of other relational query languages. We do not propose G as an alternative to general purpose relational query languages, but rather as a complementary language in which recursive queries are simple to formulate. The user is aided in this formulation by means of a graphical interface. The provision of regular expressions in G allows recursive queries more general than transitive closure to be posed, although the language is not as powerful as those based on function-free Horn clauses. However, we hope to be able to exploit well-known graph algorithms in evaluating recursive queries efficiently, a topic which has received widespread attention recently.
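As a rough illustration of why regular expressions over edge labels allow queries more general than transitive closure, the following Python sketch evaluates a regular path query by breadth-first search over pairs of (graph node, automaton state). It is not the evaluation method of G; the function name `regular_path_query`, the hand-written automaton, and the flight data are all assumptions made for this example.

```python
from collections import deque


def regular_path_query(edges, dfa, start_state, accepting, source):
    """Return the nodes reachable from `source` along a path whose
    sequence of edge labels is accepted by the given DFA.

    edges: iterable of (u, label, v) triples, i.e. the graph as a relation
    dfa:   dict mapping (state, label) -> next state
    """
    # Adjacency list: node -> list of (label, successor).
    adj = {}
    for u, label, v in edges:
        adj.setdefault(u, []).append((label, v))

    seen = {(source, start_state)}
    queue = deque(seen)
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in accepting:
            answers.add(node)
        for label, succ in adj.get(node, []):
            nxt = dfa.get((state, label))
            if nxt is None:
                continue  # the automaton has no move on this label
            pair = (succ, nxt)
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return answers


# Hypothetical flight graph; each edge label names the carrier.
edges = [
    ("YYZ", "AC", "YVR"),
    ("YVR", "AC", "NRT"),
    ("YYZ", "UA", "ORD"),
    ("ORD", "AC", "NRT"),
    ("NRT", "AC", "SYD"),
]
# Hand-built DFA for the regular expression AC+ (one or more AC edges).
dfa = {(0, "AC"): 1, (1, "AC"): 1}
result = regular_path_query(edges, dfa, 0, {1}, "YYZ")
```

With a single label and the expression AC+ this reduces to ordinary transitive closure; a different automaton (for example, one that alternates two labels) expresses path constraints that plain transitive closure cannot.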
TL;DR: It is shown how these new tactics can be deployed to greatly increase the space of interesting strategies for optimising all of SQL or other query languages that have similar features, without seriously altering the architecture of existing optimisers.
Abstract: Existing query optimizers focus on Restrict-Project-Join queries. In practice, however, query languages such as SQL and DAPLEX have many powerful features (e.g., control over duplicates, nested subqueries, grouping, aggregates, and quantifiers) that are not expressible as sequences of Restrict, Project, and Join operations. Existing optimizers are severely limited in their strategies for processing such queries; typically they use only tuple substitution, and process nested subquery blocks top down. Tuple substitution, however, is generally inefficient and especially so when the database is distributed. Hence, it is imperative to develop alternative strategies. This paper introduces new operations for these difficult features, and describes implementation methods for them. From the algebraic properties of these operations, new query processing tactics are derived. It is shown how these new tactics can be deployed to greatly increase the space of interesting strategies for optimisation, without seriously altering the architecture of existing optimisers. The contribution of the paper is in demonstrating the feasibility and desirability of developing an integrated framework for optimising all of SQL or other query languages that have similar features.
TL;DR: Previous research on nested query optimization that sought to reduce evaluation costs is summarized, including a classification scheme for nested queries and algorithms designed to transform each type of query into a logically equivalent form that can be evaluated more efficiently.
Abstract: Current methods of evaluating nested queries in the SQL language can be inefficient in a variety of query and data base contexts. Previous research in the area of nested query optimization which sought methods of reducing evaluation costs is summarized, including a classification scheme for nested queries, algorithms designed to transform each type of query to a logically equivalent form which may then be evaluated more efficiently, and a description of a major bug in one of these algorithms. Further examination reveals another bug in the same algorithm. Solutions to these bugs are proposed and incorporated into a new transformation algorithm, and extensions are proposed which will allow the transformation algorithms to handle a larger class of predicates. A recursive algorithm for processing a general nested query is presented and the action of this algorithm is demonstrated. This algorithm can be used to transform any nested query.
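The general idea of replacing a nested query with a logically equivalent flat form can be sketched with Python's standard `sqlite3` module. This is only an illustration of one simple, uncorrelated case; the `supplier` and `shipment` tables and the particular rewrite are assumptions made for the example and are not taken from the algorithms the paper discusses.

```python
import sqlite3

# In-memory database with two small, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (sno INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE shipment (sno INTEGER, pno INTEGER);
    INSERT INTO supplier VALUES (1, 'Oslo'), (2, 'Rome'), (3, 'Oslo');
    INSERT INTO shipment VALUES (1, 10), (1, 11), (2, 10), (3, 12);
""")

# Nested form: the inner block is evaluated as a subquery.
nested = """
    SELECT sno FROM supplier
    WHERE sno IN (SELECT sno FROM shipment WHERE pno = 10)
    ORDER BY sno
"""

# Flat form: the same request expressed as a join. DISTINCT keeps a
# supplier from appearing once per matching shipment row.
flat = """
    SELECT DISTINCT s.sno
    FROM supplier AS s JOIN shipment AS sh ON s.sno = sh.sno
    WHERE sh.pno = 10
    ORDER BY s.sno
"""

nested_rows = conn.execute(nested).fetchall()
flat_rows = conn.execute(flat).fetchall()
```

Both statements return the same suppliers here. Handling of duplicates is one place where such rewrites need care, which is why the flat form uses `DISTINCT`.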
TL;DR: This paper describes the operations of a modular query optimizer by transformation rules which generate different query evaluation plans (QEPs) from initial query specifications, in the hope that the approach taken will contribute to the more general goal of a modular query optimizer as part of an extensible database management system.
Abstract: The query optimizer is an important system component of a relational database management system (DBMS). It is the responsibility of this component to translate the user-submitted query - usually written in a non-procedural language - into an efficient query evaluation plan (QEP) which is then executed against the database. The research literature describes a wide variety of optimization strategies for different query languages and implementation environments. However, very little is known about how to design and structure the query optimization component to implement these strategies. This paper proposes a first step towards the design of a modular query optimizer. We describe its operations by transformation rules which generate different QEPs from initial query specifications. As we distinguish different aspects of the query optimization process, our hope is that the approach taken in this paper will contribute to the more general goal of a modular query optimizer as part of an extensible database management system.
TL;DR: The scheme uses a graph theoretic approach to identify redundant join clauses and redundant restriction clauses specified in a user query and an algorithm is suggested to eliminate such redundant joins and avoid unnecessary restrictions.
Abstract: This paper describes a scheme to utilize semantic integrity constraints in optimizing a user specified query. The scheme uses a graph theoretic approach to identify redundant join clauses and redundant restriction clauses specified in a user query. An algorithm is suggested to eliminate such redundant joins and avoid unnecessary restrictions. In addition to these eliminations, the algorithm aims to introduce as many restrictions on indexed attributes as possible, thus yielding an equivalent, but potentially more profitable, form of the original query.
TL;DR: In this paper, a denotational semantics approach to formal description of query languages is proposed, which allows us to equip most database models (relational, hierarchical, network, semantic and so on) with powerful query languages possessing clear, formal and precise semantics.
TL;DR: A graphical representation of the ECR model and its application as the basis for an interactive query language are discussed and a method of implementing the graphical ECR interface for accessing relational database systems is proposed.
Abstract: The Entity-Category-Relationship (ECR) model extends the Entity-Relationship (ER) model with the concepts of subclass and generalization categories. In this paper a graphical representation of the ECR model and its application as the basis for an interactive query language are discussed. The proposed query language is based on algebraic operators that can be used to transform an ECR diagram so that it represents a desired query. Semantics of the algebraic operators are formally defined. A method of implementing the graphical ECR interface for accessing relational database systems is proposed.
TL;DR: The presented solutions are based on detecting whether the response associated with a query is influenced by a database update, and on correcting the response after an update; the methods concern NETUL, a user-friendly query language with the power of programming languages, for network/semantic data models.
Abstract: A stored query is a pair <query, response>, where "response" is the query meaning for the current database state. When a collection of stored queries is available, responses to some queries may be obtained easily. Stored queries make it possible to improve database system response time regardless of the complexity of the user request and the data model assumed. The method is a generalization of methods based on indices. Its main properties and problems are outlined, particularly the problem of updating stored queries. The presented solutions are based on detecting whether the response associated with a query is influenced by a database update, and on correcting the response after an update. The methods concern NETUL, a user-friendly query language, with the power of programming languages, for network/semantic data models.
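The idea of keeping a response alongside its query, and refreshing it only when an update can influence it, can be sketched in Python as follows. This is a deliberately simplified assumption-laden sketch: the class `StoredQueries`, the table-level dependency check, and the choice to recompute an affected response from scratch are all hypothetical simplifications, not the detection and correction methods of the paper, and the sketch has nothing to do with NETUL syntax.

```python
class StoredQueries:
    """Keep (query, response) pairs and refresh a response only when an
    update touches a table that the query reads."""

    def __init__(self, tables):
        self.tables = tables   # table name -> list of row dicts
        self.stored = {}       # query name -> (tables read, function, response)
        self.evaluations = 0   # counts how often a query is actually run

    def _evaluate(self, fn):
        self.evaluations += 1
        return fn(self.tables)

    def store(self, name, reads, fn):
        # Evaluate once and remember the response with the query.
        self.stored[name] = (frozenset(reads), fn, self._evaluate(fn))

    def response(self, name):
        # Answer from the stored response without re-running the query.
        return self.stored[name][2]

    def insert(self, table, row):
        self.tables[table].append(row)
        for name, (reads, fn, _) in list(self.stored.items()):
            # Detection step (coarse): only queries reading this table
            # can be influenced by the update.
            if table in reads:
                # Correction step (simplified): recompute the response.
                self.stored[name] = (reads, fn, self._evaluate(fn))


db = StoredQueries({
    "emp": [{"name": "Ann", "dept": "R&D"}],
    "dept": [{"name": "R&D"}],
})
db.store(
    "rd_staff",
    {"emp"},
    lambda t: sorted(r["name"] for r in t["emp"] if r["dept"] == "R&D"),
)
db.insert("dept", {"name": "Sales"})              # not read by rd_staff
db.insert("emp", {"name": "Bob", "dept": "R&D"})  # influences rd_staff
```

In this sketch the insert into `dept` leaves the stored response untouched, while the insert into `emp` triggers one re-evaluation. A finer-grained detector could inspect the inserted row itself instead of only the table name.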
TL;DR: This paper presents a framework for the study of the query decomposition translation for heterogeneous record-oriented database management systems based on the applied database logic representation of relational, hierarchical and network databases.
Abstract: This paper presents a framework for the study of the query decomposition translation for heterogeneous record-oriented database management systems. This framework is based on the applied database logic representation of relational, hierarchical and network databases. The input to the query decomposition translation is the query graph which is derived from the complex to basic, external to conceptual and logical optimization translations. Once the query graph is obtained, the objective of the query decomposition translation is to break up a query expressed in terms of the actual or conceptual databases into its component parts or subqueries and find a strategy indicating the sequence of primitive or fundamental operations and their corresponding processing sites in the network necessary to answer the query. The query processing strategy is usually chosen so as to satisfy some performance criterion such as response time reduction. The choice of a query processing strategy is contingent on the successful estimation of intermediate results after each primitive operation. The prequery decomposition translation, the query decomposition translation and the size estimation issues are presented through an example based on the current implementation of the Distributed Access View Integration Database (DAVID) currently being built at NASA's Goddard Space Flight Center (GSFC).
TL;DR: Two high level interfaces which can be used to assist a student in query formulation are discussed and preliminary experiences have shown that they can facilitate teaching of query formulation and help students to understand better the syntax and semantics of non-procedural query languages.
Abstract: The objectives and the general structure of a database management system designed for instructional use are described in this paper. Two high level interfaces which can be used to assist a student in query formulation are discussed. The first of these interfaces is graphical and uses a Macintosh microcomputer as the user's workstation. The other interface guides a student through the process of query formulation using a menu-driven approach. Preliminary experiences with these interfaces have shown that they can facilitate teaching of query formulation and help students to better understand the syntax and semantics of non-procedural query languages.
TL;DR: An optimal query has been defined as one which will recover all the known relevant documents of a query in their best probability of relevance ranking; the definition is slightly modified so that it also allows one to trace the query's evolution from the original to the optimal via the various feedback stages.
Abstract: An optimal query has been defined as one which will recover all the known relevant documents of a query in their best probability of relevance ranking. We have slightly modified the definition so that it also allows one to trace its evolution from the original to the optimal via the various feedback stages. Such a query can be constructed by modifying the original query with terms from the known relevant documents. It is pointed out that such a term addition strategy differs materially from other approaches that add terms based on term association with all query terms, and calculated from the whole document collection. The effect of viewing a document as constituted of components, and hence affecting the weighting and retrieval results of the optimal query, is also discussed.
TL;DR: Current results (Smelcer & Mantei, 1984) indicate that designers and users of databases think about the same data differently; users expect the user view to be organized in one way, but it is actually organized in a different way.
Abstract: In spite of years of research aimed at improving database query languages (Boyle, Bury & Evey, 1983; Reisner, Boyce & Chamberlin, 1975; Thomas & Gould, 1975; Welty & Stemple, 1981; Welty, 1985), users still make mistakes. They consistently err when writing queries requiring information about two (or more) different entities, e.g., "What are the locations of all products?" Users not only have to learn a query language, but have to learn the structure of their data (called the user view of data) and then match the language with the data (Ogden, 1986). Current results (Smelcer & Mantei, 1984) indicate that designers and users of databases think about the same data differently; users expect the user view to be organized in one way, but it is actually organized in a different way. Database administrators need guidelines for creating user views of data, which enable users to more easily understand the organization of their data, and more easily write queries to access their data.
TL;DR: The thesis examines difficulties that arise in using a relational query language to support advanced information retrieval techniques such as ranking and weighted retrieval, and develops query language extensions that would significantly improve the performance of such searching techniques in a relational setting.
Abstract: This thesis studies the use of relational database systems to construct large, high performance information retrieval systems such as online library catalogs or citation retrieval applications. The major problem areas in relational implementations are query execution costs, poor space utilization, and functionality deficiencies both in query processing and in query languages such as SQL. Analytic and simulation methods are applied to quantify these problems.
Proposals extending earlier work on user-defined operators for relational query languages and accompanying secondary index support allow both efficient query formulation and the definition of space-efficient relational bibliographic databases. When column values follow distributions typical of bibliographic databases (Zipf distributions), a key performance problem is inaccurate selectivity estimation. A framework for incorporating user-defined selectivity estimators into a relational query optimizer is established, and methods are given to construct highly accurate selectivity estimators for bibliographic databases. Relational query optimizer extensions are specified which incorporate query execution plans that use TID list manipulation algorithms for evaluating single-relation queries into the optimizer's vocabulary. With these extensions a relational system can outperform an inverted file retrieval system on bibliographic databases. Also explored are query planner extensions to implement nonmaterialized relations (allowing both partially deferred evaluation of queries and inexpensive iterative query construction) and preexecution identification of queries that will be costly to evaluate or will produce very large results. Both of these features are important for public access information retrieval applications. Finally, the thesis examines difficulties that arise in using a relational query language to support advanced information retrieval techniques such as ranking and weighted retrieval, and develops query language extensions that would significantly improve the performance of such searching techniques in a relational setting.