TL;DR: These methods are presented in the framework of a general query evaluation procedure using the relational calculus representation of queries, and nonstandard query optimization issues such as higher level query evaluation, query optimization in distributed databases, and use of database machines are addressed.
Abstract: Efficient methods of processing unanticipated queries are a crucial prerequisite for the success of generalized database management systems. A wide variety of approaches to improve the performance of query evaluation algorithms have been proposed: logic-based and semantic transformations, fast implementations of basic operations, and combinatorial or heuristic algorithms for generating alternative access plans and choosing among them. These methods are presented in the framework of a general query evaluation procedure using the relational calculus representation of queries. In addition, nonstandard query optimization issues such as higher level query evaluation, query optimization in distributed databases, and use of database machines are addressed. The focus, however, is on query optimization in centralized database systems.
TL;DR: A metric is introduced for measuring the complexity of a query, and it is proposed that a sentence be translated into the least complex query that "satisfies" the sentence.
Abstract: The query inference problem is to translate a sentence of a query language into an unambiguous representation of a query. A query is represented as an expression over a set of query trees. A metric is introduced for measuring the complexity of a query, and it is proposed that a sentence be translated into the least complex query that “satisfies” the sentence. This method of query inference can be used to resolve ambiguous sentences and leads to easier formulation of sentences.
TL;DR: In this article, an algorithm is presented which decides whether or not a query involving join clauses with relational operators { } is a tree query; for any tree query it also produces a tree query graph, from which the sequences of semi-joins that answer the query can immediately be determined.
Abstract: In processing distributed relational database queries, the cost of communication between sites is the dominant cost factor. It is generally agreed that the amount of data transferred determines this communication cost to a large extent. Thus it is desirable to minimize the amount of transmitted data. Semi-join is a relational database operator which can be utilized to reduce the amount of data transmission in processing distributed queries. A class of queries, called tree queries, can always be answered using semi-joins. In this paper an algorithm is presented which decides whether or not a query involving join clauses with relational operators { } is a tree query. For any tree query the algorithm also produces a tree query graph, so that the sequences of semi-joins to answer the query can immediately be determined.
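For equi-join queries, whether a query is a tree query can be illustrated with the classic GYO reduction. This is a standard technique, not the algorithm of the paper above, which also handles join clauses with other relational operators. The sketch below assumes each relation is described only by the set of join attributes it mentions (the function name `is_tree_query` is hypothetical). It repeatedly deletes attributes that occur in a single relation and relations contained in another; the query is a tree query exactly when at most one relation survives. It does not build the tree query graph itself.

```python
from typing import Dict, Set

def is_tree_query(relations: Dict[str, Set[str]]) -> bool:
    """Return True if the GYO reduction shrinks the query hypergraph to at most one relation."""
    # Work on a copy: relation name -> set of join attributes it mentions.
    edges = {name: set(attrs) for name, attrs in relations.items()}
    changed = True
    while changed and len(edges) > 1:
        changed = False
        # Rule 1: drop attributes that occur in only one relation.
        for name, attrs in edges.items():
            lonely = {a for a in attrs
                      if not any(a in other for n, other in edges.items() if n != name)}
            if lonely:
                attrs -= lonely
                changed = True
        # Rule 2: drop one relation whose attributes are a subset of another relation's.
        for name, attrs in list(edges.items()):
            if any(attrs <= other for n, other in edges.items() if n != name):
                del edges[name]
                changed = True
                break
    return len(edges) <= 1
```

A chain such as R(A,B), S(B,C), T(C,D) reduces to a single relation and is a tree query; the cycle R(A,B), S(B,C), T(C,A) cannot be reduced at all, so semi-joins alone are not guaranteed to answer it.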
TL;DR: The model presents a synthesis of concepts from retrospective and current awareness retrieval systems, employing the user profile as a factor in interpreting a query, and it is expected that this will provide a more personalized response to queries.
Abstract: We describe a theoretical model and an ongoing series of experiments aimed at a priori query enhancement. The model presents a synthesis of concepts from retrospective and current awareness retrieval systems, employing the user profile as a factor in interpreting a query. It is expected that this will provide a more personalized response to queries.
TL;DR: It is shown how the hierarchy and geometry can interact to improve query processing, and how knowledge about the behavior of attributes stored in the hierarchy can be used to choose appropriate levels of detail for query output.
Abstract: Queries to part hierarchies in CAD/CAM databases (and structures with similar semantics found in other applications) can be fundamentally different from queries to sets of parts having no underlying structure, and they can raise difficult issues in query language behavior and data management. Including explicit geometric information further complicates query processing by adding computational geometry to the list of issues to be considered. In this paper, we investigate the problems of querying part hierarchies and of using the special semantics associated with such structures to improve query performance and responsiveness to user requirements. In particular, we show how the hierarchy and geometry can interact to improve query processing, and how knowledge about the behavior of attributes stored in the hierarchy can be used to choose appropriate levels of detail for query output.
This research is funded by General Dynamics, Data Systems Division. Proceedings of the Tenth International Conference on Very Large Data Bases, Singapore, August 1984.
1. Information Model
A basic data structure frequently found in databases for computer aided design and computer aided manufacturing (CAD/CAM) applications is some form of bill-of-materials (BOM). This structure defines the component parts and subassemblies that make up each product being designed or manufactured (an assembly is a part that is composed of other parts). The classic BOM structure has a schema of the form:
[Schema figure not recoverable; surviving fragment: relation PART with attributes PARTID and COST]
Moreover, such structures are not confined to CAD/CAM applications. Examples of other applications exhibiting similar structures include:
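The bill-of-materials structure described above can be traversed recursively, which is one way attributes such as cost "behave" over a part hierarchy. The sketch below is illustrative only: the `USAGE` relation (assembly to component and quantity), the sample parts, and the additive cost-rollup rule are assumptions, not the schema or semantics defined in the paper. It also assumes the hierarchy is acyclic.

```python
from typing import Dict, List, Tuple

# Hypothetical data: PART(PARTID, COST) plus an assumed USAGE relation
# mapping each assembly to (component PARTID, quantity) pairs.
PART_COST: Dict[str, float] = {"bike": 50.0, "wheel": 20.0, "spoke": 0.5, "frame": 80.0}
USAGE: Dict[str, List[Tuple[str, int]]] = {
    "bike": [("wheel", 2), ("frame", 1)],
    "wheel": [("spoke", 32)],
}

def rolled_up_cost(part: str) -> float:
    """Own cost plus the quantity-weighted rolled-up cost of every component (assumes no cycles)."""
    return PART_COST[part] + sum(qty * rolled_up_cost(c) for c, qty in USAGE.get(part, []))

def explode(part: str, qty: int = 1) -> Dict[str, int]:
    """Total quantity of each descendant part needed to build `qty` units of `part`."""
    totals: Dict[str, int] = {}
    for comp, n in USAGE.get(part, []):
        totals[comp] = totals.get(comp, 0) + qty * n
        for sub, m in explode(comp, qty * n).items():
            totals[sub] = totals.get(sub, 0) + m
    return totals
```

With the sample data, a wheel costs 20 + 32 × 0.5 = 36, so a bike rolls up to 50 + 2 × 36 + 80 = 202, and exploding one bike yields 2 wheels, 64 spokes, and 1 frame.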
TL;DR: This paper develops linear-time solutions to the query optimization problem for selection and projection queries, and then extends these solutions to provide heuristics for joins and conjunctive queries.
Abstract: A multidatabase system provides a logically integrated view of existing, possibly inconsistent, databases. Logical integration is achieved primarily through the use of generalization, which can be modelled algebraically as a sequence of outerjoin and aggregation operations. Conventional distributed query processing techniques are inadequate for processing queries over views defined by outerjoins and aggregates. In a conventional distributed database system, selections and projections are inexpensive to process; hence joins have been the focus of most previous research. In a multidatabase system, however, even selections and projections can be as expensive as joins. The semiouterjoin operation can potentially reduce query processing costs. In general, there may be many different strategies based on semiouterjoins for processing a given query. The query optimization problem is to choose the most profitable of these strategies. This paper studies the query optimization problem for selection and projection queries. It develops linear-time solutions to the problem, and then extends these solutions to provide heuristics for joins and conjunctive queries.
TL;DR: This work first presents elementary nested loop and relational algebra algorithms for query execution and points out some opportunities for improving their performance, then presents optimization strategies, organized into query transformation techniques and access planning methods.
Abstract: Query processing in databases can be divided into two steps: selecting an 'optimal' evaluation strategy, and executing it. We first present elementary nested loop and relational algebra algorithms for query execution and point out some opportunities for improving their performance. A survey of optimization strategies, organized into query transformation techniques and access planning methods, follows. Finally, extensions for special-purpose query systems are briefly addressed.
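The gap between an elementary nested loop algorithm and an improved one can be seen in a small join. The sketch below is a generic illustration, not code from the survey: it contrasts a nested-loop join, which compares every pair of tuples, with a hash join, one well-known improvement, which builds a hash table on one input and probes it once per tuple of the other. Relations are modelled, by assumption, as lists of dictionaries joined on a single shared attribute.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]

def nested_loop_join(r: List[Row], s: List[Row], key: str) -> List[Row]:
    """Compare every tuple of r with every tuple of s: |r| * |s| comparisons."""
    return [{**x, **y} for x in r for y in s if x[key] == y[key]]

def hash_join(r: List[Row], s: List[Row], key: str) -> List[Row]:
    """Build a hash table on s, then probe it once per tuple of r (expected linear time plus output size)."""
    table: Dict[Any, List[Row]] = defaultdict(list)
    for y in s:
        table[y[key]].append(y)
    # .get() does not insert missing keys into the defaultdict.
    return [{**x, **y} for x in r for y in table.get(x[key], [])]
```

Both functions return the same tuples in the same order; only the amount of work differs, which is the kind of performance opportunity the survey points to before turning to optimization strategies.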
TL;DR: A method for selecting query languages suitable for particular user types is presented; it integrates the results of existing human factors studies and provides a structured framework for future research.
Abstract: A methodology is presented for selecting query languages suitable for certain user types. The method is based on a trend model of query language development on the dimensions of functional capabilities and usability. Expected developments are exemplified by the description of "second generation" database query languages. From the trend model are derived: a classification scheme for query languages; a criterion hierarchy for query language evaluation; a comprehensive classification scheme of query language users and their requirements; and recommendations for allocating language classes to user types. The method integrates the results of existing human factors studies and provides a structured framework for future research.
TL;DR: The paper describes the use of a universal symbol and tree manipulation system to perform query translation, decomposition and optimization.
Abstract: A database management system designed for instructional use should offer facilities usually not required in a commercial environment. One of the most important features desirable in such a system is its ability to perform query transformation. The use of a universal symbol and tree manipulation system to perform query translation, decomposition and optimization is described in the paper. Examples of transformation rules required to translate SQL expressions into equivalent QUEL expressions, decompose SQL expressions into parse trees and perform optimization of expressions based on relational algebra are shown. An experimental relational DBMS using the above approach is currently under development at the University of Houston. It supports various nonprocedural query languages within a single system, using a unified database dictionary. Cross-translation between various query languages is allowed. The results of every important phase of the query transformation during its execution are interactively available to the system user.
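A transformation rule of the kind used to optimize relational algebra expressions can be sketched as a rewrite over an expression tree. The example below is not the University of Houston system: the tuple encoding of trees and the function names are hypothetical, and the single rule shown is the standard selection pushdown, select(p, L × R) becomes select(p, L) × R when p reads only attributes of L (and symmetrically for R).

```python
from typing import FrozenSet

# Hypothetical tree encoding, using plain tuples:
#   ("rel", name, attrs)                      base relation and its attribute set
#   ("select", pred_attrs, pred_text, child)  selection plus the attributes its predicate reads
#   ("product", left, right)                  Cartesian product
Node = tuple

def attrs(node: Node) -> FrozenSet[str]:
    """Attributes produced by a subtree."""
    if node[0] == "rel":
        return node[2]
    if node[0] == "select":
        return attrs(node[3])
    return attrs(node[1]) | attrs(node[2])

def push_selections(node: Node) -> Node:
    """Move each selection below a product when its predicate reads only one side's attributes."""
    if node[0] == "rel":
        return node
    if node[0] == "product":
        return ("product", push_selections(node[1]), push_selections(node[2]))
    _, p_attrs, p_text, child = node
    child = push_selections(child)
    if child[0] == "product":
        left, right = child[1], child[2]
        if p_attrs <= attrs(left):
            return ("product", push_selections(("select", p_attrs, p_text, left)), right)
        if p_attrs <= attrs(right):
            return ("product", left, push_selections(("select", p_attrs, p_text, right)))
    # The predicate spans both sides (or the child is not a product): leave it in place.
    return ("select", p_attrs, p_text, child)
```

A predicate such as `sal > 100` on EMP × DEPT moves onto EMP, shrinking the product's input, while a join predicate reading both sides stays above the product.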
TL;DR: A query processing strategy for personal computers is presented that requires at most a single sequential scan of the database for nearly all queries; it is applicable to any database management system with a large amount of available main memory.
Abstract: We present a query processing strategy for personal computers that requires at most a single sequential scan of the database for nearly all queries. On personal computers, most queries are ad hoc, produce little output, and operate on small databases limited by secondary storage. For these queries we can use the relatively large amount of main memory to offset the slow secondary storage accesses. This is our intuitive motivation for the two-step query processing strategy which we present in this paper. In the first step we use a reduction scheme to find, for a query, a subset of the database which can fit into main memory. This step requires at most a single sequential scan of the database. In the second step we compute the answer to the query without further access to secondary storage. Since traditional query processing strategies are nonlinear in secondary storage access, we contend that our strategy is superior for nearly all queries; for the remainder, our strategy degrades gracefully. Even though we use the example of query processing on personal computers throughout this paper, the strategy we present is general, and applicable to any database management system which has a large amount of available main memory.
TL;DR: In this article, a linear query for accessing a relation data base in computer storage is synthesized from a graphic query input at a user terminal, where the graphic query may be one of a combined print query, a target print query or a delete query.
Abstract: A linear query for accessing a relation data base in computer storage is synthesized from a graphic query input at a user terminal. The graphic query may be one of a combined print query, a target print query, an insert query, a delete query, or an update query. According to one embodiment, the linear query is expressed in Structured Query Language (SQL) syntax, and the graphic query in Query By Example (QBE) syntax. Responsive to a QBE combined print query or target print query, an SQL select query is generated (170-190) comprising the UNION of one or more generated select statements. Responsive to a QBE delete query, an SQL delete query is generated (192-202) from the logical OR of generated condition statements including an outer query DELETE and a SELECT * subquery. Responsive to a QBE update query, an SQL update query is generated (204-214) including a SET clause and any generated WHERE clause and subquery. Responsive to a QBE insert query, an SQL insert query is generated (216-226) to include an INSERT statement and any generated SELECT statement.
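The print-query case, in which each example row yields a SELECT and the rows are combined with UNION, can be sketched as below. This is not the patented procedure: the row encoding (a dictionary from column name to a QBE-style entry, where a leading `P.` marks a column to print and the remainder is a constant or a comparison) and the function names are hypothetical. The sketch assumes every row prints the same columns so that the UNION is well-formed, ignores QBE example variables, and performs no escaping, so it is not safe for untrusted input.

```python
from typing import Dict, List

COMPARATORS = ("<>", "<=", ">=", "<", ">", "=")  # longest first, so "<=" is not read as "<"

def row_to_select(table: str, row: Dict[str, str]) -> str:
    """Translate one example row (column -> QBE-style entry) into an SQL SELECT."""
    printed: List[str] = []
    conditions: List[str] = []
    for col, entry in row.items():
        entry = entry.strip()
        if entry.startswith("P."):  # "P." marks a column to print
            printed.append(col)
            entry = entry[2:].strip()
        if not entry:
            continue
        for op in COMPARATORS:
            if entry.startswith(op):
                conditions.append(f"{col} {op} {entry[len(op):].strip()}")
                break
        else:  # a bare constant means equality
            conditions.append(f"{col} = {entry}")
    sql = f"SELECT {', '.join(printed) or '*'} FROM {table}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql

def print_query_to_sql(table: str, rows: List[Dict[str, str]]) -> str:
    """A multi-row print query becomes the UNION of the per-row SELECTs."""
    return " UNION ".join(row_to_select(table, r) for r in rows)
```

For example, two rows `{"NAME": "P.", "SAL": ">100"}` and `{"NAME": "P.", "DEPT": "'Toy'"}` over `EMP` produce `SELECT NAME FROM EMP WHERE SAL > 100 UNION SELECT NAME FROM EMP WHERE DEPT = 'Toy'`.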
TL;DR: Four optimization methods are presented: the method based mainly on the transformation of algebraic expressions, the method of query decomposition, the method of processing multi-variable queries directly, and the method of optimizing conjunctive queries.
Abstract: Query optimization is an important problem that should be considered in query processing. The problem of query optimization in a relational database system is discussed in this paper. The optimization method based mainly on the transformation of algebraic expressions, the optimization method of query decomposition, the optimization method of processing multi-variable queries directly, and the optimization method for conjunctive queries are presented. Finally, we comment on these four types of typical optimization methods.
TL;DR: A general model that does not commit to any specific query processing algorithm is introduced, and based on it an adaptive algorithm is proposed for allocating files to sites in distributed database systems.
Abstract: In this thesis, we investigate the problem of allocating files to sites in distributed database systems. A general model without committing to any specific query processing algorithm is introduced. Based on the model, an adaptive algorithm is proposed. The file allocation problem in two different distributed database environments using different classes of query processing algorithms and the application of the algorithm to the problem are presented in detail.
The algorithm has several desirable properties: (1) Query statistics need not be collected in advance; instead, useful information is extracted by the algorithm each time a query is processed. (2) Estimation is used minimally, which ensures that cost computation is accurate. (3) Sudden changes in users' access patterns can be easily detected by the algorithm.
Other properties regarding accuracy and convergence of the algorithm are derived. Experimental results are provided to justify the usefulness of the algorithm.
The application of the same approach to the record clustering problem in hierarchical databases is also discussed in this thesis.
TL;DR: In this study, two recently proposed automatic methods for relevance feedback of Boolean queries are evaluated and conclusions are drawn concerning the use of effective feedback methods in a Boolean query environment.
Abstract: The relevance feedback process uses information derived from an initially retrieved set of documents to improve subsequent search formulations and retrieval output. In a Boolean query environment this implies that new query terms must be identified and Boolean operators must be chosen automatically to connect the various query terms. In this study, two recently proposed automatic methods for relevance feedback of Boolean queries are evaluated and conclusions are drawn concerning the use of effective feedback methods in a Boolean query environment.
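The two feedback methods evaluated in the study are not specified in the abstract, so the following is only a hypothetical illustration of the general idea: identify new query terms from judged documents and connect them with a Boolean operator. The rule used here is an assumption: score each term by how many relevant documents contain it minus how many nonrelevant ones do, and OR the top `k` new terms into the query. A real method would also choose between AND and OR, which this sketch does not attempt.

```python
from collections import Counter
from typing import List, Set, Union

Query = Union[str, tuple]  # a term, or (op, left, right) with op in {"AND", "OR"}

def terms(q: Query) -> Set[str]:
    """All terms mentioned in a Boolean query."""
    return {q} if isinstance(q, str) else terms(q[1]) | terms(q[2])

def matches(q: Query, doc: Set[str]) -> bool:
    """Evaluate a Boolean query against a document's term set (any op other than AND is OR)."""
    if isinstance(q, str):
        return q in doc
    op, left, right = q
    if op == "AND":
        return matches(left, doc) and matches(right, doc)
    return matches(left, doc) or matches(right, doc)

def expand(q: Query, relevant: List[Set[str]], nonrelevant: List[Set[str]], k: int = 2) -> Query:
    """Hypothetical feedback rule: OR in the k new terms most over-represented in relevant documents."""
    score: Counter = Counter()
    for d in relevant:
        score.update(d)
    for d in nonrelevant:
        score.subtract(d)
    old = terms(q)
    # Sort by descending score, then alphabetically, so ties are broken deterministically.
    ranked = sorted((t for t, s in score.items() if s > 0 and t not in old),
                    key=lambda t: (-score[t], t))
    for t in ranked[:k]:
        q = ("OR", q, t)
    return q
```

Broadening with OR raises recall; a method that also introduced AND clauses could trade some of that recall back for precision, which is the operator choice the study's automatic methods must make.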