TL;DR: In this paper, a relational database management system (RDBMS) efficiently evaluates correlated subqueries by decorrelating them in a way that handles the so-called SQL count bug while avoiding the expensive outer join operation.
Abstract: A relational database management system (RDBMS) efficiently evaluates correlated subqueries by decorrelating them in a way that handles the so-called SQL count bug while avoiding the expensive outer join operation. When a correlated subquery finds no matching tuple, the RDBMS query processor returns a tuple of null(s) from a scalar derived table and then uses the COALESCE function to turn that null into the proper count value of zero. The correlation level remains one. The query processor also performs a pass-through optimization for a floating SELECT operation: it removes the join operation involving the magic operation, so that the correlation bindings are received from the correlation source table rather than from the magic operation.
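The count bug can be reproduced with Python's built-in `sqlite3` module. The sketch below uses a hypothetical `dept`/`emp` schema and compares the correlated query, a naive decorrelation (pre-aggregate, then inner join) that silently drops the department with zero employees, and a repaired form that maps the missing count to zero with COALESCE. One assumption to flag: the repair here obtains its NULL from a LEFT JOIN, whereas the method summarized above produces the null tuple from a scalar derived table precisely to avoid the outer join; only the COALESCE step is shared.

```python
import sqlite3

# Hypothetical toy schema: three departments, one of which has no employees.
SETUP = """
CREATE TABLE dept (id INTEGER PRIMARY KEY);
CREATE TABLE emp (id INTEGER PRIMARY KEY, dept_id INTEGER);
INSERT INTO dept (id) VALUES (1), (2), (3);
INSERT INTO emp (id, dept_id) VALUES (10, 1), (11, 1), (12, 2);
"""

# Correlated form: the COUNT subquery is evaluated once per outer dept row.
CORRELATED = """
SELECT d.id, (SELECT COUNT(*) FROM emp e WHERE e.dept_id = d.id)
FROM dept d
ORDER BY d.id
"""

# Naive decorrelation: aggregate emp once, then inner-join on the key.
# Department 3 has no group in the derived table, so its row disappears
# instead of reporting a count of 0 (the count bug).
NAIVE = """
SELECT d.id, c.cnt
FROM dept d
JOIN (SELECT dept_id, COUNT(*) AS cnt FROM emp GROUP BY dept_id) AS c
  ON c.dept_id = d.id
ORDER BY d.id
"""

# Repaired form: an unmatched dept gets NULL, and COALESCE turns it into 0.
# NOTE: this sketch uses LEFT JOIN to obtain the NULL; the method described
# above obtains it without an outer join.
REPAIRED = """
SELECT d.id, COALESCE(c.cnt, 0)
FROM dept d
LEFT JOIN (SELECT dept_id, COUNT(*) AS cnt FROM emp GROUP BY dept_id) AS c
  ON c.dept_id = d.id
ORDER BY d.id
"""


def evaluate(query: str) -> list:
    """Run one query against a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for name, query in [("correlated", CORRELATED), ("naive", NAIVE), ("repaired", REPAIRED)]:
        print(name, evaluate(query))
```

The correlated and repaired queries both return `[(1, 2), (2, 1), (3, 0)]`, while the naive rewrite returns only `[(1, 2), (2, 1)]`.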
TL;DR: In this paper, a system and method that transform queries with subqueries using window aggregation are proposed; they reduce the number of times tables or views are accessed and thereby reduce the computational demands of a query.
Abstract: A system and method transform queries with subqueries, using window aggregation. An optimizer in a relational database management system transforms queries to optimize their efficiency and speed. The method transforms queries that have a subquery, replacing the subquery with a window aggregation function. In the case of a correlated subquery, the window aggregation function is partitioned by a correlated column of a correlated table. All data in the main select clause, or outer block, of the query that was obtained through references to the correlated table is instead obtained through the new window aggregation subquery. By using window aggregation, the aggregation is performed at the same time as the selection of relevant data from the correlated table, thereby compiling all needed data in a single pass through the table or view. Reducing the number of times that tables or views are accessed reduces the computational demands of a query.
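To make the rewrite concrete, here is a hand-written before/after pair run through Python's `sqlite3` module (window functions need SQLite 3.25 or later). It is a sketch of the kind of transformation described, not the optimizer itself; the `emp` table, its columns, and the "salary above the department average" query are hypothetical. The correlated column `dept` becomes the `PARTITION BY` key, so the average is computed during the same pass that reads the employee rows.

```python
import sqlite3

# Hypothetical table: employees with a department and a salary.
SETUP = """
CREATE TABLE emp (name TEXT, dept INTEGER, salary REAL);
INSERT INTO emp VALUES
  ('a', 1, 100), ('b', 1, 200),
  ('c', 2, 50), ('d', 2, 50), ('e', 2, 110);
"""

# Correlated subquery: emp is read again for every outer row.
CORRELATED = """
SELECT e.name FROM emp e
WHERE e.salary > (SELECT AVG(i.salary) FROM emp i WHERE i.dept = e.dept)
ORDER BY e.name
"""

# Window-aggregation rewrite: the correlated column (dept) becomes the
# PARTITION BY key, so each row carries its department average and the
# table is read only once.
WINDOWED = """
SELECT name FROM (
  SELECT name, salary, AVG(salary) OVER (PARTITION BY dept) AS dept_avg
  FROM emp
)
WHERE salary > dept_avg
ORDER BY name
"""


def names(query: str) -> list:
    """Run a query on a fresh in-memory database and return the first column."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return [row[0] for row in conn.execute(query)]
    finally:
        conn.close()


if __name__ == "__main__":
    print(names(CORRELATED))
    print(names(WINDOWED))
```

Both queries return `['b', 'e']`: department 1 averages 150 and department 2 averages 70.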
TL;DR: This work proposes a new technique for handling some typical correlated queries: it uses extended window aggregation capabilities to eliminate redundant access to common tables referenced in both the outer query block and the subquery.
Abstract: Database queries often take the form of correlated SQL queries. Correlation refers to the use of values from the outer query block to compute the inner subquery. This is a convenient paradigm for SQL programmers and closely mimics the function invocation paradigm of a typical programming language. Queries with correlated subqueries are also often created by SQL generators that translate queries from application domain-specific languages into SQL. Another significant class of queries that use this correlated subquery form involves temporal databases queried through SQL. Performance of these queries is an important consideration, particularly in large databases. Several proposals to improve the performance of SQL queries containing correlated subqueries can be found in the database literature. One of the main ideas in many of these proposals is to suitably decorrelate the subquery internally to avoid a tuple-at-a-time invocation of the subquery. Magic decorrelation is one method that has been used successfully. Another proposal is to cache the portion of the subquery that is invariant with respect to the changing values of the outer query block. What we propose here is a new technique to handle some typical correlated queries. We go a step further than simply decorrelating the subquery: by making use of extended window aggregation capabilities, we eliminate redundant access to common tables referenced in both the outer query block and the subquery. This technique can be exploited even for non-correlated subqueries. Queries that can exploit this technique, which we call WinMagic, can see a huge boost in performance. This technique was implemented in IBM® DB2® Universal Database Version 7 and Version 8. In addition to improving DB2 customer queries that contain aggregation subqueries, it has provided significant improvements in a number of TPC-H benchmarks that IBM has published since late 2001.
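The "redundant access" that WinMagic removes can be made visible by counting table scans. The pure-Python sketch below is illustrative only: the `LineItem` rows, the toy `Table` class with its scan counter, and the query "rows whose quantity is below a fraction of the average quantity for the same part" are all hypothetical. The tuple-at-a-time evaluation rescans the table for every outer row, while the window-style evaluation reads it once and computes the equivalent of `AVG(qty) OVER (PARTITION BY part)` from the buffered rows.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    part: str
    qty: float


class Table:
    """Hypothetical in-memory table that counts how many full scans are issued."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.scans = 0

    def scan(self):
        self.scans += 1
        return iter(self._rows)


def correlated(table: Table, factor: float) -> list:
    """Tuple-at-a-time: the subquery rescans the table for each outer row."""
    out = []
    for outer in table.scan():
        qtys = [r.qty for r in table.scan() if r.part == outer.part]
        if outer.qty < factor * sum(qtys) / len(qtys):
            out.append(outer)
    return out


def windowed(table: Table, factor: float) -> list:
    """Single scan: buffer rows, then compute AVG(qty) per part partition."""
    rows = list(table.scan())
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        sums[r.part] += r.qty
        counts[r.part] += 1
    return [r for r in rows if r.qty < factor * sums[r.part] / counts[r.part]]


SAMPLE = [LineItem("A", 1.0), LineItem("A", 10.0), LineItem("A", 10.0),
          LineItem("B", 4.0), LineItem("B", 4.0)]

if __name__ == "__main__":
    t1, t2 = Table(SAMPLE), Table(SAMPLE)
    print(correlated(t1, 0.5), t1.scans)
    print(windowed(t2, 0.5), t2.scans)
```

Both functions select only `LineItem("A", 1.0)`, but the correlated version issues 6 scans (one outer plus one per row) against 1 for the windowed version. Scan count is only a rough proxy for cost; a real system would also weigh buffering, sorting, and memory.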
TL;DR: In this article, a query is analyzed using matching and compensation tests between the query (including at least one correlated subquery within it) and the automatic summary table, to determine whether expressions occurring in the query, but not in the summary table, can be derived using the automatic summary table.
Abstract: A method, apparatus, and article of manufacture for optimizing database queries using an automatic summary table. A query is analyzed using matching and compensation tests between the query (including at least one correlated subquery within it) and the automatic summary table, to determine whether expressions occurring in the query, but not in the automatic summary table, can be derived using the automatic summary table. If so, the query is rewritten so that the automatic summary table is used.
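The idea of deriving an expression that the summary table does not store can be shown on a deliberately simple case with no correlated subquery. In this hypothetical sketch, the summary table `ast_sales` keeps `SUM(amount)` and `COUNT(*)` grouped by `(region, month)`, while the user query asks for `AVG(amount)` by `region`. Compensation regroups the summary rows on a subset of their grouping columns and derives the average as `SUM(sum_amt) / SUM(cnt)`; the table and column names are illustrative assumptions, not part of the described method.

```python
import sqlite3

# Hypothetical base table plus a summary table grouped more finely than the query.
SETUP = """
CREATE TABLE sales (region TEXT, month INTEGER, amount REAL);
INSERT INTO sales VALUES
  ('east', 1, 10), ('east', 1, 20), ('east', 2, 30), ('west', 1, 5);
CREATE TABLE ast_sales AS
  SELECT region, month, SUM(amount) AS sum_amt, COUNT(*) AS cnt
  FROM sales GROUP BY region, month;
"""

# User query: AVG(amount) is not stored in the summary table.
ORIGINAL = "SELECT region, AVG(amount) FROM sales GROUP BY region ORDER BY region"

# Rewritten query: regroup the summary rows by region and derive AVG from
# the stored SUM and COUNT columns (the compensation step).
REWRITTEN = (
    "SELECT region, 1.0 * SUM(sum_amt) / SUM(cnt) FROM ast_sales "
    "GROUP BY region ORDER BY region"
)


def run(query: str) -> list:
    """Evaluate a query against a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run(ORIGINAL))
    print(run(REWRITTEN))
```

Both queries return `[('east', 20.0), ('west', 5.0)]`. Note that AVG cannot be rolled up by averaging stored averages; storing SUM and COUNT is what makes the derivation valid.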
TL;DR: This paper develops algorithms for online estimation over subset-based SQL queries, and considers the difficult problem of providing probabilistic accuracy guarantees at all times during query execution.
Abstract: The largest databases in use today are so large that answering a query exactly can take minutes, hours, or even days. One way to address this problem is to make use of approximation algorithms. Previous work on online aggregation has considered how to give online estimates with ever-increasing accuracy for aggregate functions over relational join and selection queries. However, no existing work is applicable to online estimation over subset-based SQL queries, that is, queries with a correlated subquery linked to an outer query via a NOT EXISTS, NOT IN, EXISTS, or IN clause (other queries such as EXCEPT and INTERSECT can also be seen as subset-based queries). In this paper we develop algorithms for online estimation over such queries, and consider the difficult problem of providing probabilistic accuracy guarantees at all times during query execution.
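For intuition about what an online estimate looks like, the sketch below estimates `COUNT(*)` for a NOT EXISTS query by reading the outer relation in random order and reporting a running estimate with a normal-approximation confidence half-width after every row. It makes a strong simplifying assumption that sidesteps the hard part the paper addresses: the inner relation is fully materialized as a Python set, so only the outer relation is sampled. The function name and parameters are hypothetical.

```python
import math
import random


def online_not_exists_count(outer_keys, inner_keys, z=1.96, seed=0):
    """Yield (rows_seen, estimate, half_width) for
    SELECT COUNT(*) FROM outer o WHERE NOT EXISTS
      (SELECT * FROM inner i WHERE i.k = o.k).

    Assumption: inner_keys is a fully materialized set; only the outer
    relation is sampled (without replacement, via a random permutation).
    """
    rng = random.Random(seed)
    order = list(outer_keys)
    rng.shuffle(order)  # random order = sampling without replacement
    total = len(order)
    hits = 0
    for n, key in enumerate(order, start=1):
        if key not in inner_keys:  # the NOT EXISTS predicate holds for this row
            hits += 1
        estimate = total * hits / n  # scale the sample count to the population
        if n < 2:
            half_width = math.inf  # no variance estimate from a single row
        else:
            p = hits / n
            # Unbiased variance estimate of the sample proportion under
            # sampling without replacement (finite-population correction).
            var_p = (1 - n / total) * p * (1 - p) / (n - 1)
            half_width = z * total * math.sqrt(var_p)
        yield n, estimate, half_width


if __name__ == "__main__":
    steps = list(online_not_exists_count(range(100), set(range(0, 100, 3))))
    for n, est, hw in steps[9::30]:
        print(f"after {n} rows: {est:.1f} +/- {hw:.1f}")
    print(steps[-1])
```

Because of the finite-population correction, the interval shrinks to zero once every outer row has been read; here the final step is `(100, 66.0, 0.0)`, matching the 66 outer keys with no match among the 34 inner keys.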