TL;DR: A new method for efficiently evaluating queries with aggregate functions is presented, which deals not only with relations but with general bulk types like sets, bags, and lists.
Abstract: A new method for efficiently evaluating queries with aggregate functions is presented. More specifically, we introduce a class of aggregate queries where traditional query evaluation strategies in general require O(n^2) time and space in the size of the (at most two) input relations. For this class of aggregate queries our approach needs at most O(n log n) time and linear space. Further, our approach deals not only with relations but with general bulk types like sets, bags, and lists.
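The gap between quadratic and O(n log n) evaluation can be illustrated with a small sketch. The abstract does not define the class of aggregate queries, so the query used here (for each key of R, sum the values of all S tuples whose key is not larger) is an assumed example, and the function names are hypothetical. The sort-and-prefix-sum strategy is a standard technique and is not claimed to be the authors' method.

```python
from bisect import bisect_right
from itertools import accumulate


def running_sum_naive(r_keys, s_rows):
    """Nested-loop evaluation: O(|R| * |S|) time."""
    return [sum(v for k, v in s_rows if k <= rk) for rk in r_keys]


def running_sum_sorted(r_keys, s_rows):
    """Sort S once, build prefix sums, then binary-search per R key.

    O(n log n) time and linear extra space. The input may be any
    iterable of (key, value) pairs; duplicates (bags) are handled.
    """
    s_sorted = sorted(s_rows)                 # sort by key
    keys = [k for k, _ in s_sorted]
    # prefix[i] = sum of the first i values in key order
    prefix = [0] + list(accumulate(v for _, v in s_sorted))
    return [prefix[bisect_right(keys, rk)] for rk in r_keys]
```

Both functions return one aggregate per element of `r_keys`, in input order, so they can be compared directly on small inputs.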
TL;DR: This work proposes an approach that gives a rigorous formal semantics for dependencies and uses symbolic reasoning to take scheduling decisions, and can form the basis of a programming language for workflows.
Abstract: Workflows are composite multitransaction activities occurring in heterogeneous environments. They relax the semantic properties of traditional transactions to accommodate the demands of such environments. It is important that workflows be specified declaratively, reasoned about formally, and scheduled automatically. Declarative approaches based on intertask dependencies are prominent in the literature. However, extant approaches often lack a formal semantics, or fail to meet other important criteria. Also, they do not carefully distinguish event types from instances, a distinction that is crucial when the constraint that tasks are loop-free is relaxed. We propose an approach that gives a rigorous formal semantics for dependencies and meets the above conditions. Our approach uses algebraic expressions to represent dependencies and uses symbolic reasoning to take scheduling decisions. It can form the basis of a programming language for workflows.
TL;DR: This work addresses whether a complex object database schema, including its integrity constraints, admits a nonempty model. It provides a theoretical framework with two alternative formalisms, obtained by extending a common kernel formalism in two directions, that can express a relevant set of state integrity constraints in a declarative style.
Abstract: Integrity constraints are rules that should guarantee the integrity of a database. Provided an adequate mechanism to express them is available, the following question arises: is there any way to populate a database which satisfies the constraints supplied by a database designer? That is, does the database schema, including constraints, admit at least a nonempty model? This work answers the above question in a complex object database environment, providing a theoretical framework, including the following ingredients: (1) two alternative formalisms, able to express a relevant set of state integrity constraints with a declarative style; (2) two specialized reasoners, based on the tableaux calculus, able to check the consistency of complex objects database schemata expressed with the two formalisms. The proposed formalisms share a common kernel, which supports complex objects and object identifiers, and which allows the expression of acyclic descriptions of classes, nested relations and views, built up by means of the recursive use of record, quantified set, and object type constructors and by the intersection, union, and complement operators. Furthermore, the kernel formalism allows the declarative formulation of typing constraints and integrity rules. In order to improve the expressiveness and maintain the decidability of the reasoning activities, we extend the kernel formalism into two alternative directions. The first formalism, OLCP, introduces the capability of expressing path relations. Because cyclic schemas are extremely useful, we introduce a second formalism, OLCD, with the capability of expressing cyclic descriptions but disallowing the expression of path relations. In fact, we show that the reasoning activity in OLCDP (i.e., OLCP with cycles) is undecidable.
TL;DR: The problem of maintaining recursively-defined views, such as the transitive closure of a relation, in traditional relational languages that do not have recursion mechanisms is studied, and it is demonstrated that a number of recursive queries cannot be maintained in an SQL-like language.
Abstract: We study the problem of maintaining recursively-defined views, such as the transitive closure of a relation, in traditional relational languages that do not have recursion mechanisms. In particular, we show that the transitive closure cannot be maintained in relational calculus under deletion of edges. We use new proof techniques to show this result. These proof techniques generalize to other languages, for example, to the language for nested relations that also contains a number of aggregate functions. Such a language is considered in this paper as a theoretical reconstruction of SQL. Our proof techniques also generalize to other recursive queries. Consequently, we show that a number of recursive queries cannot be maintained in an SQL-like language. We show that this continues to be true in the presence of certain auxiliary relations. We also relate the complexity of updating transitive closure to that of updating the same-generation query and show that the latter is strictly harder than the former. Then we extend this result to that of updating queries based on context-free sets.
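To make "maintaining" a view concrete, the sketch below contrasts the two update directions. The insertion rule is a standard observation and is not taken from the abstract: after adding an edge (u, v), the new closure follows from the old closure by a single non-recursive, join-like step. The function names are hypothetical. No comparable non-recursive rule is given for deletion, which is the case the abstract shows to be impossible in relational calculus, so the only fallback sketched here is recomputation from scratch with an iterative fixpoint.

```python
def closure(edges):
    """Recompute the transitive closure by iterating to a fixpoint."""
    tc = set(edges)
    while True:
        new = {(x, w) for (x, y) in tc for (z, w) in tc if y == z} - tc
        if not new:
            return tc
        tc |= new


def insert_edge(tc, u, v):
    """Maintain the closure under insertion of (u, v) without recursion.

    Every new path must use the new edge exactly once, so it runs from
    some node that already reaches u (or u itself) to some node that
    is already reachable from v (or v itself).
    """
    sources = {u} | {x for (x, y) in tc if y == u}
    targets = {v} | {y for (x, y) in tc if x == v}
    return tc | {(x, y) for x in sources for y in targets}
```

A quick sanity check is to compare `insert_edge(closure(E), u, v)` against `closure(E | {(u, v)})` on small graphs.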
TL;DR: A formal framework for physical database design is proposed that automates the query translation process: a declarative specification of the physical design is used to generate an efficient query transformer that translates logical queries into programs that manipulate the physical database.
Abstract: Physical design for object-oriented databases is still in its infancy. Implementation decisions often intrude into the conceptual design (such as inverse links and object decomposition). Furthermore, query optimizers do not always take full advantage of physical design information. This paper proposes a formal framework for physical database design that automates the query translation process. In this framework, the physical database design is specified in a declarative manner. This specification is used for generating an efficient query transformer that translates logical queries into programs that manipulate the physical database. Alternative access paths to physical data are captured as simple rewrite rules that are used for generating alternative plans for a query.
TL;DR: The Heraclitus[OO] (abbreviated H2O) DBPL is presented, which provides a syntax and semantics for working with deltas in the context of object-oriented databases, and a semantically based notion of potential conflict between proposed updates is developed.
Abstract: In the Heraclitus paradigm, a delta value, or more simply, delta, is a concrete value that corresponds to a difference between database states. This paper presents the Heraclitus[OO] (abbreviated H2O) DBPL, which provides a syntax and semantics for working with deltas in the context of object-oriented databases. The paper also considers the use of deltas in connection with detecting conflict between pairs of proposed updates to a database. This is useful in contexts where multiple users are each creating and choosing between multiple possible updates. A semantically based notion of potential conflict between proposed updates is developed, along with several conservative approximations based on the use of different kinds of delta.
TL;DR: Object-oriented databases have brought major improvements in data modeling by introducing notions such as inheritance or methods, but it is shown more modestly that many of these features can be formally and cleanly combined in a coherent manner.
Abstract: Object-oriented databases have brought major improvements in data modeling by introducing notions such as inheritance or methods. Extensions in many directions are now considered with introductions of many concepts such as versions, views or roles. These features bring the risk of creating monster data models with a number of incompatible appendixes. We do not propose here any new extension or any novel concept. We show more modestly that many of these features can be formally and (we believe) cleanly combined in a coherent manner.
TL;DR: The main idea is to use safe recursion to control and limit unsafe recursion; a finite form of recursion called domain bounded recursion is defined, and its complexity and expressive power are characterized.
Abstract: This paper develops a query language for sequence databases, such as genome databases and text databases. Unlike relational data, queries over sequential data can easily produce infinite answer sets, since the universe of sequences is infinite, even for a finite alphabet. The challenge is to develop query languages that are both highly expressive and finite. This paper develops such a language. It is a subset of a recently developed logic called Sequence Datalog [19]. Sequence Datalog distinguishes syntactically between subsequence extraction and sequence construction. Extraction creates sequences of bounded length, and leads to safe recursion, while construction can create sequences of arbitrary length, and leads to unsafe recursion. In this paper, we develop syntactic restrictions for Sequence Datalog that allow sequence construction but preserve finiteness. The main idea is to use safe recursion to control and limit unsafe recursion. The main results are the definition of a finite form of recursion, called domain bounded recursion, and a characterization of its complexity and expressive power. Although finite, the resulting class of programs is highly expressive, since its data complexity is complete for the elementary functions.
TL;DR: It is shown that a system of keys satisfying certain restrictions provides us with an efficient means of comparing values, while avoiding the need to compare object identities directly, and that systems of keys give rise to observational distinguishability relations which lie between these two extremes.
Abstract: We will examine the problem of distinguishing between database instances and values in models which incorporate object-identities and recursive data-structures. We will show that the notion of observational distinguishability is intricately linked to the languages available for querying a database. In particular we will show that, given a simple query language incorporating a test for equality of object-identities, database instances are indistinguishable if they are isomorphic, and that, in a language without any operators on object-identities, database instances are indistinguishable if a bisimilarity relation holds between them. Further, such a bisimulation relation may be computed on values, but doing so requires the ability to recurse over all the object-identities in an instance. We will then show that systems of keys give rise to observational distinguishability relations which lie between these two extremes. We show that a system of keys satisfying certain restrictions provides us with an efficient means of comparing values, while avoiding the need to compare object identities directly. Comments: University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-95-20. This technical report is available at ScholarlyCommons: http://repository.upenn.edu/cis_reports/227
TL;DR: It is shown that although queries admit low data complexity (NC), their naive evaluation is rather inefficient, and optimization strategies used for relational queries may be inadequate.
Abstract: We consider rational linear constraint databases, and study the problem of efficiently evaluating first-order queries with linear constraints. We show that although queries admit low data complexity (NC), their naive evaluation is rather inefficient. The computation paradigm is based on relational algebra and constraint solving. We focus on the former, and show that the query processing should differ drastically from classical relational databases. In particular, optimization strategies used for relational queries may be inadequate. We describe the problem and propose some preliminary optimization principles.
TL;DR: An analysis of proposals to overcome the limitations of commercially available object-oriented DBMSs' inability to deal with objects that may change their type during their life and which exhibit a plurality of behaviours is made.
Abstract: One of the limitations of commercially available object-oriented DBMSs is their inability to deal with objects that may change their type during their life and which exhibit a plurality of behaviours. Several proposals have been made to overcome this limitation. An analysis of these proposals is made to show the impact of more general modelling functionalities on the object implementation technique.
TL;DR: A very thin insulator having a dipole structure such as a dielectric material having ferroelectric properties and, preferably, also having thermodielectric properties, is used as the insulator insulating an electrode of an electrode pair from a semiconductor body sandwiched between said electrode pair.
TL;DR: It is shown that equality predicates assume two roles with respect to sets: distinguishers, which differentiate between set members and implicitly give meaning to standard set properties such as set equality, and constructors, which determine which objects from the input sets contribute to the query result.
Abstract: The AQUA [16] query algebra allows user-defined equivalence relations as arguments to query operators that generalize standard set operations. These predicates determine what objects are included in the query result, and the duplicates that must be removed. While an expressive enhancement, the use of arbitrary equivalence relations to decide set membership can result in sets with counterintuitive behavior, and therefore can make queries return unreasonable results. In this paper, we show that equality predicates assume two roles with respect to sets. Distinguishers differentiate between set members and implicitly give meaning to standard set properties such as set equality. Constructors determine which objects from input sets contribute to the query result. The requirements of distinguishers and constructors differ. AQUA’s set operators are problematic because they use constructors where distinguishers are required. We propose alternatives to AQUA’s set operators that address this limitation.
TL;DR: A new practical and easily implementable technique for partial normalization is described, and it is demonstrated that with very little extra added to the language, one can express a variety of primitives using just one general polynomial-space iterator.
Abstract: We study the problem of choosing a suitable collection of primitives for querying databases with disjunctive information. Theoretical foundations for querying such databases have been developed in [11, 12]. The main tool for querying disjunctive information has come to be known under the name of normalization. In this paper we show how these theoretical results can lead to practical languages for querying databases with disjunctive information. We discuss a collection of primitives that one may want to add to a language in order to be able to ask a variety of queries over incomplete databases (including existential and optimization queries). We describe a new practical and easily implementable technique for partial normalization, and show how to combine it with the known technique for space-efficient normalization. As a result, we demonstrate that with very little extra added to the language, one can express a variety of primitives using just one general polynomial-space iterator. We discuss some practical implications, including nondeterminism of the resulting language, and the implementation project.
TL;DR: This paper urges the database programming research community to take up business process programming, a field that is still in its infancy and where there is the potential for significant impact.
Abstract: Over the past decade, database programming has focused on languages and methodologies for developing data-intensive applications, and on techniques for efficiently implementing them. Research in this area has yielded languages with rich type systems, orthogonal persistence, support for collection types, and other advanced features. The commercial impact of this research, however, has been rather limited. Database applications are a multi-billion dollar business worldwide, but most applications are still developed using rudimentary tools. The goal of this paper is to urge the database programming research community to rise to the next challenge: business process programming, a field that is in its infancy, and where there is the potential for significant impact. Competitive pressures are forcing enterprises to re-engineer and automate their business processes. Such processes are of long duration, complex, and error-prone. They consist of multiple steps that need to be coordinated; that may be executed by application programs, machines, or humans in different roles across different organizations; and that may execute in heterogeneous, distributed computing environments. This paper introduces topics in business process programming that we believe will be of interest to the database programming community. These range from issues in modeling and specifying business processes, including new transactional models for capturing their execution semantics, to problems of efficient, scalable implementation in the heterogeneous distributed environments prevalent in typical enterprises.
TL;DR: This paper proposes a database manipulation interface for the statically typed, purely functional programming language Haskell that permits on-the-fly dereference during query construction, and allows for straightforward implementation of lazy retrieval in strict state-transition sequences.
Abstract: This paper proposes a database manipulation interface for the statically typed, purely functional programming language Haskell. The data model uses surrogates to permit direct update of stored objects, and the basic interface is designed based on the state-transformer approach, so that the interface is referentially transparent. This approach requires all the operations to be executed in a single state-transition sequence and thus tends to make queries more imperative than expected. The proposed approach lessens this burden on query construction, by using versioning. Versions can be “frozen” or locked, and a set of locked versions can be supplied as an argument to query operations. This intraprogram versioning permits on-the-fly dereference during query construction, and allows for straightforward implementation of lazy retrieval in strict state-transition sequences.
TL;DR: This paper proposes three techniques for improving the execution of object-oriented database queries: reuse/out of order execution, memoization, and buffer replacement policy and introduces schedule level optimization as a framework for integrating these techniques into query processing systems.
Abstract: Query facilities in object-oriented databases lag behind their relational counterparts in performance. This paper identifies important sources of that performance difference, the random I/O problem and the re-reading problem. We propose three techniques for improving the execution of object-oriented database queries: reuse/out of order execution, memoization, and buffer replacement policy. Schedule level optimization is introduced as our framework for integrating these techniques into query processing systems.
TL;DR: A formal framework is presented that can be used to specify and study a number of different semantics for rule execution in active databases; it is based on a generic active rule language and relies on a transaction rewriting technique.
Abstract: We present a formal framework that can be used to specify and study a number of different semantics for rule execution in active databases. We shall consider the core of several active rule languages that are already available (e.g., Ariel, Starburst and HiPAC) but whose rule execution is specified only by informal descriptions. The framework is based on a generic active rule language and relies on a transaction rewriting technique. This technique takes a user defined transaction, which is viewed as a sequence of basic database updates forming a semantic unit, and translates it into a new transaction that explicitly includes the additional updates due to active rule triggering. We show that this framework provides a basis for the theoretical analysis and the comparison of different execution models of active rules. Moreover, it allows us to formally investigate a number of important issues related to active rule processing, such as transaction equivalence, confluence and optimization, independently of a specific rule execution model.
TL;DR: This paper investigates union-types in object oriented IQL-like schemas and presents an algorithm for detecting schemas that define types with a bounded number of values and an algorithm that verifies whether in a schema the type of a subclass specifies options that are forbidden by its superclasses.
Abstract: In this paper we investigate union-types in object oriented IQL-like schemas. These types can be used to model null values, variant types and generalization classes. They make, however, deciding equivalence and subtyping more difficult. We will show that the complexity of these two problems is co-NP-complete and present complete sets of rules for deciding both problems. The combination of union-types and multiple inheritance makes it also harder to detect typing-conflicts in a schema. We will give an algorithm for deciding this and discuss its complexity. Furthermore, we will present an algorithm for detecting schemas that define types with a bounded number of values. Finally, an algorithm will be presented that verifies whether in a schema the type of a subclass specifies options that are forbidden by its superclasses.
TL;DR: The Tycoon approach to scale the successful notion of a uniform, type-safe persistent object store to communication-intensive applications and applications where long-term activities are allowed to span multiple autonomous network sites is described.
Abstract: We describe the Tycoon approach to scale the successful notion of a uniform, type-safe persistent object store to communication-intensive applications and applications where long-term activities are allowed to span multiple autonomous network sites. Exploiting stream-based data, code and thread exchange primitives, we present several distributed programming idioms in Tycoon. These programming patterns range from client-server communication based on polymorphic higher-order remote procedure calls to migrating autonomous agents that are bound dynamically to network resources present at individual network nodes. Following Tycoon’s add-on approach, these idioms are not cast into built-in syntactic forms, but are expressed by characteristic programming patterns exploiting communication primitives encapsulated by library functions. Moreover, we present a novel form of binding support for ubiquitous resources which drastically reduces communication traffic for modular distributed applications.
TL;DR: A labelled tree model of data is introduced, and various programming structures for querying and transforming such data are investigated, including restrictions of structural recursion that give rise to well-defined queries even when the input data contains cycles.
Abstract: We investigate languages for querying and transforming unstructured data, by which we mean languages that can be used without knowledge of the structure (schema) of the database. There are two reasons for wanting to do this. First, some data models have emerged in which the schema is either completely absent or only provides weak constraints on the data. Second, it is sometimes convenient, for the purposes of browsing, to query the database without reference to the schema. For example one may want to “grep” all character strings in the database, or one might want to find the information associated with a certain field name no matter where it occurs in the database. This paper introduces a labelled tree model of data and investigates various programming structures for querying and transforming such data. In particular, it considers various restrictions of structural recursion that give rise to well-defined queries even when the input data contains cycles. It also discusses issues of observable equivalence of such structures.
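The two browsing queries mentioned in the abstract can be sketched without any schema. The encoding below is an assumption made for illustration: a node is either a string leaf or a dict mapping edge labels to child nodes, and the function names are hypothetical. Cycles are handled with a visited set, a simple stand-in technique rather than the restricted structural recursion the paper studies.

```python
def grep_strings(node, seen=None):
    """Collect every string leaf reachable from node, schema-free."""
    seen = set() if seen is None else seen
    if isinstance(node, str):
        return {node}
    if id(node) in seen:          # already visited: cut the cycle
        return set()
    seen.add(id(node))
    found = set()
    for child in node.values():
        found |= grep_strings(child, seen)
    return found


def find_field(node, name, seen=None):
    """Return the subtrees under every edge labelled name, at any depth."""
    seen = set() if seen is None else seen
    if not isinstance(node, dict) or id(node) in seen:
        return []
    seen.add(id(node))
    hits = []
    for label, child in node.items():
        if label == name:
            hits.append(child)
        hits.extend(find_field(child, name, seen))
    return hits
```

Because visited nodes are tracked by identity, both functions terminate on cyclic data, for instance when a nested record holds a reference back to the root.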