TL;DR: A model and a query language are introduced to establish a theoretical basis for multi-dimensional data analysis based on the notions of dimension and f-table and compared with other approaches.
Abstract: Multidimensional databases are large collections of data, often historical, used for sophisticated analysis oriented to decision making. This activity is supported by an emerging category of software technology, called On-Line Analytical Processing (OLAP). Although many commercial tools are already available, a fundamental study of OLAP systems is still lacking. In this paper we introduce a model and a query language to establish a theoretical basis for multi-dimensional data. The model is based on the notions of dimension and f-table. Dimensions are linguistic categories corresponding to different ways of looking at the information. F-tables are the constructs used to represent factual data, and are the logical counterpart of multi-dimensional arrays, the way in which current analytical tools store data. The query language is a calculus for f-tables, and as such it offers high-level support for multi-dimensional data analysis. Scalar and aggregate functions can be embedded in calculus expressions in a natural way. We discuss conceptual problems related to the design of multidimensional query languages, and compare our model and language with other approaches.
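One way to picture an f-table, offered as a rough sketch rather than the paper's formal definition, is a mapping from coordinates (one value per dimension) to a measure, with aggregation performed along a level mapping of one dimension. The names `sales`, `month_to_quarter`, and `roll_up` are hypothetical and do not come from the paper.

```python
from collections import defaultdict

# Hypothetical f-table: (month, product) coordinates -> units sold.
sales = {
    ("Jan", "tea"): 10,
    ("Feb", "tea"): 5,
    ("Apr", "tea"): 7,
    ("Jan", "coffee"): 3,
}

# Hypothetical mapping between two levels of a time dimension.
month_to_quarter = {"Jan": "Q1", "Feb": "Q1", "Apr": "Q2"}


def roll_up(f_table, level_map, aggregate=sum):
    """Aggregate an f-table along its first coordinate using level_map."""
    groups = defaultdict(list)
    for (first, *rest), value in f_table.items():
        # Replace the fine-grained coordinate by its coarser level.
        groups[(level_map[first], *rest)].append(value)
    return {coords: aggregate(values) for coords, values in groups.items()}


by_quarter = roll_up(sales, month_to_quarter)
```

Passing a different `aggregate` (for example `max`) shows how an aggregate function can be plugged into the same roll-up without changing the table representation.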
TL;DR: This paper discusses a broader class of view definitions: materialized views defined over a nested data model, such as the nested relational model or an object-oriented data model, which simplifies data modeling and gives more flexibility.
Abstract: Previous research on materialized views has primarily been in the context of flat relational databases, that is, materialized views defined in terms of one or more flat relations. This paper discusses a broader class of view definitions: materialized views defined over a nested data model such as the nested relational model or an object-oriented data model. An attribute of a tuple deriving the view can be a reference (i.e., a pointer) to a nested relation, with arbitrary levels of nesting possible. The extended capability of this nested data model, together with materialized views, simplifies data modeling and gives more flexibility.
TL;DR: This paper presents a first prototype of a constraint database for spatial information, dedale, implemented on top of the O2 DBMS, with special functions for constraint solving and geometric operations.
Abstract: This paper presents a first prototype of a constraint database for spatial information, dedale. Implemented on top of the O2 DBMS, data is stored in an object-oriented framework, with spatial data represented using linear constraints over a dense domain. The query language is the standard OQL, with special functions for constraint solving and geometric operations.
TL;DR: It is shown that relatively simple hybrid languages are able to define all finite structures expressed by skolemized universally quantified second-order formulae with some constraints on the quantified predicates.
Abstract: Recently there has been some attention to the integration of description logics of the AL-family with rule-based languages for querying relational databases, such as Datalog, so as to achieve the best characteristics of both kinds of formalisms in a common framework. Formal analysis of such hybrid languages has been limited to computational complexity, i.e., how much time or space is needed to answer a specific query. This paper carries out a different formal analysis, one dealing with expressiveness, which gives a precise characterization of the concepts definable as queries. We first analyze the applicability to hybrid languages of formal tools developed for characterizing the expressive power of relational query languages. We then present some preliminary results on the expressiveness of hybrid languages. In particular, we show that relatively simple hybrid languages are able to define all finite structures expressed by skolemized universally quantified second-order formulae with some constraints on the quantified predicates.
TL;DR: A compile-time approach is given that provides for a significant reduction of the amount of run-time transaction overhead due to integrity constraint checking, in a manner analogous to the one used by Sheard and Stemple (1989) for the relational data model.
Abstract: In the context of the object-oriented data model, a compile-time approach is given that provides for a significant reduction of the amount of run-time transaction overhead due to integrity constraint checking. The higher-order logic Isabelle theorem prover is used to automatically prove which constraints might, or might not, be violated by a given transaction, in a manner analogous to the one used by Sheard and Stemple (1989) for the relational data model. A prototype transaction verification tool has been implemented, which automates the semantic mappings and generates proof goals for Isabelle. Test results are discussed to illustrate the effectiveness of our approach.
TL;DR: It is shown that recursive queries such as transitive closure, and “alternating paths” can be incrementally maintained in a nested relational language, when some auxiliary relations are allowed.
Abstract: We examine the power of incremental evaluation systems that use an SQL-like language for maintaining recursively-defined views. We show that recursive queries such as transitive closure, and “alternating paths” can be incrementally maintained in a nested relational language, when some auxiliary relations are allowed. In the presence of aggregate functions, even more queries can be maintained, for example, the “same generation” query. In contrast, it is still an open problem whether such queries are maintainable in relational calculus. We then restrict the language so that no nested relations are involved (but we keep the aggregate functions). Such a language captures the capability of most practical relational database systems. We prove that this restriction does not reduce the incremental computational power; that is, any query that can be maintained in a nested language with aggregates is still maintainable using only flat relations. We also show that one does not need auxiliary relations of arity more than 2. In particular, this implies that the recursive queries maintainable in the nested language with aggregates can also be maintained in a practical relational database system using auxiliary tables of arity at most 2. This is again in sharp contrast to maintenance in relational calculus, which admits a strict arity-based hierarchy.
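To make the idea of incremental maintenance concrete, here is a minimal sketch of the well-known easy case only: keeping a stored transitive closure up to date when a single edge is inserted, without recomputing from scratch. It is not the paper's construction, which also has to handle deletions and uses auxiliary relations and aggregates; the function name `insert_edge` is hypothetical.

```python
def insert_edge(tc, a, b):
    """Return the transitive closure after adding edge (a, b).

    tc is the stored closure (a set of pairs) of the graph before insertion.
    """
    sources = {x for (x, y) in tc if y == a} | {a}  # a and every node that reaches a
    targets = {y for (x, y) in tc if x == b} | {b}  # b and every node reachable from b
    # Every new path must cross the new edge, so it links a source to a target.
    return tc | {(x, y) for x in sources for y in targets}


tc = set()
for edge in [(1, 2), (2, 3), (4, 1)]:
    tc = insert_edge(tc, *edge)
```

Each update is a non-recursive expression over the old closure and the new edge, which is the sense in which a recursive view can be maintained by a non-recursive language.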
TL;DR: This paper tells the story of the work on expressive power of relational languages with aggregate functions and proves by far the most powerful result that describes the expressiveness of such languages.
Abstract: It is a folk result that relational algebra or calculus extended with aggregate functions cannot compute the transitive closure. However, proving folk results is sometimes a nontrivial task. In this paper, we tell the story of the work on expressive power of relational languages with aggregate functions. We also prove by far the most powerful result that describes the expressiveness of such languages. There are four main features of our result that distinguish it from previous ones:
1. It does not rely on any unproven assumptions, such as separation of complexity classes.
2. It establishes a general property of queries definable with the help of aggregate functions. This property can easily be applied to prove many expressiveness bounds.
3. The class of aggregate functions is much larger than any previously considered.
4. The proof is “non-syntactic.” That is, it does not depend on a specific syntax chosen for the language with aggregates.
TL;DR: Several researchers have considered integrating multiple unstructured, semi-structured, and structured data sources by modeling all sources as edge-labeled graphs; data in this model is self-describing and dynamically typed, and captures both schema and data information.
Abstract: Several researchers have considered integrating multiple unstructured, semi-structured, and structured data sources by modeling all sources as edge labeled graphs. Data in this model is self-describing and dynamically typed, and captures both schema and data information. The labels are arbitrary atomic values, such as strings, integers, reals, etc., and the integrated data graph is stored in a unique data repository, as a relation of edges. The relation is dynamically typed, i.e. each edge label is tagged with its type.
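The storage scheme described above can be sketched as a relation of edges in which every label carries a tag for its type. This is only an illustration under assumptions: the field names, the integer node identifiers, and the helpers `make_edge` and `follow` are hypothetical, not taken from the paper.

```python
from typing import NamedTuple, Union

Atom = Union[str, int, float]


class Edge(NamedTuple):
    source: int      # node identifier (hypothetical encoding)
    label_type: str  # dynamic type tag of the label
    label: Atom      # the label value itself
    target: int


def make_edge(source, label, target):
    """Build an edge, tagging the label with the name of its run-time type."""
    return Edge(source, type(label).__name__, label, target)


# A tiny data graph: schema-like labels ("name", "age") and data values
# ("Ada", 36) are stored uniformly as edge labels.
edges = [
    make_edge(0, "person", 1),
    make_edge(1, "name", 2),
    make_edge(2, "Ada", 3),
    make_edge(1, "age", 4),
    make_edge(4, 36, 5),
]


def follow(edges, node, label):
    """Nodes reached from `node` over edges carrying exactly this typed label."""
    tag = type(label).__name__
    return [e.target for e in edges
            if e.source == node and e.label_type == tag and e.label == label]
```

Because the type tag is part of each tuple, a lookup for the integer `36` and one for the string `"36"` are distinguished without any separate schema.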
TL;DR: Transaction Datalog is developed, a deductive language that integrates queries, updates, and transaction composition in a simple logical framework that extends the deductive-database paradigm with several new capabilities.
Abstract: In the classical model of database transactions, large transactions cannot be built out of smaller ones. Instead, transactions are modelled as atomic and isolated units of work. This model has been widely successful in traditional database applications, in which transactions perform only a few simple operations on small amounts of simply-structured data. Unfortunately, this model is inappropriate for more complex applications in which transactions must be combined and coordinated to achieve a larger goal. Examples include CAD, office automation, collaborative work, manufacturing control, and workflow management. These applications require new transaction models, new methods of transaction management, and new transaction languages. This paper focuses on the latter issue: languages for specifying non-classical transactions, and combining them into complex processes. In particular, we develop Transaction Datalog, a deductive language that integrates queries, updates, and transaction composition in a simple logical framework. This integration extends the deductive-database paradigm with several new capabilities. For instance, Transaction Datalog supports all the properties of classical transactions, such as persistence, atomicity, isolation, abort and rollback. It also supports properties found in many new transaction models, such as subtransaction hierarchies, concurrency within individual transactions, cooperation between concurrent activities, a separation of atomicity and isolation, and fine-grained control over abort and rollback. These capabilities are all provided within a purely logical framework, including a natural model theory and a sound-and-complete proof theory. This paper outlines the problems of developing a compositional transaction language, illustrates our solution (Transaction Datalog) through a series of examples, and develops its formal semantics in terms of a logical inference system.
TL;DR: VQL, a language devoted to querying data stored in multiversion databases, is proposed, which is based on a first order calculus and provides users with the ability of navigating through object versions, and through the states of the universe modeled by the multiversion database.
Abstract: In this paper VQL, a language devoted to querying data stored in multiversion databases, is proposed. A multiversion database represents several states of the modeled universe. A formal model of such a database is presented. VQL, which is based on a first order calculus, provides users with the ability of navigating through object versions, and through the states of the universe modeled by the multiversion database.
TL;DR: It is proved that this constrained matching guarantees type safe substitutability even in situations where matching alone would not, and can capture subtleties that go far beyond the level of expressiveness of object-oriented type systems.
Abstract: Temporally constrained matching in a persistent and declarative object-oriented system is introduced as a semantic alternative to the existing approaches to the covariance/contravariance problem. While the existing object-oriented type systems are based on subtyping, F-bounded polymorphism and matching, this language system is based entirely on inheritance, which is identified with matching. The type of matching used in this paper relies on the temporal constraint system. We prove that this constrained matching guarantees type safe substitutability even in situations where matching alone would not. This is possible only because the underlying formal system is semantically much richer than the paradigms of type systems. Its temporal constraint system can capture subtleties that go far beyond the level of expressiveness of object-oriented type systems. The temporal nature of the language and its distinctive orthogonal model of persistence make this language system successful in handling a variety of non-trivial applications.
TL;DR: The research presented in this paper is situated in the framework of constraint databases that was introduced by Kanellakis, Kuper, and Revesz in their seminal paper of 1990 and is based on a consequence of a classical result by Tarski.
Abstract: The research presented in this paper is situated in the framework of constraint databases that was introduced by Kanellakis, Kuper, and Revesz in their seminal paper of 1990. In this area, databases and query languages are defined using real polynomial constraints. As a consequence of a classical result by Tarski, first-order queries in the constraint database model are effectively computable, and their result is within the constraint model.
TL;DR: This paper motivates Business Conversations as a system model suitable for the description of human-human, human-software as well as software-software cooperation, and explains why this model is more suitable for the description of organizational cooperative work than software-centered object models.
Abstract: In this paper we introduce Business Conversations as a high-level software structuring concept for distributed systems where multiple autonomous agents (possibly in different organizational units) have to coordinate their long-term activities towards the fulfillment of a cooperative task. We first motivate Business Conversations as a system model suitable for the description of human-human, human-software as well as software-software cooperation. We then explain why we consider this model to be more suitable for the description of organizational cooperative work than software-centered object models. The core concepts of the Business Conversation model are described using an object-oriented model. Finally, we report on our experience gained building a prototypical agent programming framework with Business Conversations for agent coordination based on mobile and persistent threads as provided by the persistent programming language Tycoon.
TL;DR: This work presents a formal data model for views in Object DataBase Systems (ODBS) that relaxes the usual constraint, where an object belongs to a single class, while using a generalization of referent, and enables a deterministic creation of derived objects with complex object identifiers.
Abstract: We present a formal data model for views in Object DataBase Systems (ODBS) as a transformation mechanism for databases. Our model relaxes the usual constraint, where an object belongs to a single class, while using a generalization of referent, and enables a deterministic creation of derived objects with complex object identifiers. We define an IQL-like language which enables the manipulation of such referents. The view-based transformation is achieved in two steps: an extension of the source instance followed by a projection of the extended instance. The extension and projection can be carried out using four object algebraic operators, namely projection, join-specialization, join and generalization, that specify both the virtual schema and its corresponding virtual instance. This simple algebra can express most of the view operators proposed in the literature and provides a real restructuring of the source schema and instance.
TL;DR: A ‘fold’ operator φ over collection types is defined in terms of which operations such as selection, projection, join and group-by can be defined, as well as aggregation functions such as sum, max and min.
Abstract: This paper investigates the optimisation of aggregation functions in the context of computationally complete database programming languages and aims to generalise and provide a unifying formal foundation for previous work. We define a ‘fold’ operator φ over collection types in terms of which operations such as selection, projection, join and group-by can be defined, as well as aggregation functions such as sum, max and min. We introduce two equivalences for φ which respectively govern the commuting and coalescing of applications of φ. From these two equivalences we then formally derive equivalences governing the commuting and coalescing of iteration operations over collections, the mapping of aggregation functions over grouped collections, the introduction and elimination of aggregation functions, and the promotion of aggregation functions through iteration operations. We also show how some of these equivalences can be used to optimise comprehensions, a high-level query construct supported by many database languages.
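As an illustration of how a single fold can underlie both iteration and aggregation, the sketch below defines a fold over Python lists and derives selection, projection, sum and max from it. The abstract does not give the exact signature of φ, so the four-argument form used here (`op`, `unit`, `f`, `collection`) and all function names are assumptions made for the example.

```python
from functools import reduce


def fold(op, unit, f, collection):
    """Map each element with f, then combine the results with op, starting from unit."""
    return reduce(op, (f(x) for x in collection), unit)


def concat(a, b):
    return a + b


def select(pred, xs):
    # Keep x as a singleton list if it satisfies pred, otherwise contribute nothing.
    return fold(concat, [], lambda x: [x] if pred(x) else [], xs)


def project(g, xs):
    return fold(concat, [], lambda x: [g(x)], xs)


def total(xs):
    return fold(lambda a, b: a + b, 0, lambda x: x, xs)


def maximum(xs):
    return fold(max, float("-inf"), lambda x: x, xs)


xs = [1, 2, 3, 4]
square = lambda x: x * x

# Two passes (projection, then aggregation) versus one coalesced fold.
two_passes = total(project(square, xs))
one_pass = fold(lambda a, b: a + b, 0, square, xs)
```

The last two expressions compute the same value; rewriting the first into the second is in the spirit of the coalescing equivalences the abstract mentions, though the paper's actual rules are stated for its own φ.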
TL;DR: Transducer Datalog as discussed by the authors is a query language based on the generalized sequence transducer model, which allows transducers to invoke other transducers as subroutines.
Abstract: This paper develops a database query language called Transducer Datalog motivated by the needs of a new and emerging class of database applications. In these applications, such as text databases and genome databases, the storage and manipulation of long character sequences is a crucial feature. The issues involved in managing this kind of data are not addressed by traditional database systems, either in theory or in practice. To address these issues, in recent work, we introduced a new machine model called a generalized sequence transducer. These generalized transducers extend ordinary transducers by allowing them to invoke other transducers as “subroutines.” This paper establishes the computational properties of Transducer Datalog, a query language based on this new machine model. In the process, we develop a hierarchy of time-complexity classes based on the Ackermann function. The lower levels of this hierarchy correspond to well-known complexity classes, such as polynomial time and hyper-exponential time. We establish a tight relationship between levels in this hierarchy and the depth of subroutine calls within Transducer Datalog programs. Finally, we show that Transducer Datalog programs of arbitrary depth express exactly the sequence functions computable in primitive-recursive time.
TL;DR: This work considers spatial databases that can be defined in terms of polynomial inequalities, and is interested in monotonic transformations of spatial databases.
Abstract: We consider spatial databases that can be defined in terms of polynomial inequalities, and we are interested in monotonic transformations of spatial databases.
TL;DR: This paper introduces the specification language CoCoA, designed for the specification of both organisational and transactional aspects of cooperative activities, based on the CoACT cooperative transaction model.
Abstract: This paper introduces the specification language CoCoA. The features of CoCoA are designed for the specification of both organisational and transactional aspects of cooperative activities, based on the CoACT cooperative transaction model. The novelty of the language lies in its ability to deal with a broad spectrum of cooperative applications, ranging from cooperative document authoring to workflow applications.
TL;DR: Existential quantification of procedures is introduced as a mechanism for languages with dynamic typing, where operations over values of the abstracted type may behave differently according to the actual specialisation.
Abstract: Existential quantification of procedures is introduced as a mechanism for languages with dynamic typing. It allows abstraction over types whose representations require to be manipulated at run time. Universal quantification, the mechanism normally associated with procedural type abstraction, is shown to be unsuitable for this style of abstraction. For many such procedures only a single type specialisation is correct, hence the analogy with existential quantification from predicate logic. For any invocation of an existentially quantified procedure, the run-time system will require to maintain a single type representation for which the abstracted type stands. Existential quantification represents a class of ad hoc polymorphism, where operations over values of the abstracted type may behave differently according to the actual specialisation.
TL;DR: This paper shows a concrete example in which a technique of static analysis, mainly used in the programming language area, can be successfully applied to a database problem and infers an approximation close to the actual resources that the transaction is going to use at run time.
Abstract: This paper shows a concrete example in which a technique of static analysis, mainly used in the programming language area, can be successfully applied to a database problem. The database problem is the automatic (i.e., without a transaction programmer's intervention) realization of a new concurrency control protocol called conservative multiple granularity locking. Being conservative, the scheduler of this protocol ensures that the database resources needed by a transaction are granted before the transaction begins its execution. Being multigranular, this protocol deals with a hierarchical organization of database resources, and it makes it possible to strike a balance between locking overhead and the degree of concurrency allowed by a transaction. The analysis we present makes it possible to automatically infer from the text of a transaction a safe approximation of the set of hierarchical database resources needed by the transaction. The analysis gives particular attention to the management of sets of resources, to statically foresee whether a transaction will access most of the resources in the set. The proposed technique, which can take advantage of statistical information on database resources, infers an approximation close to the actual resources that the transaction is going to use at run time.