TL;DR: This paper gives an overview of data warehousing and OLAP technologies, with an emphasis on their new requirements; it is based on a tutorial presented at the VLDB Conference, 1996.
Abstract: Data warehousing and on-line analytical processing (OLAP) are essential elements of decision support, which has increasingly become a focus of the database industry. Many commercial products and services are now available, and all of the principal database management system vendors now have offerings in these areas. Decision support places some rather different requirements on database technology compared to traditional on-line transaction processing applications. This paper provides an overview of data warehousing and OLAP technologies, with an emphasis on their new requirements. We describe back end tools for extracting, cleaning and loading data into a data warehouse; multidimensional data models typical of OLAP; front end client tools for querying and data analysis; server extensions for efficient query processing; and tools for metadata management and for managing the warehouse. In addition to surveying the state of the art, this paper also identifies some promising research issues, some of which are related to problems that the database research community has worked on for years, but others are only just beginning to be addressed. This overview is based on a tutorial that the authors presented at the VLDB Conference, 1996.
TL;DR: This paper presents an algorithm to compute the Cube operator for Multidimensional OLAP (MOLAP) systems, which store their data in sparse arrays rather than in tables, and compares it to a leading ROLAP algorithm.
Abstract: Computing multiple related group-bys and aggregates is one of the core operations of On-Line Analytical Processing (OLAP) applications. Recently, Gray et al. [GBLP95] proposed the “Cube” operator, which computes group-by aggregations over all possible subsets of the specified dimensions. The rapid acceptance of the importance of this operator has led to a variant of the Cube being proposed for the SQL standard. Several efficient algorithms for Relational OLAP (ROLAP) have been developed to compute the Cube. However, to our knowledge there is nothing in the literature on how to compute the Cube for Multidimensional OLAP (MOLAP) systems, which store their data in sparse arrays rather than in tables. In this paper, we present a MOLAP algorithm to compute the Cube, and compare it to a leading ROLAP algorithm. The comparison between the two is interesting, since although they are computing the same function, one is value-based (the ROLAP algorithm) whereas the other is position-based (the MOLAP algorithm). Our tests show that, given appropriate compression techniques, the MOLAP algorithm is significantly faster than the ROLAP algorithm. In fact, the difference is so pronounced that this MOLAP algorithm may be useful for ROLAP systems as well as MOLAP systems, since in many cases, instead of cubing a table directly, it is faster to first convert the table to an array, cube the array, then convert the result back to a table.
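To make the value-based versus position-based contrast concrete, here is a minimal Python sketch of the Cube operator computed both ways. It is a naive illustration, not the algorithm from the paper: the function names, the toy data, and the dense row-major array layout are all assumptions made for this example, and no compression or chunking is attempted.

```python
from collections import defaultdict
from itertools import combinations


def cube_value_based(rows, n_dims):
    """Table-style cube: group rows by the *values* of each subset of dimensions.

    rows is a list of (dimension_values_tuple, measure) pairs.
    Returns {dimension_index_subset: {group_key: summed_measure}}.
    """
    result = {}
    for k in range(n_dims + 1):
        for dims in combinations(range(n_dims), k):
            groups = defaultdict(int)
            for key, measure in rows:
                groups[tuple(key[d] for d in dims)] += measure
            result[dims] = dict(groups)
    return result


def decode(offset, shape):
    """Turn a row-major offset into one coordinate per dimension."""
    coords = []
    for extent in reversed(shape):
        coords.append(offset % extent)
        offset //= extent
    return coords[::-1]


def cube_position_based(cells, shape):
    """Array-style cube: cells are addressed by *position*, not by value.

    cells is a flat row-major list with one measure per array cell.
    Returns {dimension_index_subset: flat row-major list of sums}.
    """
    n_dims = len(shape)
    result = {}
    for k in range(n_dims + 1):
        for dims in combinations(range(n_dims), k):
            size = 1
            for d in dims:
                size *= shape[d]
            out = [0] * size
            for offset, measure in enumerate(cells):
                coords = decode(offset, shape)
                pos = 0
                for d in dims:
                    pos = pos * shape[d] + coords[d]
                out[pos] += measure
            result[dims] = out
    return result


# Toy fact table (hypothetical): dimensions are (product, region).
rows = [(("pen", "east"), 3), (("pen", "west"), 5), (("ink", "east"), 2)]
by_value = cube_value_based(rows, 2)

# The same data as a 2 x 2 array: pen=0, ink=1 and east=0, west=1.
cells = [3, 5, 2, 0]
by_position = cube_position_based(cells, (2, 2))
```

Both functions produce the same 2^2 = 4 group-bys. The table version carries dimension values in every key, while the array version only does index arithmetic, which is the distinction the abstract draws between the two approaches.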
TL;DR: This paper presents a formal framework to describe evolutions of multidimensional schemas and their effects on the schema and on the instances and describes how the algebra enables a tool supported environment for schema evolution.
Abstract: Database systems offering a multidimensional schema on a logical level (e.g. OLAP systems) are often used in data warehouse environments. The user requirements in these dynamic application areas are subject to frequent changes. This implies frequent structural changes of the database schema. In this paper, we present a formal framework to describe evolutions of multidimensional schemas and their effects on the schema and on the instances. The framework is based on a formal conceptual description of a multidimensional schema and a corresponding schema evolution algebra. Thus, the approach is independent of the actual implementation (e.g. MOLAP or ROLAP). We also describe how the algebra enables a tool supported environment for schema evolution.
TL;DR: This work presents a cross-dimensional and application-independent algebra for the high-level treatment of arbitrary arrays, which forms the conceptual basis of a domain-independent array DBMS, RasDaMan, which offers an SQL-based query language with extensive algebraic query and storage optimization.
Abstract: Recently multidimensional arrays have received considerable attention among the database community, applications ranging from GIS to OLAP. Work on the formalization of arrays frequently focuses on mapping sparse arrays to ROLAP schemata. Database modeling of further array types, such as image data, is done differently and with less rigid methods. A unifying formal framework for general array handling of image, sensor, statistics, and OLAP data is missing.
We present a cross-dimensional and application-independent algebra for the high-level treatment of arbitrary arrays. An array constructor, a generalized aggregate, and a multidimensional sorter allow arrays to be manipulated declaratively. This algebra forms the conceptual basis of a domain-independent array DBMS, RasDaMan, which offers an SQL-based query language with extensive algebraic query and storage optimization. The system is in practical use in neuroscience.
We introduce the algebra and show how the operators transform to the array query language. The universality of our approach is demonstrated by a number of examples from imaging, statistics, and OLAP.
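The idea of building and reducing arrays declaratively can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `marray` and `condense`, the dict-of-coordinates representation, and the toy data are chosen for this example and are not the RasDaMan API or query syntax. The multidimensional sorter is omitted.

```python
import operator
from functools import reduce
from itertools import product


def points(domain):
    """Enumerate all coordinates of a domain given as inclusive (lo, hi) bounds."""
    return product(*(range(lo, hi + 1) for lo, hi in domain))


def marray(domain, cell_fn):
    """Array constructor: define each cell by an expression over its coordinate."""
    return {p: cell_fn(p) for p in points(domain)}


def condense(op, domain, cell_fn):
    """Generalized aggregate: fold a commutative, associative op over a domain."""
    return reduce(op, (cell_fn(p) for p in points(domain)))


# A 3 x 2 toy "image" whose cell value is 10 * row + column.
image = marray([(0, 2), (0, 1)], lambda p: 10 * p[0] + p[1])

# Aggregates over the whole array.
total = condense(operator.add, [(0, 2), (0, 1)], lambda p: image[p])
brightest = condense(max, [(0, 2), (0, 1)], lambda p: image[p])

# Combining both operators gives an OLAP-style roll-up: one sum per row.
row_sums = marray(
    [(0, 2)],
    lambda p: condense(operator.add, [(0, 1)], lambda q: image[(p[0], q[0])]),
)
```

The same two building blocks serve an imaging operation (maximum intensity) and a statistics or OLAP operation (per-row totals), which is the kind of universality the abstract claims for its algebra.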
TL;DR: This article reviews existing ROLAP methods that implement the data cube, identifies six orthogonal parameters/dimensions that characterize them, places the methods within the problem space these parameters define, and identifies several clusters of techniques with interesting properties.
Abstract: Implementation of the data cube is an important and scientifically interesting issue in On-Line Analytical Processing (OLAP) and has been the subject of a plethora of related publications. Naive implementation methods that compute each node separately and store the result are impractical, since they have exponential time and space complexity with respect to the cube dimensionality. To overcome this drawback, a wide range of methods that provide efficient cube implementation (with respect to both computation and storage) have been proposed, which make use of relational, multidimensional, or graph-based data structures. Furthermore, there are several other methods that compute and store approximate descriptions of data cubes, sacrificing accuracy for condensation. In this article, we focus on Relational-OLAP (ROLAP), following the majority of the efforts so far. We review existing ROLAP methods that implement the data cube and identify six orthogonal parameters/dimensions that characterize them. We place the existing techniques at the appropriate points within the problem space defined by these parameters and identify several clusters that the techniques form with various interesting properties. A careful study of these properties leads to the identification of particularly effective values for the space parameters and indicates the potential for devising new algorithms with better overall performance.
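The exponential cost mentioned above comes from the cube lattice: d dimensions yield 2^d group-by nodes. The Python sketch below enumerates those nodes and shows one widely used way to avoid recomputing each node from the raw data, namely rolling up an already computed, finer node. It is a generic illustration, not a method taken from the article; the function names and the toy data are assumptions, and the roll-up shortcut is valid here because SUM is a distributive aggregate.

```python
from collections import defaultdict
from itertools import combinations


def lattice_nodes(dims):
    """All group-by nodes of the cube, from the finest to the empty grouping."""
    return [c for k in range(len(dims), -1, -1) for c in combinations(dims, k)]


def roll_up(table, source_dims, target_dims):
    """Aggregate a node keyed by source_dims down to a subset target_dims.

    table maps key tuples (ordered as source_dims) to summed measures.
    """
    keep = [source_dims.index(d) for d in target_dims]
    out = defaultdict(int)
    for key, value in table.items():
        out[tuple(key[i] for i in keep)] += value
    return dict(out)


dims = ("product", "region", "year")
nodes = lattice_nodes(dims)  # 2 ** 3 = 8 nodes

# Toy base node (hypothetical data), keyed by (product, region, year).
base = {
    ("pen", "east", 1996): 3,
    ("pen", "west", 1996): 5,
    ("ink", "east", 1997): 2,
}

# Naive approach: compute a coarse node directly from the base data.
by_product_naive = roll_up(base, dims, ("product",))

# Reuse approach: derive it from a smaller, already computed parent node.
by_product_region = roll_up(base, dims, ("product", "region"))
by_product_reused = roll_up(by_product_region, ("product", "region"), ("product",))
```

Both routes give the same result, but the second one scans the parent node, which is never larger than the base data and is often much smaller.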