TL;DR: The data cube operator generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers.
Abstract: Data analysis applications typically aggregate data across many dimensions looking for unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or one-dimensional answers. Applications need the N-dimensional generalization of these operators. The paper defines that operator, called the data cube or simply cube. The cube operator generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers. The cube treats each of the N aggregation attributes as a dimension of N-space. The aggregate of a particular set of attribute values is a point in this space. The set of points forms an N-dimensional cube. Super-aggregates are computed by aggregating the N-cube to lower dimensional spaces. Aggregation points are represented by an "infinite value": ALL, so the point (ALL,ALL,...,ALL, sum(*)) represents the global sum of all items. Each ALL value actually represents the set of values contributing to that aggregation.
TL;DR: The cube operator generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers, and treats each of the N aggregation attributes as a dimension of N-space.
Abstract: Data analysis applications typically aggregate data across many dimensions looking for anomalies or unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or one-dimensional aggregates. Applications need the N-dimensional generalization of these operators. This paper defines that operator, called the data cube or simply cube. The cube operator generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers. The novelty is that cubes are relations. Consequently, the cube operator can be imbedded in more complex non-procedural data analysis programs. The cube operator treats each of the N aggregation attributes as a dimension of N-space. The aggregate of a particular set of attribute values is a point in this space. The set of points forms an N-dimensional cube. Super-aggregates are computed by aggregating the N-cube to lower dimensional spaces. This paper (1) explains the cube and roll-up operators, (2) shows how they fit in SQL, (3) explains how users can define new aggregate functions for cubes, and (4) discusses efficient techniques to compute the cube. Many of these features are being added to the SQL Standard.
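As a rough illustration of the idea that a cube is itself a relation, the following minimal Python sketch sums a measure over every subset of the grouping attributes and marks each aggregated-away attribute with the placeholder ALL. The function name `cube`, the sample `sales` rows, and the use of a plain string for ALL are assumptions made for this sketch; it is not the paper's SQL syntax or its evaluation strategy.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # placeholder for an aggregated-away dimension (string form is an assumption)

def cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims`; dropped dimensions become ALL."""
    totals = defaultdict(int)
    n = len(dims)
    for row in rows:
        # Each row contributes to one point in every one of the 2^n group-bys.
        for k in range(n + 1):
            for kept in combinations(range(n), k):
                key = tuple(row[dims[i]] if i in kept else ALL for i in range(n))
                totals[key] += row[measure]
    return dict(totals)

# Hypothetical sample data, invented for this sketch.
sales = [
    {"model": "Chevy", "year": 1994, "units": 50},
    {"model": "Chevy", "year": 1995, "units": 85},
    {"model": "Ford", "year": 1994, "units": 60},
    {"model": "Ford", "year": 1995, "units": 75},
]

result = cube(sales, ["model", "year"], "units")
for point, total in sorted(result.items(), key=str):
    print(point, total)
```

The output is a flat set of (model, year, sum) tuples, so the core cells, the one-dimensional sub-totals, and the (ALL, ALL) grand total all live in one table. This naive version touches every row 2^N times; it is meant only to show the shape of the result.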
TL;DR: A lattice framework is used to express dependencies among views, and greedy algorithms are presented that determine a good set of views to materialize, performing within a small constant factor of optimal.
Abstract: Decision support applications involve complex queries on very large databases. Since response times should be small, query optimization is critical. Users typically view the data as multidimensional data cubes. Each cell of the data cube is a view consisting of an aggregation of interest, like total sales. The values of many of these cells are dependent on the values of other cells in the data cube. A common and powerful query optimization technique is to materialize some or all of these cells rather than compute them from raw data each time. Commercial systems differ mainly in their approach to materializing the data cube. In this paper, we investigate the issue of which cells (views) to materialize when it is too expensive to materialize all views. A lattice framework is used to express dependencies among views. We present greedy algorithms that work off this lattice and determine a good set of views to materialize. The greedy algorithm performs within a small constant factor of optimal under a variety of models. We then consider the most common case of the hypercube lattice and examine the choice of materialized views for hypercubes in detail, giving some good tradeoffs between the space used and the average time to answer a query.
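A greedy selection over a hypercube lattice can be sketched as follows. This sketch makes several assumptions that are not stated in the abstract: each view is identified by the set of dimensions it groups by, a view can answer any query whose dimensions are a subset of its own, the cost of answering a query is the row count of the smallest materialized view that can answer it, and the top view is always materialized. The function name `greedy_select` and the row counts in `sizes` are hypothetical.

```python
def greedy_select(sizes, k):
    """Greedily pick k views, in addition to the top view, by largest benefit.

    sizes maps a frozenset of dimensions to an (assumed) row count.
    """
    top = max(sizes, key=len)  # the view that groups by every dimension
    # cost[w]: size of the cheapest materialized view able to answer w
    cost = {w: sizes[top] for w in sizes}
    picks = []

    def benefit(v):
        # Total cost reduction over all views that v could answer.
        return sum(max(cost[w] - sizes[v], 0) for w in sizes if w <= v)

    for _ in range(k):
        candidates = [v for v in sizes if v != top and v not in picks]
        best = max(candidates, key=benefit)
        picks.append(best)
        for w in sizes:
            if w <= best:  # w is computable from the newly chosen view
                cost[w] = min(cost[w], sizes[best])
    return picks

# Hypothetical row counts for a 3-dimensional cube over p, s, c.
sizes = {
    frozenset(dims): rows
    for dims, rows in [
        ("psc", 100), ("ps", 50), ("pc", 90), ("sc", 60),
        ("p", 20), ("s", 10), ("c", 30), ("", 1),
    ]
}

picks = greedy_select(sizes, 2)
print(["".join(sorted(v)) for v in picks])
```

Recomputing the benefit in each round matters because a view's value drops once a cheaper ancestor of its descendants has already been chosen.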
TL;DR: A single disperser spectral imager is presented that exploits recent theoretical work in compressed sensing to achieve snapshot spectral imaging; an experimental prototype captures the spatiospectral information of a scene consisting of two balls illuminated by different light sources.
Abstract: We present a single disperser spectral imager that exploits recent theoretical work in the area of compressed sensing to achieve snapshot spectral imaging. An experimental prototype is used to capture the spatiospectral information of a scene that consists of two balls illuminated by different light sources. An iterative algorithm is used to reconstruct the data cube. The average spectral resolution is 3.6 nm per spectral channel. The accuracy of the instrument is demonstrated by comparison of the spectra acquired with the proposed system with the spectra acquired by a nonimaging reference spectrometer.
TL;DR: The remarkable advantage of CASSI is that the entire data cube is sensed with just a few FPA measurements and, in some cases, with as little as a single FPA shot.
Abstract: Imaging spectroscopy involves the sensing of a large amount of spatial information across a multitude of wavelengths. Conventional approaches to hyperspectral sensing scan adjacent zones of the underlying spectral scene and merge the results to construct a spectral data cube. Push broom spectral imaging sensors, for instance, capture a spectral cube with one focal plane array (FPA) measurement per spatial line of the scene [1], [2]. Spectrometers based on optical bandpass filters sequentially scan the scene by tuning the bandpass filters in steps. The disadvantage of these techniques is that they require scanning a number of zones linearly in proportion to the desired spatial and spectral resolution. This article surveys compressive coded aperture spectral imagers, also known as coded aperture snapshot spectral imagers (CASSI) [1], [3], [4], which naturally embody the principles of compressive sensing (CS) [5], [6]. The remarkable advantage of CASSI is that the entire data cube is sensed with just a few FPA measurements and, in some cases, with as little as a single FPA shot.