TL;DR: Partial sideways cracking is a self-organizing column-store design that minimizes tuple reconstruction cost. It achieves performance similar to presorted data without the heavy initial presorting step, and brings significant performance benefits for multi-attribute queries.
Abstract: Column-stores gained popularity as a promising physical design alternative. Each attribute of a relation is physically stored as a separate column allowing queries to load only the required attributes. The overhead incurred is on-the-fly tuple reconstruction for multi-attribute queries. Each tuple reconstruction is a join of two columns based on tuple IDs, making it a significant cost component. The ultimate physical design is to have multiple presorted copies of each base table such that tuples are already appropriately organized in multiple different orders across the various columns. This requires the ability to predict the workload, idle time to prepare, and infrequent updates. In this paper, we propose a novel design, partial sideways cracking, that minimizes the tuple reconstruction cost in a self-organizing way. It achieves performance similar to using presorted data, but without requiring the heavy initial presorting step itself. Instead, it handles dynamic, unpredictable workloads with no idle time and frequent updates. Auxiliary dynamic data structures, called cracker maps, provide a direct mapping between pairs of attributes used together in queries for tuple reconstruction. A map is continuously physically reorganized as an integral part of query evaluation, providing faster and reduced data access for future queries. To enable flexible and self-organizing behavior in storage-limited environments, maps are materialized only partially as demanded by the workload. Each map is a collection of separate chunks that are individually reorganized, dropped or recreated as needed. We implemented partial sideways cracking in an open-source column-store. A detailed experimental analysis demonstrates that it brings significant performance benefits for multi-attribute queries.
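The cracker-map idea described in the abstract can be illustrated with a minimal sketch. It assumes a single map over two attributes A and B, range selections on A, and a toy partitioning step. The class name `CrackerMap` and its methods are hypothetical, and the chunk-wise partial materialization that the paper adds on top is not modeled here.

```python
import bisect


class CrackerMap:
    """Toy cracker map over (A, B) pairs; a hypothetical illustration only."""

    def __init__(self, col_a, col_b):
        # Pair i holds the A and B values of the same tuple, so B travels
        # with A whenever the map is physically reorganized.
        self.pairs = list(zip(col_a, col_b))
        self.pivots = []     # sorted A values the map has been cracked on
        self.positions = []  # positions[i]: index of first pair with a >= pivots[i]

    def _crack(self, pivot):
        """Partition the one piece that contains `pivot`; return the split index."""
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]  # already cracked on this value
        lo = self.positions[i - 1] if i > 0 else 0
        hi = self.positions[i] if i < len(self.pivots) else len(self.pairs)
        piece = self.pairs[lo:hi]
        left = [p for p in piece if p[0] < pivot]
        right = [p for p in piece if p[0] >= pivot]
        self.pairs[lo:hi] = left + right
        split = lo + len(left)
        self.pivots.insert(i, pivot)
        self.positions.insert(i, split)
        return split

    def select_b(self, low, high):
        """Return the B values of all tuples with low <= a < high."""
        start = self._crack(low)
        end = self._crack(high)
        return [b for _, b in self.pairs[start:end]]


cmap = CrackerMap([13, 4, 9, 2, 12, 7, 1, 19], list("abcdefgh"))
print(sorted(cmap.select_b(5, 13)))
print(sorted(cmap.select_b(2, 10)))
```

Each query only touches the pieces that contain its bounds, so later queries on nearby ranges partition less data. Because the B values sit next to the qualifying A values, the result is read from one contiguous slice with no join on tuple IDs.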
TL;DR: Performance metrics data held in a multi-dimensional structure, such as a nested scorecard matrix, is flattened (de-normalized) so that individual records can be queried efficiently.
Abstract: Performance metrics data in a multi-dimensional structure, such as a nested scorecard matrix, is transformed into a flat (de-normalized) structure for efficient querying of individual records. Each dimension and header is converted to a column, and data values are resolved at the intersection of dimension levels through an iterative process covering all dimensions and headers of the data structure. A key corresponding to a tuple representation of each cell, or a transform of that tuple, may be used to identify the rows holding the resolved cell data, enabling further query capabilities.
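As a rough illustration of the flattening step, the sketch below assumes the scorecard is a nested dictionary with one nesting level per dimension. The function name `flatten_scorecard`, the dimension names, and the sample figures are all hypothetical and not taken from the source.

```python
def flatten_scorecard(matrix, dims):
    """Turn a nested dict (one level per dimension) into flat rows.

    Each row has one column per dimension, a "value" column, and a "key"
    column holding the tuple of dimension levels that identifies the cell.
    """
    rows = []
    stack = [((), matrix)]  # (path of levels visited so far, remaining sub-structure)
    while stack:
        path, node = stack.pop()
        if len(path) == len(dims):
            # All dimensions resolved: `node` is the cell value at this intersection.
            row = dict(zip(dims, path))
            row["value"] = node
            row["key"] = path
            rows.append(row)
        else:
            for level, child in node.items():
                stack.append((path + (level,), child))
    return rows


scorecard = {
    "EMEA": {"Q1": {"revenue": 10, "cost": 4}, "Q2": {"revenue": 12, "cost": 5}},
    "APAC": {"Q1": {"revenue": 7, "cost": 3}},
}
rows = flatten_scorecard(scorecard, ["region", "quarter", "metric"])
by_key = {row["key"]: row for row in rows}
print(by_key[("EMEA", "Q2", "cost")]["value"])
```

Indexing the flat rows by the tuple key, as in the last lines, is one way to look up a single cell directly. The order of the returned rows is not meaningful in this sketch.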
TL;DR: A column-oriented in-memory database structure is described that includes a main store and a dictionary-compressed delta store; a transaction associated with a column is received and recorded within the delta store.
Abstract: According to some embodiments, a column-oriented in-memory database structure may be established. The database structure may, for example, include a main store and a dictionary compressed delta store. Moreover, the delta store may comprise a value identifier vector and a delta dictionary associated with a column of the database. A transaction associated with the column may then be received and recorded within the delta store. According to some embodiments, entries associated with the transaction may be added to a value log of the value identifier vector and, independently, to a dictionary log of the delta dictionary.
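A minimal sketch of such a delta store for one column follows, assuming an unsorted, append-only delta dictionary and in-memory lists standing in for the two logs. The class `DeltaColumn` and its `replay` helper are hypothetical; the abstract does not specify log formats or a recovery procedure.

```python
class DeltaColumn:
    """Toy dictionary-compressed delta store for a single column."""

    def __init__(self):
        self.dictionary = []      # delta dictionary: value id -> value (insertion order)
        self.value_to_id = {}     # reverse lookup used while inserting
        self.value_ids = []       # value identifier vector: one value id per row
        self.dictionary_log = []  # log of new dictionary entries (assumed format)
        self.value_log = []       # log of appended value ids (assumed format)

    def insert(self, value):
        """Record one value; the two logs are appended to independently."""
        vid = self.value_to_id.get(value)
        if vid is None:
            vid = len(self.dictionary)
            self.dictionary.append(value)
            self.value_to_id[value] = vid
            self.dictionary_log.append((vid, value))  # only for unseen values
        self.value_ids.append(vid)
        self.value_log.append(vid)                    # for every inserted row
        return len(self.value_ids) - 1                # row position in the delta

    def get(self, row):
        return self.dictionary[self.value_ids[row]]

    @classmethod
    def replay(cls, dictionary_log, value_log):
        """Rebuild a delta column from the two logs (illustrative assumption)."""
        col = cls()
        for vid, value in dictionary_log:
            col.dictionary.append(value)
            col.value_to_id[value] = vid
        col.value_ids = list(value_log)
        col.dictionary_log = list(dictionary_log)
        col.value_log = list(value_log)
        return col


col = DeltaColumn()
for color in ["red", "blue", "red"]:
    col.insert(color)
print(col.value_ids, col.dictionary_log)
```

Repeated values add an entry to the value log but not to the dictionary log, which is why keeping the two logs separate can be attractive in this kind of design.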
TL;DR: Methods, systems, and computer-readable media for columnar storage of a database index are described. The columnar index includes a column store, which stores rows of the index column-wise, and a delta store, which stores rows of the index row-wise.
Abstract: Methods, systems, and computer-readable media of columnar storage of a database index are disclosed. A particular columnar index includes a column store that stores rows of the columnar index in a column-wise fashion and a delta store that stores rows of the columnar index in a row-wise fashion. The column store also includes an absence flag array. The absence flag array includes entries that indicate whether certain rows have been logically deleted from the column store.
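The layout described above can be sketched as follows, assuming new rows always land in the delta store and deletions from the column store only flip a flag. The class `ColumnarIndex` and its method names are hypothetical and chosen for illustration.

```python
class ColumnarIndex:
    """Toy columnar index: column store + absence flags + row-wise delta store."""

    def __init__(self, column_names, rows):
        self.column_names = list(column_names)
        # Column store: the values of each column are kept together.
        self.columns = {
            name: [row[i] for row in rows]
            for i, name in enumerate(self.column_names)
        }
        # Absence flag array: True marks a row as logically deleted.
        self.absent = [False] * len(rows)
        # Delta store: whole rows kept together, row-wise.
        self.delta = []

    def insert(self, row):
        self.delta.append(tuple(row))

    def delete(self, position):
        """Logically delete a column-store row without moving any data."""
        self.absent[position] = True

    def scan(self, name):
        """Return the live values of one column from both stores."""
        live = [v for v, gone in zip(self.columns[name], self.absent) if not gone]
        i = self.column_names.index(name)
        return live + [row[i] for row in self.delta]


index = ColumnarIndex(["id", "price"], [(1, 10), (2, 20), (3, 30)])
index.delete(1)
index.insert((4, 40))
print(index.scan("price"))
```

The deleted value stays physically present in the column store and is simply skipped during scans, which avoids rewriting the column on every delete.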