TL;DR: This work presents the average-case analysis of orthogonal range search for several multidimensional data structures and considers random relaxed K-d trees as a prototypical example to get very precise asymptotic estimates for the expected cost of range searches.
TL;DR: This work presents the average-case analysis of orthogonal range search for several multidimensional data structures and shows that the performance of range searches is related to a variant of partial matches using a mixture of geometric and combinatorial arguments.
Abstract: In this work we present the average-case analysis of orthogonal range search for several multidimensional data structures. We first consider random relaxed K-d trees as a prototypical example. Later we extend these results to many different multidimensional data structures. We show that the performance of range searches is related to the performance of a variant of partial matches using a mixture of geometric and combinatorial arguments. This reduction simplifies the analysis and allows us to give exact lower and upper bounds for the performance of range searches. Furthermore, under suitable conditions ("small range queries"), we can also get a very precise asymptotic estimate for the expected cost of range searches.
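To make the object of this analysis concrete, here is a minimal sketch of an orthogonal range search in a relaxed K-d tree. It assumes the usual definition, in which each node stores its own discriminant drawn uniformly at random from the K coordinates. The names `Node`, `insert`, and `range_search` are illustrative and not taken from the paper; the count of visited nodes stands in for the search cost.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point
    disc: int                       # discriminant: coordinate compared at this node
    left: Optional["Node"] = None   # keys with point[disc] <  self.point[disc]
    right: Optional["Node"] = None  # keys with point[disc] >= self.point[disc]


def insert(root: Optional[Node], point: Point, rng: random.Random) -> Node:
    """Insert a point at a leaf; the new node draws a random discriminant."""
    new_node = Node(point, rng.randrange(len(point)))
    if root is None:
        return new_node
    node = root
    while True:
        if point[node.disc] < node.point[node.disc]:
            if node.left is None:
                node.left = new_node
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = new_node
                return root
            node = node.right


def range_search(root: Optional[Node], lo: Point, hi: Point) -> Tuple[List[Point], int]:
    """Return the points inside the box [lo, hi] and the number of nodes visited."""
    found: List[Point] = []
    visited = 0
    stack = [root] if root is not None else []
    while stack:
        node = stack.pop()
        visited += 1
        if all(l <= x <= h for l, x, h in zip(lo, node.point, hi)):
            found.append(node.point)
        j = node.disc
        # Descend only into subtrees whose region can intersect the query box.
        if node.left is not None and lo[j] < node.point[j]:
            stack.append(node.left)
        if node.right is not None and hi[j] >= node.point[j]:
            stack.append(node.right)
    return found, visited
```

Building the tree from points inserted in random order and averaging `visited` over many random queries gives an empirical counterpart to the expected cost studied in the abstract.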
TL;DR: This paper develops a general framework for reconciling the synopses on many tables, which may come from different information sources, and shows how to compute the optimal combination of synopses for a given workload and a limited amount of available memory.
Abstract: Maintaining statistics on multidimensional data distributions is crucial for predicting the run-time and result size of queries and data analysis tasks with acceptable accuracy. To this end a plethora of techniques have been proposed for maintaining a compact data "synopsis" on a single table, ranging from variants of histograms to methods based on wavelets and other transforms. However, the fundamental question of how to reconcile the synopses for large information sources with many tables has been largely unexplored. This paper develops a general framework for reconciling the synopses on many tables, which may come from different information sources. It shows how to compute the optimal combination of synopses for a given workload and a limited amount of available memory. The practicality of the approach and the accuracy of the proposed heuristics are demonstrated by experiments.
TL;DR: A number of new techniques for parallelizing queries in multidimensional array database management systems are presented, and their implementation in the RasDaMan DBMS, the first DBMS for generic multidimensional array data, is discussed.
Abstract: Intra-query parallelism is a well-established mechanism for achieving high performance in (object-) relational database systems. However, the methods have not yet been applied to the upcoming field of multidimensional array databases. Specific properties of multidimensional array data require new parallel algorithms. This paper presents a number of new techniques for parallelizing queries in multidimensional array database management systems. It discusses their implementation in the RasDaMan DBMS, the first DBMS for generic multidimensional array data. The efficiency of the techniques presented is demonstrated using typical queries on large multidimensional data volumes.
TL;DR: In this paper, a system and method for analyzing data identifies a first set of analysis results based on a first set of data and an analysis strategy; a second set of data, which is a modification of the first set of data, is then analyzed using the same analysis strategy to generate a second set of analysis results.
Abstract: A system and method for analyzing data identifies a first set of analysis results based on a first set of data and an analysis strategy. A second set of data, which is a modification of the first set of data, is analyzed using the analysis strategy to generate a second set of analysis results. The second set of analysis results are arranged in a hierarchical format. The first set of analysis results are compared with the second set of analysis results to identify differences in the two analysis results.
TL;DR: In this article, a conceptual model called Multidimensional Fact Network (MFN), which allows data marts to be defined incrementally, and a toolkit called AURORA, which is based on MFN and provides a comprehensive set of data warehouse design tools, are proposed.
Abstract: The design of data warehouses is a very relevant issue in supporting management decision processes and data analysis. While the design and maintenance of data warehouses are difficult tasks, enterprise management is increasingly asking for tools capable of supporting designers in all the activities involved in data warehouse construction. In this context it is mandatory to provide designers with the capability to incrementally define data warehouse components. In this paper we propose a conceptual model, called Multidimensional Fact Network (MFN), that allows data marts to be defined incrementally, and a toolkit, called AURORA, based on MFN and providing a comprehensive set of data warehouse design tools.
TL;DR: A conceptual model called Multidimensional Fact Network (MFN), which allows data marts to be defined incrementally, and a toolkit called AURORA, which is based on MFN and provides a comprehensive set of data warehouse design tools, are proposed.
Abstract: The design of data warehouses is a very relevant issue in supporting management decision processes and data analysis. While the design and maintenance of data warehouses are difficult tasks, enterprise management is increasingly asking for tools capable of supporting designers in all the activities involved in data warehouse construction. In this context it is mandatory to provide designers with the capability to incrementally define data warehouse components. In this paper we propose a conceptual model, called Multidimensional Fact Network (MFN), that allows data marts to be defined incrementally, and a toolkit, called AURORA, based on MFN and providing a comprehensive set of data warehouse design tools.
TL;DR: This work concludes that the cluster analysis approach is a useful complement for assessing multidimensional data and that this dataset has been overused for automated decision benchmarking purposes, without a thorough analysis of the data it contains.
Abstract: This work deals with multidimensional data analysis, specifically cluster analysis applied to a very well-known dataset, the Wisconsin Breast Cancer dataset. After the introduction of the topics of the paper, the cluster analysis concept is briefly explained and different methods of cluster analysis are compared. Further, the Kohonen model of self-organizing maps is briefly described together with an example and with explanations of how the cluster analysis can be performed using the maps. After describing the data set and the methodology used for the analysis, we present the findings using textual as well as visual descriptions and conclude that the approach is a useful complement for assessing multidimensional data and that this dataset has been overused for automated decision benchmarking purposes, without a thorough analysis of the data it contains.
TL;DR: Probabilistic multidimensional scaling models provide a means of accounting for the variability inherent in sensory data by using distributions, instead of points, to portray sensory objects to enable the evaluation of alternative product development strategies.
Abstract: Variability is a fundamental characteristic of sensory profile data. Ignoring the variability may result in biased solutions that cannot be improved by the collection of additional data. Probabilistic multidimensional scaling (PMDS) models provide a means of accounting for the variability inherent in sensory data by using distributions, instead of points, to portray sensory objects. For profile data with high levels of variability, the probabilistic model recovers latent structure parameters very well, whereas traditional deterministic MDS models and principal components analyses (PCA) do not. Advantages of the PMDS models include their parsimony, testability and extensibility. Two particularly attractive PMDS attributes are their ability to relate consumers’ expressions of liking to product profiles and their ability to estimate a product's “perceptual share” from liking and profile data. Used as a criterion with what-if modeling, perceptual share estimates enable the evaluation of alternative product development strategies.
TL;DR: This paper focuses on data parallelism and investigates multidimensional operations concerning their capability to be executed on parts of a multidimensional object, and introduces different possibilities of splitting multidimensional objects for parallel processing and merging the results, which affects the architecture of the parallel query execution.
Abstract: Intra-query parallelism is a well-established mechanism for achieving high performance in (object-) relational database systems. However, the methods have not yet been applied to the upcoming field of multidimensional array databases. Specific properties of multidimensional array data require new parallel algorithms. This paper focuses on the parallel execution of expensive multidimensional operations on a single very large multidimensional object. Experience from the ESTEDI project has shown that this scenario is very typical for the database-supported analysis of large multidimensional data. In the paper we concentrate on data parallelism and investigate multidimensional operations concerning their capability to be executed on parts of a multidimensional object. Furthermore, we introduce different possibilities of splitting multidimensional objects for parallel processing and merging the results, which affects the architecture of the parallel query execution. Intra-operator parallelism was implemented in the Array DBMS RasDaMan.
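The split, process, and merge pattern described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not RasDaMan's implementation: the array is a plain list of rows, the split is by contiguous row blocks, and the helper names (`split_rows`, `partial_sum_count`, `parallel_mean`) are hypothetical. The mean is used as the example operation because it cannot be merged by averaging per-part means when parts differ in size; each part must return a (sum, count) pair instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Array2D = List[List[float]]


def split_rows(array: Array2D, parts: int) -> List[Array2D]:
    """Split a 2-D array into at most `parts` contiguous blocks of rows."""
    n = len(array)
    size = max(1, -(-n // parts))  # ceiling division, at least one row per block
    return [array[i:i + size] for i in range(0, n, size)]


def partial_sum_count(block: Array2D) -> Tuple[float, int]:
    """Per-part work: return the sum and the number of cells in one block."""
    total = sum(sum(row) for row in block)
    count = sum(len(row) for row in block)
    return total, count


def parallel_mean(array: Array2D, parts: int = 4) -> float:
    """Compute the mean of all cells by processing blocks independently."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    blocks = split_rows(array, parts)
    # Threads keep the sketch self-contained; a real system would distribute
    # the blocks across processes or machines.
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partials = list(pool.map(partial_sum_count, blocks))
    # Merge step: combine the partial results into the final answer.
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    if count == 0:
        raise ValueError("cannot take the mean of an empty array")
    return total / count
```

Operations whose partial results combine this simply are the easy case; operations that need neighbouring cells across a block boundary would need a different split or an overlap between parts.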
TL;DR: This paper proposes a general procedure for converting a relational database schema into a multidimensional one; the procedure can be only partly automated because some decisions about attribute types must be made during conversion.
Abstract: It is universally recognized that operational information systems lean on the relational model and data warehouses on the multidimensional model. The phrase On-Line Analytical Processing (OLAP) means summarizing, consolidating, viewing, and synthesizing data according to multiple dimensions. The process of modeling a data warehouse may start from an operational system’s database. It may be helpful to convert a relational database schema into a multidimensional database schema in order to discover dimensions that are hidden in a relational database. However, only a few efforts have been made to investigate the conversion of a relational into a multidimensional database schema. This paper proposes a general procedure for such a conversion. The procedure can be only partly automated because some decisions about attribute types must be made during conversion.
TL;DR: The proposed model is a hierarchical extension of the latent trait family of models, which was developed as a generalization of GTM to noise models from the exponential family of distributions.
Abstract: We present a general framework for interactive visualization and analysis of multi-dimensional data points. The proposed model is a hierarchical extension of the latent trait family of models developed in [4] as a generalization of GTM to noise models from the exponential family of distributions. As some members of the exponential family of distributions are suitable for modeling discrete observations, we give a brief example of using our methodology in interactive visualization and semantic discovery in a corpus of text-based documents. We also derive formulas for computing local magnification factors of latent trait projection manifolds.
TL;DR: This paper tackles the problem of the multidimensional analysis of patents based on the use of textual and statistical analysis techniques and presents the different steps required for the textual and statistical analysis of patent data.
Abstract: In this paper we tackle the problem of the multidimensional analysis of patents based on the use of textual and statistical analysis techniques. The use of correspondence and cluster analysis permits the identification of technological trends and innovation. Furthermore, the interactions between the different fields of activity are captured through the use of these statistical methods. Indicators based on patents can also be produced in order to depict quantitatively the technological activity at a European level. Finally, we present the different steps required for the textual and statistical analysis of patent data.
TL;DR: The segmented page indexing (SP-indexing) as discussed by the authors uses two kinds of I/O units: pages for random disk accesses and segments for sequential accesses.
Abstract: This paper presents an index clustering technique called the segmented page indexing (SP-indexing) for multidimensional index structures. The design objectives of the SP-indexing are twofold: (1) to improve the range query performance of the multidimensional indexing methods and (2) to provide a compromise between optimal index clustering and excessive full index reorganization overhead. The SP-indexing uses two kinds of I/O units: pages for random disk accesses and segments for sequential accesses. The SP-indexing improves the range query performance by offering high-performance sequential disk access within a segment. Experimental results demonstrate that the SP-indexing improves the range query performance up to several times compared with the traditional page-based indexing methods with respect to the total elapsed time.
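The benefit of reading segments sequentially instead of fetching pages one by one can be illustrated with a back-of-the-envelope disk cost model. This is not the paper's cost model: it assumes that every random access pays one seek plus one page transfer, that a segment read pays a single seek, and that the pages needed are packed into as few segments as possible. The function names and any parameter values passed to them are hypothetical.

```python
import math


def random_page_cost(n_pages: int, seek_ms: float, transfer_ms: float) -> float:
    """Cost when every page is fetched with its own random access."""
    return n_pages * (seek_ms + transfer_ms)


def segmented_cost(n_pages: int, pages_per_segment: int,
                   seek_ms: float, transfer_ms: float) -> float:
    """Cost when pages are read sequentially inside segments.

    Best case: the needed pages occupy ceil(n_pages / pages_per_segment)
    segments, and each segment costs one seek plus its page transfers.
    """
    segments = math.ceil(n_pages / pages_per_segment)
    return segments * seek_ms + n_pages * transfer_ms
```

Under this model the gap grows with the ratio of seek time to transfer time, which matches the abstract's point that sequential access within a segment is where the improvement comes from.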
TL;DR: Experimental results demonstrate that the segmented page indexing improves the range query performance up to several times compared with the traditional page-based indexing methods with respect to the total elapsed time.
Abstract: This paper presents an index clustering technique called the segmented page indexing (SP-indexing) for multidimensional index structures. The design objectives of the SP-indexing are twofold: (1) to improve the range query performance of the multidimensional indexing methods and (2) to provide a compromise between optimal index clustering and excessive full index reorganization overhead. The SP-indexing uses two kinds of I/O units: pages for random disk accesses and segments for sequential accesses. The SP-indexing improves the range query performance by offering high-performance sequential disk access within a segment. Experimental results demonstrate that the SP-indexing improves the range query performance up to several times compared with the traditional page-based indexing methods with respect to the total elapsed time.
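TL;DR: In this article, the operation procedure for multidimensional data analysis is shortened by displaying the slicing state of the dimension assigned to the page axis and by outputting, in a display area divided accordingly, one piece of drawing information for each combination of the selected page-axis members.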
Abstract: PROBLEM TO BE SOLVED: To provide technology which can support efficient multidimensional data analysis by shortening the operation procedure up to the display of a graph and a table used for multidimensional data analysis. SOLUTION: This method has a step for displaying the slicing state of a dimension assigned to a page axis, a step for selecting a member showing that the slicing state is sliced among members of dimensions assigned to the page axis and an item axis in a display area, a step for calculating the number of combinations of members of the dimension of the page axis among selected members as the number of pieces of drawing information, a step for generating pieces of drawing information corresponding to the combinations by obtaining the slicing result of the member of the dimension of the item axis among the selected members, and a step for dividing the display area for displaying the slicing result by the calculated number of the pieces of drawing information and outputting the pieces of drawing information generated in the respective divided areas.
TL;DR: A new method to classify informational motor tasks, through a multidimensional approach to difficulty, makes it possible either to aggregate tasks with identical profiles, or to differentiate them according to each descriptor's weight on dependent variables.
Abstract: This article deals with a new method to classify informational motor tasks, through a multidimensional approach to difficulty. The following experiment is used: 345 subjects are asked to point at computer-managed mobile targets. The 3 selected target parameters are: velocity, area and spatial uncertainty. Independence and numerical scalability are the reasons for this choice. As performance does not sum up difficulty, other dependent variables have been added: number of attempts, reaction time and motor time. The use of state-of-the-art multidimensional data analysis (HAC, CFA) makes it possible either to aggregate tasks with identical profiles, or to differentiate them according to each descriptor's weight on dependent variables. Above all, the study specifies the contribution of each variable in the a posteriori definition of the difficulty by the researcher. KEY WORDS: Difficulty, data processing, motor task, classification, data analysis
TL;DR: In this article, an information analysis system is provided with a designation table that records the dimension or data item whose changes are to be captured, together with a data extraction processing function; when the designated data has changed since the previous extraction, the data set is accumulated in a multidimensional database for change analysis, separate from the ones used for normal analysis and multidimensional analysis.
Abstract: PROBLEM TO BE SOLVED: To analyze the features of a set of multidimensional data by focusing attention on a set wherein a specific dimension or a data item changes with respect to the set of the multidimensional data. SOLUTION: A designation table for preserving a dimension for catching change or a data item, and a data extraction processing function are previously provided for this information analysis system. In the data extraction processing function, whether or not the dimension or the data of the data item designated by the designation table are changed from the prior extraction stage is determined. When changed, the data set is accumulated in a multidimensional database for change analysis different from ones for normal analysis and multidimensional analysis, and the multidimensional database is analyzed. COPYRIGHT: (C)2004,JPO
TL;DR: The paper puts an emphasis on the technical design of multidimensional data analysis and deals with transforming business data into data beneficial to decision-making, enabling managers to make more scientific decisions efficiently.
Abstract: With the coming information revolution, most enterprises generate and store a great deal of data about their daily work. Coupled with a description of the concept, categories and storage of OLAP, the paper puts an emphasis on the technical design of multidimensional data analysis and deals with transforming business data into data beneficial to decision-making, enabling managers to make more scientific decisions efficiently.
TL;DR: This study presents a template model to help users declare the interesting multidimensional inter-transactional associations to be mined and shows, through a series of experiments on both synthetic and real-life data sets, that these optimization techniques can yield significant performance benefits.
Abstract: Multidimensional inter-transactional association rules extend the traditional association rules to describe more general associations among items with multiple properties across transactions. “After McDonald and Burger King open branches, KFC will open a branch two months later and one mile away” is an example of such rules. Since the number of potential inter-transactional association rules tends to be extremely large, mining inter-transactional associations poses more challenges on efficient processing than mining traditional intra-transactional associations. In order to make such association rule mining truly practical and computationally tractable, in this study we present a template model to help users declare the interesting multidimensional inter-transactional associations to be mined. With the guidance of templates, several optimization techniques, i.e., joining, converging, and speeding, are devised to speed up the discovery of inter-transactional association rules. We show, through a series of experiments on both synthetic and real-life data sets, that these optimization techniques can yield significant performance benefits.
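The KFC example in this abstract can be made concrete with a small sketch that counts how often a rule holds across transactions at a fixed offset. It uses a deliberately simplified reading that is an assumption rather than the paper's formal definitions: transactions are keyed by a (month, mile) position, and confidence is the fraction of positions holding the antecedent whose offset position holds the consequent. The function name and the toy data are made up for illustration.

```python
from typing import Dict, Set, Tuple

Coord = Tuple[int, int]            # (month, mile) position of a transaction
Database = Dict[Coord, Set[str]]   # items observed at each position


def rule_stats(db: Database, antecedent: Set[str], consequent: Set[str],
               offset: Coord) -> Tuple[int, int, float]:
    """Return (antecedent count, rule count, confidence) for one rule.

    The rule reads: if `antecedent` occurs at position p, then `consequent`
    occurs at position p + offset.
    """
    ante = 0
    both = 0
    for (month, mile), items in db.items():
        if antecedent <= items:
            ante += 1
            target = db.get((month + offset[0], mile + offset[1]), set())
            if consequent <= target:
                both += 1
    confidence = both / ante if ante else 0.0
    return ante, both, confidence


# Toy data, invented for this example only.
toy_db: Database = {
    (0, 0): {"McDonald", "BurgerKing"},
    (2, 1): {"KFC"},
    (3, 4): {"McDonald", "BurgerKing"},
    (5, 5): {"KFC"},
    (6, 0): {"McDonald", "BurgerKing"},
    (8, 1): {"OtherChain"},
}
stats = rule_stats(toy_db, {"McDonald", "BurgerKing"}, {"KFC"}, offset=(2, 1))
```

A template in the sense of the abstract would restrict which antecedents, consequents, and offsets are enumerated; here a single rule is fixed by hand, which sidesteps the combinatorial search the paper is concerned with.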
TL;DR: A new automated method that performs unsupervised pixel purity determination and endmember extraction from multidimensional datasets; this is achieved by using both spatial and spectral information in a combined manner.
Abstract: Spectral mixture analysis provides an efficient mechanism for the interpretation and classification of remotely sensed multidimensional imagery. It aims to identify a set of reference signatures (also known as endmembers) that can be used to model the reflectance spectrum at each pixel of the original image. Thus, the modeling is carried out as a linear combination of a finite number of ground components. Although spectral mixture models have proved to be appropriate for the purpose of large hyperspectral dataset subpixel analysis, few methods are available in the literature for the extraction of appropriate endmembers in spectral unmixing. Most approaches have been designed from a spectroscopic viewpoint and, thus, tend to neglect the existing spatial correlation between pixels. This paper presents a new automated method that performs unsupervised pixel purity determination and endmember extraction from multidimensional datasets; this is achieved by using both spatial and spectral information in a combined manner. The method is based on mathematical morphology, a classic image processing technique that can be applied to the spectral domain while being able to keep its spatial characteristics. The proposed methodology is evaluated through a specifically designed framework that uses both simulated and real hyperspectral data.
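The linear mixture model mentioned in this abstract, in which each pixel spectrum is a linear combination of endmember spectra, can be written down in a few lines. The sketch below covers only the forward model; it does not attempt the morphological endmember extraction the paper proposes. The constraint that abundances are non-negative and sum to one is a common convention assumed here, and the function name is hypothetical.

```python
from typing import List, Sequence


def mix(endmembers: Sequence[Sequence[float]],
        abundances: Sequence[float]) -> List[float]:
    """Return the pixel spectrum sum_i abundances[i] * endmembers[i].

    `endmembers` holds one spectrum per ground component, all with the same
    number of bands; `abundances` holds one fraction per endmember.
    """
    if len(endmembers) != len(abundances):
        raise ValueError("need exactly one abundance per endmember")
    if any(a < 0 for a in abundances) or abs(sum(abundances) - 1.0) > 1e-9:
        raise ValueError("abundances must be non-negative and sum to one")
    bands = len(endmembers[0])
    if any(len(e) != bands for e in endmembers):
        raise ValueError("all endmember spectra must have the same length")
    return [sum(a * e[b] for a, e in zip(abundances, endmembers))
            for b in range(bands)]
```

A pixel whose abundance vector has a single entry equal to one reproduces that endmember's spectrum exactly, which is the sense in which such a pixel is "pure".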
TL;DR: The case of adaptive testing under a multidimensional response model with large numbers of constraints on the content of the test is addressed and the procedure is illustrated for five different cases of multidimensionality.
Abstract: The case of adaptive testing under a multidimensional response model with large numbers of constraints on the content of the test is addressed. The items in the test are selected using a shadow test approach. The 0–1 linear programming model that assembles the shadow tests maximizes posterior expected Kullback-Leibler information in the test. The procedure is illustrated for five different cases of multidimensionality. These cases differ in (a) the numbers of ability dimensions that are intentional or should be considered as “nuisance dimensions” and (b) whether the test should or should not display a simple structure with respect to the intentional ability dimensions.
TL;DR: Computational approaches to multi-channel, 3-D, time-course image recording are discussed, including the introduction of a new interdisciplinary effort to develop an effective framework for the analysis of multidimensional image data.
Abstract: Relatively recent strides in noninvasive medical imaging and in vital microscopy have yielded a new form of massive data type: the multi-channel, 3-D, time-course image recording. These multidimensional datasets document dynamic changes within the full volume of a specimen over time, often simultaneously monitoring several different parameters. As a result, visual data collected from a single living specimen can now go beyond the 2-D domain, engaging the viewer’s full capacity to discern changes across space, time, and additional dimensions such as image spectra. A challenge arising from these advances is to display these data so that the investigator can visualize and interactively explore the recording’s full spatial, temporal, and spectral content, to better understand what cannot be seen directly through the microscope eyepiece. The challenges of multidimensional image analysis are not unique to the biologist or microscopist: space scientists and climatologists have been struggling with these issues for some time in their analysis of atmospheric data. Here we discuss computational approaches to this type of data, including the introduction of a new interdisciplinary effort to develop an effective framework for the analysis of multidimensional image data.
TL;DR: The authors present schema for a bioinformatics system accomplishing multidimensional data analysis based on a data warehouse infrastructure; its initial application extracts related genes from functional pathways using heterogeneous data from protein-protein interactions and domain analysis.
Abstract: The authors present schema for a bioinformatics system accomplishing multidimensional data analysis based on a data warehouse infrastructure. They apply cluster analysis for downstream relationship inference. Their initial application extracts related genes from functional pathways using heterogeneous data from protein-protein interactions and domain analysis.