TL;DR: This paper introduces Open Cubes, an RDFS vocabulary for the specification and publication of multidimensional cubes on the Semantic Web, and shows how classical OLAP operations can be implemented over Open Cubes using SPARQL 1.1 without the need to map the multidimensional information to the local database.
Abstract: Traditional OLAP tools have proven to be successful in analyzing large sets of enterprise data. For today's business dynamics, sometimes these highly curated data are not enough. External data (particularly web data) may be useful to enhance local analysis. In this paper we discuss the extraction of multidimensional data from web sources, and their representation in RDFS. We introduce Open Cubes, an RDFS vocabulary for the specification and publication of multidimensional cubes on the Semantic Web, and show how classical OLAP operations can be implemented over Open Cubes using SPARQL 1.1, without the need to map the multidimensional information to the local database (the usual approach to multidimensional analysis of Semantic Web data). We show that our approach is plausible for the data sizes that can usually be retrieved to enhance local data repositories.
TL;DR: Ten of the most relevant challenges for bibliometric analysis when dealing with multidimensional data are discussed, and a reference data model is proposed that, according to different goals, can help analysis designers and bibliographic experts in working with large collections of bibliographical data.
Abstract: The complexity and variety of bibliographic data is growing, and efforts to define new methodologies and techniques for bibliometric analysis are intensifying. In this complex scenario, one of the most crucial issues is the quality of data and the capability of bibliometric analysis to cope with multiple data dimensions. Although the problem of enforcing a multidimensional approach to the analysis and management of bibliographic data is not new, a reference design pattern and a specific conceptual model for multidimensional analysis of bibliographic data are still missing. In this paper, we discuss ten of the most relevant challenges for bibliometric analysis when dealing with multidimensional data, and we propose a reference data model that, according to different goals, can help analysis designers and bibliographic experts in working with large collections of bibliographic data.
TL;DR: Cloud Computing: Data-Intensive Computing and Scheduling explores the evolution of classical techniques and describes completely new methods and innovative algorithms that demonstrate how cloud computing can meet business requirements and serve as the infrastructure of multidimensional data analysis applications.
Abstract: As more and more data is generated at a faster-than-ever rate, processing large volumes of data is becoming a challenge for data analysis software. Addressing performance issues, Cloud Computing: Data-Intensive Computing and Scheduling explores the evolution of classical techniques and describes completely new methods and innovative algorithms. The book delineates many concepts, models, methods, algorithms, and software used in cloud computing. After a general introduction to the field, the text covers resource management, including scheduling algorithms for real-time tasks and practical algorithms for user bidding and auctioneer pricing. It next explains approaches to data analytical query processing, including pre-computing, data indexing, and data partitioning. Applications of MapReduce, a new parallel programming model, are then presented. The authors also discuss how to optimize multiple group-by query processing and introduce a MapReduce real-time scheduling algorithm. A useful reference for studying and using MapReduce and cloud computing platforms, this book presents various technologies that demonstrate how cloud computing can meet business requirements and serve as the infrastructure of multidimensional data analysis applications.
TL;DR: In this paper, a cloud data warehouse system for supporting mass data processing and providing a data mining service, a multidimensional analysis service and a data presentation service is presented, where the main control module is used for sending instructions to other application modules and controlling the flow direction of data stream.
Abstract: The invention discloses a cloud data warehouse system for supporting mass data processing and providing a data mining service, a multidimensional analysis service and a data presentation service. The cloud data warehouse system mainly comprises a main control module, a data loading module and a data mining module, wherein the main control module is used for sending instructions to other application modules and controlling the flow direction of data stream; the data loading module is used for loading data from an external database; and the data mining module is used for carrying out data mining calculation on the data. The cloud data warehouse system further comprises a multidimensional analysis module for carrying out multidimensional analysis on the data and a data presentation module for presenting the data by means of making reports.
TL;DR: In this article, a multilevel and multidimensional method and a device for analyzing data attributes are presented, wherein the method comprises the following steps: building a public code platform for maintaining the public basic data.
Abstract: The invention relates to a multilevel and multidimensional method and a device for analyzing data attributes, wherein the method comprises the following steps: building a public code platform for maintaining the public basic data, wherein the public basic data comprises basic data and/or a public code table; building a data relationship model for the basic data analysis as the basic data analysis platform; performing the multilevel and multidimensional analysis on the data attributes according to the data relationship of the basic data analysis platform. The multilevel and multidimensional method and the device for analyzing data attributes achieve a tree-shaped hierarchy show for the multidimensional analysis on the data attributes through the analysis dimensionalities of the basic data analysis types and the tree-shaped hierarchy expansion, and fast achieve the building and flexible expansion for the public basic data through the public code platform. The method and the device fast achieve the building of the data attribute analysis types and analysis dimensions based on the basic data analysis platform, and achieve the multilevel and multidimensional analysis on the data attributes and dynamic expansion of the analysis dimensions in a tree-shaped hierarchy expansion way. The dimensional data analysis rules are arranged to flexibly achieve the statistics summary of the analysis data.
TL;DR: This paper overviews NoSQL and adjacent technologies, and evaluates Hadoop/Pig using the TPC-H benchmark, through two different cloud scenarios.
Abstract: NoSQL systems rose alongside internet companies, which face data challenges that traditional RDBMS solutions could not cope with. Indeed, in order to handle the continuous growth of data, NoSQL alternatives feature dynamic horizontal scaling rather than vertical scaling. To date, few studies address OLAP benchmarking of NoSQL systems. This paper overviews NoSQL and adjacent technologies, and evaluates Hadoop/Pig using the TPC-H benchmark, through two different cloud scenarios. The first scenario assumes that data is saved on a data cloud and business questions are routed to the cloud for processing, while the second scenario assumes pre-summarized data calculus in a first step and multidimensional analysis in a second step. Finally, the paper reports thorough performance tests on Hadoop for various data volumes, workloads, and cluster sizes.
TL;DR: The multidimensional analysis results reveal potential regularities between criminals' actions and the cases, helping police officers make correct judgments.
TL;DR: Three dimensions can be assigned to naturalness of voice, temporal distortions and calmness in text-to-speech systems and will be used in the future to build a dimension-based quality predictor for synthetic speech.
Abstract: This paper presents research on perceptual quality dimensions of synthetic speech. We generated 57 stimuli from 16/19 female/male German text-to-speech systems (TTS) and asked listeners to judge the perceptual distances between them in a sorting task. Through a subsequent multidimensional scaling algorithm, we extracted three dimensions. Via expert listening and a comparison to ratings gathered on 16 attribute scales, the three dimensions can be assigned to naturalness of voice, temporal distortions and calmness. These dimensions are discussed in detail and compared to the perceptual quality dimensions from previous multidimensional analyses. Moreover, the results are analyzed depending on the type of TTS system. The identified dimensions will be used in the future to build a dimension-based quality predictor for synthetic speech.
TL;DR: A new model of atypical cluster is proposed to effectively represent those events and efficiently retrieve them from massive data; it can provide more accurate information at only 15% to 20% of the time cost of the baselines.
Abstract: A Cyber-Physical System (CPS) integrates physical devices (e.g., sensors, cameras) with cyber (or informational) components to form a situation-integrated analytical system that may respond intelligently to dynamic changes in real-world situations. CPS has many promising applications, such as traffic observation, battlefield surveillance and sensor-network based monitoring. One important research topic in CPS is atypical event analysis, i.e., retrieving the events from large amounts of data and analyzing them with spatial, temporal and other multi-dimensional information. Many traditional approaches are not feasible for such analysis since they use numeric measures and cannot describe the complex atypical events. In this study, we propose a new model of atypical cluster to effectively represent those events and efficiently retrieve them from massive data. The micro-cluster is designed to summarize individual events, and the macro-cluster is used to integrate the information from multiple events. To facilitate scalable, flexible and online analysis, the concept of significant cluster is defined and a guided clustering algorithm is proposed to retrieve significant clusters in an efficient manner. We conduct experiments on real datasets of more than 50 GB in size; the results show that the proposed method can provide more accurate information at only 15% to 20% of the time cost of the baselines.
TL;DR: A conceptual modeling framework is introduced that extends traditional multidimensional models and OLAP operators to address the new set of requirements for data extracted from social media and introduces new equivalence classes that allow us to reason with "similar" concepts in new ways.
Abstract: With the advent of social media there is an ever increasing amount of unstructured data that can be analyzed to obtain insights. Two prominent examples are sentiment analysis and the discovery of correlated concepts. A convenient representation of information in such scenarios is in terms of concepts extracted from the unstructured data, and measures, such as sentiment scores, associated with these concepts. Typically, social media analysis reports these concepts and their associated measures. We argue that much richer insights can be obtained through the use of OLAP-style multidimensional analysis. It is fairly straightforward to see how to add traditional dimension hierarchies such as time and geography, and to analyze the data along these dimensions using traditional OLAP operations such as roll-up; for instance, to answer queries of the form "What was the average sentiment for X in Europe during the past month?" However, it is trickier to answer queries of the form "What was the average sentiment for concepts related to X in Europe during the past month?" We introduce a conceptual modeling framework that extends traditional multidimensional models and OLAP operators to address the new set of requirements for data extracted from social media. In this model, we organize data along both traditional dimensions (we call these metadata dimensions) and concept dimensions, which model relationships among concepts using parent-child hierarchies. Specifically: (i) we allow operations on parent-child hierarchies to be treated in a uniform way as operations on traditional dimension hierarchies; (ii) to model the rich relationships that can exist among concepts, we extend the parent-child hierarchies to be rooted level-DAGs rather than simply trees; and (iii) we introduce new equivalence classes that allow us to reason with "similar" concepts in new ways. 
We show that our modeling and operator framework facilitates multidimensional analysis to gain further insights from social media data than is possible with existing methods.
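The roll-up over a concept dimension described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' framework: the concept names, the sentiment scores, and the helper functions `descendants` and `average_sentiment` are all hypothetical, and the hierarchy is a small rooted DAG stored as a child-to-parents map. The point is only that a query such as "average sentiment for concepts related to X in Europe" can aggregate over every concept below X, counting each mention once even when a concept has several parents.

```python
# Hypothetical concept hierarchy as a rooted DAG: child -> set of parents.
PARENTS = {
    "smartphone": {"electronics"},
    "tablet": {"electronics"},
    "phablet": {"smartphone", "tablet"},  # two parents, so this is not a tree
}

# Hypothetical (concept, region, sentiment score) mentions, made up for illustration.
MENTIONS = [
    ("smartphone", "Europe", 0.5),
    ("phablet", "Europe", 1.0),
    ("tablet", "Asia", -0.5),
    ("phablet", "Europe", 0.0),
]


def descendants(concept, parents):
    """Return the concept itself plus every concept below it in the DAG."""
    found = {concept}
    changed = True
    while changed:
        changed = False
        for child, child_parents in parents.items():
            if child not in found and child_parents & found:
                found.add(child)
                changed = True
    return found


def average_sentiment(concept, region, mentions, parents):
    """Roll sentiment up to `concept` for one region; None if nothing matches."""
    related = descendants(concept, parents)
    scores = [s for c, r, s in mentions if c in related and r == region]
    return sum(scores) / len(scores) if scores else None


europe_electronics = average_sentiment("electronics", "Europe", MENTIONS, PARENTS)
```

Because `descendants` returns a set, a mention of "phablet" contributes once to the "electronics" roll-up even though it is reachable through two parents.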
TL;DR: An ontology-based natural language interface whose goal is to simplify and make more flexible and intuitive the interaction between users and OLAP solutions is described, so that average users can be autonomous in analyzing their data.
Abstract: Current technology facilitates access to the vast amount of information that is produced every day. Both individuals and companies are active consumers of data from the Web and other sources, and these data guide decision making. Due to the huge volume of data to be processed in a business context, managers rely on decision support systems to facilitate data analysis. OLAP tools are Business Intelligence solutions for multidimensional analysis of data, allowing the user to control the perspective and the degree of detail in each dimension of the analysis. A conventional OLAP system is configured to a set of analysis scenarios associated with multidimensional data cubes in the repository. To handle a more spontaneous query, not supported in these provided scenarios, one must have specialized technical skills in data analytics. This makes it very difficult for average users to be autonomous in analyzing their data, as they will always need the assistance of specialists. This article describes an ontology-based natural language interface whose goal is to simplify and make more flexible and intuitive the interaction between users and OLAP solutions. Instead of programming an MDX query, the user can freely write a question in his own human language. The system interprets this question by combining the requested information elements, and generates an answer from the OLAP repository.
TL;DR: In this paper, a data analysis platform and method for an electric power system is presented, which can be used for extracting unknown and valuable data through a machine learning algorithm, thereby providing powerful support for enterprise decision making.
Abstract: The invention discloses a data analysis platform and method for an electric power system. The platform stores data to be processed by virtue of a cloud distributed database, performs instant inquiry and/or multidimensional analysis and/or machine learning and the like on the data to be processed according to a received operation instruction of analyzing the data to be processed, and carries out knowledge mining on the data to be processed through machine learning algorithm analysis. The data analysis platform and method for the electric power system disclosed by the invention can be used for extracting unknown and valuable data through a machine learning algorithm, thereby providing powerful support for enterprise decision making.
TL;DR: The approach classifies users by activity, popularity, and behavior and organizes messages by topic, impact, origin, method of generation, etc., adding more intelligence to the analysis and extending the limits of OLAP.
Abstract: The standard approach to OLAP requires measures and dimensions of a cube to be known at the design stage. Besides, dimensions are required to be non-volatile, balanced and normalized. These constraints appear too rigid for many data sets, especially semi-structured ones, such as user-generated content in social networks and other web applications. We enrich the multidimensional analysis of such data via content-driven discovery of dimensions and classification hierarchies. Discovered elements are dynamic by nature and evolve along with the underlying data set.
We demonstrate the benefits of our approach by building a data warehouse for the public stream of the popular social network and microblogging service Twitter. Our approach makes it possible to classify users by their activity, popularity, and behavior, as well as to organize messages by topic, impact, origin, method of generation, etc. Capturing the dynamic characteristics of the data in this way adds more intelligence to the analysis and extends the limits of OLAP.
TL;DR: This chapter focuses on data cube technology, which provides many effective and scalable methods for cube computation and methods for multidimensional data analysis.
Abstract: This chapter focuses on data cube technology. Data warehouse systems provide online analytical processing (OLAP) tools for interactive analysis of multidimensional data at varied granularity levels. OLAP tools typically use the data cube and a multidimensional data model to provide flexible access to summarized data. A data cube can be used to interactively explore the data in a multidimensional way through OLAP operations like drill-down (to see more specialized data such as total sales per city) or roll-up (to see the data at a more generalized level such as total sales per country). Although the data cube concept was originally intended for OLAP, it is also useful for data mining. Multidimensional data mining is an approach to data mining that integrates OLAP-based data analysis with knowledge discovery techniques. It is also known as exploratory multidimensional data mining and online analytical mining (OLAM). It searches for interesting patterns by exploring the data in multidimensional space. Users can interactively drill down or roll up to varying abstraction levels to find classification models, clusters, predictive rules, and outliers. The chapter focuses on methods for data cube computation and methods for multidimensional data analysis. Precomputing a data cube (or parts of a data cube) allows for fast accessing of summarized data. Given the high dimensionality of most data, multidimensional analysis can run into performance bottlenecks. Therefore, it is important to study data cube computation techniques. Data cube technology provides many effective and scalable methods for cube computation. Studying these methods also helps in the understanding and further development of scalable methods for other data mining tasks such as the discovery of frequent patterns.
TL;DR: In this article, a method of generating a model-based MAS cube comprises creating a data source comprising a data warehouse in the memory via the processor, creating a view providing a dimension, a fact and an outrigger from the created data source, and creating the MAS cube comprising at least one measure group.
Abstract: Systems, methods and computer program products that provide a framework for the creation, editing, manipulation and use of model-based, multidimensional analysis services (MAS) cubes and using substitute dimensions in such cubes are disclosed. To permit a user to obtain better and automatic access to business intelligence, a method of generating a model-based MAS cube comprises creating a data source comprising a data warehouse in the memory via the processor, creating a data source view providing a dimension, a fact and an outrigger from the created data source, and creating the MAS cube comprising at least one measure group. Using substitute dimensions comprises finding all relevant substitutions for a measure group, creating a table for the measure group in the data source view, adding a property as the primary key of the substitute dimension and generating a query containing an inner join logical link between the substitute and original dimension.
TL;DR: In this article, a system for implementing multidimensional processing on failure mode and effect analysis (FMEA) data, and a processing method of the system, is described.
Abstract: The invention discloses a system for implementing multidimensional processing on failure mode and effect analysis (FMEA) data, and a processing method of the system. According to the method, while a subsystem in a real-time processing system continuously responds to requests from network users, the same subsystem also completes a multidimensional data analysis operation procedure. The method comprises the following steps: (1) setting up an analysis dimension storage unit that holds a multidimensional data analysis method and an equation; (2) responding to the request of the user by an FMEA application system, acquiring the ineffective data information and product data information which are required to be extracted during calculation of the data, and transmitting the result to an FMEA data multidimensional processing subsystem; (3) summing up the data transmitted by the user through the multidimensional data processing subsystem in the FMEA data multidimensional processing subsystem, wherein the summed result comprises the ineffective data information and the product information which are required by multidimensional data analysis; (4) extracting the information required by step (1) from a database; (5) calculating an analyzed data result according to the equation set in step (1); and (6) displaying the result on a client in the form of a three-dimensional diagram and a report. For a failure mode with a high risk priority coefficient, the system supplies a reference solution to the user.
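Steps (5) and (6) above can be illustrated with a short Python sketch. The abstract does not state the equation it stores in step (1), so the sketch assumes the conventional FMEA risk priority number, RPN = severity × occurrence × detection; the class `FailureMode`, the function `high_risk`, the sample failure modes, and the threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # conventional FMEA rating, assumed here to be on a 1..10 scale
    occurrence: int  # same assumed scale
    detection: int   # same assumed scale

    @property
    def rpn(self):
        # Assumed equation: the conventional risk priority number.
        return self.severity * self.occurrence * self.detection


def high_risk(modes, threshold):
    """Return failure modes whose RPN meets the threshold, highest first."""
    flagged = [m for m in modes if m.rpn >= threshold]
    return sorted(flagged, key=lambda m: m.rpn, reverse=True)


# Hypothetical failure modes, made up for illustration.
MODES = [
    FailureMode("seal leak", severity=8, occurrence=5, detection=4),
    FailureMode("loose bolt", severity=3, occurrence=2, detection=2),
    FailureMode("sensor drift", severity=9, occurrence=6, detection=5),
]

flagged = high_risk(MODES, threshold=100)
```

In a system like the one described, the flagged modes would be the ones for which a reference solution is looked up and shown to the user.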
TL;DR: This chapter presents a framework for measuring environmental impact at the item level using Traceability Graph, which supports rough level analysis of products and their histories and applies multidimensional analysis for traceability data.
Abstract: Monitoring the environmental performance of a product is recognized to be increasingly important. The most common method of measuring the environmental performance is the international standards of Life Cycle Assessment (LCA). Typically, measuring is based on estimations and average values at product category level. In this chapter, the authors present a framework for measuring environmental impact at the item level. Using Traceability Graph, emissions and resources can be monitored from the data management perspective. The model can be mapped to any precision level of physical tracing. At the most precise level, even a single physical object and its components can be analyzed. This, of course, demands that the related objects and their components are identified and mapped to the database. From the opposite perspective, the authors’ model also supports rough level analysis of products and their histories. In terms of the Traceability Cube, multidimensional analysis can be applied for traceability data.
TL;DR: An intelligent power consumption business intelligence system architecture based on cloud computing technology is proposed, and the key techniques and algorithms to mine knowledge from massive data are discussed in detail.
Abstract: In order to extract valuable knowledge from massive intelligent power consumption data to support decision-making, an intelligent power consumption business intelligence system architecture based on cloud computing technology is proposed in the paper. First, the design principle and architecture of the system are introduced. And then the key techniques and algorithms to mine knowledge from massive data are discussed in detail. Finally, to verify the effectiveness of the system, a parallel multidimensional analysis and parallel data mining practice for intelligent power consumption data is implemented, which shows a good application effect.
TL;DR: This work proposes Pattern Track to reveal the implied relations and patterns while also maintaining flexible operations on data values, and multi-interactive operations are supported to help users dig out the implied multivariate correlations.
Abstract: In multidimensional data analysis, one important task is to investigate the inner relations and patterns. Numerous visual representations have been proposed, such as parallel coordinates and scatter plots. However, parallel coordinates emphasize the overall patterns and data distributions while ignoring the correlation of data values over more than three dimensions, and visual clutter may occur as more polylines are rendered. Scatter plots are powerful in revealing the data distributions, but the overall patterns are restrained. We propose Pattern Track to reveal the implied relations and patterns while also maintaining flexible operations on data values. The layout of Pattern Track is composed of a mixture of automatic computations and interactive adjustments by mapping all dimension axes to concentric circles and integrating three levels of concentric group: data values, patterns and gradient circles. Multi-interactive operations are supported to help users dig out the implied multivariate correlations. Experimental results demonstrate the effectiveness of the approach.
TL;DR: The theoretical framework for deploying the data mart grounds a multidimensional analysis of "how did the different respondents answer the questions included in the questionnaire?"
Abstract: Beyond the traditional data analysis approaches based on SPSS (or similar statistical software tools), an alternative approach is the subject of our discussion. Efficient data analysis can be carried out based on a multidimensional view of the collected data. This implies an additional data mart populated with information obtained through an ETL process from the collected data. Measures and dimensions facilitate a subject-oriented, time-based analysis. The theoretical framework for deploying the data mart grounds a multidimensional analysis of "how did the different respondents answer the questions included in the questionnaire?". In addition, a case study is proposed, a questionnaire built, and different analyses presented.
TL;DR: In this paper, the authors proposed a neighbourhood analysis approach based on cyclic evaluation of the mean squared error when grouping field locations according to the set of neighbourhood constraints to delineate field areas of any shape that stand out from the rest of the field.
Abstract: Several different spatial clustering algorithms have been implemented to group geospatial sensor-based measurements of soil attributes into a set of relatively homogeneous management zones. Although they allow multidimensional data analysis, complexity and frequently occurring discontinuities of so-called "management zones" make this technology less appealing to the potential user. With the neighbourhood analysis approach presented in this paper, the primary goal was to delineate field areas of any shape that stand out from the rest of the field in terms of a measured soil attribute. To illustrate this approach, popular apparent soil electrical conductivity (ECa) data have been used to delineate continuous areas of the field that unify measurements with the greatest deviation from the average field conditions. The algorithm is based on cyclic evaluation of the mean squared error when grouping field locations according to the set of neighbourhood constraints.
TL;DR: In this article, a method for realizing linked presentation of database forms and maps is proposed, which comprises the following steps: 1, generating a data exchange file; 2, transferring the data exchange file to a map assembly; 3, loading space data by virtue of the map assembly; 4, continuing loading the data of the data exchange file by virtue of the map assembly.
Abstract: The invention discloses a method for realizing linked presentation of database forms and maps. The method comprises the following steps of: 1, generating a data exchange file; 2, transferring the data exchange file to a map assembly; 3, loading space data by virtue of the map assembly; 4, continuing loading the data of the data exchange file by virtue of the map assembly; and 5, loading figure predefined configuration of maps by virtue of the map assembly. By utilizing the method, the linked presentation of the database forms and maps can be realized based on the data exchange file and the problem of a single data presentation manner in the existing multidimensional analysis presentation technology can be solved.
TL;DR: The objective is to make proposals for the design of a SIS SID-quality and meet the needs of different stakeholders of the university and reproduce a set of metadata specific to multidimensional databases specific to the decision-oriented universities.
Abstract: Our objective is to make proposals for the design of a SIS SID-quality and meet the needs of different stakeholders of the university. This is where we join (which is poorly modeled by the concept of data marts in the current tools of the market), namely the modeling of data resources. Often the documents are deposited on the information system of an organization without classification, without indexing, with all the information on their content, their purpose, their technical requirements and practices. The method of describing the properties of a document is a binding step involves an author and a culture of destruction of documents. Few users perform document properties they file on a system design and information. Then it is naturally more difficult to retrieve these information gaps which usually take the form of voids, it is still necessary that the input fields are provided adequate and appropriately organized, arranged and explained. Indeed, it often happens for example on an intranet of an organization the drop zones are not conducive to give relevant information on the properties of materials downloaded. In the best case, the documents are managed by their own systems, accessible through their own search engine or by federated search engines. Why we try to answer the question: how to reproduce a set of metadata specific to multidimensional databases specific to the decision-oriented universities.
TL;DR: Ways to improve the Software Cost Estimation Guidelines are presented, in order to replace those expected to be abolished in February 2012 and to solve the problems occurring in the current Software Cost Estimation Guidelines.
Abstract: This paper presents ways to improve the Software Cost Estimation Guidelines, in order to replace those expected to be abolished in February 2012 and to solve the problems occurring in the current Software Cost Estimation Guidelines. Using the multidimensional modeling of OLAP (On-Line Analytical Processing), this paper builds a three-dimensional model that considers the product/service view, the process view and the skill view. It also presents a method for identifying cost estimation data through the view of each dimension. Furthermore, it defines the software cost estimation process and adapts it to bottom-up estimation and top-down estimation. Finally, it proposes accessing cost estimation data through the multidimensional analysis of OLAP.
TL;DR: The paper considers the problems of multidimensional numerical data processing and analysis for CFD inverse problems and optimization problems; a data analysis scheme is applied to the practical problem of localizing space-time structures in interacting time-dependent flows.
Abstract: The paper considers the problems of multidimensional numerical data processing and analysis. The volumes of numerical data are written in the form of multidimensional arrays. The data are considered as numerical solutions of CFD inverse problems and optimization problems. A data analysis scheme is applied to the practical problem of localizing space-time structures in interacting time-dependent flows. A version of the article with color illustrations is available at http://www.keldysh.ru/pages/cgraph/publications/cgd_publ.htm.
TL;DR: The authors propose a data shaping system built on a computing system that comprises an analysis management subsystem, a data shaping subsystem, and an analysis object DB; at a user's request, it calculates an association degree among the data items of the analysis object data.
Abstract: PROBLEM TO BE SOLVED: To provide a technique for achieving reduction of trials and tribulations necessary for data shaping at a time of multidimensional data analysis by a user. SOLUTION: The data shaping system is composed of a computing system 101 (including an analysis management sub system 200, a data shaping sub system 300, and an analysis object DB 400), and includes a main key recommendation analysis part 302. The system calculates an association degree among respective data items of analysis object data according to a request from a user. On the basis of the association degree, the system analyzes and extracts data items suited for a main key to be recommended to the user and outputs the information to the user. The user is allowed to select a data item as a main key from the output information and perform processing including data shaping on respective data items of the analysis object data for the selected data item.
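The abstract above does not say how the association degree is computed, so the following is only a hedged sketch of the general idea of ranking data items as main key candidates. It assumes, purely for illustration, that the association degree of item A toward item B is the share of distinct A values among distinct (A, B) pairs, which reaches 1.0 when A determines B. The function names `association` and `recommend_main_keys` and the sample rows are hypothetical and do not come from the patent.

```python
# Hypothetical sample of "analysis object data": one dict per row.
ROWS = [
    {"order_id": 1, "customer": "ann", "city": "Oslo"},
    {"order_id": 2, "customer": "ann", "city": "Oslo"},
    {"order_id": 3, "customer": "bob", "city": "Rome"},
    {"order_id": 4, "customer": "cy", "city": "Rome"},
]


def association(rows, a, b):
    """Assumed association degree of item a toward item b.

    Equals 1.0 when every value of a maps to exactly one value of b,
    and shrinks as values of a are paired with more values of b.
    """
    distinct_a = {row[a] for row in rows}
    distinct_pairs = {(row[a], row[b]) for row in rows}
    return len(distinct_a) / len(distinct_pairs)


def recommend_main_keys(rows):
    """Rank data items by their mean association toward all other items."""
    items = list(rows[0])
    scores = {}
    for a in items:
        others = [b for b in items if b != a]
        scores[a] = sum(association(rows, a, b) for b in others) / len(others)
    # Highest score first: the best main key candidate leads the list.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for item, score in recommend_main_keys(ROWS):
        print(f"{item}: {score:.3f}")
```

In this sketch the ranked list plays the role of the information output to the user, who would then pick one item as the main key before shaping the remaining items around it.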
TL;DR: An analysis of multidimensional data in an online sales system is carried out through experiments based on Data Warehousing and On-Line Analytical Processing, using the related technologies of Extraction-Transformation-Loading and Analysis Services.
Abstract: An analysis of multidimensional data in an online sales system is carried out through experiments based on Data Warehousing and On-Line Analytical Processing (OLAP), using the related technologies of Extraction-Transformation-Loading (ETL) and Analysis Services. Auxiliary support for online sales is provided, and OLAP multidimensional technology is used to show the results of slice, dice, drill, and rotate (pivot) analyses of the multidimensional data.
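The four operations named in this abstract are usually called slice, dice, drill (roll-up and drill-down), and pivot. The paper runs them on Analysis Services; the sketch below is not that setup. It is a minimal, assumption-laden illustration in plain Python over a hypothetical in-memory sales fact list, with invented dimension names (`region`, `product`, `quarter`) and an invented measure (`amount`).

```python
from collections import defaultdict

# Hypothetical fact table: three dimensions and one measure.
SALES = [
    {"region": "east", "product": "book", "quarter": "Q1", "amount": 100},
    {"region": "east", "product": "pen", "quarter": "Q1", "amount": 40},
    {"region": "east", "product": "book", "quarter": "Q2", "amount": 120},
    {"region": "west", "product": "book", "quarter": "Q1", "amount": 80},
    {"region": "west", "product": "pen", "quarter": "Q2", "amount": 60},
]


def slice_cube(facts, dim, value):
    """Slice: fix one dimension to a single value."""
    return [f for f in facts if f[dim] == value]


def dice_cube(facts, allowed):
    """Dice: keep facts whose dimensions fall inside the allowed value sets."""
    return [f for f in facts if all(f[d] in vals for d, vals in allowed.items())]


def aggregate(facts, dims, measure="amount"):
    """Drill: sum the measure at the granularity given by dims.

    Fewer dims is a roll-up, more dims is a drill-down.
    """
    totals = defaultdict(int)
    for f in facts:
        totals[tuple(f[d] for d in dims)] += f[measure]
    return dict(totals)


def pivot(facts, row_dim, col_dim, measure="amount"):
    """Pivot (rotate): lay the aggregate out as rows by columns."""
    table = defaultdict(dict)
    for (row, col), total in aggregate(facts, [row_dim, col_dim], measure).items():
        table[row][col] = total
    return dict(table)


if __name__ == "__main__":
    print(aggregate(SALES, ["region"]))             # rolled up to region
    print(aggregate(SALES, ["region", "product"]))  # drilled down to product
    print(pivot(SALES, "region", "quarter"))
    print(pivot(SALES, "quarter", "region"))        # same data, axes rotated
```

Swapping `row_dim` and `col_dim` in `pivot` rotates the view without touching the underlying facts, which is the point of the rotate operation.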
TL;DR: A method and a device for indexing data are disclosed in which N unidimensional indexes are obtained according to N dimensions, the N unidimensional indexes corresponding to the N dimensions and being independent of each other; whether the address records of the N independent unidimensional indexes intersect is determined, and the data pointed to by the address records in the intersection are obtained as the index target data.
Abstract: The embodiment of the invention discloses a method and a device for indexing data. According to the method and the device, N unidimensional indexes are obtained according to N dimensions, wherein the N unidimensional indexes correspond to the N dimensions and are independent of each other, and whether the address records of the N independent unidimensional indexes intersect is determined, so that the data pointed to by the address records in the intersection are obtained as index target data. The method and the device solve the problem that unidimensional indexing technology cannot meet the requirements of combined queries and multidimensional analysis over multidimensional indexes. The count values of the number of flag bits set for the address records of the N unidimensional indexes are determined, so that the speed requirements of multidimensional analysis are easily and conveniently satisfied, the indexing complexity is reduced, and the performance of accurately indexing data is improved.
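The intersection step described in this abstract can be sketched as follows. This is one possible reading rather than the patented design: it assumes each unidimensional index maps a dimension value to a set of record addresses, and it interprets the flag-bit count as a per-address counter, so that an address flagged by all N indexes belongs to the intersection. The names `build_indexes` and `query` and the sample records are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical records; the list position serves as the record address.
RECORDS = [
    {"region": "north", "product": "tea", "year": 2011},
    {"region": "north", "product": "coffee", "year": 2011},
    {"region": "south", "product": "tea", "year": 2011},
    {"region": "north", "product": "tea", "year": 2012},
]


def build_indexes(records, dimensions):
    """Build one independent index per dimension: value -> set of addresses."""
    indexes = {dim: defaultdict(set) for dim in dimensions}
    for address, record in enumerate(records):
        for dim in dimensions:
            indexes[dim][record[dim]].add(address)
    return indexes


def query(indexes, criteria):
    """Return addresses matched by every queried unidimensional index.

    Each index flags its matching addresses; an address whose flag count
    equals the number of queried dimensions lies in the intersection.
    """
    flags = Counter()
    for dim, value in criteria.items():
        for address in indexes[dim].get(value, ()):
            flags[address] += 1
    return sorted(a for a, count in flags.items() if count == len(criteria))


if __name__ == "__main__":
    idx = build_indexes(RECORDS, ["region", "product", "year"])
    print(query(idx, {"region": "north", "product": "tea"}))
    print(query(idx, {"region": "north", "product": "tea", "year": 2011}))
```

Because the indexes stay independent, any combination of dimensions can be queried without building a separate composite index for each combination.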
TL;DR: A new efficient approach based on space-filling curves is proposed for classifying multispectral satellite images; to remove the 'Boundary Effects' of the Hilbert curve, multiple Hilbert curves, z curves, and the Pseudo-Hilbert curve are used jointly.
Abstract: With the wide usage of multispectral images, a fast and efficient multidimensional clustering method becomes not only meaningful but also necessary. In general, to speed up the analysis of multidimensional images, a multidimensional feature vector should be transformed into a lower-dimensional space. The Hilbert curve is a continuous one-to-one mapping from N-dimensional space to one-dimensional space, and preserves neighborhoods as far as possible. However, because the Hilbert curve is generated by a recursive division process, 'Boundary Effects' will happen, which means that data that are close in N-dimensional space may not be close in the one-dimensional Hilbert order. In this paper, a new efficient approach based on space-filling curves is proposed for classifying multispectral satellite images. To remove the 'Boundary Effects' of the Hilbert curve, multiple Hilbert curves, z curves, and the Pseudo-Hilbert curve are used jointly. The proposed method extracts category clusters from one-dimensional data without computing any distance in N-dimensional space. Furthermore, multispectral images can be analyzed hierarchically, from coarse data distribution to fine data distribution, depending on the application. Experiments on LANDSAT data demonstrate that the proposed method manages multispectral images efficiently and can be applied easily.
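The 'Boundary Effects' mentioned in this abstract can be made concrete with a small two-dimensional sketch. It is not the paper's clustering method: it only maps grid cells to positions on a Hilbert curve and on a z (Morton) curve using the standard textbook constructions, then measures how far apart two touching cells can land in the one-dimensional order. The function names, the restriction to two dimensions, and the 8-by-8 grid are assumptions made for illustration.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) on the Hilbert curve of an n-by-n grid.

    n must be a power of two.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve is in standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def z_index(x, y, bits):
    """Position of cell (x, y) on the z (Morton) curve: interleave the bits."""
    d = 0
    for i in range(bits):
        d |= ((x >> i) & 1) << (2 * i)
        d |= ((y >> i) & 1) << (2 * i + 1)
    return d


def worst_neighbor_gap(n, index_of):
    """Largest index difference between two cells that touch in the grid."""
    worst = 0
    for x in range(n):
        for y in range(n):
            for nx, ny in ((x + 1, y), (x, y + 1)):
                if nx < n and ny < n:
                    worst = max(worst, abs(index_of(x, y) - index_of(nx, ny)))
    return worst


if __name__ == "__main__":
    n = 8
    print("Hilbert worst gap:", worst_neighbor_gap(n, lambda x, y: hilbert_index(n, x, y)))
    print("z-curve worst gap:", worst_neighbor_gap(n, lambda x, y: z_index(x, y, 3)))
```

Consecutive Hilbert positions always belong to touching cells, but the converse fails: some touching cells receive widely separated positions. Cells that are badly separated on one curve may sit close together on a differently constructed curve, which is a plausible reading of why the abstract combines several curves.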