Top 30 papers published on the topic of Multidimensional analysis in 2011

Showing papers on "Multidimensional analysis" published in 2011

Journal Article•10.1007/S10115-010-0283-2•

Statistical outlier detection using direct density ratio estimation

Shohei Hido¹, Yuta Tsuboi², Hisashi Kashima³, Masashi Sugiyama⁴, Takafumi Kanamori⁵•Institutions (5)

Kyoto University¹, IBM², University of Tokyo³, Tokyo Institute of Technology⁴, Nagoya University⁵

01 Feb 2011-Knowledge and Information Systems

TL;DR: A new statistical approach to the problem of inlier-based outlier detection, i.e., finding outliers in the test set based on the training set consisting only of inliers, using the ratio of training and test data densities as an outlier score is proposed.

Abstract: We propose a new statistical approach to the problem of inlier-based outlier detection, i.e., finding outliers in the test set based on the training set consisting only of inliers. Our key idea is to use the ratio of training and test data densities as an outlier score. This approach is expected to have better performance even in high-dimensional problems since methods for directly estimating the density ratio without going through density estimation are available. Among various density ratio estimation methods, we employ the method called unconstrained least-squares importance fitting (uLSIF) since it is equipped with natural cross-validation procedures, allowing us to objectively optimize the value of tuning parameters such as the regularization parameter and the kernel width. Furthermore, uLSIF offers a closed-form solution as well as a closed-form formula for the leave-one-out error, so it is computationally very efficient and is scalable to massive datasets. Simulations with benchmark and real-world datasets illustrate the usefulness of the proposed approach.
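The closed-form solution mentioned in the abstract can be illustrated with a minimal NumPy sketch of least-squares density-ratio fitting. This is an illustration under stated assumptions, not the authors' implementation: the function names, the Gaussian kernel, the choice of kernel centers, and the fixed `sigma` and `lam` values are all hypothetical, and the cross-validation step the abstract describes for tuning them is omitted.

```python
import numpy as np


def gaussian_kernel(x, centers, sigma):
    """Gaussian kernel values between every row of x and every center."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def ulsif_scores(x_train, x_test, sigma=1.0, lam=0.1, n_centers=50, seed=0):
    """Estimate w(x) = p_train(x) / p_test(x) at each test point.

    A low score suggests a test point is unlikely under the inlier-only
    training density, i.e. an outlier candidate.
    sigma and lam are fixed here (assumption); the paper tunes them by
    cross-validation.
    """
    rng = np.random.default_rng(seed)
    n_centers = min(n_centers, len(x_train))
    centers = x_train[rng.choice(len(x_train), size=n_centers, replace=False)]

    phi_train = gaussian_kernel(x_train, centers, sigma)
    phi_test = gaussian_kernel(x_test, centers, sigma)

    # Quadratic term from the denominator (test) samples,
    # linear term from the numerator (training) samples.
    H = phi_test.T @ phi_test / len(x_test)
    h = phi_train.mean(axis=0)

    # Regularized least-squares solution, then clip negatives to zero.
    alpha = np.linalg.solve(H + lam * np.eye(n_centers), h)
    alpha = np.maximum(alpha, 0.0)
    return phi_test @ alpha


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    inliers = rng.normal(size=(200, 2))
    test = np.vstack([rng.normal(size=(100, 2)), [[10.0, 10.0]]])
    scores = ulsif_scores(inliers, test)
    print("score of the planted far-away point:", scores[-1])
```

Because the solve is a single linear system in the number of kernel centers, the cost does not grow with a density-estimation step, which is the efficiency argument the abstract makes.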

219 citations

Journal Article•10.1016/J.JOCS.2011.05.009•

The pursuit of hubbiness: Analysis of hubs in large multidimensional networks

Michele Berlingerio¹, Michele Coscia¹,², Fosca Giannotti¹, Anna Monreale¹,², Dino Pedreschi²•Institutions (2)

Istituto di Scienza e Tecnologie dell'Informazione¹, University of Pisa²

01 Aug 2011-Journal of Computational Science

TL;DR: The findings show that: (i) multidimensional hubs do exist and their characterization yields interesting insights and (ii) it is possible to detect the most influential dimensions that cause the different hub behaviors.

62 citations

Journal Article•10.1109/TKDE.2010.59•

Adaptive Cluster Distance Bounding for High-Dimensional Indexing

Sharadh Ramaswamy, Kenneth Rose¹•Institutions (1)

University of California, Santa Barbara¹

01 Jun 2011-IEEE Transactions on Knowledge and Data Engineering

TL;DR: A new cluster-adaptive distance bound based on separating hyperplane boundaries of Voronoi clusters is proposed to complement the cluster-based index; it enables efficient spatial filtering with a relatively small preprocessing storage overhead and is applicable to Euclidean and Mahalanobis similarity measures.

Abstract: We consider approaches for similarity search in correlated, high-dimensional data sets, which are derived within a clustering framework. We note that indexing by “vector approximation” (VA-File), which was proposed as a technique to combat the “Curse of Dimensionality,” employs scalar quantization, and hence necessarily ignores dependencies across dimensions, which represents a source of suboptimality. Clustering, on the other hand, exploits interdimensional correlations and is thus a more compact representation of the data set. However, existing methods to prune irrelevant clusters are based on bounding hyperspheres and/or bounding rectangles, whose lack of tightness compromises their efficiency in exact nearest neighbor search. We propose a new cluster-adaptive distance bound based on separating hyperplane boundaries of Voronoi clusters to complement our cluster-based index. This bound enables efficient spatial filtering with a relatively small preprocessing storage overhead and is applicable to Euclidean and Mahalanobis similarity measures. Experiments in exact nearest-neighbor set retrieval, conducted on real data sets, show that our indexing method is scalable with data set size and data dimensionality and outperforms several recently proposed indexes. Relative to the VA-File, over a wide range of quantization resolutions, it is able to reduce random IO accesses, given (roughly) the same amount of sequential IO operations, by factors reaching 100X and more.
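The idea of bounding the query-to-cluster distance with separating hyperplanes can be sketched for the Euclidean case as follows. This is only the basic geometric bound, assuming clusters are Voronoi cells of distinct centroids; it is not claimed to be the paper's exact bound or index structure, and the function names are hypothetical.

```python
import numpy as np


def hyperplane_lower_bounds(query, centroids):
    """Lower-bound the Euclidean distance from `query` to each Voronoi cluster.

    For clusters j and i, the bisecting hyperplane between centroids c_j and
    c_i separates the two cells. If the query lies on the c_i side, every
    point of cell j is at least
        (||q - c_j||^2 - ||q - c_i||^2) / (2 * ||c_j - c_i||)
    away from the query. The bound for cluster j is the largest such value
    over all other clusters i (and never below zero).
    Assumes all centroids are distinct.
    """
    d2 = ((centroids - query) ** 2).sum(axis=1)
    gaps = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    np.fill_diagonal(gaps, np.inf)  # a cluster gives no bound against itself
    signed = (d2[:, None] - d2[None, :]) / (2.0 * gaps)
    return np.maximum(signed.max(axis=1), 0.0)


def clusters_to_scan(query, centroids, best_so_far):
    """Indices of clusters that could still hold a point closer than best_so_far."""
    bounds = hyperplane_lower_bounds(query, centroids)
    return np.flatnonzero(bounds < best_so_far)
```

In a nearest-neighbor search, a cluster whose bound already exceeds the best distance found so far can be skipped without reading its members, which is where the savings in random IO would come from.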

42 citations

Reference Book•10.1201/B11429•

Statistical Learning and Data Science

Mireille Gettler Summa, Leon Bottou, Bernard Goldfarb, Fionn Murtagh, Catherine Pardoux, Myriam Touati

19 Dec 2011

TL;DR: Exploring the foundations and recent breakthroughs in the field, Statistical Learning and Data Science demonstrates how data analysis can improve personal and collective health and the well-being of our social, business, and physical environments.

Abstract: Data analysis is changing fast. Driven by a vast range of application domains and affordable tools, machine learning has become mainstream. Unsupervised data analysis, including cluster analysis, factor analysis, and low dimensionality mapping methods continually being updated, have reached new heights of achievement in the incredibly rich data world that we inhabit. Statistical Learning and Data Science is a work of reference in the rapidly evolving context of converging methodologies. It gathers contributions from some of the foundational thinkers in the different fields of data analysis to the major theoretical results in the domain. On the methodological front, the volume includes conformal prediction and frameworks for assessing confidence in outputs, together with attendant risk. It illustrates a wide range of applications, including semantics, credit risk, energy production, genomics, and ecology. The book also addresses issues of origin and evolutions in the unsupervised data analysis arena, and presents some approaches for time series, symbolic data, and functional data. Over the history of multidimensional data analysis, more and more complex data have become available for processing. Supervised machine learning, semi-supervised analysis approaches, and unsupervised data analysis, provide great capability for addressing the digital data deluge. Exploring the foundations and recent breakthroughs in the field, Statistical Learning and Data Science demonstrates how data analysis can improve personal and collective health and the well-being of our social, business, and physical environments.

27 citations

Journal Article•10.1016/J.JBI.2011.03.004•

Statistical file matching of flow cytometry data

Gyemin Lee¹, William G. Finn¹, Clayton Scott¹•Institutions (1)

University of Michigan¹

01 Aug 2011-Journal of Biomedical Informatics

TL;DR: In this article, the authors address the challenge of imputing the high-dimensional jointly distributed values of marker attributes based on overlapping marginal observations and introduce an alternative approach based on nearest neighbor imputation restricted to a cell's subpopulation.

24 citations

Journal Article•10.1109/MSP.2011.942468•

Video is a Cube

Christian Keimel, Martin Rothbucher, Hao Shen, Klaus Diepold

01 Nov 2011-IEEE Signal Processing Magazine

TL;DR: This work provides an introduction to the design of video quality metrics by using data analysis methods, which are different from traditional approaches, and uses multidimensional data analysis, an extension of well-established data analysis techniques, to better exploit higher-dimensional data.

Abstract: Quality of experience (QoE) is becoming increasingly important in signal processing applications. In taking inspiration from chemometrics, we provide an introduction to the design of video quality metrics by using data analysis methods, which are different from traditional approaches. These methods do not necessitate a complete understanding of the human visual system (HVS). We use multidimensional data analysis, an extension of well-established data analysis techniques, allowing us to better exploit higher-dimensional data. In the case of video quality metrics, it enables us to exploit the temporal properties of video more properly; the complete three-dimensional structure of the video cube is taken into account in metrics' design. Starting with the well-known principal component analysis and an introduction to the notation of multiway arrays, we then present their multidimensional extensions, delivering better quality prediction results. Although we focus on video quality, the presented design principles can easily be adapted to other modalities and to even higher dimensional data sets as well.
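To make the "video cube" and multiway-array notation concrete, here is a small NumPy sketch that unfolds a three-way array along one mode and applies ordinary PCA to the result. It is only an illustration of the starting point the abstract mentions (PCA on a flattened cube), not the multidimensional extensions the article presents; the function names and the (frames, height, width) layout are assumptions, and the column ordering of the unfolding is one of several conventions in use.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: rows index the chosen mode, columns all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def pca_scores(matrix, n_components):
    """Project the rows of `matrix` onto its leading principal components."""
    centered = matrix - matrix.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


if __name__ == "__main__":
    # Hypothetical tiny "video": 4 frames of 3 x 5 pixels.
    video = np.random.default_rng(0).normal(size=(4, 3, 5))
    per_frame = unfold(video, 0)          # one row per frame
    print(pca_scores(per_frame, 2).shape)
```

Unfolding along the frame axis treats each frame as one observation, which discards the spatial arrangement inside the frame; that loss of structure is the motivation the abstract gives for working with the full cube instead.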

20 citations

Journal Article•10.1145/2047414.2047433•

DWEVOLVE: a requirement based framework for data warehouse evolution

Garima Thakur¹, Anjana Gosain¹•Institutions (1)

Guru Gobind Singh Indraprastha University¹

14 Nov 2011-ACM Sigsoft Software Engineering Notes

TL;DR: This paper presents a theoretical framework called DWEVOLVE to support data warehouse evolution, which enhances the functionality of a previously designed framework by taking into account the requirements specified by the users.

Abstract: Data warehouses integrate information from numerous data sources under a unified schema and format to provide effective results from multidimensional data analysis in order to facilitate reporting and trend analysis. These information sources are dynamic in nature and keep changing owing to the autonomous nature of transactions being carried out in the organization along with the complexity involved in gathering requirements from the users. Requirements elicitation and collection are difficult to perform because user needs keep changing. As a consequence, the data warehouse must evolve so that it improves the data quality by easily incorporating the changes in requirements as well as source schema. In this paper we present a theoretical framework called DWEVOLVE to support data warehouse evolution. The proposed framework enhances the functionality of a previously designed framework by taking into account the requirements specified by the users. Provisions have also been made to define and generate customized reports according to the user needs.

16 citations

Patent•

Model Based OLAP Cube Framework

Vijay Aski¹, Danny Chen¹, Christopher Lauren¹•Institutions (1)

Microsoft¹

25 Dec 2011

TL;DR: In this paper, the authors present a framework for the creation, editing, manipulation and use of model-based, multidimensional analysis services (MAS) cubes, where a user can create a new MAS cube by targeting a set of facts and adding dimensions to the facts.

Abstract: Systems, methods and computer program products that provide a framework for the creation, editing, manipulation and use of model-based, multidimensional analysis services (MAS) cubes are disclosed. A method of generating a model-based MAS cube comprises creating a data source comprising a data warehouse in the memory via the processor, creating a data source view providing a dimension, a fact and an outrigger from the created data source, and creating the MAS cube comprising at least one measure group. A key performance indicator (KPI) may be calculated from the MAS cube as a scorecard of a display associated with the processor. A user of the model-based MAS cube may create a new cube by targeting a set of facts and adding dimensions to the facts.

13 citations

Journal Article•

A study on building data warehouse of hospital information system.

Ping Li¹, Tao Wu, Mu Chen, Bin Zhou, Wei-guo Xu•Institutions (1)

Tongji University¹

01 Aug 2011-Chinese Medical Journal

TL;DR: To integrate and make full use of medical data effectively, a data warehouse modeling method is proposed for the hospital information system and can also be employed for a distributed-hospital medical service system.

Abstract: Background: Existing hospital information systems with simple statistical functions cannot meet current management needs. It is well known that hospital resources are distributed with private property rights among hospitals, such as in the case of the regional coordination of medical services. In this study, to integrate and make full use of medical data effectively, we propose a data warehouse modeling method for the hospital information system. The method can also be employed for a distributed-hospital medical service system. Methods: To ensure that hospital information supports the diverse needs of health care, the framework of the hospital information system has three layers: datacenter layer, system-function layer, and user-interface layer. This paper discusses the role of a data warehouse management system in handling hospital information from the establishment of the data theme to the design of a data model to the establishment of a data warehouse. Online analytical processing tools assist user-friendly multidimensional analysis from a number of different angles to extract the required data and information. Results: Use of the data warehouse improves online analytical processing and mitigates deficiencies in the decision support system. The hospital information system based on a data warehouse effectively employs statistical analysis and data mining technology to handle massive quantities of historical data, and summarizes from clinical and hospital information for decision making. Conclusions: This paper proposes the use of a data warehouse for a hospital information system, specifically a data warehouse for the theme of hospital information to determine latitude, modeling and so on. The processing of patient information is given as an example that demonstrates the usefulness of this method in the case of hospital information management. Data warehouse technology is an evolving technology, and more and more decision support information extracted by data mining and with decision-making technology is required for further research.

11 citations

Journal Article•

Concept-Oriented Model: Extending Objects with Identity, Hierarchies and Semantics

Alexandr Savinov

01 Jan 2011-The Computer Science Journal of Moldova

TL;DR: What partial order is needed for and how it is used to solve typical data analysis tasks like logical navigation, multidimensional analysis and reasoning about data are discussed.

Abstract: The concept-oriented data model (COM) is an emerging approach to data modeling which is based on three novel principles: duality, inclusion and order. These three structural principles provide a basis for modeling domain-specific identities, object hierarchies and data semantics. In this paper these core principles of COM are presented from the point of view of object data models (ODM). We describe the main data modeling construct, called concept, as well as two relations in which it participates: inclusion and partial order. Concepts generalize conventional classes by extending them with an identity class. The inclusion relation generalizes inheritance by making objects elements of a hierarchy. We discuss what partial order is needed for and how it is used to solve typical data analysis tasks like logical navigation, multidimensional analysis and reasoning about data.

11 citations

Journal Article•10.5121/IJCSES.2011.2404•

UCLEAN: a requirement based object-oriented ETL framework

Payal Pahwa, Shweta Taneja, Garima Thakur

30 Nov 2011-International Journal of Computer Science & Engineering Survey

TL;DR: This paper proposes a conceptual ETL framework, called UCLEAN, for an object-oriented data warehouse design that takes into account the requirements of the users.

Abstract: A data warehouse is used to provide effective results from multidimensional data analysis. The accuracy and correctness of these results depend on the quality of the data. To improve data quality, data must be properly extracted, transformed and loaded into the data warehouse. This ETL process is the key to the success of a data warehouse. In this paper we propose a conceptual ETL framework, called UCLEAN, for an object-oriented data warehouse design. This framework takes into account the requirements of the users. The data is extracted from different UML sources and is converted into a multidimensional model. It is then cleaned and loaded into the data warehouse. We validate the effectiveness of the framework through a case study.

Journal Article•10.1016/J.PROENV.2011.09.194•

The Application of Multidimensional Data Analysis in the EIA Database of Electric Industry

Che Lei¹, Ding Feng, Cui Wei¹, Zhang Ai-xin¹, Chen Zhen-hu¹•Institutions (1)

Beijing Information Science & Technology University¹

01 Jan 2011-Procedia environmental sciences

TL;DR: Applying multidimensional data analysis technology to the basic Environmental Impact Assessment (EIA) data of the electric industry yields analysis results on the distribution, investment, resource consumption, pollutants, and environmental impact of power construction projects in different periods and regions.

Abstract: Multidimensional data analysis can observe and process data from several angles, obtaining useful information for management decision-making departments and providing effective support by turning business data into management data. Based on SQL Server 2008, we apply multidimensional data analysis technology to compile statistics on and analyze the basic Environmental Impact Assessment (EIA) data of the electric industry. This yields analysis results on the distribution, investment, resource consumption, pollutants, and environmental impact of power construction projects in different periods and regions, so that users can analyze the construction and development of the electric industry from multiple angles, and it provides effective, scientific data support for environmental management and decision-making.

Journal Article•10.1049/IET-SPR.2009.0296•

Fast multidimensional scaling analysis for mobile positioning

S. Qin, Qun Wan, Zhangxin Chen, A.-M. Huang

24 Feb 2011-IET Signal Processing

TL;DR: A fast and computationally simple subspace-based algorithm for mobile positioning with the use of time-of-arrival (TOA) measurements of three base stations (BSs) is derived and analysed.

Abstract: The problem of locating and tracking a mobile station (MS) in which real-time computation is needed has received considerable attention. In this letter, a fast and computationally simple subspace-based algorithm for mobile positioning with the use of time-of-arrival (TOA) measurements of three base stations (BSs) is derived and analysed. Since a Lagrange multiplier is introduced to avoid eigendecomposition of the multidimensional similarity matrix, the proposed algorithm offers very competitive performance at low computational complexity.

Journal Article•10.1007/S00450-010-0138-9•

Computational intelligence in biomedical imaging: multidimensional analysis of spatio-temporal patterns

Axel Wismüller¹•Institutions (1)

University of Rochester Medical Center¹

01 Feb 2011-Computer Science - Research and Development

TL;DR: This contribution covers both conceptual foundations and applications of such methods for pattern recognition and analysis to a wide scope of radiological data sets, such as structural and functional segmentation in Magnetic Resonance Imaging (MRI), ranging from functional MRI for human brain mapping to the monitoring of disease progression in multiple sclerosis by automatic lesion segmentation.

Abstract: Technical innovations in radiology, such as advanced cross-sectional imaging methods, have opened up new vistas for the exploration of structure and function of the human body enabling both high spatial and temporal resolution. However, these techniques have led to vast amounts of data whose precise and reliable visual analysis by radiologists requires a considerable amount of human intervention and expertise, thus resulting in a cost factor of substantial economic relevance. Hence, the computer-assisted analysis of biomedical image data has moved into the focus of interest as an issue of high priority research efforts. In this context, innovative approaches to exploratory analysis of huge complex spatio-temporal patterns play a key role to improve computer-assisted signal and image processing in radiology. Examples of such approaches are various unsupervised vector quantization methods or supervised function approximation techniques, such as Generalized Radial-Basis-Functions- (GRBF-) neural networks. Recent developments motivated by concepts of computational intelligence are the 'Deformable Feature Map' (DM) as an algorithm for self-organized model adaptation, the 'Mutual Connectivity Analysis' (MCA) as an instrument for the analysis of large time-series ensembles and the 'Exploratory Observation Machine' (XOM) as a novel general framework for learning by self-organization, three methods that the author has invented and applied to biomedical real-world applications. This contribution covers both conceptual foundations and applications of such methods for pattern recognition and analysis to a wide scope of radiological data sets, such as structural and functional segmentation in Magnetic Resonance Imaging (MRI), ranging from functional MRI for human brain mapping to the monitoring of disease progression in multiple sclerosis by automatic lesion segmentation, as well as novel approaches to image time-series analysis in MRI mammography for breast cancer diagnosis. Current projects related to the modeling of speech production and to genome-wide expression analysis of microarray data in bioinformatics confirm the broad applicability of the presented methods.

Journal Article•

Video is a Cube: Multidimensional Analysis and Video Quality Metrics

Christian Keimel, Martin Rothbucher, Hao Shen, Klaus Diepold

01 Jan 2011-IEEE Signal Processing Magazine

TL;DR: This work provides an introduction to the design of video quality metrics by using data analysis methods, which are different from traditional approaches, and uses multidimensional data analysis, an extension of well established data analysis techniques, to exploit higher dimensional data better.

Abstract: Quality of Experience is becoming increasingly important in signal processing applications. In taking inspiration from chemometrics, we provide an introduction to the design of video quality metrics by using data analysis methods, which are different from traditional approaches. These methods do not necessitate a complete understanding of the human visual system. We use multidimensional data analysis, an extension of well-established data analysis techniques, allowing us to exploit higher-dimensional data better. In the case of video quality metrics, it enables us to exploit the temporal properties of video more properly; the complete three-dimensional structure of the video cube is taken into account in metrics’ design. Starting with the well-known principal component analysis and an introduction to the notation of multi-way arrays, we then present their multidimensional extensions, delivering better quality prediction results. Although we focus on video quality, the presented design principles can easily be adapted to other modalities and to even higher-dimensional datasets as well.

Proceedings Article•10.1109/ICCSE.2011.6028591•

A multidimensional data analysis system based on MDA for educational data warehousing

Xuejian Yan¹, Xueqing Li¹•Institutions (1)

Shandong University¹

26 Sep 2011

TL;DR: From the perspective of MDA, a multidimensional data model is created and implemented using the J2EE architecture and Flash RIA technology, which provides users with good visual modeling and data display interfaces.

Abstract: The existing management information systems in universities are often designed for specific management applications, and there are still many problems and shortcomings in data analysis and decision support. This paper analyzes the main problems, then from the perspective of MDA, creates a multidimensional data model and builds a multidimensional data analysis system for educational data warehousing and data mining. This system is implemented using the J2EE architecture and Flash RIA technology, which provides users with good visual modeling and data display interfaces.

Journal Article•

A Novel Method for Selecting and Materializing Views based on OLAP Signatures and GRASP

Andresson da Silva Firmino¹, Rodrigo Costa Mateus¹, Valéria Cesário Times¹, Lucídio dos Anjos Formiga Cabral², Thiago Luís Lopes Siqueira³, Ricardo Rodrigues Ciferri⁴, Cristina Dutra de Aguiar Ciferri⁵•Institutions (5)

Federal University of Pernambuco¹, Federal University of Paraíba², São Paulo Federal Institute of Education, Science and Technology³, Federal University of São Carlos⁴, University of São Paulo⁵

13 Sep 2011-Journal of Information and Data Management

TL;DR: A novel method for selecting and materializing views based on OLAP signatures and GRASP (Greedy Randomized Adaptive Search) is proposed, which allows for a hybrid method, which traverses the solution space in a comprehensive manner as performed in purely random approaches.

Abstract: Although the materialization of views reduces the execution time of OLAP queries, the materialization of a large number of views may exceed computer storage thresholds. Thus, given a certain storage cost threshold, there is a need for selecting the best views to be materialized, i.e. views that fit the storage requirements and provide the lowest response time to process OLAP queries. Several solutions have been proposed in the literature to solve this problem. However, most studies have adopted strictly greedy or purely random approaches. Also, most of them do not encompass the entire cycle of execution of multidimensional analysis, or do not specify and implement the whole cycle of multidimensional query execution. In this paper, we address these issues by proposing a novel method for selecting and materializing views based on OLAP signatures and GRASP (Greedy Randomized Adaptive Search). On the one hand, using OLAP signatures and their relationships with descriptions of the data cube, we are able to identify which views should be materialized for being more beneficial to the user query processing. On the other hand, using GRASP allows us to define a hybrid method, which traverses the solution space in a comprehensive manner as performed in purely random approaches, while examining only the regions of the search space with a great concentration of good solutions generated by a greedy approach. GRASP was compared to other VSP algorithms, namely Pick by Size (PBS) and Ant Colony Optimization (ACO), and performance tests indicated that compared to PBS, the proposed method obtained a time reduction of about 20.4% in query processing. In addition, GRASP was more scalable than PBS, since it selected and materialized a smaller set of views, even when there was a wide range of possible views to be chosen. Also, GRASP obtained nearly the same query runtime as ACO (i.e. a small performance loss of about 2.84% was obtained by GRASP), but a shorter time for the selection of views than the ACO algorithm (i.e. a gain in processing time of about 77% was produced by GRASP).
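The greedy-plus-random hybrid described in the abstract can be sketched as a generic GRASP loop for choosing views under a storage budget. This is a minimal illustration, not the authors' method: the per-view `(size, benefit)` pairs, the benefit-per-size ranking, the `alpha` parameter of the restricted candidate list, and the swap-based local search are all assumptions, and the paper derives view benefits from OLAP signatures rather than from fixed numbers.

```python
import random


def construct(views, budget, alpha, rng):
    """Greedy randomized construction: pick from the best-ranked candidates."""
    chosen, used = set(), 0
    while True:
        candidates = [v for v in views
                      if v not in chosen and used + views[v][0] <= budget]
        if not candidates:
            return chosen
        ratio = {v: views[v][1] / views[v][0] for v in candidates}
        best, worst = max(ratio.values()), min(ratio.values())
        threshold = best - alpha * (best - worst)
        # Restricted candidate list: everything close enough to the best.
        rcl = [v for v in candidates if ratio[v] >= threshold]
        pick = rng.choice(rcl)
        chosen.add(pick)
        used += views[pick][0]


def local_search(chosen, views, budget):
    """Swap one chosen view for a more beneficial one while it still fits."""
    chosen = set(chosen)
    improved = True
    while improved:
        improved = False
        used = sum(views[v][0] for v in chosen)
        for out in sorted(chosen):
            for new in sorted(set(views) - chosen):
                fits = used - views[out][0] + views[new][0] <= budget
                if fits and views[new][1] > views[out][1]:
                    chosen.remove(out)
                    chosen.add(new)
                    improved = True
                    break
            if improved:
                break
    return chosen


def grasp_select(views, budget, iterations=50, alpha=0.3, seed=0):
    """views maps a view name to (size, benefit); sizes must be positive."""
    rng = random.Random(seed)
    best, best_value = frozenset(), 0.0
    for _ in range(iterations):
        candidate = local_search(construct(views, budget, alpha, rng), views, budget)
        value = sum(views[v][1] for v in candidate)
        if value > best_value:
            best, best_value = frozenset(candidate), value
    return best, best_value
```

Setting `alpha` to 0 reduces the construction step to a purely greedy choice, and setting it to 1 makes it purely random among feasible views, so the parameter spans the two extremes the abstract contrasts.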

Journal Article•10.18209/IAKLE.2011.22.2.375•

A Study on the Generic Features of Korean Research Articles Based on Multidimensional Analysis.

Hong Hye Ran

01 Jun 2011-Journal of Korean Language Education

Patent•

Multidimensional data selection device, multidimensional data selection method and multidimensional data selection program

Hirata Takahisa

10 Feb 2011

Abstract: PROBLEM TO BE SOLVED: To selectively present an important dimension by defining an importance level of a dimension, in general multidimensional data analysis. SOLUTION: A dimension importance level calculation part 102 extracts a plurality of combinations between pieces of m-dimensional (m is an integer ≥1 and COPYRIGHT: (C)2011,JPO&INPIT

Accounting and Financial Data Analysis Data Mining Tools

Diana Codreanu, Ionela Popa, Denisa Elena Parpandel

25 Jul 2011

TL;DR: This paper wishes to present advanced techniques for analysis and exploitation of data stored in a multidimensional database.

Abstract: Computerized accounting systems have grown in complexity in recent years because of the competitive economic environment. With the help of data analysis solutions such as OLAP and data mining, they can perform multidimensional data analysis, detect fraud, and discover knowledge hidden in data, ensuring that such information is useful for decision making within the organization. The literature offers many definitions of data mining, but they all boil down to the same idea: a process that extracts new information from large data collections, information that would be very difficult to obtain without data mining tools. Information obtained through data mining has the advantage that it not only answers the question of what is happening but also explains why certain things are happening. In this paper we wish to present advanced techniques for the analysis and exploitation of data stored in a multidimensional database.

Proceedings Article•10.1145/2063576.2063828•

RFID data analysis using tensor calculus for supply chain management

Roberto De Virgilio¹, Franco Milicchio¹•Institutions (1)

Roma Tre University¹

24 Oct 2011

TL;DR: A general model for supply chain management based on the first principles of linear algebra, in particular on tensorial calculus is proposed, capable of exploiting recent parallel and distributed technologies, and subdividing tensor objects into sub-blocks, and processing them independently.

Abstract: In current trends of the consumer products market, there is a growing significance of the role of retailers in the governance of supply chains. RFID is a promising infrastructure-less technology that makes it possible to connect an object with its virtual counterpart, i.e., its representation within information systems. However, the amount of RFID data in supply chain management is vast, posing significant challenges for attaining acceptable performance in its analysis. Current approaches provide hard-coded solutions, with high consumption of resources; moreover, these exhibit very limited flexibility in dealing with multidimensional queries at various levels of granularity and complexity. In this paper we propose a general model for supply chain management based on the first principles of linear algebra, in particular on tensorial calculus. Leveraging our abstract algebraic framework, our technique allows both quick decentralized on-line processing and centralized off-line massive business logic analysis, according to the needs and requirements of supply chain actors. Experimental results show that our approach, utilizing recent linear algebra techniques, can process analyses efficiently when compared to recent approaches. In particular, we are able to carry out the required computations even in highly memory-constrained environments, such as on mobile devices. Moreover, when dealing with massive amounts of data, we are capable of exploiting recent parallel and distributed technologies, subdividing our tensor objects into sub-blocks, and processing them independently.


Book Chapter•10.1007/978-3-642-13312-1_38•

From Histogram Data to Model Data Analysis

[...]

Marina Marino¹, Simona Signoriello¹•Institutions (1)

University of Naples Federico II¹

1 Jan 2011

TL;DR: The idea is to approximate histogram data using B-spline functions in order to synthesize the information within the data through some characteristic function parameters, which become the new data to be analyzed with methodologies of multidimensional data analysis.


Abstract: The aim of this work is to propose a new approach for dealing with histogram data in the symbolic data analysis framework. The idea is to approximate histogram data using B-spline functions in order to synthesize the information within the data through some characteristic function parameters. These parameters will be the new data that can subsequently be analyzed with methodologies of multidimensional data analysis.


Posted Content•

Modelling Financial-Accounting Decisions by Means of OLAP Tools

[...]

Diana Elena Codreanu

01 Mar 2011-Database Systems Journal

TL;DR: Computerized accounting systems have grown in complexity by means of data analysis solutions such as OLAP and Data Mining, which help perform a multidimensional analysis of financial-accounting data: potential frauds can be detected, information hidden in the data can be revealed, and trends for certain indicators can be established, thereby ensuring useful information for a company’s decision making.


Abstract: At present, one can say that a company’s sound operation largely depends on the quantity and quality of the information it relies on when making decisions. The information needed to underpin decisions is obtained from a high-performing information system, which makes it possible to present data quickly, synthetically and truthfully, and which also provides the opportunity for complex analyses and predictions. In such circumstances, computerized accounting systems, too, have grown in complexity by means of data analysis solutions such as OLAP and Data Mining, which help perform a multidimensional analysis of financial-accounting data: potential frauds can be detected, information hidden in the data can be revealed, and trends for certain indicators can be established, thereby ensuring useful information for a company’s decision making.


Journal Article•10.20533/IJISR.2042.4639.2011.0005•

Decision Support in Uncertain Environments

[...]

Hany M. S. Lala, Kamal A. ElDahshan

1 Jun 2011

TL;DR: It is concluded that data warehousing and data mining are essential for an effective decision support system for handling uncertain climate data consolidated with the SAS application system.


Abstract: There are ambiguities and vagueness in solar radiation records during a day. As a consequence, the development and adaptation of automatic knowledge acquisition techniques under uncertainty is entirely advisable. Among them, fuzzy sets theory formalizes and analyzes the situations in which the uncertainty is due to a non-precise environment. Any good decision is based on information. Decision makers should reduce their uncertainty by obtaining as much reliable and consistent information as possible. In this paper, we introduce a novel decision support system for handling uncertain climate data consolidated with the SAS application system. This novel approach is based on the membership function and multidimensional analysis. From our findings, we conclude that data warehousing and data mining are essential for an effective decision support system. Because organizations frequently process uncertain information, decision makers should reduce their uncertainty by obtaining as much reliable and consistent information as possible. Due to the potential benefits of a data warehouse as well as internal and external pressure for creating a competitive advantage, many organizations have launched data warehouse projects with the expectation of acquiring a consistent and reliable source of data for their DSS. Therefore, a DSS with a data warehouse should improve the performance of users by improving information accessibility, which positively affects the quality of decision making [1]. At present there is a great need to provide decision makers from middle management upward with information at the correct level of detail to support decision making. Data warehousing, on-line analytical processing (OLAP), and data mining provide this functionality [2].
Although there are researchers in fields of certainty, such as the natural science fields of physics, mathematics, biology, and in the social science fields of philosophy, economy, society, psychology, cognition, almost nobody doubts the uncertain essence of the world. An increasing number of scientists believe that uncertainty is the charm of the world and only uncertainty itself is certain! For a long time, humans thought uncertainty was equal to randomness. Probability theory is the main mathematical tool to solve the problem of randomness. With more in-depth studies, people found a kind of uncertainty that could not be described with randomness. That was “fuzziness”. Fuzziness is a characteristic feature of modern science to describe quantitative relationships and space formation by using precise definitions and rigidly proven theorems, and to explore the laws of the objective world by using precisely controlled experimental methods, accurate measurements, and calculation so as to establish a rigorous theoretic system [3]. There are ambiguities and vagueness in solar radiation records during a day. There is a need for a simple technique by which uncertainties in the process of solar radiation measurement can be handled. In this paper, we introduce a novel decision support system for handling uncertain climate data consolidated with the SAS application system. This novel approach is based on the membership function and multidimensional analysis. The rest of the paper is organized as follows. The next section reviews related work. In Section 3 we describe the characteristics of decision support systems. In Section 4 we demonstrate the basic data warehousing concepts. Section 5 provides the definition of a general framework for fuzzy sets in the presence of uncertainty. Section 6 introduces the climate case study. In Section 7 we present our approach for solving this problem. Section 8 presents the results analysis.
Section 9 shows

International Journal for Information Security Research (IJISR), Volume 1, Issue 2, June 2011. Copyright © 2011, Infonomics Society.
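The abstract above bases its approach on a membership function combined with multidimensional analysis but does not give the functions themselves. The following is a minimal sketch under stated assumptions: the triangular shape, the breakpoints (in W/m²), the labels, and the sample readings are all hypothetical, and the roll-up along the day dimension is only one plausible way to combine the two ideas.

```python
from collections import defaultdict


def triangular(x, a, b, c):
    """Degree of membership of x in a triangular fuzzy set with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets for solar radiation in W/m^2 (not taken from the paper).
FUZZY_SETS = {
    "low": (-1.0, 150.0, 450.0),
    "medium": (300.0, 550.0, 800.0),
    "high": (650.0, 950.0, 1250.0),
}


def fuzzify(value):
    # Map a crisp reading to its degree of membership in every fuzzy set.
    return {label: triangular(value, *abc) for label, abc in FUZZY_SETS.items()}


# Hypothetical fact rows: (day, hour, radiation reading).
READINGS = [
    ("2011-06-01", 9, 425.0),
    ("2011-06-01", 12, 900.0),
    ("2011-06-02", 9, 200.0),
    ("2011-06-02", 12, 550.0),
]


def roll_up_by_day(readings):
    # Aggregate along the hour dimension: mean membership degree per day and label.
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for day, _hour, value in readings:
        counts[day] += 1
        for label, degree in fuzzify(value).items():
            sums[day][label] += degree
    return {day: {label: total / counts[day] for label, total in labels.items()}
            for day, labels in sums.items()}


daily = roll_up_by_day(READINGS)
```

A reading of 425 W/m² belongs partly to "low" and partly to "medium" here, which is the kind of graded, non-crisp classification the abstract attributes to fuzzy set theory.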


Data organization of the relational database with the usage of set theory

[...]

A. V. Melikov

1 Jan 2011

TL;DR: The usage of set theory is described for the data organization of the relational database into the required structure and the database of informational questionnaire system is taken for the input data.


Abstract: Summary. This investigation describes the usage of set theory for organizing the data of a relational database into the required structure. The database of an informational questionnaire system is taken as the input data. The data is organized for further transfer into a multidimensional data analysis system. Key words: information questionnaire system; relational database; multidimensional analysis (MDA); set theory. Introduction. Fast and flexible data analysis is one of the main requirements when constructing a business-analytics architecture. All the transactions of the detail layer are held in the relational data storage. Using this data, users would like to obtain the final information, add their own calculations, and analyze the data with the help of a mechanism for creating ad hoc queries. Programs for multidimensional data analysis are products that, by their operating principle, stand between the database and the electronic worksheet and form a special product class of their own [1]. Historically they evolved from electronic worksheets, but nowadays they more closely resemble databases. The most important feature of an electronic worksheet is the possibility to set up links between cells with the help of formulas. The most important feature of programs for multidimensional data analysis is the possibility to easily form a multidimensional cube from the received data and to modify its screen view. Nowadays there are programs of three types: 1) add-ons to popular electronic worksheet systems; 2) add-ons to popular database management systems; 3) standalone products that are capable of importing data from different sources.


Book Chapter•

Improving diagnosis processes through multidimensional analysis in medical institutions

[...]

Orlando Belo

1 Jan 2011

Book Chapter•10.1007/978-3-642-20291-9_1•

Information networks mining and analysis

[...]

Philip S. Yu¹•Institutions (1)

University of Illinois at Chicago¹

18 Apr 2011

TL;DR: This talk presents various issues and solutions on scalable mining and analysis of information networks, and illustrates how to apply network analysis techniques to solve classical frequent item-set mining in a more efficient top-down fashion.


Abstract: With the ubiquity of information networks and their broad applications, there have been numerous studies on the construction, online analytical processing, and mining of information networks in multiple disciplines, including social network analysis, World-Wide Web, database systems, data mining, machine learning, and networked communication and information systems. Moreover, with a great demand for research in this direction, there is a need to understand methods for the analysis of information networks from multiple disciplines. In this talk, we will present various issues and solutions on scalable mining and analysis of information networks. These include data integration, data cleaning and data validation in information networks, summarization, OLAP and multidimensional analysis in information networks. Finally, we illustrate how to apply network analysis techniques to solve classical frequent item-set mining in a more efficient top-down fashion.


Journal Article•10.4028/WWW.SCIENTIFIC.NET/AMM.52-54.978•

The Design and Implementation of Agricultural Production Data Warehouse

[...]

Li Juan Zhou¹, Xiao Xu He¹, Kang Li¹•Institutions (1)

Capital Normal University¹

01 Mar 2011-Applied Mechanics and Materials

TL;DR: This paper designed a data warehouse model of agricultural production and built an effective and viable agricultural production data warehouse, by using some key technologies: multidimensional data analysis, cube, materialized view selection, materialized view maintenance.


Abstract: This paper designed a data warehouse model of agricultural production and built an effective and viable agricultural production data warehouse by using some key technologies: multidimensional data analysis, cube, materialized view selection, and materialized view maintenance. Finally, it provided a solution to the problems of effectively managing and maintaining high-capacity heterogeneous data.


Journal Article•10.1093/BIOINFORMATICS/BTR143•

Improving the efficiency of multidimensional scaling in the analysis of high-dimensional data using singular value decomposition

[...]

Christophe Bécavin¹, Nicolas Tchitchek¹, Colette Mintsa-Eya¹, Annick Lesne¹, Arndt Benecke¹•Institutions (1)

Institut des Hautes Études Scientifiques¹

01 May 2011-Bioinformatics

TL;DR: This work demonstrates the most efficient and accurate initialization strategy for MDS algorithms, reducing considerably computational load and rendering MDS methodology much more useful in the analysis of high-dimensional data such as functional genomics datasets.


Abstract: Motivation: Multidimensional scaling (MDS) is a well-known multivariate statistical analysis method used for dimensionality reduction and visualization of similarities and dissimilarities in multidimensional data. The advantage of MDS with respect to singular value decomposition (SVD) based methods such as principal component analysis is its superior fidelity in representing the distance between different instances, especially for high-dimensional geometric objects. Here, we investigate the importance of the choice of initial conditions for MDS, and show that SVD is the best choice to initiate MDS. Furthermore, we demonstrate that the use of the first principal components of SVD to initiate the MDS algorithm is more efficient than an iteration through all the principal components. Adding stochasticity to the molecular dynamics simulations typically used for MDS of large datasets, contrary to previous suggestions, likewise does not increase accuracy. Finally, we introduce a k nearest neighbor method to analyze the local structure of the geometric objects and use it to control the quality of the dimensionality reduction. Results: We demonstrate here the, to our knowledge, most efficient and accurate initialization strategy for MDS algorithms, reducing considerably computational load. SVD-based initialization renders MDS methodology much more useful in the analysis of high-dimensional data such as functional genomics datasets. Contact: arndt@ihes.fr
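The idea of starting MDS from the leading SVD components can be sketched as follows. This is not the authors' code: the abstract refers to molecular-dynamics-style MDS, whereas the sketch below swaps in the standard SMACOF (Guttman transform) iteration as the stress minimizer, and the random test data, function names, and iteration count are assumptions chosen for illustration.

```python
import numpy as np


def pairwise_distances(X):
    # Euclidean distance matrix between the rows of X.
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def svd_init(X, k):
    # Project the centered data onto its first k principal directions.
    Xc = X - X.mean(axis=0)
    U, S, _Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k]


def stress(Y, D):
    # Raw stress: squared mismatch between embedded and target distances.
    return ((pairwise_distances(Y) - D) ** 2).sum() / 2.0


def smacof(D, Y0, n_iter=200):
    # Unweighted SMACOF: repeated Guttman transforms starting from Y0.
    n = D.shape[0]
    Y = Y0.copy()
    for _ in range(n_iter):
        d = pairwise_distances(Y)
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(d > 0, D / d, 0.0)
        B = -ratio
        B[np.diag_indices(n)] = ratio.sum(axis=1)
        Y = B @ Y / n
    return Y


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))          # synthetic high-dimensional data
D = pairwise_distances(X)              # target dissimilarities

Y_svd0 = svd_init(X, 2)                # SVD-based starting configuration
Y_rand0 = rng.normal(size=(30, 2))     # random starting configuration

Y_svd = smacof(D, Y_svd0)
Y_rand = smacof(D, Y_rand0)
print("stress from SVD init:   ", stress(Y_svd, D))
print("stress from random init:", stress(Y_rand, D))
```

Comparing the two printed stress values, or the number of iterations needed to reach a given stress, gives a rough feel for the effect of initialization; the sketch makes no claim about which start wins on this synthetic data.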


Journal Article•10.1109/TKDE.2010.101•

Anonymous Publication of Sensitive Transactional Data

[...]

Gabriel Ghinita¹, Panos Kalnis², Yufei Tao³•Institutions (3)

Purdue University¹, King Abdullah University of Science and Technology², The Chinese University of Hong Kong³

01 Feb 2011-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This work proposes two categories of novel anonymization methods based on approximate nearest-neighbor (NN) search in high-dimensional spaces, which is efficiently performed through locality-sensitive hashing (LSH) and two data transformations that capture the correlation in the underlying data: reduction to a band matrix and Gray encoding-based sorting.


Abstract: Existing research on privacy-preserving data publishing focuses on relational data: in this context, the objective is to enforce privacy-preserving paradigms, such as k-anonymity and l-diversity, while minimizing the information loss incurred in the anonymizing process (i.e., maximize data utility). Existing techniques work well for fixed-schema data, with low dimensionality. Nevertheless, certain applications require privacy-preserving publishing of transactional data (or basket data), which involve hundreds or even thousands of dimensions, rendering existing methods unusable. We propose two categories of novel anonymization methods for sparse high-dimensional data. The first category is based on approximate nearest-neighbor (NN) search in high-dimensional spaces, which is efficiently performed through locality-sensitive hashing (LSH). In the second category, we propose two data transformations that capture the correlation in the underlying data: 1) reduction to a band matrix and 2) Gray encoding-based sorting. These representations facilitate the formation of anonymized groups with low information loss, through an efficient linear-time heuristic. We show experimentally, using real-life data sets, that all our methods clearly outperform existing state of the art. Among the proposed techniques, NN-search yields superior data utility compared to the band matrix transformation, but incurs higher computational overhead. The data transformation based on Gray code sorting performs best in terms of both data utility and execution time.
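Gray encoding-based sorting, one of the two transformations named in the abstract above, can be sketched in a few lines. This is an illustrative assumption rather than the paper's algorithm: each transaction is treated as a bit vector over items, rows are ordered by their position in the reflected Gray code sequence so that neighbouring rows tend to differ in few items, and fixed-size chunking stands in for the paper's unspecified linear-time grouping heuristic. The sample transactions and all names are hypothetical.

```python
def gray_rank(bits):
    """Position of a bit vector in the reflected Gray code sequence."""
    rank = 0
    acc = 0
    for b in bits:
        acc ^= b                    # running XOR decodes Gray code to binary
        rank = (rank << 1) | acc
    return rank


def hamming(a, b):
    # Number of items on which two transactions differ.
    return sum(x != y for x, y in zip(a, b))


def gray_sort(transactions):
    # Order transactions so that similar item sets end up close together.
    return sorted(transactions, key=gray_rank)


def chunk(rows, k):
    # Stand-in grouping step: consecutive groups of (at most) k transactions.
    return [rows[i:i + k] for i in range(0, len(rows), k)]


# Hypothetical transactions over four items (1 = item present).
TRANSACTIONS = [
    (1, 0, 0, 1),
    (0, 1, 1, 0),
    (1, 0, 1, 1),
    (0, 1, 0, 0),
    (1, 0, 0, 0),
    (0, 1, 1, 1),
]

ordered = gray_sort(TRANSACTIONS)
groups = chunk(ordered, 2)
```

When every possible bit vector is present, consecutive rows in this order differ in exactly one item; on sparse real data the order is only a heuristic for keeping similar transactions adjacent.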
