Journal Article, DOI: 10.1145/3349265
A Comprehensive Survey on Cloud Data Mining (CDM) Frameworks and Algorithms
TL;DR: This article presents the existing frameworks, services, platforms, and algorithms for cloud data mining and provides taxonomies on the basis of data mining techniques such as clustering, classification, and association rule mining.
Abstract: Data mining is used to extract meaningful information from vast amounts of data. With the advent of Big Data, data mining has gained much greater prominence. Discovering knowledge efficiently in such gigantic volumes of data is a major concern because computing resources are limited, and cloud computing plays a major role in addressing it. Cloud data mining fuses the applicability of classical data mining with the promises of cloud computing, which allows knowledge to be discovered efficiently from huge volumes of data. This article presents the existing frameworks, services, platforms, and algorithms for cloud data mining. The frameworks and platforms are compared with each other based on similarity, data mining task support, parallelism, distribution, streaming data processing support, fault tolerance, security, memory types, storage systems, and other criteria. Similarly, the algorithms are grouped on the basis of parallelism type, scalability, streaming data mining support, and types of data managed. We also provide taxonomies based on data mining techniques such as clustering, classification, and association rule mining, and we identify and discuss the major applications of cloud data mining. Taxonomies of cloud data mining frameworks, platforms, and algorithms are identified as well. This article aims to give better insight into the current research landscape and to direct future research toward efficient cloud data mining in future cloud systems.
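To illustrate what data-parallel mining looks like, the following is a minimal Python sketch of MapReduce-style support counting, which is the counting step of association rule mining. It assumes that the transactions are already split into partitions, which stand in for data blocks stored on different nodes. It also uses a thread pool as a stand-in for cluster workers. The function names `map_partition`, `reduce_counts`, and `frequent_itemsets`, along with the sample data, are hypothetical. They are not taken from the article or from any specific framework.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from itertools import combinations


def map_partition(transactions, k):
    """Map step (hypothetical): count every k-itemset occurring in one partition."""
    counts = Counter()
    for t in transactions:
        # sorted(set(...)) gives a canonical order, so {a, b} and {b, a} share one key
        for itemset in combinations(sorted(set(t)), k):
            counts[itemset] += 1
    return counts


def reduce_counts(a, b):
    """Reduce step (hypothetical): merge partial counts from two partitions."""
    return a + b


def frequent_itemsets(partitions, k, min_support):
    """Count k-itemsets per partition in parallel, merge, then filter by support."""
    # Threads stand in for cluster workers; each partition is processed independently.
    with ThreadPoolExecutor() as pool:
        partial = list(pool.map(lambda p: map_partition(p, k), partitions))
    total = reduce(reduce_counts, partial, Counter())
    return {s: c for s, c in total.items() if c >= min_support}


# Hypothetical sample data: two partitions of shopping-basket transactions.
partitions = [
    [["milk", "bread"], ["milk", "eggs"]],
    [["bread", "milk", "eggs"], ["bread"]],
]

print(frequent_itemsets(partitions, k=1, min_support=3))
print(frequent_itemsets(partitions, k=2, min_support=2))
```

Each partition is counted independently, and the partial counts are merged by an associative, commutative sum. The same structure therefore carries over to engines such as Hadoop MapReduce or Spark. A real system would also prune candidate itemsets (as Apriori does) instead of enumerating every k-combination.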