A Comprehensive Survey on Cloud Data Mining (CDM) Frameworks and Algorithms

Journal Article • doi:10.1145/3349265

Hrishav Bakul Barua, +1 more

- 13 Sep 2019

- ACM Computing Surveys

- Vol. 52, Issue 5, Article 104

Cited by 30

TL;DR: This article presents the existing frameworks, services, platforms, and algorithms for cloud data mining and provides taxonomies on the basis of data mining techniques such as clustering, classification, and association rule mining.

Abstract: Data mining is used to find meaningful information in vast amounts of data. With the advent of the Big Data concept, data mining has gained much greater prominence. Efficiently discovering knowledge from gigantic volumes of data is a major concern because resources are limited, and cloud computing plays a major role in such a situation. Cloud data mining fuses the applicability of classical data mining with the promises of cloud computing, which allows it to discover knowledge efficiently from huge volumes of data. This article presents the existing frameworks, services, platforms, and algorithms for cloud data mining. The frameworks and platforms are compared with each other based on similarity, data mining task support, parallelism, distribution, streaming data processing support, fault tolerance, security, memory types, storage systems, and other criteria. Similarly, the algorithms are grouped on the basis of parallelism type, scalability, streaming data mining support, and types of data managed. We also provide taxonomies on the basis of data mining techniques such as clustering, classification, and association rule mining, and we discuss and identify the major applications of cloud data mining. The various taxonomies for cloud data mining frameworks, platforms, and algorithms have been identified. This article aims to provide better insight into the present research realm and to direct future research toward efficient cloud data mining in future cloud systems.
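The abstract groups cloud data mining algorithms by parallelism type and names association rule mining as one of the core techniques. As a rough illustration of what data-parallel mining looks like, the following is a minimal Python sketch of map/reduce-style support counting, assuming the data is split into partitions that independent workers process. It is not taken from the survey: the toy transactions, the function names `map_partition`, `reduce_counts`, and `frequent_itemsets`, and the use of a thread pool as a stand-in for cluster workers are all hypothetical. It also simplifies Apriori-style mining by counting every k-itemset that occurs instead of generating and pruning candidates.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

# Hypothetical toy transactions split into partitions, standing in for data
# blocks that a cloud framework would store on different nodes.
PARTITIONS = [
    [{"bread", "milk"}, {"bread", "butter"}, {"milk", "butter", "bread"}],
    [{"milk"}, {"bread", "milk"}, {"butter"}],
]


def map_partition(transactions, k):
    """Map step: count every k-itemset that occurs in one partition."""
    local = Counter()
    for t in transactions:
        # Sorting gives each itemset a canonical tuple form across partitions.
        for itemset in combinations(sorted(t), k):
            local[itemset] += 1
    return local


def reduce_counts(partials):
    """Reduce step: merge per-partition counts into global support counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def frequent_itemsets(partitions, k, min_support):
    """Return the k-itemsets whose global support is at least min_support."""
    # Threads only simulate independent workers here; a real deployment would
    # run each map task on a separate machine.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda p: map_partition(p, k), partitions))
    counts = reduce_counts(partials)
    return {s: n for s, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    print(frequent_itemsets(PARTITIONS, k=1, min_support=3))
    print(frequent_itemsets(PARTITIONS, k=2, min_support=2))
```

Because each map call needs only its own partition, the map step scales out by adding workers, and only the small per-partition count tables have to be shipped to the reduce step. On a single machine, Python threads give little speedup for this CPU-bound work; they are used here only to make the map/reduce split explicit.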

Citations

Journal Article • 10.1109/access.2022.3150172

Sentiment Analysis of Reviews in Natural Language: Roman Urdu as a Case Study

01 Jan 2022

- IEEE Access

TL;DR: In this paper, the authors present a model to classify the polarity of reviews written in Roman Urdu; for this purpose, raw data was scraped from the reviews of 20 songs from the Indo-Pak music industry.

Cited by 29

Journal Article • 10.1080/00207543.2021.1955996

Industrial Dataspace for smart manufacturing: connotation, key technologies, and framework

Guo Jingwei, +4 more

- 16 Aug 2021

- International Journal of Production Rese...

TL;DR: Smart manufacturing is a popular concept for smarter decision-making and more efficient production, but distributed methods for data management and processing have many challenges.

Cited by 28

Journal Article

Ask a better question, get a better answer: a new approach to private data analysis

Cynthia Dwork

- 01 Jan 2006

- Lecture Notes in Computer Science

TL;DR: This paper presents a new perspective on the classical problem of statistical disclosure control – revealing accurate statistics about a population while preserving the privacy of individuals through cryptographic techniques.

Cited by 26

Journal Article • 10.1016/J.JCLEPRO.2021.128154

Research on big data analysis model of multi energy power generation considering pollutant emission—Empirical analysis from Shanxi Province

Dongfang Ren, +2 more

- 20 Sep 2021

- Journal of Cleaner Production

TL;DR: In this article, a data-mining algorithm was used to analyze the development of thermal power, hydropower, wind power, waste heat, gas, and other power sources; the results are intended to provide a decision-making basis for controlling power emissions, improving the utilization rate of renewable energy, and optimizing the energy structure.

Cited by 21

Journal Article • 10.1089/big.2020.0188

Memetic Spider Monkey Optimization for Spam Review Detection Problem

V Sundaram

- 01 Apr 2023

- Big Data

TL;DR: Wang et al. proposed a hybrid of spider monkey optimization (SMO) and memetic search to improve the local search ability of SMO, called Memetic Spider Monkey Optimization (MeSMO).

Cited by 9

Related Papers (5)

NEMICO: Mining Network Data through Cloud-Based Data Mining Techniques

Comprehensive Survey of Big Data Mining Approaches in Cloud Systems

Making knowledge discovery services scalable on clouds for big data mining

Review of Data Mining Techniques in Cloud Computing Database

Big data mining analysis method based on cloud computing