Open challenges for data stream mining research

doi:10.1145/2674026.2674028

Journal Article

Georg Krempl, +9 more

- 25 Sep 2014

- SIGKDD Explorations

- Vol. 16, Iss: 1, pp 1-10

321

TL;DR: This article presents a discussion on eight open challenges for data stream mining, which cover the full cycle of knowledge discovery and involve such problems as protecting data privacy, dealing with legacy systems, handling incomplete and delayed information, analysis of complex data, and evaluation of stream mining algorithms.

Abstract: Every day, huge volumes of sensory, transactional, and web data are continuously generated as streams, which need to be analyzed online as they arrive. Streaming data can be considered as one of the main sources of what is called big data. While predictive modeling for data streams and big data has received a lot of attention over the last decade, many research approaches are typically designed for well-behaved controlled problem settings, overlooking important challenges imposed by real-world applications. This article presents a discussion on eight open challenges for data stream mining. Our goal is to identify gaps between current research and meaningful applications, highlight open problems, and define new application-relevant research directions for data stream mining. The identified challenges cover the full cycle of knowledge discovery and involve such problems as: protecting data privacy, dealing with legacy systems, handling incomplete and delayed information, analysis of complex data, and evaluation of stream mining algorithms. The resulting analysis is illustrated by practical applications and provides general suggestions concerning lines of future research in data stream mining.

Citations

Data Mining - Concepts and Techniques.

Petra Perner

- 01 Jan 2002

14.6K

•Journal Article•10.1016/J.INFFUS.2017.02.004

Ensemble learning for data stream analysis

Bartosz Krawczyk, +4 more

- 01 Sep 2017

- Information Fusion

TL;DR: This paper surveys research on ensembles for data stream classification as well as regression tasks and discusses advanced learning concepts such as imbalanced data streams, novelty detection, active and semi-supervised learning, complex data representations and structured outputs.

1K

•Journal Article•10.1007/S10618-015-0448-4

Characterizing concept drift

Geoffrey I. Webb, +4 more

- 01 Jul 2016

- Data Mining and Knowledge Discovery

TL;DR: This work presents the first comprehensive framework for quantitative analysis of drift, giving rise to a new comprehensive taxonomy of concept drift types and a solid foundation for research into mechanisms to detect and address concept drift.

511

•Journal Article•10.1145/3373464.3373470

Machine learning for streaming data: state of the art, challenges, and opportunities

Heitor Murilo Gomes, +4 more

- 26 Nov 2019

- SIGKDD Explorations

TL;DR: Incremental learning, online learning, and data stream learning are terms commonly associated with learning algorithms that update their models given a continuous influx of data without performing multiple passes over the data.

261

•Journal Article•10.1109/ACCESS.2019.2926642

Leveraging Machine Learning and Big Data for Smart Buildings: A Comprehensive Survey

Basheer Qolomany, +6 more

- 03 Jul 2019

- IEEE Access

TL;DR: This survey covers the area of smart buildings, with a special focus on the role of techniques from machine learning and big data analytics, and reviews the current trends and challenges faced in the development of smart building services.

208

References

•Journal Article•10.1126/SCIENCE.286.5439.509

Emergence of Scaling in Random Networks

Albert-László Barabási, +1 more

- 15 Oct 1999

- Science

TL;DR: A model based on two ingredients, network growth and preferential attachment, reproduces the observed stationary scale-free distributions, which indicates that the development of large networks is governed by robust self-organizing phenomena that go beyond the particulars of the individual systems.

39.1K

•Journal Article•10.5555/944919.944937

Latent dirichlet allocation

David M. Blei, +2 more

- 01 Mar 2003

- Journal of Machine Learning Research

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.

36.2K

•Book

Data Mining: Concepts and Techniques

Jiawei Han, +2 more

- 08 Sep 2000

TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

29.9K

•Proceedings Article

Latent Dirichlet Allocation

David M. Blei, +2 more

- 03 Jan 2001

TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).

25.5K

•Journal Article•10.21276/IJRE.2018.5.5.4

MapReduce: simplified data processing on large clusters

Jeffrey Dean, +1 more

- 06 Dec 2004

TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.

22.7K

Related Papers (5)

A survey on concept drift adaptation

Learning with Drift Detection

Mining high-speed data streams

Learning from Time-Changing Data with Adaptive Windowing

Mining data streams: a review