Journal Article, DOI: 10.1145/2674026.2674028
Open challenges for data stream mining research
Georg Krempl,Indre Žliobaite,Dariusz Brzezinski,Eyke Hüllermeier,Vincent Lemaire,Tino Noack,Ammar Shaker,Sonja Sievi,Myra Spiliopoulou,Jerzy Stefanowski +9 more
TL;DR: This article presents a discussion on eight open challenges for data stream mining, which cover the full cycle of knowledge discovery and involve such problems as protecting data privacy, dealing with legacy systems, handling incomplete and delayed information, analysis of complex data, and evaluation of stream mining algorithms.
Abstract: Every day, huge volumes of sensory, transactional, and web data are continuously generated as streams, which need to be analyzed online as they arrive. Streaming data can be considered one of the main sources of what is called big data. While predictive modeling for data streams and big data has received a lot of attention over the last decade, many research approaches are designed for well-behaved, controlled problem settings and overlook important challenges imposed by real-world applications. This article presents a discussion of eight open challenges for data stream mining. Our goal is to identify gaps between current research and meaningful applications, highlight open problems, and define new application-relevant research directions for data stream mining. The identified challenges cover the full cycle of knowledge discovery and involve problems such as protecting data privacy, dealing with legacy systems, handling incomplete and delayed information, analysis of complex data, and evaluation of stream mining algorithms. The resulting analysis is illustrated by practical applications and provides general suggestions concerning lines of future research in data stream mining.
Citations
Ensemble learning for data stream analysis
TL;DR: This paper surveys research on ensembles for data stream classification as well as regression tasks and discusses advanced learning concepts such as imbalanced data streams, novelty detection, active and semi-supervised learning, complex data representations and structured outputs.
Characterizing concept drift
TL;DR: This work presents the first comprehensive framework for quantitative analysis of drift, giving rise to a new comprehensive taxonomy of concept drift types and a solid foundation for research into mechanisms to detect and address concept drift.
Machine learning for streaming data: state of the art, challenges, and opportunities
TL;DR: Incremental learning, online learning, and data stream learning are terms commonly associated with learning algorithms that update their models given a continuous influx of data without performing any act of reinforcement learning.
Leveraging Machine Learning and Big Data for Smart Buildings: A Comprehensive Survey
Basheer Qolomany,Ala Al-Fuqaha,Ajay Gupta,Driss Benhaddou,Safaa Alwajidi,Junaid Qadir,Alvis Fong +6 more
TL;DR: This survey covers the area of smart buildings, with a special focus on the role of techniques from machine learning and big data analytics, and reviews the current trends and challenges faced in the development of smart building services.