A survey on data preprocessing for data stream mining

doi:10.1016/J.NEUCOM.2017.01.078

•Journal Article•10.1016/J.NEUCOM.2017.01.078

Sergio Ramírez-Gallego, +4 more

- 24 May 2017

- Neurocomputing

- Vol. 239, pp 39-57

474

TL;DR: This survey summarizes, categorizes, and analyzes contributions on data preprocessing that cope with streaming data, taking into account the relationships between the different families of methods (feature and instance selection, and discretization).

Citations

•Journal Article•10.1613/JAIR.1.11192

SMOTE for learning from imbalanced data: progress and challenges, marking the 15-year anniversary

Alberto Fernández, +3 more

- 01 Jan 2018

- Journal of Artificial Intelligence Resea...

TL;DR: The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is considered the "de facto" standard in the framework of learning from imbalanced data because of its simple design and its robustness when applied to different types of problems.

1.6K

•Journal Article•10.1109/TKDE.2018.2876857

Learning under Concept Drift: A Review

Jie Lu, +5 more

- 01 Dec 2019

- IEEE Transactions on Knowledge and Data ...

TL;DR: A high quality, instructive review of current research developments and trends in the concept drift field is conducted, and a framework of learning under concept drift is established including three main components: concept drift detection, concept drift understanding, and concept drift adaptation.

995

•Journal Article•10.1109/TKDE.2018.2876857

Learning under Concept Drift: A Review

Jie Lu, +5 more

- 13 Apr 2020

- arXiv: Learning

TL;DR: In this paper, the authors review recent research in the field of concept drift and propose a framework of learning under concept drift, focusing on the detection, understanding, and adaptation of concept drift in streaming data.

752

•Journal Article•10.1145/3373464.3373470

Machine learning for streaming data: state of the art, challenges, and opportunities

Heitor Murilo Gomes, +4 more

- 26 Nov 2019

- Sigkdd Explorations

TL;DR: Incremental learning, online learning, and data stream learning are terms commonly associated with learning algorithms that update their models given a continuous influx of data without performing multiple passes over the data.

261

•Journal Article•10.1016/J.NEUNET.2019.09.004

Spiking Neural Networks and online learning: An overview and perspectives

Jesus L. Lobo, +3 more

- 01 Jan 2020

- Neural Networks

TL;DR: In this article, the authors present a comprehensive overview of the use of Spiking Neural Networks for online learning in non-stationary data streams and propose a new algorithm to adapt to these changes as fast as possible, while maintaining good performance scores.

250

References

•Book

Principal Component Analysis

Ian T. Jolliffe

- 01 May 1986

TL;DR: In this article, the authors present a graphical representation of data using Principal Component Analysis (PCA) for time series and other non-independent data, as well as a generalization and adaptation of principal component analysis.

17.7K

•Journal Article•10.1109/TIT.1967.1053964

Nearest neighbor pattern classification

Thomas M. Cover, +1 more

- 01 Jan 1967

- IEEE Transactions on Information Theory

TL;DR: The nearest neighbor decision rule assigns to an unclassified sample point the classification of the nearest of a set of previously classified points, so it may be said that half the classification information in an infinite sample set is contained in the nearest neighbor.

15.2K

•Journal Article•10.1007/b98835

Principal Component Analysis

I. Jolliffe

- 01 Oct 2002

TL;DR: This chapter discusses the properties of Population Principal Components, and the role of Principal Components in Regression Analysis, and discusses generalizations and Adaptations of Principal Component Analysis.

8.6K

•Book Chapter•10.1007/978-3-319-58347-1_10

Domain-adversarial training of neural networks

Yaroslav Ganin, +7 more

- 01 Jan 2016

- Journal of Machine Learning Research

TL;DR: In this article, a new representation learning approach for domain adaptation is proposed, in which data at training and test time come from similar but different distributions. The approach promotes the emergence of features that are discriminative for the main learning task on the source domain but cannot discriminate between the training (source) and test (target) domains.

7.7K

Domain-Adversarial Training of Neural Networks.

Yaroslav Ganin, +7 more

- 01 Jan 2017

TL;DR: A new representation learning approach for domain adaptation, in which data at training and test time come from similar but different distributions; it can be implemented in almost any feed-forward model by augmenting it with a few standard layers and a new gradient reversal layer.

5.6K

Related Papers (5)

A survey on concept drift adaptation

Ensemble learning for data stream analysis

Mining time-changing data streams

Learning from Time-Changing Data with Adaptive Windowing

Mining high-speed data streams