Book Chapter, DOI: 10.1007/978-3-642-29035-0_33
Tutorial: data stream mining and its applications
Latifur Khan, Wei Fan +1 more
- 15 Apr 2012
- pp 328-329
10
TL;DR: A number of applications of stream mining will be presented such as adaptive malicious code detection, on-line malicious URL detection, evolving insider threat detection and textual stream classification.
Abstract: Data streams are continuous flows of data. Examples include network traffic, sensor data, and call center records. Their sheer volume and speed pose a great challenge for the data mining community. Data streams exhibit several unique properties: infinite length, concept-drift, concept-evolution, feature-evolution, and limited labeled data. Concept-drift occurs when the underlying concept of the data changes over time. Concept-evolution occurs when new classes emerge in the stream. Feature-evolution occurs when the feature set varies over time. Data streams also suffer from a scarcity of labeled data, since it is not possible to manually label every data point in the stream. Each of these properties adds a challenge to data stream mining.
Multi-step methodologies and multi-scan algorithms suitable for knowledge discovery and data mining cannot be readily applied to data streams. This is due to well-known constraints such as bounded memory, high-speed data arrival, the need for online and timely processing, and the need for one-pass techniques (i.e., raw data is forgotten once it has been processed). Despite the success of stream mining techniques and the extensive studies on them, there is no single tutorial dedicated to a unified study of the new challenges introduced by evolving stream data, such as change detection, novelty detection, and feature evolution. This tutorial presents an organized picture of how to apply data mining techniques to data streams: in particular, how to handle classification and clustering in evolving data streams by addressing these challenges. The importance of research in data stream mining is evident in the recent launch of large-scale stream processing prototypes in many important application areas. At the same time, the commercialization of streams (e.g., IBM InfoSphere Streams) brings new challenges and research opportunities to the Data Mining (DM) community. In this tutorial, a number of applications of stream mining will be presented, such as adaptive malicious code detection, on-line malicious URL detection, evolving insider threat detection, and textual stream classification.
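The constraints named in the abstract (one pass, bounded memory, change detection under concept-drift) can be illustrated with a minimal sketch. This is not the tutorial's method: the class `WindowDriftDetector`, the helper `detect_drift_points`, and the window size and threshold values are all hypothetical, chosen only to show the idea of comparing a recent window against an older reference window.

```python
from collections import deque


class WindowDriftDetector:
    """Signals drift when the mean of recent values departs from an older reference.

    Illustrative assumption: window_size and threshold are arbitrary values,
    not parameters taken from the tutorial.
    """

    def __init__(self, window_size=50, threshold=0.3):
        self.window_size = window_size
        self.threshold = threshold
        # Bounded memory: each deque holds at most window_size values.
        self.reference = deque(maxlen=window_size)  # older observations
        self.recent = deque(maxlen=window_size)     # newest observations

    def update(self, value):
        """Consume one value (one pass); return True if drift is signalled."""
        if len(self.recent) == self.window_size:
            # The value about to be evicted from `recent` ages into `reference`.
            self.reference.append(self.recent[0])
        self.recent.append(value)

        if len(self.reference) < self.window_size:
            return False  # not enough history to compare yet

        ref_mean = sum(self.reference) / len(self.reference)
        rec_mean = sum(self.recent) / len(self.recent)
        if abs(rec_mean - ref_mean) > self.threshold:
            # Forget the old concept and start learning the new one.
            self.reference.clear()
            self.recent.clear()
            return True
        return False


def detect_drift_points(stream, window_size=50, threshold=0.3):
    """Return the indices at which drift was signalled while scanning the stream once."""
    detector = WindowDriftDetector(window_size, threshold)
    return [i for i, value in enumerate(stream) if detector.update(value)]


if __name__ == "__main__":
    # Synthetic stream whose underlying value shifts from 0.0 to 1.0 at index 200.
    stream = [0.0] * 200 + [1.0] * 200
    print(detect_drift_points(stream))
```

Each value is seen once and then discarded apart from the two fixed-size windows, so memory use does not grow with the length of the stream. A practical detector would typically monitor a classifier's error rate rather than raw values and would use a statistical test instead of a fixed threshold.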
Citations
Incremental one-class classifier based on convex–concave hull
Javad Hamidzadeh, Mona Moradi +1 more
TL;DR: Considering time complexity, an incremental convex–concave hull classification method, called ICCHC, is proposed which can significantly reduce the computational time and expand the target class boundary and can be adapted to the gradual concept drift.
7
Context-Aware Clustering and the Optimized Whale Optimization Algorithm: An Effective Predictive Model for the Smart Grid
Prashant Ahire, Pramod Patil +1 more
TL;DR: In this paper, the authors propose an Artificial Neural Network (ANN) predictive algorithm that uses a bio-inspired optimization algorithm called OWOA, a modification of the existing WOA that addresses its slow convergence and its tendency to fall into local optima.
6
Fast and Accurate Terrain Image Classification for ASTER Remote Sensing by Data Stream Mining and Evolutionary-EAC Instance-Learning-Based Algorithm
TL;DR: In this paper, an evolutionary expand-and-contract instance-based learning algorithm (EEAC-IBL) was proposed for real-time data stream mining in remote sensing.
4
Concept Tracking and Adaptation for Drifting Data Streams under Extreme Verification Latency
Maria Arostegi, Ana I. Torre-Bastida, Jesus L. Lobo, Miren Nekane Bilbao, Javier Del Ser +4 more
- 15 Oct 2018
TL;DR: This work proposes a simple yet effective learning technique to classify non-stationary data streams under extreme verification latency by predicting the trajectory of concepts in the feature space, and is compared to a benchmark of incremental and static learning methods over a set of public non-stationary synthetic datasets.
4
Genetic programming-based regression for temporal data
Cry Kuranga, Nelishia Pillay +1 more
TL;DR: A genetic programming-based predictive model for temporal data with a numerical target that tracks changes in a dataset due to concept drift and reacts to the change by clustering the data and then inducing nonlinear models that describe generated clusters.
1