Parallel Computing TEDA for High Frequency Streaming Data Clustering
Xiaowei Gu, Plamen Angelov, German Gutierrez, Jose Antonio Iglesias, Araceli Sanchis
- 23 Oct 2016
- pp 238-253
TL;DR: Parallel_TEDA, an online clustering approach developed within the recently introduced TEDA theory, inherits the advantages of that theory and is well suited to real-time processing and analytics of high frequency streaming data.
Abstract: In this paper, a novel online clustering approach called Parallel_TEDA is introduced for processing high frequency streaming data. The approach is developed within the recently introduced TEDA theory and inherits all advantages from it. It involves a number of data stream processors, which collaborate with each other to achieve parallel computation and a much higher processing speed. Each processor works on a chunk of the whole data stream, and a fusion center gathers the key information from the processors and generates the overall output. The quality of the generated clusters is monitored continuously within the data processors, and stale clusters are removed to ensure the correctness and timeliness of the overall clustering results. This, in turn, gives the approach a stronger ability to handle shifts and drifts that may take place in live data streams. Numerical experiments with Parallel_TEDA on benchmark datasets show higher performance and faster processing than well-known alternative approaches. The processing time has been demonstrated to fall exponentially as more data processors are involved. This online clustering approach is well suited to real-time high frequency stream processing and data analytics.
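The architecture described in the abstract (stream processors working on chunks, stale-cluster removal, and a fusion center) can be illustrated with a minimal sketch. This is not the authors' Parallel_TEDA algorithm: a simple distance-threshold rule stands in for the TEDA-based cluster assignment, and every name and parameter here (`process_chunk`, `fuse`, `parallel_cluster`, `radius`, `max_age`) is a hypothetical choice made only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
import math


@dataclass
class Cluster:
    center: list      # running mean of the samples absorbed so far
    count: int        # number of samples absorbed
    last_seen: int    # position in the chunk of the most recent member


def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def process_chunk(chunk, radius=1.0, max_age=50):
    """One stream processor: single-pass clustering of its own chunk.

    Assumption: a fixed distance threshold replaces the TEDA-based test.
    """
    clusters = []
    for t, x in enumerate(chunk):
        near = min(clusters, key=lambda c: dist(c.center, x), default=None)
        if near is not None and dist(near.center, x) <= radius:
            near.count += 1
            # Recursive mean update, so no past samples are stored.
            near.center = [m + (v - m) / near.count
                           for m, v in zip(near.center, x)]
            near.last_seen = t
        else:
            clusters.append(Cluster(list(x), 1, t))
        # Remove stale clusters that have not been updated recently.
        clusters = [c for c in clusters if t - c.last_seen <= max_age]
    return clusters


def fuse(per_worker, radius=1.0):
    """Fusion center: merge cluster summaries whose centers are close."""
    merged = []
    for c in (c for cs in per_worker for c in cs):
        near = min(merged, key=lambda m: dist(m.center, c.center), default=None)
        if near is not None and dist(near.center, c.center) <= radius:
            total = near.count + c.count
            # Count-weighted average of the two centers.
            near.center = [(a * near.count + b * c.count) / total
                           for a, b in zip(near.center, c.center)]
            near.count = total
        else:
            merged.append(Cluster(list(c.center), c.count, c.last_seen))
    return merged


def parallel_cluster(stream, n_workers=2, radius=1.0):
    """Split the stream into chunks, cluster each in a worker, then fuse."""
    size = max(1, math.ceil(len(stream) / n_workers))
    chunks = [stream[i:i + size] for i in range(0, len(stream), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(lambda ch: process_chunk(ch, radius), chunks))
    return fuse(results, radius)
```

Only per-cluster summaries (center and count) travel to the fusion step, which mirrors the idea of gathering "key information" rather than raw data. Threads are used here for brevity; for CPU-bound work in CPython, a process pool with a module-level worker function would likely be needed to see a real speed-up.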
Citations
Workload forecasting and energy state estimation in cloud data centres: ML-centric approach
TL;DR: In this paper, the authors proposed an ML-based model to predict load and energy to aid resource management decisions in cloud data centers, which is based on the Gated Recurrent Unit (GRU) algorithm.
50
Autonomous Data Density pruning fuzzy neural network for Optical Interconnection Network
Paulo Vitor de Campos Souza, Eduardo Soares, Augusto Junio Guimarães, Vanessa Souza Araujo, Vinicius Jonathan Silva Araujo, Thiago Silva Rezende
TL;DR: The proposed model uses an autonomous data density approach in a pruned fuzzy neural network, which favours the compactness of the model.
Self-organising transparent learning system
Xiaowei Gu
- 01 Jan 2018
TL;DR: The newly proposed self-organising transparent deep learning systems are able to achieve human-level performance comparable to or even better than the deep convolutional neural networks on image classification problems with the merits of being fully transparent, self-evolving, highly efficient, parallelisable and human-interpretable.
6
Cloud-based evolving intelligent method for weather time series prediction
Eduardo Soares, Vania Corrêa Mota, Ricardo Poucas, Daniel Leite
- 09 Jul 2017
TL;DR: This paper concerns the application of a cloud-based intelligent evolving method, namely, a typicality-and-eccentricity-based method for data analysis (TEDA), to predict monthly mean temperature in different cities of Brazil.
3
References
Some methods for classification and analysis of multivariate observations
James B. MacQueen
- 01 Jan 1967
TL;DR: This paper describes the k-means procedure, a generalization of the ordinary sample mean, which partitions an N-dimensional population into k sets on the basis of a sample and is shown to give partitions that are reasonably efficient in the sense of within-class variance.
• Proceedings Article
A density-based algorithm for discovering clusters in large spatial databases with noise
Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu
- 02 Aug 1996
TL;DR: In this paper, a density-based notion of clusters is proposed to discover clusters of arbitrary shape, which can be used for class identification in large spatial databases and is shown to be more efficient than the well-known algorithm CLARANS.
20.3K
• Proceedings Article
A density-based algorithm for discovering clusters in large spatial databases with noise
Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu
- 01 Jan 1996
TL;DR: This paper presents DBSCAN, a new clustering algorithm that relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape. It requires only one input parameter and supports the user in determining an appropriate value for it.
FCM: The fuzzy c-means clustering algorithm
TL;DR: A FORTRAN-IV coding of the fuzzy c-means (FCM) clustering program is transmitted, which generates fuzzy partitions and prototypes for any set of numerical data.
6.4K
Hierarchical clustering schemes
TL;DR: A useful correspondence is developed between any hierarchical system of clusters and a particular type of distance measure; it gives rise to two clustering methods that are computationally rapid and invariant under monotonic transformations of the data.
5.1K