Parallel Computing TEDA for High Frequency Streaming Data Clustering

doi:10.1007/978-3-319-47898-2_25

Open Access•Book Chapter•10.1007/978-3-319-47898-2_25

Xiaowei Gu, +4 more

- 23 Oct 2016

- pp 238-253

12

TL;DR: Parallel_TEDA, an online clustering approach developed within the recently introduced TEDA theory, inherits the advantages of that theory and is presented as well suited to real-time, high-frequency stream processing and data analytics.

Abstract: In this paper, a novel online clustering approach called Parallel_TEDA is introduced for processing high-frequency streaming data. The approach is developed within the recently introduced TEDA theory and inherits all of its advantages. A number of data stream processors collaborate to achieve parallel computation and a much higher processing speed. Each processor works on a chunk of the whole data stream, and a fusion center gathers the key information from the processors and generates the overall output. The quality of the generated clusters is monitored continuously within the data processors, and stale clusters are removed to keep the overall clustering results correct and up to date. This, in turn, gives the approach a stronger ability to handle shifts and drifts that may take place in live data streams. In numerical experiments on benchmark datasets, Parallel_TEDA shows higher performance and faster processing than well-known alternative approaches. The processing time has been demonstrated to fall exponentially as more data processors are involved. This new online clustering approach is very suitable and promising for real-time, high-frequency stream processing and data analytics.
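The architecture described in the abstract (chunk-wise stream processors, stale-cluster removal, and a fusion center) can be illustrated with a minimal sketch. This is not the authors' Parallel_TEDA algorithm: a plain distance threshold stands in for the TEDA-based cluster test, and every name and parameter below (`process_chunk`, `fuse`, `parallel_cluster`, `radius`, `max_age`) is hypothetical and chosen only for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Cluster:
    centre: list    # running mean of the samples absorbed so far
    count: int      # number of samples absorbed
    last_seen: int  # chunk-local index of the most recent member


def _dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def process_chunk(chunk, radius=1.0, max_age=50):
    """One hypothetical stream processor: single-pass clustering of a chunk.

    Assumption: a distance threshold replaces the TEDA test used in the paper.
    """
    clusters = []
    for t, x in enumerate(chunk):
        nearest = min(clusters, key=lambda c: _dist(c.centre, x), default=None)
        if nearest is not None and _dist(nearest.centre, x) <= radius:
            # Absorb the sample and update the running mean incrementally.
            nearest.count += 1
            k = nearest.count
            nearest.centre = [m + (xi - m) / k for m, xi in zip(nearest.centre, x)]
            nearest.last_seen = t
        else:
            clusters.append(Cluster(list(x), 1, t))
        # Remove stale clusters that have not received a sample recently.
        clusters = [c for c in clusters if t - c.last_seen <= max_age]
    return clusters


def fuse(cluster_lists, radius=1.0):
    """Hypothetical fusion center: merge nearby centres, weighted by count."""
    fused = []
    for clusters in cluster_lists:
        for c in clusters:
            for f in fused:
                if _dist(f.centre, c.centre) <= radius:
                    total = f.count + c.count
                    f.centre = [(fm * f.count + cm * c.count) / total
                                for fm, cm in zip(f.centre, c.centre)]
                    f.count = total
                    break
            else:
                fused.append(Cluster(list(c.centre), c.count, c.last_seen))
    return fused


def parallel_cluster(stream, n_processors=2, radius=1.0, max_age=50):
    """Split the stream into chunks, cluster each chunk, then fuse the results."""
    if not stream:
        return []
    size = math.ceil(len(stream) / n_processors)
    chunks = [stream[i:i + size] for i in range(0, len(stream), size)]
    # Threads keep the sketch simple; in CPython they do not give a real
    # speed-up for CPU-bound work, so a process pool would be needed for that.
    with ThreadPoolExecutor(max_workers=n_processors) as pool:
        results = list(pool.map(lambda ch: process_chunk(ch, radius, max_age), chunks))
    return fuse(results, radius)
```

A call such as `parallel_cluster(points, n_processors=4)` returns the fused clusters. The sketch shows only the division of labour between processors and the fusion center, and it says nothing about the speed-up reported in the paper.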

Citations

Journal Article•10.1016/J.FUTURE.2021.10.019

Workload forecasting and energy state estimation in cloud data centres: ML-centric approach

Tahseen Khan, +4 more

- 01 Mar 2022

- Future Generation Computer Systems

TL;DR: In this paper, the authors proposed an ML-based model to predict load and energy to aid resource management decisions in cloud data centers, which is based on the Gated Recurrent Unit (GRU) algorithm.

50

Book•10.1007/978-3-319-47898-2

Advances in Big Data

Plamen Angelov, +4 more

- 01 Jan 2017

9

Journal Article•10.1007/S12530-020-09336-3

Autonomous Data Density pruning fuzzy neural network for Optical Interconnection Network

Paulo Vitor de Campos Souza, +5 more

- 01 Dec 2021

- Evolving Systems

TL;DR: The proposed model uses an autonomous data density approach in a pruned fuzzy neural network, which favours the compactness of the model.

8

Dissertation•10.17635/LANCASTER/THESIS/407

Self-organising transparent learning system

Xiaowei Gu

- 01 Jan 2018

TL;DR: The newly proposed self-organising transparent deep learning systems are able to achieve human-level performance comparable to or even better than the deep convolutional neural networks on image classification problems with the merits of being fully transparent, self-evolving, highly efficient, parallelisable and human-interpretable.

6

Proceedings Article•10.1109/FUZZ-IEEE.2017.8015532

Cloud-based evolving intelligent method for weather time series prediction

Eduardo Soares, +3 more

- 09 Jul 2017

TL;DR: This paper concerns the application of a cloud-based intelligent evolving method, namely, a typicality-and-eccentricity-based method for data analysis (TEDA), to predict monthly mean temperature in different cities of Brazil.

3

References

Some methods for classification and analysis of multivariate observations

James B. MacQueen

- 01 Jan 1967

TL;DR: This paper describes k-means, a process for partitioning an N-dimensional population into k sets on the basis of a sample. The k-means concept generalizes the ordinary sample mean, and the process is shown to give partitions that are reasonably efficient in the sense of within-class variance.

28.1K

Proceedings Article

A density-based algorithm for discovering clusters in large spatial databases with noise

Martin Ester, +3 more

- 02 Aug 1996

TL;DR: In this paper, a density-based notion of clusters is proposed to discover clusters of arbitrary shape. The resulting algorithm can be used for class identification in large spatial databases and is shown to be more efficient than the well-known algorithm CLARANS.

20.3K

Proceedings Article

A density-based algorithm for discovering clusters in large spatial databases with noise

Martin Ester, +3 more

- 01 Jan 1996

TL;DR: This paper presents DBSCAN, a new clustering algorithm that relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape. It requires only one input parameter and supports the user in determining an appropriate value for it.

17.8K

Journal Article•10.1016/0098-3004(84)90020-7

FCM: The fuzzy c-means clustering algorithm

James C. Bezdek, +2 more

- 01 Jan 1984

- Computers & Geosciences

TL;DR: A FORTRAN-IV coding of the fuzzy c-means (FCM) clustering program is transmitted, which generates fuzzy partitions and prototypes for any set of numerical data.

6.4K

Journal Article•10.1007/BF02289588

Hierarchical clustering schemes

S. C. Johnson

- 01 Sep 1967

- Psychometrika

TL;DR: A useful correspondence is developed between any hierarchical system of such clusters, and a particular type of distance measure, that gives rise to two methods of clustering that are computationally rapid and invariant under monotonic transformations of the data.

5.1K