Clustering Time Series Data through Autoencoder-based Deep Learning Models.

Open Access • Posted Content

- 11 Apr 2020

13

TL;DR: A novel technique uses the characteristics of the given time series data to create labels, transforming the problem from unsupervised learning into supervised learning. The results show that the proposed procedure achieves 87.5% accuracy in clustering and predicting the labels of unseen time series data.

Abstract: Machine learning and in particular deep learning algorithms are the emerging approaches to data analysis. These techniques have radically transformed traditional data mining-based analysis into a learning-based model in which existing data sets along with their cluster labels (i.e., the train set) are learned to build a supervised learning model and predict the cluster labels of unseen data (i.e., the test set). In particular, deep learning techniques are capable of capturing and learning hidden features in a given data set and thus building a more accurate prediction model for clustering and labeling problems. However, the major problem is that time series data are often unlabeled, and thus supervised learning-based deep learning algorithms cannot be directly adapted to solve the clustering problems for these special and complex types of data sets. To address this problem, this paper introduces a two-stage method for clustering time series data. First, a novel technique is introduced to utilize the characteristics (e.g., volatility) of given time series data in order to create labels and thus transform the problem from unsupervised learning into supervised learning. Second, an autoencoder-based deep learning model is built to learn and model both known and hidden features of time series data along with their created labels to predict the labels of unseen time series data. The paper reports a case study in which financial and stock time series data of 70 selected stock indices are clustered into distinct groups using the introduced two-stage procedure. The results show that the proposed procedure is capable of achieving 87.5% accuracy in clustering and predicting the labels for unseen time series data.
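The first stage described in the abstract, deriving labels from characteristics such as volatility, can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation. It assumes each series is summarized by its mean log return and the standard deviation of its log returns (a common volatility proxy), and that labels come from a plain k-means over those two features, as the figure captions suggest. The helper names (`log_returns`, `features`, `kmeans`, `synthetic_prices`), the synthetic random-walk data, the deterministic initialization, and the choice of `k=2` are all assumptions made for the example.

```python
import math
import random
import statistics


def log_returns(prices):
    """Log returns between consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def features(prices):
    """Summarize one series as (mean log return, volatility).

    Volatility here is the sample standard deviation of log returns;
    this feature choice is an assumption, not taken from the paper.
    """
    r = log_returns(prices)
    return (statistics.mean(r), statistics.stdev(r))


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; returns one integer label per point.

    Centers start at points evenly spaced along the volatility ordering,
    so label 0 is the lowest-volatility cluster (hypothetical init choice).
    """
    ordered = sorted(points, key=lambda p: p[1])
    step = (len(ordered) - 1) / max(k - 1, 1)
    centers = [ordered[round(i * step)] for i in range(k)]
    labels = []
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(statistics.mean(axis)
                                     for axis in zip(*members)))
            else:
                updated.append(centers[c])  # keep an empty cluster's center
        if updated == centers:
            break
        centers = updated
    return labels


def synthetic_prices(sigma, n=250, seed=0):
    """Geometric random walk with per-step log-return std `sigma` (toy data)."""
    rng = random.Random(seed)
    price, out = 100.0, [100.0]
    for _ in range(n):
        price *= math.exp(rng.gauss(0.0, sigma))
        out.append(price)
    return out


# Five calm series followed by five volatile ones (synthetic stand-ins).
series = [synthetic_prices(0.005, seed=i) for i in range(5)]
series += [synthetic_prices(0.05, seed=100 + i) for i in range(5)]

points = [features(s) for s in series]
labels = kmeans(points, k=2)  # pseudo-labels for a supervised second stage
print(labels)
```

In a pipeline of the kind the abstract describes, labels like these would become the training targets for the second-stage model. Because returns and volatility sit on different scales, standardizing the features before clustering is usually advisable on real data.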

Figures

Fig. 11. KMeans clustering: The range of volatility and returns for Cluster "2"

Fig. 9. KMeans clustering: The range of volatility and returns for Cluster "0"

TABLE III. NUMERICAL PREDICTION OF TIME SERIES' CLUSTER LABELS.

Fig. 1. A synergic methodology for time series clustering.

Fig. 2. The flowchart of the introduced time series clustering.

Citations

Journal Article•10.1145/3448074

Unsupervised Human Activity Representation Learning with Multi-task Deep Clustering

Haojie Ma, +3 more

- 29 Mar 2021

TL;DR: In this paper, an end-to-end multi-task deep clustering framework was proposed to solve the problem of unsupervised human activity recognition, which infers activities from unlabeled datasets without the need of domain knowledge.

46

Journal Article•10.1109/jiot.2023.3243391

Unsupervised Deep Learning for IoT Time Series

01 Jan 2023

- IEEE Internet of Things Journal

TL;DR: The authors investigate unsupervised deep learning for IoT time series, i.e., anomaly detection and clustering, under a unified framework, and discuss the application scenarios, public datasets, existing challenges, and future research directions in this area.

16

Journal Article•10.1109/access.2023.3269693

A Survey on Dimensionality Reduction Techniques for Time-series Data

01 Jan 2023

- IEEE Access

TL;DR: In this article, the authors present twelve different dimensionality reduction algorithms that are specifically suited for working with time-series data and compare them across categories such as supervision, linearity, time and memory complexity, hyper-parameters, and drawbacks.

14

Proceedings Article•10.1109/BIGDATA50022.2020.9377825

Predicting Consequences of Cyber-Attacks

Prerit Datta, +3 more

- 10 Dec 2020

TL;DR: In this paper, the authors used machine learning and natural language processing techniques to predict the consequences of cyber-attacks and achieved an accuracy of 60% using tf-idf features and 57% using Doc2Vec method for models based on LinearSVC model.

7

Journal Article•10.1016/j.asoc.2023.110409

A representation learning framework for stock movement prediction

Xuemei Li

- 01 Sep 2023

- Applied Soft Computing

TL;DR: The authors present an end-to-end stock movement prediction framework (CLSR) that uses contrastive learning to exploit the correlation between intra-day data and enhance stock representation in order to improve the accuracy of stock prediction.

6

References

Some methods for classification and analysis of multivariate observations

James B. MacQueen

- 01 Jan 1967

TL;DR: This paper describes the k-means procedure, which partitions an N-dimensional population into k sets on the basis of a sample. The k-means concept is a generalization of the ordinary sample mean, and the procedure is shown to give partitions that are reasonably efficient in the sense of within-class variance.

28.1K

Proceedings Article

A density-based algorithm for discovering clusters in large spatial databases with noise

Martin Ester, +3 more

- 02 Aug 1996

TL;DR: In this paper, a density-based notion of clusters is proposed to discover clusters of arbitrary shape, which can be used for class identification in large spatial databases and is shown to be more efficient than the well-known algorithm CLARANS.

20.3K

Proceedings Article

A density-based algorithm for discovering clusters in large spatial Databases with Noise

Martin Ester, +3 more

- 01 Jan 1996

TL;DR: This paper presents DBSCAN, a new clustering algorithm that relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape. It requires only one input parameter and supports the user in determining an appropriate value for it.

17.8K

Journal Article•10.1016/J.PATCOG.2005.01.025

Clustering of time series data-a survey

T. Warren Liao

- 01 Nov 2005

- Pattern Recognition

TL;DR: This paper surveys and summarizes previous works that investigated the clustering of time series data in various application domains, including general-purpose clustering algorithms commonly used in time series clustering studies.

2.7K

Proceedings Article•10.1145/882082.882086

A symbolic representation of time series, with implications for streaming algorithms

Jessica Lin, +3 more

- 13 Jun 2003

TL;DR: A new symbolic representation of time series is introduced that is unique in that it allows dimensionality/numerosity reduction, and it also allows distance measures to be defined on the symbolic approach that lower bound corresponding distance measures defined on the original series.

2.1K