Open Access, Posted Content
Clustering Time Series Data through Autoencoder-based Deep Learning Models.
TL;DR: A novel technique uses the characteristics of the given time series data to create labels, transforming the problem from unsupervised learning into supervised learning. The results show that the proposed procedure achieves 87.5% accuracy in clustering and predicting the labels of unseen time series data.
Abstract: Machine learning, and deep learning in particular, is an emerging approach to data analysis. These techniques have radically transformed traditional data mining-based analysis into a learning-based model in which existing data sets, along with their cluster labels (i.e., the train set), are learned to build a supervised model that predicts the cluster labels of unseen data (i.e., the test set). In particular, deep learning techniques can capture and learn hidden features in a given data set and thus build more accurate prediction models for clustering and labeling problems. However, time series data are often unlabeled, so supervised deep learning algorithms cannot be directly adapted to solve clustering problems for these special and complex types of data sets. To address this problem, this paper introduces a two-stage method for clustering time series data. First, a novel technique uses the characteristics (e.g., volatility) of the given time series data to create labels, transforming the problem from unsupervised learning into supervised learning. Second, an autoencoder-based deep learning model is built to learn and model both known and hidden features of the time series data, along with their created labels, in order to predict the labels of unseen time series data. The paper reports a case study in which financial and stock time series data of 70 selected stock indices are clustered into distinct groups using the introduced two-stage procedure. The results show that the proposed procedure achieves 87.5% accuracy in clustering and predicting the labels of unseen time series data.
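The first stage described in the abstract (deriving labels from characteristics such as volatility) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the exact features or labeling rule, so the sketch assumes two features per series (mean log return and volatility, as the figure captions suggest) and a plain k-means with a deterministic farthest-point initialisation. All function names (`log_returns`, `features`, `kmeans`, `synthetic_series`) and the synthetic price data are hypothetical and only for illustration.

```python
import math
import random
import statistics


def log_returns(prices):
    """Log returns between consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def features(prices):
    """Summarise one series as (mean return, volatility).

    Assumption: volatility is the sample standard deviation of log returns.
    Features are left unscaled to keep the sketch short.
    """
    r = log_returns(prices)
    return (statistics.fmean(r), statistics.stdev(r))


def sq_dist(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=50):
    """Lloyd's k-means; returns one cluster label per point."""
    # Deterministic farthest-point initialisation (an illustrative choice).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(statistics.fmean(d) for d in zip(*members))
    return labels


def synthetic_series(sigma, n=250, seed=0):
    """Hypothetical price path whose log returns are Gaussian with std sigma."""
    rng = random.Random(seed)
    price, out = 100.0, []
    for _ in range(n):
        out.append(price)
        price *= math.exp(rng.gauss(0.0, sigma))
    return out


# Five calm and five volatile synthetic series stand in for real index data.
series = [synthetic_series(0.005, seed=i) for i in range(5)]
series += [synthetic_series(0.05, seed=100 + i) for i in range(5)]

feats = [features(s) for s in series]
labels = kmeans(feats, k=2)  # these labels would feed the supervised stage
print(labels)
```

The resulting labels play the role of the created labels in the abstract; in the second stage they would be used as training targets for the autoencoder-based model, which this sketch does not cover.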
Figures

Fig. 11. KMeans clustering: The range of volatility and returns for Cluster "2"
Fig. 9. KMeans clustering: The range of volatility and returns for Cluster "0"
TABLE III NUMERICAL PREDICTION OF TIME SERIES' CLUSTER LABELS.
Fig. 13. Loss vs. epochs.
Fig. 1. A synergic methodology for time series clustering.
Fig. 2. The flowchart of the introduced time series clustering.
Citations
Unsupervised Human Activity Representation Learning with Multi-task Deep Clustering
Haojie Ma, Zhijie Zhang, Wenzhong Li, Sanglu Lu
- 29 Mar 2021
TL;DR: In this paper, an end-to-end multi-task deep clustering framework was proposed to solve the problem of unsupervised human activity recognition, which infers activities from unlabeled datasets without the need of domain knowledge.
46
Unsupervised Deep Learning for IoT Time Series
01 Jan 2023
TL;DR: This work investigates unsupervised deep learning for IoT time series, i.e., anomaly detection and clustering, under a unified framework, and discusses the application scenarios, public datasets, existing challenges, and future research directions in this area.
16
A Survey on Dimensionality Reduction Techniques for Time-series Data
01 Jan 2023
TL;DR: In this article, the authors present twelve dimensionality reduction algorithms that are specifically suited to time-series data and compare them across categories such as supervision, linearity, time and memory complexity, hyper-parameters, and drawbacks.
14
Predicting Consequences of Cyber-Attacks
Prerit Datta, Natalie R. Lodinger, Akbar Siami Namin, Keith S. Jones
- 10 Dec 2020
TL;DR: In this paper, the authors used machine learning and natural language processing techniques to predict the consequences of cyber-attacks, achieving an accuracy of 60% with tf-idf features and 57% with the Doc2Vec method for models based on LinearSVC.
7
A representation learning framework for stock movement prediction
TL;DR: This work presents an end-to-end stock movement prediction framework (CLSR) that uses contrastive learning to exploit the correlation between intra-day data and enhance stock representation, in order to improve the accuracy of stock prediction.
6