Linear Transformations and the k-Means Clustering Algorithm: Applications to Clustering Curves
TL;DR: Clustering the raw data will often give results similar to clustering regression coefficients obtained using an orthogonal design matrix, and clustering functional data under an L2 metric on function space can be achieved by clustering a suitable linear transformation of the regression coefficients.
Abstract: Functional data can be clustered by plugging estimated regression coefficients from individual curves into the k-means algorithm. Clustering results can differ depending on how the curves are fit to the data. Estimating curves using different sets of basis functions corresponds to different linear transformations of the data. k-means clustering is not invariant to linear transformations of the data. The optimal linear transformation for clustering will stretch the distribution so that the primary direction of variability aligns with actual differences in the clusters. It is shown that clustering the raw data will often give results similar to clustering regression coefficients obtained using an orthogonal design matrix. Clustering functional data using an L2 metric on function space can be achieved by clustering a suitable linear transformation of the regression coefficients. An example where depressed individuals are treated with an antidepressant is used for illustration.
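The geometric claims in the abstract can be checked numerically. The following is a minimal sketch, not the paper's own code: it assumes a quadratic polynomial basis on [0, 1], simulated noise in place of real curves, and hypothetical names throughout (`Y`, `B_mono`, `B_orth`, `B_l2`, `pairwise_sq_dists`). Because k-means depends on the data only through Euclidean distances, comparing pairwise distances is enough to see which representations cluster identically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: n curves observed at m common time points (simulated noise,
# for illustration only).
m, n = 30, 8
t = np.linspace(0.0, 1.0, m)
Y = rng.normal(size=(n, m))                  # each row is one raw curve

# Design matrix for the monomial basis 1, t, t^2.
X = np.vander(t, 3, increasing=True)         # shape (m, 3)

# Least-squares coefficients of each curve in the monomial basis.
B_mono = np.linalg.lstsq(X, Y.T, rcond=None)[0].T   # shape (n, 3)

# Orthonormal design matrix via QR: X = Q R, with Q^T Q = I.
Q, R = np.linalg.qr(X)
B_orth = Y @ Q                               # same fit, coefficients w.r.t. Q

def pairwise_sq_dists(A):
    """Squared Euclidean distances between all pairs of rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sum(diff ** 2, axis=-1)

fitted = B_mono @ X.T                        # fitted curves evaluated at t
d_raw = pairwise_sq_dists(Y)
d_fitted = pairwise_sq_dists(fitted)
d_resid = pairwise_sq_dists(Y - fitted)
d_orth = pairwise_sq_dists(B_orth)
d_mono = pairwise_sq_dists(B_mono)

# L2 metric on [0, 1]: the Gram matrix of the monomials is
# W[j, k] = integral of t^(j+k) dt = 1 / (j + k + 1).
W = np.array([[1.0 / (j + k + 1) for k in range(3)] for j in range(3)])
L = np.linalg.cholesky(W)                    # W = L L^T
B_l2 = B_mono @ L                            # linearly transformed coefficients
d_l2 = pairwise_sq_dists(B_l2)

print(np.allclose(d_fitted, d_orth))         # orthonormal coefficients keep curve distances
print(np.allclose(d_raw, d_orth + d_resid))  # raw distance = fitted part + residual part
print(np.allclose(d_mono, d_orth))           # monomial coefficients do not
```

In this sketch, distances between orthonormal-basis coefficients equal distances between the fitted curves, and raw-data distances exceed them only by the residual term, which is one way to see why the two clusterings often agree when the curves fit well. Distances between rows of `B_l2` equal the L2 distances between the fitted polynomials, so running ordinary k-means on `B_l2` clusters the curves under that metric.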