Journal Article · DOI: 10.1186/s13015-024-00265-3
Metric multidimensional scaling for large single-cell datasets using neural networks
Stefan Canzar, Van Hoan, Slobodan Jelić, Sören Laue, Domagoj Matijević, Tomislav Prusina +5 more
TL;DR: A neural network-based approach to metric multidimensional scaling scales efficiently to large single-cell datasets and provides a non-linear embedding.
Abstract: Metric multidimensional scaling is one of the classical methods for embedding data into low-dimensional Euclidean space. It creates the low-dimensional embedding by approximately preserving the pairwise distances between the input points. However, current state-of-the-art approaches only scale to a few thousand data points. For larger data sets, such as those arising in single-cell RNA sequencing experiments, the running time becomes prohibitive, and alternative methods such as PCA are widely used instead. Here, we propose a simple neural network-based approach for solving the metric multidimensional scaling problem that is orders of magnitude faster than previous state-of-the-art approaches and hence scales to data sets with up to a few million cells. At the same time, it provides a non-linear mapping between high- and low-dimensional space that can place previously unseen cells in the same embedding.
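To make the objective described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes one common form of the metric MDS loss (raw stress, the sum of squared differences between input and embedding distances) and a hypothetical one-hidden-layer network `mlp_embed` with untrained random weights. The layer sizes, the `tanh` activation, and all function names are illustrative assumptions; training, which would minimize the stress over the weights, is omitted.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives from rounding

def mds_stress(X_high, Y_low):
    """Raw stress: sum over pairs i < j of (d_ij - ||y_i - y_j||)^2."""
    D = pairwise_distances(X_high)
    E = pairwise_distances(Y_low)
    iu = np.triu_indices(len(X_high), k=1)  # each unordered pair once
    return float(np.sum((D[iu] - E[iu]) ** 2))

def mlp_embed(X, W1, b1, W2, b2):
    """Hypothetical non-linear map from input space to the embedding."""
    return np.tanh(X @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))            # 50 toy "cells", 10 features
W1 = rng.normal(scale=0.1, size=(10, 16))
b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(16, 2))
b2 = np.zeros(2)

Y = mlp_embed(X, W1, b1, W2, b2)         # 2-D embedding of the toy data
stress_untrained = mds_stress(X, Y)      # the quantity training would reduce

# Because the embedding is a function of the input, points that were not
# part of the original data can be placed in the same space.
X_new = rng.normal(size=(5, 10))
Y_new = mlp_embed(X_new, W1, b1, W2, b2)
```

Because the embedding is produced by a parametric function rather than by optimizing point coordinates directly, the loss can in principle be evaluated on sampled pairs instead of the full distance matrix. That is one plausible reason such an approach can avoid quadratic cost, though the paper's actual training scheme is not described on this page.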
Figures

Table 1 Running times of SMACOF and our neural network based approach on data sets of different size 
Table 2 Average scores of ARI and NMI metrics across 5 real data sets 
Fig. 1 We use an open box example (a) in order to illustrate the power of nonlinear mapping, such as metric MDS (d), over the linear mapping, such as PCA (b) and projected metric MDS (c) 
Fig. 2 The loss of the metric MDS problem for different values of the target dimension for train and test data sets. The loss function is displayed on a logarithmic scale. Due to its quadratic running time, SMACOF was run only on the smallest USPS data set 
Fig. 3 Comparison of the loss function of the metric MDS problem for random projections (RP), PCA, and our neural network (NN) approach 
Fig. 4 Comparison of metric MDS and PCA
Citations
Orthogonal outlier detection and dimension estimation for improved MDS embedding of biological datasets
Wanxin Li,Jules Mirone,Ashok Prasad,Nina Miolane,Carine Legrand,Khanh Dao Duc +5 more
TL;DR: DeCOr-MDS effectively detects and removes orthogonal outliers from biological datasets, improving the accuracy of MDS embedding.