Metric multidimensional scaling for large single-cell datasets using neural networks

doi:10.1186/s13015-024-00265-3

Journal Article•10.1186/s13015-024-00265-3

Stefan Canzar, +5 more

- 11 Jun 2024

- Algorithms for Molecular Biology

- Vol. 19, Iss: 1

1

TL;DR: A simple neural network-based approach to metric multidimensional scaling scales to large single-cell datasets and provides a non-linear embedding.

Abstract: Metric multidimensional scaling is one of the classical methods for embedding data into low-dimensional Euclidean space. It creates the low-dimensional embedding by approximately preserving the pairwise distances between the input points. However, current state-of-the-art approaches only scale to a few thousand data points. For larger data sets such as those occurring in single-cell RNA sequencing experiments, the running time becomes prohibitively large and thus alternative methods such as PCA are widely used instead. Here, we propose a simple neural network-based approach for solving the metric multidimensional scaling problem that is orders of magnitude faster than previous state-of-the-art approaches, and hence scales to data sets with up to a few million cells. At the same time, it provides a non-linear mapping between high- and low-dimensional space that can place previously unseen cells in the same embedding.
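The idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the architecture or training procedure, so everything below is an assumption made for illustration: a one-hidden-layer tanh network, plain SGD on randomly sampled pairs, and the helper names `init_mlp`, `forward`, `pair_step`, `train`, and `stress`. It is not the authors' implementation. The network is trained so that distances between embedded points approximate distances between input points, and because the result is a function, a previously unseen point can be embedded by calling `forward` on it.

```python
import math
import random


def init_mlp(d_in, d_hidden, d_out, rng):
    """Random initial weights for a one-hidden-layer network (sizes are illustrative)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {"W1": mat(d_hidden, d_in), "b1": [0.0] * d_hidden,
            "W2": mat(d_out, d_hidden), "b2": [0.0] * d_out}


def forward(p, x):
    """Map one high-dimensional point x to (hidden activations, embedding)."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(p["W1"], p["b1"])]
    y = [sum(w * hj for w, hj in zip(row, h)) + b
         for row, b in zip(p["W2"], p["b2"])]
    return h, y


def pair_step(p, xa, xb, target, lr):
    """One SGD step on (||f(xa) - f(xb)|| - target)^2; returns the loss before the step."""
    ha, ya = forward(p, xa)
    hb, yb = forward(p, xb)
    diff = [u - v for u, v in zip(ya, yb)]
    dist = math.sqrt(sum(d * d for d in diff)) + 1e-12  # avoid division by zero
    err = dist - target
    g_a = [2.0 * err * d / dist for d in diff]  # dLoss/dya; dLoss/dyb is its negative
    # Backpropagate both points first, so every gradient uses the same weights.
    terms = []
    for x, h, g in ((xa, ha, g_a), (xb, hb, [-v for v in g_a])):
        g_h = [sum(p["W2"][k][j] * g[k] for k in range(len(g)))
               for j in range(len(h))]
        g_z = [g_h[j] * (1.0 - h[j] ** 2) for j in range(len(h))]  # tanh derivative
        terms.append((x, h, g, g_z))
    for x, h, g, g_z in terms:
        for k in range(len(g)):
            for j in range(len(h)):
                p["W2"][k][j] -= lr * g[k] * h[j]
            p["b2"][k] -= lr * g[k]
        for j in range(len(h)):
            for i in range(len(x)):
                p["W1"][j][i] -= lr * g_z[j] * x[i]
            p["b1"][j] -= lr * g_z[j]
    return err * err


def train(p, points, steps, lr, rng):
    """Fit the network on randomly sampled pairs instead of all n(n-1)/2 pairs."""
    for _ in range(steps):
        a, b = rng.sample(range(len(points)), 2)
        pair_step(p, points[a], points[b], math.dist(points[a], points[b]), lr)


def stress(p, points):
    """Mean squared gap between input and embedded distances over all pairs."""
    emb = [forward(p, x)[1] for x in points]
    n = len(points)
    gaps = [(math.dist(emb[i], emb[j]) - math.dist(points[i], points[j])) ** 2
            for i in range(n) for j in range(i + 1, n)]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    rng = random.Random(0)
    cells = [[rng.random() for _ in range(5)] for _ in range(30)]  # toy stand-in data
    net = init_mlp(5, 16, 2, rng)
    print("stress before:", stress(net, cells))
    train(net, cells, steps=5000, lr=0.02, rng=rng)
    print("stress after: ", stress(net, cells))
    print("unseen point ->", forward(net, [0.5] * 5)[1])
```

Sampling pairs keeps the cost of each update independent of the number of points, which is one plausible reason a network-based formulation could scale where all-pairs solvers do not. A practical version would use a deep learning framework with mini-batches rather than hand-written gradients.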

Figures

Table 1 Running times of SMACOF and our neural network based approach on data sets of different size

Table 2 Average scores of ARI and NMI metrics across 5 real data sets

Fig. 1 We use an open box example (a) in order to illustrate the power of nonlinear mapping, such as metric MDS (d), over the linear mapping, such as PCA (b) and projected metric MDS (c)

Fig. 2 The loss of the metric MDS problem for different values of the target dimension for train and test data sets. The loss function is displayed on a logarithmic scale. Due to its quadratic running time, SMACOF was run only on the smallest USPS data set

Fig. 3 Comparison of the loss function of the metric MDS problem for random projections (RP), PCA, and our neural network (NN) approach
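For orientation, the "loss of the metric MDS problem" named in the captions of Fig. 2 and Fig. 3 is usually written as the raw stress below. This is the textbook form, stated here as an assumption: the exact normalization or weighting used in the paper is not given on this page. The symbols $d_{ij}$ (distance between input points $i$ and $j$), $y_i$ (low-dimensional image of point $i$), and $n$ (number of points) are introduced here for illustration.

```latex
\sigma(y_1, \dots, y_n) \;=\; \sum_{1 \le i < j \le n} \bigl( d_{ij} - \lVert y_i - y_j \rVert \bigr)^2
```

The sum ranges over $n(n-1)/2$ pairs, so any method that evaluates every pair needs at least quadratic work per pass over the data, in line with the quadratic running time the Fig. 2 caption attributes to SMACOF.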

Citations

Journal Article•10.3389/fbinf.2023.1211819

Orthogonal outlier detection and dimension estimation for improved MDS embedding of biological datasets

Wanxin Li, +5 more

- 10 Aug 2023

- Frontiers in bioinformatics

TL;DR: DeCOr-MDS effectively detects and removes orthogonal outliers from biological datasets, improving the accuracy of MDS embedding.

References

Journal Article•10.1126/SCIENCE.1127647

Reducing the Dimensionality of Data with Neural Networks

Geoffrey E. Hinton, +1 more

- 28 Jul 2006

- Science

TL;DR: In this article, an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data is described.

20.9K

Journal Article•10.1126/SCIENCE.290.5500.2319

A global geometric framework for nonlinear dimensionality reduction.

Joshua B. Tenenbaum, +2 more

- 22 Dec 2000

- Science

TL;DR: An approach to solving dimensionality reduction problems that uses easily measured local metric information to learn the underlying global geometry of a data set and efficiently computes a globally optimal solution, and is guaranteed to converge asymptotically to the true structure.

15.9K

Journal Article•10.1007/BF02551274

Approximation by superpositions of a sigmoidal function

George Cybenko

- 01 Dec 1989

- Mathematics of Control, Signals, and Sys...

TL;DR: It is demonstrated that finite linear combinations of compositions of a fixed, univariate function and a set of affine functionals can uniformly approximate any continuous function of n real variables with support in the unit hypercube.

14.4K

Journal Article•10.1038/NBT.4096

Integrating single-cell transcriptomic data across different conditions, technologies, and species.

Andrew Butler, +4 more

- 02 Apr 2018

- Nature Biotechnology

TL;DR: An analytical strategy for integrating scRNA-seq data sets based on common sources of variation is introduced, enabling the identification of shared populations across data sets and downstream comparative analysis.

11.9K

Journal Article•10.1007/BF02289565

Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis

Joseph B. Kruskal

- 01 Mar 1964

- Psychometrika

TL;DR: The fundamental hypothesis is that dissimilarities and distances are monotonically related, and a quantitative, intuitively satisfying measure of goodness of fit is defined to this hypothesis.

7.6K
