Interpolative decomposition

Papers published on a yearly basis

Papers

Journal Article•10.1073/PNAS.0803205106•

CUR matrix decompositions for improved data analysis

Michael W. Mahoney¹, Petros Drineas²•Institutions (2)

Stanford University¹, Rensselaer Polytechnic Institute²

20 Jan 2009-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: An algorithm is presented that preferentially chooses columns and rows that exhibit high “statistical leverage” and exert a disproportionately large “influence” on the best low-rank fit of the data matrix, obtaining improved relative-error and constant-factor approximation guarantees in worst-case analysis, as opposed to the much coarser additive-error guarantees of prior work.

Abstract: Principal components analysis and, more generally, the Singular Value Decomposition are fundamental data analysis tools that express a data matrix in terms of a sequence of orthogonal or uncorrelated vectors of decreasing importance. Unfortunately, being linear combinations of up to all the data points, these vectors are notoriously difficult to interpret in terms of the data and processes generating the data. In this article, we develop CUR matrix decompositions for improved data analysis. CUR decompositions are low-rank matrix decompositions that are explicitly expressed in terms of a small number of actual columns and/or actual rows of the data matrix. Because they are constructed from actual data elements, CUR decompositions are interpretable by practitioners of the field from which the data are drawn (to the extent that the original data are). We present an algorithm that preferentially chooses columns and rows that exhibit high “statistical leverage” and, thus, in a very precise statistical sense, exert a disproportionately large “influence” on the best low-rank fit of the data matrix. By selecting columns and rows in this manner, we obtain improved relative-error and constant-factor approximation guarantees in worst-case analysis, as opposed to the much coarser additive-error guarantees of prior work. In addition, since the construction involves computing quantities with a natural and widely studied statistical interpretation, we can leverage ideas from diagnostic regression analysis to employ these matrix decompositions for exploratory data analysis.
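The "statistical leverage" idea in the abstract above can be illustrated with a short sketch. It assumes the usual definition of normalized leverage scores (squared Euclidean norms of the columns of the top-k right singular vectors, divided by k) and uses a simplified sampling step without replacement; the function names and the sampling rule are illustrative choices, not necessarily the procedure used in the paper.

```python
import numpy as np

def leverage_scores(A, k):
    """Normalized column leverage scores of A relative to its top-k right singular subspace."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    # Rows of Vt[:k] are orthonormal, so the squared entries sum to k;
    # dividing by k turns the scores into a probability distribution.
    return np.sum(Vt[:k] ** 2, axis=0) / k

def sample_columns(A, k, c, rng):
    """Pick c distinct column indices, favoring high-leverage columns (simplified rule)."""
    p = leverage_scores(A, k)
    idx = rng.choice(A.shape[1], size=c, replace=False, p=p)
    return np.sort(idx)

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 8))
cols = sample_columns(A, k=3, c=4, rng=rng)
C = A[:, cols]  # actual columns of A, hence directly interpretable
```

Because C consists of actual columns of A, each retained column can be traced back to a feature of the original data.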

1,050 citations

Journal Article•10.1073/PNAS.0709640104•

Randomized algorithms for the low-rank approximation of matrices

Edo Liberty¹, Franco Woolfe¹, Per-Gunnar Martinsson², Vladimir Rokhlin, Mark Tygert¹•Institutions (2)

Yale University¹, University of Colorado Boulder²

18 Dec 2007-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: Two recently proposed randomized algorithms for the construction of low-rank approximations to matrices are described and shown to be considerably more efficient and reliable than the classical (deterministic) ones; they also parallelize naturally.

Abstract: We describe two recently proposed randomized algorithms for the construction of low-rank approximations to matrices, and demonstrate their application (inter alia) to the evaluation of the singular value decompositions of numerically low-rank matrices. Being probabilistic, the schemes described here have a finite probability of failure; in most cases, this probability is rather negligible ($10^{-17}$ is a typical value). In many situations, the new procedures are considerably more efficient and reliable than the classical (deterministic) ones; they also parallelize naturally. We present several numerical examples to illustrate the performance of the schemes.
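The abstract above does not spell out the two schemes, so the following is only a generic sketch of the randomized approach to low-rank SVD: project the matrix onto a random Gaussian test matrix, orthonormalize, and take the SVD of the resulting small matrix. The Gaussian test matrix, the `oversample` parameter, and the function name are assumptions chosen for illustration and may differ from the algorithms in the paper.

```python
import numpy as np

def randomized_svd(A, k, oversample=10, rng=None):
    """Approximate rank-k SVD of A via a random Gaussian projection (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[1]
    l = min(n, k + oversample)            # a few extra samples improve reliability
    Omega = rng.standard_normal((n, l))   # random test matrix
    Q, _ = np.linalg.qr(A @ Omega)        # orthonormal basis for the sampled range of A
    B = Q.T @ A                           # small l x n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))  # exactly rank 3
U, s, Vt = randomized_svd(A, k=3, rng=rng)
rel_err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
```

The expensive products `A @ Omega` and `Q.T @ A` are plain matrix multiplications, which is one reason schemes of this kind parallelize naturally.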

708 citations

Journal Article•10.1145/1039488.1039494•

Fast Monte-Carlo algorithms for finding low-rank approximations

Alan Frieze¹, Ravi Kannan², Santosh Vempala³•Institutions (3)

Carnegie Mellon University¹, Yale University², Massachusetts Institute of Technology³

01 Nov 2004-Journal of the ACM

TL;DR: An algorithm is developed that is qualitatively faster, provided the authors may sample the entries of the matrix in accordance with a natural probability distribution, and implies that in constant time, it can be determined if a given matrix of arbitrary size has a good low-rank approximation.

Abstract: We consider the problem of approximating a given m × n matrix A by another matrix of specified rank k, which is smaller than m and n. The Singular Value Decomposition (SVD) can be used to find the "best" such approximation. However, it takes time polynomial in m, n which is prohibitive for some modern applications. In this article, we develop an algorithm that is qualitatively faster, provided we may sample the entries of the matrix in accordance with a natural probability distribution. In many applications, such sampling can be done efficiently. Our main result is a randomized algorithm to find the description of a matrix D* of rank at most k so that $\|A - D^*\|_F^2 \leq \min_{D,\,\mathrm{rank}(D)\leq k} \|A - D\|_F^2 + \epsilon\,\|A\|_F^2$ holds with probability at least 1 − δ (where $\|\cdot\|_F$ is the Frobenius norm). The algorithm takes time polynomial in k, 1/ε, log(1/δ) only and is independent of m and n. In particular, this implies that in constant time, it can be determined if a given matrix of arbitrary size has a good low-rank approximation.
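One common reading of the "natural probability distribution" in the abstract above is length-squared sampling, where a column is drawn with probability proportional to its squared Euclidean norm. The sketch below illustrates that idea under this assumption. It is a simplified variant: it forms the full product with A at the end, so it does not achieve the running time independent of m and n claimed in the abstract, and the function name is hypothetical.

```python
import numpy as np

def length_squared_lowrank(A, k, s, rng):
    """Rank-<=k approximation of A built from s columns sampled by squared norm (sketch)."""
    col_norms_sq = np.sum(A ** 2, axis=0)
    p = col_norms_sq / col_norms_sq.sum()            # sampling distribution over columns
    idx = rng.choice(A.shape[1], size=s, replace=True, p=p)
    # Rescale so that C @ C.T is an unbiased estimate of A @ A.T.
    C = A[:, idx] / np.sqrt(s * p[idx])
    U, _, _ = np.linalg.svd(C, full_matrices=False)
    H = U[:, :k]                                     # approximate top-k left singular vectors
    return H @ (H.T @ A)                             # projection of A onto span(H)

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 200))  # exactly rank 2
D = length_squared_lowrank(A, k=2, s=20, rng=rng)
rel_err = np.linalg.norm(A - D) / np.linalg.norm(A)
```

Only the s sampled columns enter the SVD, so the costly factorization is performed on a matrix whose width does not grow with n.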

657 citations

Journal Article•10.1016/S0024-3795(96)00301-1•

A Theory of Pseudoskeleton Approximations

S. A. Goreinov¹, Eugene E. Tyrtyshnikov¹, N. L. Zamarashkin¹•Institutions (1)

Russian Academy of Sciences¹

01 Aug 1997-Linear Algebra and its Applications

TL;DR: For an m × n matrix A that can be approximated by a rank-r matrix with accuracy ε, the authors prove that it is possible to choose r columns and r rows of A forming a pseudoskeleton component which approximates A with $O(\epsilon\sqrt{r}(\sqrt{m}+\sqrt{n}))$ accuracy in the sense of the 2-norm.
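A pseudoskeleton approximation is built from a set of columns C of the matrix, a set of rows R, and the submatrix G at their intersection, combined as C G⁻¹ R. The sketch below shows this construction for caller-supplied index sets. How to choose good rows and columns is the subject of the paper and is not addressed here; the function name and the example indices are hypothetical.

```python
import numpy as np

def pseudoskeleton(A, rows, cols):
    """Skeleton approximation C @ inv(G) @ R from chosen rows and columns of A (sketch)."""
    C = A[:, cols]                 # selected columns
    R = A[rows, :]                 # selected rows
    G = A[np.ix_(rows, cols)]      # intersection submatrix, assumed square and nonsingular
    return C @ np.linalg.solve(G, R)

rng = np.random.default_rng(2)
A = rng.standard_normal((12, 3)) @ rng.standard_normal((3, 10))  # exactly rank 3
approx = pseudoskeleton(A, rows=[0, 1, 2], cols=[0, 1, 2])
max_abs_err = np.max(np.abs(A - approx))
```

For a matrix of exact rank r and a nonsingular r × r intersection, the construction reproduces the matrix up to rounding; for a matrix that is only approximately rank r, the quality depends strongly on which rows and columns are chosen.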

587 citations

Journal Article•10.1137/07070471X•

Relative-Error $CUR$ Matrix Decompositions

Petros Drineas, Michael W. Mahoney¹, S. Muthukrishnan²•Institutions (2)

Yahoo!¹, Google²

01 May 2008-SIAM Journal on Matrix Analysis and Applications

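TL;DR: Two randomized algorithms based on a sampling method called subspace sampling are presented; they produce low-rank approximations expressed in terms of actual columns and/or rows of the data matrix, with relative-error guarantees. Previously, in some cases, it was not even known whether such matrix decompositions exist.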

Abstract: Many data analysis applications deal with large matrices and involve approximating the matrix using a small number of “components.” Typically, these components are linear combinations of the rows and columns of the matrix, and are thus difficult to interpret in terms of the original features of the input data. In this paper, we propose and study matrix approximations that are explicitly expressed in terms of a small number of columns and/or rows of the data matrix, and thereby more amenable to interpretation in terms of the original data. Our main algorithmic results are two randomized algorithms which take as input an $m\times n$ matrix $A$ and a rank parameter $k$. In our first algorithm, $C$ is chosen, and we let $A'=CC^+A$, where $C^+$ is the Moore-Penrose generalized inverse of $C$. In our second algorithm $C$, $U$, $R$ are chosen, and we let $A'=CUR$. ($C$ and $R$ are matrices that consist of actual columns and rows, respectively, of $A$, and $U$ is a generalized inverse of their intersection.) For each algorithm, we show that with probability at least $1-\delta$, $\|A-A'\|_F\leq(1+\epsilon)\,\|A-A_k\|_F$, where $A_k$ is the “best” rank-$k$ approximation provided by truncating the SVD of $A$, and where $\|X\|_F$ is the Frobenius norm of the matrix $X$. The number of columns of $C$ and rows of $R$ is a low-degree polynomial in $k$, $1/\epsilon$, and $\log(1/\delta)$. Both the Numerical Linear Algebra community and the Theoretical Computer Science community have studied variants of these matrix decompositions over the last ten years. However, our two algorithms are the first polynomial time algorithms for such low-rank matrix approximations that come with relative-error guarantees; previously, in some cases, it was not even known whether such matrix decompositions exist. Both of our algorithms are simple and they take time of the order needed to approximately compute the top $k$ singular vectors of $A$. 
The technical crux of our analysis is a novel, intuitive sampling method we introduce in this paper called “subspace sampling.” In subspace sampling, the sampling probabilities depend on the Euclidean norms of the rows of the top singular vectors. This allows us to obtain provable relative-error guarantees by deconvoluting “subspace” information and “size-of-$A$” information in the input matrix. This technique is likely to be useful for other matrix approximation and data analysis problems.
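The column-based approximation $A'=CC^+A$ and the relative-error criterion from the abstract above can be checked numerically with a small sketch. The column indices here are supplied by hand rather than drawn by subspace sampling, and the helper names are hypothetical; the code only measures the ratio $\|A-A'\|_F / \|A-A_k\|_F$ that the guarantee bounds by $1+\epsilon$.

```python
import numpy as np

def column_projection(A, cols):
    """A' = C C^+ A, the projection of A onto the span of the chosen columns."""
    C = A[:, cols]
    return C @ (np.linalg.pinv(C) @ A)

def best_rank_k(A, k):
    """A_k, the best rank-k approximation, obtained by truncating the SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

def error_ratio(A, cols, k):
    """Ratio ||A - C C^+ A||_F / ||A - A_k||_F for the given columns."""
    num = np.linalg.norm(A - column_projection(A, cols), "fro")
    den = np.linalg.norm(A - best_rank_k(A, k), "fro")
    return num / den

rng = np.random.default_rng(3)
A = rng.standard_normal((30, 4)) @ rng.standard_normal((4, 15))
A += 0.01 * rng.standard_normal((30, 15))          # low rank plus small noise
ratio = error_ratio(A, cols=[0, 3, 7, 11], k=4)    # hand-picked columns, for illustration
```

With exactly k columns the ratio cannot fall below 1, since $A_k$ is optimal among rank-$k$ matrices; a ratio close to 1 indicates that the chosen columns capture nearly as much as the top k singular vectors.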

502 citations

Year	Papers
2021	9
2020	10
2019	7
2018	6
2017	7
2016	10
