Sparse solutions for linear prediction problems

Open Access

- 01 Jan 2006

37

TL;DR: This thesis is concerned with finding linear identities among time series, and asking how to bound the generalization error by using sparse vectors as hypotheses in the machine learning versions of these problems.

Abstract: The simplicity of an idea has long been regarded as a sign of elegance and, when shown to coincide with accuracy, a hallmark of profundity. In this thesis our ideas are vectors used as predictors, and sparsity is our measure of simplicity. A vector is sparse when it has few nonzero elements. We begin by asking the question: given a matrix of n time series (vectors which evolve in a "sliding" manner over time) as columns, what are the simplest linear identities among them? Under basic learning assumptions, we justify that such simple identities are likely to persist in the future. It is easily seen that our question is akin to finding sparse vectors in the null space of this matrix. Hence we are confronted with the problem of finding an optimally sparse basis for any vector space. This is a computationally challenging problem with many promising applications, such as iterative numerical optimization, fast dimensionality reduction, graph algorithms on cycle spaces, and of course the time series work of this thesis. In part I, we give a brief exposition of the questions to be addressed here: finding linear identities among time series, and asking how we may bound the generalization error by using sparse vectors as hypotheses in the machine learning versions of these problems. In part II, we focus on the theoretical justification for maximizing sparsity as a means of learning or prediction. We'll look at sample compression schemes as a means of correlating sparsity with the capacity of a hypothesis set, as well as examining learning error bounds which support sparsity. Finally, in part III, we'll illustrate an increasingly sophisticated toolkit of incremental algorithms for discovering sparse patterns among evolving time series.
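As a rough illustration of the abstract's central question, the sketch below treats a few toy time series as the columns of a matrix and reads linear identities off a null-space basis. It is a minimal sketch, not the thesis's method: the data, the helper name `null_space_basis`, and the use of exact Gaussian elimination are all assumptions made for illustration. A basis obtained from the reduced row echelon form is not guaranteed to be optimally sparse, which is part of why the abstract calls the general problem computationally challenging.

```python
from fractions import Fraction


def null_space_basis(rows):
    """Return a null-space basis of a matrix (given as a list of rows).

    Hypothetical helper: exact Gauss-Jordan elimination over the rationals.
    The resulting basis is a baseline only, not an optimally sparse one.
    """
    m = [[Fraction(v) for v in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    pivots = []  # pivot column of each reduced row, in order
    r = 0
    for c in range(n_cols):
        # Look for a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot here, so column c is a free column
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [v / pivot for v in m[r]]
        for i in range(n_rows):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == n_rows:
            break
    free = [c for c in range(n_cols) if c not in pivots]
    basis = []
    for f in free:
        # Set one free variable to 1 and solve for the pivot variables.
        v = [Fraction(0)] * n_cols
        v[f] = Fraction(1)
        for row_idx, pc in enumerate(pivots):
            v[pc] = -m[row_idx][f]
        basis.append(v)
    return basis


# Toy data (assumed): rows are time steps, columns are series x1..x5,
# constructed so that x2 = 2*x1 and x5 = x3 + x4.
X = [
    [1, 2, 1, 0, 1],
    [2, 4, 0, 1, 1],
    [3, 6, 1, 0, 1],
    [4, 8, 0, 1, 1],
]

basis = null_space_basis(X)
# Sort so that the sparsest identities (fewest nonzero entries) come first.
basis.sort(key=lambda v: sum(1 for x in v if x != 0))
for v in basis:
    print([int(x) for x in v])
```

Each printed vector holds the coefficients of one linear identity among the columns. For this toy matrix, the vector with two nonzero entries encodes x2 = 2*x1 and the one with three encodes x5 = x3 + x4.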

Citations

Proceedings Article•10.1145/1553374.1553417

Gradient descent with sparsification: an iterative algorithm for sparse recovery with restricted isometry property

Rahul Garg, +1 more

- 14 Jun 2009

TL;DR: The Matlab implementation of GraDeS (Gradient Descent with Sparsification) outperforms previously proposed algorithms like Subspace Pursuit, StOMP, OMP, and Lasso by an order of magnitude. The experiments also uncovered cases where L1-regularized regression (Lasso) fails but GraDeS finds the correct solution.

248

Journal Article•10.1088/1361-6579/AA7623

Combining sparse coding and time-domain features for heart sound classification

Bradley M. Whitaker, +5 more

- 31 Jul 2017

- Physiological Measurement

TL;DR: The results show that sparse coding is an effective way to define spectral features of the cardiac cycle and its sub-cycles for the purpose of classification and can be combined with additional feature extraction methods to improve classification accuracy.

112

Journal Article•10.5555/2188385.2343686

A geometric approach to sample compression

Benjamin I. P. Rubinstein, +1 more

- 01 Jan 2012

- Journal of Machine Learning Research

TL;DR: The sample compression conjecture of Littlestone & Warmuth has remained unsolved for a quarter century. Two promising ways forward are: embedding maximal classes into maximum classes with at most a polynomial increase to VC dimension, and compression via operating on geometric representations.

53

Posted Content

Supersparse Linear Integer Models for Interpretable Classification

Berk Ustun, +2 more

- 27 Jun 2013

- arXiv: Machine Learning

TL;DR: A Supersparse Linear Integer Model (SLIM) is an off-the-shelf tool for creating scoring systems that are both accurate and interpretable. It is formulated as a discrete optimization problem that minimizes the 0-1 loss to encourage a high level of accuracy.

44

Journal Article•10.1007/S00453-015-0042-6

Matrix Sparsification and the Sparse Null Space Problem

Lee-Ad Gottlieb, +1 more

- 01 Oct 2016

- Algorithmica

TL;DR: The authors revisit the matrix problems of sparse null space and matrix sparsification, show that they are equivalent, and give a powerful tool to extend algorithms and heuristics for sparse approximation theory to these problems.

38

References

Statistical learning theory

Vladimir Vapnik

- 01 Jan 1998

TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimation from small data pools, the application of these estimates to real-life problems, and much more.

30.4K

Journal Article•10.1023/B:STCO.0000035301.49549.88

A tutorial on support vector regression

Alexander J. Smola, +1 more

- 01 Aug 2004

- Statistics and Computing

TL;DR: This tutorial gives an overview of the basic ideas underlying Support Vector (SV) machines for function estimation, and includes a summary of currently used algorithms for training SV machines, covering both the quadratic programming part and advanced methods for dealing with large datasets.

13K

Journal Article•10.1137/S1064827596304010

Atomic Decomposition by Basis Pursuit

Scott Chen, +2 more

- 11 Dec 1998

- SIAM Journal on Scientific Computing

TL;DR: Basis Pursuit (BP) is a principle for decomposing a signal into an "optimal" superposition of dictionary elements, where optimal means having the smallest l1 norm of coefficients among all such decompositions.

11.3K

Journal Article•10.1109/TIT.2005.858979

Decoding by linear programming

Emmanuel J. Candès, +1 more

- 01 Dec 2005

- IEEE Transactions on Information Theory

TL;DR: The input vector f can be recovered exactly by solving a simple convex optimization problem (which one can recast as a linear program). Numerical experiments suggest that this recovery procedure works unreasonably well: f is recovered exactly even in situations where a significant fraction of the output is corrupted.

7.8K

Posted Content

Decoding by Linear Programming

Emmanuel J. Candès, +1 more

- 15 Feb 2005

- arXiv: Metric Geometry

TL;DR: In this paper, it was shown that under suitable conditions on the coding matrix, the input vector can be recovered exactly by solving a simple convex optimization problem (which one can recast as a linear program).

6.8K