Sparse Learning Under Regularization Framework

Open AccessBook

Sparse Learning Under Regularization Framework

- 15 Apr 2011

14

TL;DR: This thesis develops a novel online learning framework to solve group lasso and multi-task feature selection and proposes a generalized MKL (GMKL) model by introducing an elastic net-type constraint on the kernel weights to seek the optimal kernel combination weights.

Abstract: Regularization is a dominant theme in machine learning and statistics due to its prominent ability in providing an intuitive and principled tool for learning from high-dimensional data. As large-scale learning applications become popular, developing efficient algorithms and parsimonious models become promising and necessary for these applications. Aiming at solving large-scale learning problems, this thesis tackles the key research problems ranging from feature selection to learning with unlabeled data and learning data similarity representation. More specifically, we focus on the problems in three areas: online learning, semi-supervised learning, and multiple kernel learning. The first part of this thesis develops a novel online learning framework to solve group lasso and multi-task feature selection. To the best our knowledge, the proposed online learning framework is the first framework for the corresponding models. The main advantages of the online learning algorithms are that (1) they can work on the applications where training data appear sequentially; consequently, the training procedure can be started at any time; (2) they can handle data up to any size with any number of features. The efficiency of the algorithms is attained because we derive closed-form solutions to update the weights of the corresponding models. At each iteration, the online learning algorithms just need O (d) time complexity and memory cost for group lasso, while they need O (d × Q) for multi-task feature selection, where d is the number of dimensions and Q is the number of tasks. Moreover, we provide theoretical analysis for the average regret of the online learning algorithms, which also guarantees the convergence rate of the algorithms. In addition, we extend the online learning framework to solve several related models which yield more sparse solutions. The second part of this thesis addresses a general scenario of semi-supervised learning for the binary classification problern, where the unlabeled data may be a mixture of relevant and irrelevant data to the target binary classification task. Without specifying the relatedness in the unlabeled data, we develop a novel maximum margin classifier, named the tri-class support vector machine (3C-SVM), to seek an inductive rule that can separate these data into three categories: –1, +1, or 0. This is achieved by adopting a novel min loss function and following the maximum entropy principle. For the implementation, we approximate the problem and solve it by a standard concaveconvex procedure (CCCP). The approach is very efficient and it is possible to solve large-scale datasets. The third part of this thesis focuses on multiple kernel learning (MKL) to solve the insufficiency of the L1-MKL and the Lp-MKL models. Hence, we propose a generalized MKL (GMKL) model by introducing an elastic net-type constraint on the kernel weights. More specifically, it is an MKL model with a constraint on a linear combination of the L1-norm and the square of the L2-norm on the kernel weights to seek the optimal kernel combination weights. Therefore, previous MKL problems based on the L1-norm or the L2-norm constraints can be regarded as its special cases. Moreover, our GMKL enjoys the favorable sparsity property on the solution and also facilitates the grouping effect. In addition, the optimization of our GMKL is a convex optimization problem, where a local solution is the globally optimal solution. We further derive the level method to efficiently solve the optimization problem.

Chat with Paper

AI Agents for this Paper

Find similar papers on Google Scholar, PubMed and Arxiv
Write a critical review of this paper
Analyze citations of this paper to find unaddressed research gaps

Citations

Journal Article•10.1145/242224.242229

Machine learning

Thomas G. Dietterich

- 01 Dec 1996

- ACM Computing Surveys

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.

...read moreread less

14K

•Proceedings Article

Robot vision

Y.J. Tejwani

- 01 Jan 1989

TL;DR: A scheme is developed for classifying the types of motion perceived by a humanlike robot and equations, theorems, concepts, clues, etc., relating the objects, their positions, and their motion to their images on the focal plane are presented.

...read moreread less

2K

•Proceedings Article

Fused matrix factorization with geographical and social influence in location-based social networks

Chen Cheng, +3 more

- 22 Jul 2012

TL;DR: This paper is the first to fuse MF with geographical and social influence for POI recommendation in LBSNs via modeling the probability of a user's check-in on a location as a Multicenter Gaussian Model (MGM) and fuse the geographical influence into a generalized matrix factorization framework.

...read moreread less

661

Solution of ill-posed problems by means of truncated SVD (Singular Value Decomposition)

Per Christian Hansen

- 01 Jun 1988

TL;DR: In this article, the authors considered truncated SVD (TSVD) solutions to ill-posed least squares problems involving matrices with well-determined as well as ill-defined numerical rank.

...read moreread less

62

Journal Article•10.1109/TNNLS.2016.2610465

Online Nonlinear AUC Maximization for Imbalanced Data Sets

Junjie Hu, +4 more

- 01 Apr 2018

- IEEE Transactions on Neural Networks

TL;DR: The kernelized online imbalanced learning (KOIL) algorithm is proposed, which produces a nonlinear classifier for the data by maximizing the AUC score while minimizing a functional regularizer.

...read moreread less

58

References

Journal Article•10.1109/5.726791

Gradient-based learning applied to document recognition

Yann LeCun, +6 more

- 01 Jan 1998

TL;DR: In this article, a graph transformer network (GTN) is proposed for handwritten character recognition, which can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters.

...read moreread less

53.5K

Journal Article•10.1145/1961189.1961199

LIBSVM: A library for support vector machines

Chih-Chung Chang, +1 more

- 06 May 2011

- ACM Transactions on Intelligent Systems ...

TL;DR: Issues such as solving SVM optimization problems theoretical convergence multiclass classification probability estimates and parameter selection are discussed in detail.

...read moreread less

46.3K

•Book

The Nature of Statistical Learning Theory

Vladimir Vapnik

- 01 Jan 1995

TL;DR: Setting of the learning problem consistency of learning processes bounds on the rate of convergence ofLearning processes controlling the generalization ability of learning process constructing learning algorithms what is important in learning theory?

...read moreread less

46K

Journal Article•10.1111/J.2517-6161.1996.TB02080.X

Regression Shrinkage and Selection via the Lasso

Robert Tibshirani

- 01 Jan 1996

- Journal of the royal statistical society...

TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.

...read moreread less

45.4K

Journal Article•10.2307/2532419

Applied Logistic Regression.

A. J. Scott, +2 more

- 01 Dec 1991

- Biometrics

TL;DR: Applied Logistic Regression, Third Edition provides an easily accessible introduction to the logistic regression model and highlights the power of this model by examining the relationship between a dichotomous outcome and a set of covariables.

...read moreread less

40.1K