Structured Sparse Multi-Task Learning with Generalized Group Lasso

doi:10.3233/faia230333

Luhuan Fei, +2 more

pp 692-699

TL;DR: This paper proposes Generalized Group Lasso (GenGL) for structured sparse multi-task learning, introducing a linear operator for adaptable sparsity settings and hierarchical decomposition, and develops a novel framework (SSMTL) with efficient optimization for diverse architectures.

Abstract: Multi-task learning (MTL) improves generalization by sharing information among related tasks. Structured sparsity-inducing regularization has been widely used in MTL to learn interpretable and compact models, especially in high-dimensional settings. These methods have achieved much success in practice, but key limitations remain: limited generalization ability due to specific sparse constraints on parameters, a restriction to matrix form that ignores high-order feature interactions among tasks, and formulations that differ from method to method, each with its own optimization algorithm. Inspired by Generalized Lasso, we propose the Generalized Group Lasso (GenGL) to overcome these limitations. GenGL introduces a linear operator that makes it adaptable to diverse sparsity settings and lets it handle hierarchical sparsity and multi-component decomposition in general tensor form, leading to enhanced flexibility and expressivity. Based on GenGL, we propose a novel framework for Structured Sparse MTL (SSMTL) that unifies a number of existing MTL methods, and we implement two new variants of it, one in a shallow architecture and one in a deep architecture. An efficient optimization algorithm is developed to solve the unified problem, and its effectiveness is validated by synthetic and real-world experiments.
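
The role of the linear operator can be illustrated with a small sketch. The abstract does not give the GenGL formula, so the penalty below is an assumption made by analogy with Generalized Lasso (which penalizes the l1 norm of D w): it sums the Euclidean norms of groups of the transformed vector D w. The function names, the `groups` argument, and the vector (rather than tensor) setting are all hypothetical and are not taken from the paper.

```python
import numpy as np


def gen_group_lasso_penalty(w, D, groups, weights=None):
    """Assumed penalty: sum over groups g of weight_g * ||(D @ w)[g]||_2.

    This is an illustrative form, not the paper's definition.
    """
    z = D @ w  # apply the linear operator before grouping
    if weights is None:
        weights = [1.0] * len(groups)
    return float(
        sum(a * np.linalg.norm(z[np.asarray(g)]) for a, g in zip(weights, groups))
    )


def first_difference(n):
    """(n-1) x n matrix whose i-th row computes w[i+1] - w[i]."""
    return np.diff(np.eye(n), axis=0)


w = np.array([3.0, 4.0, 0.0, 0.0])

# D = identity with groups {0, 1} and {2, 3}: the ordinary group lasso penalty.
group_pen = gen_group_lasso_penalty(w, np.eye(4), [[0, 1], [2, 3]])

# D = first differences with singleton groups: a fused-lasso-style penalty
# on neighbouring coefficients.
fused_pen = gen_group_lasso_penalty(
    w, first_difference(4), [[i] for i in range(3)]
)

print(group_pen, fused_pen)
```

Under this assumed form, changing only D and the grouping switches between the group lasso and a fused-lasso-style difference penalty, which is the kind of adaptability to different sparsity settings that the abstract attributes to the linear operator.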


Citations

Journal Article•10.3233/faia240497

MAMO: Multi-Task Architecture Learning via Multi-Objective and Gradients Mediative Kernel

Yuzheng Tan, +2 more

- 16 Oct 2024

- Frontiers in artificial intelligence and...

TL;DR: MAMO proposes a novel multi-task architecture learning model via multi-objective optimization, addressing task interference by generating gradient mediative kernels and balancing tasks through Pareto optimal solutions, outperforming MTL baselines with effective model size.

References

Preprint•10.48550/arxiv.1706.03762

Attention Is All You Need

Ashish Vaswani, +7 more

- 01 Jan 2017

Abstract: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.

51.8K

Journal Article•10.1137/07070111X

Tensor Decompositions and Applications

Tamara G. Kolda, +1 more

- 01 Aug 2009

- SIAM Review

TL;DR: This survey provides an overview of higher-order tensor decompositions, their applications, and available software.

11.5K

Journal Article•10.1111/J.1467-9868.2005.00532.X

Model selection and estimation in regression with grouped variables

Ming Yuan, +1 more

- 01 Feb 2006

- Journal of The Royal Statistical Society...

TL;DR: In this paper, instead of selecting factors by stepwise backward elimination, the authors focus on the accuracy of estimation and consider extensions of the lasso, the LARS algorithm and the non-negative garrotte for factor selection.

8.8K

Journal Article•10.1111/J.1467-9868.2005.00490.X

Sparsity and smoothness via the fused lasso

Robert Tibshirani, +4 more

- 01 Feb 2005

- Journal of The Royal Statistical Society...

TL;DR: The fused lasso is proposed, a generalization that is designed for problems with features that can be ordered in some meaningful way, and is especially useful when the number of features p is much greater than N, the sample size.

3.1K