Journal Article, DOI: 10.3233/faia230333
Structured Sparse Multi-Task Learning with Generalized Group Lasso
Luhuan Fei, Mineichi Kudo, Keigo Kimura +2 more
pp 692-699
TL;DR: This paper proposes Generalized Group Lasso (GenGL) for structured sparse multi-task learning, introducing a linear operator for adaptable sparsity settings and hierarchical decomposition, and develops a novel framework (SSMTL) with efficient optimization for diverse architectures.
Abstract: Multi-task learning (MTL) improves generalization by sharing information among related tasks. Structured sparsity-inducing regularization has been widely used in MTL to learn interpretable and compact models, especially in high-dimensional settings. Although these methods have achieved much success in practice, key limitations remain: their generalization ability is limited by specific sparse constraints on parameters, they are usually restricted to matrix form, which ignores high-order feature interactions among tasks, and they are formulated in various forms that require different optimization algorithms. Inspired by Generalized Lasso, we propose the Generalized Group Lasso (GenGL) to overcome these limitations. In GenGL, a linear operator is introduced to make it adaptable to diverse sparsity settings and to help it handle hierarchical sparsity and multi-component decomposition in general tensor form, leading to enhanced flexibility and expressivity. Based on GenGL, we propose a novel framework for Structured Sparse MTL (SSMTL) that unifies a number of existing MTL methods, and we implement two new variants of it in shallow and deep architectures, respectively. An efficient optimization algorithm is developed to solve the unified problem, and its effectiveness is validated by synthetic and real-world experiments.
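The abstract does not state the GenGL penalty itself, so the following is only a minimal sketch of one plausible reading: the Generalized Lasso penalty applies an L1 norm to D·w for a linear operator D, the group lasso sums L2 norms over groups of coefficients, and combining the two gives a penalty of the form λ · Σ_g ‖(D·w)_g‖₂. This combined form, and the names `generalized_group_penalty`, `identity` and `first_difference`, are assumptions made for illustration and are not taken from the paper.

```python
import math


def apply_operator(D, w):
    """Multiply matrix D (a list of rows) by vector w."""
    return [sum(d * x for d, x in zip(row, w)) for row in D]


def generalized_group_penalty(D, w, groups, lam=1.0):
    """Assumed illustrative form: lam * sum_g ||(D w)_g||_2.

    `groups` is a list of index lists into the transformed vector D w.
    """
    z = apply_operator(D, w)
    return lam * sum(math.sqrt(sum(z[i] ** 2 for i in g)) for g in groups)


def identity(n):
    """Identity operator: with this D the penalty is the ordinary group lasso."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def first_difference(n):
    """(n-1) x n operator whose i-th row computes w[i+1] - w[i]."""
    return [[-1.0 if j == i else 1.0 if j == i + 1 else 0.0 for j in range(n)]
            for i in range(n - 1)]


w = [3.0, 4.0, 0.0, 0.0]

# D = identity, two groups of two coefficients: ordinary group lasso.
group_lasso_value = generalized_group_penalty(identity(4), w, [[0, 1], [2, 3]])

# D = first differences, singleton groups: a fused-lasso-style difference penalty.
fused_value = generalized_group_penalty(first_difference(4), w, [[0], [1], [2]])

print(group_lasso_value, fused_value)
```

Under this assumed form, changing only the operator D and the grouping switches between familiar penalties, which is the kind of adaptability to different sparsity settings the abstract attributes to the linear operator. The tensor-form and hierarchical variants mentioned in the abstract are not covered by this sketch.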
Citations
MAMO: Multi-Task Architecture Learning via Multi-Objective and Gradients Mediative Kernel
Yuzheng Tan, Guangneng Hu, Shuxin Zhang +2 more
TL;DR: MAMO proposes a novel multi-task architecture learning model via multi-objective optimization, addressing task interference by generating gradient mediative kernels and balancing tasks through Pareto optimal solutions, outperforming MTL baselines with effective model size.
References
Attention Is All You Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin
- 01 Jan 2017
Abstract: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
Tensor Decompositions and Applications
Tamara G. Kolda, Brett W. Bader
TL;DR: This survey provides an overview of higher-order tensor decompositions, their applications, and available software.
Model selection and estimation in regression with grouped variables
Ming Yuan, Yi Lin
TL;DR: In this paper, instead of selecting factors by stepwise backward elimination, the authors focus on the accuracy of estimation and consider extensions of the lasso, the LARS algorithm and the non-negative garrotte for factor selection.
Sparsity and smoothness via the fused lasso
TL;DR: The fused lasso is proposed, a generalization that is designed for problems with features that can be ordered in some meaningful way, and is especially useful when the number of features p is much greater than N, the sample size.
Posted Content
Recurrent Neural Network Regularization
TL;DR: This paper shows how to correctly apply dropout to LSTMs, and shows that it substantially reduces overfitting on a variety of tasks.