Journal Article, DOI: 10.48550/arXiv.2306.11967
Complementary Learning Subnetworks for Parameter-Efficient Class-Incremental Learning
TL;DR: This paper proposes a rehearsal-free class-incremental learning (CIL) approach that learns continually via the synergy between two complementary learning subnetworks, jointly optimizing a CNN feature extractor and an analytical feed-forward classifier.
Abstract: In the scenario of class-incremental learning (CIL), deep neural networks have to adapt their model parameters to non-stationary data distributions, e.g., the emergence of new classes over time. However, CIL models are challenged by the well-known catastrophic forgetting phenomenon. Typical methods such as rehearsal-based ones rely on storing exemplars of old classes to mitigate catastrophic forgetting, which limits real-world applications considering memory resources and privacy issues. In this paper, we propose a novel rehearsal-free CIL approach that learns continually via the synergy between two Complementary Learning Subnetworks. Our approach involves jointly optimizing a plastic CNN feature extractor and an analytical feed-forward classifier. The inaccessibility of historical data is tackled by holistically controlling the parameters of a well-trained model, ensuring that the decision boundary learned fits new classes while retaining recognition of previously learned classes. Specifically, the trainable CNN feature extractor provides task-dependent knowledge separately without interference; and the final classifier integrates task-specific knowledge incrementally for decision-making without forgetting. In each CIL session, it accommodates new tasks by attaching a tiny set of declarative parameters to its backbone, in which only one matrix per task or one vector per class is kept for knowledge retention. Extensive experiments on a variety of task sequences show that our method achieves competitive results against state-of-the-art methods, especially in accuracy gain, memory cost, training efficiency, and task-order robustness. Furthermore, to make the non-growing backbone (i.e., a model with limited network capacity) suffice to train on more incoming tasks, a graceful forgetting implementation on previously learned trivial tasks is empirically investigated.
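The abstract does not spell out how the analytical classifier is updated, so the following is only a minimal sketch of one generic way a closed-form classifier can absorb new classes without replaying old data. It assumes features have already been extracted, and it uses plain ridge regression with accumulated statistics. The class name `IncrementalRidgeClassifier`, the helper `make_session`, the regularizer `reg`, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np


class IncrementalRidgeClassifier:
    """Ridge-regression classifier solved in closed form, one session at a time.

    Illustrative assumption only: the class name, `reg`, and the update rule
    are not taken from the paper.
    """

    def __init__(self, feat_dim, reg=1e-3):
        self.gram = reg * np.eye(feat_dim)    # running X^T X + reg * I
        self.cross = np.zeros((feat_dim, 0))  # running X^T Y, one column per class
        self.weights = np.zeros((feat_dim, 0))

    def fit_session(self, feats, labels, num_new_classes):
        # Add one zero column per class introduced in this session.
        pad = np.zeros((self.cross.shape[0], num_new_classes))
        self.cross = np.hstack([self.cross, pad])
        onehot = np.eye(self.cross.shape[1])[labels]
        # Only these two summaries are kept; the raw samples can be discarded.
        self.gram += feats.T @ feats
        self.cross += feats.T @ onehot
        self.weights = np.linalg.solve(self.gram, self.cross)

    def predict(self, feats):
        return np.argmax(feats @ self.weights, axis=1)


def make_session(rng, class_ids, feat_dim=8, per_class=50):
    """Synthetic stand-in for extracted features: one noisy cluster per class."""
    feats, labels = [], []
    for c in class_ids:
        mean = np.zeros(feat_dim)
        mean[c] = 5.0
        feats.append(mean + 0.1 * rng.standard_normal((per_class, feat_dim)))
        labels.append(np.full(per_class, c))
    return np.vstack(feats), np.concatenate(labels)


rng = np.random.default_rng(0)
x1, y1 = make_session(rng, [0, 1])  # session 1: classes 0 and 1
x2, y2 = make_session(rng, [2, 3])  # session 2: classes 2 and 3

clf = IncrementalRidgeClassifier(feat_dim=8)
clf.fit_session(x1, y1, num_new_classes=2)
clf.fit_session(x2, y2, num_new_classes=2)

# Reference: ridge regression fitted jointly on all data at once.
x_all, y_all = np.vstack([x1, x2]), np.concatenate([y1, y2])
joint_weights = np.linalg.solve(
    x_all.T @ x_all + 1e-3 * np.eye(8), x_all.T @ np.eye(4)[y_all]
)
matches_joint = np.allclose(clf.weights, joint_weights)
accuracy = float(np.mean(clf.predict(x_all) == y_all))
```

In this sketch the incrementally updated weights coincide with the jointly fitted ones because the accumulated sums are exactly the quantities the closed-form solution needs. The sketch keeps the features fixed across sessions, whereas the approach in the abstract also trains a plastic CNN feature extractor, which this example does not model.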