Second-order Learning Algorithm with Squared Penalty Term
Kazumi Saito, Ryohei Nakano +1 more
- 03 Dec 1996
- Vol. 12, Iss: 3, pp 627-633
Abstract: This article compares three penalty terms with respect to the efficiency of supervised learning, by using first- and second-order off-line learning algorithms and a first-order on-line algorithm. Our experiments showed that for a reasonably adequate penalty factor, the combination of the squared penalty term and the second-order learning algorithm drastically improves the convergence performance in comparison to the other combinations, at the same time bringing about excellent generalization performance. Moreover, in order to understand how differently each penalty term works, a function surface evaluation is described. Finally, we show how cross validation can be applied to find an optimal penalty factor.
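The combination described in the abstract (a squared penalty on the weights, a second-order update, and cross validation over the penalty factor) can be illustrated with a minimal sketch. This is not the authors' algorithm or network: it assumes a toy linear model y ~ w0 + w1*x, a plain Newton step with a closed-form 2x2 Hessian inverse, and interleaved k-fold splits. The names `penalized_loss`, `newton_fit`, `cv_error`, `select_penalty`, and the symbol `mu` for the penalty factor are all hypothetical.

```python
def penalized_loss(w, xs, ys, mu):
    """0.5 * sum of squared errors + 0.5 * mu * (w0^2 + w1^2)."""
    sse = sum((y - (w[0] + w[1] * x)) ** 2 for x, y in zip(xs, ys))
    return 0.5 * sse + 0.5 * mu * (w[0] ** 2 + w[1] ** 2)


def newton_fit(xs, ys, mu, steps=5):
    """Fit y ~ w0 + w1*x by Newton steps on the penalized loss.

    Assumes mu >= 0 and at least two distinct x values, so the
    Hessian is invertible.
    """
    w = [0.0, 0.0]
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    # Hessian of the penalized loss; constant because the model is linear.
    h00, h01, h11 = n + mu, sx, sxx + mu
    det = h00 * h11 - h01 * h01
    for _ in range(steps):
        # Residuals and gradient of the penalized loss at the current w.
        r = [w[0] + w[1] * x - y for x, y in zip(xs, ys)]
        g0 = sum(r) + mu * w[0]
        g1 = sum(ri * x for ri, x in zip(r, xs)) + mu * w[1]
        # Newton update: w <- w - H^{-1} g, using the 2x2 inverse.
        w[0] -= (h11 * g0 - h01 * g1) / det
        w[1] -= (h00 * g1 - h01 * g0) / det
    return w


def cv_error(xs, ys, mu, k=4):
    """Mean squared held-out error over k interleaved folds."""
    total = 0.0
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        w = newton_fit([xs[i] for i in train], [ys[i] for i in train], mu)
        total += sum((ys[i] - (w[0] + w[1] * xs[i])) ** 2 for i in test)
    return total / len(xs)


def select_penalty(xs, ys, candidates, k=4):
    """Return the candidate penalty factor with the lowest CV error."""
    return min(candidates, key=lambda mu: cv_error(xs, ys, mu, k))


if __name__ == "__main__":
    # Made-up data: a line with small deterministic perturbations.
    xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    ys = [1.1, 2.9, 5.2, 6.8, 9.1, 10.9, 13.2, 14.8]
    best_mu = select_penalty(xs, ys, [0.0, 0.01, 0.1, 1.0])
    print("selected penalty factor:", best_mu)
    print("weights:", newton_fit(xs, ys, best_mu))
```

Because the toy loss is quadratic, a single Newton step already reaches the minimizer; the extra iterations only mirror the shape of an iterative second-order method. For a neural network the Hessian is neither constant nor cheap to form, so quasi-Newton approximations would replace the exact inverse used here.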
Citations
Bayesian regularization and pruning using a Laplace prior
TL;DR: Standard techniques for improved generalization from neural networks include weight decay and pruning. A comparison is made with results of MacKay using the evidence framework and a Gaussian regularizer.
436
Adapting the Neural Network Approach to PGA Prediction: An Example Based on the KiK-net Data
TL;DR: In this article, the authors investigated the artificial neural network method for the derivation of physically sound, easy-to-handle, predictive ground-motion models and applied it to a large subset of the KiK-net seismic database, which includes 3891 records from 398 sites and 335 earthquakes.
117
Boundedness and Convergence of Online Gradient Method With Penalty for Feedforward Neural Networks
TL;DR: By proving that the weights are automatically bounded when the network is trained with a penalty, this work simplifies the conditions required in the literature for convergence of the online gradient method.
76
Partial BFGS update and efficient step-length calculation for three-layer neural networks
Kazumi Saito, Ryohei Nakano +1 more
TL;DR: An efficient and accurate step-length calculation turned out to play an important role in the convergence of quasi-Newton algorithms, and a partial BFGS update greatly saves storage space without losing convergence performance.
73
Convergence of online gradient method for feedforward neural networks with smoothing L1/2 regularization penalty
TL;DR: Strong convergence results for the smoothing L1/2 regularization method are shown, and the boundedness of the weights during network training is proved, so that bounded weights no longer need to be assumed in the proof of convergence.
66