Open Access · Proceedings Article
Smoothing Regularizers for Projective Basis Function Networks
John Moody,Thorsteinn Rögnvaldsson +1 more
- 22 May 1996
- pp 585-591
TL;DR: The new regularizers are shown to yield better generalization errors than weight decay when the implicit assumptions in the latter are wrong and to enable the direct enforcement of smoothness without the need for costly Monte-Carlo integrations of S(W, m).
Abstract: Smoothing regularizers for radial basis functions have been studied extensively, but no general smoothing regularizers for projective basis functions (PBFs), such as the widely-used sigmoidal PBFs, have heretofore been proposed. We derive new classes of algebraically-simple mth-order smoothing regularizers for networks of the form $f(W, x) = \sum_{j=1}^{N} u_j \, g[x^T v_j + v_{j0}] + u_0$, with general projective basis functions $g[\cdot]$. These regularizers are:
$R_G(W, m) = \sum_{j=1}^{N} u_j^2 \|v_j\|^{2m-1}$ (Global Form)
$R_L(W, m) = \sum_{j=1}^{N} u_j^2 \|v_j\|^{2m}$ (Local Form)
These regularizers bound the corresponding mth-order smoothing integral $S(W, m) = \int d^D x \, \Omega(x) \, \| \partial^m f(W, x) / \partial x^m \|^2$, where W denotes all the network weights $\{u_j, u_0, v_j, v_{j0}\}$, and $\Omega(x)$ is a weighting function on the D-dimensional input space. The global and local cases are distinguished by different choices of $\Omega(x)$.
The simple algebraic forms R(W, m) enable the direct enforcement of smoothness without the need for costly Monte-Carlo integrations of S(W, m). The new regularizers are shown to yield better generalization errors than weight decay when the implicit assumptions in the latter are wrong. Unlike weight decay, the new regularizers distinguish between the roles of the input and output weights and capture the interactions between them.
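To make the algebraic forms concrete, the following is a minimal sketch of how the two regularizers from the abstract could be evaluated for a small network, next to a plain weight-decay penalty for comparison. The function names, the choice of tanh for g, and the exclusion of bias terms from the weight-decay sum are illustrative assumptions, not details taken from the paper.

```python
import math

def pbf_network(x, u, V, v0, u0, g=math.tanh):
    """f(W, x) = sum_j u_j * g(x . v_j + v_j0) + u_0 for one input vector x.
    g defaults to tanh (an assumed example of a sigmoidal PBF)."""
    return u0 + sum(
        u_j * g(sum(x_i * v_ji for x_i, v_ji in zip(x, v_j)) + b_j)
        for u_j, v_j, b_j in zip(u, V, v0)
    )

def smoothing_regularizer(u, V, m, form="global"):
    """R_G(W, m) = sum_j u_j^2 ||v_j||^(2m-1)  (form="global")
    R_L(W, m) = sum_j u_j^2 ||v_j||^(2m)    (form="local")"""
    if form == "global":
        power = 2 * m - 1
    elif form == "local":
        power = 2 * m
    else:
        raise ValueError("form must be 'global' or 'local'")
    total = 0.0
    for u_j, v_j in zip(u, V):
        norm = math.sqrt(sum(w * w for w in v_j))  # Euclidean norm of v_j
        total += u_j ** 2 * norm ** power
    return total

def weight_decay(u, V):
    """Sum of squared output and input weights (biases left out by assumption)."""
    return sum(u_j ** 2 for u_j in u) + sum(w * w for v_j in V for w in v_j)

# Hypothetical two-unit network with 2-D inputs.
u = [1.0, -2.0]               # output weights u_j
V = [[3.0, 4.0], [0.0, 1.0]]  # input weight vectors v_j
print(smoothing_regularizer(u, V, m=2, form="global"))
print(smoothing_regularizer(u, V, m=2, form="local"))
print(weight_decay(u, V))
```

In this sketch each term multiplies u_j^2 by a power of ||v_j||, so the penalty depends on how the output and input weights of a unit combine, whereas the weight-decay sum treats every weight independently.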
Citations
A Bayesian Committee Machine
TL;DR: It is found that the performance of the BCM improves if several test points are queried at the same time and is optimal if the number of test points is at least as large as the degrees of freedom of the estimator.
Computation with infinite neural networks
TL;DR: For neural networks with a wide class of weight priors, it can be shown that in the limit of an infinite number of hidden units, the prior over functions tends to a gaussian process.
198
Structural Modelling with Sparse Kernels
Steve R. Gunn,J.S. Kandola +1 more
TL;DR: This work describes a transparent, advanced non-linear modelling approach that enables the constructed predictive models to be visualised, allowing model validation and assisting in interpretation, and it is shown to exhibit competitive generalisation performance together with improved interpretability.
Geometrical interpretation and architecture selection of MLP
TL;DR: A geometrical interpretation of the multilayer perceptron (MLP) is suggested and some general guidelines for selecting the architecture of the MLP are proposed based upon this interpretation and the controversial issue of whether four-layered MLP is superior to the three-layers is also carefully examined.
109
A Novel Pruning Algorithm for Smoothing Feedforward Neural Networks Based on Group Lasso Method
TL;DR: Four new variants of the backpropagation algorithm are proposed to improve the generalization ability for feedforward neural networks by approximating the Group Lasso penalty and performing better than the other three classical penalization methods on both generalization and pruning efficiency.
106