About: Identity matrix is a research topic. Over the lifetime, 1253 publications have been published within this topic receiving 20575 citations. The topic is also known as: unit matrix.
TL;DR: It is found appropriate to use a diagonal matrix, generated by an update of the identity matrix, so as to fit the Rayleigh ellipsoid of the local Hessian in the direction of the change in the gradient.
Abstract: This paper describes some numerical experiments with variable-storage quasi-Newton methods for the optimization of some large-scale models (coming from fluid mechanics and molecular biology). In addition to assessing these kinds of methods in real-life situations, we compare an algorithm of A. Buckley with a proposal by J. Nocedal. The latter seems generally superior, provided that careful attention is given to some nontrivial implementation aspects, which concern the general question of properly initializing a quasi-Newton matrix. In this context, we find it appropriate to use a diagonal matrix, generated by an update of the identity matrix, so as to fit the Rayleigh ellipsoid of the local Hessian in the direction of the change in the gradient.
Also, a variational derivation of some rank one and rank two updates in Hilbert spaces is given.
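To illustrate initializing a quasi-Newton matrix from the identity, here is a minimal sketch of the widely used scalar scaling $H_0 = \gamma I$ with $\gamma = s^\top y / y^\top y$, where $s$ is the change in the iterate and $y$ the change in the gradient. The scalar form is an assumption chosen for brevity: the abstract describes a diagonal update and does not state its formula. The function name `scaled_identity_diagonal` is hypothetical.

```python
def scaled_identity_diagonal(s, y):
    """Return the diagonal of gamma * I, with gamma = (s . y) / (y . y).

    s: change in the iterate, y: change in the gradient (lists of floats).
    This is the common scalar scaling, not the paper's diagonal update.
    """
    sy = sum(si * yi for si, yi in zip(s, y))
    yy = sum(yi * yi for yi in y)
    if sy <= 0.0 or yy == 0.0:
        # Curvature condition fails: fall back to the plain identity.
        return [1.0] * len(s)
    gamma = sy / yy
    return [gamma] * len(s)


# Toy quadratic f(x) = 0.5 * x^T A x with A = 4 * I, so y = A s = 4 s.
s = [0.5, -1.0]
y = [4.0 * si for si in s]
diag_h0 = scaled_identity_diagonal(s, y)
```

In this toy case the Hessian is itself a multiple of the identity, so the scaled identity recovers the inverse Hessian exactly. For a general Hessian it only matches the curvature along the direction of the gradient change.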
TL;DR: This paper proposes a simpler solution that uses recurrent neural networks composed of rectified linear units and is comparable to LSTM on four benchmarks: two toy problems involving long-range temporal structures, a large language modeling problem and a benchmark speech recognition problem.
Abstract: Learning long-term dependencies in recurrent networks is difficult due to vanishing and exploding gradients. To overcome this difficulty, researchers have developed sophisticated optimization techniques and network architectures. In this paper, we propose a simpler solution that uses recurrent neural networks composed of rectified linear units. Key to our solution is the use of the identity matrix or its scaled version to initialize the recurrent weight matrix. We find that our solution is comparable to LSTM on our four benchmarks: two toy problems involving long-range temporal structures, a large language modeling problem and a benchmark speech recognition problem.
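The identity initialization can be sketched as follows. This is an illustrative toy and not the authors' implementation: the helper names, the zero biases, and the zero inputs are all assumptions. With the recurrent matrix set to the identity, a nonnegative hidden state passes through each ReLU step unchanged when no input arrives, which is the intuition for why long-range information is not immediately lost.

```python
def relu(v):
    """Elementwise rectified linear unit."""
    return [max(0.0, x) for x in v]


def identity_matrix(n, scale=1.0):
    """n x n identity matrix, optionally scaled (as a list of rows)."""
    return [[scale if i == j else 0.0 for j in range(n)] for i in range(n)]


def matvec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def rnn_step(w_hh, h, x_proj, bias):
    """One step h' = relu(W_hh h + x_proj + bias).

    x_proj is the input already projected into the hidden space.
    """
    pre = matvec(w_hh, h)
    return relu([p + x + b for p, x, b in zip(pre, x_proj, bias)])


n = 3
w_hh = identity_matrix(n)   # recurrent weights initialized to I
bias = [0.0] * n            # assumption: zero biases
h = [0.5, 2.0, 0.0]         # nonnegative starting state
for _ in range(100):
    h = rnn_step(w_hh, h, [0.0] * n, bias)  # no input for 100 steps
```

Passing a `scale` below 1 to `identity_matrix` gives the scaled variant mentioned in the abstract, in which the state decays geometrically instead of being preserved.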
TL;DR: In this article, the authors presented a multi-parameter family of difference operators when τ⩾3, where τ is the dimension of the difference operator and λ is the number of points in the difference matrix.
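TL;DR: This paper shows that, under conditions formulated by Geiringer, the determinant of the matrix $A = (a_{i,j})$ does not vanish and that, if $A^* = (a^*_{i,j})$ is symmetric, where $a^*_{i,j} = |a_{i,i}|\, a_{i,j} / a_{i,i}$ $(i, j = 1, 2, \ldots, N)$, then $A^*$ is positive definite.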
Abstract: Conditions (1.2) were formulated by Geiringer [4, p. 379]. Evidently these conditions imply that $a_{i,i} \neq 0$ $(i = 1, 2, \ldots, N)$. It is easy to show by methods similar to those used in [4, pp. 379-381] that the determinant of the matrix $A = (a_{i,j})$ does not vanish. Moreover, if the matrix $A^* = (a^*_{i,j})$ is symmetric, where $a^*_{i,j} = |a_{i,i}|\, a_{i,j} / a_{i,i}$ $(i, j = 1, 2, \ldots, N)$, then $A^*$ is positive definite. For if $\lambda$ is a nonpositive real number, then the matrix $A^* - \lambda I$, where $I$ is the identity matrix, also satisfies (1.2) and hence its determinant cannot vanish. Therefore all eigenvalues of $A^*$ are positive, and $A^*$ is positive definite. On the other hand if $A^*$ is positive definite then $a_{i,i} \neq 0$ $(i = 1, 2, \ldots, N)$. We shall be concerned with effective methods for obtaining numerical solu-
TL;DR: In this paper, the authors show that for the most commonly used covariance functions, the matrix $C$ can be hierarchically factored into a product of block low-rank updates of the identity matrix, yielding an $\mathcal{O}(n \log^2 n)$ algorithm for inversion.
Abstract: A number of problems in probability and statistics can be addressed using the multivariate normal (Gaussian) distribution. In the one-dimensional case, computing the probability for a given mean and variance simply requires the evaluation of the corresponding Gaussian density. In the $n$-dimensional setting, however, it requires the inversion of an $n \times n$ covariance matrix, $C$, as well as the evaluation of its determinant, $\det(C)$. In many cases, such as regression using Gaussian processes, the covariance matrix is of the form $C = \sigma^2 I + K$, where $K$ is computed using a specified covariance kernel which depends on the data and additional parameters (hyperparameters). The matrix $C$ is typically dense, causing standard direct methods for inversion and determinant evaluation to require $\mathcal{O}(n^3)$ work. This cost is prohibitive for large-scale modeling. Here, we show that for the most commonly used covariance functions, the matrix $C$ can be hierarchically factored into a product of block low-rank updates of the identity matrix, yielding an $\mathcal{O}(n \log^2 n)$ algorithm for inversion. More importantly, we show that this factorization enables the evaluation of the determinant $\det(C)$, permitting the direct calculation of probabilities in high dimensions under fairly broad assumptions on the kernel defining $K$. Our fast algorithm brings many problems in marginalization and the adaptation of hyperparameters within practical reach using a single CPU core. The combination of nearly optimal scaling in terms of problem size with high-performance computing resources will permit the modeling of previously intractable problems. We illustrate the performance of the scheme on standard covariance kernels.
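To show why low-rank updates of the identity are cheap to invert, here is a minimal sketch for the simplest case, a single rank-one update $C = \sigma^2 I + u u^\top$. It is a toy under stated assumptions and not the paper's hierarchical factorization, which composes many block low-rank updates. The function names are hypothetical. The solve uses the Sherman-Morrison formula and the log-determinant uses the matrix determinant lemma, both in $\mathcal{O}(n)$ work.

```python
import math


def solve_rank_one_update(sigma2, u, b):
    """Solve (sigma2 * I + u u^T) x = b via the Sherman-Morrison formula."""
    uu = sum(ui * ui for ui in u)
    ub = sum(ui * bi for ui, bi in zip(u, b))
    coeff = ub / (sigma2 + uu)
    return [(bi - coeff * ui) / sigma2 for ui, bi in zip(u, b)]


def logdet_rank_one_update(sigma2, u):
    """log det(sigma2 * I + u u^T) via the matrix determinant lemma."""
    n = len(u)
    uu = sum(ui * ui for ui in u)
    # det = sigma2^n * (1 + u^T u / sigma2)
    return n * math.log(sigma2) + math.log1p(uu / sigma2)


sigma2 = 2.0
u = [1.0, 2.0]
b = [3.0, 4.0]
x = solve_rank_one_update(sigma2, u, b)

# Check by multiplying back: C x should reproduce b.
ux = sum(ui * xi for ui, xi in zip(u, x))
residual = [sigma2 * xi + ui * ux - bi for ui, xi, bi in zip(u, x, b)]
logdet = logdet_rank_one_update(sigma2, u)
```

Here $C = \begin{pmatrix} 3 & 2 \\ 2 & 6 \end{pmatrix}$, so the log-determinant can be checked by hand against $\log 14$.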