Identity matrix

Papers

Some numerical experiments with variable-storage quasi-Newton algorithms

Jean Charles Gilbert¹, Claude Lemaréchal²

International Institute for Applied Systems Analysis¹, French Institute for Research in Computer Science and Automation²

01 Dec 1989-Mathematical Programming

TL;DR: It is found appropriate to use a diagonal matrix, generated by an update of the identity matrix, so as to fit the Rayleigh ellipsoid of the local Hessian in the direction of the change in the gradient.

Abstract: This paper describes some numerical experiments with variable-storage quasi-Newton methods for the optimization of some large-scale models (coming from fluid mechanics and molecular biology). In addition to assessing these kinds of methods in real-life situations, we compare an algorithm of A. Buckley with a proposal by J. Nocedal. The latter seems generally superior, provided that careful attention is given to some nontrivial implementation aspects, which concern the general question of properly initializing a quasi-Newton matrix. In this context, we find it appropriate to use a diagonal matrix, generated by an update of the identity matrix, so as to fit the Rayleigh ellipsoid of the local Hessian in the direction of the change in the gradient. Also, a variational derivation of some rank one and rank two updates in Hilbert spaces is given.
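The initialization question raised in this abstract can be illustrated with the simpler scalar scaling that is common in limited-memory quasi-Newton codes. The sketch below is not the diagonal update studied in the paper: it scales the identity by a single factor `gamma` so that the initial inverse-Hessian approximation `gamma * I` matches the curvature along the gradient change `y`, that is, `y^T (gamma I) y = s^T y`. The names `initial_scaling`, `s` (step) and `y` (gradient change) are assumptions made for illustration.

```python
def dot(a, b):
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def initial_scaling(s, y):
    """Return gamma such that gamma * I fits the curvature along y.

    s is the last step (x_new - x_old), y the change in the gradient.
    gamma = (s^T y) / (y^T y), so that y^T (gamma I) y = s^T y.
    """
    sy = dot(s, y)
    if sy <= 0.0:
        # Quasi-Newton updates assume positive curvature along the step.
        raise ValueError("curvature condition s^T y > 0 is violated")
    return sy / dot(y, y)


# Quadratic model with Hessian 2 * I, so y = 2 * s and the exact
# inverse Hessian is 0.5 * I.
s = [1.0, -3.0]
y = [2.0, -6.0]
gamma = initial_scaling(s, y)
print(gamma)
```

On this quadratic the scaled identity recovers the exact inverse Hessian. A diagonal matrix with different entries per coordinate, as the abstract describes, generalizes the same fitting idea.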

829 citations

Posted Content

A Simple Way to Initialize Recurrent Networks of Rectified Linear Units

Quoc V. Le, Navdeep Jaitly, Geoffrey E. Hinton

03 Apr 2015-arXiv: Neural and Evolutionary Computing

TL;DR: This paper proposes a simpler solution that uses recurrent neural networks composed of rectified linear units and is comparable to LSTM on four benchmarks: two toy problems involving long-range temporal structures, a large language modeling problem and a benchmark speech recognition problem.

Abstract: Learning long-term dependencies in recurrent networks is difficult due to vanishing and exploding gradients. To overcome this difficulty, researchers have developed sophisticated optimization techniques and network architectures. In this paper, we propose a simpler solution that uses recurrent neural networks composed of rectified linear units. Key to our solution is the use of the identity matrix or its scaled version to initialize the recurrent weight matrix. We find that our solution is comparable to LSTM on our four benchmarks: two toy problems involving long-range temporal structures, a large language modeling problem and a benchmark speech recognition problem.
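A minimal sketch of why the identity initialization in this abstract helps, assuming a plain ReLU recurrence `h_new = relu(W_hh h + W_xh x + b)`. With `W_hh` set to the identity and zero bias, a nonnegative hidden state is copied unchanged when no input arrives, so the state neither vanishes nor explodes at initialization. The function names and the tiny dimensions are illustrative assumptions, not the authors' code.

```python
def identity(n, scale=1.0):
    """n x n identity matrix, optionally scaled, as a list of rows."""
    return [[scale if i == j else 0.0 for j in range(n)] for i in range(n)]


def matvec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def relu(v):
    """Elementwise rectified linear unit."""
    return [max(0.0, x) for x in v]


def rnn_step(w_hh, w_xh, bias, h, x):
    """One step of a ReLU recurrent network."""
    recurrent = matvec(w_hh, h)
    inputs = matvec(w_xh, x)
    return relu([r + i + b for r, i, b in zip(recurrent, inputs, bias)])


n_hidden, n_input = 3, 2
w_hh = identity(n_hidden)                      # identity initialization
w_xh = [[0.0] * n_input for _ in range(n_hidden)]
bias = [0.0] * n_hidden

h0 = [0.5, 2.0, 0.0]
h = h0
for _ in range(100):
    h = rnn_step(w_hh, w_xh, bias, h, [0.0, 0.0])

print(h == h0)  # the state is preserved over 100 steps
```

Passing `scale` below 1 to `identity` gives the scaled variant mentioned in the abstract, which makes the state decay geometrically instead of persisting.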

822 citations

Journal Article•10.1006/JCPH.1994.1005

Summation by parts for finite difference approximations for d/dx

Bo Strand¹

Uppsala University¹

01 Jan 1994-Journal of Computational Physics

TL;DR: In this article, the author presents multi-parameter families of summation-by-parts difference operators for d/dx, which arise when τ⩾3.

783 citations

Journal Article•10.1090/S0002-9947-1954-0059635-7

Iterative methods for solving partial difference equations of elliptic type

David M. Young

01 Jan 1954-Transactions of the American Mathematical Society

TL;DR: Under conditions formulated by Geiringer, the determinant of the matrix $A = (a_{i,j})$ does not vanish, and if $A^* = (a^*_{i,j})$ is symmetric, where $a^*_{i,j} = a_{i,i} a_{i,j} / |a_{i,i}|$ $(i, j = 1, 2, \ldots, N)$, then $A^*$ is positive definite.

Abstract: Conditions (1.2) were formulated by Geiringer [4, p. 379]. Evidently these conditions imply that $a_{i,i} \neq 0$ $(i = 1, 2, \ldots, N)$. It is easy to show by methods similar to those used in [4, pp. 379-381] that the determinant of the matrix $A = (a_{i,j})$ does not vanish. Moreover, if the matrix $A^* = (a^*_{i,j})$ is symmetric, where $a^*_{i,j} = a_{i,i} a_{i,j} / |a_{i,i}|$ $(i, j = 1, 2, \ldots, N)$, then $A^*$ is positive definite. For if $\lambda$ is a nonpositive real number, then the matrix $A^* - \lambda I$, where $I$ is the identity matrix, also satisfies (1.2) and hence its determinant cannot vanish. Therefore all eigenvalues of $A^*$ are positive, and $A^*$ is positive definite. On the other hand if $A^*$ is positive definite then $a_{i,i} \neq 0$ $(i = 1, 2, \ldots, N)$. We shall be concerned with effective methods for obtaining numerical solu-

773 citations

Journal Article•10.1109/TPAMI.2015.2448083

Fast Direct Methods for Gaussian Processes

Sivaram Ambikasaran¹, Daniel Foreman-Mackey¹, Leslie Greengard¹, David W. Hogg¹, Michael O'Neil¹

New York University¹

01 Feb 2016-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: In this paper, the authors show that for the most commonly used covariance functions, the matrix $C$ can be hierarchically factored into a product of block low-rank updates of the identity matrix, yielding an $\mathcal{O}(n \log^2 n)$ algorithm for inversion.

Abstract: A number of problems in probability and statistics can be addressed using the multivariate normal (Gaussian) distribution. In the one-dimensional case, computing the probability for a given mean and variance simply requires the evaluation of the corresponding Gaussian density. In the $n$-dimensional setting, however, it requires the inversion of an $n \times n$ covariance matrix, $C$, as well as the evaluation of its determinant, $\det(C)$. In many cases, such as regression using Gaussian processes, the covariance matrix is of the form $C = \sigma^2 I + K$, where $K$ is computed using a specified covariance kernel which depends on the data and additional parameters (hyperparameters). The matrix $C$ is typically dense, causing standard direct methods for inversion and determinant evaluation to require $\mathcal{O}(n^3)$ work. This cost is prohibitive for large-scale modeling. Here, we show that for the most commonly used covariance functions, the matrix $C$ can be hierarchically factored into a product of block low-rank updates of the identity matrix, yielding an $\mathcal{O}(n \log^2 n)$ algorithm for inversion. More importantly, we show that this factorization enables the evaluation of the determinant $\det(C)$, permitting the direct calculation of probabilities in high dimensions under fairly broad assumptions on the kernel defining $K$. Our fast algorithm brings many problems in marginalization and the adaptation of hyperparameters within practical reach using a single CPU core. The combination of nearly optimal scaling in terms of problem size with high-performance computing resources will permit the modeling of previously intractable problems. We illustrate the performance of the scheme on standard covariance kernels.
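The reason low-rank updates of the identity make determinants cheap can be seen in the simplest case, a single rank-one update. By the matrix determinant lemma, $\det(\sigma^2 I + u u^T) = \sigma^{2(n-1)} (\sigma^2 + u^T u)$, which costs $\mathcal{O}(n)$ instead of $\mathcal{O}(n^3)$. The sketch below checks this against a dense determinant. It illustrates one building block only and is not the hierarchical factorization of the paper; the function names and the rank-one kernel $K = u u^T$ are assumptions made for illustration.

```python
def dense_det(m):
    """Determinant by Gaussian elimination with partial pivoting, O(n^3)."""
    a = [row[:] for row in m]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def rank_one_update_of_identity(sigma2, u):
    """Dense matrix sigma2 * I + u u^T."""
    n = len(u)
    return [[(sigma2 if i == j else 0.0) + u[i] * u[j] for j in range(n)]
            for i in range(n)]


def det_rank_one_update(sigma2, u):
    """det(sigma2 * I + u u^T) in O(n) via the matrix determinant lemma."""
    n = len(u)
    return sigma2 ** (n - 1) * (sigma2 + sum(x * x for x in u))


sigma2 = 0.5
u = [1.0, 2.0, 3.0]
fast = det_rank_one_update(sigma2, u)
slow = dense_det(rank_one_update_of_identity(sigma2, u))
print(fast, slow)
```

A product of such factors has a determinant equal to the product of the individual determinants, which is why a factorization into low-rank updates of the identity yields $\det(C)$ along with the inverse.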

748 citations

Papers published on a yearly basis

Year	Papers
2025	5
2024	16
2023	16
2022	52
2021	57
2020	57
