TL;DR: A new method for computing the solution of a linear system having a symmetric circulant tridiagonal matrix is presented, which is quite competitive with Gaussian elimination and with the modified double sweep method.
Abstract: In this paper a new method for computing the solution of a linear system having a symmetric circulant tridiagonal matrix is presented. This special kind of system appears in many applications. After an appropriate partition of the system and the elimination of the last unknown, we apply the Woodbury formula to arrive at a very efficient and stable method, which is quite competitive with Gaussian elimination and with the modified double sweep method.
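To illustrate the kind of low-rank correction the abstract refers to, here is a minimal sketch in Python. It is not the paper's method (which partitions the system and eliminates the last unknown before applying the Woodbury formula). It is the standard textbook variant: the corner entries are treated as a rank-one update, and the Sherman-Morrison formula (the rank-one case of Woodbury) is combined with the Thomas algorithm. The function names, the choice gamma = -a, and the assumption that the matrix has constant diagonal a and constant off-diagonal b are illustrative assumptions.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system without pivoting.

    lower[0] and upper[-1] are ignored. Assumes no zero pivot occurs,
    which holds for example when the matrix is diagonally dominant.
    """
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def circulant_matvec(a, b, x):
    """Multiply the symmetric circulant tridiagonal matrix (a, b) by x."""
    n = len(x)
    return [b * x[i - 1] + a * x[i] + b * x[(i + 1) % n] for i in range(n)]


def solve_circulant_tridiagonal(a, b, rhs):
    """Solve A x = rhs where A has a on the diagonal, b on both
    off-diagonals, and b in the two corners (assumed form).

    A is written as T + u v^T with T tridiagonal, then Sherman-Morrison
    gives x = y - z (v.y) / (1 + v.z), where T y = rhs and T z = u.
    """
    n = len(rhs)
    if n < 3:
        raise ValueError("need at least 3 unknowns")
    if a == 0:
        raise ValueError("this sketch assumes a nonzero diagonal")
    gamma = -a  # free parameter of the splitting; any nonzero value works in exact arithmetic
    diag = [a] * n
    diag[0] = a - gamma
    diag[-1] = a - b * b / gamma
    lower = [b] * n
    upper = [b] * n
    u = [0.0] * n
    u[0] = gamma
    u[-1] = b
    v_last = b / gamma  # v = (1, 0, ..., 0, b / gamma)
    y = thomas(lower, diag, upper, rhs)
    z = thomas(lower, diag, upper, u)
    factor = (y[0] + v_last * y[-1]) / (1.0 + z[0] + v_last * z[-1])
    return [yi - factor * zi for yi, zi in zip(y, z)]
```

A quick check is to multiply the result back with `circulant_matvec` and compare against the right-hand side. The cost is two tridiagonal solves, so it stays linear in the number of unknowns.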
TL;DR: In this paper, the authors show that the parallel complexity of the eigenvalue computation for a symmetric tridiagonal matrix can be brought to within polylogarithmic factors of the information lower bounds.
Abstract: Surprisingly simple corollaries from the Courant-Fischer minimax characterization theorem enable us to devise a very effective algorithm for the evaluation of a set S interleaving the set E of the eigenvalues of a real symmetric tridiagonal matrix Tn (as well as a point that splits E into two subsets of comparable cardinalities). As a result, we dramatically decrease the previous record upper estimates for the parallel complexity of the eigenvalue computation for a symmetric tridiagonal matrix (which is a major computational problem in linear algebra); our new upper bounds are within polylogarithmic factors from the information lower bounds. The algorithm can be extended to approximating the zeros of a polynomial that has only real zeros (as an alternative to the algorithm of [BOT]).
TL;DR: It is demonstrated how the efficiency of the general block tridiagonal multilevel algorithm can be improved by introducing the equivalent of two-way Gaussian elimination for the first and the last partitioning and by carefully balancing the load of the processors.
Abstract: This paper describes an efficient algorithm for the parallel solution of systems of linear equations with a block tridiagonal coefficient matrix. The algorithm comprises a multilevel LU-factorization based on block cyclic reduction and a corresponding solution algorithm. The paper includes a general presentation of the parallel multilevel LU-factorization and solution algorithms, but the main emphasis is on implementation principles for a message passing computer with hypercube topology. Problem partitioning, processor allocation and communication requirements are discussed for the general block tridiagonal algorithm. Band matrices can be cast into block tridiagonal form, and this special but important problem is dealt with in detail. It is demonstrated how the efficiency of the general block tridiagonal multilevel algorithm can be improved by introducing the equivalent of two-way Gaussian elimination for the first and the last partitioning and by carefully balancing the load of the processors. The presentation of the multilevel band solver is accompanied by detailed complexity analyses. The properties of the parallel band solver were evaluated by implementing the algorithm on an Intel iPSC hypercube parallel computer and solving a larger number of banded linear equations using 2 to 32 processors. The results of the evaluation include speed-up over a sequential processor, and the measured values are in good agreement with the theoretical values resulting from complexity analysis. It is found that the maximum asymptotic speed-up of the multilevel LU-factorization using p processors and load balancing is approximated well by the expression (p + 6)/4. Finally, the multilevel parallel solver is compared with solvers based on row and column interleaved organization.
TL;DR: By the use of repeated partitioning of the matrix into (2 × 2) subsystems it is shown that the linear system can be recursively decoupled into an explicit form suitable for solving on parallel or vector computers.
Abstract: In many numerical methods it is necessary to solve repeatedly tridiagonal linear systems of a certain form, i.e. diagonally dominant. By the use of repeated partitioning of the matrix into (2 × 2) subsystems it is shown that the linear system can be recursively decoupled into an explicit form suitable for solving on parallel or vector computers.
TL;DR: This paper describes programs to reduce a nonsymmetric matrix to tridiagonal form, compute the eigenvalues of the tridiagonal matrix, improve the accuracy of an eigenvalue, and compute the corresponding eigenvector.
Abstract: This paper describes programs to reduce a nonsymmetric matrix to tridiagonal form, compute the eigenvalues of the tridiagonal matrix, improve the accuracy of an eigenvalue, and compute the corresponding eigenvector. 8 refs., 3 tabs.
TL;DR: A parallel algorithm to solve the Gaussian elimination with complete or partial pivoting problem, based on the corresponding sequential algorithm, on SIMD hypercube computers with distributed memory, which allows for arbitrary dimensions for the matrix and the hypercube.
Abstract: We present a parallel algorithm to solve the Gaussian elimination with complete or partial pivoting problem, based on the corresponding sequential algorithm, on SIMD hypercube computers with distributed memory. This parallel algorithm is general in the sense that it allows for arbitrary dimensions for the matrix and the hypercube. The flexibility of this algorithm is rooted in the partition of the dimensions of the hypercube into two subsets, each one associated with one dimension of the matrix. The data are distributed in the local memories with a cyclic storage scheme. The performance of a parallel algorithm based on Gaussian elimination is bounded by data dependences.
TL;DR: This work shows how to transform the solution of an n × n tridiagonal system into suffix computations of continued fractions, and introduces a parallel substitution scheme to compute the suffix values.
Abstract: We first show how to transform the solution of an n × n tridiagonal system into suffix computations of continued fractions. Then a parallel substitution scheme is introduced to compute the suffix values. The derived parallel algorithm allows the tridiagonal system to be solved in O(log n) time on an unshuffle network with Θ(n / log n) processors. It is cost-optimal in the sense that processor number times execution time is minimized. Our solver is conceptually simple and easy to implement.
TL;DR: The solver, based on a modified Gaussian elimination method, fully exploits parallelism, and the computation and communication complexities of the proposed algorithm are both shown to be O(n/m).
Abstract: A new tridiagonal Toeplitz linear system (TTLS) solver is proposed. The solver first decomposes an n-dimensional strictly diagonally dominant TTLS equation into a number of m-dimensional subsystems employing a modified Gaussian elimination method. An analytic solution of a continued fraction is obtained to derive the solver. The solver based on the modified Gaussian elimination method fully exploits parallelism. Computation and communication complexities of the proposed algorithm are all shown to be O(n/m).
TL;DR: The scaled tridiagonal and diagonal preconditioners prove very effective and efficient for a broad range of problems, even for conductivity variations of 25 orders of magnitude.
TL;DR: This paper presents an implementation of multisection and parallel bisection method on a transputer network for finding the eigenvalues and corresponding eigenvectors of a symmetric tridiagonal matrix which lie in a specified interval.
Abstract: In this paper we present an implementation of multisection and parallel bisection method on a transputer network for finding the eigenvalues and corresponding eigenvectors of a symmetric tridiagonal matrix which lie in a specified interval (a, b). Although several similar studies in the literature have reported significant speedups over the sequential versions of the algorithms, it remains to be determined which multiprocessor configuration is the most advantageous for these problems.
TL;DR: This work presents a new algorithm for inverting tridiagonal matrices inspired by the recursive partitioning algorithm of Evans and has potential for its vector and parallel implementation.
Abstract: Motivated by the recursive partitioning algorithm of Evans [2], we present a new algorithm for inverting tridiagonal matrices. Our derivation of the algorithm is different but elementary. The present algorithm has potential for its vector and parallel implementation.
TL;DR: The present algorithm can be adapted for banded linear systems; its adaptation for tridiagonal linear systems where it exhibits a near perfect speed up over Thomas' algorithm is considered.
Abstract: Motivated by the folding algorithm of Evans and Hatzopoulos [6] (see also [1]) for the solution of certain banded systems of linear equations, we describe a “new” folding Gaussian elimination algorithm for linear systems Ax = d with a full general coefficient matrix . We introduce a series of transformations Wm which simultaneously eliminate the elements and . For n even, the transformed system has a coefficient matrix with two half-size triangular subsystems uncoupled, obviating the need to solve 2×2 core subsystems as in [6]. The new algorithm has an arithmetical operations count of which is consistent with of the unidirectional algorithm; thus, it could possibly attain a speed up of 1.6 if implemented on a dual processor machine. The present algorithm can be adapted for banded linear systems; we consider its adaptation for tridiagonal linear systems where it exhibits a near perfect speed up over Thomas' algorithm.
TL;DR: A method for inverting tridiagonal matrices by adopting the strategy resulting in a recursive doubling algorithm is presented; the present algorithm has a highly parallel structure.
Abstract: Evans [2, 3] introduced the recursive point partitioning algorithm for the solution of sparse banded matrix systems and investigated the “one-line at a time” strategy for the solution of tridiagonal linear systems. Recursive block partitioning schemes resulting from variation in the size of the block structure using “two-lines at a time” have been investigated for both the tridiagonal and the quindiagonal matrix systems in Okolie [6]. The case of partitioning strategy for an nth order system has been considered by Evans and Okolie [4] resulting in a recursive decoupling algorithm for tridiagonal linear systems. Following the recursive point partitioning algorithm of Evans [2, 3], Chawla et al. [1] developed a recursive partitioning algorithm for inverting tridiagonal matrices. In the present paper we present a method for inverting tridiagonal matrices by adopting the strategy resulting in a recursive doubling algorithm; the present algorithm has a highly parallel structure.
TL;DR: This paper presents systolic algorithms for the calculation of the eigenvalues and eigenvectors of a symmetric tridiagonal matrix using the methods of bisection and inverse iteration respectively.
Abstract: This paper presents systolic algorithms for the calculation of the eigenvalues and eigenvectors of a symmetric tridiagonal matrix using the methods of bisection and inverse iteration respectively. A single array design is considered, where the use of only one array of linearly connected systolic processors solves the problem at the expense of more complex cell definition and control mechanisms.
TL;DR: A synthesis procedure for tridiagonal state-space structures is presented that yields structures with a reduced number of multipliers and eliminates zero-input and constant-input limit cycles.
Abstract: A synthesis procedure for tridiagonal state-space structures is presented that yields structures with a reduced number of multipliers and eliminates zero-input and constant-input limit cycles. The output roundoff noise is minimized by optimizing some free parameters. Some design examples are presented illustrating the synthesis procedure.
TL;DR: In this article, 2n-by-2n tridiagonal matrices with variable diagonal vectors are considered, and the conditions for such matrices to be nonsingular are derived and stated in geometrical terms.
TL;DR: Block iterative methods are constructed for solving systems of equations with a block-tridiagonal matrix, such as those that occur in the difference approximation of partial differential equations.
Abstract: Block iterative methods are constructed for solving systems of equations with a block-tridiagonal matrix. Such systems occur, in particular, in the difference approximation of partial differential equations.
TL;DR: The parallel numerical solution of the matrix eigenvalue problem for real symmetric tridiagonal matrices is discussed and two implementations of the Sturm sequence algorithm on transputer arrays are described.
Abstract: We discuss the parallel numerical solution of the matrix eigenvalue problem for real symmetric tridiagonal matrices. Instances occur frequently in practice. Two implementations of the Sturm sequence algorithm on transputer arrays are described. For the first the maximum size of matrices which may be accommodated is restricted by the amount of local memory available. The second implementation removes this constraint but requires an increased execution time.
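The Sturm sequence idea named in this abstract can be sketched sequentially in a few lines of Python. This is a hedged illustration, not the transputer implementations described in the paper: it counts eigenvalues below a shift from the signs of the LDL^T pivots of T - xI, then bisects on that count. The function names, the Gershgorin starting interval, and the tiny replacement value for a zero pivot are illustrative assumptions.

```python
def count_below(diag, off, x):
    """Number of eigenvalues of the symmetric tridiagonal matrix
    (diagonal diag, off-diagonal off) that are strictly less than x.

    Uses the pivots q_i of the LDL^T factorization of T - x I;
    the number of negative pivots equals the count.
    """
    count = 0
    q = 1.0
    for i in range(len(diag)):
        if i == 0:
            q = diag[0] - x
        else:
            q = diag[i] - x - off[i - 1] ** 2 / q
        if q == 0.0:
            q = 1e-300  # nudge an exactly zero pivot to avoid division by zero
        if q < 0.0:
            count += 1
    return count


def gershgorin_interval(diag, off):
    """Interval guaranteed to contain every eigenvalue."""
    n = len(diag)
    lo, hi = float("inf"), float("-inf")
    for i in range(n):
        radius = (abs(off[i - 1]) if i > 0 else 0.0) + (abs(off[i]) if i < n - 1 else 0.0)
        lo = min(lo, diag[i] - radius)
        hi = max(hi, diag[i] + radius)
    return lo, hi


def kth_eigenvalue(diag, off, k, tol=1e-12):
    """k-th smallest eigenvalue (k = 0 is the smallest) by bisection."""
    lo, hi = gershgorin_interval(diag, off)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_below(diag, off, mid) > k:
            hi = mid  # at least k + 1 eigenvalues lie below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

Each eigenvalue is found independently of the others, which is why the method distributes naturally: different processors can bisect on different indices or subintervals.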
TL;DR: A method of solving the uniform bicubic B-spline surface fitting algorithm is proposed which introduces parallelism in a way that may be effectively exploited by a suitable parallel architecture.
Abstract: A method of solving the uniform bicubic B-spline surface fitting algorithm is proposed which introduces parallelism in a way that may be effectively exploited by a suitable parallel architecture. This method is based on the observation that a tensor product spline surface fitting problem can be split into two spline curve fitting problems and each of these problems can be realized by a macropipeline of fixed size VLSI arrays. In fact, the heart of the curve fitting problem consists of a block tridiagonal linear system. Based on the state-of-the-art electronic and packaging technologies, the size of VLSI arithmetic devices is limited due to the bounded chip area and I/O packaging constraints. A modular approach to achieve VLSI matrix arithmetic solution for the block tridiagonal linear system is amenable from the viewpoints of feasibility and applicability. A matrix partitioning approach is presented to overcome those technological constraints imposed by the number of I/O pins. A block tridiagonal linear system of size mn is then divided into m simple tridiagonal systems of size n and n simple tridiagonal systems of size m by the de Boor partitioning theorem. Each of the simple tridiagonal linear systems could be partitioned and mapped into a series of two fixed size primitive VLSI matrix arithmetic arrays including L-U decomposer and triangular system solver. The L-U decomposer and triangular system solver could be realized by a hex-connected processor array and an inverse perfect shuffle machine respectively. It is shown that a B-spline surface fitting problem for a grid of mn points can be solved by m hex-connected processor arrays having 4 processors, m inverse perfect shuffle machines having n processors and n inverse perfect shuffle machines having m processors in $3(m+n) + 2(\lceil \log_2 n \rceil + \lceil \log_2 n \rceil) + 4$ units of time.
TL;DR: The effect of this matrix approximation is studied and a rigorous error analysis is given and the numerical results are presented.
Abstract: The Parallel Diagonal Dominant (PDD) Algorithm has been proposed for solving certain types of tridiagonal linear systems. The algorithm employs a matrix approximation. Both theoretical and experimental results have shown that the PDD algorithm is a highly efficient parallel algorithm for a variety of architectures. In this paper, the effect of this approximation is studied and a rigorous error analysis is given. The numerical results are presented.
TL;DR: Sets of tridiagonal systems occur in many applications; Fast Poisson solvers and Alternate Direction Methods make use of tridiagonal system solvers.
Abstract: Sets of tridiagonal systems occur in many applications. Fast Poisson solvers and Alternate Direction Methods make use of tridiagonal system solvers. Network-based multiprocessors provide a cost-effective alternative to traditional supercomputer architectures. The complexity of concurrent algorithms for the solution of multiple tridiagonal systems on Boolean-cube-configured multiprocessors with distributed memory is investigated. Variations of odd-even cyclic reduction, parallel cyclic reduction, and algorithms making use of data transposition with or without substructuring and local elimination, or pipelined elimination, are considered. A simple performance model is used for algorithm comparison, and the validity of the model is verified on an Intel iPSC/1. For many combinations of machine and system parameters, pipelined elimination, or equation transposition with or without substructuring is optimum. Hybrid algorithms that at any stage choose the best algorithm among the considered ones for the remainder of the problem are presented. It is shown that the optimum partitioning of a set of independent tridiagonal systems among a set of processors yields the embarrassingly parallel case. If the systems originate from a lattice and solutions are computed in alternating directions, then to first order the aspect ratio of a computational lattice shall be the same as that of the lattice forming the base for the equations. The experiments presented here demonstrate the importance of combining in the communication system for architectures with a relatively high communications start-up time.
TL;DR: The paper is concluded with numerical examples that demonstrate the superiority of the proposed method by saving an order of magnitude in execution time at the expense of sacrificing a few orders of accuracy, although for symmetric tridiagonal matrices in general, the method appears to be unstable.
Abstract: An efficient method to solve the eigenproblem of $N \times N$ symmetric tridiagonal matrices is proposed. Unlike the standard eigensolvers that necessitate $O(N^3 )$ operations to compute the eigenvectors of such matrices, the proposed method computes both the eigenvalues and eigenvectors with only $O(N^2 )$ operations. The method is based on serial implementation of the recently introduced Divide and Conquer algorithm [3], [1], [4]. It exploits the fact that by $O(N^2 )$ Divide and Conquer operations one can compute the eigenvalues of an $N \times N$ symmetric tridiagonal matrix and a small number of pairs of successive rows of its eigenvector matrix. The rest of the eigenvectors (either all together or one at a time) are computed by linear three-term recurrence relations. The paper is concluded with numerical examples that demonstrate the superiority of the proposed method for a special class of symmetric tridiagonal matrices, by saving an order of magnitude in execution time at the expense of sacrificing a few orders of accuracy, although for symmetric tridiagonal matrices in general, the method appears to be unstable.
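The three-term recurrence mentioned in this abstract can be illustrated with a short Python sketch. This is an assumption-laden illustration of the general idea, not the paper's scheme (which starts from pairs of successive rows of the eigenvector matrix obtained by Divide and Conquer): given an eigenvalue lam of an unreduced symmetric tridiagonal matrix, the rows of (T - lam I) v = 0 are solved one after another starting from v[0] = 1. The function names are hypothetical, and all off-diagonal entries are assumed nonzero.

```python
import math


def eigenvector_by_recurrence(diag, off, lam):
    """Eigenvector for eigenvalue lam from the three-term recurrence

        off[i-1] * v[i-1] + (diag[i] - lam) * v[i] + off[i] * v[i+1] = 0,

    started with v[0] = 1. Assumes every off[i] is nonzero. The result
    is normalized to unit Euclidean length.
    """
    n = len(diag)
    v = [1.0]
    if n > 1:
        v.append((lam - diag[0]) / off[0])
    for i in range(1, n - 1):
        v.append(((lam - diag[i]) * v[i] - off[i - 1] * v[i - 1]) / off[i])
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def residual_max(diag, off, lam, v):
    """Largest absolute entry of (T - lam I) v, as an accuracy check."""
    n = len(diag)
    worst = 0.0
    for i in range(n):
        s = (diag[i] - lam) * v[i]
        if i > 0:
            s += off[i - 1] * v[i - 1]
        if i < n - 1:
            s += off[i] * v[i + 1]
        worst = max(worst, abs(s))
    return worst
```

The recurrence costs O(N) per eigenvector, which is where the O(N^2) total comes from. Errors in lam are amplified as the recurrence proceeds, so checking `residual_max` is advisable; this matches the abstract's remark that the approach appears unstable for general symmetric tridiagonal matrices.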
TL;DR: In this paper, the backward error analysis, perturbation theory, and properties of the $LU$ factorization of a tridiagonal matrix were used to obtain the best bound available for the error.
Abstract: If $\hat x$ is the computed solution to a tridiagonal system $Ax = b$ obtained by Gaussian elimination, what is the “best” bound available for the error $x - \hat x$ and how can it be computed efficiently? This question is answered using backward error analysis, perturbation theory, and properties of the $LU$ factorization of A. For three practically important classes of tridiagonal matrix, those that are symmetric positive definite, totally nonnegative, or M-matrices, it is shown that $(A + E)\hat x = b$ where the backward error matrix E is small componentwise relative to A. For these classes of matrices the appropriate forward error bound involves Skeel’s condition number cond $(A,x)$, which, it is shown, can be computed exactly in $O(n)$ operations. For diagonally dominant tridiagonal A the same type of backward error result holds, and the author obtains a useful upper bound for cond $(A,x)$ that can be computed in $O(n)$ operations. Error bounds and their computation for general tridiagonal matrices a...