Top 52 papers presented at Parallel Computing Technologies in 1997

Control-Driven Coordination Programming in Shared Dataspace

George A. Papadopoulos¹, Farhad Arbab•Institutions (1)

8 Sep 1997

TL;DR: This paper argues for an alternative way of designing coordination models for parallel and distributed environments based on a complete symmetry between and decoupling of producers and consumers, as well as a clear distinction between the computational and the coordination/communication work performed by each process.

Abstract: This paper argues for an alternative way of designing coordination models for parallel and distributed environments based on a complete symmetry between and decoupling of producers and consumers, as well as a clear distinction between the computational and the coordination/communication work performed by each process. The novel idea is to allow both producer and consumer processes to communicate with each other in a fashion that does not dictate any one of them to have specific knowledge about the rest of the processes involved in a coordinated activity. Furthermore, the model is inherently control-driven where communicating processes observe state changes and react to the presence of events and where the main communication mechanism is limited broadcasting (as opposed to either point-to-point or unrestricted broadcasting communication). Although a direct realisation of this model in terms of a concrete coordination language does already exist, we argue that the underlying principles can be applied to other similar models. We demonstrate our point by comparing our model with an established and widely used coordination framework, namely the Linda-type Shared Dataspace model, and we show how the functionality of the former can be embedded into the latter, thus yielding an alternative Linda-based coordination framework.
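
As a point of reference for the Linda-type Shared Dataspace model mentioned in the abstract, the following is a minimal single-threaded sketch of a tuple space. It is not the coordination model proposed in the paper. The class name `TupleSpace`, the use of `None` as a wildcard, and the restriction to the non-blocking `rdp`/`inp` variants are assumptions made for illustration.

```python
from typing import Any, List, Optional, Tuple

Pattern = Tuple[Any, ...]


class TupleSpace:
    """Toy Linda-style dataspace (hypothetical, non-blocking operations only)."""

    def __init__(self) -> None:
        self._tuples: List[Tuple[Any, ...]] = []

    def out(self, tup: Tuple[Any, ...]) -> None:
        """Producer side: deposit a tuple without naming any consumer."""
        self._tuples.append(tup)

    @staticmethod
    def _matches(pattern: Pattern, tup: Tuple[Any, ...]) -> bool:
        # Assumption: None in a pattern acts as a wildcard field.
        return len(pattern) == len(tup) and all(
            p is None or p == t for p, t in zip(pattern, tup)
        )

    def rdp(self, pattern: Pattern) -> Optional[Tuple[Any, ...]]:
        """Non-destructive read: return the first matching tuple, or None."""
        for tup in self._tuples:
            if self._matches(pattern, tup):
                return tup
        return None

    def inp(self, pattern: Pattern) -> Optional[Tuple[Any, ...]]:
        """Destructive read: remove and return the first matching tuple, or None."""
        for i, tup in enumerate(self._tuples):
            if self._matches(pattern, tup):
                return self._tuples.pop(i)
        return None


space = TupleSpace()
space.out(("temperature", 21))
space.out(("pressure", 1013))
reading = space.rdp(("temperature", None))  # tuple stays in the space
taken = space.inp(("pressure", None))       # tuple is removed from the space
```

Producers and consumers here share only the dataspace and the shape of the tuples, which is the decoupling the abstract refers to; the event-driven, limited-broadcast behaviour the authors argue for is not modelled.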

17 citations

Book Chapter•10.1007/3-540-63371-5_22•

Hybrid Approach to Task Allocation in Distributed Systems

Ladislav Hluchý¹, Miroslav Dobrucký¹, Ján Astalos¹•Institutions (1)

Slovak Academy of Sciences¹

8 Sep 1997

TL;DR: This paper describes static and dynamic task allocation tools in the PVM environment for distributed memory parallel systems; augmented simulated annealing and heuristic move exchange methods are also implemented in distributed form.

Abstract: This paper describes the static and dynamic task allocation tools in the PVM environment for distributed memory parallel systems. For the static mapping, an objective function is used to evaluate the optimality of the allocation of a task graph onto a processor graph. Together with our optimization method, augmented simulated annealing and heuristic move exchange methods are also implemented in distributed form. For dynamic task allocation, a semidistributed approach was designed based on the division of the processor network topology into independent and symmetric spheres. The distributed static mapping (DSM) and dynamic load balancing (DLB) tools are controlled through a windowed user interface. The DSM and DLB tools are integrated with a software monitor (PG-PVM) in the graphical GRAPNEL environment.
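
To make the idea of static mapping with an objective function concrete, here is a small sequential sketch of plain simulated annealing that maps a task graph onto a processor graph. It is not the augmented or distributed method of the paper. The objective (heaviest processor load plus communication volume weighted by hop distance), the cooling parameters, and all names are assumptions chosen for illustration.

```python
import math
import random
from typing import Dict, List, Tuple


def mapping_cost(mapping: List[int], task_weight: List[float],
                 task_edges: Dict[Tuple[int, int], float],
                 hop_distance: List[List[int]], num_procs: int) -> float:
    """Assumed objective: heaviest processor load plus weighted communication."""
    loads = [0.0] * num_procs
    for task, proc in enumerate(mapping):
        loads[proc] += task_weight[task]
    comm = sum(volume * hop_distance[mapping[a]][mapping[b]]
               for (a, b), volume in task_edges.items())
    return max(loads) + comm


def anneal(task_weight, task_edges, hop_distance, num_procs,
           steps=2000, t_start=5.0, cooling=0.995, seed=0):
    """Return (best mapping, its cost, cost of the random starting mapping)."""
    rng = random.Random(seed)
    current = [rng.randrange(num_procs) for _ in task_weight]
    current_cost = mapping_cost(current, task_weight, task_edges,
                                hop_distance, num_procs)
    initial_cost = current_cost
    best, best_cost = list(current), current_cost
    temp = t_start
    for _ in range(steps):
        # Move: reassign one randomly chosen task to a random processor.
        task = rng.randrange(len(current))
        old_proc = current[task]
        current[task] = rng.randrange(num_procs)
        new_cost = mapping_cost(current, task_weight, task_edges,
                                hop_distance, num_procs)
        delta = new_cost - current_cost
        # Accept improvements always, worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current_cost = new_cost
            if new_cost < best_cost:
                best, best_cost = list(current), new_cost
        else:
            current[task] = old_proc  # reject the move
        temp *= cooling
    return best, best_cost, initial_cost


# Hypothetical instance: six tasks, three processors connected in a line.
task_weight = [4.0, 2.0, 3.0, 1.0, 2.0, 5.0]
task_edges = {(0, 1): 2.0, (1, 2): 1.0, (2, 3): 3.0,
              (3, 4): 1.0, (4, 5): 2.0, (0, 5): 1.0}
num_procs = 3
hop_distance = [[abs(i - j) for j in range(num_procs)] for i in range(num_procs)]
best, best_cost, initial_cost = anneal(task_weight, task_edges,
                                       hop_distance, num_procs)
```

Because the best mapping seen so far is tracked separately, the returned cost is never worse than that of the random starting mapping.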

13 citations

Book Chapter•10.1007/3-540-63371-5_6•

A Formal Framework for the Analysis of Recursive-Parallel Programs

Olga Kouchnarenko¹, Ph. Schnoebelen•Institutions (1)

University of Rennes¹

8 Sep 1997

TL;DR: The formal framework proposed here combines a formal operational model of abstract programs, a set of decision methods for the analysis of RP schemes, a formal operational model for the interpreted programs, and translation results stating how some behavioural properties of the concrete programs can be correctly checked on the corresponding scheme.

Abstract: RP programs are imperative programs with parallelism and recursion and only a limited way of synchronizing parallel processes. The formal framework we propose here combines (1) a formal operational model of abstract programs (or RP schemes), (2) a set of decision methods for the analysis of RP schemes, (3) a formal operational model for the interpreted programs, and (4) translation results stating how some behavioural properties of the concrete programs can be correctly checked on the corresponding scheme.

12 citations

Book Chapter•10.1007/3-540-63371-5_14•

An Integer Linear Programming Model of Software Pipelining for the MIPS R8000 Processor

Artour Stoutchinin¹•Institutions (1)

University of Delaware¹

8 Sep 1997

TL;DR: The ILP model for the MIPS R8000 is extended by including memory optimization and the entire model is presented in detail, aiming to produce optimal schedules.

Abstract: In parallelizing the code for high-performance processors, software pipelining of innermost loops is of fundamental importance. In order to benefit from software pipelining, two separate tasks need to be performed: (i) software pipelining proper (find the rate-optimal legal schedule), and (ii) register allocation (allocate registers to the found schedule). Software pipelining and register allocation can be formulated as an integer linear programming (ILP) problem, aiming to produce optimal schedules. In this paper, we discuss the application of the integer linear programming to software pipelining on the MIPS R8000 superscalar microprocessor. Some of the results were presented in the PLDI96 [14], where they were compared to the MIPSpro software pipeliner. In this paper we further extend the ILP model for the MIPS R8000 by including memory optimization and present the entire model in detail.

10 citations

Book Chapter•10.1007/BFB0032706•

A Unified Software Pipeline Construction Scheme for Modulo Scheduled Loops

Benoît Dupont de Dinechin¹•Institutions (1)

McGill University¹

8 Sep 1997

TL;DR: A software pipeline construction scheme for DO-loops, while-loops, and loops with multiple exits is presented, which unifies, simplifies, and generalizes the separate techniques previously required to build a complete software pipeline from a local schedule computed by modulo scheduling.

Abstract: We present a software pipeline construction scheme for DO-loops, while-loops, and loops with multiple exits, which unifies, simplifies, and generalizes the separate techniques previously required to build a complete software pipeline from a local schedule computed by modulo scheduling. In the setting of this software pipeline construction scheme, we demonstrate a simple way of implementing a general form of modulo expansion. Then we introduce inductive relaxation, a technique that replaces generalized modulo expansion when the variable to expand is a simple induction. These techniques do not require any architectural support from the target processor, and have been extensively tested as part of the software pipeliner that comes with the 3.0 compiler releases for the Cray T3E™ massively parallel computer.

7 citations

Book Chapter•10.1007/3-540-63371-5_8•

On Proving Large Distributed Systems: Petri Net Modules Verification

Irina A. Lomazova¹•Institutions (1)

Russian Academy of Sciences¹

8 Sep 1997

TL;DR: This paper presents a formal basis for validating large distributed systems described by composition of coloured Petri-Net modules and proposes a compositional proof technique for such systems.

Abstract: In this paper we present a formal basis for validating large distributed systems. Distributed systems are described by composition of coloured Petri-Net modules. A compositional proof technique for such systems, where properties are specified in terms of a linear time temporal logic, is proposed.

7 citations

Book Chapter•10.1007/3-540-63371-5_40•

Parallel Computation of an Unsteady Compressible Flow

E. Onuphre¹, André Chambarel¹•Institutions (1)

Centre national de la recherche scientifique¹

8 Sep 1997

TL;DR: A general background is presented for developing parallel applications in the domain of Computational Fluid Dynamics based upon block Jacobi preconditioned iterative methods for solving partial differential equations, and the simulation of an unsteady compressible flow is discussed.

Abstract: In the present paper, a general background is presented for developing parallel applications in the domain of Computational Fluid Dynamics. This framework is based upon block Jacobi preconditioned iterative methods for solving partial differential equations. It is shown how the parallelism potential of such a preconditioning can be efficiently exploited by associating it with Finite Element discretization and Object Oriented Programming. The resulting parallel applications are characterized by coarse granularity, ease of maintaining good load balance and the possibility of using the same object in either a serial or a parallel computing context. As an application of our parallel approach, the simulation of an unsteady compressible flow is discussed.

5 citations

Book Chapter•10.1007/3-540-63371-5_21•

An HPF Case Study of a Domain-Decomposition Based Irregular Application

Cécile Germain¹, Jacques Laminie¹, M. Pallud¹, Daniel Etiemble¹•Institutions (1)

Centre national de la recherche scientifique¹

8 Sep 1997

TL;DR: This paper studies a realistic but non-adaptive irregular application and shows that HPF can easily express its natural parallelism.

Abstract: Data-parallel languages, in particular HPF, provide a high-level view of operators over parallel data structures and hide the details of data partitioning and communication. One of the most difficult issues in compiling such languages is managing irregular data-dependent parallelism. This paper presents the study of a realistic, but non-adaptive irregular application. We show that HPF can easily express the natural parallelism of the application. Experimental results and a detailed examination of the compiler process are presented.

5 citations

Proceedings Article•

Modelling of seismic wave propagation for 2D media (direct and inverse problems). Lecture Notes in Computer Science

V. G. Khajdukov, Victor Kostin, V. V. Kovalevsky, Victor E. Malyshkin, Vladimir Tcheverda, D M Vishnevsky

1 Jan 1997

4 citations

Book Chapter•10.1007/3-540-63371-5_18•

Optimization Techniques and Performance Analysis for Different Serial and Parallel RISC-based Computers

Oleg Bessonov¹, Bernard Roux•Institutions (1)

Russian Academy of Sciences¹

8 Sep 1997

TL;DR: The paper describes different methods of performance optimization of serial and parallel algorithms for modern superscalar RISC processor based computers and the comparative performance analysis of different computers and their architectural peculiarities.

Abstract: The paper describes different methods of performance optimization of serial and parallel algorithms for modern superscalar RISC processor based computers. The limitations imposed on the performance by hierarchical organization of computer memories are discussed, followed by the comparative performance analysis of different computers and their architectural peculiarities. Finally the parallelization aspects of the solution of 3-dimensional CFD problems are considered, along with the comparison of communication characteristics of parallel computers.

4 citations

Book Chapter•10.1007/3-540-63371-5_32•

Hardware Support for 3D Cellular Processing

Rolf Hoffmann¹, Klaus-Peter Völkmann¹•Institutions (1)

Technische Universität Darmstadt¹

8 Sep 1997

TL;DR: A 3D architecture is presented based on the principles of parallel access window, shifting, and pipelining, which can be used to design special-purpose coprocessors for cellular processing.

Abstract: Cellular processing, especially in the 3D real-time case, needs high computing performance. With a specially designed coprocessor, the requirements can be fulfilled at relatively low cost. First, the architectural principles that can be used in designing such coprocessors are described. Second, a 3D architecture is presented based on the principles of parallel access window, shifting, and pipelining. The implementation uses two Field Programmable Logic Arrays, thereby performing 66 million 3D cell operations per second.

Book Chapter•10.1007/3-540-63371-5_36•

Modelling of Seismic Wave Propagation for 2D Media (Direct and Inverse Problems)

V. G. Khajdukov, V. D. Korneev, Victor Kostin, V. V. Kovalevsky, Victor E. Malyshkin, Vladimir Tcheverda, D. V. Vishnevsky¹•Institutions (1)

Novosibirsk State University¹

8 Sep 1997

Book Chapter•10.1007/3-540-63371-5_50•

The Highly Parallel Incomplete Gram-Schmidt Preconditioner

Tianruo Yang¹, Hai Xiang Lin•Institutions (1)

Linköping University¹

8 Sep 1997

TL;DR: This paper describes a more efficient alternative, namely Improved ParIMGS (IParIMGS), which avoids the global communication of inner products and requires only local communications, so the cost of communication can be significantly reduced.

Abstract: In this paper we study the parallel aspects of IMGS, Incomplete Modified Gram-Schmidt preconditioner which can be used for efficiently solving sparse and large linear systems and least squares problems on massively parallel distributed memory computers. The performance of this preconditioning technique on this kind of architecture is always limited because of the global communication required for the inner products, even for ParIMGS, a parallel version of IMGS where we create some possibilities such that the computation can be overlapped with the communication. We will describe a more efficient alternative, namely Improved ParIMGS (IParIMGS) which avoids the global communication of inner products and only requires local communications. Therefore, the cost of communication can be significantly reduced. Several numerical experiments carried out on Parsytec GC/PowerPlus are presented as well.
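
The role of the inner products can be seen in a small sequential sketch of modified Gram-Schmidt with a dropping rule. This is not ParIMGS or IParIMGS: the absolute drop threshold `drop_tol`, the column-list matrix layout, and the function name are assumptions for illustration, and the paper's actual dropping strategy is not given in the abstract.

```python
import math
from typing import List, Tuple

Vector = List[float]


def incomplete_mgs(columns: List[Vector],
                   drop_tol: float = 0.0) -> Tuple[List[Vector], List[List[float]]]:
    """Modified Gram-Schmidt on a list of columns, skipping small projections.

    With drop_tol == 0.0 nothing is dropped and this is ordinary MGS.
    Returns (q_columns, r) with r upper triangular.
    """
    n = len(columns)
    q = [list(col) for col in columns]
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        norm = math.sqrt(sum(v * v for v in q[j]))
        if norm == 0.0:
            raise ValueError("columns are linearly dependent")
        r[j][j] = norm
        q[j] = [v / norm for v in q[j]]
        for k in range(j + 1, n):
            # This inner product is the step that needs a global reduction
            # when the vectors are distributed across processors.
            proj = sum(a * b for a, b in zip(q[j], q[k]))
            if abs(proj) < drop_tol:
                continue  # assumed dropping rule: leave r[j][k] at zero
            r[j][k] = proj
            q[k] = [b - proj * a for a, b in zip(q[j], q[k])]
    return q, r


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


a_columns = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
q_cols, r_mat = incomplete_mgs(a_columns)  # drop_tol=0.0: full factorization
```

Every pass through the inner loop computes one inner product, which is why the abstract singles out inner-product communication as the limiting factor on distributed memory machines.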

Book Chapter•10.1007/3-540-63371-5_30•

Communications in Parallel Architectures and Networks of Workstations: From Standardisation to New Standards

Franck Cappello¹, Daniel Etiemble¹•Institutions (1)

University of Paris-Sud¹

8 Sep 1997

TL;DR: This work examines the present trends toward standardisation of communication for parallel machines and networks of workstations, and discusses some software and hardware features to improve performance compared to the usual PVM-Unix-TCP/IP-Ethernet stack of protocols.

Abstract: Standardisation, which is the rule for PCs and workstations, is quickly expanding to parallel machines and networks of workstations. The use of commodities is the key to reducing the cost/performance ratio: standard microprocessors, OS, libraries... are used. Standardisation of communications is more difficult if both high performance and protection in a multi-user context are wanted. We examine the present trends toward standardisation of communication for parallel machines and networks of workstations. We discuss some software and hardware features to improve performance compared to the usual PVM-Unix-TCP/IP-Ethernet stack of protocols.

Book Chapter•10.1007/3-540-63371-5_45•

Bitwise Processing - a Paradigm for Deriving Parallel Algorithms

Wolfgang Koch¹•Institutions (1)

University of Jena¹

8 Sep 1997

Book Chapter•10.1007/3-540-63371-5_24•

Task Migration and Fine Grain Parallelism on Distributed Memory Architectures

Yvon Jégou¹•Institutions (1)

French Institute for Research in Computer Science and Automation¹

8 Sep 1997

TL;DR: It is shown that, because it generates parallel and asynchronous execution of a large number of small tasks, the task migration paradigm allows a direct exploitation of these irregularly structured problems on distributed memory architectures.

Abstract: The most successful compilation techniques for distributed memory architectures are based on static analysis of the memory accesses. Loop iterations with similar comportment on the parallel memories are combined in order to form coarse grain parallel tasks. But for irregularly structured applications, the behavior of each iteration of a parallel loop on the memories is data dependent and cannot be predicted at compile-time and the only exploitable parallelism is fine-grain. We show that, because it generates parallel and asynchronous execution of a large number of small tasks, the task migration paradigm allows a direct exploitation of these irregularly structured problems on distributed memory architectures.

Book Chapter•10.1007/3-540-63371-5_2•

Tight Lower Bounds for Computing Shortest Paths on Proper Interval and Bipartite Permutation Graphs

Lin Chen

8 Sep 1997

TL;DR: Logarithmic time lower bounds for computing the distance between two arbitrary vertices, in a proper interval graph represented by a family of intervals on a real line, and in a bipartite permutation graph represented by a permutation function, on exclusive write PRAM are proved here.

Abstract: Logarithmic time lower bounds for computing the distance between two arbitrary vertices, in a proper interval graph represented by a family of intervals on a real line, and in a bipartite permutation graph represented by a permutation function, on exclusive write PRAM are proved here. The lower bounds are also valid for these classes of graphs represented by adjacency matrices and for their superclasses. Shortest paths on interval and permutation graphs, which, respectively, strictly contain proper interval and bipartite permutation graphs, are known to be computable in logarithmic time on exclusive write PRAM. It follows that the lower bounds derived here are tight.

Book Chapter•10.1007/3-540-63371-5_17•

A Spatial Grid File for Multimedia Data Representation

Adil Alpkocak¹, Esen A. Ozkarahan¹•Institutions (1)

Dokuz Eylül University¹

8 Sep 1997

TL;DR: This study introduces a new file structure called Spatial Grid File, which enables us to index data objects by different and independent high-dimensional attributes and is very suitable for parallelization.

Abstract: In multimedia databases, spatial or high-dimensional data manipulation is important for storage and retrieval. In this study, we introduce a new file structure called Spatial Grid File. This file enables us to index data objects by different and independent high-dimensional attributes. With it, well-known spatial query types, such as range queries, nearest neighbor queries and spatial join operations, can be efficiently performed. Although the performance of the Spatial Grid File structure depends on the indexing methods used, it has the unique feature of combining sets of spatial data, each having different properties. Furthermore, this file structure is very suitable for parallelization.

Book Chapter•10.1007/3-540-63371-5_15•

Computations on Cellular Automata with Defects

V. Valkovskii¹, D. Zerbino•Institutions (1)

National Academy of Sciences of Ukraine¹

8 Sep 1997

TL;DR: The notion of cellular program, which represents the inversion of a certain combination of bits on a cellular plane, is introduced and the independence of a computational result of the solitary defects of cells is investigated.

Abstract: Parallel computations on the lower level in the negabinary coding system are considered. All arithmetical operations are realized by means of five simple rules. Every rule represents the inversion of a certain combination of bits on a cellular plane. The notion of cellular program is introduced. The independence of a computational result of the solitary defects of cells is investigated. Examples illustrate this approach. The questions concerning the property of rule systems are discussed.
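
For readers unfamiliar with the negabinary (base −2) coding system, the sketch below shows only the encoding itself: every integer, positive or negative, has a representation using the digits 0 and 1 without a sign bit. The five cellular rules of the paper are not reproduced, and the function names are hypothetical.

```python
def to_negabinary(n: int) -> str:
    """Return the base -2 digit string of an integer (most significant first)."""
    if n == 0:
        return "0"
    digits = []
    while n != 0:
        n, r = divmod(n, -2)
        if r < 0:
            # Python gives a remainder in {0, -1} for a negative divisor;
            # shift it into {0, 1} and compensate in the quotient.
            n += 1
            r += 2
        digits.append(str(r))
    return "".join(reversed(digits))


def from_negabinary(bits: str) -> int:
    """Evaluate a base -2 digit string: digit i (from the right) weighs (-2)**i."""
    return sum(int(b) * (-2) ** i for i, b in enumerate(reversed(bits)))


examples = {n: to_negabinary(n) for n in (2, 3, -1)}
# 2 -> "110" (4 - 2), 3 -> "111" (4 - 2 + 1), -1 -> "11" (-2 + 1)
```

Because the place values alternate in sign, negative numbers need no separate sign handling, which is one reason such a code is convenient for uniform bit-level rules.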

Book Chapter•10.1007/3-540-63371-5_52•

Simulating Cellular Computations with ALT. A Tutorial

Yuri Pogudin, Olga Bandman

8 Sep 1997

TL;DR: This tutorial contains a brief review of the theoretical background, the restricted ALT-language and simple examples, and may be used in teaching Parallel Computing.

Abstract: ALT (Animating Language Tools) is a computer tool for designing and simulating computational processes in cellular arrays. It combines interfaces for visual and textual representation of fine-grained parallel algorithms. A special high-level language is developed whose statements are graphically given arrays and subarrays. The simulating process allows one to observe computation dynamics at different levels of detail: at program blocks, at statements to be executed in parallel over the given array, and in quasiparallel mode at cell-state changes. The tutorial contains a brief review of the theoretical background, the restricted ALT-language and simple examples. ALT runs on a PC under MS-DOS and may be used in teaching Parallel Computing.

Book Chapter•10.1007/3-540-63371-5_37•

Decomposition on a Group and Parallel Convolution and Fast Fourier Transform Algorithms

Olga V. Klimova¹•Institutions (1)

Russian Academy of Sciences¹

8 Sep 1997

TL;DR: The group-theoretic approach to the decomposition of the basic operations of the digital signal processing (DSP) such as discrete Fourier transform and convolution is proposed and the description of a vector DFT algorithm is adduced.

Abstract: The group-theoretic approach to the decomposition of the basic operations of digital signal processing (DSP), such as the discrete Fourier transform (DFT) and convolution, is proposed. The distinctive feature of the approach is its primordial orientation to parallel processing. The recurrent description of the decomposition process produces fast parallel algorithms that are effective for both parallel and sequential processing. The main properties of these algorithms are formulated and the description of a vector DFT algorithm is adduced.
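
The abstract does not spell out the group-theoretic decomposition, so the sketch below substitutes the standard radix-2 decimation-in-time (Cooley-Tukey) FFT as a familiar instance of decomposing a DFT into smaller, independent DFTs. The function names are hypothetical, and the input length is assumed to be a power of two.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Direct O(n^2) discrete Fourier transform, used as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    # The two half-size transforms do not depend on each other,
    # so they could be evaluated in parallel.
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out: List[complex] = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


signal = [1, 2, 3, 4, 0, -1, -2, -3]
spectrum = fft(signal)
reference = dft(signal)
```

The recursion mirrors the "recurrent description" idea in a loose sense only: each level splits one transform into two independent sub-transforms plus a cheap combination step.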

Book Chapter•10.1007/3-540-63371-5_16•

Efficient Implementation of the Improved Unsymmetric Lanczos Process on Massively Distributed Memory Computers

Tianruo Yang¹•Institutions (1)

Linköping University¹

8 Sep 1997

TL;DR: An improved version of the unsymmetric Lanczos process combining elements of numerical stability and parallel algorithm design is proposed, derived such that all inner products and matrix-vector multiplications of a single iteration step are independent and communication time required for inner product can be overlapped efficiently with computation time.

Abstract: For the eigenvalues of a large and sparse unsymmetric coefficient matrix, we have proposed an improved version of the unsymmetric Lanczos process combining elements of numerical stability and parallel algorithm design. The algorithm is derived such that all inner products and matrix-vector multiplications of a single iteration step are independent and communication time required for inner product can be overlapped efficiently with computation time. Therefore, the cost of global communication on parallel distributed memory computers can be significantly reduced. The resulting algorithm maintains the favorable properties of the Lanczos process while not increasing computational costs. In this paper, we describe an efficient implementation of this method which is particularly well suited to problems with irregular sparsity pattern. The corresponding communication cost is independent of the sparsity pattern with several performance improvement techniques such as overlapping computation and communication, balancing the computational load. The performance is demonstrated by numerical experimental results carried out on massively parallel distributed memory computer Parsytec GC/PowerPlus.

Book Chapter•10.1007/3-540-63371-5_9•

Influence of Self-Connection Weights on Cellular-Neural Network Stability

Sergey Pudov¹•Institutions (1)

Russian Academy of Sciences¹

8 Sep 1997

TL;DR: It is studied how self-connection weight values influence the main characteristic of CNAM, namely the strong stability to k-distortions of stored prototypes, to provide a maximal strong stability for each prototype.

Abstract: Cellular-Neural Associative Memory (memory by Hopfield with local connection structure) with weight matrix designed by any one of the existing methods ensuring individual stability of network is considered. It is studied how self-connection weight values influence the main characteristic of CNAM, namely the strong stability to k-distortions of stored prototypes. An expression for determining the self-connection weight values is obtained that provides a maximal strong stability for each prototype. Two strategies are proposed to determine the most acceptable value according to the required accuracy. The obtained results are valid not only for CNAM but also for fully connected Hopfield associative memory designed with the help of any learning method.
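
A toy experiment can illustrate why self-connection weights matter for stability to distortions. The sketch uses a fully connected Hopfield memory with Hebbian weights and synchronous updates, all of which are assumptions made for brevity; it does not reproduce the paper's cellular (locally connected) network or its expression for the optimal self-connection values.

```python
from typing import List

State = List[int]  # entries are +1 or -1


def hebbian_weights(patterns: List[State], self_weight: float) -> List[List[float]]:
    """Hebbian off-diagonal weights with a chosen constant on the diagonal."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    for i in range(n):
        w[i][i] = self_weight  # the self-connection weight under study
    return w


def step(w: List[List[float]], state: State) -> State:
    """One synchronous update: each cell takes the sign of its weighted input."""
    return [1 if sum(wij * sj for wij, sj in zip(row, state)) >= 0 else -1
            for row in w]


def recall(w: List[List[float]], state: State, max_steps: int = 20) -> State:
    """Iterate until a fixed point is reached or max_steps is exhausted."""
    state = list(state)
    for _ in range(max_steps):
        nxt = step(w, state)
        if nxt == state:
            return state
        state = nxt
    return state


prototype = [1, -1, 1, 1, -1, -1, 1, -1]
distorted = [-prototype[0]] + prototype[1:]  # a 1-distortion of the prototype

restored = recall(hebbian_weights([prototype], self_weight=0.0), distorted)
stuck = recall(hebbian_weights([prototype], self_weight=10.0), distorted)
```

In this small example, a zero self-connection weight lets the network correct the flipped cell, while a large one makes the distorted state itself a fixed point, so the distortion is never repaired.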

Book Chapter•10.1007/3-540-63371-5_34•

Analysis of Methods for Solving Large-Scale Non-Symmetric Linear Systems with Sparsed Matrices

M. Balandin¹, O. Chernyshev¹, E. Shurina¹•Institutions (1)

Novosibirsk State Technical University¹

8 Sep 1997

TL;DR: Two methods for solving non-symmetric sparse systems of linear equations (SLEs), the Biconjugate Gradient (BiCG) and Generalized Minimal Residual (GMRES) methods, are described.

Abstract: Two methods for solving non-symmetric sparse systems of linear equations (SLEs), the Biconjugate Gradient (BiCG) and Generalized Minimal Residual (GMRES) methods, are described in this paper. An analysis of memory and computational speed requirements is given; some results of application to finite-difference and finite-element SLEs are shown. Some features of these methods connected with the ABS class presented by Abaffy and Spedicato are also discussed.

Book Chapter•10.1007/3-540-63371-5_48•

Processing and Debugging of Parallel Programs on the Level of Task Model

Victor Samofalov, Alexander Konovalov

8 Sep 1997

TL;DR: The T-model is an algorithmic model of the task that reflects the structure, description, and volume of information flows; it permits evaluating the quality of parallel execution, finding bottlenecks and deadlocks, and correcting the discovered weak points.

Abstract: The concept of creating and debugging a parallel task model is discussed. The T-model is an algorithmic model of the task that reflects the structure, description, and volume of information flows. Creating a T-model permits evaluating the quality of parallel execution, finding bottlenecks and deadlocks, and correcting the discovered weak points by changing the volume and discipline of exchanges as well as by carrying out the decomposition again. A T-model description consists of a hardware description and a description of the parallel processes.

Book Chapter•10.1007/3-540-63371-5_44•

The Base Module of Multiprocessor System with Structural-Procedural Organization of Computing

Anatoly Kaliaev, Igor Kaliaev, Ilija Levin

8 Sep 1997

TL;DR: The base module of a multiprocessor system with structural-procedural computation is considered; this module provides system performance close to peak.

Abstract: The base module of a multiprocessor system with structural-procedural computation is considered. This module provides the system performance close to peak.

Book Chapter•10.1007/3-540-63371-5_28•

3D Visual Tool Supporting Derivation of Parallel Programs for MIMD Systems

Elena Trichina¹, Juha Oinonen²•Institutions (2)

University of South Australia¹, University of Eastern Finland²

8 Sep 1997

TL;DR: A visual system is reported on, which uses a three dimensional model to describe the data dependencies in the computational domain and to visualise transformations of this domain relevant to the main steps in parallel program design.

Abstract: Parallel program design and analysis is a complex activity, where many difficulties stem from the fundamental inadequacy of pure textual formalism to specify parallelism in an understandable fashion. In this paper we report on a visual system, which uses a three dimensional model to describe the data dependencies in the computational domain and to visualise transformations of this domain relevant to the main steps in parallel program design. We believe three dimensional interactive graphics provides an extra degree of freedom for conveying information. Using 3D graphics, we can visualise all the data dependencies evolving in time as a three dimensional graph, displaying it on a screen. Visual abstraction, animation of transformations and visualisation of their effects will shift information to the perceptual level to help assimilate mathematical notions.

Book Chapter•10.1007/3-540-63371-5_23•

Viability of Multithreading on Networks of Workstations

Hantak Kwak¹, Ben Lee¹, Ali R. Hurson²•Institutions (2)

Oregon State University¹, Pennsylvania State University²

8 Sep 1997

TL;DR: Experiments indicate the performance of multithreading, with a small number of threads per processor, is very comparable to that of programs written using message-passing and has an added advantage over message-passing in that it is relatively insensitive to initial data distribution.

Abstract: A recent trend in high-performance computing focuses on networks of workstations (NOWs) as a way of realizing cost-effective parallel machines. This has been due to the availability of powerful wide-issue processors, high-speed networks, and software infrastructure systems. Due to its distributed nature, message-passing has been the choice of communication model for NOWs. This paper, however, examines the viability of using multithreading on NOWs. A matrix multiplication algorithm was studied by simulating a shared-memory abstraction on top of Parallel Virtual Machine (PVM) to characterize the behavior of multithreading. Our experiments indicate the performance of multithreading, with a small number of threads per processor, is very comparable to that of programs written using message-passing. Our studies also show multithreading has an added advantage over message-passing in that it is relatively insensitive to initial data distribution.

Book Chapter•10.1007/3-540-63371-5_29•

Scheduling Algorithms for Parallel Transaction Processing Systems

Jiahong Wang¹, Jie Li¹, Hisao Kameda¹•Institutions (1)

University of Tsukuba¹

8 Sep 1997

TL;DR: It is observed that for the shared-nothing parallel TP system, this negative effect of 2PL can be alleviated significantly by scheduling transactions judiciously and a new transaction scheduling algorithm called FCFSP (FCFS with Priority) is proposed thereby.

Abstract: Shared-nothing parallel transaction processing (TP) systems have great potential to serve the ever-increasing demands for high transaction processing rate. This potential, however, may not be reached due to the negative effect of the widely used two-phase locking (2PL) concurrency control method. We observed that for the shared-nothing parallel TP system, this negative effect of 2PL can be alleviated significantly by scheduling transactions judiciously. In this paper, a new transaction scheduling algorithm called FCFSP (FCFS with Priority) is proposed thereby. In order to study the performance of transaction scheduling algorithms, a comprehensive simulator for shared-nothing parallel TP systems is developed. Using the developed simulator, the performance of FCFSP is compared with that of the conventional FCFS and the previously proposed SCST (Synchronizing Completion of SubTransactions) transaction scheduling algorithms. Simulation results demonstrate the effectiveness of FCFSP. Simulation results also show that FCFSP outperforms FCFS greatly, and overcomes the drawback of SCST.

Book Chapter•10.1007/3-540-63371-5_31•

A Multithreaded Vector Co-processor

Bernard Goossens¹•Institutions (1)

University of Paris¹

8 Sep 1997

TL;DR: A multithreaded vector co-processor design is described, intended to be placed with its private vector memory, on an expansion board, linked to the scalar processor and its cache-based memory hierarchy.

Abstract: A multithreaded vector co-processor design is described. It is intended to be placed, with its private vector memory, on an expansion board linked to the scalar processor and its cache-based memory hierarchy. The vector co-processor can run up to 8 vector tasks (threads) in parallel. Vector registers can be accessed either as independent sets of scalar values or as array sets. Tomasulo's algorithm, simplified to keep the issue and termination logic simple in a multithreaded context, dynamically schedules the dependent instructions. A locking feature is provided to handle both the reductions and the complex recurrences in a vector form.
