Scispace (Formerly Typeset)
Showing papers presented at "Parallel Computing in 2002"
Journal Article•10.1016/S0167-8191(02)00094-7•
Data management and transfer in high-performance computational grid environments

Bill Allcock1, Joe Bester1, John Bresnahan1, Ann L. Chervenak2, Ian Foster3, Carl Kesselman2, Sam Meder1, Veronika Nefedova1, Darcy Quesnel1, Steven Tuecke1 •
Argonne National Laboratory1, University of Southern California2, University of Chicago3
1 May 2002
TL;DR: Two services fundamental to Data Grids are described: GridFTP, a high-speed transport service that extends the popular FTP protocol with new features required for Data Grid applications, such as striping and partial file access, and a replica management service that integrates a replica catalog with GridFTP transfers to provide for the creation, registration, location, and management of dataset replicas.
Abstract: An emerging class of data-intensive applications involves the geographically dispersed extraction of complex scientific information from very large collections of measured or computed data. Such applications arise, for example, in experimental physics, where the data in question is generated by accelerators, and in simulation science, where the data is generated by supercomputers. So-called Data Grids provide essential infrastructure for such applications, much as the Internet provides essential services for applications such as e-mail and the Web. We describe here two services that we believe are fundamental to any Data Grid: reliable, high-speed transport and replica management. Our high-speed transport service, GridFTP, extends the popular FTP protocol with new features required for Data Grid applications, such as striping and partial file access. Our replica management service integrates a replica catalog with GridFTP transfers to provide for the creation, registration, location, and management of dataset replicas. We present the design of both services and also preliminary performance results. Our implementations exploit security and other services provided by the Globus Toolkit.

699 citations

Journal Article•10.1016/S0167-8191(01)00141-7•
PASTIX: a high-performance parallel direct solver for sparse symmetric positive definite systems

Pascal Hénon1, Pierre Ramet1, Jean Roman1•
University of Bordeaux1
1 Feb 2002
TL;DR: The block partitioning and scheduling problem for sparse parallel factorization without pivoting is considered, with two major aims: the scalability of the parallel solver and the compromise between memory overhead and efficiency.
Abstract: Solving large sparse symmetric positive definite systems of linear equations is a crucial and time-consuming step, arising in many scientific and engineering applications. The block partitioning and scheduling problem for sparse parallel factorization without pivoting is considered. There are two major aims to this study: the scalability of the parallel solver, and the compromise between memory overhead and efficiency. Parallel experiments on a large collection of irregular industrial problems validate our approach.

246 citations

Journal Article•10.1016/S0167-8191(02)00151-5•
Probabilistic methods for centroidal Voronoi tessellations and their parallel implementations

Lili Ju1, Qiang Du2, Max D. Gunzburger1•
Iowa State University1, Pennsylvania State University2
1 Oct 2002
TL;DR: By using multi-sampling in a new probabilistic algorithm, more accurate and efficient approximations of CVTs are obtained without the need to explicitly construct Voronoi diagrams.
Abstract: Centroidal Voronoi tessellations (CVTs) are Voronoi tessellations of a region such that the generating points of the tessellations are also the centroids of the corresponding Voronoi cells. In this paper, some probabilistic methods for determining CVTs and their parallel implementations on distributed memory systems are presented. By using multi-sampling in a new probabilistic algorithm we introduce, more accurate and efficient approximations of CVTs are obtained without the need to explicitly construct Voronoi diagrams. The new algorithm lends itself well to parallelization, i.e., near-perfect linear speedup in the number of processors is achieved. The results of computational experiments performed on a CRAY T3E-600 system are provided which illustrate the superior sequential and parallel performance of the new algorithm when compared to existing algorithms. In particular, for the same amount of work, the new algorithms produce significantly more accurate CVTs.

213 citations
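
As a rough illustration of how a CVT can be approximated by sampling alone, without building a Voronoi diagram, the sketch below uses the classical MacQueen random-sampling update (each random sample pulls its nearest generator toward it as a running mean). This is an assumed baseline for illustration, not the multi-sampling algorithm introduced in the paper; the function name `macqueen_cvt`, the unit-square domain, the uniform density, and the sample count are all hypothetical choices.

```python
import random


def macqueen_cvt(k, samples, seed=0):
    """Approximate a CVT of the unit square (uniform density) with k generators.

    Classical MacQueen-style update, used here only as an illustrative
    baseline: no Voronoi diagram is ever constructed.
    """
    rng = random.Random(seed)
    # Start from k random generators; each has an update counter of 1.
    gens = [[rng.random(), rng.random()] for _ in range(k)]
    counts = [1] * k
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        # Find the generator nearest to the sample (squared Euclidean distance).
        i = min(range(k),
                key=lambda j: (gens[j][0] - x) ** 2 + (gens[j][1] - y) ** 2)
        # Running-mean update: the generator becomes the average of its
        # initial position and all samples assigned to it so far.
        counts[i] += 1
        gens[i][0] += (x - gens[i][0]) / counts[i]
        gens[i][1] += (y - gens[i][1]) / counts[i]
    return [tuple(g) for g in gens]
```

Calling `macqueen_cvt(8, 20000)` returns eight generator positions. Because each update touches only one generator, samples can in principle be drawn independently on several processors and merged, which is one reason sampling-based methods are attractive for parallelization.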

Journal Article•10.1016/S0167-8191(01)00135-1•
Two-level dynamic scheduling in PARDISO: improved scalability on shared memory multiprocessing systems

Olaf Schenk1, Klaus Gärtner•
University of Basel1
1 Feb 2002
TL;DR: A new parallelization strategy based on a dynamic two-level scheduling scheme that aims at minimizing cache conflicts and interprocessor communication costs and, at the same time, maximizing processor load balance and Level-3 BLAS performance is explored.
Abstract: The PARDISO package is a mathematical library of OpenMP routines for the parallel direct solution of large sparse linear systems of equations. One objective of PARDISO is to achieve a high efficiency on shared memory multiprocessing systems. A new parallelization strategy based on a dynamic two-level scheduling scheme is therefore explored. The method aims at minimizing cache conflicts and interprocessor communication costs and, at the same time, maximizing processor load balance and Level-3 BLAS performance. The synchronization events are reduced by one order of magnitude compared with a one-level scheduling strategy. This results in an efficient parallel sparse LU decomposition method. An overview of the two-level scheduling algorithm and the key algorithmic features of the solver PARDISO is given. Finally, numerical results and a comparison with another software package demonstrate the performance.

131 citations

Journal Article•10.1016/S0167-8191(02)00187-4•
ICENI: optimisation of component applications within a Grid environment

Nathalie Furmento1, Anthony Mayer1, Stephen McGough1, Steven Newhouse1, Tony Field1, John Darlington1 •
Imperial College London1
1 Dec 2002
TL;DR: Imperial College e-Science Networked Infrastructure (ICENI), a Grid middleware framework developed within the London e-Science Centre, is described, and the effectiveness of this architecture is demonstrated through the high-level specification and solution of a set of linear equations by automatic selection of optimal resources and implementations.
Abstract: Effective exploitation of Computational Grids can only be achieved when applications are fully integrated with the Grid middleware and the underlying computational resources. Fundamental to this exploitation is information. Information about the structure and behaviour of the application, the capability of the computational and networking resources, and the availability and access to these resources by an individual, a group or an organisation. In this paper we describe Imperial College e-Science Networked Infrastructure (ICENI), a Grid middleware framework developed within the London e-Science Centre. ICENI is a platform-independent framework that uses open and extensible XML derived protocols, within a framework built using Java and Jini, to explore effective application execution upon distributed federated resources. We match a high-level application specification, defined as a network of components, to an optimal combination of the currently available component implementations within our Grid environment, by using composite performance models. We demonstrate the effectiveness of this architecture through the high-level specification and solution of a set of linear equations by automatic selection of optimal resources and implementations.

125 citations

Journal Article•10.1016/S0167-8191(02)00190-4•
From patterns to frameworks to parallel programs

Steve MacDonald1, John Anvik1, Steven Bromling1, Jonathan Schaeffer1, Duane Szafron1, Kai Tan1 •
University of Alberta1
1 Dec 2002
TL;DR: The Parallel Design Patterns (PDP) process, the basis of the CO2P3S parallel programming system, combines these techniques in a layered development model, creating a new approach to parallel programming that addresses correctness and openness in a unique way.
Abstract: Object-oriented programming, design patterns, and frameworks are abstraction techniques that have been used to reduce the complexity of sequential programming. This paper describes our approach of applying these three techniques to the more difficult parallel programming domain. The Parallel Design Patterns (PDP) process, the basis of the CO2P3S parallel programming system, combines these techniques in a layered development model. The result is a new approach to parallel programming that addresses correctness and openness in a unique way. At the topmost development layer, a customized framework is generated from a design pattern specification of the parallel structure of the program. This framework encapsulates all of the structural details of the pattern, including communication and synchronization, to prevent programmer errors and ensure correctness. Lower layers are used only for performance tuning to make the code as efficient as necessary. This paper describes CO2P3S, based on the PDP process, and demonstrates it using an example application. We also provide results from a usability study of CO2P3S.

96 citations

Journal Article•10.1016/S0167-8191(02)00103-5•
A software architecture for user transparent parallel image processing

Frank J. Seinstra1, Dennis C. Koelma1, Jan-Mark Geusebroek1•
University of Amsterdam1
1 Aug 2002
TL;DR: Results indicate that the core of the architecture forms a powerful basis for automatic parallelization and optimization of a wide range of imaging software.
Abstract: This paper describes a software architecture that allows image processing researchers to develop parallel applications in a transparent manner. The architecture's main component is an extensive library of data parallel low level image operations capable of running on homogeneous distributed memory MIMD-style multicomputers. Since the library has an application programming interface identical to that of an existing sequential library, all parallelism is completely hidden from the user. The first part of the paper discusses implementation aspects of the parallel library, and shows how sequential as well as parallel operations are implemented on the basis of so-called parallelizable patterns. A library built in this manner is easily maintainable, as extensive code redundancy is avoided. The second part of the paper describes the application of performance models to ensure efficiency of execution on all target platforms. Experiments show that for a realistic application performance predictions are highly accurate. These results indicate that the core of the architecture forms a powerful basis for automatic parallelization and optimization of a wide range of imaging software.

74 citations

Journal Article•10.1016/S0167-8191(02)00100-X•
Video compression with parallel processing

Ishfaq Ahmad1, Yong He2, M.L. Liou3•
University of Texas at Arlington1, Motorola2, Hong Kong University of Science and Technology3
1 Aug 2002
TL;DR: An overview of the recent research in video compression using parallel processing is presented, outlining the basic philosophy of each approach and providing examples, and suggesting future research directions.
Abstract: Driven by the rapidly increasing demand for audio-visual applications, digital video compression technology has become a mature field, offering several available products based on both hardware and software implementations. Taking advantage of spatial, temporal, and statistical redundancies in video data, a video compression system aims to maximize the compression ratio while maintaining a high picture quality. Despite the tremendous progress in this area, video compression remains a challenging research problem due to its computational requirements and also because of the need for higher picture quality at lower data rates. Designing efficient coding algorithms continues to be a prolific area of research. To circumvent the computational requirement, researchers have resorted to parallel processing with a variety of approaches using dedicated parallel VLSI architectures as well as software on general-purpose available multiprocessor systems. Despite the availability of fast single processors, parallel processing helps to explore advanced algorithms and to build more sophisticated systems. This paper presents an overview of the recent research in video compression using parallel processing. The paper provides a discussion of the basic compression techniques, existing video coding standards, and various parallelization approaches. Since video compression is multi-step in nature using various algorithms, parallel processing can be exploited at an individual algorithm or at a complete system level. The paper covers a broad spectrum of such approaches, outlining the basic philosophy of each approach and providing examples. We contrast these approaches when possible, highlight their pros and cons, and suggest future research directions. While the emphasis of this paper is on software-based methods, a significant discussion of hardware and VLSI is also included.

72 citations

Journal Article•10.1016/S0167-8191(02)00097-2•
Processing large-scale multi-dimensional data in parallel and distributed environments

Michael D. Beynon1, Chialin Chang1, Ümit V. Çatalyürek2, Tahsin Kurc2, Alan Sussman1, Henrique Andrade1, Renato Ferreira1, Joel H. Saltz2 •
University of Maryland, College Park1, Ohio State University2
1 May 2002
TL;DR: This paper presents a compendium of frameworks and methods developed to support efficient execution of subsetting and aggregation operations in applications that query and manipulate large, multi-dimensional datasets in parallel and distributed computing environments.
Abstract: Analysis of data is an important step in understanding and solving a scientific problem. Analysis involves extracting the data of interest from all the available raw data in a dataset and processing it into a data product. However, in many areas of science and engineering, a scientist's ability to analyze information is increasingly becoming hindered by dataset sizes. The vast amount of data in scientific datasets makes it a difficult task to efficiently access the data of interest, and manage potentially heterogeneous system resources to process the data. Subsetting and aggregation are common operations executed in a wide range of data-intensive applications. We argue that common runtime and programming support can be developed for applications that query and manipulate large datasets. This paper presents a compendium of frameworks and methods we have developed to support efficient execution of subsetting and aggregation operations in applications that query and manipulate large, multi-dimensional datasets in parallel and distributed computing environments.

69 citations

Journal Article•10.1016/S0167-8191(02)00186-2•
Grid programming: some indications where we are headed

Domenico Laforenza1•
National Research Council1
1 Dec 2002
TL;DR: The development of Grid programming environments that would enable programmers to efficiently exploit this technology is an important and hot research issue and the most important approaches/projects conducted in this field worldwide are reviewed.
Abstract: Grid computing enables the development of large scientific applications on an unprecedented scale. Grid-aware applications, also called meta-applications or multi-disciplinary applications, make use of coupled computational resources that are not available at a single site. In this light, the Grids let scientists solve larger or new problems by pooling together resources that could not be coupled easily before. It is well known that the programmer's productivity in designing and implementing efficient distributed/parallel applications on high-performance computers is still usually a very time-consuming task. Grid computing makes the situation worse. Consequently, the development of Grid programming environments that would enable programmers to efficiently exploit this technology is an important and hot research issue. After an introduction on the main Grid programming issues, this paper will review the most important approaches/projects conducted in this field worldwide.

61 citations

Journal Article•10.1016/S0167-8191(02)00091-1•
Parallel data intensive computing in scientific and commercial applications

Mario Cannataro1, Domenico Talia2, Pradip K. Srimani3•
Indian Council of Agricultural Research1, University of Calabria2, Clemson University3
1 May 2002
TL;DR: The purpose of this introductory article is to provide an overview of the main issues in parallel data intensive computing in scientific and commercial applications and to encourage the reader to go into the more in-depth articles later in this special issue.
Abstract: Applications that explore, query, analyze, visualize, and, in general, process very large scale data sets are known as Data Intensive Applications. Large scale data intensive computing plays an increasingly important role in many scientific activities and commercial applications, whether it involves data mining of commercial transactions, experimental data analysis and visualization, or intensive simulation such as climate modeling. By combining high performance computation, very large data storage, high bandwidth access, and high-speed local and wide area networking, data intensive computing enhances the technical capabilities and usefulness of most systems. The integration of parallel and distributed computational environments will produce major improvements in performance for both computing intensive and data intensive applications in the future. The purpose of this introductory article is to provide an overview of the main issues in parallel data intensive computing in scientific and commercial applications and to encourage the reader to go into the more in-depth articles later in this special issue.
Journal Article•10.1016/S0167-8191(02)00135-7•
An efficient algorithm for constructing Hamiltonian paths in meshes

Shao Dong Chen, Hong Shen1, Rodney Topor2•
Japan Advanced Institute of Science and Technology1, Griffith University2
1 Sep 2002
TL;DR: This paper presents an efficient linear-time sequential algorithm for constructing Hamiltonian paths between two given vertices in meshes with horizontal size m and vertical size n and shows that the algorithm can be optimally parallelized to obtain a constant-time parallel algorithm on the weakest parallel machine without need of inter-processor communication.
Abstract: This paper presents an efficient linear-time sequential algorithm for constructing Hamiltonian paths between two given vertices in meshes with horizontal size m and vertical size n. The algorithm first partitions the given mesh into a number of submeshes in constant steps, and then constructs a Hamiltonian cycle or path in each submesh and combines them together to become a complete Hamiltonian path in mn steps. Our algorithm has improved the previous algorithm [6] by reducing the number of partition steps from O(m + n) to only a constant. Moreover, we show that our algorithm can be optimally parallelized to obtain a constant-time parallel algorithm on the weakest parallel machine without need of inter-processor communication, while this cannot be achieved for the previous algorithm.
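
To make the notion of a Hamiltonian path in an m x n mesh concrete, the sketch below builds the simplest such path: a row-by-row "snake" starting at a corner. This is only the easy special case with a fixed start vertex, not the paper's partition-based algorithm for arbitrary pairs of endpoints; the names `snake_path` and `is_hamiltonian_path` are hypothetical helpers introduced for illustration.

```python
def snake_path(m, n):
    """Return a Hamiltonian path in a mesh with m columns and n rows.

    The path starts at (0, 0) and sweeps each row in alternating
    direction, so it takes exactly m * n steps to emit all vertices.
    """
    path = []
    for row in range(n):
        # Even rows go left to right, odd rows right to left.
        cols = range(m) if row % 2 == 0 else range(m - 1, -1, -1)
        path.extend((col, row) for col in cols)
    return path


def is_hamiltonian_path(path, m, n):
    """Check that path visits every mesh vertex once via mesh edges."""
    all_vertices = {(c, r) for c in range(m) for r in range(n)}
    if len(path) != m * n or set(path) != all_vertices:
        return False
    # Consecutive vertices must be mesh neighbours (Manhattan distance 1).
    return all(abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1
               for a, b in zip(path, path[1:]))
```

Each vertex's position in the snake depends only on its own coordinates, which hints at why constructions of this kind can be evaluated by every processor independently, without inter-processor communication.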
Journal Article•10.1016/S0167-8191(02)00191-6•
Parallel components for PDEs and optimization: some issues and experiences

Boyana Norris1, Satish Balay1, Steven J. Benson1, Lori A. Freitag1, Paul D. Hovland1, Lois Curfman McInnes1, Barry Smith1 •
Argonne National Laboratory1
1 Dec 2002
TL;DR: This paper discusses recent work on building component interfaces and implementations in parallel numerical toolkits for mesh manipulations, discretization, linear algebra, and optimization for high-performance simulations in computational science.
Abstract: High-performance simulations in computational science often involve the combined software contributions of multidisciplinary teams of scientists, engineers, mathematicians, and computer scientists. One goal of component-based software engineering in large-scale scientific simulations is to help manage such complexity by enabling better interoperability among codes developed by different groups. This paper discusses recent work on building component interfaces and implementations in parallel numerical toolkits for mesh manipulations, discretization, linear algebra, and optimization. We consider several motivating applications involving partial differential equations and unconstrained minimization to demonstrate this approach and evaluate performance.
Journal Article•10.1016/S0167-8191(02)00105-9•
A data and task parallel image processing environment

Cristina Nicolescu1, Pieter Jonker1•
Delft University of Technology1
1 Aug 2002
TL;DR: A data and task parallel low-level image processing environment for distributed memory systems that is parallelized by data decomposition using algorithmic skeletons and validated on the multi-baseline stereo vision application.
Abstract: The paper presents a data and task parallel low-level image processing environment for distributed memory systems. Image processing operators are parallelized by data decomposition using algorithmic skeletons. Image processing applications are parallelized by task decomposition, based on the image application task graph. In this way, an image processing application can be parallelized both by data and task decomposition, and thus better speed-ups can be obtained. We validate our method on the multi-baseline stereo vision application.
Book Chapter•10.1007/3-540-48051-X_42•
Scheduling Strategies for Master-Slave Tasking on Heterogeneous Processor Grids

C. Banino1, Olivier Beaumont1, Arnaud Legrand2, Yves Robert2•
L'Abri1, French Institute for Research in Computer Science and Automation2
15 Jun 2002
TL;DR: This paper uses a non-oriented graph to model a grid, where resources can have different speeds of computation and communication, as well as different overlap capabilities, and shows how to determine the optimal steady-state scheduling strategy for each processor.
Abstract: In this paper, we consider the problem of allocating a large number of independent, equal-sized tasks to a heterogeneous "grid" computing platform. We use a non-oriented graph to model a grid, where resources can have different speeds of computation and communication, as well as different overlap capabilities. We show how to determine the optimal steady-state scheduling strategy for each processor. Because spanning trees are easier to deal with in practice, a natural question arises: how to extract the best spanning tree, i.e. the one with optimal steady-state throughput, out of a general interconnection graph? We show that this problem is NP-Complete. Still, we introduce and compare several low-complexity heuristics to determine a sub-optimal spanning tree.
Journal Article•10.1016/S0167-8191(01)00128-4•
MODTRAN on supercomputers and parallel computers

Ping Wang1, Karen Y. Liu1, Tom Cwik1, Robert O. Green1•
California Institute of Technology1
7 Jan 2002
TL;DR: A flexible, parallel version of MODTRAN is implemented on the Cray T3E, the HP SPP2000, and a Beowulf-class cluster computer using domain decomposition techniques and the Message Passing Interface (MPI) library.
Abstract: To enable efficient reduction of large data sets such as is done in the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) project at the Jet Propulsion Laboratory (JPL), a high performance version of MODTRAN is essential. One means to accomplish this is to apply the computational resources of parallel computer systems. In our present work, a flexible, parallel version of MODTRAN has been implemented on the Cray T3E, the HP SPP2000, and a Beowulf-class cluster computer using domain decomposition techniques and the Message Passing Interface (MPI) library. In this paper, porting the sequential MODTRAN to various platforms is discussed; strategies of designing a parallel version of MODTRAN are developed; detailed implementation for a parallel MODTRAN is reported, and performance data of the parallel code on various computers are presented. Near linear scaling performance of parallel MODTRAN has been obtained, and comparisons of wallclock time are made among various supercomputers and parallel computers. The parallel version of MODTRAN gives excellent speedup, which dramatically reduces total data processing time for many applications such as the AVIRIS project at JPL.
Journal Article•10.1016/S0167-8191(02)00092-3•
Reconciling simplicity and realism in parallel disk models

Peter Sanders1•
Max Planck Society1
1 May 2002
TL;DR: In this paper, the authors propose a model that implements one large logical disk allowing concurrent access to arbitrary sets of variable size blocks, which can be implemented efficiently on multiple independent disks even if zones with different speed, communication bottlenecks and failed disks are allowed.
Abstract: For the design and analysis of algorithms that process huge data sets, a machine model is needed that handles parallel disks. There seems to be a dilemma between simple and flexible use of such a model and accurate modeling of details of the hardware. This paper explains how many aspects of this problem can be resolved. The programming model implements one large logical disk allowing concurrent access to arbitrary sets of variable size blocks. This model can be implemented efficiently on multiple independent disks even if zones with different speed, communication bottlenecks and failed disks are allowed. These results not only provide useful algorithmic tools but also imply a theoretical justification for studying external memory algorithms using simple abstract models. The algorithmic approach is random redundant placement of data and optimal scheduling of accesses. The analysis generalizes a previous analysis for simple abstract external memory models in several ways (higher efficiency, variable block sizes, more detailed disk model).
Proceedings Article•10.1142/9781860949630_0025•
Design of a parallel and distributed web search engine

Salvatore Orlando, Raffaele Perego, Fabrizio Silvestri
1 Jul 2002
TL;DR: MOSE (My Own Search Engine) is a scalable parallel and distributed engine for searching the web, specifically designed to efficiently exploit affordable parallel architectures such as clusters of workstations. Its modular architecture can be easily adjusted to fulfill the bandwidth requirements of the application at hand.
Abstract: This paper describes the architecture of MOSE (My Own Search Engine), a scalable parallel and distributed engine for searching the web. MOSE was specifically designed to efficiently exploit affordable parallel architectures, such as clusters of workstations. Its modular and scalable architecture can be easily adjusted to fulfill the bandwidth requirements of the application at hand. Both task-parallel and data-parallel approaches are exploited within MOSE in order to increase the throughput and efficiently use communication, storing and computational resources. We used a collection of html documents as a benchmark and conducted preliminary experiments on a cluster of three SMP Linux PCs.
Journal Article•10.1016/S0167-8191(02)00185-0•
Middleware for the use of storage in communication

Micah Beck1, Dorian Arnold2, Alessandro Bassi3, Francine Berman4, Henri Casanova4, Jack Dongarra1, Terry Moore1, Graziano Obertelli5, James S. Plank1, Martin Swany5, Sathish Vadhiyar1, Rich Wolski3 •
University of Tennessee1, University of Wisconsin-Madison2, École normale supérieure de Lyon3, University of California, San Diego4, University of California, Santa Barbara5
1 Dec 2002
TL;DR: The Logistical Computing and Internetworking project is a reflection of the way that the next generation internetworking fundamentally changes the authors' definition of high performance wide area computing, with a richer view of the use of storage in communication and information sharing.
Abstract: The Logistical Computing and Internetworking (LoCI) project is a reflection of the way that the next generation internetworking fundamentally changes our definition of high performance wide area computing. A key to achieving this aim is the development of middleware that can provide reliable, flexible, scalable, and cost-effective delivery of data with quality of service guarantees to support high performance applications of all types. The LoCI effort attacks this problem with a simple but innovative strategy. At the base of the LoCI project is a richer view of the use of storage in communication and information sharing.
Book Chapter•10.1007/3-540-48051-X_50•
Reliability Bounds for Large Multistage Interconnection Networks

Nasser Fard1, Indra Gunawan1•
Northeastern University1
15 Jun 2002
TL;DR: Terminal, broadcast, lower-bound and upper-bound network reliability expressions are derived for the extra-stage cube network; if the lower bound provides sufficient assurance that the system will be operational at some specified time, no further effort to obtain the exact reliability expression is necessary.
Abstract: To derive the exact reliability expressions for large Multi-stage Interconnection Networks (MINs) can become rather complex. As network size increases, the reliability bounds could be used to estimate the reliability of the networks. In this paper, terminal, broadcast, lower and upper bounds network reliability will be determined. Lower bound reliability is the minimum probability that the system will be operational for a specified time. Upper bound reliability presents an optimistic view of probability that the system will work at some specified time, which is not the center of attention in terms of reliability point of view. If the lower bound reliability provides sufficient assurance that the system will be operational at some specified time, then no further effort for obtaining the exact reliability expression is necessary. As examples, the derivation of terminal, broadcast, lower and upper bounds network reliability expressions of the extra-stage cube network will be demonstrated.
Journal Article•10.1016/S0167-8191(02)00106-0•
Towards a general framework for FPGA based image processing using hardware skeletons

Khaled Benkrid1, Danny Crookes1, A. Benkrid1•
Queen's University Belfast1
1 Aug 2002
TL;DR: This paper presents and illustrates the approach to developing a general framework for FPGA based Image Processing based on a library of hardware skeletons, with optimised implementations specifically for Xilinx XC4000 FPGAs.
Abstract: In this paper, we present our approach to developing a general framework for FPGA based Image Processing. This framework is based on a library of hardware skeletons. A hardware skeleton is a parameterised description of a task-specific architecture. A skeleton's implementation will apply optimisations specific to the target hardware. The library normally contains a range of alternative skeletons for the same task, perhaps tailored for different data representations. The library also contains high level skeletons for compound operations, whose implementation can apply appropriate optimisations. Given a complete algorithm description in terms of skeletons, an efficient hardware configuration is generated automatically. We have developed a library of hardware skeletons for common image processing tasks, with optimised implementations specifically for Xilinx XC4000 FPGAs. This paper presents and illustrates our hardware skeleton approach in the context of some common image processing tasks. It demonstrates our approach to the broader problem of achieving optimised hardware configurations while retaining the convenience and rapid development cycle of an application-oriented, high level programming model.
Journal Article•10.1016/S0167-8191(01)00142-9•
Generalized least-squares polynomial preconditioners for symmetric indefinite linear equations

[...]

Yu Liang1, Jim Weston1, Marek Szularz1•
Ulster University1
1 Feb 2002
TL;DR: The GLS preconditioning polynomial and its influence on the flexible generalized minimized residual (FGMRES) solver are discussed in this paper and experimental results using classical benchmark systems are presented.
Abstract: Polynomial preconditioners are frequently used in a parallel environment for the computation of the solution of large-scale sparse linear equations (Ax = b) because of their easy implementation and trivial parallelization. With respect to symmetric indefinite (SID) linear systems, the use of generalized least-squares (GLS) polynomial preconditioning is preferable to other polynomial preconditioning methods because of the ability to use a three-term recurrence relationship and the low implementation costs. The GLS preconditioning polynomial and its influence on the flexible generalized minimized residual (FGMRES) solver are discussed in this paper. The orthogonal polynomials required in the solution of the least-squares approximation problem are constructed using the Stieltjes procedure in multiple disjoint intervals which exclude the origin. The time-consuming numerical integration associated with this procedure is computed efficiently using Chebyshev polynomials of the first kind, and the GLS polynomial preconditioned FGMRES algorithm is implemented using MPI in a highly parallel IBM SP2 environment. Experimental results using classical benchmark systems are presented and compared with those obtained using the recently developed SPAI preconditioned Bi-CGSTAB iterative method. The performance of the GLS preconditioned FGMRES solver is critically assessed.
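The claim that polynomial preconditioners are easy to implement and parallelize can be illustrated with a minimal sketch: applying p(A) to a vector needs only repeated matrix-vector products. The names `matvec` and `apply_polynomial` are hypothetical, and the coefficients used here are arbitrary placeholders, not the GLS coefficients derived in the paper.

```python
def matvec(a, v):
    """Dense matrix-vector product; each row is independent, so rows
    could be distributed across processes."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def apply_polynomial(coeffs, a, v):
    """Return p(A) v for p(x) = coeffs[0] + coeffs[1]*x + ... using
    Horner's rule, so only len(coeffs) - 1 products with A are needed."""
    result = [coeffs[-1] * vi for vi in v]
    for c in reversed(coeffs[:-1]):
        av = matvec(a, result)
        result = [avi + c * vi for avi, vi in zip(av, v)]
    return result


# Placeholder example: p(x) = 1 + x applied to a small diagonal matrix.
A = [[2.0, 0.0], [0.0, 3.0]]
z = apply_polynomial([1.0, 1.0], A, [1.0, 1.0])
```

Because the preconditioner is applied purely through products with A, it inherits whatever parallel matrix-vector kernel the solver already uses.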
Book Chapter•10.1007/3-540-48051-X_32•
Parallel and Blocked Algorithms for Reduction of a Regular Matrix Pair to Hessenberg-Triangular and Generalized Schur Forms

[...]

Björn Adlerborn1, Krister Dackland1, Bo Kågström1•
Umeå University1
15 Jun 2002
TL;DR: Algorithm and implementation issues regarding the single-/double-shift QZ algorithm are discussed and multishift strategies to enhance the performance in blocked as well as in parallel variants of the QZ method are described.
Abstract: A parallel three-stage algorithm for reduction of a regular matrix pair (A, B) to generalized Schur form (S, T) is presented. The first two stages transform (A, B) to upper Hessenberg-triangular form (H, T) using orthogonal equivalence transformations. The third stage iteratively reduces the matrix pair in (H, T) form to generalized Schur form. Algorithm and implementation issues regarding the single-/double-shift QZ algorithm are discussed. We also describe multishift strategies to enhance the performance in blocked as well as in parallel variants of the QZ method.
Journal Article•10.1016/S0167-8191(01)00144-2•
Parallel matrix computations in air pollution modelling

[...]

Wojciech Owczarz1, Zahari Zlatev•
Technical University of Denmark1
1 Feb 2002
TL;DR: Some results, which are obtained when several versions of a large-scale air pollution model are run on different parallel architectures, will be presented in this paper.
Abstract: Mathematical models for large-scale air pollution studies consist of systems of partial differential equations (PDEs). The number of equations in these systems of PDEs is equal to the number of chemical compounds (the number of chemical compounds involved in the current large-scale air pollution models varies from 20 to about 200). The space domain of the systems of PDEs is normally very large, because the models must be able to treat transboundary long-range transport of the harmful pollutants. The time-intervals are often very long (runs with meteorological data covering up to 10 years have sometimes to be carried out). Moreover, fine spatial and temporal resolution is as a rule required. This leads to very large computational tasks when the air pollution models are discretized. Therefore, it is necessary to use fast and sufficiently accurate numerical methods as well as to exploit efficiently the great potential power of the parallel computers. Some results, which are obtained when several versions of a large-scale air pollution model are run on different parallel architectures, will be presented in this paper.
Journal Article•10.1016/S0167-8191(02)00076-5•
Madeleine II: a portable and efficient communication library for high-performance cluster computing

[...]

Olivier Aumage1, Luc Bougé1, Jean-François Méhaut1, Raymond Namyst1•
École normale supérieure de Lyon1
1 Apr 2002
TL;DR: Madeleine II is an adaptive and portable multiprotocol communication library for high-performance multithreaded applications, with the ability to control multiple network protocols (BIP, SISCI, VIA) and multiple network adapters (ETHERNET, MYRINET, SCI).
Abstract: This paper introduces Madeleine II, an adaptive and portable multiprotocol communication library for high-performance multithreaded applications. Madeleine II has the ability to control multiple network protocols (BIP, SISCI, VIA) and multiple network adapters (ETHERNET, MYRINET, SCI). Moreover, it includes advanced mechanisms to dynamically select the most appropriate transfer method for a given network protocol according to various parameters such as data size or user responsiveness requirements. We report on performance measurements obtained using various protocols and we present preliminary results on porting the MPICH and the NEXUS communication libraries on top of Madeleine II.
Book Chapter•10.1007/3-540-48051-X_4•
Grid Computing: Enabling a Vision for Collaborative Research

[...]

Gregor von Laszewski1•
Argonne National Laboratory1
15 Jun 2002
TL;DR: This paper provides an overview showing why Grid research is difficult, and a number of management-related issues that must be addressed to make Grids a reality.
Abstract: In this paper we provide a motivation for Grid computing based on a vision to enable a collaborative research environment. Our vision goes beyond the connection of hardware resources. We argue that with an infrastructure such as the Grid, new modalities for collaborative research are enabled. We provide an overview showing why Grid research is difficult, and we present a number of management-related issues that must be addressed to make Grids a reality. We list projects that provide solutions to subsets of these issues.
Book Chapter•10.1007/3-540-48051-X_29•
A Recursive Formulation of the Inversion of Symmetric Positive Definite Matrices in Packed Storage Data Format

[...]

Bjarne Stig Andersen1, John A. Gunnels2, Fred G. Gustavson2, Jerzy Wasniewski1•
Technical University of Denmark1, IBM2
15 Jun 2002
TL;DR: A new Recursive Packed Inverse Calculation Algorithm for symmetric positive definite matrices has been developed and has nearly the same performance as the LAPACK full storage algorithm using n^2 memory words.
Abstract: A new Recursive Packed Inverse Calculation Algorithm for symmetric positive definite matrices has been developed. The new Recursive Inverse Calculation algorithm uses minimal storage, n(n + 1)/2, and has nearly the same performance as the LAPACK full storage algorithm using n^2 memory words. New recursive packed BLAS needed for this algorithm have been developed too. Two transformation routines, from the LAPACK packed storage data format to the recursive storage data format, were added to the package too. We present performance measurements on several current architectures that demonstrate improvements over the traditional packed routines.
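As a rough illustration of the storage figures quoted above, the following sketch packs the lower triangle of a symmetric matrix column by column into n(n + 1)/2 words, following the conventional LAPACK lower packed layout. It does not show the recursive packed format of the paper; `packed_index` and `pack_lower` are hypothetical helper names.

```python
def packed_index(i, j, n):
    """0-based position of element (i, j) in column-major lower packed
    storage; symmetry lets (i, j) and (j, i) share one slot."""
    if i < j:
        i, j = j, i
    return i + j * (2 * n - j - 1) // 2


def pack_lower(a):
    """Copy the lower triangle of a square matrix (list of rows) into a
    flat list of length n(n + 1)/2."""
    n = len(a)
    ap = [0.0] * (n * (n + 1) // 2)
    for j in range(n):
        for i in range(j, n):
            ap[packed_index(i, j, n)] = a[i][j]
    return ap


# Small symmetric example: 9 full-storage words shrink to 6 packed words.
S = [[4.0, 1.0, 2.0],
     [1.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
packed = pack_lower(S)
```

For n = 3 the packed array holds 6 entries instead of 9, and the saving approaches one half as n grows.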
Journal Article•10.1016/S0167-8191(02)00199-0•
Advanced environments for parallel and distributed applications: a view of current status

[...]

Pasqua D'Ambra1, Marco Danelutto2, Daniela di Serafino3, Marco Lapegna4•
Indian Council of Agricultural Research1, University of Pisa2, Seconda Università degli Studi di Napoli3, University of Naples Federico II4
1 Dec 2002
TL;DR: A view of the design and development activity concerning advanced environments for parallel and distributed computing is provided, and a "classification" of these environments into two main classes, programming environments and problem solving environments, is proposed.
Abstract: In this paper we provide a view of the design and development activity concerning advanced environments for parallel and distributed computing. We start by assessing the main issues driving this research track, in the areas of hardware and software technology and of applications. Then, we identify some key concepts that can be considered as common guidelines and goals in the development of modern advanced environments, and we come up with a "classification" of these environments into two main classes: programming environments and problem solving environments. Both classes are widely discussed, in light of the key concepts previously outlined, and several examples are provided, in order to give a picture of the current status and trends.
Journal Article•10.1016/S0167-8191(02)00074-1•
CPU and incremental memory allocation in dynamic parallelization of SQL Queries

[...]

Abdelkader Hameurlain1, Franck Morvan1•
Paul Sabatier University1
1 Apr 2002
TL;DR: An incremental parallelization method which carries out simultaneously both scheduling and mapping in co-operation with two incremental memory allocation heuristics (ParAd: parallelism degree adjustment, and MaCRelax: mapping clues relaxation) in a dynamic multi-user context is proposed.
Abstract: In order to re-adjust the parallel execution of SQL queries in case of metric estimation or discretization errors, we propose an incremental parallelization method which carries out simultaneously both scheduling and mapping in co-operation with two incremental memory allocation heuristics (ParAd: parallelism degree adjustment, and MaCRelax: mapping clues relaxation) in a dynamic multi-user context. The two incremental memory allocation heuristics are integrated in the mapping method, which attempts to avoid time-consuming multibucket join execution generating numerous additional I/O. A performance evaluation of the ParAd heuristic shows: (i) significant savings in join response time (from 16.11% to 35.62%), and (ii) with many complex queries, a more significant gain in response time (from 29% to 54%).
Journal Article•10.1016/S0167-8191(02)00078-9•
A scalable parallel algorithm for training a hierarchical mixture of neural experts

[...]

Pablo A. Estevez1, Hélène Paugam-Moisy2, Didier Puzenat2, Manuel Ugarte1•
University of Chile1, Centre national de la recherche scientifique2
1 Jun 2002
TL;DR: An analysis of the models shows that the parallel algorithms are highly scalable when the size of the experts grows from linear units to multi-layer perceptrons (MLPs), and experiments achieve near-linear speedups for HME-MLP.
Abstract: Efficient parallel learning algorithms are proposed for training a powerful modular neural network, the hierarchical mixture of experts (HME). Parallelizations are based on the concept of modular parallelism, i.e. parallel execution of network modules. From modeling the speed-up as a function of the number of processors and the number of training examples, several improvements are derived, such as pipelining the training examples by packets. Compared to experimental measurements, theoretical models are accurate. For regular topologies, an analysis of the models shows that the parallel algorithms are highly scalable when the size of the experts grows from linear units to multi-layer perceptrons (MLPs). These results are confirmed experimentally, achieving near-linear speedups for HME-MLP. Although this work can be viewed as a case study in the parallelization of HME neural networks, both algorithms and theoretical models can be expanded to different learning rules or less regular tree architectures.
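The idea of modular parallelism can be sketched with a single-level mixture of linear experts: each expert's output depends only on the input, so the experts can be evaluated independently before the gating network combines them. This is a simplified, sequential illustration under assumed names (`mixture_output`, `expert_weights`, `gate_weights`), not the hierarchical model or the training algorithm of the paper.

```python
import math


def dot(w, x):
    """Inner product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def mixture_output(x, expert_weights, gate_weights):
    """Combine linear experts with softmax gating probabilities."""
    # Each expert is an independent module; this loop is the part that
    # could run in parallel, one expert per processor.
    expert_out = [dot(w, x) for w in expert_weights]
    gates = softmax([dot(v, x) for v in gate_weights])
    return sum(g * y for g, y in zip(gates, expert_out))


# Two experts and a gate with zero weights, so both experts count equally.
y = mixture_output([1.0, 2.0],
                   [[1.0, 0.0], [0.0, 1.0]],
                   [[0.0, 0.0], [0.0, 0.0]])
```

A hierarchical mixture would nest this combination step, with the output of each inner mixture acting as an expert for the gate one level up.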