Top 151 papers presented at Parallel Computing in 2002

Journal Article•10.1016/S0167-8191(02)00094-7•

Data management and transfer in high-performance computational grid environments

Bill Allcock¹, Joe Bester¹, John Bresnahan¹, Ann L. Chervenak², Ian Foster³, Carl Kesselman², Sam Meder¹, Veronika Nefedova¹, Darcy Quesnel¹, Steven Tuecke¹•Institutions (3)

Argonne National Laboratory¹, University of Southern California², University of Chicago³

1 May 2002

TL;DR: A high-speed transport service that extends the popular FTP protocol with new features required for Data Grid applications, such as striping and partial file access and a replica management service that integrates a replica catalog with GridFTP transfers to provide for the creation, registration, location, and management of dataset replicas.

Abstract: An emerging class of data-intensive applications involve the geographically dispersed extraction of complex scientific information from very large collections of measured or computed data. Such applications arise, for example, in experimental physics, where the data in question is generated by accelerators, and in simulation science, where the data is generated by supercomputers. So-called Data Grids provide essential infrastructure for such applications, much as the Internet provides essential services for applications such as e-mail and the Web. We describe here two services that we believe are fundamental to any Data Grid: reliable, high-speed transport and replica management. Our high-speed transport service, GridFTP, extends the popular FTP protocol with new features required for Data Grid applications, such as striping and partial file access. Our replica management service integrates a replica catalog with GridFTP transfers to provide for the creation, registration, location, and management of dataset replicas. We present the design of both services and also preliminary performance results. Our implementations exploit security and other services provided by the Globus Toolkit.

699 citations
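
The striping and partial file access features named in the abstract can be pictured as splitting a file into byte ranges that separate data movers fetch independently. The Python sketch below only illustrates that idea. The helpers `stripe_ranges`, `read_partial`, and `striped_copy` are hypothetical names, not part of GridFTP or the Globus Toolkit, and the round-robin block layout is an assumption made for the example.

```python
import io


def stripe_ranges(file_size, num_stripes, block_size):
    """Assign fixed-size blocks to stripes in round-robin order.

    Returns one list per stripe, each holding (offset, length) pairs.
    """
    if file_size < 0 or num_stripes < 1 or block_size < 1:
        raise ValueError("invalid striping parameters")
    stripes = [[] for _ in range(num_stripes)]
    offset = 0
    block_index = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        stripes[block_index % num_stripes].append((offset, length))
        offset += length
        block_index += 1
    return stripes


def read_partial(stream, offset, length):
    """Partial file access: read `length` bytes starting at `offset`."""
    stream.seek(offset)
    return stream.read(length)


def striped_copy(data, num_stripes, block_size):
    """Reassemble a byte string from independently fetched stripes."""
    source = io.BytesIO(data)
    result = bytearray(len(data))
    for ranges in stripe_ranges(len(data), num_stripes, block_size):
        # In a real transfer each stripe would be handled by its own mover;
        # here the stripes are simply processed one after another.
        for offset, length in ranges:
            result[offset:offset + length] = read_partial(source, offset, length)
    return bytes(result)
```

Because every block carries its own offset, the stripes can arrive in any order and the destination still reconstructs the original file.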

Journal Article•10.1016/S0167-8191(01)00141-7•

PASTIX: a high-performance parallel direct solver for sparse symmetric positive definite systems

Pascal Hénon¹, Pierre Ramet¹, Jean Roman¹•Institutions (1)

University of Bordeaux¹

1 Feb 2002

TL;DR: The block partitioning and scheduling problem for sparse parallel factorization without pivoting is considered, and the scalability of the parallel solver and the compromise between memory overhead and efficiency are considered.

Abstract: Solving large sparse symmetric positive definite systems of linear equations is a crucial and time-consuming step, arising in many scientific and engineering applications. The block partitioning and scheduling problem for sparse parallel factorization without pivoting is considered. There are two major aims to this study: the scalability of the parallel solver, and the compromise between memory overhead and efficiency. Parallel experiments on a large collection of irregular industrial problems validate our approach.

246 citations

Journal Article•10.1016/S0167-8191(02)00151-5•

Probabilistic methods for centroidal Voronoi tessellations and their parallel implementations

Lili Ju¹, Qiang Du², Max D. Gunzburger¹•Institutions (2)

Iowa State University¹, Pennsylvania State University²

1 Oct 2002

TL;DR: By using multi-sampling in a new probabilistic algorithm, more accurate and efficient approximations of CVTs are obtained without the need to explicitly construct Voronoi diagrams.

Abstract: Centroidal Voronoi tessellations (CVTs) are Voronoi tessellations of a region such that the generating points of the tessellations are also the centroids of the corresponding Voronoi cells. In this paper, some probabilistic methods for determining CVTs and their parallel implementations on distributed memory systems are presented. By using multi-sampling in a new probabilistic algorithm we introduce, more accurate and efficient approximations of CVTs are obtained without the need to explicitly construct Voronoi diagrams. The new algorithm lends itself well to parallelization, i.e., near perfect linear speedup in the number of processors is achieved. The results of computational experiments performed on a CRAY T3E-600 system are provided which illustrate the superior sequential and parallel performance of the new algorithm when compared to existing algorithms. In particular, for the same amount of work, the new algorithms produce significantly more accurate CVTs.

213 citations
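
The idea of approximating a CVT from random samples, without building a Voronoi diagram, can be sketched as a sampling-based Lloyd-type iteration: draw points from the region, assign each to its nearest generator, and move every generator to the mean of its samples. The Python code below is a generic illustration of that idea and is not claimed to be the authors' algorithm. The function names, the unit-square region, and the uniform density are assumptions made for the example.

```python
import random


def probabilistic_lloyd(generators, sample_point, num_samples, num_iterations, rng):
    """Approximate a CVT by repeated sampling.

    generators:   list of (x, y) starting positions
    sample_point: function taking rng and returning one (x, y) sample
    """
    gens = [tuple(g) for g in generators]
    for _ in range(num_iterations):
        sums = [[0.0, 0.0] for _ in gens]
        counts = [0] * len(gens)
        for _ in range(num_samples):
            x, y = sample_point(rng)
            # The nearest generator identifies the Voronoi cell of the sample,
            # so no explicit Voronoi diagram is ever constructed.
            nearest = min(
                range(len(gens)),
                key=lambda i: (gens[i][0] - x) ** 2 + (gens[i][1] - y) ** 2,
            )
            sums[nearest][0] += x
            sums[nearest][1] += y
            counts[nearest] += 1
        # Move each generator to the sample mean (estimated centroid) of its
        # cell; a generator that received no samples stays where it is.
        gens = [
            (sums[i][0] / counts[i], sums[i][1] / counts[i]) if counts[i] else gens[i]
            for i in range(len(gens))
        ]
    return gens


def uniform_unit_square(rng):
    """Draw one point uniformly from the unit square (assumed region)."""
    return (rng.random(), rng.random())
```

The sampling loop is independent per sample, so in a parallel setting each processor could draw and classify its own samples and only the per-generator sums and counts would need to be combined.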

Journal Article•10.1016/S0167-8191(01)00135-1•

Two-level dynamic scheduling in PARDISO: improved scalability on shared memory multiprocessing systems

Olaf Schenk¹, Klaus Gärtner•Institutions (1)

University of Basel¹

1 Feb 2002

TL;DR: A new parallelization strategy based on a dynamic two-level scheduling scheme that aims at minimizing cache conflicts and interprocessor communication costs and, at the same time, maximizing processor load balance and Level-3 BLAS performance is explored.

Abstract: The PARDISO package is a mathematical library of OpenMP routines for the parallel direct solution of large sparse linear systems of equations. One objective of PARDISO is to achieve a high efficiency on shared memory multiprocessing systems. A new parallelization strategy based on a dynamic two-level scheduling scheme is therefore explored. The method aims at minimizing cache conflicts and interprocessor communication costs and, at the same time, maximizing processor load balance and Level-3 BLAS performance. The synchronization events are reduced by one order of magnitude compared with a one-level scheduling strategy. This results in an efficient parallel sparse LU decomposition method. An overview of the two-level scheduling algorithm and the key algorithmic features of the solver PARDISO is given. Finally, numerical results and a comparison with another software package demonstrate the performance.

131 citations

Journal Article•10.1016/S0167-8191(02)00187-4•

ICENI: optimisation of component applications within a Grid environment

Nathalie Furmento¹, Anthony Mayer¹, Stephen McGough¹, Steven Newhouse¹, Tony Field¹, John Darlington¹•Institutions (1)

Imperial College London¹

1 Dec 2002

TL;DR: Imperial College e-Science Networked Infrastructure (ICENI), a Grid middleware framework developed within the London e-Science Centre, is described, and the effectiveness of this architecture is demonstrated through the high-level specification and solution of a set of linear equations by automatic selection of optimal resources and implementations.

Abstract: Effective exploitation of Computational Grids can only be achieved when applications are fully integrated with the Grid middleware and the underlying computational resources. Fundamental to this exploitation is information: information about the structure and behaviour of the application, the capability of the computational and networking resources, and the availability and access to these resources by an individual, a group or an organisation. In this paper we describe Imperial College e-Science Networked Infrastructure (ICENI), a Grid middleware framework developed within the London e-Science Centre. ICENI is a platform-independent framework that uses open and extensible XML derived protocols, within a framework built using Java and Jini, to explore effective application execution upon distributed federated resources. We match a high-level application specification, defined as a network of components, to an optimal combination of the currently available component implementations within our Grid environment, by using composite performance models. We demonstrate the effectiveness of this architecture through the high-level specification and solution of a set of linear equations by automatic selection of optimal resources and implementations.

125 citations

Journal Article•10.1016/S0167-8191(02)00190-4•

From patterns to frameworks to parallel programs

Steve MacDonald¹, John Anvik¹, Steven Bromling¹, Jonathan Schaeffer¹, Duane Szafron¹, Kai Tan¹•Institutions (1)

University of Alberta¹

1 Dec 2002

TL;DR: The Parallel Design Patterns (PDP) process, the basis of the CO2P3S parallel programming system, combines these techniques in a layered development model, creating a new approach to parallel programming that addresses correctness and openness in a unique way.

Abstract: Object-oriented programming, design patterns, and frameworks are abstraction techniques that have been used to reduce the complexity of sequential programming. This paper describes our approach of applying these three techniques to the more difficult parallel programming domain. The Parallel Design Patterns (PDP) process, the basis of the CO2P3S parallel programming system, combines these techniques in a layered development model. The result is a new approach to parallel programming that addresses correctness and openness in a unique way. At the topmost development layer, a customized framework is generated from a design pattern specification of the parallel structure of the program. This framework encapsulates all of the structural details of the pattern, including communication and synchronization, to prevent programmer errors and ensure correctness. Lower layers are used only for performance tuning to make the code as efficient as necessary. This paper describes CO2P3S, based on the PDP process, and demonstrates it using an example application. We also provide results from a usability study of CO2P3S.

96 citations

Journal Article•10.1016/S0167-8191(02)00103-5•

A software architecture for user transparent parallel image processing

Frank J. Seinstra¹, Dennis C. Koelma¹, Jan-Mark Geusebroek¹•Institutions (1)

University of Amsterdam¹

1 Aug 2002

TL;DR: Results indicate that the core of the architecture forms a powerful basis for automatic parallelization and optimization of a wide range of imaging software.

Abstract: This paper describes a software architecture that allows image processing researchers to develop parallel applications in a transparent manner. The architecture's main component is an extensive library of data parallel low level image operations capable of running on homogeneous distributed memory MIMD-style multicomputers. Since the library has an application programming interface identical to that of an existing sequential library, all parallelism is completely hidden from the user. The first part of the paper discusses implementation aspects of the parallel library, and shows how sequential as well as parallel operations are implemented on the basis of so-called parallelizable patterns. A library built in this manner is easily maintainable, as extensive code redundancy is avoided. The second part of the paper describes the application of performance models to ensure efficiency of execution on all target platforms. Experiments show that for a realistic application performance predictions are highly accurate. These results indicate that the core of the architecture forms a powerful basis for automatic parallelization and optimization of a wide range of imaging software.

74 citations

Journal Article•10.1016/S0167-8191(02)00100-X•

Video compression with parallel processing

Ishfaq Ahmad¹, Yong He², M.L. Liou³•Institutions (3)

University of Texas at Arlington¹, Motorola², Hong Kong University of Science and Technology³

1 Aug 2002

TL;DR: An overview of the recent research in video compression using parallel processing is presented, outlining the basic philosophy of each approach and providing examples, and suggesting future research directions.

Abstract: Driven by the rapidly increasing demand for audio-visual applications, digital video compression technology has become a mature field, offering several available products based on both hardware and software implementations. Taking advantage of spatial, temporal, and statistical redundancies in video data, a video compression system aims to maximize the compression ratio while maintaining a high picture quality. Despite the tremendous progress in this area, video compression remains a challenging research problem due to its computational requirements and also because of the need for higher picture quality at lower data rates. Designing efficient coding algorithms continues to be a prolific area of research. To circumvent the computational requirement, researchers have resorted to parallel processing with a variety of approaches using dedicated parallel VLSI architectures as well as software on available general-purpose multiprocessor systems. Despite the availability of fast single processors, parallel processing helps to explore advanced algorithms and to build more sophisticated systems. This paper presents an overview of the recent research in video compression using parallel processing. The paper provides a discussion of the basic compression techniques, existing video coding standards, and various parallelization approaches. Since video compression is multi-step in nature using various algorithms, parallel processing can be exploited at an individual algorithm or at a complete system level. The paper covers a broad spectrum of such approaches, outlining the basic philosophy of each approach and providing examples. We contrast these approaches when possible, highlight their pros and cons, and suggest future research directions. While the emphasis of this paper is on software-based methods, a significant discussion of hardware and VLSI is also included.

72 citations

Journal Article•10.1016/S0167-8191(02)00097-2•

Processing large-scale multi-dimensional data in parallel and distributed environments

Michael D. Beynon¹, Chialin Chang¹, Ümit V. Çatalyürek², Tahsin Kurc², Alan Sussman¹, Henrique Andrade¹, Renato Ferreira¹, Joel H. Saltz²•Institutions (2)

University of Maryland, College Park¹, Ohio State University²

1 May 2002

TL;DR: This paper presents a compendium of frameworks and methods developed to support efficient execution of subsetting and aggregation operations in applications that query and manipulate large, multi-dimensional datasets in parallel and distributed computing environments.

Abstract: Analysis of data is an important step in understanding and solving a scientific problem. Analysis involves extracting the data of interest from all the available raw data in a dataset and processing it into a data product. However, in many areas of science and engineering, a scientist's ability to analyze information is increasingly becoming hindered by dataset sizes. The vast amount of data in scientific datasets makes it a difficult task to efficiently access the data of interest, and manage potentially heterogeneous system resources to process the data. Subsetting and aggregation are common operations executed in a wide range of data-intensive applications. We argue that common runtime and programming support can be developed for applications that query and manipulate large datasets. This paper presents a compendium of frameworks and methods we have developed to support efficient execution of subsetting and aggregation operations in applications that query and manipulate large, multi-dimensional datasets in parallel and distributed computing environments.

69 citations

Journal Article•10.1016/S0167-8191(02)00186-2•

Grid programming: some indications where we are headed

Domenico Laforenza¹•Institutions (1)

National Research Council¹

1 Dec 2002

TL;DR: The development of Grid programming environments that would enable programmers to efficiently exploit this technology is an important and hot research issue and the most important approaches/projects conducted in this field worldwide are reviewed.

Abstract: Grid computing enables the development of large scientific applications on an unprecedented scale. Grid-aware applications, also called meta-applications or multi-disciplinary applications, make use of coupled computational resources that are not available at a single site. In this light, the Grids let scientists solve larger or new problems by pooling together resources that could not be coupled easily before. It is well known that designing and implementing efficient distributed/parallel applications on high-performance computers is still usually a very time-consuming task. Grid computing makes the situation worse. Consequently, the development of Grid programming environments that would enable programmers to efficiently exploit this technology is an important and hot research issue. After an introduction on the main Grid programming issues, this paper will review the most important approaches/projects conducted in this field worldwide.

61 citations

Journal Article•10.1016/S0167-8191(02)00091-1•

Parallel data intensive computing in scientific and commercial applications

Mario Cannataro¹, Domenico Talia², Pradip K. Srimani³•Institutions (3)

Indian Council of Agricultural Research¹, University of Calabria², Clemson University³

1 May 2002

TL;DR: The purpose of this introductory article is to provide an overview of the main issues in parallel data intensive computing in scientific and commercial applications and to encourage the reader to go into the more in-depth articles later in this special issue.

Abstract: Applications that explore, query, analyze, visualize, and, in general, process very large scale data sets are known as Data Intensive Applications. Large scale data intensive computing plays an increasingly important role in many scientific activities and commercial applications, whether it involves data mining of commercial transactions, experimental data analysis and visualization, or intensive simulation such as climate modeling. By combining high performance computation, very large data storage, high bandwidth access, and high-speed local and wide area networking, data intensive computing enhances the technical capabilities and usefulness of most systems. The integration of parallel and distributed computational environments will produce major improvements in performance for both computing intensive and data intensive applications in the future. The purpose of this introductory article is to provide an overview of the main issues in parallel data intensive computing in scientific and commercial applications and to encourage the reader to go into the more in-depth articles later in this special issue.

Journal Article•10.1016/S0167-8191(02)00135-7•

An efficient algorithm for constructing Hamiltonian paths in meshes

Shao Dong Chen, Hong Shen¹, Rodney Topor²•Institutions (2)

Japan Advanced Institute of Science and Technology¹, Griffith University²

1 Sep 2002

TL;DR: This paper presents an efficient linear-time sequential algorithm for constructing Hamiltonian paths between two given vertices in meshes with horizontal size m and vertical size n and shows that the algorithm can be optimally parallelized to obtain a constant-time parallel algorithm on the weakest parallel machine without need of inter-processor communication.

Abstract: This paper presents an efficient linear-time sequential algorithm for constructing Hamiltonian paths between two given vertices in meshes with horizontal size m and vertical size n. The algorithm first partitions the given mesh into a number of submeshes in constant steps, and then constructs a Hamiltonian cycle or path in each submesh and combines them together to become a complete Hamiltonian path in mn steps. Our algorithm has improved the previous algorithm [6] by reducing the number of partition steps from O(m + n) to only a constant. Moreover, we show that our algorithm can be optimally parallelized to obtain a constant-time parallel algorithm on the weakest parallel machine without need of inter-processor communication, while this cannot be achieved for the previous algorithm.
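
The paper's algorithm handles arbitrary endpoints by partitioning the mesh into submeshes, which is not reproduced here. As a minimal illustration of what a Hamiltonian path in a mesh looks like, the Python sketch below builds only the simplest special case, a row-by-row "snake" path starting at a corner, and checks it. The function names `snake_path` and `is_hamiltonian_path` are hypothetical.

```python
def snake_path(m, n):
    """Return a Hamiltonian path in an m-by-n mesh (m columns, n rows).

    Rows are traversed alternately left-to-right and right-to-left, so
    consecutive vertices are always mesh neighbours. Takes m * n steps.
    """
    if m < 1 or n < 1:
        raise ValueError("mesh dimensions must be positive")
    path = []
    for y in range(n):
        xs = range(m) if y % 2 == 0 else range(m - 1, -1, -1)
        path.extend((x, y) for x in xs)
    return path


def is_hamiltonian_path(path, m, n):
    """Check that `path` visits every mesh vertex exactly once via mesh edges."""
    if len(path) != m * n or len(set(path)) != m * n:
        return False
    if any(not (0 <= x < m and 0 <= y < n) for x, y in path):
        return False
    # Consecutive vertices must differ by exactly one step in one coordinate.
    return all(
        abs(x1 - x2) + abs(y1 - y2) == 1
        for (x1, y1), (x2, y2) in zip(path, path[1:])
    )
```

Note that each vertex's position in the snake path depends only on its own coordinates, which hints at why constructions of this kind can be evaluated per vertex without inter-processor communication.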

Journal Article•10.1016/S0167-8191(02)00191-6•

Parallel components for PDEs and optimization: some issues and experiences

Boyana Norris¹, Satish Balay¹, Steven J. Benson¹, Lori A. Freitag¹, Paul D. Hovland¹, Lois Curfman McInnes¹, Barry Smith¹•Institutions (1)

Argonne National Laboratory¹

1 Dec 2002

TL;DR: This paper discusses recent work on building component interfaces and implementations in parallel numerical toolkits for mesh manipulations, discretization, linear algebra, and optimization for high-performance simulations in computational science.

Abstract: High-performance simulations in computational science often involve the combined software contributions of multidisciplinary teams of scientists, engineers, mathematicians, and computer scientists. One goal of component-based software engineering in large-scale scientific simulations is to help manage such complexity by enabling better interoperability among codes developed by different groups. This paper discusses recent work on building component interfaces and implementations in parallel numerical toolkits for mesh manipulations, discretization, linear algebra, and optimization. We consider several motivating applications involving partial differential equations and unconstrained minimization to demonstrate this approach and evaluate performance.

Journal Article•10.1016/S0167-8191(02)00105-9•

A data and task parallel image processing environment

Cristina Nicolescu¹, Pieter Jonker¹•Institutions (1)

Delft University of Technology¹

1 Aug 2002

TL;DR: A data and task parallel low-level image processing environment for distributed memory systems that is parallelized by data decomposition using algorithmic skeletons and validated on the multi-baseline stereo vision application.

Abstract: The paper presents a data and task parallel low-level image processing environment for distributed memory systems. Image processing operators are parallelized by data decomposition using algorithmic skeletons. Image processing applications are parallelized by task decomposition, based on the image application task graph. In this way, an image processing application can be parallelized both by data and task decomposition, and thus better speed-ups can be obtained. We validate our method on the multi-baseline stereo vision application.

Book Chapter•10.1007/3-540-48051-X_42•

Scheduling Strategies for Master-Slave Tasking on Heterogeneous Processor Grids

C. Banino¹, Olivier Beaumont¹, Arnaud Legrand², Yves Robert²•Institutions (2)

L'Abri¹, French Institute for Research in Computer Science and Automation²

15 Jun 2002

TL;DR: This paper uses a non-oriented graph to model a grid, where resources can have different speeds of computation and communication, as well as different overlap capabilities, and shows how to determine the optimal steady-state scheduling strategy for each processor.

Abstract: In this paper, we consider the problem of allocating a large number of independent, equal-sized tasks to a heterogeneous "grid" computing platform. We use a non-oriented graph to model a grid, where resources can have different speeds of computation and communication, as well as different overlap capabilities. We show how to determine the optimal steady-state scheduling strategy for each processor. Because spanning trees are easier to deal with in practice, a natural question arises: how to extract the best spanning tree, i.e. the one with optimal steady-state throughput, out of a general interconnection graph? We show that this problem is NP-Complete. Still, we introduce and compare several low-complexity heuristics to determine a sub-optimal spanning tree.

Journal Article•10.1016/S0167-8191(01)00128-4•

MODTRAN on supercomputers and parallel computers

Ping Wang¹, Karen Y. Liu¹, Tom Cwik¹, Robert O. Green¹•Institutions (1)

California Institute of Technology¹

7 Jan 2002

TL;DR: A flexible, parallel version of MODTRAN is implemented on the Cray T3E, the HP SPP2000, and a Beowulf-class cluster computer using domain decomposition techniques and the Message Passing Interface (MPI) library.

Abstract: To enable efficient reduction of large data sets such as is done in the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) project at the Jet Propulsion Laboratory (JPL), a high performance version of MODTRAN is essential. One means to accomplish this is to apply the computational resources of parallel computer systems. In our present work, a flexible, parallel version of MODTRAN has been implemented on the Cray T3E, the HP SPP2000, and a Beowulf-class cluster computer using domain decomposition techniques and the Message Passing Interface (MPI) library. In this paper, porting the sequential MODTRAN to various platforms is discussed; strategies of designing a parallel version of MODTRAN are developed; detailed implementation for a parallel MODTRAN is reported, and performance data of the parallel code on various computers are presented. Near linear scaling performance of parallel MODTRAN has been obtained, and comparisons of wallclock time are made among various supercomputers and parallel computers. The parallel version of MODTRAN gives excellent speedup, which dramatically reduces total data processing time for many applications such as the AVIRIS project at JPL.

Journal Article•10.1016/S0167-8191(02)00092-3•

Reconciling simplicity and realism in parallel disk models

Peter Sanders¹•Institutions (1)

Max Planck Society¹

1 May 2002

TL;DR: In this paper, the authors propose a model that implements one large logical disk allowing concurrent access to arbitrary sets of variable size blocks, which can be implemented efficiently on multiple independent disks even if zones with different speed, communication bottlenecks and failed disks are allowed.

Abstract: For the design and analysis of algorithms that process huge data sets, a machine model is needed that handles parallel disks. There seems to be a dilemma between simple and flexible use of such a model and accurate modeling of details of the hardware. This paper explains how many aspects of this problem can be resolved. The programming model implements one large logical disk allowing concurrent access to arbitrary sets of variable size blocks. This model can be implemented efficiently on multiple independent disks even if zones with different speed, communication bottlenecks and failed disks are allowed. These results not only provide useful algorithmic tools but also imply a theoretical justification for studying external memory algorithms using simple abstract models. The algorithmic approach is random redundant placement of data and optimal scheduling of accesses. The analysis generalizes a previous analysis for simple abstract external memory models in several ways (higher efficiency, variable block sizes, more detailed disk model).
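
Random redundant placement can be illustrated with a small Python sketch: every block is stored on two distinct, randomly chosen disks, and a batch of read requests is served from whichever copy sits on the currently less loaded disk. This is a simplification under stated assumptions. The abstract refers to optimal scheduling, whereas the sketch uses a greedy rule, assumes equal-sized blocks and identical disks, and uses the hypothetical names `place_blocks` and `schedule_reads`.

```python
import random


def place_blocks(num_blocks, num_disks, rng):
    """Store two copies of every block on two distinct random disks."""
    if num_disks < 2:
        raise ValueError("redundant placement needs at least two disks")
    return [tuple(rng.sample(range(num_disks), 2)) for _ in range(num_blocks)]


def schedule_reads(requests, placement, num_disks):
    """Greedy scheduling: read each block from its less loaded copy.

    Returns the chosen disk per block and the resulting load per disk.
    """
    load = [0] * num_disks
    assignment = {}
    for block in requests:
        first, second = placement[block]
        disk = first if load[first] <= load[second] else second
        assignment[block] = disk
        load[disk] += 1
    return assignment, load
```

The number of parallel I/O steps needed for the batch is the maximum entry of `load`; the freedom to choose between two copies is what keeps that maximum close to the average.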

Proceedings Article•10.1142/9781860949630_0025•

Design of a parallel and distributed web search engine

Salvatore Orlando, Raffaele Perego, Fabrizio Silvestri

1 Jul 2002

TL;DR: MOSE (My Own Search Engine) is a scalable parallel and distributed engine for searching the web, specifically designed to efficiently exploit affordable parallel architectures such as clusters of workstations; its modular architecture can be easily adjusted to fulfill the bandwidth requirements of the application at hand.

Abstract: This paper describes the architecture of MOSE (My Own Search Engine), a scalable parallel and distributed engine for searching the web. MOSE was specifically designed to efficiently exploit affordable parallel architectures, such as clusters of workstations. Its modular and scalable architecture can be easily adjusted to fulfill the bandwidth requirements of the application at hand. Both task-parallel and data-parallel approaches are exploited within MOSE in order to increase the throughput and efficiently use communication, storing and computational resources. We used a collection of html documents as a benchmark and conducted preliminary experiments on a cluster of three SMP Linux PCs.

Journal Article•10.1016/S0167-8191(02)00185-0•

Middleware for the use of storage in communication

Micah Beck¹, Dorian Arnold², Alessandro Bassi³, Francine Berman⁴, Henri Casanova⁴, Jack Dongarra¹, Terry Moore¹, Graziano Obertelli⁵, James S. Plank¹, Martin Swany⁵, Sathish Vadhiyar¹, Rich Wolski³•Institutions (5)

University of Tennessee¹, University of Wisconsin-Madison², École normale supérieure de Lyon³, University of California, San Diego⁴, University of California, Santa Barbara⁵

1 Dec 2002

TL;DR: The Logistical Computing and Internetworking project is a reflection of the way that the next generation internetworking fundamentally changes the authors' definition of high performance wide area computing, with a richer view of the use of storage in communication and information sharing.

Abstract: The Logistical Computing and Internetworking (LoCI) project is a reflection of the way that the next generation internetworking fundamentally changes our definition of high performance wide area computing. A key to achieving this aim is the development of middleware that can provide reliable, flexible, scalable, and cost-effective delivery of data with quality of service guarantees to support high performance applications of all types. The LoCI effort attacks this problem with a simple but innovative strategy. At the base of the LoCI project is a richer view of the use of storage in communication and information sharing.

Book Chapter•10.1007/3-540-48051-X_50•

Reliability Bounds for Large Multistage Interconnection Networks

Nasser Fard¹, Indra Gunawan¹•Institutions (1)

Northeastern University¹

15 Jun 2002

TL;DR: The derivation of terminal, broadcast, lower and upper bounds network reliability expressions of the extra-stage cube network will be demonstrated and lower bound reliability provides sufficient assurance that the system will be operational at some specified time.

Abstract: Deriving the exact reliability expressions for large Multi-stage Interconnection Networks (MINs) can become rather complex. As network size increases, reliability bounds can be used to estimate the reliability of the networks. In this paper, terminal, broadcast, lower and upper bounds network reliability will be determined. Lower bound reliability is the minimum probability that the system will be operational for a specified time. Upper bound reliability presents an optimistic view of the probability that the system will work at some specified time, and is therefore not the main concern from a reliability point of view. If the lower bound reliability provides sufficient assurance that the system will be operational at some specified time, then no further effort for obtaining the exact reliability expression is necessary. As examples, the derivation of terminal, broadcast, lower and upper bounds network reliability expressions of the extra-stage cube network will be demonstrated.

Journal Article•10.1016/S0167-8191(02)00106-0•

Towards a general framework for FPGA based image processing using hardware skeletons

Khaled Benkrid¹, Danny Crookes¹, A. Benkrid¹•Institutions (1)

Queen's University Belfast¹

1 Aug 2002

TL;DR: This paper presents and illustrates the approach to developing a general framework for FPGA based Image Processing based on a library of hardware skeletons, with optimised implementations specifically for Xilinx XC4000 FPGAs.

Abstract: In this paper, we present our approach to developing a general framework for FPGA based Image Processing. This framework is based on a library of hardware skeletons. A hardware skeleton is a parameterised description of a task-specific architecture. A skeleton's implementation will apply optimisations specific to the target hardware. The library normally contains a range of alternative skeletons for the same task, perhaps tailored for different data representations. The library also contains high level skeletons for compound operations, whose implementation can apply appropriate optimisations. Given a complete algorithm description in terms of skeletons, an efficient hardware configuration is generated automatically. We have developed a library of hardware skeletons for common image processing tasks, with optimised implementations specifically for Xilinx XC4000 FPGAs. This paper presents and illustrates our hardware skeleton approach in the context of some common image processing tasks. It demonstrates our approach to the broader problem of achieving optimised hardware configurations while retaining the convenience and rapid development cycle of an application-oriented, high level programming model.

Journal Article•10.1016/S0167-8191(01)00142-9•

Generalized least-squares polynomial preconditioners for symmetric indefinite linear equations

Yu Liang¹, Jim Weston¹, Marek Szularz¹•Institutions (1)

Ulster University¹

1 Feb 2002

TL;DR: The GLS preconditioning polynomial and its influence on the flexible generalized minimized residual (FGMRES) solver are discussed in this paper and experimental results using classical benchmark systems are presented.

Abstract: Polynomial preconditioners are frequently used in a parallel environment for the computation of the solution of large-scale sparse linear equations (Ax = b) because of their easy implementation and trivial parallelization. With respect to symmetric indefinite (SID) linear systems, the use of generalized least-squares (GLS) polynomial preconditioning is preferable to other polynomial preconditioning methods because of the ability to use a three-term recurrence relationship and the low implementation costs. The GLS preconditioning polynomial and its influence on the flexible generalized minimized residual (FGMRES) solver are discussed in this paper. The orthogonal polynomials required in the solution of the least-squares approximation problem are constructed using the Stieltjes procedure in multiple disjoint intervals which exclude the origin. The time-consuming numerical integration associated with this procedure is computed efficiently using Chebyshev polynomials of the first kind, and the GLS polynomial preconditioned FGMRES algorithm is implemented using MPI in a highly parallel IBM SP2 environment. Experimental results using classical benchmark systems are presented and compared with those obtained using the recently developed SPAI preconditioned Bi-CGSTAB iterative method. The performance of the GLS preconditioned FGMRES solver is critically assessed.
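
The "easy implementation and trivial parallelization" mentioned above comes from the fact that applying a polynomial preconditioner p(A) to a vector needs only matrix-vector products. The sketch below illustrates that idea with Horner's rule in the monomial basis. It does not construct the GLS polynomial or use the three-term recurrence from the paper; the coefficients, the dense list-of-lists matrix, and the function names are all illustrative assumptions.

```python
def matvec(A, v):
    """Dense matrix-vector product; in a parallel code this is the distributed kernel."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def apply_polynomial(A, coeffs, v):
    """Return p(A) v for p(t) = coeffs[0] + coeffs[1]*t + ... + coeffs[m]*t**m.

    Horner's rule: only m matrix-vector products are needed and
    p(A) itself is never formed.
    """
    if not coeffs:
        raise ValueError("coeffs must contain at least one coefficient")
    result = [coeffs[-1] * x for x in v]
    for c in reversed(coeffs[:-1]):
        Ar = matvec(A, result)
        result = [y + c * x for y, x in zip(Ar, v)]
    return result


if __name__ == "__main__":
    # Small symmetric indefinite example (one positive, one negative eigenvalue).
    A = [[2.0, 0.0], [0.0, -1.0]]
    print(apply_polynomial(A, [1.0, 0.5], [1.0, 2.0]))
```

In a preconditioned Krylov solver, a routine of this shape would stand in for the "apply preconditioner" step, with the coefficients chosen so that p(A) approximates the inverse of A on the relevant spectral intervals.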

Book Chapter•10.1007/3-540-48051-X_32•

Parallel and Blocked Algorithms for Reduction of a Regular Matrix Pair to Hessenberg-Triangular and Generalized Schur Forms

Björn Adlerborn¹, Krister Dackland¹, Bo Kågström¹•Institutions (1)

Umeå University¹

15 Jun 2002

TL;DR: Algorithm and implementation issues regarding the single-/double-shift QZ algorithm are discussed and multishift strategies to enhance the performance in blocked as well as in parallel variants of the QZ method are described.

Abstract: A parallel three-stage algorithm for reduction of a regular matrix pair (A, B) to generalized Schur form (S, T) is presented. The first two stages transform (A, B) to upper Hessenberg-triangular form (H, T) using orthogonal equivalence transformations. The third stage iteratively reduces the matrix pair in (H, T) form to generalized Schur form. Algorithm and implementation issues regarding the single-/double-shift QZ algorithm are discussed. We also describe multishift strategies to enhance the performance in blocked as well as in parallel variants of the QZ method.

Journal Article•10.1016/S0167-8191(01)00144-2•

Parallel matrix computations in air pollution modelling

Wojciech Owczarz¹, Zahari Zlatev•Institutions (1)

Technical University of Denmark¹

1 Feb 2002

TL;DR: Some results, which are obtained when several versions of a large-scale air pollution model are run on different parallel architectures, will be presented in this paper.

Abstract: Mathematical models for large-scale air pollution studies consist of systems of partial differential equations (PDEs). The number of equations in these systems of PDEs is equal to the number of chemical compounds (the number of chemical compounds involved in the current large-scale air pollution models varies from 20 to about 200). The space domain of the systems of PDEs is normally very large, because the models must be able to treat transboundary long-range transport of the harmful pollutants. The time-intervals are often very long (runs with meteorological data covering up to 10 years have sometimes to be carried out). Moreover, fine spatial and temporal resolution is as a rule required. This leads to very large computational tasks when the air pollution models are discretized. Therefore, it is necessary to use fast and sufficiently accurate numerical methods as well as to exploit efficiently the great potential power of the parallel computers. Some results, which are obtained when several versions of a large-scale air pollution model are run on different parallel architectures, will be presented in this paper.

Journal Article•10.1016/S0167-8191(02)00076-5•

Madeleine II: a portable and efficient communication library for high-performance cluster computing

Olivier Aumage¹, Luc Bougé¹, Jean-François Méhaut¹, Raymond Namyst¹•Institutions (1)

École normale supérieure de Lyon¹

1 Apr 2002

TL;DR: This paper introduces Madeleine II, an adaptive and portable multiprotocol communication library for high-performance multithreaded applications that can control multiple network protocols (BIP, SISCI, VIA) and multiple network adapters (ETHERNET, MYRINET, SCI).

Abstract: This paper introduces Madeleine II, an adaptive and portable multiprotocol communication library for high-performance multithreaded applications. Madeleine II has the ability to control multiple network protocols (BIP, SISCI, VIA) and multiple network adapters (ETHERNET, MYRINET, SCI). Moreover, it includes advanced mechanisms to dynamically select the most appropriate transfer method for a given network protocol according to various parameters such as data size or user responsiveness requirements. We report on performance measurements obtained using various protocols and we present preliminary results on porting the MPICH and the NEXUS communication libraries on top of Madeleine II.

Book Chapter•10.1007/3-540-48051-X_4•

Grid Computing: Enabling a Vision for Collaborative Research

Gregor von Laszewski¹•Institutions (1)

Argonne National Laboratory¹

15 Jun 2002

TL;DR: This paper provides an overview showing why Grid research is difficult, and a number of management-related issues that must be addressed to make Grids a reality.

Abstract: In this paper we provide a motivation for Grid computing based on a vision to enable a collaborative research environment. Our vision goes beyond the connection of hardware resources. We argue that with an infrastructure such as the Grid, new modalities for collaborative research are enabled. We provide an overview showing why Grid research is difficult, and we present a number of management-related issues that must be addressed to make Grids a reality. We list projects that provide solutions to subsets of these issues.

Book Chapter•10.1007/3-540-48051-X_29•

A Recursive Formulation of the Inversion of Symmetric Positive Definite Matrices in Packed Storage Data Format

Bjarne Stig Andersen¹, John A. Gunnels², Fred G. Gustavson², Jerzy Wasniewski¹•Institutions (2)

Technical University of Denmark¹, IBM²

15 Jun 2002

TL;DR: A new Recursive Packed Inverse Calculation Algorithm for symmetric positive definite matrices has been developed and has nearly the same performance as the LAPACK full storage algorithm using n² memory words.

Abstract: A new Recursive Packed Inverse Calculation Algorithm for symmetric positive definite matrices has been developed. The new Recursive Inverse Calculation algorithm uses minimal storage, n(n + 1)/2, and has nearly the same performance as the LAPACK full storage algorithm using n² memory words. New recursive packed BLAS needed for this algorithm have been developed too. Two transformation routines, from the LAPACK packed storage data format to the recursive storage data format, were added to the package too. We present performance measurements on several current architectures that demonstrate improvements over the traditional packed routines.
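
The n(n + 1)/2 figure is the number of entries in one triangle of a symmetric matrix. The sketch below shows that saving for the conventional column-major, lower-triangular packed layout, which is assumed here to be the "LAPACK packed storage data format" the abstract refers to. The recursive packed format developed in the chapter is a different arrangement of the same entries and is not reproduced. Function names are hypothetical.

```python
def packed_size(n: int) -> int:
    """Entries needed for one triangle of an n x n symmetric matrix."""
    return n * (n + 1) // 2


def packed_index(i: int, j: int, n: int) -> int:
    """0-based position of A[i][j] in column-major lower-triangular packed storage."""
    if j > i:
        i, j = j, i  # symmetry: A[i][j] == A[j][i]
    return i + j * (2 * n - j - 1) // 2


def pack_lower(A):
    """Copy the lower triangle of a full symmetric matrix into a packed list."""
    n = len(A)
    ap = [0.0] * packed_size(n)
    for j in range(n):
        for i in range(j, n):
            ap[packed_index(i, j, n)] = A[i][j]
    return ap


if __name__ == "__main__":
    A = [[4.0, 1.0, 2.0],
         [1.0, 5.0, 3.0],
         [2.0, 3.0, 6.0]]
    ap = pack_lower(A)
    print(len(ap), "words instead of", len(A) ** 2)
    print(ap[packed_index(0, 2, 3)])  # upper-triangle access served by symmetry
```

For large n the packed form needs roughly half the memory of full storage, which is the trade-off the abstract weighs against the performance of the full storage routines.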

Journal Article•10.1016/S0167-8191(02)00199-0•

Advanced environments for parallel and distributed applications: a view of current status

Pasqua D'Ambra¹, Marco Danelutto², Daniela di Serafino³, Marco Lapegna⁴•Institutions (4)

Indian Council of Agricultural Research¹, University of Pisa², Seconda Università degli Studi di Napoli³, University of Naples Federico II⁴

1 Dec 2002

TL;DR: A view of the design and development activity concerning advanced environments for parallel and distributed computing is provided, and these environments are classified into two main classes: programming environments and problem solving environments.

Abstract: In this paper we provide a view of the design and development activity concerning advanced environments for parallel and distributed computing. We start by assessing the main issues driving this research track, in the areas of hardware and software technology and of applications. Then, we identify some key concepts that can be considered as common guidelines and goals in the development of modern advanced environments, and we come up with a "classification" of these environments into two main classes: programming environments and problem solving environments. Both classes are widely discussed, in light of the key concepts previously outlined, and several examples are provided, in order to give a picture of the current status and trends.

Journal Article•10.1016/S0167-8191(02)00074-1•

CPU and incremental memory allocation in dynamic parallelization of SQL Queries

Abdelkader Hameurlain¹, Franck Morvan¹•Institutions (1)

Paul Sabatier University¹

1 Apr 2002

TL;DR: An incremental parallelization method which carries out simultaneously both scheduling and mapping in co-operation with two incremental memory allocation heuristics (ParAd: parallelism degree adjustment, and MaCRelax: mapping clues relaxation) in a dynamic multi-user context is proposed.

Abstract: In order to re-adjust the parallel execution of SQL queries in case of metric estimation or discretization errors, we propose an incremental parallelization method which carries out simultaneously both scheduling and mapping in co-operation with two incremental memory allocation heuristics (ParAd: parallelism degree adjustment, and MaCRelax: mapping clues relaxation) in a dynamic multi-user context. The two incremental memory allocation heuristics are integrated in the mapping method, which attempts to avoid time-consuming multibucket join execution generating numerous additional I/O. A performance evaluation of the ParAd heuristic shows: (i) significant savings in join response time (from 16.11% to 35.62%), and (ii) with many complex queries, a more significant gain in response time (from 29% to 54%).

Journal Article•10.1016/S0167-8191(02)00078-9•

A scalable parallel algorithm for training a hierarchical mixture of neural experts

Pablo A. Estevez¹, Hélène Paugam-Moisy², Didier Puzenat², Manuel Ugarte¹•Institutions (2)

University of Chile¹, Centre national de la recherche scientifique²

1 Jun 2002

TL;DR: An analysis of the models shows that the parallel algorithms are highly scalable when the size of the experts grows from linear units to multi-layer perceptrons (MLPs), and experiments confirm this with near-linear speedups for HME-MLP.

Abstract: Efficient parallel learning algorithms are proposed for training a powerful modular neural network, the hierarchical mixture of experts (HME). Parallelizations are based on the concept of modular parallelism, i.e. parallel execution of network modules. From modeling the speed-up as a function of the number of processors and the number of training examples, several improvements are derived, such as pipelining the training examples by packets. Compared to experimental measurements, theoretical models are accurate. For regular topologies, an analysis of the models shows that the parallel algorithms are highly scalable when the size of the experts grows from linear units to multi-layer perceptrons (MLPs). These results are confirmed experimentally, achieving near-linear speedups for HME-MLP. Although this work can be viewed as a case study in the parallelization of HME neural networks, both algorithms and theoretical models can be expanded to different learning rules or less regular tree architectures.
