Scispace (Formerly Typeset)
Showing papers presented at "Parallel Computing in 2005"
Journal Article•10.1016/J.PARCO.2005.04.002•
A high performance, low complexity algorithm for compile-time task scheduling in heterogeneous systems


Tarek Hagras1, Jan Janecek1•
Czech Technical University in Prague1
1 Jul 2005
TL;DR: This paper presents a simple scheduling algorithm based on list-scheduling and task-duplication on a bounded number of heterogeneous machines, called Heterogeneous Critical Parents with Fast Duplicator (HCPFD), which outperforms on average all other higher complexity algorithms.
Abstract: Heterogeneous computing systems are interesting computing platforms because a single parallel architecture may not be adequate for exploiting all of a program's available parallelism. In some cases, heterogeneous systems have been shown to produce higher performance for lower cost than a single large machine. Task scheduling is the key issue when aiming at high performance in this kind of system. A large number of scheduling heuristics have been presented in the literature, but most of them target only homogeneous computing systems. In this paper we present a simple scheduling algorithm based on list-scheduling and task-duplication on a bounded number of heterogeneous machines, called Heterogeneous Critical Parents with Fast Duplicator (HCPFD). The analysis and experiments have shown that HCPFD outperforms on average all other higher complexity algorithms.

139 citations
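
The abstract above names list scheduling as one ingredient of HCPFD. As a rough illustration only, the following Python sketch shows a generic list-scheduling heuristic for heterogeneous machines: tasks are visited in a given priority order, and each one is placed on the machine that gives it the earliest finish time. This is not the HCPFD algorithm (there is no critical-parent ranking and no task duplication); the function name `list_schedule`, the argument layout, and the communication-cost model are assumptions made for this sketch.

```python
def list_schedule(tasks, deps, cost, comm):
    """Generic earliest-finish-time list scheduling (illustrative, not HCPFD).

    tasks: task names in a valid priority (topological) order.
    deps:  dict task -> list of parent tasks.
    cost:  dict task -> list of execution times, one entry per machine.
    comm:  dict (parent, child) -> transfer time, paid only when parent and
           child run on different machines (an assumption of this sketch).
    Returns (placement, makespan), where placement maps each task to
    (machine, start, finish).
    """
    n_machines = len(next(iter(cost.values())))
    machine_free = [0.0] * n_machines  # time at which each machine becomes idle
    placement = {}
    for t in tasks:
        best = None
        for m in range(n_machines):
            # Data from every parent must have arrived before t can start on m.
            ready = 0.0
            for p in deps.get(t, []):
                p_machine, _, p_finish = placement[p]
                transfer = 0.0 if p_machine == m else comm.get((p, t), 0.0)
                ready = max(ready, p_finish + transfer)
            start = max(ready, machine_free[m])
            finish = start + cost[t][m]
            if best is None or finish < best[2]:
                best = (m, start, finish)
        placement[t] = best
        machine_free[best[0]] = best[2]
    makespan = max(finish for _, _, finish in placement.values())
    return placement, makespan
```

Because `cost[t]` holds one execution time per machine, the same task can be fast on one machine and slow on another, which is the heterogeneity the abstract refers to.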

Journal Article•10.1016/J.PARCO.2005.04.010•
The complexity of static data replication in data grids


Uroš Čibej1, Boštjan Slivnik1, Borut Robič1•
University of Ljubljana1
1 Aug 2005
TL;DR: This article describes data replication on data grids as a static optimization problem, and shows that this problem is NP-hard and non-approximable.
Abstract: Data replication is a well-known technique used in distributed computing to improve access to data and/or system fault-tolerance. Recently, studies of its applications to grid computing have also been initiated. In this article we describe data replication on data grids as a static optimization problem. We show that this problem is NP-hard and non-approximable. We discuss two approaches to solving it, i.e. integer programming and simplifications.

64 citations

Journal Article•10.1016/J.PARCO.2004.10.002•
Fault hamiltonicity of augmented cubes


Hong-Chun Hsu1, Liang-Chih Chiang2, Jimmy J. M. Tan2, Lih-Hsing Hsu•
Providence College1, National Chiao Tung University2
1 Jan 2005
TL;DR: It is proved that $AQ_n - F$ is hamiltonian if $|F| \le 2n-3$ and that $AQ_n - F$ is hamiltonian connected if $|F| \le 2n-4$, and these bounds are tight.
Abstract: In this paper, we consider the fault hamiltonicity and the fault hamiltonian connectivity of the augmented cubes $AQ_n$. Assume that $F \subseteq V(AQ_n) \cup E(AQ_n)$ and $n \ge 4$. We prove that $AQ_n - F$ is hamiltonian if $|F| \le 2n-3$ and that $AQ_n - F$ is hamiltonian connected if $|F| \le 2n-4$. Moreover, these bounds are tight.

63 citations

Journal Article•10.1016/J.PARCO.2005.02.001•
Recent trends in the marketplace of high performance computing


Erich Strohmaier1, Jack Dongarra2, Hans W. Meuer3, Horst D. Simon1•
Lawrence Berkeley National Laboratory1, University of Tennessee2, University of Mannheim3
1 Mar 2005
TL;DR: This paper analyzes major recent trends and changes in the High Performance Computing (HPC) marketplace and indicates renewed broad interest in the scientific HPC community for new hardware architectures and new programming paradigms.
Abstract: In this paper we analyze major recent trends and changes in the High Performance Computing (HPC) marketplace. The introduction of vector computers started the era of 'Supercomputing'. The initial success of vector computers in the seventies was driven by raw performance. Massive parallel systems (MPP) became successful in the early nineties due to their better price/performance ratios, which were enabled by the attack of the 'killer-micros'. The success of microprocessor-based systems using the shared memory concept (referred to as symmetric multiprocessors (SMP)), even for the very high-end systems, was the basis for the emerging cluster concepts in the early 2000s. Within the first half of this decade, clusters of PCs and workstations have become the prevalent architecture for many HPC application areas on all ranges of performance. However, the Earth Simulator vector system demonstrated that many scientific applications could benefit greatly from other computer architectures. At the same time there is renewed broad interest in the scientific HPC community for new hardware architectures and new programming paradigms. The IBM BlueGene/L system is one early example of a shifting design focus for large-scale systems. The DARPA HPCS program has the declared goal of building a Petaflops computer system by the end of the decade using novel computer architectures.

52 citations

Journal Article•10.1016/J.PARCO.2005.03.005•
A time-to-live based reservation algorithm on fully decentralized resource discovery in Grid computing


Sanya Tangpongprasit1, Takahiro Katagiri1, Kenji Kise1, Hiroki Honda1, Toshitsugu Yuba1 •
University of Electro-Communications1
1 Jun 2005
TL;DR: The algorithm is found to perform well only in environments where resources are plentiful relative to the density of requests in the network; the variant that selects the available matching resource whose attributes are closest to the required attributes can improve resource utilization.
Abstract: We present an alternative algorithm for fully decentralized resource discovery in Grid computing, which enables the sharing, selection, and aggregation of a wide variety of geographically distributed computational resources. Our algorithm is based on a simple unicast request transmission that can be easily implemented. The addition of a reservation algorithm enables the resource discovery mechanism to find more available matching resources. The deadline for resource discovery is set by a time-to-live value. With our algorithm, a single resource is selected automatically for each request when multiple available resources are found on the forward path of resource discovery, so the user does not need to choose manually from a large list of available matching resources. We evaluated the performance of our algorithm by comparing it with a first-found-first-served algorithm. The experimental results show that the percentage of requests that can be supported does not differ between the two algorithms. However, our algorithm can improve either resource utilization or turnaround time, depending on how the resource is selected. The variant that selects the available matching resource whose attributes are closest to the required attributes can improve resource utilization, whereas the variant that selects the available matching resource with the highest performance can improve turnaround time. However, the performance of our algorithm depends on the density of resources in the network. Our algorithm seems to perform well only in environments with enough resources relative to the density of requests in the network.

49 citations

Journal Article•10.1016/J.PARCO.2005.03.006•
Development of a parallel optimization method based on genetic simulated annealing algorithm


Z.G. Wang1, Yoke San Wong1, Mustafizur Rahman1•
National University of Singapore1
1 Aug 2005
TL;DR: In PGSA, the entire population is divided into sub-populations, and in each sub-population the algorithm uses the local search ability of simulated annealing after crossover and mutation to optimize continuous problems.
Abstract: This paper presents a parallel genetic simulated annealing (PGSA) algorithm that has been developed and applied to optimize continuous problems. In PGSA, the entire population is divided into sub-populations, and in each sub-population the algorithm uses the local search ability of simulated annealing after crossover and mutation. The best individuals of each subpopulation are migrated to neighboring ones after a certain number of epochs. An implementation of the algorithm is discussed and the performance is evaluated against a standard set of test functions. PGSA shows some remarkable improvement in comparison with the conventional parallel genetic algorithm and the breeder genetic algorithm (BGA).

49 citations
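
The structure described in the PGSA abstract (sub-populations, crossover and mutation followed by a simulated-annealing local step, and periodic migration of the best individuals to a neighbor) can be sketched roughly as follows. This is a sequential toy version written for illustration, not the authors' implementation: the function name `pgsa_minimize`, every parameter and default value, the uniform crossover, the Gaussian mutation, the ring-shaped migration, and the cooling schedule are all assumptions.

```python
import math
import random


def pgsa_minimize(f, dim, bounds, n_sub=4, sub_size=10, epochs=30,
                  migrate_every=5, temp=1.0, cooling=0.9, seed=0):
    """Toy PGSA-style minimizer; all names and defaults are hypothetical."""
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(v):
        return min(hi, max(lo, v))

    # One list of individuals per sub-population (processed sequentially here;
    # a parallel version would give each sub-population its own worker).
    subs = [[[rng.uniform(lo, hi) for _ in range(dim)]
             for _ in range(sub_size)] for _ in range(n_sub)]
    best = min((ind for sub in subs for ind in sub), key=f)

    for epoch in range(1, epochs + 1):
        for s in range(n_sub):
            sub, new_sub = subs[s], []
            for _ in range(sub_size):
                p1, p2 = rng.sample(sub, 2)
                # Uniform crossover followed by a small Gaussian mutation.
                child = [clip((a if rng.random() < 0.5 else b)
                              + rng.gauss(0.0, 0.05))
                         for a, b in zip(p1, p2)]
                # Simulated-annealing local step with Metropolis acceptance.
                cand = [clip(x + rng.gauss(0.0, temp)) for x in child]
                delta = f(cand) - f(child)
                if delta < 0 or rng.random() < math.exp(-delta / temp):
                    child = cand
                new_sub.append(child)
            subs[s] = new_sub
            best = min(new_sub + [best], key=f)  # remember the best ever seen
        if epoch % migrate_every == 0:
            # Ring migration: the best of each sub-population replaces the
            # worst individual of its neighbor.
            bests = [min(sub, key=f) for sub in subs]
            for s in range(n_sub):
                nxt = subs[(s + 1) % n_sub]
                worst = max(range(sub_size), key=lambda i: f(nxt[i]))
                nxt[worst] = list(bests[s])
        temp *= cooling  # geometric cooling (an assumption of this sketch)
    return best, f(best)
```

A call such as `pgsa_minimize(lambda x: sum(v * v for v in x), 3, (-5.0, 5.0))` returns the best point found and its objective value; a fixed `seed` makes the run repeatable.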

Journal Article•10.1016/J.PARCO.2005.02.011•
A distributed utility-based two level market solution for optimal resource scheduling in computational grid


Li Chunlin1, Li Layuan1•
Wuhan University of Technology1
1 Mar 2005
TL;DR: The paper presents a two-level market scheme for grid resource pricing, an iterative algorithm for optimal resource allocation that outperforms a one-level market scheme in terms of task completion time and resource allocation efficiency.
Abstract: This paper investigates the interactions between agents representing users, services and resources to solve resource scheduling optimization in a computational grid. In order to reduce the computational complexity, we further decompose the grid resource allocation optimization into subproblems: grid user agent-grid service agent in the service market and grid service agent-grid resource agent in the resource market. The two-level market converges to its optimal points; a globally optimal point is achieved. Total user benefit of the computational grid is maximized when the equilibrium prices are obtained through the service market level optimization and resource market level optimization. This demonstrates a practical approach to market-responsive resource pricing that can benefit grid providers and users alike. The paper presents two-level market grid resource pricing, an iterative algorithm used to perform optimal resource allocation. The experiment shows that the two-level market-based resource pricing scheme outperforms the one-level market scheme in terms of task completion time and resource allocation efficiency.

45 citations

Journal Article•10.1016/J.PARCO.2005.03.018•
Load balancing and OpenMP implementation of nested parallelism


Ragnhild Blikberg1, Tor Sørevik1•
University of Bergen1
1 Oct 2005
TL;DR: This paper provides an algorithm for finding good distributions of threads to tasks and discusses how to implement nested parallelism in OpenMP.
Abstract: Many problems have multiple layers of parallelism. The outer-level may consist of few and coarse-grained tasks. Next, each of these tasks may also be rich in parallelism, and be split into a number of fine-grained tasks, which again may consist of even finer subtasks, and so on. Here we argue and demonstrate by examples that utilizing multiple layers of parallelism may give much better scaling than if one restricts oneself to only one level of parallelism. Two non-trivial issues for multi-level parallelism are load balancing and implementation. In this paper we provide an algorithm for finding good distributions of threads to tasks and discuss how to implement nested parallelism in OpenMP.

43 citations
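
To make the load-balancing question in the abstract above concrete, here is a minimal Python sketch of one simple way to distribute threads over coarse-grained outer tasks: give every task one thread, then hand each remaining thread to the task whose estimated time (work divided by threads) is currently largest. This greedy rule is an assumption chosen for illustration and is not the algorithm from the paper; it also assumes each task scales perfectly with its thread count, and the name `distribute_threads` is hypothetical.

```python
import heapq


def distribute_threads(work, n_threads):
    """Greedy thread-to-task distribution (illustrative, not the paper's method).

    work:      estimated amount of work in each outer-level task.
    n_threads: total number of threads available.
    Returns a list with the number of threads assigned to each task.
    """
    if n_threads < len(work):
        raise ValueError("need at least one thread per task")
    threads = [1] * len(work)
    # Max-heap (via negated keys) on the estimated time of each task,
    # assuming perfect scaling: time = work / threads.
    heap = [(-w, i) for i, w in enumerate(work)]
    heapq.heapify(heap)
    for _ in range(n_threads - len(work)):
        _, i = heapq.heappop(heap)  # task that is currently the slowest
        threads[i] += 1
        heapq.heappush(heap, (-work[i] / threads[i], i))
    return threads
```

In an OpenMP program, a result like this could be used to choose the inner team size of each outer task; that mapping is left out here because it depends on the implementation strategy the paper discusses.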

Journal Article•10.1016/J.PARCO.2005.01.001•
Short communication: A novel parallelization approach for hierarchical clustering


Z. Du1, F. Lin1•
Nanyang Technological University1
1 May 2005
TL;DR: A parallel approach to hierarchical clustering of gene expression data is presented, addressing the difficulty previous implementations have in handling large data sets within reasonable time and memory resources.
Abstract: Identification of groups of genes that manifest similar expression patterns is a key step in the analysis of gene expression data. Hierarchical clustering was developed for that purpose. A fundamental problem with previous implementations of this clustering method is their limited ability to handle large data sets within reasonable time and memory resources. In this paper, we present a parallel approach for solving this problem. Implementation of the parallel algorithm is illustrated on data from high dimensional microarray experiments related to gene expression in cancerous disease and Arabidopsis seedling growth. The results show considerable reduction in computational time and inter-node communication overhead, especially for large data sets.

42 citations

Journal Article•10.1016/J.PARCO.2005.03.012•
OpenMP parallelization of agent-based models


Federico Massaioli, Filippo Castiglione1, Massimo Bernaschi1•
IAC1
1 Oct 2005
TL;DR: Presents the results of a series of tests, on different platforms, of simplified codes that reproduce non-trivial issues which agent-based models expose in present hardware/software platforms for parallel processing; the codes can be used as a starting point in the search for a possible solution.
Abstract: Agent-based models, an emerging paradigm of simulation of complex systems, appear very suitable to parallel processing. However, during the parallelization of a simulator of financial markets, we found that some features of these codes highlight non-trivial issues of the present hardware/software platforms for parallel processing. Here we present the results of a series of tests, on different platforms, of simplified codes that reproduce such problems and can be used as a starting point in the search for a possible solution.

39 citations

Journal Article•10.1016/J.PARCO.2005.04.012•
Performance optimization of irregular codes based on the combination of reordering and blocking techniques


Juan C. Pichel1, Dora B. Heras1, José C. Cabaleiro1, Francisco F. Rivera1•
University of Santiago de Compostela1
1 Aug 2005
TL;DR: The results, expressed in terms of execution time, show that an adequate reordering of the data improves the efficiency of register blocking, thereby reducing the execution time of the sparse algebra code considered.
Abstract: This paper studies the combination of data reordering techniques with classic code restructuring techniques for increasing locality in the execution of sparse algebra codes. The reordering techniques first model the locality at run time and then apply a heuristic to increase it. After this, a code restructuring technique specially tuned for sparse algebra codes, called register blocking, is applied. The code studied is the product of a sparse matrix by a dense vector (SpMxV), on different monoprocessors and distributed memory multiprocessors. The combination of both techniques was tested on a broad set of matrices from real problems and known repositories. The results, expressed in terms of execution time, show that an adequate reordering of the data improves the efficiency of register blocking, thereby reducing the execution time of the sparse algebra code considered.
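
For readers unfamiliar with the SpMxV kernel mentioned in this abstract, the following Python sketch shows the plain sparse matrix by dense vector product in compressed sparse row (CSR) storage. The CSR layout and the names `row_ptr`, `col_idx`, and `values` are assumptions made for this illustration; the abstract does not say which storage format the authors use, and neither reordering nor register blocking is shown.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    row_ptr: row i owns the entries row_ptr[i] .. row_ptr[i + 1] - 1.
    col_idx: column index of each stored entry.
    values:  value of each stored entry.
    x:       dense input vector.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # x is read indirectly through col_idx; these irregular accesses
            # are the locality problem that data reordering tries to reduce.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y
```

The indirect access `x[col_idx[k]]` is where memory locality is lost when the nonzeros of a row are scattered across distant columns.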
Journal Article•10.1016/J.PARCO.2005.02.007•
Memory sharing for interactive ray tracing on clusters


David E. DeMarle1, Christiaan Gribble1, Solomon Boulos1, Steven G. Parker1•
Scientific Computing and Imaging Institute1
1 Feb 2005
TL;DR: Object- and page-based distributed shared memories are compared, and optimizations for efficient memory use are discussed.
Abstract: We present recent results in the application of distributed shared memory to image parallel ray tracing on clusters. Image parallel rendering is traditionally limited to scenes that are small enough to be replicated in the memory of each node, because any processor may require access to any piece of the scene. We solve this problem by making all of a cluster’s memory available through software distributed shared memory layers. With gigabit ethernet connections, this mechanism is sufficiently fast for interactive rendering of multi-gigabyte datasets. Object- and page-based distributed shared memories are compared, and optimizations for efficient memory use are discussed.
Journal Article•10.1016/J.PARCO.2005.03.015•
Towards a more efficient implementation of OpenMP for clusters via translation to global arrays


Lei Huang1, Barbara Chapman1, Zhenying Liu1•
University of Houston1
1 Oct 2005
TL;DR: An alternative approach that translates OpenMP to Global Arrays (GA) is introduced and its basic strategy explained, and a new directive, INVARIANT, is proposed to provide information about the dynamic scope of data access patterns.
Abstract: This paper discusses a novel approach to implementing OpenMP on clusters. Traditional approaches to do so rely on Software Distributed Shared Memory systems to handle shared data. We discuss these and then introduce an alternative approach that translates OpenMP to Global Arrays (GA), explaining the basic strategy. GA requires a data distribution. We do not expect the user to supply this; rather, we show how we perform data distribution and work distribution according to the user-supplied OpenMP static loop schedules. An inspector-executor strategy is employed for irregular applications in order to gather information on accesses to potentially non-local data, group non-local data transfers and overlap communications with local computations. Furthermore, a new directive INVARIANT is proposed to provide information about the dynamic scope of data access patterns. This directive can help us generate efficient codes for irregular applications using the inspector-executor approach. We also illustrate how to deal with some hard cases containing reshaping and strided accesses during the translation. Our experiments show promising results for the corresponding regular and irregular GA codes.
Journal Article•10.1016/J.PARCO.2005.02.009•
Evaluation of the asynchronous iterative algorithms in the context of distant heterogeneous clusters


Jacques M. Bahi1, Sylvain Contassot-Vivier1, Raphaël Couturier1•
University of Franche-Comté1
1 May 2005
TL;DR: The subject of this paper is to show the high efficiency of asynchronism for parallel iterative algorithms in the context of grid computing, that is to say, with machines scattered on a broad geographical scale.
Abstract: The subject of this paper is to show the high efficiency of asynchronism for parallel iterative algorithms in the context of grid computing, that is to say, with machines scattered on a broad geographical scale. The question is: does asynchronism help to reduce the communication penalty and the overall computation time of such a given algorithm? The asynchronous programming model is evaluated on two test problems representing two important classes of scientific applications: a stationary linear problem and a non-stationary non-linear problem. They are implemented with a multi-threaded environment and tested on a set of distant heterogeneous machines. Several experiments have been performed allowing us to compare the performances of such asynchronous algorithms and also to analyze their behavior and extract the main possible optimizations for their use in a grid computing context.
Proceedings Article•
Performance Analysis and Visualization of the N-Body Tree Code PEPC on Massively Parallel Computers


Paul Gibbon, Wolfgang Frings, S. Dominiczak, Bernd Mohr
1 Jan 2005
Abstract: © 2006 by John von Neumann Institute for Computing. Permission to make digital or hard copies of portions of this work for personal or classroom use is granted provided that the copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise requires prior specific permission by the publisher mentioned above.
Journal Article•10.1016/J.PARCO.2005.04.001•
Guest editorial: Heterogeneous computing


Alexey Kalinov1, Alexey Lastovetsky2, Yves Robert3•
Russian Academy of Sciences1, University College Dublin2, École normale supérieure de Lyon3
1 Jul 2005
Proceedings Article•
Phase-Based Parallel Performance Profiling


Allen D. Malony, Sameer Shende, Alan H. Morris
1 Jan 2005
Abstract: © 2006 by John von Neumann Institute for Computing. Permission to make digital or hard copies of portions of this work for personal or classroom use is granted provided that the copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise requires prior specific permission by the publisher mentioned above.
Journal Article•10.1016/J.PARCO.2005.02.005•
A parallel multiresolution volume rendering algorithm for large data visualization


Jinzhu Gao1, Chaoli Wang2, Liya Li2, Han-Wei Shen2•
Oak Ridge National Laboratory1, Ohio State University2
1 Feb 2005
TL;DR: A new parallel multiresolution volume rendering algorithm that can reduce the run-time communication cost to a minimum and ensure a well-balanced workload among processors when visualizing gigabytes of data with arbitrary error tolerances is presented.
Abstract: We present a new parallel multiresolution volume rendering algorithm for visualizing large data sets. Using the wavelet transform, the raw data is first converted to a multiresolution wavelet tree. To eliminate the data dependency between processors at run-time, and achieve load-balanced rendering, we design a novel algorithm to partition the tree and distribute the data along a hierarchical space-filling curve with error-guided bucketization. Further optimization is achieved by storing reconstructed data at pre-selected tree nodes for each processor based on the available storage resources to reduce the overall wavelet reconstruction cost. At run time, the wavelet tree is first traversed according to the user-specified error tolerance. Data blocks of different resolutions that satisfy the error tolerance are then decompressed and rendered to compose the final image in parallel. Experimental results showed that our algorithm can reduce the run-time communication cost to a minimum and ensure a well-balanced workload among processors when visualizing gigabytes of data with arbitrary error tolerances.
Proceedings Article•
A Java/Jini framework supporting stream parallel computations


Marco Danelutto, Patrizio Dazzi
1 Jan 2005
TL;DR: JJPF (the Java/Jini Parallel Framework) is a framework that can run stream parallel applications on several parallel-distributed architectures and achieves almost perfect, fully automatic load balancing in the execution of this kind of application.
Abstract: JJPF (the Java/Jini Parallel Framework) is a framework that can run stream parallel applications on several parallel-distributed architectures. JJPF is actually a distributed execution server. It uses JINI to recruit the computational resources needed to compute parallel applications. Parallel applications can be run on JJPF provided they exploit parallelism according to an arbitrary nesting of task farm and pipeline skeletons/patterns. JJPF achieves almost perfect, fully automatic load balancing in the execution of this kind of application. It also transparently handles any number of node and network faults. Scalability and efficiency results are shown on workstation networks, both with a synthetic (embarrassingly parallel) image processing application and with a real (not embarrassingly parallel) page ranking application.
Journal Article•10.1016/J.PARCO.2004.12.007•
Efficient parallel algorithms and software for compressed octrees with applications to hierarchical methods


Bhanu Hariharan1, Srinivas Aluru1•
Iowa State University1
1 Mar 2005
TL;DR: The primary goal of this work is to identify and abstract the commonalities present in various hierarchical methods using octrees, design efficient parallel algorithms for them, and encapsulate them in a software library.
Abstract: We describe the design and implementation of efficient parallel algorithms, and a software library for the parallel implementation of compressed octree data structures. Octrees are widely used in supporting hierarchical methods for scientific applications such as the N-body problem, molecular dynamics and smoothed particle hydrodynamics. The primary goal of our work is to identify and abstract the commonalities present in various hierarchical methods using octrees, design efficient parallel algorithms for them, and encapsulate them in a software library. We designed provably efficient parallel algorithms and implementation strategies that perform well irrespective of the spatial distribution of data in the computational domain. The library will enable rapid development of applications, allowing application developers to use efficient parallel algorithms developed for this purpose, without the necessity of having detailed knowledge of the algorithms or of implementing them. The software is developed in C using the Message Passing Interface (MPI). We report experimental results on an IBM xSeries parallel computer.
Journal Article•10.1016/J.PARCO.2005.02.008•
Interactive parallel visualization of large particle datasets


Kevin Liang1, Patricia Monger1, Hugh Couchman1•
McMaster University1
1 Feb 2005
TL;DR: A new interactive parallel visualization method for large particle datasets by directly rendering individual particles based on a parallel rendering cluster that provides real time interaction and interactive exploration of large datasets, which has been a challenge for scientific visualization and other real time data mining applications.
Abstract: This paper presents a new interactive parallel visualization method for large particle datasets by directly rendering individual particles based on a parallel rendering cluster. A frame rate of 9 frames-per-second is achieved for 256^3 particles using 7 render nodes and a display node. This provides real time interaction and interactive exploration of large datasets, which has been a challenge for scientific visualization and other real time data mining applications. A dynamic data distribution technique is designed for highlighting a subset of the particle volume. It maintains load balance of the system and minimizes network traffic by reconfiguring the rendering chain. Experiments show that on a given subset, interactive manipulation of the subset usually requires less than 3% of the particles inside the subset to be redistributed among all render nodes. The method can be easily extended to other large datasets such as hydrodynamic turbulence, fluid dynamics, and so on.
Journal Article•10.1016/J.PARCO.2004.11.003•
Path selection algorithm: the strategy for designing deterministic routing from alternative paths


Michihiro Koibuchi1, Akiya Jouraku1, Hideharu Amano1•
Keio University1
1 Jan 2005
TL;DR: Simulation results show that one of the four algorithms improves throughput by up to 92% over simple path selection algorithms, and that policies that remove paths crossing the bottleneck channels are more efficient than ones that keep paths crossing channels that are not crowded.
Abstract: System Area Networks (SANs), which usually accept irregular topologies, have been used to connect nodes in PC/WS clusters or high-performance storage systems. Although routing algorithms for SANs usually find alternative paths, SANs usually accept only deterministic routing. Thus, a path selection algorithm, which chooses a single path from the alternative paths, becomes essential for advanced routing in SANs. However, only a few studies of path selection have been done, and only for SANs without virtual channels, and its impact is not well analyzed. In this paper, (1) we propose four path selection algorithms which use different concepts to distribute paths in SANs with virtual channels, and (2) we investigate the performance influence of various path selection algorithms through a flit-level simulation. Simulation results show that one of the four algorithms improves throughput by up to 92% over simple path selection algorithms, and that policies that remove paths crossing the bottleneck channels are more efficient than ones that keep paths crossing channels that are not crowded.
Proceedings Article•
Optimal Tile Size Selection Guided by Analytical Models


Basilio B. Fraguela1, M. G. Carmueja1, Diego Andrade1•
University of A Coruña1
1 Jan 2005
TL;DR: This paper presents and compares a series of strategies to search for the optimal tile size guided by an analytical model of the whole memory hierarchy and the CPU behavior and shows that these strategies find better tile sizes than traditional heuristic approaches proposed in the literature while requiring a small compile-time overhead.
Abstract: As the memory bottleneck problem continues to grow, so does the relevance of the techniques that help improve program locality. A well-known technique in this category is tiling, which decomposes data sets to be used several times in a computation into a series of tiles that are reused before proceeding to process the next tile. This way, capacity misses are avoided. Finding the optimal tile size is a complex task. In this paper we present and compare a series of strategies to search the optimal tile size guided by an analytical model of the whole memory hierarchy and the CPU behavior. Our experiments show that our strategies find better tile sizes than traditional heuristic approaches proposed in the literature while requiring a small compile-time overhead. Iterative compilation can yield better results, but at the expense of very large overheads.
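
The tiling transformation described in this abstract can be illustrated with a small Python sketch of a tiled matrix multiplication, in which the `tile` parameter is the quantity a tile-size search would tune. Matrix multiplication is used here only as a familiar example; the function name `matmul_tiled` is hypothetical, and nothing in this sketch reflects the analytical model or the search strategies of the paper.

```python
def matmul_tiled(a, b, tile):
    """Multiply matrices a (n x m) and b (m x p) using square loop tiles.

    The three outer loops walk over tiles; the three inner loops do the work
    inside one tile, so a small block of a, b and c is reused before the
    computation moves on to the next tile.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, m, tile):
            for jj in range(0, p, tile):
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, m)):
                        a_ik = a[i][k]
                        for j in range(jj, min(jj + tile, p)):
                            c[i][j] += a_ik * b[k][j]
    return c
```

The result is the same for every positive `tile`; only the order of memory accesses changes, which is why the choice of tile size affects cache behavior rather than correctness. Pure Python will not show the cache effect clearly, so the sketch is meant to convey the loop structure only.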
Book Chapter•10.1016/S0927-5452(05)80009-7•
The grid relational catalog project


Giovanni Aloisio, Massimo Cafaro, Sandro Fiore, Maria Mirto
1 Jan 2005
TL;DR: The paper introduces the Grid-DBMS concept, a framework for dynamic data management in a grid environment, highlighting its requirements, architecture, components and services, and gives an overview of the Grid Relational Catalog Project (GRelC) developed at the CACT/ISUFI of the University of Lecce.
Abstract: Today many DataGrid applications need to manage and process a very large amount of data distributed across multiple grid nodes and stored in heterogeneous databases. Grids encourage and promote the publication, sharing and integration of scientific data (distributed across several Virtual Organizations) in a more open manner than is currently the case, and many e-Science projects have an urgent need to interconnect legacy and independently operated databases through a set of data access and integration services. The complexity of data management within a Computational Grid comes from the distribution, scale and heterogeneity of data sources. A set of dynamic and adaptive services could address specific issues related to automatic data management, providing high performance and transparency as well as fully exploiting a grid infrastructure. These services should involve data migration and integration, discovery of data sources and so on, providing a transparent and dynamic layer of data virtualization. In this paper we introduce the Grid-DBMS concept, a framework for dynamic data management in a grid environment, highlighting its requirements, architecture, components and services. We also present an overview of the Grid Relational Catalog Project (GRelC) developed at the CACT/ISUFI of the University of Lecce, which represents a partial implementation of a Grid-DBMS for the Globus Community.
Journal Article•10.1016/J.PARCO.2005.03.011•
Parallel iterative solvers for finite-element methods using an OpenMP/MPI hybrid programming model on the Earth Simulator


Kengo Nakajima1•
University of Tokyo1
1 Oct 2005
TL;DR: An efficient parallel iterative method for the finite-element method has been developed for symmetric multiprocessor (SMP) cluster architectures with vector processors such as the Earth Simulator, and the effect of the number of colors used in reordering has been evaluated on various types of computers.
Abstract: An efficient parallel iterative method for the finite-element method has been developed for symmetric multiprocessor (SMP) cluster architectures with vector processors such as the Earth Simulator. The method is based on a three-level hybrid parallel programming model, including message passing for inter-SMP node communication, loop directives by OpenMP for intra-SMP node parallelization and vectorization for each processing element (PE). Simple 3D linear elastic problems with more than 2.2x10^9 DOF have been solved using the 3x3 block ICCG(0) method with additive Schwarz domain decomposition and PDJDS/CM-RCM reordering on 176 nodes of the Earth Simulator, achieving performance of 3.80 TFLOPS. Furthermore, the effect of the number of colors used in reordering has been evaluated on various types of computers.
Journal Article•10.1016/J.PARCO.2005.04.006•
Design and implementation of a novel dynamic load balancing library for cluster computing

Ioana Banicescu1, Ricolindo L. Cariño1, Jaderick P. Pabico1, Mahadevan Balasubramaniam1•
Mississippi State University1
1 Jul 2005
TL;DR: The design and implementation of a library based on an integrated approach to dynamic load balancing are presented; the approach combines the advantages of optimizing data migration via novel dynamic loop scheduling strategies with the advances in resource management and task migration capabilities offered by a recently developed parallel runtime system.
Abstract: This paper presents the design and implementation of a library based on an integrated approach to dynamic load balancing. This approach combines the advantages of optimizing data migration via novel dynamic loop scheduling strategies with the advances in resource management and task migration capabilities offered by a recently developed parallel runtime system. The performance improvements obtained by the use of this library have been investigated by its use in three scientific applications: the N-body simulations, the profiling of automatic quadrature routines, and the heat solver in an unstructured grid. The experimental results obtained underscore the significance of using such an integrated approach, as well as the benefits of using the library especially in applications characterized by irregular and unpredictable behavior.
Journal Article•10.1016/J.PARCO.2005.04.009•
Workflow management and resource discovery for an intelligent grid

Han Yu1, Xin Bai1, Dan C. Marinescu1•
University of Central Florida1
1 Jul 2005
TL;DR: This paper discusses workflow management (automatic workflow creation and coordinated workflow execution) and resource discovery in an intelligent grid environment.
Abstract: A computational grid provides coordinated and transparent access to computing resources for grid users. Workflow management and resource discovery are two important functions of an intelligent grid. Workflow management refers to automatic workflow creation and coordinated workflow execution, and resource discovery facilitates resource allocation and claiming. In this paper we discuss workflow management and resource discovery in an intelligent grid environment.
Journal Article•10.1016/J.PARCO.2004.12.005•
Cluster of re-configurable nodes for scanning large genomic banks

Stéphane Guyetant1, Mathieu Giraud1, Ludovic L'Hours1, Steven Derrien1, Stéphane Rubini2, Dominique Lavenier1, Frédéric Raimbault3 •
University of Rennes1, University of Western Brittany2, University of Southern Brittany3
1 Jan 2005
TL;DR: It is shown that low-cost FPGA nodes interconnected through a standard Ethernet network may advantageously compete against high-performance clusters, and it is proposed to substitute PCs with re-configurable hardware closely connected to a hard disk.
Abstract: Genomic data are growing exponentially and are scanned daily by thousands of biologists. To reduce the scan time, efficient parallelism can be exploited by dispatching data among a cluster of processing units, each able to scan its own data locally and independently. Although PC clusters are well suited to support this type of parallelism, we propose to substitute PCs with re-configurable hardware closely connected to a hard disk. We show that low-cost FPGA nodes interconnected through a standard Ethernet network may advantageously compete against high-performance clusters. A prototype of 48 re-configurable processing nodes has been tested on two genomic applications: a content-based similarity search and a pattern search.
Journal Article•10.1016/J.PARCO.2005.04.003•
Mapping subtasks with multiple versions on an ad hoc grid

S. Shivle1, Prasanna Sugavanam1, Howard Jay Siegel1, Anthony A. Maciejewski1, Tarun Banka1, K. Chindam1, S. Dussinger, A. Kutruff1, P. Penumarthy1, Prakash Pichumani1, Praveen Satyasekaran1, D. Sendek1, Jay Smith2, J. Sousa, Jayashree Sridharan1, J. Velazco •
Colorado State University1, IBM2
1 Jul 2005
TL;DR: Five resource allocation heuristics to derive near-optimal solutions to the problem of statically assigning computing resources to the subtasks of an application that has an execution time constraint are presented, evaluated, and compared.
Abstract: An ad hoc grid is a heterogeneous computing system composed of mobile devices. Each computing resource is constrained in battery energy. The problem being studied is to statically assign computing resources to the subtasks of an application that has an execution time constraint, when the resources are oversubscribed. All subtasks must be executed; to accommodate this in an oversubscribed environment, each subtask has two versions: the primary or full version, and the secondary or degraded version. The secondary version utilizes only 10% of the resources that the primary version requires, and produces only 10% of the data output for the subsequent children subtasks. Thus, the degraded version (secondary version) represents a reduced capability of lesser overall value, while consuming fewer resources. The goal is to assign resources so that the application meets an execution time constraint and the battery energy constraint while minimizing the number of degraded versions used. Five resource allocation heuristics to derive near-optimal solutions to this problem are presented, evaluated, and compared.
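The two-version model in the abstract above can be illustrated with a minimal Python sketch. It is not one of the paper's five heuristics: it is a toy greedy rule under a single, hypothetical energy budget, and it ignores the execution time constraint, machine assignment, and data output. The names `Subtask`, `primary_energy`, and `degrade_until_feasible` are assumptions made for illustration; only the 10% ratio comes from the abstract.

```python
from dataclasses import dataclass

# From the abstract: the secondary version uses 10% of the primary's resources.
SECONDARY_FRACTION = 0.10


@dataclass
class Subtask:
    name: str
    primary_energy: float  # hypothetical energy cost of the primary version


def degrade_until_feasible(subtasks, energy_budget):
    """Toy greedy rule (an assumption, not the paper's method).

    Start with every subtask in its primary version, then switch the most
    expensive subtasks to their secondary version, one at a time, until the
    total energy fits the budget. Returns (names of degraded subtasks, total).
    """
    degraded = set()
    total = sum(t.primary_energy for t in subtasks)
    for t in sorted(subtasks, key=lambda s: s.primary_energy, reverse=True):
        if total <= energy_budget:
            break
        # Switching to the secondary version saves 90% of this subtask's energy.
        total -= t.primary_energy * (1.0 - SECONDARY_FRACTION)
        degraded.add(t.name)
    if total > energy_budget:
        raise ValueError("infeasible even with every subtask degraded")
    return degraded, total


if __name__ == "__main__":
    tasks = [Subtask("A", 100.0), Subtask("B", 50.0), Subtask("C", 20.0)]
    names, used = degrade_until_feasible(tasks, energy_budget=100.0)
    print(sorted(names), used)
```

In this toy setting all subtasks are still executed, as the abstract requires; only the choice of version changes. A realistic heuristic would also have to check the execution time constraint and per-device battery limits.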
Journal Article•10.1016/J.PARCO.2004.12.003•
A massively parallel approach to deformable matching of 3D medical images via stochastic differential equations

Michel Salomon1, Fabrice Heitz1, Guy-René Perrin1, Jean-Paul Armspach1•
Centre national de la recherche scientifique1
1 Jan 2005
TL;DR: A comprehensive parallel approach is presented, based on the simulation of stochastic differential equations that enable the optimization of the global objective function through an annealing process, yielding computation times compatible with clinical routine.
Abstract: The deformable matching of 3D medical images remains a difficult problem due to the high dimension of both geometric transformations and data. The matching problem is usually expressed as the minimization of a highly non-linear energy (objective) function, yielding a hard, computationally intensive optimization problem. This paper presents a comprehensive parallel approach that yields computation times compatible with clinical routine. The image matching is based on the simulation of stochastic differential equations, enabling the optimization of the global objective function through an annealing process. The resulting algorithm allows a fully parallel sampling of the parameters to be optimized. Due to the large number of parameters involved in deformable matching, this approach is naturally suited to massively parallel implementations. We present implementation issues and timing analysis on an MIMD parallel processing computer (SGI Origin 2000). The performance of the approach is assessed on real data, using 3D brain MR images from different individuals. Besides yielding accurate registrations, the parallel algorithm exhibits excellent relative speedups.