Top 64 papers presented at Parallel Computing Technologies in 2007

Book Chapter•10.1007/978-3-540-73940-1_63•

Load balancing approach parallel algorithm for frequent pattern mining

Kun-Ming Yu¹, Jiayi Zhou¹, Wei Chen Hsiao¹•Institutions (1)

Chung Hua University¹

3 Sep 2007

TL;DR: A parallel and distributed mining algorithm based on FP-tree structure, Load Balancing FP-Tree (LFP-tree).

Abstract: Mining association rules from transaction-oriented databases is an important problem in data mining. Frequent patterns are crucial for association rule generation, time series analysis, classification, etc. Two categories of algorithms have been proposed: the candidate set generate-and-test approach (Apriori-like) and the pattern growth approach. Many methods for association rule mining are based on the FP-tree rather than on Apriori-like algorithms, since an Apriori-like algorithm scans the database many times. However, computation with the FP-tree data structure is costly when the database is large. Parallel and distributed computing is a good strategy for addressing this problem. Some parallel algorithms have been proposed; however, most of them do not consider load balancing. In this paper, we propose a parallel and distributed mining algorithm based on the FP-tree structure, Load Balancing FP-Tree (LFP-tree). The algorithm divides the item set for mining by evaluating the tree's width and depth. Moreover, a simple and reliable formula for calculating the loading degree is proposed. The experimental results show that LFP-tree reduces the computation time and has less idle time than Parallel FP-Tree (PFP-tree). In addition, it has a better speed-up ratio than PFP-tree as the number of processors grows. The communication time can be reduced by keeping the heavily loaded items on their local computing node.

37 citations

Book Chapter•10.1007/978-3-540-73940-1_3•

A stochastic semantics for bioambients

Linda Brodo, Pierpaolo Degano, Corrado Priami¹•Institutions (1)

The Microsoft Research - University of Trento Centre for Computational and Systems Biology¹

3 Sep 2007

TL;DR: BioAmbients is extended to take quantitative information into account by defining a stochastic semantics, based on a stochastic simulation algorithm, to determine the actual rate of transitions.

Abstract: We consider BioAmbients, a calculus for specifying biological entities and for simulating and analysing their behaviour. We extend BioAmbients to take quantitative information into account by defining a stochastic semantics, based on a stochastic simulation algorithm, to determine the actual rate of transitions.

29 citations

Book Chapter•10.1007/978-3-540-73940-1_30•

Parallelism granules aggregation with the T-system

Alexander Moskovsky¹, Vladimir Roganov¹, Sergei M. Abramov¹•Institutions (1)

Russian Academy of Sciences¹

3 Sep 2007

TL;DR: This paper is dedicated to an important aspect of the T-system, the ability to change parallelism granule size at runtime; the technique is shown to dramatically reduce the overhead incurred by the runtime support library.

Abstract: T-system is a tool for parallel computing developed at the PSI RAS. The most recent implementation is available on both Linux and Windows platforms. The paper is dedicated to an important aspect of the T-system: the ability to change parallelism granule size at runtime. The technique is available primarily for recursive programs, but it is possible to extend it to non-recursive ones as well. In the latter case, we employ C++ template "traits" for program transformation. The technique is shown to dramatically reduce the overhead incurred by the runtime support library.

20 citations

Book Chapter•10.1007/978-3-540-73940-1_29•

Dynamic job scheduling on the grid environment using the great deluge algorithm

Paul McMullan¹, Barry McCollum¹•Institutions (1)

Queen's University Belfast¹

3 Sep 2007

TL;DR: This paper presents an extension of a technique used in optimization and scheduling which can provide the means of achieving a trade-off between solution quality and speed in achieving a solution.

Abstract: The utilization of the computational Grid processor network has become a common method for researchers and scientists without access to local processor clusters to avail of the benefits of parallel processing for compute-intensive applications. As a result, this demand requires effective and efficient dynamic allocation of available resources. Although static scheduling and allocation techniques have proved effective, the dynamic nature of the Grid requires innovative techniques for reacting to change and maintaining stability for users. The dynamic scheduling process requires quite powerful optimization techniques, which can themselves lack the performance required in reaction time for achieving an effective schedule solution. Often there is a trade-off between solution quality and speed in achieving a solution. This paper presents an extension of a technique used in optimization and scheduling which can provide the means of achieving this balance and improves on similar approaches currently published.

18 citations

Book Chapter•10.1007/978-3-540-73940-1_43•

Comparison of evolving uniform, non-uniform cellular automaton, and genetic programming for centroid detection with hardware agents

Marcus Komann¹, Andreas Mainka¹, Dietmar Fey¹•Institutions (1)

University of Jena¹

3 Sep 2007

TL;DR: This paper compares three different approaches for finding geometric algorithms for centroid detection which are appropriate for a fine-grained parallel hardware architecture in an embedded vision chip.

Abstract: Current industrial applications require fast and robust image processing in systems with low size and power dissipation. One of the main tasks in industrial vision is fast detection of centroids of objects. This paper compares three different approaches for finding geometric algorithms for centroid detection which are appropriate for a fine-grained parallel hardware architecture in an embedded vision chip. The algorithms shall comprise emergent capabilities and high problem-specific functionality without requiring large amounts of states or memory. For that problem, we consider uniform and non-uniform cellular automata (CA) as well as Genetic Programming. Due to the inherent complexity of the problem, an evolutionary approach is applied. The appropriateness of these approaches for centroid detection is discussed.

17 citations

Book Chapter•10.1007/978-3-540-73940-1_9•

Optimized parallel approach for 3D modelling of forest fire behaviour

Gilbert Accary, Oleg Bessonov¹, Dominique Fougère, Sofiane Meradji, Dominique Morvan•Institutions (1)

Russian Academy of Sciences¹

3 Sep 2007

TL;DR: Methods for parallelizing a 3D CFD forest fire modelling code on non-uniform memory computers within the OpenMP environment are presented, along with performance results for the parallelized algorithm.

Abstract: In this paper we present methods for parallelizing a 3D CFD forest fire modelling code on non-uniform memory computers within the OpenMP environment. The mathematical model is presented first. Then, some peculiarities of this class of computers are considered, along with properties and limitations of the OpenMP model. Techniques for efficient parallelization are discussed, considering different types of data processing algorithms. Finally, performance results for the parallelized algorithm are presented and analyzed (for up to 16 processors).

16 citations

Book Chapter•10.1007/978-3-540-73940-1_14•

Generation of SMACA and its application in web services

Anirban Kundu¹, Ruma Dutta¹, Debajyoti Mukhopadhyay¹•Institutions (1)

West Bengal University of Technology¹

3 Sep 2007

TL;DR: An efficient solution to the indexing problem is proposed with the introduction of Nonlinear Single Cycle Multiple Attractor Cellular Automata (SMACA); the work also shows how SMACA is generated using a specific rule sequence.

Abstract: A web search engine uses forward indexing and inverted indexing as part of its functional design. This indexing mechanism helps retrieve data from the database based on a user query. In this paper, an efficient solution to the indexing problem is proposed with the introduction of Nonlinear Single Cycle Multiple Attractor Cellular Automata (SMACA). This work also shows how SMACA is generated using a specific rule sequence. Searching is done with linear time complexity.

16 citations

Book Chapter•10.1007/978-3-540-73940-1_35•

Accelerating the singular value decomposition of rectangular matrices with the CSX600 and the integrable SVD

Yusaku Yamamoto¹, Takeshi Fukaya¹, Takashi Uneyama², Masami Takata³, Kinji Kimura⁴, Masashi Iwasaki⁵, Yoshimasa Nakamura²•Institutions (5)

Nagoya University¹, Kyoto University², Nara Women's University³, Niigata University⁴, Kyoto Prefectural University⁵

3 Sep 2007

TL;DR: This paper optimizes two of the major components of rectangular SVD, namely, QR decomposition of the input matrix and back-transformation of the left singular vectors by matrix Q, so that large-size matrix multiplications can be used efficiently.

Abstract: We propose an approach to speed up the singular value decomposition (SVD) of very large rectangular matrices using the CSX600 floating point coprocessor. The CSX600-based acceleration board we use offers 50 GFLOPS of sustained performance, which is many times greater than that provided by standard microprocessors. However, this performance can be achieved only when a vendor-supplied matrix-matrix multiplication routine is used and the matrix size is sufficiently large. In this paper, we optimize two of the major components of rectangular SVD, namely, QR decomposition of the input matrix and back-transformation of the left singular vectors by matrix Q, so that large-size matrix multiplications can be used efficiently. In addition, we use the Integrable SVD algorithm to compute the SVD of an intermediate bidiagonal matrix. This helps to further speed up the computation and reduce the memory requirements. As a result, we achieved up to 3.5 times speedup over the Intel Math Kernel Library running on a 3.2 GHz Xeon processor when computing the SVD of a 100,000 × 4000 matrix.
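
The QR-based reduction that this abstract refers to can be written out as a short derivation. This is the standard identity for tall matrices, not a transcription of the paper's algorithm; the symbols m, n, U_R, Σ, and V are notation assumed here.

```latex
\begin{aligned}
A &= QR, && A \in \mathbb{R}^{m \times n},\ m \gg n,\ Q \in \mathbb{R}^{m \times n},\ R \in \mathbb{R}^{n \times n} \\
R &= U_R \Sigma V^{\mathsf{T}} && \text{(SVD of the small factor, computed via a bidiagonal intermediate)} \\
A &= (Q U_R)\, \Sigma V^{\mathsf{T}} && \text{(back-transformation of the left singular vectors: } U = Q U_R \text{)}
\end{aligned}
```

For the 100,000 × 4000 case quoted above, R is only 4000 × 4000, so the expensive steps are the QR decomposition and the product Q U_R, which are the two components the abstract says were optimized.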

14 citations

Book Chapter•10.1007/978-3-540-73940-1_5•

From unreliable objects to reliable objects: the case of atomic registers and consensus

Rachid Guerraoui¹, Michel Raynal²•Institutions (2)

École Polytechnique Fédérale de Lausanne¹, University of Rennes²

3 Sep 2007

TL;DR: The paper addresses the object failure model where the base objects can suffer responsive or nonresponsive crash failures and considers self-implementations, i.e., the case where a reliable atomic register is built from unreliable atomic registers (resp., unreliable consensus objects).

Abstract: A concurrent object is an object that can be concurrently accessed by several processes. It has been shown by Maurice Herlihy that any concurrent object O defined by a sequential specification can be wait-free implemented from reliable atomic registers (shared variables) and consensus objects. Wait-free means that any invocation of an operation of the object O issued by a non-faulty process does terminate, whatever the behavior of the other processes (e.g., despite the fact they are very slow or even have crashed). So, an important issue consists in providing reliable atomic registers and reliable consensus objects despite the failures experienced by the base objects from which these atomic registers and consensus objects are built. This paper considers self-implementations, i.e., the case where a reliable atomic register (resp., consensus object) is built from unreliable atomic registers (resp., unreliable consensus objects). The paper addresses the object failure model where the base objects can suffer responsive or nonresponsive crash failures. When there are solutions, the paper presents the corresponding algorithms, and when there is no solution, it presents the corresponding impossibility result. The paper has a tutorial flavor whose aim is to make the reader familiar with important results when one has to build resilient concurrent objects. To that aim, the paper uses both algorithms from the literature and new algorithms.

14 citations

Book Chapter•10.1007/978-3-540-73940-1_26•

Dynamic load balancing of black-box applications with a resource selection mechanism on heterogeneous resources of the grid

Valeria V. Krzhizhanovskaya¹, Vladimir Korkhov¹•Institutions (1)

Saint Petersburg State Polytechnic University¹

3 Sep 2007

TL;DR: This paper describes the proposed algorithm for automated load balancing, paying attention to the influence of resource heterogeneity metrics, demonstrates the speedup achieved with this technique, and proposes a way to extend the approach to a wider class of applications.

Abstract: In this paper we address the critical issues of efficient resource management and high-performance parallel distributed computing on the Grid by introducing a new hierarchical approach that combines user-level job scheduling with a dynamic load balancing technique that automatically adapts a black-box distributed or parallel application to the heterogeneous resources. The algorithm developed dynamically selects the resources best suited for a particular task or parallel process of the executed application, and optimizes the load balance based on the dynamically measured resource parameters and estimated requirements of the application. We describe the proposed algorithm for automated load balancing, paying attention to the influence of resource heterogeneity metrics, demonstrate the speedup achieved with this technique for different types of applications and resources, and propose a way to extend the approach to a wider class of applications.

13 citations

Book Chapter•10.1007/978-3-540-73940-1_46•

Self-organised criticality in a model of the rat somatosensory cortex

Grzegorz M. Wojcik¹, Wieslaw A. Kaminski¹, Piotr Matejanka²•Institutions (2)

Maria Curie-Skłodowska University¹, Motorola²

3 Sep 2007

TL;DR: Large Hodgkin-Huxley neural networks were examined and the structures discussed in this article simulated a part of the rat somatosensory cortex and an occurrence of the self-organised criticality (SOC) was demonstrated.

Abstract: Large Hodgkin-Huxley (HH) neural networks were examined and the structures discussed in this article simulated a part of the rat somatosensory cortex. We used a modular architecture of the network divided into layers and sub-regions. Because of a high degree of complexity effective parallelisation of algorithms was required. The results of parallel simulations were presented. An occurrence of the self-organised criticality (SOC) was demonstrated. Most notably, in large biological neural networks consisting of artificial HH neurons, the SOC was shown to manifest itself in the frequency of its appearance as a function of the size of spike potential avalanches generated within such nets. These two parameters followed the power law characteristic of other systems exhibiting the SOC behaviour.

Book Chapter•10.1007/978-3-540-73940-1_37•

Pedestrian and crowd dynamics simulation: testing SCA on paradigmatic cases of emerging coordination in negative interaction conditions

Stefania Bandini¹, Mizar Luca Federici¹, Sara Manzoni¹, Giuseppe Vizzari¹•Institutions (1)

University of Milan¹

3 Sep 2007

TL;DR: This paper presents a set of theoretical experiments performed to evaluate the Situated Cellular Agent (SCA) approach in the context of pedestrian dynamics research and focuses on two emerging phenomena (freezing by heating and lane formation) that have been empirically observed and already modeled by analytical particle-based models and Cellular Automata-based models.

Abstract: The paper presents a set of theoretical experiments performed to evaluate the Situated Cellular Agent (SCA) approach in the context of pedestrian dynamics research. SCA is a modeling and simulation approach based on Multi Agent Systems principles that derives from Cellular Automata. In particular, we focus on two emerging phenomena (freezing by heating and lane formation) that have been empirically observed and already modeled by analytical particle-based models and Cellular Automata-based models.

Book Chapter•10.1007/978-3-540-73940-1_33•

Latencies of conflicting writes on contemporary multicore architectures

Josef Weidendorfer¹, Michael Ott¹, Tobias Klug¹, Carsten Trinitis¹•Institutions (1)

Technische Universität München¹

3 Sep 2007

TL;DR: Results show that multicore architectures with shared cache can reduce unwanted effects of false sharing, and a benchmark allowing for quantitative estimates about the consequences of the false sharing effect is presented.

Abstract: This paper provides a detailed investigation of latency penalties caused by repeated memory writes to nearby memory cells from different threads in parallel programs. When such writes map to the same cache lines in multiple processors, one can observe the so-called false sharing effect. This effect can unnecessarily hamper parallel code because the cache hierarchy works at line granularity, which is common on contemporary processor architectures. In this contribution, a benchmark allowing for quantitative estimates of the consequences of the false sharing effect is presented. Results show that multicore architectures with shared cache can reduce unwanted effects of false sharing.
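
To make the condition for false sharing concrete, the following minimal sketch checks whether two byte addresses land on the same cache line and computes padded offsets that avoid this. It is not the paper's benchmark; the 64-byte line size and all function names are assumptions made for illustration.

```python
CACHE_LINE_BYTES = 64  # assumption: a common line size, not stated in the abstract


def cache_line_index(address: int, line_bytes: int = CACHE_LINE_BYTES) -> int:
    """Return the index of the cache line that holds the given byte address."""
    return address // line_bytes


def may_false_share(addr_a: int, addr_b: int,
                    line_bytes: int = CACHE_LINE_BYTES) -> bool:
    """True when two distinct addresses fall on the same cache line."""
    same_line = cache_line_index(addr_a, line_bytes) == cache_line_index(addr_b, line_bytes)
    return addr_a != addr_b and same_line


def padded_offsets(num_threads: int,
                   line_bytes: int = CACHE_LINE_BYTES) -> list[int]:
    """Byte offsets that place each thread's private counter on its own line."""
    return [thread * line_bytes for thread in range(num_threads)]


if __name__ == "__main__":
    # Two 8-byte counters stored back to back share one line.
    print(may_false_share(0, 8))
    # After padding, each counter starts on a separate line.
    print(padded_offsets(4))
```

Padding each thread's data to a full line trades memory for the removal of the conflicting writes described above.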

Proceedings Article•

Proceedings of the 9th international conference on Parallel Computing Technologies

Victor E. Malyshkin¹•Institutions (1)

Russian Academy of Sciences¹

3 Sep 2007

Book Chapter•10.1007/978-3-540-73940-1_38•

Coarse-grained parallelization of cellular-automata simulation algorithms

Olga Bandman¹•Institutions (1)

Russian Academy of Sciences¹

3 Sep 2007

TL;DR: A general approach to CA parallelization, based on domain decomposition correctness conditions, is formulated and particular parallelization methods are developed for the main classes of CA simulation models: synchronous CA with multi-cell updating rules, asynchronous probabilistic CA, and CA compositions.

Abstract: Simulating spatial dynamics in physics by Cellular Automata (CA) requires very large computation power, and, hence, CA simulation algorithms are to be implemented on multiprocessors. The preconceived opinion that not much effort is required to obtain a highly efficient coarse-grained parallel CA algorithm is not always true. In fact, the great variety of CA modifications coming into practical use needs appropriate, sometimes sophisticated, methods for the parallel implementation of CA algorithms. Proceeding from the above, a general approach to CA parallelization, based on domain decomposition correctness conditions, is formulated. Starting from the correctness conditions, particular parallelization methods are developed for the main classes of CA simulation models: synchronous CA with multi-cell updating rules, asynchronous probabilistic CA, and CA compositions. Examples and experimental results are given for each case.

Book Chapter•10.1007/978-3-540-73940-1_21•

Address-free all-to-all routing in sparse torus

Risto Honkanen¹, Ville Leppänen², Martti Penttonen¹•Institutions (2)

University of Eastern Finland¹, University of Turku²

3 Sep 2007

TL;DR: A time-scheduled routing algorithm where packets are routed address-free and it is shown that a total exchange relation, where every processor has a packet to route to every other processor, can be routed with routing cost of 1/2 + o(1) time units per packet.

Abstract: In this work we present a simple network design for all-to-all routing and study deflection routing on it. We present a time-scheduled routing algorithm where packets are routed address-free. We show that a total exchange relation, where every processor has a packet to route to every other processor, can be routed with routing cost of 1/2 + o(1) time units per packet. The network consists of an n-sided d-dimensional torus, where the n^(d-1) processor (or input/output) nodes are sparsely but regularly situated among n^d - n^(d-1) deflection routing nodes, having d input and d output links. The finite-state routing nodes change their states by a fixed, preprogrammed pattern.

Book Chapter•10.1007/978-3-540-73940-1_56•

Using analytical models to load balancing in a heterogeneous network of computers

Jean Marcos Laine¹, Edson T. Midorikawa¹•Institutions (1)

University of São Paulo¹

3 Sep 2007

TL;DR: Two approaches to workload distribution based on analytical models developed for performance prediction of parallel applications, named PEMPIs VRP (Vector of Relative Performances), are presented, and the results show that the VRP's dynamic strategy can reduce the imbalance among the execution times of the processes relative to the average time.

Abstract: An effective workload distribution plays a prime role in reducing the total execution time of a parallel application on heterogeneous environments, such as computational grids and heterogeneous clusters. Several methods have been proposed in the literature in the last decade. This paper presents two approaches to workload distribution based on analytical models developed for performance prediction of parallel applications, named PEMPIs VRP (Vector of Relative Performances). The workload is distributed based on relative performance ratios obtained by these models. In this work, we present two schemes, static and dynamic, in a research middleware for a heterogeneous network of computers. In the experimental tests we evaluated and compared them using two MPI applications. The results show that, using the VRP's dynamic strategy, we can reduce the imbalance among the execution times of the processes, relative to the average time, from 25% to around 5%.
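
The idea of distributing work by relative performance ratios can be sketched as follows. This is a generic proportional split, not the PEMPIs VRP scheme itself, whose details the abstract does not give; the function name, the largest-remainder rounding, and the example ratios are all assumptions.

```python
def distribute_workload(total_units: int,
                        relative_performance: list[float]) -> list[int]:
    """Split total_units among processes in proportion to their relative performance.

    Shares are integers that sum exactly to total_units; rounding leftovers go
    to the processes with the largest fractional remainders.
    """
    if total_units < 0:
        raise ValueError("total_units must be non-negative")
    if not relative_performance or any(p <= 0 for p in relative_performance):
        raise ValueError("relative_performance must hold positive ratios")

    total_perf = sum(relative_performance)
    exact = [total_units * p / total_perf for p in relative_performance]
    shares = [int(x) for x in exact]  # floor of each exact share
    leftover = total_units - sum(shares)

    # Hand out the remaining units, largest fractional part first.
    order = sorted(range(len(exact)),
                   key=lambda i: exact[i] - shares[i],
                   reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


if __name__ == "__main__":
    # Hypothetical vector: the third node is twice as fast as the other two.
    print(distribute_workload(100, [1.0, 1.0, 2.0]))
```

A static scheme would call such a function once before the run; a dynamic scheme would call it again with updated ratios as measurements come in.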

Book Chapter•10.1007/978-3-540-73940-1_57•

Block-based allocation algorithms for FLASH memory in embedded systems

Pangfeng Liu¹, Chung-Hao Chuang¹, Jan-Jan Wu²•Institutions (2)

National Taiwan University¹, Academia Sinica²

3 Sep 2007

TL;DR: An offline allocation algorithm called Best Match (BestM) for allocating blocks in FLASH file systems is proposed and experimental results indicate that BestM delivers better performance than a previously proposed First Rearrival First Serve (FRFS) method.

Abstract: Flash memory has write-once and bulk-erase properties, so an intelligent allocation algorithm is essential to providing applications with efficient storage service. This paper first demonstrates that the online version of the FLASH allocation problem is difficult, since we can find an adversary that forces every online algorithm to use as many blocks as a naive and inefficient algorithm. As a result, we propose an offline allocation algorithm called Best Match (BestM) for allocating blocks in FLASH file systems. The experimental results indicate that BestM delivers better performance than a previously proposed First Rearrival First Serve (FRFS) method.

Book Chapter•10.1007/978-3-540-73940-1_54•

Runtime system for parallel execution of fragmented subroutines

Konstantin Kalgin¹, Victor E. Malyshkin¹, S. P. Nechaev¹, G. A. Tschukin²•Institutions (2)

Russian Academy of Sciences¹, Novosibirsk State Technical University²

3 Sep 2007

TL;DR: The architecture of a runtime system supporting parallel execution of fragmented library subroutines on multicomputers is proposed; it makes it possible to develop a library of parallel subroutines and to automatically provide their dynamic properties, such as dynamic load balancing.

Abstract: The architecture of a runtime system supporting parallel execution of fragmented library subroutines on multicomputers is proposed. The approach makes it possible to develop a library of parallel subroutines and to automatically provide their dynamic properties, such as dynamic load balancing. Using MPI for communications programming gives an application good portability.

Book Chapter•10.1007/978-3-540-73940-1_58•

Variable reassignment in the T++ parallel programming language

Alexander Moskovsky¹, Vladimir Roganov¹, Sergei M. Abramov¹, A. B. Kuznetsov¹•Institutions (1)

Russian Academy of Sciences¹

3 Sep 2007

TL;DR: The paper focuses on the semantics and implementation of repeated assignments to a variable in T++, an extension of C++ that adds a set of keywords, allowing a smooth transition from sequential to parallel applications.

Abstract: The paper describes the OpenTS parallel programming system, which provides the runtime environment for the T++ language. T++ is an extension of C++ that adds a set of keywords, allowing a smooth transition from sequential to parallel applications. In this context, support for repeated assignments to a variable is an important feature. The paper focuses on the semantics and implementation of such variables in T++. Applications written in T++ can be run on computational clusters, SMPs and GRIDs, under either Linux or Windows.

Book Chapter•10.1007/978-3-540-73940-1_32•

Multicriteria Scheduling Strategies in Scalable Computing Systems

Victor V. Toporkov¹•Institutions (1)

Moscow Power Engineering Institute¹

3 Sep 2007

TL;DR: The approach allows the decomposition of the problem of multicriteria strategy synthesis for the totality of parameterized models of programs with the use of partial and vector quality criteria including, for instance, a cost function and load balancing factors.

Abstract: An approach to generation and optimization of scheduling and resource allocation strategies in scalable computing systems is proposed. The approach allows the decomposition of the problem of multicriteria strategy synthesis for the totality of parameterized models of programs with the use of partial and vector quality criteria including, for instance, a cost function and load balancing factors.

Book Chapter•10.1007/978-3-540-73940-1_49•

Dynamic strategy of placement of the replicas in data grid

Ghalem Belalem¹, Farouk Bouhraoua²•Institutions (2)

University of Oran¹, University of Mostaganem²

3 Sep 2007

TL;DR: A contribution to a cost model whose objective is to reduce the cost of access to replicated data is presented; the costs depend on many factors, such as bandwidth, data size, network latency, and the number of read/write operations.

Abstract: Grid computing is a type of parallel and distributed system that is designed to provide pervasive and reliable access to data and computational resources over a wide area network. Data Grids connect a collection of geographically distributed computers and storage resources located in different parts of the world to facilitate sharing of data and resources. These grids concentrate on reducing the execution time of applications that require a great number of processing cycles. In such an environment, these advantages are not possible without replication. The latter is considered an important technique for reducing the cost of access to data in the grid. In this paper, we present our contribution to a cost model whose objective is to reduce the cost of access to replicated data. These costs depend on many factors, such as bandwidth, data size, network latency, and the number of read/write operations.

Book Chapter•10.1007/978-3-540-73940-1_39•

Cellular automata models for complex matter

Dominique Désérable¹, Pascal Dupont¹, Mustapha Hellou¹, Siham Kamali-Bernard¹•Institutions (1)

Institut national des sciences appliquées¹

3 Sep 2007

TL;DR: This paper will attempt to bring out the main concepts underlying cellular automata models and to give an insight for future work.

Abstract: Complex matter may lie in various forms from granular matter, soft matter, fluid-fluid or solid-fluid mixtures to compact heterogeneous material. Cellular automata models make a suitable and powerful tool to catch the influence of the microscopic scale onto the macroscopic behaviour of these complex systems. Rather than a survey, this paper will attempt to bring out the main concepts underlying these models and to give an insight for future work.

Book Chapter•10.1007/978-3-540-73940-1_28•

Parallel pseudorandom number generator for large-scale Monte Carlo simulations

Mikhail Marchenko

3 Sep 2007

TL;DR: A parallel random number generator is given to perform large-scale distributed Monte Carlo simulations and the generator's quality was verified using statistically rigorous tests.

Abstract: A parallel random number generator is given to perform large-scale distributed Monte Carlo simulations. The generator's quality was verified using statistically rigorous tests. Also special problems with known solutions were used for the testing. The description of program system MONC for large-scale distributed Monte Carlo simulations is also given.

Book Chapter•10.1007/978-3-540-73940-1_41•

CAOS: a domain-specific language for the parallel simulation of cellular automata

Clemens Grelck¹, Frank Penczek¹, Kai Trojahner²•Institutions (2)

University of Hertfordshire¹, University of Lübeck²

3 Sep 2007

TL;DR: The design and implementation of CAOS, a domain-specific high-level programming language for the parallel simulation of extended cellular automata, and the CAOS compiler generates efficiently executable code that automatically harnesses the potential of contemporary multi-core processors, shared memory multiprocessors, workstation clusters and supercomputers.

Abstract: We present the design and implementation of CAOS, a domain-specific high-level programming language for the parallel simulation of extended cellular automata. CAOS allows scientists to specify complex simulations with limited programming skills and effort. Yet the CAOS compiler generates efficiently executable code that automatically harnesses the potential of contemporary multi-core processors, shared memory multiprocessors, workstation clusters and supercomputers.

Book Chapter•10.1007/978-3-540-73940-1_7•

Towards a computing model for open distributed systems

Achour Mostefaoui¹•Institutions (1)

University of Rennes¹

3 Sep 2007

TL;DR: A new communication and synchronization model adapted from workqueues used in parallel computing is defined, which makes it possible to benefit from the potential parallelism offered by this style of programming when only an approximate solution is needed.

Abstract: This paper proposes an implementation of the data structure called bag or multiset used by descriptive programming languages (e.g. Gamma, Linda) over an open system. In this model, a succession of "chemical reactions" consumes the elements of the bag and produces new elements according to specific rules. This approach is particularly interesting as it suppresses all unneeded synchronization and reveals all the potential parallelism of a program. An efficient implementation of a bag provides an efficient implementation of the subsequent program. This paper defines a new communication and synchronization model adapted from workqueues used in parallel computing. The proposed model makes it possible to benefit from the potential parallelism offered by this style of programming when only an approximate solution is needed.
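
The "chemical reaction" style of computation over a bag can be illustrated with a small sequential sketch. It only mimics the programming model (pick elements that satisfy a reaction condition, replace them with the products, repeat until nothing reacts) and is not the distributed implementation the paper proposes; the function name and the two example reactions are hypothetical.

```python
from itertools import permutations
from typing import Callable


def react_until_stable(bag: list[int],
                       condition: Callable[[int, int], bool],
                       action: Callable[[int, int], list[int]]) -> list[int]:
    """Apply a pairwise reaction to a multiset until no pair can react.

    The caller must supply a reaction that eventually stops applying,
    for example one that shrinks the bag on every step.
    """
    bag = list(bag)
    while True:
        # Find any ordered pair of distinct positions that satisfies the condition.
        pair = next(((i, j) for i, j in permutations(range(len(bag)), 2)
                     if condition(bag[i], bag[j])), None)
        if pair is None:
            return bag  # stable state: no reaction is enabled
        i, j = pair
        products = action(bag[i], bag[j])
        # Consume the two reactants and add the products to the bag.
        bag = [v for k, v in enumerate(bag) if k not in (i, j)] + products


if __name__ == "__main__":
    # Maximum: whenever x >= y, the pair (x, y) is replaced by x alone.
    print(react_until_stable([3, 1, 4, 1, 5], lambda x, y: x >= y, lambda x, y: [x]))
    # Sum: any pair (x, y) is replaced by x + y.
    print(react_until_stable([3, 1, 4, 1, 5], lambda x, y: True, lambda x, y: [x + y]))
```

Because a reaction depends only on the elements it consumes, disjoint pairs could react at the same time, which is the potential parallelism the abstract refers to.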

Book Chapter•10.1007/978-3-540-73940-1_24•

Efficient race verification for debugging programs with OpenMP directives

Young-Joo Kim¹, Mun-Hye Kang¹, Ok-Kyoon Ha¹, Yong-Kee Jun¹•Institutions (1)

Gyeongsang National University¹

3 Sep 2007

TL;DR: This tool verifies the existence of races over 250 times faster on average than the previous tool, measured on synthetic programs without synchronization (such as critical sections), even when the maximum parallelism increases while the total number of accesses stays fixed.

Abstract: Races must be detected for debugging parallel programs with OpenMP directives because they may cause unintended nondeterministic results of programs. The previous tool that detects races does not verify the existence of races in programs with no internal nondeterminism because the tool regards nested sibling threads as ordered threads and has the possibility of ignoring accesses involved in races in program models with synchronization such as critical sections. This paper suggests an efficient tool that verifies the existence of races with optimal performance by applying race detection engines for labeling and detection protocol. The labeling scheme generates a unique identifier for each parallel thread created during a program execution, and the protocol scheme detects at least one race if any exists. This tool verifies the existence of races over 250 times faster on average than the previous tool, even when the maximum parallelism increases with a fixed number of total accesses, using a set of synthetic programs without synchronization such as critical sections.

Book Chapter•10.1007/978-3-540-73940-1_12•

Parallel broadband finite element time domain algorithm implemented to dispersive electromagnetic problem

Boguslaw Butrylo¹•Institutions (1)

Białystok Technical University¹

3 Sep 2007

TL;DR: The spatial and time-dependent distribution of the electromagnetic field is approximated by the finite element method and the parallel form of the algorithm valid for some linear materials, and the formulation of the FE code for a dispersive electromagnetic problem are presented and compared.

Abstract: The numerical analysis of some broadband electromagnetic fields and frequency-dependent materials using a time domain method is the main subject of this paper. The spatial and time-dependent distribution of the electromagnetic field is approximated by the finite element method. The parallel form of the algorithm valid for some linear materials, and the formulation of the FE code for a dispersive electromagnetic problem are presented and compared. The complex forms of these algorithms have an effect on the memory and computational costs of the distributed formulation. The properties of the algorithm are estimated using a high-performance cluster of workstations.

Book Chapter•10.1007/978-3-540-73940-1_2•

Adaptive workflow nets for grid computing

Carmen Bratosin¹, Kees M. van Hee¹, Natalia Sidorova¹•Institutions (1)

Eindhoven University of Technology¹

3 Sep 2007

TL;DR: This paper defines Adaptive Grid Workflow nets (AGWF nets) appropriate for modeling grid workflows and allowing changes in the process structure as a response to triggering events/exceptions, which makes the model especially appropriate for a number of grid applications.

Abstract: Existing grid applications commonly use workflows for the orchestration of grid services. Existing workflow models however suffer from the lack of adaptivity. In this paper we define Adaptive Grid Workflow nets (AGWF nets) appropriate for modeling grid workflows and allowing changes in the process structure as a response to triggering events/exceptions. Moreover, a recursion is allowed, which makes the model especially appropriate for a number of grid applications. We show that soundness can be verified for AGWF nets.

Book Chapter•10.1007/978-3-540-73940-1_13•

Strategies for development of a parallel program for protoplanetary disc simulation

Sergei Kireev, Elvira A. Kuksheva, Aleksey Snytnikov, Nikolay Snytnikov, Vitaly A. Vshivkov

3 Sep 2007

TL;DR: Strategies for fast, high-precision protoplanetary disc simulation are presented: reducing the 3D disc model to quasi-3D, using the fundamental solution of the Poisson equation, simulating in the natural coordinate system, and decomposing the computation domain.

Abstract: Protoplanetary disc simulation must be done first, with high precision, and second, with high speed. Some strategies to reach these goals are presented in the paper. They include: the reduction of the 3D protoplanetary disc model to quasi-3D, the use of the fundamental solution of the Poisson equation, the simulation in the natural (cylindrical) coordinate system and computation domain decomposition. The domain decomposition strategy is shown to reach the simulation goals best.
