Scispace (Formerly Typeset)
Showing papers presented at "Parallel Computing Technologies in 2007"
Book Chapter•10.1007/978-3-540-73940-1_63•
Load balancing approach parallel algorithm for frequent pattern mining

Kun-Ming Yu1, Jiayi Zhou1, Wei Chen Hsiao1•
Chung Hua University1
3 Sep 2007
TL;DR: A parallel and distributed mining algorithm based on FP-tree structure, Load Balancing FP-Tree (LFP-tree).
Abstract: Association rules mining from transaction-oriented databases is an important issue in data mining. Frequent pattern mining is crucial for association rules generation, time series analysis, classification, etc. Two categories of algorithms have been proposed: the candidate-set generate-and-test approach (Apriori-like) and the pattern-growth approach. Many methods for the association rules mining problem are based on the FP-tree instead of the Apriori-like approach, since Apriori-like algorithms scan the database many times. However, even with the FP-tree data structure, the computation time is high when the database is large. Parallel and distributed computing is a good strategy for this situation. Some parallel algorithms have been proposed; however, most of them do not consider the load balancing issue. In this paper, we propose a parallel and distributed mining algorithm based on the FP-tree structure, Load Balancing FP-Tree (LFP-tree). The algorithm divides the item set for mining by evaluating the tree's width and depth. Moreover, a simple and reliable formula for calculating the loading degree is proposed. The experimental results show that LFP-tree can reduce the computation time and has less idle time compared with Parallel FP-Tree (PFP-tree). In addition, it has a better speed-up ratio than PFP-tree as the number of processors grows. The communication time can be reduced by keeping heavily loaded items on their local computing node.

37 citations
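
The abstract does not give the loading-degree formula, so the sketch below only illustrates the generic step such a formula feeds into: assigning items with estimated loads to processors so that the heaviest work is spread out first. It uses the classic longest-processing-time greedy heuristic, which is an assumption here and not LFP-tree's own division rule; the `assign_items` name and the `loads` values are hypothetical.

```python
import heapq


def assign_items(loads, num_procs):
    """Greedy longest-processing-time split of items across processors.

    loads: dict mapping item -> estimated mining load (hypothetical values;
           LFP-tree derives its loading degree from FP-tree width and depth).
    Returns one list of items per processor.
    """
    heap = [(0, p) for p in range(num_procs)]  # (current load, processor id)
    heapq.heapify(heap)
    buckets = [[] for _ in range(num_procs)]
    # Heaviest items first; ties broken by item name for determinism.
    for item, load in sorted(loads.items(), key=lambda kv: (-kv[1], kv[0])):
        total, p = heapq.heappop(heap)  # least-loaded processor so far
        buckets[p].append(item)
        heapq.heappush(heap, (total + load, p))
    return buckets


loads = {"a": 8, "b": 7, "c": 6, "d": 5, "e": 4}
parts = assign_items(loads, 2)
print(parts)  # [['a', 'd', 'e'], ['b', 'c']]
```

Here the two processors end up with loads of 17 and 13; a real loading-degree estimate would replace the made-up numbers.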

Book Chapter•10.1007/978-3-540-73940-1_3•
A stochastic semantics for bioambients

Linda Brodo, Pierpaolo Degano, Corrado Priami1•
The Microsoft Research - University of Trento Centre for Computational and Systems Biology1
3 Sep 2007
TL;DR: BioAmbients is extended to take quantitative information into account by defining a stochastic semantics, based on a stochastic simulation algorithm, to determine the actual rate of transitions.
Abstract: We consider BioAmbients, a calculus for specifying biological entities and for simulating and analysing their behaviour. We extend BioAmbients to take quantitative information into account by defining a stochastic semantics, based on a stochastic simulation algorithm, to determine the actual rate of transitions.

29 citations
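
For readers unfamiliar with stochastic simulation, a minimal sketch of Gillespie's direct method follows, assuming that is the kind of stochastic simulation algorithm meant. Each enabled transition has a rate (propensity), the waiting time to the next transition is exponential in the total rate, and the next transition is chosen with probability proportional to its rate. It works on plain species counts, not on BioAmbients processes; the `gillespie` function and the reaction encoding are hypothetical.

```python
import random


def gillespie(state, reactions, t_end, rng):
    """Gillespie's direct method over species counts (generic sketch).

    state: dict species -> count (updated in place)
    reactions: list of (propensity_fn, update), where propensity_fn(state)
               returns a rate and update maps species -> change in count
    Returns the trajectory as a list of (time, state snapshot) pairs.
    """
    t = 0.0
    trace = [(t, dict(state))]
    while True:
        props = [rate(state) for rate, _ in reactions]
        total = sum(props)
        if total <= 0:
            break  # no transition is enabled
        t += rng.expovariate(total)  # exponential waiting time
        if t > t_end:
            break
        # Choose reaction j with probability props[j] / total.
        r = rng.random() * total
        chosen = max(j for j, a in enumerate(props) if a > 0)  # rounding fallback
        acc = 0.0
        for j, a in enumerate(props):
            acc += a
            if r < acc:
                chosen = j
                break
        for species, delta in reactions[chosen][1].items():
            state[species] += delta
        trace.append((t, dict(state)))
    return trace


# Example: first-order decay A -> (nothing) with rate constant 0.5.
decay = [(lambda s: 0.5 * s["A"], {"A": -1})]
trace = gillespie({"A": 10}, decay, t_end=1e9, rng=random.Random(42))
```

In the decay example the propensity shrinks with every event, so the run stops after exactly ten transitions once `A` reaches zero.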

Book Chapter•10.1007/978-3-540-73940-1_30•
Parallelism granules aggregation with the T-system

Alexander Moskovsky1, Vladimir Roganov1, Sergei M. Abramov1•
Russian Academy of Sciences1
3 Sep 2007
TL;DR: This paper is dedicated to an important aspect of the T-system, the ability to change the parallelism granule size at runtime, which is shown to dramatically reduce the overhead incurred by the runtime support library.
Abstract: T-system is a tool for parallel computing developed at the PSI RAS. The most recent implementation is available on both Linux and Windows platforms. The paper is dedicated to an important aspect of the T-system: the ability to change the parallelism granule size at runtime. The technique is primarily available for recursive programs, but it can be extended to non-recursive ones as well. In the latter case, we employ C++ template "traits" for program transformation. The technique is shown to dramatically reduce the overhead incurred by the runtime support library.

20 citations
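
Why granule size matters for overhead can be seen by counting how many tasks a recursive program would spawn. The sketch below, an illustration rather than the T-system mechanism, treats every call above a cutoff as a separate parallel task and runs everything at or below it sequentially inside one granule. A fixed `cutoff` stands in for the runtime adaptation the paper describes; `fib_granular` and the counter argument are hypothetical names.

```python
def fib_seq(n):
    """Plain sequential recursion: runs inside a single granule."""
    return n if n < 2 else fib_seq(n - 1) + fib_seq(n - 2)


def fib_granular(n, cutoff, counter):
    """Recursive Fibonacci where each call above `cutoff` would be a separate
    parallel task; calls at or below it are aggregated into one granule.
    counter[0] accumulates the number of granules that would be created."""
    counter[0] += 1
    if n <= cutoff:
        return fib_seq(n)  # whole subtree executed as one coarse granule
    return fib_granular(n - 1, cutoff, counter) + fib_granular(n - 2, cutoff, counter)


for cutoff in (1, 5):
    counter = [0]
    value = fib_granular(10, cutoff, counter)
    print(cutoff, value, counter[0])  # 1 55 177, then 5 55 25
```

Raising the cutoff from 1 to 5 keeps the result but cuts the number of tasks from 177 to 25, which is the kind of runtime-support overhead that aggregation removes.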

Book Chapter•10.1007/978-3-540-73940-1_29•
Dynamic job scheduling on the grid environment using the great deluge algorithm

Paul McMullan1, Barry McCollum1•
Queen's University Belfast1
3 Sep 2007
TL;DR: This paper presents an extension of a technique used in optimization and scheduling which can provide the means of achieving a trade-off between solution quality and speed in achieving a solution.
Abstract: The utilization of the computational Grid processor network has become a common method for researchers and scientists without access to local processor clusters to avail of the benefits of parallel processing for compute-intensive applications. As a result, this demand requires effective and efficient dynamic allocation of available resources. Although static scheduling and allocation techniques have proved effective, the dynamic nature of the Grid requires innovative techniques for reacting to change and maintaining stability for users. The dynamic scheduling process requires quite powerful optimization techniques, which can themselves lack the performance required in reaction time for achieving an effective schedule solution. Often there is a trade-off between solution quality and speed in achieving a solution. This paper presents an extension of a technique used in optimization and scheduling which can provide the means of achieving this balance and improves on similar approaches currently published.

18 citations
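
The abstract does not spell out the technique it extends; the title points to the great deluge algorithm. A minimal sketch of the basic great deluge local search follows, applied to a toy problem of assigning jobs to machines to minimize makespan. A candidate move is accepted if it does not worsen the current cost or if its cost stays below a "water level" that is lowered linearly on each iteration. The linear decay, the single-job move operator and all names are assumptions, not the paper's extension.

```python
import random


def makespan(assign, durations, m):
    """Completion time of the most loaded machine."""
    loads = [0.0] * m
    for job, machine in enumerate(assign):
        loads[machine] += durations[job]
    return max(loads)


def great_deluge(durations, m, iters, rng):
    """Basic great deluge search for a job-to-machine assignment."""
    assign = [rng.randrange(m) for _ in durations]  # random initial schedule
    cost = makespan(assign, durations, m)
    best, best_cost = assign[:], cost
    level = cost                      # initial water level
    target = sum(durations) / m       # lower bound on any makespan
    step = (level - target) / iters   # linear lowering of the level
    for _ in range(iters):
        cand = assign[:]
        cand[rng.randrange(len(cand))] = rng.randrange(m)  # move one job
        c = makespan(cand, durations, m)
        if c <= cost or c <= level:   # great deluge acceptance rule
            assign, cost = cand, c
            if cost < best_cost:
                best, best_cost = assign[:], cost
        level -= step
    return best, best_cost


jobs = [7, 3, 5, 2, 8, 4, 6, 1]
schedule, span = great_deluge(jobs, 3, 2000, random.Random(1))
```

The single parameter that trades speed against quality is the lowering rate of the level: a fast decay converges quickly, a slow one explores longer, which mirrors the trade-off the abstract describes.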

Book Chapter•10.1007/978-3-540-73940-1_43•
Comparison of evolving uniform, non-uniform cellular automaton, and genetic programming for centroid detection with hardware agents

Marcus Komann1, Andreas Mainka1, Dietmar Fey1•
University of Jena1
3 Sep 2007
TL;DR: This paper compares three different approaches for finding geometric algorithms for centroid detection which are appropriate for a fine-grained parallel hardware architecture in an embedded vision chip.
Abstract: Current industrial applications require fast and robust image processing in systems with small size and low power dissipation. One of the main tasks in industrial vision is fast detection of centroids of objects. This paper compares three different approaches for finding geometric algorithms for centroid detection which are appropriate for a fine-grained parallel hardware architecture in an embedded vision chip. The algorithms should exhibit emergent capabilities and high problem-specific functionality without requiring a large number of states or much memory. For that problem, we consider uniform and non-uniform cellular automata (CA) as well as Genetic Programming. Due to the inherent complexity of the problem, an evolutionary approach is applied. The appropriateness of these approaches for centroid detection is discussed.

17 citations

Book Chapter•10.1007/978-3-540-73940-1_9•
Optimized parallel approach for 3D modelling of forest fire behaviour

Gilbert Accary, Oleg Bessonov1, Dominique Fougère, Sofiane Meradji, Dominique Morvan •
Russian Academy of Sciences1
3 Sep 2007
TL;DR: Methods for parallelizing a 3D CFD forest fire modelling code on non-uniform memory access (NUMA) computers within the OpenMP environment are presented, together with performance results for the parallelized algorithm.
Abstract: In this paper we present methods for parallelizing a 3D CFD forest fire modelling code on non-uniform memory access (NUMA) computers within the OpenMP environment. The mathematical model is presented first. Then, some peculiarities of this class of computers are considered, along with the properties and limitations of the OpenMP model. Techniques for efficient parallelization are discussed, considering different types of data processing algorithms. Finally, performance results for the parallelized algorithm are presented and analyzed (for up to 16 processors).

16 citations

Book Chapter•10.1007/978-3-540-73940-1_14•
Generation of SMACA and its application in web services

Anirban Kundu1, Ruma Dutta1, Debajyoti Mukhopadhyay1•
West Bengal University of Technology1
3 Sep 2007
TL;DR: An efficient solution to the indexing problem is proposed with the introduction of Nonlinear Single Cycle Multiple Attractor Cellular Automata (SMACA); the work also shows how SMACA are generated using a specific rule sequence.
Abstract: Web search engines use forward indexing and inverted indexing as part of their functional design. This indexing mechanism helps retrieve data from the database based on the user query. In this paper, an efficient solution to the indexing problem is proposed with the introduction of Nonlinear Single Cycle Multiple Attractor Cellular Automata (SMACA). This work also shows how SMACA are generated using a specific rule sequence. The search mechanism runs in linear time.

16 citations

Book Chapter•10.1007/978-3-540-73940-1_35•
Accelerating the singular value decomposition of rectangular matrices with the CSX600 and the integrable SVD

Yusaku Yamamoto1, Takeshi Fukaya1, Takashi Uneyama2, Masami Takata3, Kinji Kimura4, Masashi Iwasaki5, Yoshimasa Nakamura2 •
Nagoya University1, Kyoto University2, Nara Women's University3, Niigata University4, Kyoto Prefectural University5
3 Sep 2007
TL;DR: This paper optimizes two of the major components of rectangular SVD, namely, QR decomposition of the input matrix and back-transformation of the left singular vectors by matrix Q, so that large-size matrix multiplications can be used efficiently.
Abstract: We propose an approach to speed up the singular value decomposition (SVD) of very large rectangular matrices using the CSX600 floating point coprocessor. The CSX600-based acceleration board we use offers 50 GFLOPS of sustained performance, which is many times greater than that provided by standard microprocessors. However, this performance can be achieved only when a vendor-supplied matrix-matrix multiplication routine is used and the matrix size is sufficiently large. In this paper, we optimize two of the major components of rectangular SVD, namely, QR decomposition of the input matrix and back-transformation of the left singular vectors by matrix Q, so that large-size matrix multiplications can be used efficiently. In addition, we use the Integrable SVD algorithm to compute the SVD of an intermediate bidiagonal matrix. This helps to further speed up the computation and reduce the memory requirements. As a result, we achieved up to 3.5 times speedup over the Intel Math Kernel Library running on a 3.2 GHz Xeon processor when computing the SVD of a 100,000 × 4000 matrix.

14 citations
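
The role of the two optimized components can be seen from the standard reduction of a tall rectangular SVD to a small square one. Assuming $A \in \mathbb{R}^{m \times n}$ with $m \gg n$ (as in the 100,000 × 4000 case above), the following is a textbook derivation, not the paper's specific blocking:

```latex
\begin{aligned}
A &= QR, && Q \in \mathbb{R}^{m\times n},\ R \in \mathbb{R}^{n\times n} \\
R &= U_1 B V_1^{\mathsf T}, && B \text{ upper bidiagonal} \\
B &= U_B \Sigma V_B^{\mathsf T}, && \text{(computed, e.g., by I-SVD)} \\
A &= \underbrace{(Q\,U_1 U_B)}_{U}\,\Sigma\,\underbrace{(V_1 V_B)^{\mathsf T}}_{V^{\mathsf T}}
\end{aligned}
```

The back-transformation $U = Q\,(U_1 U_B)$ multiplies an $m \times n$ matrix by an $n \times n$ one, which is the kind of large matrix-matrix product the abstract says the CSX600 runs efficiently.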

Book Chapter•10.1007/978-3-540-73940-1_5•
From unreliable objects to reliable objects: the case of atomic registers and consensus

Rachid Guerraoui1, Michel Raynal2•
École Polytechnique Fédérale de Lausanne1, University of Rennes2
3 Sep 2007
TL;DR: The paper addresses the object failure model in which the base objects can suffer responsive or nonresponsive crash failures, and considers self-implementations, i.e., the case where a reliable atomic register (resp., consensus object) is built from unreliable atomic registers (resp., unreliable consensus objects).
Abstract: A concurrent object is an object that can be concurrently accessed by several processes. Maurice Herlihy has shown that any concurrent object O defined by a sequential specification can be wait-free implemented from reliable atomic registers (shared variables) and consensus objects. Wait-free means that any invocation of an operation of the object O issued by a non-faulty process terminates, whatever the behavior of the other processes (e.g., even if they are very slow or have crashed). An important issue is therefore to provide reliable atomic registers and reliable consensus objects despite the failures experienced by the base objects from which these atomic registers and consensus objects are built. This paper considers self-implementations, i.e., the case where a reliable atomic register (resp., consensus object) is built from unreliable atomic registers (resp., unreliable consensus objects). The paper addresses the object failure model in which the base objects can suffer responsive or nonresponsive crash failures. When a solution exists, the paper presents the corresponding algorithm; when there is none, it presents the corresponding impossibility result. The paper has a tutorial flavor whose aim is to make the reader familiar with important results for building resilient concurrent objects. To that aim, the paper uses both algorithms from the literature and new algorithms.

14 citations

Book Chapter•10.1007/978-3-540-73940-1_26•
Dynamic load balancing of black-box applications with a resource selection mechanism on heterogeneous resources of the grid

Valeria V. Krzhizhanovskaya1, Vladimir Korkhov1•
Saint Petersburg State Polytechnic University1
3 Sep 2007
TL;DR: This paper describes the proposed algorithm for automated load balancing, paying attention to the influence of resource heterogeneity metrics, demonstrates the speedup achieved with this technique, and proposes a way to extend the approach to a wider class of applications.
Abstract: In this paper we address the critical issues of efficient resource management and high-performance parallel distributed computing on the Grid by introducing a new hierarchical approach that combines user-level job scheduling with a dynamic load balancing technique that automatically adapts a black-box distributed or parallel application to heterogeneous resources. The algorithm dynamically selects the resources best suited for a particular task or parallel process of the executed application, and optimizes the load balance based on the dynamically measured resource parameters and the estimated requirements of the application. We describe the proposed algorithm for automated load balancing, paying attention to the influence of resource heterogeneity metrics, demonstrate the speedup achieved with this technique for different types of applications and resources, and propose a way to extend the approach to a wider class of applications.

13 citations

Book Chapter•10.1007/978-3-540-73940-1_46•
Self-organised criticality in a model of the rat somatosensory cortex

Grzegorz M. Wojcik1, Wieslaw A. Kaminski1, Piotr Matejanka2•
Maria Curie-Skłodowska University1, Motorola2
3 Sep 2007
TL;DR: Large Hodgkin-Huxley neural networks were examined and the structures discussed in this article simulated a part of the rat somatosensory cortex and an occurrence of the self-organised criticality (SOC) was demonstrated.
Abstract: Large Hodgkin-Huxley (HH) neural networks were examined and the structures discussed in this article simulated a part of the rat somatosensory cortex. We used a modular architecture of the network divided into layers and sub-regions. Because of a high degree of complexity effective parallelisation of algorithms was required. The results of parallel simulations were presented. An occurrence of the self-organised criticality (SOC) was demonstrated. Most notably, in large biological neural networks consisting of artificial HH neurons, the SOC was shown to manifest itself in the frequency of its appearance as a function of the size of spike potential avalanches generated within such nets. These two parameters followed the power law characteristic of other systems exhibiting the SOC behaviour.
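
The abstract reports that avalanche frequency versus avalanche size follows a power law. A quick way to check such a relation on measured data is a straight-line fit in log-log space, sketched below. This is a generic estimator chosen for illustration, not necessarily the authors' analysis, and maximum-likelihood estimators are generally more reliable on noisy histograms. The data in the example are synthetic and `fit_power_law` is a hypothetical name.

```python
import math


def fit_power_law(sizes, counts):
    """Least-squares fit of log(count) = log(c) - alpha * log(size).

    Returns (alpha, c) for counts ~ c * size**(-alpha)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(k) for k in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope, math.exp(my - slope * mx)


sizes = list(range(1, 21))
counts = [1000 * s ** -1.5 for s in sizes]  # synthetic, exactly power-law data
alpha, c = fit_power_law(sizes, counts)
```

On exact power-law data the fit recovers the exponent 1.5 and prefactor 1000; on simulated avalanche histograms the quality of the straight-line fit is what indicates SOC-like scaling.
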
Book Chapter•10.1007/978-3-540-73940-1_37•
Pedestrian and crowd dynamics simulation: testing SCA on paradigmatic cases of emerging coordination in negative interaction conditions

Stefania Bandini1, Mizar Luca Federici1, Sara Manzoni1, Giuseppe Vizzari1•
University of Milan1
3 Sep 2007
TL;DR: This paper presents a set of theoretical experiments performed to evaluate the Situated Cellular Agent (SCA) approach within the pedestrian dynamics research context, focusing on two emerging phenomena (freezing by heating and lane formation) that have been empirically observed and already modeled by analytical particle-based models and Cellular Automata-based models.
Abstract: The paper presents a set of theoretical experiments performed to evaluate the Situated Cellular Agent (SCA) approach within the pedestrian dynamics research context. SCA is a modeling and simulation approach based on Multi Agent Systems principles that derives from Cellular Automata. In particular, we focus on two emerging phenomena (freezing by heating and lane formation) that have been empirically observed and already modeled by analytical particle-based models and Cellular Automata-based models.
Book Chapter•10.1007/978-3-540-73940-1_33•
Latencies of conflicting writes on contemporary multicore architectures

Josef Weidendorfer1, Michael Ott1, Tobias Klug1, Carsten Trinitis1•
Technische Universität München1
3 Sep 2007
TL;DR: Results show that multicore architectures with shared cache can reduce unwanted effects of false sharing, and a benchmark allowing for quantitative estimates about the consequences of the false sharing effect is presented.
Abstract: This paper provides a detailed investigation of latency penalties caused by repeated memory writes to nearby memory cells from different threads in parallel programs. When such writes map to the same cache lines in multiple processors, one can observe the so-called false sharing effect. This effect can unnecessarily slow down parallel code because of the line-granularity cache hierarchy that is common on contemporary processor architectures. In this contribution, a benchmark allowing quantitative estimates of the consequences of the false sharing effect is presented. Results show that multicore architectures with shared cache can reduce unwanted effects of false sharing.
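
Whether false sharing can occur depends only on which written addresses land in the same cache line. The small model below, an illustration rather than the paper's benchmark, computes that mapping for per-thread counters laid out with a given stride; it measures no latency. The 64-byte line size and the `shared_lines` name are assumptions.

```python
LINE = 64  # cache line size in bytes: typical on x86, assumed here


def shared_lines(num_threads, stride, elem_size=8, line=LINE, base=0):
    """Map per-thread counters to cache lines.

    Thread t writes an elem_size-byte counter at address base + t * stride.
    Returns {line index: [thread ids]} for lines written by several threads,
    i.e. the lines on which false sharing can occur."""
    owners = {}
    for t in range(num_threads):
        start = base + t * stride
        first, last = start // line, (start + elem_size - 1) // line
        for ln in range(first, last + 1):
            owners.setdefault(ln, []).append(t)
    return {ln: ts for ln, ts in owners.items() if len(ts) > 1}


print(shared_lines(4, stride=8))   # {0: [0, 1, 2, 3]}: packed counters
print(shared_lines(4, stride=64))  # {}: one counter per line (padded)
```

Padding each counter to a full line, as in the second call, is the usual software remedy; the benchmark in the paper quantifies what the unpadded layout costs on real hardware.
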
Proceedings Article•
Proceedings of the 9th international conference on Parallel Computing Technologies

Victor E. Malyshkin1•
Russian Academy of Sciences1
3 Sep 2007
Book Chapter•10.1007/978-3-540-73940-1_38•
Coarse-grained parallelization of cellular-automata simulation algorithms

Olga Bandman1•
Russian Academy of Sciences1
3 Sep 2007
TL;DR: A general approach to CA parallelization, based on domain decomposition correctness conditions, is formulated and particular parallelization methods are developed for the main classes of CA simulation models: synchronous CA with multi-cell updating rules, asynchronous probabilistic CA, and CA compositions.
Abstract: Simulating spatial dynamics in physics by Cellular Automata (CA) requires very large computation power, and hence CA simulation algorithms must be implemented on multiprocessors. The common assumption that little effort is required to obtain a highly efficient coarse-grained parallel CA algorithm is not always true. In fact, the great variety of CA modifications coming into practical use need appropriate, sometimes sophisticated, methods for the parallel implementation of CA algorithms. On this basis, a general approach to CA parallelization, based on domain decomposition correctness conditions, is formulated. Starting from the correctness conditions, particular parallelization methods are developed for the main classes of CA simulation models: synchronous CA with multi-cell updating rules, asynchronous probabilistic CA, and CA compositions. Examples and experimental results are given for each case.
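
The simplest instance of domain decomposition, a synchronous CA with a single-cell, radius-1 updating rule, is sketched below: the lattice is split into blocks, each block is extended by halo cells copied from its neighbours, and the blocked update must equal the global one. This is only the easy case that correctness conditions of the kind described above generalize; multi-cell rules, asynchronous probabilistic CA and CA compositions need the more careful methods the paper develops. Rule 90 and all function names are illustrative.

```python
def rule90(left, centre, right):
    """Elementary CA rule 90: new state is the XOR of the two neighbours."""
    return left ^ right


def step_global(cells, rule):
    """One synchronous update of a periodic 1D CA."""
    n = len(cells)
    return [rule(cells[(i - 1) % n], cells[i], cells[(i + 1) % n]) for i in range(n)]


def step_decomposed(cells, rule, parts):
    """Same update, computed block by block as separate processors would.

    Each block receives one halo (ghost) cell from each neighbouring block,
    which suffices for a radius-1 synchronous rule. Assumes parts divides len(cells)."""
    n = len(cells)
    size = n // parts
    out = []
    for p in range(parts):
        lo, hi = p * size, (p + 1) * size
        block = [cells[(lo - 1) % n]] + cells[lo:hi] + [cells[hi % n]]
        out.extend(rule(block[i - 1], block[i], block[i + 1]) for i in range(1, size + 1))
    return out


cells = [0] * 16
cells[8] = 1
a, b = cells, cells
for _ in range(5):
    a, b = step_global(a, rule90), step_decomposed(b, rule90, parts=4)
print(a == b)  # True
```

Because every new state depends only on old states, the halo copy taken before the step is all a block needs; an asynchronous rule that reads freshly updated neighbours would break this equivalence.
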
Book Chapter•10.1007/978-3-540-73940-1_21•
Address-free all-to-all routing in sparse torus

Risto Honkanen1, Ville Leppänen2, Martti Penttonen1•
University of Eastern Finland1, University of Turku2
3 Sep 2007
TL;DR: A time-scheduled routing algorithm where packets are routed address-free and it is shown that a total exchange relation, where every processor has a packet to route to every other processor, can be routed with routing cost of 1/2 + o(1) time units per packet.
Abstract: In this work we present a simple network design for all-to-all routing and study deflection routing on it. We present a time-scheduled routing algorithm where packets are routed address-free. We show that a total exchange relation, where every processor has a packet to route to every other processor, can be routed with a routing cost of 1/2 + o(1) time units per packet. The network consists of an n-sided d-dimensional torus, where the $n^{d-1}$ processor (or input/output) nodes are sparsely but regularly situated among $n^d - n^{d-1}$ deflection routing nodes, having d input and d output links. The finite-state routing nodes change their states by a fixed, preprogrammed pattern.
Book Chapter•10.1007/978-3-540-73940-1_56•
Using analytical models to load balancing in a heterogeneous network of computers

Jean Marcos Laine1, Edson T. Midorikawa1•
University of São Paulo1
3 Sep 2007
TL;DR: Two approaches to workload distribution based on analytical models developed for performance prediction of parallel applications, named PEMPIs VRP (Vector of Relative Performances), are presented, and the results show that the VRP dynamic strategy can reduce the imbalance among the execution times of the processes, relative to the average time.
Abstract: An effective workload distribution plays a prime role in reducing the total execution time of a parallel application in heterogeneous environments, such as computational grids and heterogeneous clusters. Several methods have been proposed in the literature by many researchers in the last decade. This paper presents two approaches to workload distribution based on analytical models developed for performance prediction of parallel applications, named PEMPIs VRP (Vector of Relative Performances). The workload is distributed based on relative performance ratios obtained from these models. In this work, we present two schemes, static and dynamic, in a research middleware for a heterogeneous network of computers. In the experimental tests we evaluated and compared them using two MPI applications. The results show that, using the VRP dynamic strategy, we can reduce the imbalance among the execution times of the processes, relative to the average time, from 25% to nearly 5%.
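
Distributing work by relative performance ratios amounts to splitting the total into shares proportional to a performance vector. A minimal sketch follows, assuming the vector is already known (how PEMPIs/VRP derive it is not reproduced here) and that work comes in indivisible units; the `distribute` name and the largest-remainder rounding rule are assumptions.

```python
def distribute(total_units, perf):
    """Split total_units of work in proportion to relative performance values.

    Uses largest-remainder rounding so the shares are integers summing to total_units."""
    s = sum(perf)
    exact = [total_units * p / s for p in perf]
    shares = [int(x) for x in exact]  # floor, since all values are non-negative
    leftover = total_units - sum(shares)
    # Give the remaining units to the largest fractional parts (stable for ties).
    order = sorted(range(len(perf)), key=lambda i: exact[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


print(distribute(100, [1.0, 2.0, 1.0]))  # [25, 50, 25]
print(distribute(10, [3, 2, 2]))         # [4, 3, 3]
```

In a dynamic scheme the performance vector would be re-estimated between iterations and the split recomputed, which is how measured imbalance can be driven down over time.
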
Book Chapter•10.1007/978-3-540-73940-1_57•
Block-based allocation algorithms for FLASH memory in embedded systems

Pangfeng Liu1, Chung-Hao Chuang1, Jan-Jan Wu2•
National Taiwan University1, Academia Sinica2
3 Sep 2007
TL;DR: An offline allocation algorithm called Best Match (BestM) for allocating blocks in FLASH file systems is proposed and experimental results indicate that BestM delivers better performance than a previously proposed First Rearrival First Serve (FRFS) method.
Abstract: A flash memory has write-once and bulk-erase properties, so an intelligent allocation algorithm is essential for providing applications with efficient storage service. This paper first demonstrates that the online version of the FLASH allocation problem is difficult, since we can find an adversary that forces every online algorithm to use as many blocks as a naive and inefficient algorithm. As a result, we propose an offline allocation algorithm called Best Match (BestM) for allocating blocks in FLASH file systems. The experimental results indicate that BestM delivers better performance than a previously proposed First Rearrival First Serve (FRFS) method.
Book Chapter•10.1007/978-3-540-73940-1_54•
Runtime system for parallel execution of fragmented subroutines

Konstantin Kalgin1, Victor E. Malyshkin1, S. P. Nechaev1, G. A. Tschukin2•
Russian Academy of Sciences1, Novosibirsk State Technical University2
3 Sep 2007
TL;DR: The architecture of a runtime system supporting parallel execution of fragmented library subroutines on multicomputers is proposed; it makes it possible to develop a library of parallel subroutines and to provide their dynamic properties, such as dynamic load balancing, automatically.
Abstract: The architecture of a runtime system supporting parallel execution of fragmented library subroutines on multicomputers is proposed. The approach makes it possible to develop a library of parallel subroutines and to provide their dynamic properties, such as dynamic load balancing, automatically. Using MPI for communications programming provides good portability of an application.
Book Chapter•10.1007/978-3-540-73940-1_58•
Variable reassignment in the T++ parallel programming language

Alexander Moskovsky1, Vladimir Roganov1, Sergei M. Abramov1, A. B. Kuznetsov1•
Russian Academy of Sciences1
3 Sep 2007
TL;DR: The paper focuses on the semantics and implementation of repeated assignments to a variable in T++, an extension of C++ that adds a set of keywords to C++, allowing a smooth transition from sequential to parallel applications.
Abstract: The paper describes the OpenTS parallel programming system that provides the runtime environment for the T++ language. T++ is an extension of C++ that adds a set of keywords to C++, allowing a smooth transition from sequential to parallel applications. In this context, support for repeated assignments to a variable is an important feature. The paper focuses on the semantics and implementation of such variables in T++. Applications written in T++ can be run on computational clusters, SMPs and GRIDs, under either Linux or Windows.
Book Chapter•10.1007/978-3-540-73940-1_32•
Multicriteria Scheduling Strategies in Scalable Computing Systems

Victor V. Toporkov1•
Moscow Power Engineering Institute1
3 Sep 2007
TL;DR: The approach allows the decomposition of the problem of multicriteria strategy synthesis for the totality of parameterized models of programs with the use of partial and vector quality criteria including, for instance, a cost function and load balancing factors.
Abstract: An approach to generation and optimization of scheduling and resource allocation strategies in scalable computing systems is proposed. The approach allows the decomposition of the problem of multicriteria strategy synthesis for the totality of parameterized models of programs with the use of partial and vector quality criteria including, for instance, a cost function and load balancing factors.
Book Chapter•10.1007/978-3-540-73940-1_49•
Dynamic strategy of placement of the replicas in data grid

Ghalem Belalem1, Farouk Bouhraoua2•
University of Oran1, University of Mostaganem2
3 Sep 2007
TL;DR: The contribution to a cost model whose objective is to reduce the cost of access to replicated data is presented; the cost depends on many factors like the bandwidth, data size, network latency and the number of read/write operations.
Abstract: Grid computing is a type of parallel and distributed system designed to provide pervasive and reliable access to data and computational resources over a wide area network. Data Grids connect a collection of geographically distributed computers and storage resources located in different parts of the world to facilitate sharing of data and resources. These grids concentrate on reducing the execution time of applications that require a great number of processing cycles. In such an environment, these advantages are possible only through replication, which is considered an important technique for reducing the cost of access to data in the grid. In this paper, we present our contribution to a cost model whose objective is to reduce the cost of access to replicated data. These costs depend on many factors like the bandwidth, data size, network latency and the number of read/write operations.
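
The abstract lists the factors of the cost model but not its form. The sketch below combines them in one hypothetical way to choose a replica site: each read pays latency plus transfer time over the client-to-site link, and each write pays the cost of propagating the file from the master copy to the replica. The formula, the function names and the link figures are all assumptions for illustration.

```python
def transfer_time(size_mb, bandwidth_mbps, latency_s):
    """Latency plus time to push size_mb megabytes over the link."""
    return latency_s + size_mb * 8 / bandwidth_mbps


def placement_cost(site, reads, links, size_mb, writes, master_links):
    """Hypothetical access cost of hosting a replica at `site`.

    reads: {client: number of reads}
    links: {(client, site): (bandwidth_mbps, latency_s)}
    writes: number of writes, each re-sent from the master copy
    master_links: {site: (bandwidth_mbps, latency_s)} from the master
    """
    read_cost = sum(n * transfer_time(size_mb, *links[(c, site)])
                    for c, n in reads.items())
    write_cost = writes * transfer_time(size_mb, *master_links[site])
    return read_cost + write_cost


def choose_site(sites, **model):
    """Pick the candidate site with the lowest modelled cost."""
    return min(sites, key=lambda s: placement_cost(s, **model))


model = dict(
    reads={"c1": 10},
    links={("c1", "A"): (800, 0.01), ("c1", "B"): (80, 0.05)},
    size_mb=100,
    master_links={"A": (100, 0.1), "B": (1000, 0.01)},
)
print(choose_site(["A", "B"], writes=5, **model))   # A: near the reader
print(choose_site(["A", "B"], writes=20, **model))  # B: near the master
```

The example shows why the read/write mix matters: with few writes the replica belongs near the reader, but once updates dominate, a site close to the master copy becomes cheaper.
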
Book Chapter•10.1007/978-3-540-73940-1_39•
Cellular automata models for complex matter

Dominique Désérable1, Pascal Dupont1, Mustapha Hellou1, Siham Kamali-Bernard1•
Institut national des sciences appliquées1
3 Sep 2007
TL;DR: This paper will attempt to bring out the main concepts underlying Cellular automata models and to give insight into future work.
Abstract: Complex matter may lie in various forms from granular matter, soft matter, fluid-fluid or solid-fluid mixtures to compact heterogeneous material. Cellular automata models make a suitable and powerful tool to catch the influence of the microscopic scale onto the macroscopic behaviour of these complex systems. Rather than a survey, this paper will attempt to bring out the main concepts underlying these models and to give an insight for future work.
Book Chapter•10.1007/978-3-540-73940-1_28•
Parallel pseudorandom number generator for large-scale Monte Carlo simulations

Mikhail Marchenko
3 Sep 2007
TL;DR: A parallel random number generator is given to perform large-scale distributed Monte Carlo simulations and the generator's quality was verified using statistically rigorous tests.
Abstract: A parallel random number generator is given to perform large-scale distributed Monte Carlo simulations. The generator's quality was verified using statistically rigorous tests. Special problems with known solutions were also used for testing. A description of the program system MONC for large-scale distributed Monte Carlo simulations is also given.
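
One common way to build a parallel generator is block splitting of a single long congruential sequence: each processor starts at a point far enough along the sequence that the streams never overlap, and the starting points are computed with a fast jump-ahead. The sketch below shows that idea with illustrative 64-bit parameters. Whether MONC uses this exact scheme, and its actual modulus and multiplier, are not stated in the abstract and are assumptions here, as are all function names.

```python
M = 2 ** 64
A = 6364136223846793005  # illustrative multiplier (A % 8 == 5), not MONC's constant


def step(x):
    """One step of the multiplicative congruential generator x -> A*x mod M.
    Use an odd seed so the state never collapses to even values."""
    return (A * x) % M


def jump(x, k):
    """Advance the generator by k steps at once: x -> A**k * x mod M."""
    return (pow(A, k, M) * x) % M


def stream_seeds(seed, num_streams, block_len):
    """Block splitting: stream j starts j * block_len steps into one sequence,
    so streams do not overlap as long as each uses at most block_len numbers."""
    return [jump(seed, j * block_len) for j in range(num_streams)]


def to_unit(x):
    """Map a state to a float in [0, 1) using its top 53 bits."""
    return (x >> 11) * (1.0 / 2 ** 53)


seeds = stream_seeds(seed=1, num_streams=4, block_len=10 ** 12)
```

Because `pow(A, k, M)` costs O(log k) multiplications, each processor can compute its own starting point independently, without generating the numbers in between.
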
Book Chapter•10.1007/978-3-540-73940-1_41•
CAOS: a domain-specific language for the parallel simulation of cellular automata

Clemens Grelck1, Frank Penczek1, Kai Trojahner2•
University of Hertfordshire1, University of Lübeck2
3 Sep 2007
TL;DR: The design and implementation of CAOS, a domain-specific high-level programming language for the parallel simulation of extended cellular automata, and the CAOS compiler generates efficiently executable code that automatically harnesses the potential of contemporary multi-core processors, shared memory multiprocessors, workstation clusters and supercomputers.
Abstract: We present the design and implementation of CAOS, a domain-specific high-level programming language for the parallel simulation of extended cellular automata. CAOS allows scientists to specify complex simulations with limited programming skills and effort. Yet the CAOS compiler generates efficiently executable code that automatically harnesses the potential of contemporary multi-core processors, shared memory multiprocessors, workstation clusters and supercomputers.
Book Chapter•10.1007/978-3-540-73940-1_7•
Towards a computing model for open distributed systems

Achour Mostefaoui1•
University of Rennes1
3 Sep 2007
TL;DR: A new communication and synchronization model adapted from workqueues used in parallel computing is defined, which makes it possible to exploit the potential parallelism offered by this style of programming when only an approximate solution is needed.
Abstract: This paper proposes an implementation of the data structure called bag or multiset, used by descriptive programming languages (e.g. Gamma, Linda), over an open system. In this model, a succession of "chemical reactions" consumes the elements of the bag and produces new elements according to specific rules. This approach is particularly interesting as it suppresses all unneeded synchronization and reveals all the potential parallelism of a program. An efficient implementation of a bag provides an efficient implementation of the subsequent program. This paper defines a new communication and synchronization model adapted from workqueues used in parallel computing. The proposed model makes it possible to exploit the potential parallelism offered by this style of programming when only an approximate solution is needed.
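
In Gamma-style programs, the "chemical reactions" on a multiset can be run by any scheduler that keeps picking reacting elements until none remain. The sequential interpreter below sketches those semantics only, not the paper's distributed bag or workqueue model; the `gamma` function and its arguments are hypothetical names. The two examples are the classic maximum and prime-sieve reactions.

```python
import random


def gamma(bag, condition, action, rng):
    """Apply a binary reaction to a multiset until no pair can react.

    condition(x, y) says whether x and y may react; action(x, y) returns the
    elements that replace them. The reacting pair is chosen at random, which
    models the nondeterminism of the chemical semantics."""
    bag = list(bag)
    while True:
        pairs = [(i, j) for i in range(len(bag)) for j in range(len(bag))
                 if i != j and condition(bag[i], bag[j])]
        if not pairs:
            return bag
        i, j = rng.choice(pairs)
        x, y = bag[i], bag[j]
        bag = [e for k, e in enumerate(bag) if k not in (i, j)] + list(action(x, y))


rng = random.Random(0)
# Maximum: any two elements react and only the larger survives.
print(gamma([3, 9, 4, 1, 7], lambda x, y: x >= y, lambda x, y: [x], rng))  # [9]
# Primes: a multiple of another element is removed, keeping the divisor.
print(sorted(gamma(range(2, 21), lambda x, y: x % y == 0, lambda x, y: [y], rng)))
```

Every reacting pair is independent of the others, so a parallel implementation may fire many reactions at once; the final multiset does not depend on the order chosen, which is why these programs need so little synchronization.
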
Book Chapter•10.1007/978-3-540-73940-1_24•
Efficient race verification for debugging programs with openMP directives

Young-Joo Kim1, Mun-Hye Kang1, Ok-Kyoon Ha1, Yong-Kee Jun1•
Gyeongsang National University1
3 Sep 2007
TL;DR: This tool verifies the existence of races over 250 times faster on average than the previous tool, even when the maximum parallelism increases with a fixed number of total accesses, using a set of synthetic programs without synchronization such as critical sections.
Abstract: Races must be detected when debugging parallel programs with OpenMP directives because they may cause unintended nondeterministic results. The previous tool that detects races does not verify the existence of races in programs with no internal nondeterminism, because the tool regards nested sibling threads as ordered threads and may ignore accesses involved in races in program models with synchronization such as critical sections. This paper suggests an efficient tool that verifies the existence of races with optimal performance by applying race detection engines for a labeling scheme and a detection protocol. The labeling scheme generates a unique identifier for each parallel thread created during a program execution, and the detection protocol detects at least one race if any exists. This tool verifies the existence of races over 250 times faster on average than the previous tool, even when the maximum parallelism increases with a fixed number of total accesses, using a set of synthetic programs without synchronization such as critical sections.
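
The paper's tool relies on a labeling scheme plus a detection protocol whose details the abstract does not give. As a swapped-in illustration of what it means for two accesses to race in fork-join code, the sketch below uses vector clocks, a different and well-known technique: two accesses to the same variable race if at least one is a write and neither happens before the other. It logs all accesses and compares them pairwise, so it is far less efficient than an on-the-fly detector; `Thread`, `races` and `leq` are hypothetical names.

```python
def leq(a, b):
    """Vector clock a happens-before-or-equals b (componentwise <=)."""
    return all(v <= b.get(k, 0) for k, v in a.items())


class Thread:
    def __init__(self, tid, clock=None):
        self.tid = tid
        self.clock = dict(clock or {})
        self.clock.setdefault(tid, 0)

    def tick(self):
        self.clock[self.tid] += 1

    def fork(self, child_tid):
        child = Thread(child_tid, self.clock)  # child inherits parent's history
        child.tick()
        self.tick()  # later parent events are not ordered before the child
        return child

    def join(self, child):
        for k, v in child.clock.items():  # parent now knows all child events
            self.clock[k] = max(self.clock.get(k, 0), v)
        self.tick()

    def access(self, log, var, is_write):
        self.tick()
        log.append((var, is_write, self.tid, dict(self.clock)))


def races(log):
    """Unordered pairs of accesses to one variable where at least one is a write."""
    found = []
    for i, (v1, w1, t1, c1) in enumerate(log):
        for v2, w2, t2, c2 in log[i + 1:]:
            if v1 == v2 and (w1 or w2) and not leq(c1, c2) and not leq(c2, c1):
                found.append((t1, t2, v1))
    return found


log = []
main = Thread("main")
t1, t2 = main.fork("t1"), main.fork("t2")  # two sibling threads
t1.access(log, "x", True)
t2.access(log, "x", True)
main.join(t1)
main.join(t2)
main.access(log, "x", True)  # after both children have joined
print(races(log))  # [('t1', 't2', 'x')]
```

The sibling writes race, while the parent's write after the joins is ordered after both and is not reported, which is the distinction a tool must get right for nested sibling threads.
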
Book Chapter•10.1007/978-3-540-73940-1_12•
Parallel broadband finite element time domain algorithm implemented to dispersive electromagnetic problem
Boguslaw Butrylo1•
Białystok Technical University1
3 Sep 2007
TL;DR: The spatial and time-dependent distribution of the electromagnetic field is approximated by the finite element method; the parallel form of the algorithm, valid for some linear materials, and the formulation of the FE code for a dispersive electromagnetic problem are presented and compared.
Abstract: The main subject of this paper is the numerical analysis of broadband electromagnetic fields and frequency-dependent materials using a time-domain method. The spatial and time-dependent distribution of the electromagnetic field is approximated by the finite element method. The parallel form of the algorithm, valid for some linear materials, and the formulation of the FE code for a dispersive electromagnetic problem are presented and compared. The complex forms of these algorithms affect the memory and computational costs of the distributed formulation. The properties of the algorithm are evaluated on a high-performance cluster of workstations.
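As a rough illustration of finite element time-domain stepping, the sketch below advances the 1D scalar wave equation, $M\ddot{E} + KE = 0$, for one field component. It uses linear elements, a lumped (diagonal) mass matrix, an explicit central-difference scheme, and a field fixed at zero at both ends. It assumes a lossless, non-dispersive medium and runs serially. The paper's dispersive material model and distributed formulation are not reproduced, and `fetd_1d` is a hypothetical name.

```python
import math
from typing import List


def fetd_1d(n_elems: int, length: float, c: float, cfl: float,
            n_steps: int, mode: int = 1) -> List[float]:
    """Explicit FETD stepping of M E'' + K E = 0 on [0, length].

    Linear elements, lumped mass, E = 0 at both ends, zero initial dE/dt.
    Returns the nodal field after n_steps (>= 1) steps of size cfl * h / c.
    """
    h = length / n_elems
    dt = cfl * h / c
    n_nodes = n_elems + 1
    # Lumped mass: each element contributes h/2 to each of its two nodes.
    mass = [0.0] * n_nodes
    for el in range(n_elems):
        mass[el] += h / 2
        mass[el + 1] += h / 2
    stiff = c * c / h  # linear-element stiffness coefficient

    def accel(e: List[float]) -> List[float]:
        """Nodal values of -M^{-1} K e, assembled element by element."""
        f = [0.0] * n_nodes
        for el in range(n_elems):
            flux = stiff * (e[el + 1] - e[el])
            f[el] += flux
            f[el + 1] -= flux
        return [f[i] / mass[i] for i in range(n_nodes)]

    # Initial condition: one standing-wave mode, boundary nodes pinned to zero.
    e_now = [math.sin(mode * math.pi * i / n_elems) for i in range(n_nodes)]
    e_now[0] = e_now[-1] = 0.0
    # First step from a Taylor expansion with zero initial time derivative.
    a = accel(e_now)
    e_prev = e_now
    e_now = [e_now[i] + 0.5 * dt * dt * a[i] for i in range(n_nodes)]
    e_now[0] = e_now[-1] = 0.0
    for _ in range(n_steps - 1):
        a = accel(e_now)
        e_next = [2 * e_now[i] - e_prev[i] + dt * dt * a[i] for i in range(n_nodes)]
        e_next[0] = e_next[-1] = 0.0  # perfect-conductor (Dirichlet) ends
        e_prev, e_now = e_now, e_next
    return e_now


field = fetd_1d(n_elems=50, length=1.0, c=1.0, cfl=0.9, n_steps=500, mode=3)
print(max(abs(v) for v in field))  # stays bounded by the initial amplitude
```

With `cfl` at or below 1, this lumped-mass scheme is stable. Well above 1 (for example 1.5), the highest spatial modes grow without bound. A dispersive material would add auxiliary state per node or element, which is one source of the extra memory and computational cost the abstract mentions.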
Book Chapter•10.1007/978-3-540-73940-1_2•
Adaptive workflow nets for grid computing
Carmen Bratosin1, Kees M. van Hee1, Natalia Sidorova1•
Eindhoven University of Technology1
3 Sep 2007
TL;DR: This paper defines Adaptive Grid Workflow nets (AGWF nets). These nets are appropriate for modeling grid workflows and allow changes in the process structure in response to triggering events or exceptions. The model also allows recursion, which makes it especially appropriate for a number of grid applications.
Abstract: Existing grid applications commonly use workflows to orchestrate grid services. However, existing workflow models lack adaptivity. In this paper we define Adaptive Grid Workflow nets (AGWF nets). They are appropriate for modeling grid workflows and allow changes in the process structure in response to triggering events or exceptions. Moreover, recursion is allowed, which makes the model especially appropriate for a number of grid applications. We show that soundness can be verified for AGWF nets.
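For intuition about what soundness means, the sketch below checks the classical soundness property of an ordinary workflow net by exhaustively exploring its reachability graph. The property has three parts: option to complete, proper completion, and no dead transitions. The sketch assumes a bounded net with a source place `i` and a sink place `o`. AGWF nets add adaptive and recursive structure that this sketch does not model, and the paper's verification technique is not reproduced. `is_sound` and the transition encoding are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical encoding: transition name -> (input arc weights, output arc weights).
Net = Dict[str, Tuple[Dict[str, int], Dict[str, int]]]
Marking = Tuple[Tuple[str, int], ...]


def to_marking(tokens: Dict[str, int]) -> Marking:
    """Canonical, hashable marking: sorted (place, count) pairs, zeros dropped."""
    return tuple(sorted((p, n) for p, n in tokens.items() if n > 0))


def is_sound(net: Net, source: str = "i", sink: str = "o", max_states: int = 100_000) -> bool:
    start, final = to_marking({source: 1}), to_marking({sink: 1})
    succ: Dict[Marking, List[Marking]] = {}
    fired = set()
    seen, queue = {start}, deque([start])
    while queue:  # breadth-first construction of the reachability graph
        m = queue.popleft()
        tokens = dict(m)
        succ[m] = []
        for name, (pre, post) in net.items():
            if all(tokens.get(p, 0) >= w for p, w in pre.items()):
                nxt = dict(tokens)
                for p, w in pre.items():
                    nxt[p] = nxt.get(p, 0) - w
                for p, w in post.items():
                    nxt[p] = nxt.get(p, 0) + w
                n = to_marking(nxt)
                succ[m].append(n)
                fired.add(name)
                if n not in seen:
                    if len(seen) >= max_states:
                        raise RuntimeError("state space too large; the net may be unbounded")
                    seen.add(n)
                    queue.append(n)
    # Proper completion: once the sink is marked, no other token may remain.
    if any(dict(m).get(sink, 0) > 0 and m != final for m in seen):
        return False
    # No dead transitions: each transition fires in some reachable marking.
    if fired != set(net):
        return False
    # Option to complete: the final marking is reachable from every reachable marking.
    if final not in seen:
        return False
    pred: Dict[Marking, List[Marking]] = {m: [] for m in seen}
    for m, targets in succ.items():
        for n in targets:
            pred[n].append(m)
    done, stack = {final}, [final]
    while stack:  # backward search from the final marking
        for m in pred[stack.pop()]:
            if m not in done:
                done.add(m)
                stack.append(m)
    return done == seen


and_join = {"split": ({"i": 1}, {"p1": 1, "p2": 1}), "a": ({"p1": 1}, {"q1": 1}),
            "b": ({"p2": 1}, {"q2": 1}), "join": ({"q1": 1, "q2": 1}, {"o": 1})}
xor_join = {"split": ({"i": 1}, {"p1": 1, "p2": 1}), "a": ({"p1": 1}, {"o": 1}),
            "b": ({"p2": 1}, {"o": 1})}
print(is_sound(and_join), is_sound(xor_join))
```

The second net splits in parallel but merges as if the branches were alternatives, so a token is left behind once `o` is marked. That violates proper completion, and the net is reported unsound.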
Book Chapter•10.1007/978-3-540-73940-1_13•
Strategies for development of a parallel program for protoplanetary disc simulation
Sergei Kireev, Elvira A. Kuksheva, Aleksey Snytnikov, Nikolay Snytnikov, Vitaly A. Vshivkov 
3 Sep 2007
TL;DR: Strategies for fast, high-precision protoplanetary disc simulation include reducing the 3D model to quasi-3D, using the fundamental solution of the Poisson equation, simulating in the natural coordinate system, and decomposing the computational domain.
Abstract: Protoplanetary disc simulation must be performed, first, with high precision and, second, with high speed. The paper presents several strategies for reaching these goals: reducing the 3D protoplanetary disc model to quasi-3D, using the fundamental solution of the Poisson equation, simulating in the natural (cylindrical) coordinate system, and decomposing the computational domain. Domain decomposition is shown to be the strategy that best reaches the simulation goals.
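The abstract does not say along which coordinate the domain is decomposed. Purely as an illustration, the sketch below splits the periodic azimuthal index of a cylindrical grid into contiguous, nearly equal blocks, one per process. It pads each block with one ghost cell per side, copied from its periodic neighbours, as a stencil update would need. The functions `decompose`, `neighbours`, and `with_halos` are hypothetical, and the halo exchange is simulated serially rather than with message passing.

```python
from typing import List, Sequence, Tuple


def decompose(n_cells: int, n_procs: int) -> List[Tuple[int, int]]:
    """Split cells 0..n_cells-1 into n_procs contiguous, nearly equal [lo, hi) blocks."""
    if not 0 < n_procs <= n_cells:
        raise ValueError("need 1 <= n_procs <= n_cells")
    base, extra = divmod(n_cells, n_procs)
    ranges, lo = [], 0
    for rank in range(n_procs):
        hi = lo + base + (1 if rank < extra else 0)  # first `extra` ranks get one more cell
        ranges.append((lo, hi))
        lo = hi
    return ranges


def neighbours(rank: int, n_procs: int) -> Tuple[int, int]:
    """Ranks owning the previous and next blocks; the azimuthal angle is periodic."""
    return (rank - 1) % n_procs, (rank + 1) % n_procs


def with_halos(field: Sequence[float], ranges: List[Tuple[int, int]]) -> List[List[float]]:
    """Give each block one ghost cell per side, copied from its periodic neighbours."""
    n = len(field)
    return [[field[(lo - 1) % n]] + list(field[lo:hi]) + [field[hi % n]]
            for lo, hi in ranges]


ranges = decompose(10, 3)
print(ranges)  # block boundaries for three processes
print(with_halos(list(range(10)), ranges))
```

Contiguous blocks keep each process's communication limited to its two neighbours, so the data exchanged per step stays constant as the number of processes grows. Balancing the load more carefully would be needed if the work per cell were uneven.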

© 2026 | PubGenius Inc. | Suite # 217 691 S Milpitas Blvd Milpitas CA 95035, USA