Top 33 Parallel Processing Letters papers published in 1997

TL;DR: This paper addresses the compile-time optimization of a form of nested-loop computation that is motivated by a computational physics application, and develops a pruning search strategy for determining an optimal form.

Abstract: This paper addresses the compile-time optimization of a form of nested-loop computation that is motivated by a computational physics application. The computations involve multi-dimensional surface and volume integrals where the integrand is a product of a number of array terms. Besides the issue of optimal distribution of the arrays among the processors, there is also scope for reordering of the operations using the commutativity and associativity properties of addition and multiplication, and the application of the distributive law to significantly reduce the number of operations executed. A formalization of the operation minimization problem and proof of its NP-completeness is provided. A pruning search strategy for determination of an optimal form is developed. An analysis of the communication requirements and a polynomial-time algorithm for determination of optimal distribution of the arrays are also provided.
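As a hedged illustration of the operation reduction this abstract describes (not the paper's algorithm), the sketch below evaluates a double sum over a product of two array terms directly and then after applying the distributive law. The function names and sample arrays are assumptions made for this example.

```python
def direct_sum(a, b):
    """Evaluate sum_i sum_j a[i] * b[j] term by term.

    Returns (value, number of multiplications performed).
    """
    total = 0
    mults = 0
    for x in a:
        for y in b:
            total += x * y
            mults += 1
    return total, mults


def factored_sum(a, b):
    """Same value via the distributive law: (sum_i a[i]) * (sum_j b[j]).

    Only one multiplication is needed once the two partial sums are formed.
    """
    return sum(a) * sum(b), 1


# Hypothetical sample data
a = [1, 2, 3]
b = [4, 5]
direct_value, direct_mults = direct_sum(a, b)
factored_value, factored_mults = factored_sum(a, b)
```

Both forms give the same value, but the direct form uses len(a) * len(b) multiplications while the factored form uses one; the gap grows with the number of array terms and loop dimensions.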

105 citations

Journal Article•10.1142/S0129626497000073•

On Embedding Cycles in k-Ary n-Cubes

[...]

Yaagoub A. Ashir¹, Iain A. Stewart²•Institutions (2)

Swansea University¹, University of Leicester²

TL;DR: This analysis yields an efficient algorithm for generating a cycle of any given length, if indeed one exists, thus answering a question posed by Bose, Broeg, Kwon and Ashir.

Abstract: We completely classify when a k-ary n-cube, for k ≥ 3 and n ≥ 2, contains a cycle of some given length. Our analysis yields an efficient algorithm for generating a cycle of any given length, if indeed one exists, thus answering a question posed by Bose, Broeg, Kwon and Ashir.

44 citations

Journal Article•10.1142/S0129626497000334•

On Self-Stabilizing Wait-Free Clock Synchronization

[...]

Marina Papatriantafilou¹, Philippas Tsigas¹•Institutions (1)

Max Planck Society¹

Doran Wilde, Sanjay Rajopadhye

TL;DR: This work presents a protocol that achieves quadratic synchronization time, by "re-parameterizing" and improving the best previously known solution, which had cubic synchronization time.

Abstract: A protocol which can tolerate any number of processors failing by ceasing operation for unbounded time and resuming operation (with or) without knowing that they were faulty is called wait-free; if it also works correctly even when the starting state of the system is arbitrary, it is called wait-free, self-stabilizing. This work is on the problem of wait-free, self-stabilizing clock synchronization of n processors in "in-phase" multiprocessor systems and presents a protocol that achieves quadratic synchronization time, by "re-parameterizing" and improving the best previously known solution, which had cubic synchronization time. Both the protocol and its analysis are intuitive and easy to understand.

29 citations

Journal Article•10.1142/S0129626497000218•

Memory Reuse Analysis in the Polyhedral Model

[...]

TL;DR: It is shown how the polyhedral model allows us to statically compute the lifetimes of program variables, and thus enables us to derive necessary and sufficient conditions for reusing memory.

Abstract: In the context of developing a compiler for ALPHA, a functional data-parallel language based on systems of affine recurrence equations (SAREs), we address the problem of transforming scheduled single-assignment code to multiple-assignment code. We show how the polyhedral model allows us to statically compute the lifetimes of program variables, and thus enables us to derive necessary and sufficient conditions for reusing memory.

21 citations

Journal Article•10.1142/S0129626497000061•

On the Hardness of Devising Interval Routing Schemes

[...]

Michele Flammini

Centre national de la recherche scientifique¹, University of Passau²

TL;DR: It is proved that the problem of deciding whether there exists a 2-IRS for any network G is NP-complete; this is the first hardness result for k-IRS where k is constant and the graph underlying the network is unweighted.

Abstract: The k-Interval Routing Scheme (k-IRS) is a compact routing scheme on general networks. It has been studied extensively and has recently been implemented on the latest generation of the INMOS transputer router chips. In this paper we investigate the time complexity of devising a minimal space k-IRS and we prove that the problem of deciding whether there exists a 2-IRS for any network G is NP-complete. This is the first hardness result for k-IRS where k is constant and the graph underlying the network is unweighted. Moreover, the NP-completeness also holds for linear and strict 2-IRS.

21 citations

Journal Article•10.1142/S0129626497000140•

Array Dataflow Analysis for Explicitly Parallel Programs

[...]

Jean-Francois Collard¹, Martin Griebl²•Institutions (2)

University of California, Irvine¹

TL;DR: This analysis departs from previous work because it simultaneously handles both parallel programming paradigms and does not rely on the usual iterative solving process of a set of data flow equations but extends array dataflow analysis based on integer linear programming, thus improving the precision of results.

Abstract: This paper describes a dataflow analysis of array data structures for data-parallel and/or control- (or task-) parallel imperative languages. This analysis departs from previous work because it 1) simultaneously handles both parallel programming paradigms, and 2) does not rely on the usual iterative solving process of a set of data flow equations but extends array dataflow analysis based on integer linear programming, thus improving the precision of results.

14 citations

Journal Article•10.1142/S0129626497000310•

Dictionary Compression on the Pram

[...]

Daniel S. Hirschberg¹, Lynn M. Stauffer¹•Institutions (1)

National Chiao Tung University¹

TL;DR: Parallel algorithms for lossless data compression via dictionary compression using optimal, longest fragment first (LFF), and greedy parsing strategies are described and are practical in the sense that their analysis elicits small constants.

Abstract: Parallel algorithms for lossless data compression via dictionary compression using optimal, longest fragment first (LFF), and greedy parsing strategies are described. Dictionary compression removes redundancy by replacing substrings of the input by references to strings stored in a dictionary. Given a static dictionary stored as a suffix tree, we present a CREW PRAM algorithm for optimal compression which runs in O(M + log M log n) time with O(nM^2) processors, where it is assumed that M is the maximum length of any dictionary entry. Under the same model, we give an algorithm for LFF compression which runs in O(log^2 n) time with O(n/log n) processors where it is assumed that the maximum dictionary entry is of length O(log n). We also describe an O(M + log n) time and O(n) processor algorithm for greedy parsing given a static or sliding-window dictionary. For sliding-window compression, a different approach finds the greedy parsing in O(log n) time using O(nM log M/log n) processors. Our algorithms are practical in the sense that their analysis elicits small constants.
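To make the greedy parsing strategy named in this abstract concrete, here is a minimal sequential sketch, assuming a static dictionary held as a Python set. It is not the PRAM algorithm from the paper; the function name, the fallback to single literal characters, and the sample dictionary are assumptions made for illustration.

```python
def greedy_parse(text, dictionary):
    """Sequential greedy parsing against a static dictionary.

    At each position, take the longest dictionary entry that matches the
    input there; if none of length >= 2 matches, emit one literal character.
    """
    # max_len plays the role of M, the maximum length of any dictionary entry
    max_len = max((len(entry) for entry in dictionary), default=1)
    phrases = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: a single literal character
        # Try candidate lengths from longest to shortest
        for length in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                match = candidate
                break
        phrases.append(match)
        i += len(match)
    return phrases


# Hypothetical dictionary and input
phrases = greedy_parse("abcab", {"ab", "abc", "b"})
```

Concatenating the returned phrases always reproduces the input. Greedy parsing does not guarantee the fewest phrases, which is why the abstract treats optimal parsing and LFF parsing as separate strategies.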

14 citations

Journal Article•10.1142/S0129626497000279•

A Permutation Routing Algorithm for Double Loop Networks

[...]

Frank K. Hwang¹, Tzai-Shunne Lin¹, Rong-Hong Jan¹•Institutions (1)

Research Academic Computer Technology Institute¹

TL;DR: This work shows that double-loop networks have parallel processing capability by giving the first permutation routing algorithm, and shows that the number of routing steps required is equal to the diameter of the network, the best bound one can get.

Abstract: Double-loop networks are popular architectures for interconnecting networks. We show that these networks have parallel processing capability by giving the first permutation routing algorithm. Furthermore, we show that the number of routing steps required is equal to the diameter of the network, the best bound one can get.

12 citations

Journal Article•10.1142/S0129626497000267•

Pure Greedy Hot-Potato in the 2-D Mesh with Random Destinations

[...]

Paul G. Spirakis¹, Vassilis Triantafillou¹•Institutions (1)

Paul G. Spirakis, Vassilis Triantafillou

TL;DR: In this paper, a pure greedy hot-potato routing strategy on a two-dimensional mesh of n^2 nodes is analyzed, where each packet attempts to follow the shortest path leading first to the destination row/column and then to the actual destination node.

Abstract: We analyze here a pure greedy hot-potato routing strategy on a two-dimensional mesh of n^2 nodes. We specifically study the case of n^2 packets, originating one per node, to be delivered at random uniform destinations. Each packet attempts to follow the shortest path leading first to the destination row/column (whichever is closest) and then to the actual destination node. A deflection policy is adopted to solve conflicts. We prove that all packets are delivered to the destinations in average time O(n log n). The average is taken over all possible destination functions. No average case analysis of pure greedy hot-potato routing was known up to now.

10 citations

Journal Article•

Pure greedy hot-potato routing in the 2-D mesh with random destinations

[...]

01 Jan 1997-Parallel Processing Letters

TL;DR: It is proved that all packets are delivered to the destinations in average time O(n log n), where the average is taken over all possible destination functions.

Abstract: We analyze here a pure greedy hot-potato routing strategy on a two-dimensional mesh of n^2 nodes. We specifically study the case of n^2 packets, originating one per node, to be delivered at random uniform destinations. Each packet attempts to follow the shortest path leading first to the destination row/column (whichever is closest) and then to the actual destination node. A deflection policy is adopted to solve conflicts. We prove that all packets are delivered to the destinations in average time O(n log n). The average is taken over all possible destination functions. No average case analysis of pure greedy hot-potato routing was known up to now.

10 citations

Journal Article•10.1142/S0129626497000115•

Competitive Dynamic Multiprocessor Allocation for Parallel Applications

[...]

Tim Brecht¹, Xiaotie Deng¹, Nian Gu¹•Institutions (1)

Keele University¹

TL;DR: The approach of competitive analysis is applied to compare preemptive scheduling policies and to determine which policy achieves the best competitive ratio (i.e., is within the smallest constant factor of optimal).

Abstract: We study dynamic multiprocessor allocation policies for parallel jobs, which allow the preemption and reallocation of processors to take place at any time. The objective is to minimize the completion time of the last job to finish executing (the makespan). We characterize a parallel job using two parameters: the job's parallelism, P_i, which is the number of tasks being executed in parallel by a job, and its execution time, l_i, when P_i processors are allocated to the job. The only information available to the scheduler is the parallelism of jobs. The job execution time is not known to the scheduler until the job's execution is completed. We apply the approach of competitive analysis to compare preemptive scheduling policies, and are interested in determining which policy achieves the best competitive ratio (i.e., is within the smallest constant factor of optimal). We devise an optimal competitive scheduling policy for scheduling two parallel jobs on P processors. Then, we apply the method to schedule N parallel jobs on P processors. Finally, we extend our work to incorporate jobs for which the number of parallel tasks changes during execution (i.e., jobs with multiple phases of parallelism).

Journal Article•10.1142/S0129626497000425•

High Performance Fortran, Version 2

[...]

Robert Schreiber¹•Institutions (1)

Hewlett-Packard¹

TL;DR: This paper introduces the ideas that underlie the data-parallel language High Performance Fortran (HPF) and the new ideas in version 2 of HPF, and reviews HPF's key language elements.

Abstract: This paper introduces the ideas that underlie the data-parallel language High Performance Fortran (HPF) and the new ideas in version 2 of HPF. It first reviews HPF's key language elements. It discusses the meaning of data parallelism and the limitations of HPF version 1 as a data-parallel programming language. The second part of the paper is a review of the development of version 2 of HPF. The extended language, under development in 1996, includes a richer data mapping capability; an extension to the independent loop that allows reduction operations in the loop range; a means for directing the mapping of computation as well as data; and a way to specify concurrent execution of several parallel tasks on disjoint subsets of processors.

Journal Article•10.1142/S0129626497000085•

Fault-Tolerant Parallel Communication in the Star Network

[...]

Adele A. Rescigno¹•Institutions (1)

University of Salerno¹

École normale supérieure de Lyon¹

TL;DR: Using the Information Dispersal Algorithm (IDA), a fault-tolerant randomised routing algorithm is obtained whose probability of success is 1 - N^(-Θ(n)), where N = n! is the number of nodes of the star graph S_n.

Abstract: In this paper we study the problem of fault-tolerant parallel routing in the star network, i.e., we assume that all processors send packets according to the prescribed protocol but some packets may fail to reach (on time) their destination. Using the Information Dispersal Algorithm (IDA) we obtain a fault-tolerant randomised routing algorithm whose probability of success is 1 - N^(-Θ(n)), where N = n! is the number of nodes of the star graph S_n.

Journal Article•10.1142/S0129626497000152•

Parallelizing Nested Loops with Approximation of Distance Vectors: A Survey

[...]

Alain Darte¹, Frédéric Vivien¹•Institutions (1)

TL;DR: This study identifies which algorithm is the most suitable for a given representation of distance vectors for nested loops parallelization.

Abstract: In this paper, we compare three nested loops parallelization algorithms (Allen and Kennedy's algorithm, Wolf and Lam's algorithm and Darte and Vivien's algorithm) that use different representations of distance vectors as input. We study the optimality of each with respect to the dependence analysis it uses. We propose well-chosen examples that illustrate the power and limitations of the three algorithms. This study identifies which algorithm is the most suitable for a given representation of distance vectors.

Journal Article•10.1142/S0129626497000188•

Modular Mappings and Data Distribution Independent Computations

[...]

Hyuk-Jae Lee¹, José A. B. Fortes²•Institutions (2)

Louisiana Tech University¹, Purdue University²

Louisiana State University¹

TL;DR: This paper considers the problem of writing data distribution independent (DDI) programs in order to eliminate or reduce initial data redistribution overheads for distributed memory parallel computers.

Abstract: This paper considers the problem of writing data distribution independent (DDI) programs in order to eliminate or reduce initial data redistribution overheads for distributed memory parallel computers. The functionality and execution time of DDI programs are independent of initial data distributions. Modular mappings, which can be used to derive many equally optimal and functionally equivalent programs, are briefly reviewed. Relations between modular mappings and input data distributions are then established. These relations are the basis of a systematic approach to the derivation of DDI programs which is illustrated for matrix-matrix multiplication (c = a × b). Conditions of data distributions for which it is possible to find a modular mapping that yields a program as efficient as Cannon's algorithm are: (1) the first row of the inverse of pattern distribution of array 'a' should be equal to the second row of the inverse of pattern distribution of array 'b', (2) the second row of the inverse of pattern distribution of array 'a' should be linearly independent of the first row of the inverse of pattern distribution of array 'b', and (3) each pattern distribution of arrays 'a', 'b', and 'c' should have at least one zero entry, respectively.

Journal Article•10.1142/S0129626497000206•

Communication Generation for Block-Cyclic Distributions

[...]

Arun Venkatachar¹, J. Ramanujam¹, Ashwath Thirumalai¹•Institutions (1)

TL;DR: A novel approach for the generation of communication sets that exploits a pattern of send-receive index pairs is presented, along with an algorithm for code generation.

Abstract: Data-parallel languages such as High Performance Fortran, Vienna Fortran and Fortran D include directives such as alignment and distribution that describe how data and computation are mapped onto the processors in a distributed-memory multiprocessor. A compiler for HPF that generates code for each processor has to compute the sequence of local memory addresses accessed by each processor and the sequence of sends and receives for a given processor to access non-local data. In this paper, we present a novel approach for the generation of communication sets that exploits a pattern of send-receive index pairs. In addition, we present an algorithm for code generation. Experimental results demonstrate the viability of this technique.

Journal Article•10.1142/S012962649700005X•

Simple and Work-Efficient Parallel Algorithms for the Minimum Spanning Tree Problem

[...]

Christos D. Zaroliagis¹•Institutions (1)

Max Planck Society¹

Kalyani Government Engineering College¹

TL;DR: Two simple and work-efficient parallel algorithms for the minimum spanning tree problem are presented; both perform O(m log n) work.

Abstract: Two simple and work-efficient parallel algorithms for the minimum spanning tree problem are presented. Both algorithms perform O(m log n) work. The first algorithm runs in O(log^2 n) time on an EREW PRAM, while the second algorithm runs in O(log n) time on a COMMON CRCW PRAM.

Journal Article•10.1142/S0129626497000048•

Fast Parallel Multiplication Using Redundant Quarternary Number System

[...]

Mallika De¹, Bhabani P. Sinha•Institutions (1)

University of Illinois at Urbana–Champaign¹

TL;DR: The number of computational elements of an m-digit multiplier based on the proposed algorithm is O(m^2).

Abstract: In this paper, we propose a high-speed VLSI multiplication scheme using redundant radix-4 representation of numbers. For m-digit by m-digit redundant radix-4 integer multiplication, we first generate m partial products, each of (m+1) digits in redundant radix-4 (RR-4) number system. These partial products are then added up four at a time by means of redundant quarternary adders. Parallel addition of four (m+1)-digit redundant radix-4 numbers can be performed in a constant time independent of m without any carry propagation. With these adders, multiplication of two m-digit numbers in RR-4 number system can be performed in ⌈(1/2) log_2 m⌉ + 1 steps of such additions of four RR-4 numbers. The number of computational elements of an m-digit multiplier based on the proposed algorithm is O(m^2). Since the multiplier has a regular cellular array structure, it is suitable for VLSI implementation with O(m^2 log m) AT-value.
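The step count quoted in this abstract can be evaluated with a short sketch. It relies on the identity ⌈(1/2) log_2 m⌉ = ⌈log_4 m⌉ and computes the latter with integer arithmetic to avoid floating-point rounding. The function name is an assumption for illustration, and the code only evaluates the stated formula; it does not model the adder circuit itself.

```python
def addition_steps(m):
    """Evaluate ceil((1/2) * log2(m)) + 1 for an m-digit multiplier.

    ceil((1/2) * log2(m)) equals ceil(log4(m)), i.e. the smallest s with
    4**s >= m, which is found here by repeated multiplication by 4.
    """
    if m < 1:
        raise ValueError("m must be a positive integer")
    levels = 0
    capacity = 1  # how many operands `levels` rounds of 4-to-1 addition can merge
    while capacity < m:
        capacity *= 4
        levels += 1
    return levels + 1


# Step counts for a few hypothetical operand sizes
steps_by_size = {m: addition_steps(m) for m in (4, 16, 64)}
```

Because each step merges four operands into one, the count grows by one step each time m quadruples.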

Journal Article•10.1142/S0129626497000413•

Compiling for scalable multiprocessors with polaris

[...]

Yunheung Paek¹, David Padua¹•Institutions (1)

TL;DR: This paper uses Polaris, a parallelizing Fortran restructurer developed at Illinois, as the infrastructure to implement algorithms and discusses the development and implementation of a few compiler techniques for some of these transformations.

Abstract: Due to the complexity of programming scalable multiprocessors with physically distributed memories, it is onerous to manually generate parallel code for these machines. As a consequence, there has been much research on the development of compiler techniques to simplify programming, to increase reliability, and to reduce development costs. For code generation, a compiler applies a number of transformations in areas such as data privatization, data copying and replication, synchronization, and data and work distribution. In this paper, we discuss our recent work on the development and implementation of a few compiler techniques for some of these transformations. We use Polaris, a parallelizing Fortran restructurer developed at Illinois, as the infrastructure to implement our algorithms. The paper includes experimental results obtained by applying our techniques to several benchmark codes.

Journal Article•10.1142/S0129626497000036•

A simple optimal parallel algorithm for reporting paths in a tree

[...]

Andrzej Lingas¹, Anil Maheshwari²•Institutions (2)

Lund University¹, Carleton University²

TL;DR: This work provides a simple optimal parallel algorithm for preprocessing the input tree such that path queries can be answered efficiently, and reports the path between a single pair of distinct nodes in O(log n) time using O(L/log n) processors.

Abstract: We present optimal parallel solutions to reporting paths between pairs of nodes in an n-node tree. Our algorithms are deterministic and designed to run on an exclusive read exclusive write parallel random-access machine (EREW PRAM). In particular, we provide a simple optimal parallel algorithm for preprocessing the input tree such that the path queries can be answered efficiently. Our algorithm for preprocessing runs in O(log n) time using O(n/log n) processors. Using the preprocessing, we can report paths between k node pairs in O(log n + log k) time using O(k + (n + S)/log n) processors on an EREW PRAM, where S is the size of the output. In particular, we can report the path between a single pair of distinct nodes in O(log n) time using O(L/log n) processors, where L denotes the length of the path.

Journal Article•10.1142/S0129626497000231•

Deterministic Routing on the Array with Reconfigurable Optical Buses

[...]

Sanguthevar Rajasekaran¹, Sartaj Sahni¹•Institutions (1)

University of Florida¹

TL;DR: In this paper, efficient deterministic algorithms for various classes of routing problems on the array with reconfigurable optical buses (AROB) are presented.

Abstract: In this paper we present efficient deterministic algorithms for various classes of routing problems on the array with reconfigurable optical buses (AROB).

Journal Article•10.1142/S0129626497000255•

A Routing Strategy for Object-Oriented Applications in Massively Parallel Architectures

[...]

Maurelio Boari¹, Antonio Corradi¹, Cesare Stefanelli¹, Letizia Leonardi•Institutions (1)

University of Bologna¹

Nadjib Badache, Aomar Maddi

TL;DR: This paper adopts a routing strategy designed to be effective when objects are dynamically created/destroyed and can move during execution; the strategy does not assume any knowledge of either object allocation or system topology configuration.

Abstract: Parallel object-oriented environments have a high degree of dynamicity and need specialised support to achieve efficiency of execution. Static strategies are not suitable for these environments: any prediction before execution can only roughly estimate the real behaviour. In object-oriented environments, the decision to create/destroy objects is usually taken at run-time and object allocation can change during the execution. The requirement of dynamicity should be considered in the design of every component of the support. The routing system, for instance, must ensure delivery even in case of object dynamic allocation/reallocation. The paper argues that routing algorithms for parallel object-oriented environments in massively parallel architectures should be both adaptive and efficient. We adopted a routing strategy designed to be effective in case of objects dynamically created/destroyed and capable of moving during the execution. Our adaptive strategy does not assume any knowledge of either object allocation or system topology configuration.

Journal Article•10.1142/S0129626497000322•

Gradual Design of a Causal Broadcast Protocol

[...]

French Institute for Research in Computer Science and Automation¹

TL;DR: This paper presents a gradual approach to designing a protocol to implement causal ordering in the particular case of a broadcast group and obtains a simple protocol that has low communication overhead.

Abstract: This paper presents a gradual approach to designing a protocol to implement causal ordering in the particular case of a broadcast group. Each message is received by all the processes of the group, including its sender. The protocol we obtain is simple and has low communication overhead.

Journal Article•10.1142/s0129626497000152•

Parallelizing Nested Loops with Approximations of Distance Vectors: A Survey

[...]

A. Darte, Frédéric Vivien¹•Institutions (1)

TL;DR: This study identifies which algorithm is the most suitable for a given representation of distance vectors for nested loops parallelization.

Journal Article•10.1142/S0129626497000097•

Efficient Byzantine Agreement in Networks with Random Faults

[...]

Adam Malinowski¹•Institutions (1)

University of Warsaw¹

Loyola University Chicago¹

TL;DR: This paper considers the Byzantine Agreement problem under the assumption that nodes and links of a synchronous network fail independently with constant probabilities p 0 an arbitrary constant.

Abstract: We consider the Byzantine Agreement problem under the assumption that nodes and links of a synchronous network fail independently with constant probabilities p 0 an arbitrary constant.

Journal Article•10.1142/S0129626497000280•

Parallel Algorithms for Single-Layer Channel Routing

[...]

Ronald I. Greenberg¹, Shih-Chuan Hung, Jau-Der Shih•Institutions (1)

École normale supérieure de Lyon¹

TL;DR: Efficient parallel algorithms for the minimum separation, offset range, and optimal offset problems for single-layer channel routing are provided, and an even better time of O((lg lg N)^2) on the CRCW PRAM is obtained in the river routing context.

Abstract: We provide efficient parallel algorithms for the minimum separation, offset range, and optimal offset problems for single-layer channel routing. We consider all the variations of these problems that are known to have linear-time sequential solutions rather than limiting attention to the "river-routing" context, where single-sided connections are disallowed. For the minimum separation problem, we obtain O(lg N) time on a CREW PRAM or time on a (common) CRCW PRAM, both with optimal work (processor-time product) of O(N), where N is the number of terminals. For the offset range problem, we obtain the same time and processor bounds as long as only one side of the channel contains single-sided nets. For the optimal offset problem with single-sided nets on one side of the channel, we obtain time O(lg N lg lg N) on a CREW PRAM or time on a CRCW PRAM with O(N lg lg N) work. Not only does this improve on previous results for river routing, but we can obtain an even better time of O((lg lg N)^2) on the CRCW PRAM in the river routing context. In addition, wherever our results allow a channel boundary to contain single-sided nets, the results also apply when that boundary is ragged and N incorporates the number of bendpoints.

Journal Article•10.1142/S0129626497000383•

Combining Retiming and Scheduling Techniques for Loop Parallelization and Loop Tiling

[...]

Alain Darte¹, Georges-André Silber¹, Frédéric Vivien¹•Institutions (1)

Seoul National University¹

TL;DR: This paper demonstrates how the structure of the reduced dependence graph can be taken into account for detecting more permutable loops and shows how the way it is handled can be useful for fine-grain loop parallelization as well.

Abstract: Tiling is a technique used for exploiting medium-grain parallelism in nested loops. It relies on a first step that detects sets of permutable nested loops. All algorithms developed so far consider the statements of the loop body as a single block; in other words, they are not able to take advantage of the structure of dependences between different statements. In this paper, we overcome this limitation by showing how the structure of the reduced dependence graph can be taken into account for detecting more permutable loops. Our method combines graph retiming techniques and graph scheduling techniques. It can be viewed as an extension of Wolf and Lam's algorithm to the case of loops with multiple statements. Loop independent dependences play a particular role in our study, and we show how the way we handle them can be useful for fine-grain loop parallelization as well.

Journal Article•10.1142/s0129626497000413•

Compiling for Scalable Multiprocessors with Polaris

[...]

Yunheung Paek¹, D. Padua•Institutions (1)

University of New England (Australia)¹

TL;DR: Polaris, a parallelizing Fortran restructurer developed at Illinois, is used as the infrastructure to implement a few compiler techniques for some of these transformations.

Journal Article•10.1142/S0129626497000371•

Optimal and Near–Optimal Solutions for Hard Compilation Problems

[...]

Ulrich Kremer¹•Institutions (1)

Rutgers University¹

01 Jan 1997-Parallel Processing Letters

TL;DR: The potential benefits of integer programming as a tool to deal with NP–complete compiler optimization formulations in compilers and programming environments are discussed.

Abstract: An optimizing compiler typically uses multiple program representations at different levels of program and performance abstractions in order to be able to perform transformations that – at least in the majority of cases – will lead to an overall improvement in program performance. The complexities of the program and performance abstractions used to formulate compiler optimization problems have to match the complexities of the high–level programming model and of the underlying target system. Scalable parallel systems typically have multi–level memory hierarchies and are able to exploit coarse–grain and fine–grain parallelism. Most likely, future systems will have even deeper memory hierarchies and more granularities of parallelism. As a result, future compiler optimizations will have to use more and more complex, multi–level computation and performance models in order to keep up with the complexities of their future target systems. Most of the optimization problems encountered in highly optimizing compilers are already NP–hard, and there is little hope that most newly encountered optimization formulations will not be at least NP–hard as well. To face this "complexity crisis", new methods are needed to evaluate the benefits of a compiler optimization formulation. A crucial step in this evaluation process is to compute the optimal solution of the formulation. Using ad–hoc methods to compute optimal solutions to NP–complete problems may be prohibitively expensive. Recent improvements in mixed integer and 0–1 integer programming suggest that this technology may provide the key to efficient, optimal and near–optimal solutions to NP–complete compiler optimization problems. In fact, early results indicate that integer programming formulations may be efficient enough to be included not only in evaluation prototypes, but also in production programming environments or even production compilers. This paper discusses the potential benefits of integer programming as a tool to deal with NP–complete compiler optimization formulations in compilers and programming environments.

Journal Article•10.1142/S0129626497000401•

On Tiling as a Loop Transformation

[...]

Jingling Xue¹•Institutions (1)