Showing papers in "Parallel Processing Letters in 1997"
Journal Article•10.1142/S0129626497000176•
On Optimizing a Class of Multi-Dimensional Loops with Reduction for Parallel Execution

Chi-Chung Lam1, P. Sadayappan1, Rephael Wenger1•
Ohio State University1
01 Jun 1997-Parallel Processing Letters
TL;DR: This paper addresses the compile-time optimization of a form of nested-loop computation that is motivated by a computational physics application and a pruning search strategy for determination of an optimal form is developed.
Abstract: This paper addresses the compile-time optimization of a form of nested-loop computation that is motivated by a computational physics application. The computations involve multi-dimensional surface and volume integrals where the integrand is a product of a number of array terms. Besides the issue of optimal distribution of the arrays among the processors, there is also scope for reordering of the operations using the commutativity and associativity properties of addition and multiplication, and the application of the distributive law to significantly reduce the number of operations executed. A formalization of the operation minimization problem and proof of its NP-completeness is provided. A pruning search strategy for determination of an optimal form is developed. An analysis of the communication requirements and a polynomial-time algorithm for determination of optimal distribution of the arrays are also provided.

105 citations
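
The operation-count saving from the distributive law mentioned in the abstract can be illustrated with a minimal sketch. The function names `naive_sum` and `factored_sum` and the tiny inputs are assumptions made for illustration; this is not the paper's pruning search, only the algebraic rewrite it exploits.

```python
def naive_sum(a, b):
    """Evaluate sum_i sum_j a[i] * b[j] directly; returns (value, multiplications)."""
    total, mults = 0, 0
    for x in a:
        for y in b:
            total += x * y
            mults += 1
    return total, mults


def factored_sum(a, b):
    """Same value via the distributive law: (sum_i a[i]) * (sum_j b[j])."""
    return sum(a) * sum(b), 1


a = [1, 2, 3]
b = [4, 5]
print(naive_sum(a, b))     # (54, 6): six multiplications
print(factored_sum(a, b))  # (54, 1): one multiplication
```

The factored form needs one multiplication instead of len(a) * len(b), which is the kind of reduction the abstract refers to for products of array terms inside multi-dimensional sums.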

Journal Article•10.1142/S0129626497000073•
On Embedding Cycles in k-Ary n-Cubes

Yaagoub A. Ashir1, Iain A. Stewart2•
Swansea University1, University of Leicester2
01 Mar 1997-Parallel Processing Letters
TL;DR: This analysis yields an efficient algorithm for generating a cycle of any given length, if indeed one exists, thus answering a question posed by Bose, Broeg, Kwon and Ashir.
Abstract: We completely classify when a k-ary n-cube, for k ≥ 3 and n ≥ 2, contains a cycle of some given length. Our analysis yields an efficient algorithm for generating a cycle of any given length, if indeed one exists, thus answering a question posed by Bose, Broeg, Kwon and Ashir.

44 citations

Journal Article•10.1142/S0129626497000334•
On Self-Stabilizing Wait-Free Clock Synchronization

Marina Papatriantafilou1, Philippas Tsigas1•
Max Planck Society1
01 Sep 1997-Parallel Processing Letters
TL;DR: This work presents a protocol that achieves quadratic synchronization time, by "re-parameterizing" and improving the best previously known solution, which had cubic synchronization time.
Abstract: A protocol which can tolerate any number of processors failing by ceasing operation for unbounded time and resuming operation (with or) without knowing that they were faulty is called wait-free; if it also works correctly even when the starting state of the system is arbitrary, it is called wait-free, self-stabilizing. This work is on the problem of wait-free, self-stabilizing clock synchronization of n processors in "in-phase" multiprocessor systems and presents a protocol that achieves quadratic synchronization time, by "re-parameterizing" and improving the best previously known solution, which had cubic synchronization time. Both the protocol and its analysis are intuitive and easy to understand.

29 citations

Journal Article•10.1142/S0129626497000218•
Memory Reuse Analysis in the Polyhedral Model

Doran Wilde, Sanjay Rajopadhye
01 Jun 1997-Parallel Processing Letters
TL;DR: It is shown how the polyhedral model allows us to statically compute the lifetimes of program variables, and thus enables us to derive necessary and sufficient conditions for reusing memory.
Abstract: In the context of developing a compiler for ALPHA, a functional data-parallel language based on systems of affine recurrence equations (SAREs), we address the problem of transforming scheduled single-assignment code to multiple assignment code. We show how the polyhedral model allows us to statically compute the lifetimes of program variables, and thus enables us to derive necessary and sufficient conditions for reusing memory.

21 citations
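
The move from single-assignment to multiple-assignment code can be sketched on a small recurrence. This is a hand-derived illustration, not ALPHA code and not the paper's polyhedral analysis; the function names and the Fibonacci-style recurrence are assumptions chosen because the lifetimes are easy to see: x[i] is last read when x[i+2] is computed, so two storage cells are enough.

```python
def fib_single_assignment(n):
    """Single-assignment form: every x[i] gets its own memory cell."""
    x = [0] * (n + 1)
    if n >= 1:
        x[1] = 1
    for i in range(2, n + 1):
        x[i] = x[i - 1] + x[i - 2]
    return x[n]


def fib_reused(n):
    """Multiple-assignment form: x[i] is stored in cell i % 2.

    Writing x[i] overwrites x[i - 2], whose lifetime ends at this very step.
    """
    cell = [0, 1]
    for i in range(2, n + 1):
        cell[i % 2] = cell[(i - 1) % 2] + cell[(i - 2) % 2]
    return cell[n % 2]


print(fib_single_assignment(10))  # 55, using 11 cells
print(fib_reused(10))             # 55, using 2 cells
```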

Journal Article•10.1142/S0129626497000061•
On the Hardness of Devising Interval Routing Schemes

Michele Flammini
01 Mar 1997-Parallel Processing Letters
TL;DR: It is proved that the problem of deciding whether there exists a 2-IRS for any network G is NP-complete; this is the first hardness result for k-IRS where k is constant and the graph underlying the network is unweighted.
Abstract: The k-Interval Routing Scheme (k-IRS) is a compact routing scheme on general networks. It has been studied extensively and recently been implemented on the latest generation of the INMOS transputer router chips. In this paper we investigate the time complexity of devising a minimal space k-IRS and we prove that the problem of deciding whether there exists a 2-IRS for any network G is NP-complete. This is the first hardness result for k-IRS where k is constant and the graph underlying the network is unweighted. Moreover, the NP-completeness holds also for linear and strict 2-IRS.

21 citations

Journal Article•10.1142/S0129626497000140•
Array Dataflow Analysis for Explicitly Parallel Programs

Jean-Francois Collard1, Martin Griebl2•
Centre national de la recherche scientifique1, University of Passau2
01 Jun 1997-Parallel Processing Letters
TL;DR: This analysis departs from previous work because it simultaneously handles both parallel programming paradigms and does not rely on the usual iterative solving process of a set of data flow equations but extends array dataflow analysis based on integer linear programming, thus improving the precision of results.
Abstract: This paper describes a dataflow analysis of array data structures for data-parallel and/or control- (or task-) parallel imperative languages. This analysis departs from previous work because it 1) simultaneously handles both parallel programming paradigms, and 2) does not rely on the usual iterative solving process of a set of data flow equations but extends array dataflow analysis based on integer linear programming, thus improving the precision of results.

14 citations

Journal Article•10.1142/S0129626497000310•
Dictionary Compression on the PRAM

Daniel S. Hirschberg1, Lynn M. Stauffer1•
University of California, Irvine1
01 Sep 1997-Parallel Processing Letters
TL;DR: Parallel algorithms for lossless data compression via dictionary compression using optimal, longest fragment first (LFF), and greedy parsing strategies are described and are practical in the sense that their analysis elicits small constants.
Abstract: Parallel algorithms for lossless data compression via dictionary compression using optimal, longest fragment first (LFF), and greedy parsing strategies are described. Dictionary compression removes redundancy by replacing substrings of the input by references to strings stored in a dictionary. Given a static dictionary stored as a suffix tree, we present a CREW PRAM algorithm for optimal compression which runs in O(M + log M log n) time with O(nM^2) processors, where it is assumed that M is the maximum length of any dictionary entry. Under the same model, we give an algorithm for LFF compression which runs in O(log^2 n) time with O(n/log n) processors where it is assumed that the maximum dictionary entry is of length O(log n). We also describe an O(M + log n) time and O(n) processor algorithm for greedy parsing given a static or sliding-window dictionary. For sliding-window compression, a different approach finds the greedy parsing in O(log n) time using O(nM log M/log n) processors. Our algorithms are practical in the sense that their analysis elicits small constants.

14 citations
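
As a point of reference for the parsing strategies named in the abstract, here is a minimal sequential sketch of greedy parsing against a static dictionary. It is not the paper's PRAM algorithm: the function name `greedy_parse`, the use of a Python set instead of a suffix tree, and the sample dictionaries are all assumptions for illustration.

```python
def greedy_parse(text, dictionary):
    """Split text into dictionary phrases, always taking the longest match at the current position."""
    max_len = max(len(entry) for entry in dictionary)
    phrases = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + length]
            if candidate in dictionary:
                phrases.append(candidate)
                pos += length
                break
        else:
            raise ValueError(f"no dictionary entry matches at position {pos}")
    return phrases


print(greedy_parse("abbab", {"a", "b", "ab", "abb", "ba"}))  # ['abb', 'ab']
print(greedy_parse("abbb", {"a", "b", "ab", "bbb"}))         # ['ab', 'b', 'b']
```

The second call shows why greedy and optimal parsing are treated separately: greedy emits three phrases for "abbb", while the parse "a", "bbb" needs only two.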

Journal Article•10.1142/S0129626497000279•
A Permutation Routing Algorithm for Double Loop Networks

Frank K. Hwang1, Tzai-Shunne Lin1, Rong-Hong Jan1•
National Chiao Tung University1
01 Sep 1997-Parallel Processing Letters
TL;DR: This work shows that double-loop networks have parallel processing capability by giving the first permutation routing algorithm, and shows that the number of routing steps required is equal to the diameter of the network, the best bound one can get.
Abstract: Double-loop networks are popular architectures for interconnecting networks. We show that these networks have parallel processing capability by giving the first permutation routing algorithm. Furthermore, we show that the number of routing steps required is equal to the diameter of the network, the best bound one can get.

12 citations
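
Since the abstract states the routing bound in terms of the network diameter, a small sketch for computing that diameter may help. It assumes the common definition of a directed double-loop network in which node i has links to (i + s1) mod n and (i + s2) mod n; the abstract does not spell out the definition, and the function name and parameters are illustrative. Because such a network looks the same from every node, a breadth-first search from node 0 is enough.

```python
from collections import deque


def double_loop_diameter(n, s1, s2):
    """Diameter of the directed double-loop network with n nodes and steps s1, s2."""
    dist = {0: 0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for step in (s1, s2):
            v = (u + step) % n
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if len(dist) != n:
        raise ValueError("network is not strongly connected")
    return max(dist.values())


print(double_loop_diameter(9, 1, 3))  # 4
print(double_loop_diameter(5, 1, 2))  # 2
```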

Journal Article•10.1142/S0129626497000267•
Pure Greedy Hot-Potato in the 2-D Mesh with Random Destinations

Paul G. Spirakis1, Vassilis Triantafillou1•
Research Academic Computer Technology Institute1
01 Sep 1997-Parallel Processing Letters
TL;DR: In this paper, a pure greedy hot-potato routing strategy on a two-dimensional mesh of n^2 nodes is analyzed, where each packet attempts to follow the shortest path leading first to the destination row/column and then to the actual destination node.
Abstract: We analyze here a pure greedy hot-potato routing strategy on a two-dimensional mesh of n^2 nodes. We specifically study the case of n^2 packets, originating one per node, to be delivered at random uniform destinations. Each packet attempts to follow the shortest path leading first to the destination row/column (whichever is closest) and then to the actual destination node. A deflection policy is adopted to solve conflicts. We prove that all packets are delivered to the destinations in average time O(n log n). The average is taken over all possible destination functions. No average case analysis of pure greedy hot-potato routing was known up to now.

10 citations

Journal Article•
Pure greedy hot-potato routing in the 2-D mesh with random destinations

Paul G. Spirakis, Vassilis Triantafillou
01 Jan 1997-Parallel Processing Letters
TL;DR: It is proved that all packets are delivered to the destinations in average time O(n log n), where the average is taken over all possible destination functions.
Abstract: We analyze here a pure greedy hot-potato routing strategy on a two-dimensional mesh of n^2 nodes. We specifically study the case of n^2 packets, originating one per node, to be delivered at random uniform destinations. Each packet attempts to follow the shortest path leading first to the destination row/column (whichever is closest) and then to the actual destination node. A deflection policy is adopted to solve conflicts. We prove that all packets are delivered to the destinations in average time O(n log n). The average is taken over all possible destination functions. No average case analysis of pure greedy hot-potato routing was known up to now.

10 citations

Journal Article•10.1142/S0129626497000115•
Competitive Dynamic Multiprocessor Allocation for Parallel Applications

Tim Brecht1, Xiaotie Deng1, Nian Gu1•
Keele University1
01 Mar 1997-Parallel Processing Letters
TL;DR: Competitive analysis is applied to compare preemptive scheduling policies and to determine which policy achieves the best competitive ratio (i.e., is within the smallest constant factor of optimal).
Abstract: We study dynamic multiprocessor allocation policies for parallel jobs, which allow the preemption and reallocation of processors to take place at any time. The objective is to minimize the completion time of the last job to finish executing (the makespan). We characterize a parallel job using two parameters: the job's parallelism, P_i, which is the number of tasks being executed in parallel by a job, and its execution time, l_i, when P_i processors are allocated to the job. The only information available to the scheduler is the parallelism of jobs. The job execution time is not known to the scheduler until the job's execution is completed. We apply the approach of competitive analysis to compare preemptive scheduling policies, and are interested in determining which policy achieves the best competitive ratio (i.e., is within the smallest constant factor of optimal). We devise an optimal competitive scheduling policy for scheduling two parallel jobs on P processors. Then, we apply the method to schedule N parallel jobs on P processors. Finally we extend our work to incorporate jobs for which the number of parallel tasks changes during execution (i.e., jobs with multiple phases of parallelism).
Journal Article•10.1142/S0129626497000425•
High Performance Fortran, Version 2

Robert Schreiber1•
Hewlett-Packard1
01 Dec 1997-Parallel Processing Letters
TL;DR: This paper introduces the ideas that underlie the data-parallel language High Performance Fortran (HPF) and the new ideas in version 2 of HPF and reviews HPF's key language elements.
Abstract: This paper introduces the ideas that underlie the data-parallel language High Performance Fortran (HPF) and the new ideas in version 2 of HPF. It first reviews HPF's key language elements. It discusses the meaning of data parallelism and the limitations of HPF version 1 as a data-parallel programming language. The second part of the paper is a review of the development of version 2 of HPF. The extended language, under development in 1996, includes a richer data mapping capability; an extension to the independent loop that allows reduction operations in the loop range; a means for directing the mapping of computation as well as data; and a way to specify concurrent execution of several parallel tasks on disjoint subsets of processors.
Journal Article•10.1142/S0129626497000085•
Fault-Tolerant Parallel Communication in the Star Network

Adele A. Rescigno1•
University of Salerno1
01 Mar 1997-Parallel Processing Letters
TL;DR: Using the Information Dispersal Algorithm (IDA), a fault-tolerant randomised routing algorithm is obtained whose probability of success is 1 - N^{-Θ(n)}, where N = n! is the number of nodes of the star graph S_n.
Abstract: In this paper we study the problem of fault-tolerant parallel routing in the star network, i.e., we assume that all processors send packets according to the prescribed protocol but some packets may fail to reach (on time) their destination. Using the Information Dispersal Algorithm (IDA) we obtain a fault-tolerant randomised routing algorithm whose probability of success is 1 - N^{-Θ(n)}, where N = n! is the number of nodes of the star graph S_n.
Journal Article•10.1142/S0129626497000152•
Parallelizing Nested Loops with Approximation of Distance Vectors: A Survey

Alain Darte1, Frédéric Vivien1•
École normale supérieure de Lyon1
01 Jun 1997-Parallel Processing Letters
TL;DR: This study identifies which algorithm is the most suitable for a given representation of distance vectors for nested loops parallelization.
Abstract: In this paper, we compare three nested loops parallelization algorithms (Allen and Kennedy's algorithm, Wolf and Lam's algorithm and Darte and Vivien's algorithm) that use different representations of distance vectors as input. We study the optimality of each with respect to the dependence analysis it uses. We propose well-chosen examples that illustrate the power and limitations of the three algorithms. This study identifies which algorithm is the most suitable for a given representation of distance vectors.
Journal Article•10.1142/S0129626497000188•
Modular Mappings and Data Distribution Independent Computations

Hyuk-Jae Lee1, José A. B. Fortes2•
Louisiana Tech University1, Purdue University2
01 Jun 1997-Parallel Processing Letters
TL;DR: This paper considers the problem of writing data distribution independent (DDI) programs in order to eliminate or reduce initial data redistribution overheads for distributed memory parallel computers.
Abstract: This paper considers the problem of writing data distribution independent (DDI) programs in order to eliminate or reduce initial data redistribution overheads for distributed memory parallel computers. The functionality and execution time of DDI programs are independent of initial data distributions. Modular mappings, which can be used to derive many equally optimal and functionally equivalent programs, are briefly reviewed. Relations between modular mappings and input data distributions are then established. These relations are the basis of a systematic approach to the derivation of DDI programs which is illustrated for matrix-matrix multiplication (c = a × b). Conditions of data distributions for which it is possible to find a modular mapping that yields a program as efficient as Cannon's algorithm are: (1) the first row of the inverse of pattern distribution of array 'a' should be equal to the second row of the inverse of pattern distribution of array 'b', (2) the second row of the inverse of pattern distribution of array 'a' should be linearly independent of the first row of the inverse of pattern distribution of array 'b', and (3) each pattern distribution of arrays 'a', 'b', and 'c' should have at least one zero entry, respectively.
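
Cannon's algorithm, which the abstract uses as its efficiency baseline, can be sketched as a sequential simulation. The sketch assumes a q x q processor grid holding one matrix element per processor; the function name `cannon_matmul` and the list-of-lists layout are illustrative choices, and nothing here reproduces the paper's modular mappings.

```python
def cannon_matmul(a, b):
    """Simulate Cannon's algorithm for c = a x b on a q x q grid, one element per processor."""
    q = len(a)
    # Initial skew: row i of a is shifted left by i, column j of b is shifted up by j.
    a_loc = [[a[i][(j + i) % q] for j in range(q)] for i in range(q)]
    b_loc = [[b[(i + j) % q][j] for j in range(q)] for i in range(q)]
    c = [[0] * q for _ in range(q)]
    for _ in range(q):
        # Every processor multiplies the pair of elements it currently holds.
        for i in range(q):
            for j in range(q):
                c[i][j] += a_loc[i][j] * b_loc[i][j]
        # Circular shift: a moves one step left, b moves one step up.
        a_loc = [[a_loc[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        b_loc = [[b_loc[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    return c


print(cannon_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

At step t, processor (i, j) holds a[i][k] and b[k][j] with k = (i + j + t) mod q, so after q steps each c[i][j] has accumulated every term of the inner product.
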
Journal Article•10.1142/S0129626497000206•
Communication Generation for Block-Cyclic Distributions

Arun Venkatachar1, J. Ramanujam1, Ashwath Thirumalai1•
Louisiana State University1
01 Jun 1997-Parallel Processing Letters
TL;DR: A novel approach for the generation of communication sets that exploits a pattern of send-receive index pairs is presented and an algorithm for code generation is presented.
Abstract: Data-parallel languages such as High Performance Fortran, Vienna Fortran and Fortran D include directives such as alignment and distribution that describe how data and computation are mapped onto the processors in a distributed-memory multiprocessor. A compiler for HPF that generates code for each processor has to compute the sequence of local memory addresses accessed by each processor and the sequence of send and receives for a given processor to access non-local data. In this paper, we present a novel approach for the generation of communication sets that exploits a pattern of send-receive index pairs. In addition, we present an algorithm for code generation. Experimental results demonstrate the viability of this technique.
Journal Article•10.1142/S012962649700005X•
Simple and Work-Efficient Parallel Algorithms for the Minimum Spanning Tree Problem

Christos D. Zaroliagis1•
Max Planck Society1
01 Mar 1997-Parallel Processing Letters
TL;DR: Two simple and work-efficient parallel algorithms for the minimum spanning tree problem are presented and both perform O(m log n) work.
Abstract: Two simple and work-efficient parallel algorithms for the minimum spanning tree problem are presented. Both algorithms perform O(m log n) work. The first algorithm runs in O(log^2 n) time on an EREW PRAM, while the second algorithm runs in O(log n) time on a COMMON CRCW PRAM.
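
For orientation, here is a sequential sketch of Borůvka's algorithm, a classic round-based minimum spanning tree method that parallelizes naturally. The abstract does not say which technique the paper's two algorithms use, so this is a swapped-in illustration and not the authors' method; the function name, the (weight, u, v) edge format, and the tie-breaking by edge index are assumptions.

```python
def boruvka_mst_weight(n, edges):
    """Total weight of a minimum spanning tree; edges is a list of (weight, u, v) on nodes 0..n-1."""
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, components = 0, n
    while components > 1:
        # Each component selects its cheapest outgoing edge (ties broken by edge index).
        cheapest = {}
        for idx, (w, u, v) in enumerate(edges):
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            for root in (ru, rv):
                if root not in cheapest or (w, idx) < cheapest[root][:2]:
                    cheapest[root] = (w, idx, u, v)
        if not cheapest:
            raise ValueError("graph is not connected")
        # Merge along the selected edges; the number of components at least halves per round.
        for w, idx, u, v in cheapest.values():
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                total += w
                components -= 1
    return total


edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 3), (5, 0, 2)]
print(boruvka_mst_weight(4, edges))  # 6
```
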
Journal Article•10.1142/S0129626497000048•
Fast Parallel Multiplication Using Redundant Quarternary Number System

Mallika De1, Bhabani P. Sinha•
Kalyani Government Engineering College1
01 Mar 1997-Parallel Processing Letters
TL;DR: The number of computational elements of an m-digit multiplier based on the proposed algorithm is O(m^2).
Abstract: In this paper, we propose a high-speed VLSI multiplication scheme using redundant radix-4 representation of numbers. For m-digit by m-digit redundant radix-4 integer multiplication, we first generate m partial products, each of (m+1) digits in redundant radix-4 (RR-4) number system. These partial products are then added up four at a time by means of redundant quarternary adders. Parallel addition of four (m+1)-digit redundant radix-4 numbers can be performed in a constant time independent of m without any carry propagation. With these adders, multiplication of two m-digit numbers in RR-4 number system can be performed in ⌈(1/2) log_2 m⌉ + 1 steps of such additions of four RR-4 numbers. The number of computational elements of an m-digit multiplier based on the proposed algorithm is O(m^2). Since the multiplier has a regular cellular array structure, it is suitable for VLSI implementation with O(m^2 log m) AT-value.
Journal Article•10.1142/S0129626497000413•
Compiling for Scalable Multiprocessors with Polaris

Yunheung Paek1, David Padua1•
University of Illinois at Urbana–Champaign1
01 Dec 1997-Parallel Processing Letters
TL;DR: This paper uses Polaris, a parallelizing Fortran restructurer developed at Illinois, as the infrastructure to implement algorithms and discusses the development and implementation of a few compiler techniques for some of these transformations.
Abstract: Due to the complexity of programming scalable multiprocessors with physically distributed memories, it is onerous to manually generate parallel code for these machines. As a consequence, there has been much research on the development of compiler techniques to simplify programming, to increase reliability, and to reduce development costs. For code generation, a compiler applies a number of transformations in areas such as data privatization, data copying and replication, synchronization, and data and work distribution. In this paper, we discuss our recent work on the development and implementation of a few compiler techniques for some of these transformations. We use Polaris, a parallelizing Fortran restructurer developed at Illinois, as the infrastructure to implement our algorithms. The paper includes experimental results obtained by applying our techniques to several benchmark codes.
Journal Article•10.1142/S0129626497000036•
A simple optimal parallel algorithm for reporting paths in a tree

Andrzej Lingas1, Anil Maheshwari2•
Lund University1, Carleton University2
01 Mar 1997-Parallel Processing Letters
TL;DR: This work provides a simple optimal parallel algorithm for preprocessing the input tree such that path queries can be answered efficiently; the path between a single pair of distinct nodes can then be reported in O(log n) time using O(L/log n) processors.
Abstract: We present optimal parallel solutions to reporting paths between pairs of nodes in an n-node tree. Our algorithms are deterministic and designed to run on an exclusive read exclusive write parallel random-access machine (EREW PRAM). In particular, we provide a simple optimal parallel algorithm for preprocessing the input tree such that the path queries can be answered efficiently. Our algorithm for preprocessing runs in O(log n) time using O(n/log n) processors. Using the preprocessing, we can report paths between k node pairs in O(log n + log k) time using O(k + (n + S)/log n) processors on an EREW PRAM, where S is the size of the output. In particular, we can report the path between a single pair of distinct nodes in O(log n) time using O(L/log n) processors, where L denotes the length of the path.
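
The preprocess-then-query structure described in the abstract can be sketched sequentially. This is not the paper's EREW PRAM algorithm: the functions `preprocess` and `report_path`, the adjacency-dictionary input, and the parent/depth tables are assumptions chosen to show the idea of answering path queries from precomputed tree information.

```python
from collections import deque


def preprocess(adj, root=0):
    """Compute parent and depth for every node of a tree given as an adjacency dict."""
    parent = {root: None}
    depth = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                depth[v] = depth[u] + 1
                queue.append(v)
    return parent, depth


def report_path(parent, depth, u, v):
    """Return the list of nodes on the tree path from u to v."""
    left, right = [u], [v]
    # Lift the deeper endpoint until both are at the same depth.
    while depth[u] > depth[v]:
        u = parent[u]
        left.append(u)
    while depth[v] > depth[u]:
        v = parent[v]
        right.append(v)
    # Climb together until the lowest common ancestor is reached.
    while u != v:
        u = parent[u]
        left.append(u)
        v = parent[v]
        right.append(v)
    # Both lists end at the common ancestor; keep it once.
    return left + right[-2::-1]


adj = {0: [1, 2], 1: [0, 3, 4], 2: [0], 3: [1], 4: [1]}
parent, depth = preprocess(adj)
print(report_path(parent, depth, 3, 2))  # [3, 1, 0, 2]
print(report_path(parent, depth, 3, 4))  # [3, 1, 4]
```
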
Journal Article•10.1142/S0129626497000231•
Deterministic Routing on the Array with Reconfigurable Optical Buses

Sanguthevar Rajasekaran1, Sartaj Sahni1•
University of Florida1
01 Sep 1997-Parallel Processing Letters
TL;DR: In this paper, efficient deterministic algorithms for various classes of routing problems on the array with reconfigurable optical buses (AROB) are presented.
Abstract: In this paper we present efficient deterministic algorithms for various classes of routing problems on the array with reconfigurable optical buses (AROB).
Journal Article•10.1142/S0129626497000255•
A Routing Strategy for Object-Oriented Applications in Massively Parallel Architectures

Maurelio Boari1, Antonio Corradi1, Cesare Stefanelli1, Letizia Leonardi•
University of Bologna1
01 Sep 1997-Parallel Processing Letters
TL;DR: This paper adopted a routing strategy designed to be effective in case of objects dynamically created/destroyed and capable of moving during the execution, and does not assume any knowledge of both object allocation and system topology configuration.
Abstract: Parallel object-oriented environments have a high degree of dynamicity and need specialised support to achieve efficiency of execution. Static strategies are not suitable for these environments: any prediction before execution can only roughly estimate the real behaviour. In object-oriented environments, the decision to create/destroy objects is usually taken at run-time and object allocation can change during the execution. The requirement of dynamicity should be considered in the design of every component of the support. The routing system, for instance, must ensure delivery even in case of object dynamic allocation/reallocation. The paper argues that routing algorithms for parallel object-oriented environments in massively parallel architectures should be both adaptive and efficient. We adopted a routing strategy designed to be effective in case of objects dynamically created/destroyed and capable of moving during the execution. Our adaptive strategy does not assume any knowledge of either object allocation or system topology configuration.
Journal Article•10.1142/S0129626497000322•
Gradual Design of a Causal Broadcast Protocol

Nadjib Badache, Aomar Maddi
01 Sep 1997-Parallel Processing Letters
TL;DR: This paper presents a gradual approach to designing a protocol to implement causal ordering in the particular case of a broadcast group and obtains a simple protocol that has low communication overhead.
Abstract: This paper presents a gradual approach to designing a protocol to implement causal ordering in the particular case of a broadcast group. Each message is received by all the processes of the group, including its sender. The protocol we obtain is simple and has low communication overhead.
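
Causal ordering in a broadcast group is commonly explained with vector timestamps, and a minimal sketch of that classic rule follows. It is an assumption that this resembles the paper's protocol; the abstract gives no details, so the class `Process`, its fields, and the message tuple layout are purely illustrative. A message from sender j is delivered only when it is the next one expected from j and everything it causally depends on has already been delivered.

```python
class Process:
    """One member of a broadcast group that delivers messages in causal order."""

    def __init__(self, pid, n):
        self.pid = pid
        self.sent = 0              # number of messages this process has broadcast
        self.delivered = [0] * n   # delivered[j]: messages from j delivered here
        self.pending = []          # received but not yet deliverable
        self.log = []              # payloads in delivery order

    def broadcast(self, payload):
        """Build a message to be sent to every group member, including the sender."""
        self.sent += 1
        stamp = list(self.delivered)
        stamp[self.pid] = self.sent
        return (self.pid, stamp, payload)

    def _deliverable(self, message):
        sender, stamp, _ = message
        next_from_sender = stamp[sender] == self.delivered[sender] + 1
        dependencies_met = all(
            stamp[k] <= self.delivered[k] for k in range(len(stamp)) if k != sender
        )
        return next_from_sender and dependencies_met

    def receive(self, message):
        """Buffer the message, then deliver everything that has become deliverable."""
        self.pending.append(message)
        progress = True
        while progress:
            progress = False
            for msg in list(self.pending):
                if self._deliverable(msg):
                    sender, _, payload = msg
                    self.delivered[sender] += 1
                    self.log.append(payload)
                    self.pending.remove(msg)
                    progress = True


p0, p1, p2 = Process(0, 3), Process(1, 3), Process(2, 3)
m1 = p0.broadcast("m1")
p1.receive(m1)
m2 = p1.broadcast("m2")   # m2 causally follows m1
p2.receive(m2)            # arrives early, so it is buffered
p2.receive(m1)            # unblocks both messages
print(p2.log)             # ['m1', 'm2']
```
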
Journal Article•10.1142/s0129626497000152•
Parallelizing Nested Loops with Approximations of Distance Vectors: A Survey

A. Darte, Frédéric Vivien1•
French Institute for Research in Computer Science and Automation1
01 Jun 1997-Parallel Processing Letters
TL;DR: This study identifies which algorithm is the most suitable for a given representation of distance vectors for nested loops parallelization.
Journal Article•10.1142/S0129626497000097•
Efficient Byzantine Agreement in Networks with Random Faults

Adam Malinowski1•
University of Warsaw1
01 Mar 1997-Parallel Processing Letters
TL;DR: This paper considers the Byzantine Agreement problem under the assumption that nodes and links of a synchronous network fail independently with constant probabilities p 0 an arbitrary constant.
Abstract: We consider the Byzantine Agreement problem under the assumption that nodes and links of a synchronous network fail independently with constant probabilities p 0 an arbitrary constant.
Journal Article•10.1142/S0129626497000280•
Parallel Algorithms for Single-Layer Channel Routing

Ronald I. Greenberg1, Shih-Chuan Hung, Jau-Der Shih•
Loyola University Chicago1
01 Sep 1997-Parallel Processing Letters
TL;DR: Efficient parallel algorithms are provided for the minimum separation, offset range, and optimal offset problems for single-layer channel routing, and an even better time of O((lg lg N)^2) on the CRCW PRAM is obtained in the river routing context.
Abstract: We provide efficient parallel algorithms for the minimum separation, offset range, and optimal offset problems for single-layer channel routing. We consider all the variations of these problems that are known to have linear-time sequential solutions rather than limiting attention to the "river-routing" context, where single-sided connections are disallowed. For the minimum separation problem, we obtain O(lg N) time on a CREW PRAM or time on a (common) CRCW PRAM, both with optimal work (processor-time product) of O(N), where N is the number of terminals. For the offset range problem, we obtain the same time and processor bounds as long as only one side of the channel contains single-sided nets. For the optimal offset problem with single-sided nets on one side of the channel, we obtain time O(lg N lg lg N) on a CREW PRAM or time on a CRCW PRAM with O(N lg lg N) work. Not only does this improve on previous results for river routing, but we can obtain an even better time of O((lg lg N)^2) on the CRCW PRAM in the river routing context. In addition, wherever our results allow a channel boundary to contain single-sided nets, the results also apply when that boundary is ragged and N incorporates the number of bendpoints.
Journal Article•10.1142/S0129626497000383•
Combining Retiming and Scheduling Techniques for Loop Parallelization and Loop Tiling

Alain Darte1, Georges-André Silber1, Frédéric Vivien1•
École normale supérieure de Lyon1
01 Dec 1997-Parallel Processing Letters
TL;DR: This paper demonstrates how the structure of the reduced dependence graph can be taken into account for detecting more permutable loops and shows how the way it is handled can be useful for fine-grain loop parallelization as well.
Abstract: Tiling is a technique used for exploiting medium-grain parallelism in nested loops. It relies on a first step that detects sets of permutable nested loops. All algorithms developed so far consider the statements of the loop body as a single block, in other words, they are not able to take advantage of the structure of dependences between different statements. In this paper, we overcome this limitation by showing how the structure of the reduced dependence graph can be taken into account for detecting more permutable loops. Our method combines graph retiming techniques and graph scheduling techniques. It can be viewed as an extension of Wolf and Lam's algorithm to the case of loops with multiple statements. Loop independent dependences play a particular role in our study, and we show how the way we handle them can be useful for fine-grain loop parallelization as well.
Journal Article•10.1142/s0129626497000413•
Compiling for Scalable Multiprocessors with Polaris

Yunheung Paek1, D. Padua•
Seoul National University1
01 Dec 1997-Parallel Processing Letters
TL;DR: Polaris, a parallelizing Fortran restructurer developed at Illinois, is used as the infrastructure to implement the development and implementation of a few compiler techniques for some of these transformations.
Journal Article•10.1142/S0129626497000371•
Optimal and Near–Optimal Solutions for Hard Compilation Problems

Ulrich Kremer1•
Rutgers University1
01 Jan 1997-Parallel Processing Letters
TL;DR: The potential benefits of integer programming as a tool to deal with NP–complete compiler optimization formulations in compilers and programming environments are discussed.
Abstract: An optimizing compiler typically uses multiple program representations at different levels of program and performance abstractions in order to be able to perform transformations that – at least in the majority of cases – will lead to an overall improvement in program performance. The complexities of the program and performance abstractions used to formulate compiler optimization problems have to match the complexities of the high–level programming model and of the underlying target system. Scalable parallel systems typically have multi–level memory hierarchies and are able to exploit coarse–grain and fine–grain parallelism. Most likely, future systems will have even deeper memory hierarchies and more granularities of parallelism. As a result, future compiler optimizations will have to use more and more complex, multi–level computation and performance models in order to keep up with the complexities of their future target systems. Most of the optimization problems encountered in highly optimizing compilers are already NP–hard, and there is little hope that most newly encountered optimization formulations will not be at least NP–hard as well. To face this "complexity crisis", new methods are needed to evaluate the benefits of a compiler optimization formulation. A crucial step in this evaluation process is to compute the optimal solution of the formulation. Using ad–hoc methods to compute optimal solutions to NP–complete problems may be prohibitively expensive. Recent improvements in mixed integer and 0–1 integer programming suggest that this technology may provide the key to efficient, optimal and near–optimal solutions to NP–complete compiler optimization problems. In fact, early results indicate that integer programming formulations may be efficient enough to be included in not only evaluation prototypes, but in production programming environments or even production compilers.
This paper discusses the potential benefits of integer programming as a tool to deal with NP–complete compiler optimization formulations in compilers and programming environments.
Journal Article•10.1142/S0129626497000401•
On Tiling as a Loop Transformation

Jingling Xue1•
University of New England (Australia)1
01 Dec 1997-Parallel Processing Letters
TL;DR: The results of this paper are discussed in terms of their impact on dependence abstractions suitable for legality test and on tiling to optimise a certain given goal.
Abstract: This paper is a follow-up to Irigoin and Triolet's earlier work and our recent work on tiling. In this paper, tiling is discussed in terms of its effects on the dependences between tiles, the dependences within a tile and the required dependence test for legality. A necessary and sufficient condition is given for enforcing the data dependences of the program, while Irigoin and Triolet's atomic tile constraint is only sufficient. A condition is identified under which both Irigoin and Triolet's and our constraints are equivalent. The results of this paper are discussed in terms of their impact on dependence abstractions suitable for legality test and on tiling to optimise a certain given goal.
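
To make the notion of tiling as a loop transformation concrete, the sketch below enumerates a two-dimensional iteration space tile by tile and includes a simple sufficient legality check for rectangular tiles. Both function names and the rectangular-tile restriction are assumptions for illustration; the check (every dependence distance component is non-negative) is the well-known sufficient condition for rectangular tiling and is not the necessary and sufficient condition the paper derives.

```python
def tiled_iterations(n, m, tile_i, tile_j):
    """Yield every (i, j) of an n x m loop nest, visiting one rectangular tile at a time."""
    for ii in range(0, n, tile_i):          # loop over tile origins
        for jj in range(0, m, tile_j):
            for i in range(ii, min(ii + tile_i, n)):   # loop inside one tile
                for j in range(jj, min(jj + tile_j, m)):
                    yield i, j


def rectangular_tiling_is_safe(distance_vectors):
    """Sufficient (not necessary) check: all dependence distance components are non-negative."""
    return all(component >= 0 for vector in distance_vectors for component in vector)


print(list(tiled_iterations(2, 4, 2, 2)))
# [(0, 0), (0, 1), (1, 0), (1, 1), (0, 2), (0, 3), (1, 2), (1, 3)]
print(rectangular_tiling_is_safe([(1, 0), (0, 1)]))   # True
print(rectangular_tiling_is_safe([(1, -1)]))          # False
```

Tiling changes only the order in which iterations run, which is why the abstract frames legality in terms of the dependences between tiles and within a tile.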
