Top 292 papers published on the topic of Distributed memory in 2002

Showing papers on "Distributed memory" published in 2002

Patent•

Memory protection system and method for computer architecture for broadband networks

Masakazu Suzuoki¹, Takeshi Yamazaki¹•Institutions (1)

Sony Computer Entertainment¹

19 Mar 2002

179 citations

Patent•

Method and apparatus for optimizing performance in a multi-processing system

David J. Koenen¹•Institutions (1)

Hewlett-Packard¹

25 Jul 2002

TL;DR: In this article, a technique for improving performance in a multi-processor system by reducing access latency by correlating processor, node and memory allocation is presented, where a Process/Thread Scheduler is modified such that system mapping and node proximity tables may be referenced to help determine processor assignments for ready-to-run processes/threads.

Abstract: A technique for improving performance in a multi-processor system by reducing access latency by correlating processor, node and memory allocation. Specifically, a Process/Thread Scheduler is modified such that system mapping and node proximity tables may be referenced to help determine processor assignments for ready-to-run processes/threads. Processors are chosen to minimize access latency. Further, the Page Fault Handler is modified such that free memory pages are assigned to a process based partially on the proximity of the memory with respect to the processor requesting memory allocation.
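
The proximity-based choices described in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical two-node machine; the tables `NODE_OF_CPU` and `PROXIMITY` and the functions `pick_cpu` and `pick_page_node` are illustrative names, not taken from the patent, and the latency figures are made up. The patent says pages are assigned "based partially" on proximity, whereas this sketch uses proximity as the only criterion.

```python
# Hypothetical system map: which node each CPU belongs to.
NODE_OF_CPU = {0: 0, 1: 0, 2: 1, 3: 1}

# Hypothetical node proximity table: (cpu_node, memory_node) -> relative latency.
PROXIMITY = {(0, 0): 10, (0, 1): 25, (1, 0): 25, (1, 1): 10}


def pick_cpu(idle_cpus, memory_node):
    """Choose the idle CPU with the lowest latency to the process's memory node.

    Ties are broken by the lower CPU id.
    """
    return min(
        idle_cpus,
        key=lambda cpu: (PROXIMITY[(NODE_OF_CPU[cpu], memory_node)], cpu),
    )


def pick_page_node(cpu, free_pages):
    """Choose the node with free pages that is closest to the faulting CPU."""
    candidates = [node for node, count in free_pages.items() if count > 0]
    if not candidates:
        raise MemoryError("no free pages on any node")
    home = NODE_OF_CPU[cpu]
    return min(candidates, key=lambda node: (PROXIMITY[(home, node)], node))
```

For example, `pick_cpu([1, 2, 3], memory_node=1)` selects CPU 2, because it sits on the node that holds the process's memory, and `pick_page_node(0, {0: 0, 1: 5})` falls back to node 1 when the local node has no free pages.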

155 citations

Patent•

Memory adapted to provide dedicated and/or shared memory to multiple processors and method therefor

Eugene P. Matter¹, Ramkarthik Ganesan¹•Institutions (1)

Intel¹

13 Nov 2002

TL;DR: In this article, a portable communication device may have multiple processors and a memory, and some portions of the memory may only be accessible by one of the processors, while others are accessible by all the processors.

Abstract: Briefly, in accordance with one embodiment of the invention, a portable communication device may have multiple processors and a memory. Portions of the memory may only be accessible by one of the processors.

130 citations

Proceedings Article•10.1145/513918.513974•

Exploiting shared scratch pad memory space in embedded multiprocessor systems

Mahmut Kandemir¹, J. Ramanujam², Alok Choudhary³•Institutions (3)

Pennsylvania State University¹, Louisiana State University², Northwestern University³

10 Jun 2002

TL;DR: An optimization algorithm is proposed that targets the reduction of extra off-chip memory accesses caused by inter-processor communication by increasing the application-wide reuse of data that resides in the scratch-pad memories of processors.

Abstract: In this paper, we present a compiler strategy to optimize data accesses in regular array-intensive applications running on embedded multiprocessor environments. Specifically, we propose an optimization algorithm that targets the reduction of extra off-chip memory accesses caused by inter-processor communication. This is achieved by increasing the application-wide reuse of data that resides in the scratch-pad memories of processors. Our experimental results obtained on four array-intensive image processing applications indicate that exploiting inter-processor data sharing can reduce the energy-delay product by as much as 33.8% (and 24.3% on average) on a four-processor embedded system. The results also show that the proposed strategy is robust in the sense that it gives consistently good results over a wide range of architectural parameters.

99 citations

Book•

Fundamentals of Parallel Processing

L. E. Jordan, Gita Alaghband

26 Aug 2002

TL;DR: This book gives readers a fundamental understanding of parallel processing application and system development and provides them with the level of understanding they need to evaluate and select the products.

Abstract: From the Publisher: Rapid changes in the field of parallel processing make this book especially important for professionals who are faced daily with new products, and it provides them with the level of understanding they need to evaluate and select the products. It gives readers a fundamental understanding of parallel processing application and system development. Chapter topics include parallel machines and computations, potential for parallel computations, vector algorithms and architectures, MIMD computers and multiprocessors, distributed memory processors, interconnection networks, data dependence and parallelism, implementing synchronization and data sharing, parallel processor performance, temporal behavior of parallel programs, and parallel I/O. For computational scientists, software engineers, computer architects, and computer engineers.

88 citations

Journal Article•10.1002/CPE.701•

Deadlock detection in MPI programs

Glenn R. Luecke¹, Yan Zou¹, James Coyle¹, Jim Hoekstra¹, Marina Kraeva¹•Institutions (1)

Iowa State University¹

25 Aug 2002-Concurrency and Computation: Practice and Experience

TL;DR: The methods used in MPI‐CHECK 2.0 are presented to detect many situations where actual and potential deadlocks occur when using blocking and non‐blocking point‐to‐point routines as well as when using collective routines.

Abstract: The Message-Passing Interface (MPI) is commonly used to write parallel programs for distributed memory parallel computers. MPI-CHECK is a tool developed to aid in the debugging of MPI programs that are written in free or fixed format Fortran 90 and Fortran 77. This paper presents the methods used in MPI-CHECK 2.0 to detect many situations where actual and potential deadlocks occur when using blocking and non-blocking point-to-point routines as well as when using collective routines. Copyright © 2002 John Wiley & Sons, Ltd.
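
To make the notion of an actual deadlock concrete, here is a minimal sketch of one generic way to detect it: a cycle search in a wait-for graph. This is a textbook technique and is not claimed to be the method MPI-CHECK 2.0 uses; the function name `find_deadlock` and the input format (each blocked rank mapped to the one rank it waits on) are assumptions made for illustration.

```python
def find_deadlock(waits_for):
    """Return the ranks forming a cycle in the wait-for graph, or None.

    waits_for maps each blocked rank to the single rank it is waiting on,
    for example a rank stuck in a blocking receive from that peer.
    """
    for start in waits_for:
        seen = []
        rank = start
        # Follow the chain of waits until it ends or revisits a rank.
        while rank in waits_for and rank not in seen:
            seen.append(rank)
            rank = waits_for[rank]
        if rank in seen:
            # The chain looped back: everything from `rank` onward is the cycle.
            return seen[seen.index(rank):]
    return None
```

Two ranks that each post a blocking receive from the other before sending would appear as `{0: 1, 1: 0}`, and the function reports the cycle `[0, 1]`. A chain such as `{0: 1, 1: 2}` ends at an unblocked rank, so no deadlock is reported.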

74 citations

Journal Article•10.1016/S0167-8191(02)00103-5•

A software architecture for user transparent parallel image processing

Frank J. Seinstra¹, Dennis C. Koelma¹, Jan-Mark Geusebroek¹•Institutions (1)

University of Amsterdam¹

1 Aug 2002

TL;DR: Results indicate that the core of the architecture forms a powerful basis for automatic parallelization and optimization of a wide range of imaging software.

Abstract: This paper describes a software architecture that allows image processing researchers to develop parallel applications in a transparent manner. The architecture's main component is an extensive library of data parallel low level image operations capable of running on homogeneous distributed memory MIMD-style multicomputers. Since the library has an application programming interface identical to that of an existing sequential library, all parallelism is completely hidden from the user. The first part of the paper discusses implementation aspects of the parallel library, and shows how sequential as well as parallel operations are implemented on the basis of so-called parallelizable patterns. A library built in this manner is easily maintainable, as extensive code redundancy is avoided. The second part of the paper describes the application of performance models to ensure efficiency of execution on all target platforms. Experiments show that for a realistic application performance predictions are highly accurate. These results indicate that the core of the architecture forms a powerful basis for automatic parallelization and optimization of a wide range of imaging software.

74 citations

Patent•

An interface for integrating reconfigurable processors into a general purpose computing system

Daniel Poznanovic¹•Institutions (1)

University of Colorado Colorado Springs¹

5 Apr 2002

TL;DR: In this paper, the authors describe a method and system for an interface for integrating reconfigurable processors into a general purpose computing system, which includes a command processor, a command list memory, various registers, a direct memory access engine, a translation look-aside buffer, a dedicated section of common memory, and a dedicated memory.

Abstract: The present invention describes a method and system for an interface for integrating reconfigurable processors into a general purpose computing system. In particular, the system resides in a computer system containing standard instruction processors, as well as reconfigurable processors. The interface includes a command processor, a command list memory, various registers, a direct memory access engine, a translation look-aside buffer, a dedicated section of common memory, and a dedicated memory (Figure 2, 12, 40, 42, 43, 44, 45, 46, 47, 48, 52, 54, 60, 62(1), 62(2), 64). The interface is controlled via commands from a command list that is created during compilation of a user application, or various direct commands.

70 citations

Patent•

Shared memory multiprocessor memory model verification system and method

Sudheendra Hangal¹, Durgam Vahia¹, Juin-Yeu Lu¹, Chaiyasit Manovit¹•Institutions (1)

Sun Microsystems¹

30 Sep 2002

TL;DR: In this article, a system and method for verifying a memory consistency model for a shared memory multiprocessor computer system generates random instructions to run on the processors, saves the results of the running of the instructions, and analyzes the results to detect a memory subsystem error if the results fall outside of the space of possible outcomes consistent with the memory consistency model.

Abstract: A system and method for verifying a memory consistency model for a shared memory multiprocessor computer system generates random instructions to run on the processors, saves the results of the running of the instructions, and analyzes the results to detect a memory subsystem error if the results fall outside of the space of possible outcomes consistent with the memory consistency model. A precedence relationship of the results is determined by uniquely identifying results of a store location with each result distinct to allow association of a read result value to the instruction that created the read result value. A precedence graph with static, direct and derived edges identifies errors when a cycle is detected that indicates results that are inconsistent with memory consistency model rules.
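
The idea of checking whether a result falls outside the space of allowed outcomes can be shown on a tiny fixed test. The sketch below is not the patent's precedence-graph method: it brute-forces every interleaving of the classic store-buffering litmus test and assumes sequential consistency as the memory model, which the abstract does not specify. The names `THREADS`, `sc_outcomes`, and `violates_model` are hypothetical.

```python
from itertools import permutations

# Store-buffering litmus test; shared locations x and y start at 0.
# Thread 0: x = 1; r0 = y        Thread 1: y = 1; r1 = x
THREADS = [
    [("store", "x", 1), ("load", "y", "r0")],
    [("store", "y", 1), ("load", "x", "r1")],
]


def sc_outcomes(threads):
    """Enumerate every (r0, r1) result allowed by sequential consistency."""
    # One slot per instruction, labelled with its thread id. Each distinct
    # ordering of the slots is one interleaving that keeps program order.
    slots = [tid for tid, ops in enumerate(threads) for _ in ops]
    outcomes = set()
    for order in set(permutations(slots)):
        memory = {"x": 0, "y": 0}
        regs = {}
        next_op = [0] * len(threads)
        for tid in order:
            kind, addr, arg = threads[tid][next_op[tid]]
            next_op[tid] += 1
            if kind == "store":
                memory[addr] = arg
            else:
                regs[arg] = memory[addr]
        outcomes.add((regs["r0"], regs["r1"]))
    return outcomes


def violates_model(observed, threads=THREADS):
    """True if an observed result lies outside the allowed outcome space."""
    return observed not in sc_outcomes(threads)
```

Under this assumption the result `(0, 0)` would be flagged, since no interleaving lets both loads run before both stores. Brute-force enumeration grows factorially with test length, which is one reason a graph-based check like the one the abstract describes scales better.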

67 citations

Journal Article•10.1137/S1064827597325165•

A Parallel Implementation of the Nonsymmetric QR Algorithm for Distributed Memory Architectures

Greg Henry, David S. Watkins, Jack Dongarra

01 Jan 2002-SIAM Journal on Scientific Computing

TL;DR: An approach to parallelizing the QR algorithm that greatly improves scalability is discussed. A theoretical analysis indicates that the algorithm is ultimately not scalable, but the nonscalability does not become evident until the matrix dimension is enormous.

Abstract: One approach to solving the nonsymmetric eigenvalue problem in parallel is to parallelize the QR algorithm. Not long ago, this was widely considered to be a hopeless task. Recent efforts have led to significant advances, although the methods proposed up to now have suffered from scalability problems. This paper discusses an approach to parallelizing the QR algorithm that greatly improves scalability. A theoretical analysis indicates that the algorithm is ultimately not scalable, but the nonscalability does not become evident until the matrix dimension is enormous. Experiments on the Intel Paragon system, the IBM SP2 supercomputer, the SGI Origin 2000, and the Intel ASCI Option Red supercomputer are reported.

67 citations

Patent•

Signal processing resource for selective series processing of data in transit on communications paths in multi-processor arrangements

Winthrop W. Smith

18 Jul 2002

TL;DR: In this article, a multi-processor configuration with configurable signal processing logic is presented, where each processor is provided with a local memory which can be accessed by the local processor as well as by the other processors via the communications paths.

Abstract: A multi-processor arrangement having an interprocessor communication path between each of every possible pair of processors, in addition to I/O paths to and from the arrangement, having signal processing functions configurably embedded in series with the communication paths and/or the I/O paths. Each processor is provided with a local memory which can be accessed by the local processor as well as by the other processors via the communications paths. This allows for efficient data movement from one processor's local memory to another processor's local memory, such as commonly done during signal processing corner turning operations. The configurable signal processing logic may be configured to host one or more signal processing functions to allow data to be processed prior to its deposit into local memory.

Journal Article•10.1016/S0045-7930(01)00016-0•

Computational fluid dynamics applications on parallel-vector computers: computations of stirred vessel flows

C. Bartels¹, Michael Breuer¹, K. Wechsler¹, Franz Durst¹•Institutions (1)

University of Erlangen-Nuremberg¹

01 Jan 2002-Computers & Fluids

TL;DR: It is shown that computational fluid dynamics simulations provide reliable results and yield a detailed and accurate picture of the complex flow phenomena observed in stirred-tank reactors.

Proceedings Article•10.1145/513918.514070•

Automatic generation of embedded memory wrapper for multiprocessor SoC

Ferid Gharsalli, Samy Meftali, Frederic Rousseau, Ahmed Amine Jerraya

10 Jun 2002

TL;DR: This paper presents a new methodology for embedded memory design in the case of application-specific multiprocessor system-on-chip, and also gives a generic architecture to produce this memory wrapper.

Abstract: Embedded memory plays a critical role in improving the performance of systems-on-chip (SoC). In this paper, we present a new methodology for embedded memory design in the case of application-specific multiprocessor system-on-chip. This approach facilitates the integration of standard memory components. The concept of memory wrapper allows automatic adaptation of physical memory interfaces to a communication network that may have a different number of access ports. We also give a generic architecture to produce this memory wrapper. This approach has successfully been applied to a low-level image processing application.

Journal Article•10.1145/509705.509706•

Automatic data and computation decomposition on distributed memory parallel computers

PeiZong Lee¹, Zvi M. Kedem²•Institutions (2)

Academia Sinica¹, New York University²

01 Jan 2002-ACM Transactions on Programming Languages and Systems

TL;DR: In this paper, the authors propose a method for handling computation and data synergistically to minimize the overall execution time on distributed memory parallel computers (DMPCs), based on a number of novel techniques, also presented in this article.

Abstract: To exploit parallelism on shared memory parallel computers (SMPCs), it is natural to focus on decomposing the computation (mainly by distributing the iterations of the nested Do-loops). In contrast, on distributed memory parallel computers (DMPCs), the decomposition of computation and the distribution of data must both be handled in order to balance the computation load and to minimize the migration of data. We propose and validate experimentally a method for handling computations and data synergistically to minimize the overall execution time on DMPCs. The method is based on a number of novel techniques, also presented in this article. The core idea is to rank the "importance" of data arrays in a program and designate some of them as dominant. The intuition is that the dominant arrays are the ones whose migration would be the most expensive. Using the correspondence between iteration space mapping vectors and distributed dimensions of the dominant data array in each nested Do-loop allows us to design algorithms for determining data and computation decompositions at the same time. Based on data distribution, computation decomposition for each nested Do-loop is determined based on either the "owner computes" rule or the "owner stores" rule with respect to the dominant data array. If all temporal dependence relations across iteration partitions are regular, we use tiling to allow pipelining and the overlapping of computation and communication. However, in order to use tiling on DMPCs, we needed to extend the existing techniques for determining tiling vectors and tile sizes, as they were originally suited for SMPCs only. The overall method is illustrated on programs for the 2D heat equation, for the Gaussian elimination with pivoting, and for the 2D fast Fourier transform on a linear processor array and on a 2D processor grid.
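
The "owner computes" rule mentioned in this abstract can be illustrated with a small sketch. It is not the authors' algorithm; it only shows the rule itself on a one-dimensional stencil, assuming both arrays share the same block distribution. The function names `block_owner` and `owner_computes` and the loop `a[i] = b[i - 1] + b[i + 1]` are hypothetical choices for illustration.

```python
def block_owner(index, n, procs):
    """Owner of element `index` when n elements are block-distributed."""
    block = -(-n // procs)  # ceiling division: elements per processor
    return index // block


def owner_computes(n, procs):
    """Plan the loop `a[i] = b[i - 1] + b[i + 1]` for i in 1..n-2.

    Under the owner-computes rule, iteration i runs on the processor that
    owns a[i]. Returns, per processor, its iterations and the indices of b
    it must fetch from another processor.
    """
    plan = {p: {"iterations": [], "remote_reads": set()} for p in range(procs)}
    for i in range(1, n - 1):
        p = block_owner(i, n, procs)
        plan[p]["iterations"].append(i)
        for j in (i - 1, i + 1):
            if block_owner(j, n, procs) != p:
                plan[p]["remote_reads"].add(j)
    return plan
```

With `owner_computes(8, 2)`, processor 0 runs iterations 1 to 3 and needs only `b[4]` from its neighbour, while processor 1 runs iterations 4 to 6 and needs only `b[3]`. The small remote-read sets are the data migration that a good decomposition tries to keep low.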

Reference Entry•10.1002/0471214426.PAS0202•

Models of Memory

Jeroen G. W. Raaijmakers¹, Richard M. Shiffrin²•Institutions (2)

University of Amsterdam¹, Indiana University²

15 Jul 2002

TL;DR: In this article, the authors argue that such a generalization across tasks becomes possible because the models specify basic mechanisms rather than simple equations that describe the data in a specific task and illustrate this by a brief review of the developments since the 1950s, followed by a discussion of the major theoretical frameworks that have been developed over the past 25 years.

Abstract: Over the past 50 years, models for human memory have developed from simple data descriptions for specific tasks to general frameworks that can and have been generalized to most of the major paradigms that are used in memory research. We argue that such a generalization across tasks becomes possible because the models specify basic mechanisms rather than simple equations that describe the data in a specific task. We illustrate this by a brief review of the developments since the 1950s, followed by a discussion of the major theoretical frameworks that have been developed over the past 25 years. The early models such as Estes’ Stimulus Sampling Theory focused on learning, but in the 1960s the emphasis gradually shifted to memory and especially the distinction between short-term and long-term memory. In the 1980s a number of global models were developed that dealt with data from a variety of memory tasks. Although these global memory models have been quite successful there remained some problems, most notably the explanation for the lack of list-strength effects in recognition. Recent developments show that models based on a Bayesian or rational approach (ACT-R, REM) may provide a unified framework for explicit as well as implicit memory.

Keywords: associative networks; ACT; distributed memory models; global memory models; mathematical models; SAM; TODAM

Proceedings Article•10.1145/774789.774806•

Hardware support for real-time embedded multiprocessor system-on-a-chip memory management

Mohamed Shalan¹, Vincent J. Mooney¹•Institutions (1)

Georgia Institute of Technology¹

6 May 2002

TL;DR: This paper shows how to modify an existing Real-Time Operating System (RTOS) to support the new proposed System-on-a-Chip Dynamic Memory Management Unit (SoCDMMU), which presents a paradigm shift in the way designers look at on-chip dynamic memory allocation.

Abstract: The aggressive evolution of the semiconductor industry (smaller process geometries, higher densities, and greater chip complexity) has provided design engineers the means to create complex high-performance Systems-on-a-Chip (SoC) designs. Such SoC designs typically have more than one processor and huge memory, all on the same chip. Dealing with the global on-chip memory allocation/de-allocation in a dynamic yet deterministic way is an important issue for the upcoming billion transistor multiprocessor SoC designs. To achieve this, we propose a memory management hierarchy we call Two-Level Memory Management. To implement this memory management scheme, which presents a paradigm shift in the way designers look at on-chip dynamic memory allocation, we present a System-on-a-Chip Dynamic Memory Management Unit (SoCDMMU) for allocation of the global on-chip memory, which we refer to as Level Two memory management (Level One is the operating system management of memory allocated to a particular on-chip Processing Element). In this way, processing elements (heterogeneous or non-heterogeneous hardware or software) in an SoC can request and be granted portions of the global memory in a fast and deterministic time (for an example of a four processing element SoC, the dynamic memory allocation of the global on-chip memory takes sixteen cycles per allocation/deallocation in the worst case). In this paper, we show how to modify an existing Real-Time Operating System (RTOS) to support the new proposed SoCDMMU. Our example shows that a multiprocessor SoC that utilizes the SoCDMMU has a 440% overall speedup of the application transition time over fully shared memory that does not utilize the SoCDMMU.

Journal Article•10.1137/S1064827501386237•

Parallel Algebraic Multigrid Methods on Distributed Memory Computers

Gundolf Haase, Michael Kuhn, Stefan Reitzinger

01 Feb 2002-SIAM Journal on Scientific Computing

TL;DR: A general parallel algebraic multigrid algorithm for finite element discretizations based on domain decomposition ideas which is well suited for distributed memory computers is proposed and results show the high efficiency of the approach.

Abstract: Algebraic multigrid methods are well suited as preconditioners for iterative solvers. We consider linear systems of equations which are sparse and symmetric positive definite and which stem from a finite element discretization of a second order self-adjoint elliptic partial differential equation or a system of them. Since preconditioners based on algebraic multigrid are very efficient, additional speedup can only be achieved by parallelization. In this paper, we propose a general parallel algebraic multigrid algorithm for finite element discretizations based on domain decomposition ideas which is well suited for distributed memory computers. This paper pays special attention to the coarsening strategy which has to be adapted in the parallel case. Moreover, a general framework of data distribution gives rise to a construction scheme for the prolongation operators. Results of numerical studies on parallel computers with distributed memory are presented which show the high efficiency of the approach.

Journal Article•10.1006/JPDC.2001.1815•

Integrating Loop and Data Transformations for Global Optimization

Michael O'Boyle¹, P.M.W. Knijnenburg²•Institutions (2)

University of Edinburgh¹, Leiden University²

01 Apr 2002-Journal of Parallel and Distributed Computing

TL;DR: A new technique to allow the static application of global data transformations, such as partitioning, to reshaped arrays is presented, eliminating the need for expensive temporary copies and hence eliminating any communication and synchronization.

Patent•

Compiler for multiple processor and distributed memory architectures

Clifford Liem, Francois Breant, Alex Wu

25 Jan 2002

TL;DR: In this paper, a compiler for multiple processor and distributed memory architectures is described, which uses a high-level language to represent a task-level network of behaviors that describes an embedded system.

Abstract: A compiler for multiple processor and distributed memory architectures is described. The compiler uses a high-level language to represent a task-level network of behaviors that describes an embedded system. The compiler maps a plurality of tasks and data onto a multiple processor, distributed memory hardware architecture. The mapping includes describing a task-level network of behaviors, each of the task-level network of behaviors being related through control and data flow. The mapping further includes predicting a schedule of tasks for the task-level network of behaviors and allocating the plurality of tasks and data to at least one of the multiple processors and to at least one of distributed memory, respectively, in response to the predicted schedule of tasks.

NOBLE : A Non-Blocking Inter-Process Communication Library

Håkan Sundell, Philippas Tsigas

1 Jan 2002

TL;DR: This paper introduces library support for multi-process non-blocking synchronization called NOBLE, which provides an inter-process communication interface that allows the user to transparently select the synchronisation method best suited to the current application.

Abstract: Many applications on shared memory multi-processor machines can benefit from the exploitation of parallelism that non-blocking synchronization offers. In this paper, we introduce library support for multi-process non-blocking synchronization called NOBLE. NOBLE provides an inter-process communication interface that allows the user to transparently select the synchronisation method best suited to the current application. The selection can take place even at run-time. The library provides a collection of the most commonly used data types and protocols in a form that allows them to be used by non-experts. We describe the functionality and the implementation of the library functions and illustrate the library programming style with example programs. The examples show that using the library can considerably reduce the runtime on distributed memory machines.

Patent•

System and method for implementing shared memory regions in distributed shared memory systems

Chia Y. Wu¹, John D. Acton²•Institutions (2)

Sun Microsystems¹, Oracle Corporation²

13 Dec 2002

TL;DR: In this paper, a distributed shared memory system may involve implementing several different shared memory regions in each distributed node, and each node may reflect write access requests targeting those regions to one or more other nodes, depending on which shared region is targeted (e.g., requests targeting one region may be reflected to a single other node while requests targeting other regions may be reflected to more than one other node).

Abstract: Various embodiments of systems and methods for implementing shared memory regions in a distributed shared memory system may involve implementing several different shared memory regions in each distributed shared memory node. Each node may reflect write access requests targeting those shared memory regions to one or more other nodes, depending on which shared region is targeted (e.g., requests targeting one region may be reflected to a single other node while requests targeting other regions may be reflected to more than one other node). A node's completion of the requested write access locally may be dependent on the completion of the write access in the other nodes, depending on which shared memory region is targeted.
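
The per-region write reflection described here can be sketched as follows. This is a simplified single-process model, not the patented design: the class `Node`, its `reflect_to` map, and the rule that the local write always waits for every peer are assumptions. The abstract says local completion "may be" dependent on remote completion depending on the region, whereas this sketch applies that rule to every region.

```python
class Node:
    """A distributed shared memory node with per-region reflection targets."""

    def __init__(self, name):
        self.name = name
        self.memory = {}      # (region, offset) -> value
        self.reflect_to = {}  # region -> list of peer nodes to mirror writes to

    def apply(self, region, offset, value):
        """Perform the write in this node's local copy of the region."""
        self.memory[(region, offset)] = value
        return True

    def write(self, region, offset, value):
        """Reflect the write to the region's peers, then complete it locally.

        The local write completes only if every peer reports completion.
        """
        peers = self.reflect_to.get(region, [])
        acks = [peer.apply(region, offset, value) for peer in peers]
        if all(acks):
            return self.apply(region, offset, value)
        return False
```

For instance, with nodes `a`, `b`, and `c` and `a.reflect_to = {"r1": [b], "r2": [b, c]}`, a write to region `"r1"` reaches only `b`, while a write to `"r2"` reaches both `b` and `c`.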

Journal Article•10.1016/S0169-2607(01)00121-3•

A parallel Monte Carlo code for planar and SPECT imaging: implementation, verification and applications in 131I SPECT

Yuni K. Dewaraja¹, Michael Ljungberg², Amitava Majumdar³, Abhijit Bose¹, Kenneth F. Koral¹•Institutions (3)

University of Michigan¹, Lund University², University of California, San Diego³

01 Feb 2002-Computer Methods and Programs in Biomedicine

TL;DR: The implementation of the SIMIND Monte Carlo code on an IBM SP2 distributed memory parallel computer uses the Message Passing Interface (MPI) library for interprocessor communication and the Scalable Parallel Random Number Generator (SPRNG) to generate uncorrelated random number streams.

Patent•

Method and system for efficient emulation of multiprocessor address translation on a multiprocessor host

Erik R. Altman¹, Ravi Nair¹, John Kevin Patrick O'Brien¹, Kathryn M. O'Brien¹, Oden Peter Howland¹, Prener Daniel Arthur¹, Sumeda Wasudeo Sathaye¹•Institutions (1)

IBM¹

17 Sep 2002

TL;DR: In this paper, a method is presented for mapping the memory addressing of a multiprocessing system when it is emulated using the virtual memory addressing of another multiprocessing system.

Abstract: A method (and structure) of mapping a memory addressing of a multiprocessing system when it is emulated using a virtual memory addressing of another multiprocessing system includes accessing a local lookaside table (LLT) on a target processor with a target virtual memory address. Whether there is a “miss” in the LLT is determined and, with the miss determined in the LLT, a lock for a global page table is obtained.
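
The lookup path in this abstract (check a local lookaside table first, take a lock on the global page table only on a miss) can be sketched as below. The abstract stops at obtaining the lock, so everything after that point, including refilling the LLT from the global table, is an assumption. `GLOBAL_PAGE_TABLE`, `GLOBAL_LOCK`, and `EmulatedProcessor` are hypothetical names, and the page numbers are made up.

```python
import threading

# Hypothetical shared table: target virtual page -> host page.
GLOBAL_PAGE_TABLE = {0x10: 0x7A, 0x11: 0x7B}
GLOBAL_LOCK = threading.Lock()


class EmulatedProcessor:
    """One emulated target processor with its own local lookaside table."""

    def __init__(self):
        self.llt = {}    # private to this processor, so hits need no lock
        self.misses = 0

    def translate(self, target_vpage):
        """Translate via the LLT; on a miss, read the global table under a lock."""
        host_page = self.llt.get(target_vpage)
        if host_page is not None:
            return host_page
        self.misses += 1
        with GLOBAL_LOCK:
            # Assumed refill step; raises KeyError for an unmapped page.
            host_page = GLOBAL_PAGE_TABLE[target_vpage]
        self.llt[target_vpage] = host_page
        return host_page
```

The first `translate(0x10)` on a fresh processor misses and takes the lock; repeating the call hits the LLT and returns without touching the shared table, which is the point of keeping a per-processor table.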

Patent•

Method and system to identify a memory corruption source within a multiprocessor system

Christopher Harry Austen¹, Van Hoa Lee¹, Milton Devon Miller II¹, Douglas W. Oliver¹•Institutions (1)

IBM¹

27 Feb 2002

TL;DR: In this article, a method and system for identifying a source of corrupt data in a memory in a multiprocessor computer system is presented, where the corrupt data and its address are identified.

Abstract: A method and system for identifying a source of corrupt data in a memory in a multiprocessor computer system. When a computer program stores corrupt data causing a program failure or a system crash, the corrupt data and its address are identified. The multiprocessor computer system is shut down, and the corrupt data is cleared from the memory. Before fully re-booting the multiprocessor computer system, a processor is selected from the multiprocessor computer system to load and run monitor code designed to monitor the location where the corrupt data was stored. The program that previously stored the corrupt data is restarted, and the selected processor detects any re-storage of the corrupt data in the same memory address. All processors in the computer system are then immediately suspended. The registers of all processors suspected of storing corrupt data are inspected to determine the source of the corrupt data.

Proceedings Article•10.1109/ICME.2002.1035522•

Multi-level memory prefetching for media and stream processing

Jason E. Fritts¹•Institutions (1)

Washington University in St. Louis¹

7 Nov 2002

TL;DR: Results show that combining prefetching at the L1 and DRAM memory levels provides the most effective prefetching with minimal extra bandwidth, enabling more efficient memory performance for media and stream processing.

Abstract: This paper presents a multi-level memory prefetch hierarchy for media and stream processing applications. Two major bottlenecks in the performance of multimedia and network applications are long memory latencies and limited off-chip processor bandwidth. Aggressive prefetching can be used to mitigate the memory latency problem, but overly aggressive prefetching may overload the limited external processor bandwidth. To accommodate both problems, we propose multi-level memory prefetching. The multi-level organization enables conservative prefetching on-chip and more aggressive prefetching off-chip. The combination provides aggressive prefetching while minimally impacting off-chip bandwidth, enabling more efficient memory performance for media and stream processing. This paper presents preliminary results for multi-level memory prefetching, which show that combining prefetching at the L1 and DRAM memory levels provides the most effective prefetching with minimal extra bandwidth.

Proceedings Article•10.1145/545056.545062•

A multi-agent platform for a corporate semantic web

Fabien Gandon¹, Laurent Berthelot¹, Rose Dieng-Kuntz¹•Institutions (1)

French Institute for Research in Computer Science and Automation¹

15 Jul 2002

TL;DR: The technical choices and design of a multi-agent software architecture for managing a corporate memory in the form of a corporate semantic web are described, and an approach to handling a distributed memory and distributed queries is presented.

Abstract: We describe the technical choices and the design of a multi-agents software architecture to manage a corporate memory in the form of a corporate semantic web. We then present our approach to tackle a distributed memory and distributed queries.

Patent•

Embedded symmetric multiprocessor system with arbitration control of access to shared resources

Steven R. Jahnke¹•Institutions (1)

Texas Instruments¹

27 Sep 2002

TL;DR: A single-chip embedded symmetric multiprocessor (ESMP) is described in which shared memory access arbitration logic can supply data from separate, simultaneously accessible memory banks or arbitrate among central processing units for access.

Abstract: A single-chip embedded symmetric multiprocessor (ESMP) having a parallel multiprocessing architecture composed of identical processors includes a single program memory. Program access arbitration logic supplies an instruction to a single requesting central processing unit at a time. Shared memory access arbitration logic can supply data from separate simultaneously accessible memory banks or arbitrate among central processing units for access. The system may simulate an atomic read/modify/write instruction: following a read access to one of a predetermined set of addresses in the shared memory, access to that address by another central processing unit is prohibited for a predetermined number of memory cycles.

Proceedings Article•10.1109/HPCSA.2002.1019151•

Parallel Gaussian elimination using OpenMP and MPI

S.F. McGinn¹, R.E. Shaw¹•Institutions (1)

University of New Brunswick¹

16 Jun 2002

TL;DR: A parallel algorithm for Gaussian elimination is presented for both a shared memory environment using OpenMP and a distributed memory environment using MPI.

Abstract: In this paper, we present a parallel algorithm for Gaussian elimination in both a shared memory environment using OpenMP and a distributed memory environment using MPI. Parallel LU and Gaussian algorithms for linear systems are studied extensively, and the results of examining various load balancing schemes on both platforms are presented. The results show an improvement in many cases over the default implementation.

Journal Article•10.1016/S1571-0661(05)80390-1•

A Distributed Algorithm for Strong Bisimulation Reduction of State Spaces

Stefan Blom, Simona Orzan

01 Oct 2002 - Electronic Notes in Theoretical Computer Science

TL;DR: This work designed and implemented a bisimulation reduction algorithm for distributed memory settings using message passing communication, and shows that the algorithm scales up with the number of workers.

Patent•

Memory manager for a common memory

Barry J. Oldfield¹, Robert A. Rust¹•Institutions (1)

Hewlett-Packard¹

7 Mar 2002

TL;DR: A hardware-based memory management technology manages memory access requests to a common memory shared by multiple requesting entities, prioritizing and arbitrating such requests and minimizing their latency.

Abstract: The memory management technology described herein controls access to, and monitors the availability of, common memory resources. In particular, this hardware-based memory-management technology manages memory access requests to a common memory shared by multiple requesting entities. This includes prioritizing and arbitrating such requests. It further includes minimizing latency of such requests. This abstract itself is not intended to limit the scope of this patent. The scope of the present invention is pointed out in the appending claims.
