Top 5 papers published on the topic of Distributed memory in 2024

Showing papers on "Distributed memory" published in 2024

Journal Article•10.1016/j.jpdc.2024.104944•

PPB-MCTS: A Novel Distributed-Memory Parallel Partial-Backpropagation Monte Carlo Tree Search Algorithm

Yashar Naderzadeh, Daniel Grosu, Ratna Babu Chinnam

01 Jun 2024-Journal of Parallel and Distributed Computing

TL;DR: PPB-MCTS is a novel distributed-memory parallel MCTS algorithm designed to significantly reduce communication overhead and maintain or improve performance in combinatorial optimization problems.

Abstract: Monte-Carlo Tree Search (MCTS) is an adaptive and heuristic tree-search algorithm designed to uncover sub-optimal actions at each decision-making point. This method progressively constructs a search tree by gathering samples throughout its execution. Predominantly applied within the realm of gaming, MCTS has exhibited exceptional achievements. Additionally, it has displayed promising outcomes when employed to solve NP-hard combinatorial optimization problems. MCTS has been adapted for distributed-memory parallel platforms. The primary challenges associated with distributed-memory parallel MCTS are the substantial communication overhead and the necessity to balance the computational load among various processes. In this work, we introduce a novel distributed-memory parallel MCTS algorithm with partial backpropagations, referred to as Parallel Partial-Backpropagation MCTS (PPB-MCTS). Our design approach aims to significantly reduce the communication overhead while maintaining, or even slightly improving, the performance in the context of combinatorial optimization problems. To address the communication overhead challenge, we propose a strategy involving transmitting an additional backpropagation message. This strategy avoids attaching an information table to the communication messages exchanged by the processes, thus reducing the communication overhead. Furthermore, this approach contributes to enhancing the decision-making accuracy during the selection phase. The load balancing issue is also effectively addressed by implementing a shared transposition table among the parallel processes. Furthermore, we introduce two primary methods for managing duplicate states within distributed-memory parallel MCTS, drawing upon techniques utilized in addressing duplicate states within sequential MCTS. Duplicate states can transform the conventional search tree into a Directed Acyclic Graph (DAG). 
To evaluate the performance of our proposed parallel algorithm, we conduct an extensive series of experiments on solving instances of the Job-Shop Scheduling Problem (JSSP) and the Weighted Set-Cover Problem (WSCP). These problems are recognized for their complexity and classified as NP-hard combinatorial optimization problems with considerable relevance within industrial applications. The experiments are performed on a cluster of computers with many cores. The empirical results highlight the enhanced scalability of our algorithm compared to that of the existing distributed-memory parallel MCTS algorithms. As the number of processes increases, our algorithm demonstrates increased rollout efficiency while maintaining an improved load balance across processes.
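
The abstract notes that duplicate states turn the search tree into a DAG and that a transposition table is used to manage them. The following is a minimal single-process sketch of that idea only, not the PPB-MCTS algorithm itself: it has no message passing and no partial backpropagation. All names (`Node`, `TranspositionTable`, `expand`, `uct_score`, `backpropagate`) and the toy state encoding (a frozenset of chosen items) are assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    state: frozenset                      # hypothetical state: set of chosen items
    visits: int = 0
    total_reward: float = 0.0
    children: dict = field(default_factory=dict)  # action -> Node


class TranspositionTable:
    """Maps each distinct state to exactly one node."""

    def __init__(self):
        self._nodes = {}

    def get_or_create(self, state):
        node = self._nodes.get(state)
        if node is None:
            node = Node(state)
            self._nodes[state] = node
        return node

    def __len__(self):
        return len(self._nodes)


def expand(table, node, actions):
    """Add one child per unused action, reusing nodes for known states."""
    for action in actions:
        if action not in node.state:
            node.children[action] = table.get_or_create(node.state | {action})


def uct_score(parent, child, c=math.sqrt(2)):
    """Standard UCT value; unvisited children are tried first."""
    if child.visits == 0:
        return math.inf
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent.visits) / child.visits)
    return exploit + explore


def backpropagate(path, reward):
    """Update statistics along the path that was actually traversed."""
    for node in path:
        node.visits += 1
        node.total_reward += reward


table = TranspositionTable()
root = table.get_or_create(frozenset())
expand(table, root, "ab")
for child in root.children.values():
    expand(table, child, "ab")

# Choosing a then b reaches the same state as b then a, so both
# parents point at one shared node and the tree becomes a DAG.
via_a = root.children["a"].children["b"]
via_b = root.children["b"].children["a"]
backpropagate([root, root.children["a"], via_a], reward=1.0)
```

Because both paths share one node, statistics gathered through one parent are visible when the other parent evaluates the same child during selection. In a distributed setting the table would be partitioned across processes, which this sketch does not attempt to model.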

Journal Article•10.1109/scm62608.2024.10554130•

Improving the MPI Remote Memory Access Model for Distributed-memory Systems by Implementing One-sided Broadcast

Mohamed Abuelsoud, Alexey A. Paznikov

22 May 2024

TL;DR: Implementing one-sided broadcast collectives in MPI RMA significantly improves performance compared to traditional methods.

Abstract: Currently, processing large volumes of expanding data efficiently and consistently is a significant challenge. Traditional distributed-memory high-performance computers (HPC) based on the message-passing model struggle with inherent synchronization difficulties, limiting their ability to keep pace. Remote Memory Access (RMA, also known as one-sided MPI communications) allows a process to directly read from or write to the memory of another process, bypassing the need for message exchange. Unfortunately, there is no collective operation interface in the current MPI RMA standard. However, RMA has the potential to reduce synchronization costs by enabling concurrent access to shared data structures distributed among the memories of MPI processes. Existing one-sided MPI standards offer only a linear interface, which hampers parallelization and is far from efficient. To bridge this gap, we propose an algorithm design for efficient collective (parallelizable) operations in the RMA paradigm. Our study primarily examines the benefits of collective operations using the broadcast algorithm as an example. Our implementations surpass traditional methods, demonstrating the promising potential of this technique, as further performance tests indicate.
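
To make the contrast between a linear interface and a parallelizable collective concrete, here is a small simulation in plain Python. It does not use MPI: each list slot stands in for one process's RMA window, and an assignment stands in for a one-sided put. The abstract does not say which broadcast schedule the authors use, so the binomial tree below is an assumption chosen as a common textbook scheme, and the function names are hypothetical.

```python
def linear_broadcast(windows, root, value):
    """Root writes into every other window in turn; returns the step count."""
    windows[root] = value
    steps = 0
    for rank in range(len(windows)):
        if rank != root:
            windows[rank] = value          # simulated one-sided put by the root
            steps += 1
    return steps


def binomial_broadcast(windows, root, value):
    """Every rank that already holds the value writes to one new rank per round.

    Ranks are renumbered relative to the root so that relative ranks
    0..have-1 hold the value at the start of each round.
    """
    n = len(windows)
    windows[root] = value
    rounds = 0
    have = 1
    while have < n:
        for rel in range(have):
            target_rel = rel + have
            if target_rel < n:
                src = (rel + root) % n
                dst = (target_rel + root) % n
                windows[dst] = windows[src]  # simulated put by a non-root rank
        have *= 2
        rounds += 1
    return rounds


linear_windows = [None] * 8
tree_windows = [None] * 8
linear_steps = linear_broadcast(linear_windows, root=3, value="x")
tree_rounds = binomial_broadcast(tree_windows, root=3, value="x")
```

With 8 simulated processes the linear version needs 7 sequential puts from the root, while the tree version finishes in 3 rounds because the puts within a round are independent and could run concurrently. A real implementation would also need RMA synchronization between rounds, which this simulation leaves out.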

Journal Article•10.1142/s2591728524500221•

Long-range hydroacoustic propagation modelling schemes on distributed memory parallel computers

Noriyuki Kushida, Ying-Tsong Lin

27 Dec 2024-Journal of theoretical and computational acoustics

Preprint•10.1145/3650200.3656632•

Distributed Ranges: A Model for Distributed Data Structures, Algorithms, and Views

Billy C. Brock, Robert Cohn, Suyash Bakshi, Tuomas Kärnä, Jeongnim Kim, Mateusz Nowak, Łukasz Ślusarczyk, Kacper Stefanski, Timothy G. Mattson

30 May 2024

TL;DR: Distributed ranges provide a model for distributed data structures, algorithms, and views, enabling high-level parallel programming with interoperability and performance.

Abstract: Data structures and algorithms are essential building blocks for programs, and distributed data structures, which automatically partition data across multiple memory locales, are essential to writing high-level parallel programs. While many projects have designed and implemented C++ distributed data structures and algorithms, there has not been widespread adoption of an interoperable model allowing algorithms and data structures from different libraries to work together. This paper introduces distributed ranges, which is a model for building generic data structures, views, and algorithms. A distributed range extends a C++ range, which is an iterable sequence of values, with a concept of segmentation, thus exposing how the distributed range is partitioned over multiple memory locales. Distributed data structures provide this distributed range interface, which allows them to be used with a collection of generic algorithms implemented using the distributed range interface. The modular nature of the model allows for the straightforward implementation of distributed views, which are lightweight objects that provide a lazily evaluated view of another range. Views can be composed together recursively and combined with algorithms to implement computational kernels using efficient, flexible, and high-level standard C++ primitives. We evaluate the distributed ranges model by implementing a set of standard concepts and views as well as two execution runtimes, a multi-node, MPI-based runtime and a single-process, multi-GPU runtime. We demonstrate that high-level algorithms implemented using generic, high-level distributed ranges can achieve performance competitive with highly-tuned, expert-written code.
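
The segmentation idea can be sketched in a few lines. The paper's model is defined for C++ ranges; the Python below is only a conceptual illustration under that caveat, and every name in it (`Segment`, `DistributedVector`, `TransformView`, `distributed_sum`) is hypothetical rather than taken from the authors' library. Memory locales are simulated by integer ranks inside one process.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    rank: int        # simulated memory locale that owns this piece
    data: Iterable   # the locally stored (or lazily computed) values


class DistributedVector:
    """A container that exposes how its values are partitioned."""

    def __init__(self, values, num_locales):
        values = list(values)
        chunk = -(-len(values) // num_locales)   # ceiling division
        self._segments = [
            Segment(r, values[r * chunk:(r + 1) * chunk])
            for r in range(num_locales)
        ]

    def segments(self):
        return iter(self._segments)


class TransformView:
    """A lazy view: applies fn per segment without copying or moving data."""

    def __init__(self, base, fn):
        self._base = base
        self._fn = fn

    def segments(self):
        for seg in self._base.segments():
            yield Segment(seg.rank, map(self._fn, seg.data))


def distributed_sum(rng):
    """Generic algorithm written only against the segments() interface."""
    partials = [sum(seg.data) for seg in rng.segments()]  # one partial per locale
    return sum(partials)


vec = DistributedVector(range(10), num_locales=3)
squares = TransformView(vec, lambda x: x * x)
shifted = TransformView(squares, lambda x: x + 1)   # views compose recursively

total = distributed_sum(vec)
total_squares = distributed_sum(squares)
total_shifted = distributed_sum(shifted)
```

Because `distributed_sum` depends only on `segments()`, it accepts the container and any stack of views alike, which is the interoperability property the abstract describes. In a real runtime each partial sum would be computed on the locale that owns the segment.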

Repository•10.48550/arxiv.2406.00158•

Distributed Ranges: A Model for Distributed Data Structures, Algorithms, and Views

Benjamin Brock, Robert Cohn, Tuomas Karna, Jeongnim Kim, Mateusz Nowak, Timothy G. Mattson

4 Jun 2024

Abstract: Data structures and algorithms are essential building blocks for programs, and distributed data structures, which automatically partition data across multiple memory locales, are essential to writing high-level parallel programs. While many projects have designed and implemented C++ distributed data structures and algorithms, there has not been widespread adoption of an interoperable model allowing algorithms and data structures from different libraries to work together. This paper introduces distributed ranges, which is a model for building generic data structures, views, and algorithms. A distributed range extends a C++ range, which is an iterable sequence of values, with a concept of segmentation, thus exposing how the distributed range is partitioned over multiple memory locales. Distributed data structures provide this distributed range interface, which allows them to be used with a collection of generic algorithms implemented using the distributed range interface. The modular nature of the model allows for the straightforward implementation of distributed views, which are lightweight objects that provide a lazily evaluated view of another range. Views can be composed together recursively and combined with algorithms to implement computational kernels using efficient, flexible, and high-level standard C++ primitives. We evaluate the distributed ranges model by implementing a set of standard concepts and views as well as two execution runtimes, a multi-node, MPI-based runtime and a single-process, multi-GPU runtime. We demonstrate that high-level algorithms implemented using generic, high-level distributed ranges can achieve performance competitive with highly-tuned, expert-written code.