Journal Article
DOI: 10.1016/J.JOCS.2013.05.007
Layout-aware scientific computing: A case study using the MILC code
TL;DR: The paper proposes a straightforward model that optimizes the layout of scientific applications by minimizing inter-node communication cost. The model takes into account the latency and bandwidth of the network and associates them with the dominant layout variables of the application.
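To make the idea of a latency- and bandwidth-based layout model concrete, here is a minimal sketch. It is not the paper's model: it assumes a generic latency-bandwidth cost (`alpha` seconds per message plus `beta` seconds per byte), a regular lattice split evenly over a process grid, and one forward and one backward face exchange per split dimension. The function names, parameters, and `bytes_per_site` value are all hypothetical.

```python
from itertools import product
from math import prod


def comm_cost(lattice, grid, alpha, beta, bytes_per_site):
    """Estimated per-node halo-exchange time for one nearest-neighbor sweep.

    Assumption: each split dimension costs two messages (forward and
    backward), each carrying one face of the local sublattice.
    """
    local = [n // p for n, p in zip(lattice, grid)]
    volume = prod(local)
    cost = 0.0
    for extent, parts in zip(local, grid):
        if parts == 1:
            continue  # dimension is not split, so no inter-node traffic
        face_bytes = (volume // extent) * bytes_per_site
        cost += 2 * (alpha + beta * face_bytes)
    return cost


def candidate_grids(lattice, nodes):
    """All process grids that use exactly `nodes` nodes and divide the lattice evenly."""
    divisors = [
        [p for p in range(1, nodes + 1) if nodes % p == 0 and n % p == 0]
        for n in lattice
    ]
    return [g for g in product(*divisors) if prod(g) == nodes]


def best_layout(lattice, nodes, alpha, beta, bytes_per_site):
    """Pick the candidate grid with the lowest estimated communication cost."""
    return min(
        candidate_grids(lattice, nodes),
        key=lambda g: comm_cost(lattice, g, alpha, beta, bytes_per_site),
    )
```

In this sketch, setting `beta = 0` leaves only the per-message latency, so the search favors splitting a single dimension; as `beta` grows, face sizes start to dominate and the preferred grid can change.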
About: This article is published in Journal of Computational Science. The article was published on 01 Nov 2013. The article focuses on the topics: Scalability & Bottleneck.
Citations
Towards scalable mathematics and scalable algorithms for extreme scale computing
TL;DR: This special issue focuses on selected papers from the Scalable Algorithms for Large Scale Systems Workshop (ScalA'11), held at Supercomputing 2011, as well as selected papers from some of the keynotes of the ScalA'10 workshop.
References
Thousand core chips: a technology perspective
Shekhar Borkar
- 04 Jun 2007
TL;DR: The many-core architecture, with hundreds to thousands of small cores, is presented as a way to deliver unprecedented compute performance in an affordable power envelope; fine-grain power management, memory bandwidth, on-die networks, and system resiliency are also discussed.
Topology-aware task mapping for reducing communication contention on large parallel machines
Tarun Kumar Agarwal, Aakriti Sharma, A. Laxmikant, Laxmikant V. Kale +3 more
- 25 Apr 2006
TL;DR: A process mapping strategy is demonstrated that minimizes the impact of topology by heuristically minimizing the total number of hop-bytes communicated and is implemented in an adaptive runtime system in Charm++ and adaptive MPI.
Designing High Performance and Scalable MPI Intra-node Communication Support for Clusters
Lei Chai, A. Hartono, Dhabaleswar K. Panda +2 more
- 25 Sep 2006
TL;DR: A new design for MPI intra-node communication is presented that aims to achieve both high performance and good scalability in a cluster environment. It uses the cache efficiently and requires no locking mechanisms to achieve optimal performance, even at large system sizes.
Optimizing task layout on the Blue Gene/L supercomputer
TL;DR: A heuristic map is implemented that attempts to sequentially map a domain and its communication neighbors either to the same BG/L node or to near-neighbor nodes on the BG/L torus, while keeping the number of domains mapped to a BG/L node constant.
Application-specific topology-aware mapping for three dimensional topologies
Abhinav Bhatele, Laxmikant V. Kale +1 more
- 14 Apr 2008
TL;DR: Preliminary work utilizing topology-aware mapping by the runtime using similar ideas for a molecular dynamics application, NAMD, and performance improvements resulting from it in the context of an n-dimensional k-point stencil program are described.