Quantifying locality effect in data access delay: memory logP

doi:10.1109/IPDPS.2003.1213137

Proceedings Article•10.1109/IPDPS.2003.1213137

Kirk W. Cameron, +1 more

- 22 Apr 2003

- pp 48

43

TL;DR: This work presents a simple and useful model of point-to-point memory communication that predicts and analyzes the latency of memory copy, pack, and unpack operations, and uses the model to isolate the contributions of hardware, middleware, and software to data transfers on Intel- and MIPS-based platforms.

Citations

Proceedings Article•10.1145/2462902.2462916

Modeling communication in cache-coherent SMP systems: a case-study with Xeon Phi

Sabela Ramos, +1 more

- 17 Jun 2013

TL;DR: An intuitive performance model for cache-coherent architectures is developed and used to develop several optimal and optimized algorithms for complex parallel data exchanges that beat the performance of the highly-tuned vendor-specific Intel OpenMP and MPI libraries.

90

Proceedings Article•10.1109/CLUSTR.2003.1253341

Improving the performance of MPI derived datatypes by optimizing memory-access cost

Byna, +3 more

- 01 Jan 2003

TL;DR: This paper presents performance results for a matrix-transpose example, demonstrating that the proposed implementation of derived datatypes significantly outperforms both manual packing by the user and the existing derived-datatype code in the MPI implementation (MPICH).

56

Journal Article•10.1007/S11227-009-0296-3

Performance analysis and optimization of MPI collective operations on multi-core clusters

Bibo Tu, +3 more

- 01 Apr 2012

- The Journal of Supercomputing

TL;DR: A new parallel computation model is proposed that uniformly abstracts the memory hierarchy of multi-core clusters at both vertical and horizontal levels, providing the theoretical underpinning for the optimal design of MPI collective operations.

45

Journal Article•10.1109/TC.2007.38

$\log_{\rm n}{\rm P}$ and $\log_{3}{\rm P}$: Accurate Analytical Models of Point-to-Point Communication in Distributed Systems

Kirk W. Cameron, +2 more

- 01 Mar 2007

- IEEE Transactions on Computers

TL;DR: This work presents a general software-parameterized model of point-to-point communication for use in performance prediction and evaluation, and illustrates the utility of the model in three ways: to derive a simplified, useful, more accurate model, and to express, compare, and contrast existing communication models.

44

Journal Article•10.1007/S11704-007-0016-1

Models of parallel computation: a survey and classification

Zhang Yunquan, +3 more

- 05 Jun 2007

- Frontiers of Computer Science

TL;DR: State-of-the-art research on parallel computational models is reviewed, and the various models developed during the past decades are classified into three generations according to their target architecture features, especially memory organization.

26

References

Journal Article•10.1145/79173.79181

A bridging model for parallel computation

Leslie G. Valiant

- 01 Aug 1990

- Communications of The ACM

TL;DR: The bulk-synchronous parallel (BSP) model is introduced as a candidate for this role, and results are given quantifying its efficiency both in implementing high-level language features and algorithms and in being implemented in hardware.

4.1K

Journal Article•10.1145/240455.240477

LogP: a practical model of parallel computation

David E. Culler, +7 more

- 01 Nov 1996

- Communications of The ACM

TL;DR: Enough to be generally useful and to keep the algorithm analysis tractable to produce a better program in practice.

344

Book Chapter•10.1007/3-540-48158-3_2

Reproducible Measurements of MPI Performance Characteristics

William Gropp, +1 more

- 26 Sep 1999

TL;DR: The mpptest suite of performance measurement programs, developed at Argonne National Laboratory, attempts to avoid common measurement mistakes and to obtain reproducible measures of MPI performance that can be useful to both MPI implementors and MPI application writers.

241

Journal Article•10.1109/12.869323

Memory hierarchy considerations for cost-effective cluster computing

Xing Du, +2 more

- 01 Sep 2000

- IEEE Transactions on Computers

TL;DR: This study shows that the depth of the memory hierarchy is the most sensitive factor affecting the execution time for many types of workloads, and presents quantitative recommendations for building cost-effective clusters for different workloads.

34

Proceedings Article•10.1109/IPDPS.2002.1016562

Exploiting transparent remote memory access for non-contiguous- and one-sided-communication

J. Worringen, +2 more

- 15 Apr 2002

TL;DR: This paper presents two of the most recent optimizations in SCI-MPICH, an MPICH variant for the SCI interconnect, which make use of the global shared memory provided by this interconnect: efficient communication with non-contiguous MPI datatypes and one-sided communication according to the MPI-2 standard.

30