Journal Article, DOI: 10.1145/128738.128740
Efficient trace-driven simulation methods for cache performance analysis
Wen-Hann Wang,Jean-Loup Baer +1 more
90
TL;DR: This work reduces program traces to the extent that exact performance can still be obtained from the reduced traces, and devises an algorithm that produces performance results for a variety of metrics for a large number of set-associative write-back caches in a single simulation run.
Abstract: We propose improvements to current trace-driven cache simulation methods to make them faster and more economical. We attack the large time and space demands of cache simulation in two ways. First, we reduce the program traces to the extent that exact performance can still be obtained from the reduced traces. Second, we devise an algorithm that can produce performance results for a variety of metrics (hit ratio, write-back counts, bus traffic) for a large number of set-associative write-back caches in just a single simulation run. The trace reduction and the efficient simulation techniques are extended to parallel multiprocessor cache simulations. Our simulation results show that our approach substantially reduces the disk space needed to store the program traces and can dramatically speed up cache simulations while still producing exact results.
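To illustrate what "a large number of caches in a single simulation run" can mean, here is a minimal sketch of classic LRU stack-distance simulation, which yields hit counts for every associativity from 1 to `max_ways` in one pass over a trace. It is not the algorithm of this paper: it assumes LRU replacement, a fixed number of sets and a fixed block size, and it ignores write-backs and bus traffic. The function name and parameters are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def lru_hits_by_associativity(trace, num_sets, block_size, max_ways):
    """One pass over `trace` (an iterable of byte addresses).

    Returns a dict mapping ways -> hit count for an LRU cache with
    `num_sets` sets and that many ways, for ways = 1..max_ways.
    Assumption: LRU replacement, so a w-way set always holds the
    top w entries of that set's LRU stack (the inclusion property).
    """
    stacks = defaultdict(list)              # per-set LRU stack, most recent first
    depth_counts = [0] * (max_ways + 1)     # depth_counts[d]: references found at depth d

    for addr in trace:
        block = addr // block_size
        stack = stacks[block % num_sets]
        if block in stack:
            depth = stack.index(block) + 1  # 1-based stack distance
            depth_counts[depth] += 1
            stack.remove(block)
        stack.insert(0, block)              # referenced block becomes most recent
        del stack[max_ways:]                # deeper entries cannot hit in any simulated cache

    hits = {}
    running = 0
    for ways in range(1, max_ways + 1):
        running += depth_counts[ways]       # a hit at depth d is a hit for every ways >= d
        hits[ways] = running
    return hits


if __name__ == "__main__":
    # Six references to blocks 0, 1, 0, 2, 1, 0 in a single-set cache.
    print(lru_hits_by_associativity([0, 16, 0, 32, 16, 0], 1, 16, 3))
    # prints {1: 0, 2: 1, 3: 3}
```

The single pass replaces three separate simulations here; the saving grows with the number of configurations covered.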
Citations
Trace-driven memory simulation: a survey
Richard Uhlig,Trevor Mudge +1 more
TL;DR: This article surveys and analyzes trace-driven memory simulation tools, discussing the strengths and weaknesses of different approaches and showing that no single method is best when all criteria, including accuracy, speed, memory, flexibility, portability, expense, and ease of use, are considered.
332
Dynamic tracking of page miss ratio curve for memory management
Pin Zhou,Vivek Pandey,Jagadeesan Sundaresan,Anand Raghuraman,Yuanyuan Zhou,Sanjeev Kumar +5 more
- 07 Oct 2004
TL;DR: The real system experiments on Linux with applications including Apache Web Server show that the MRC-directed memory allocation can speed up the applications' execution/response time by up to a factor of 5.86 and reduce the number of page faults by up to 63.1%.
269
A comparison of trace-sampling techniques for multi-megabyte caches
TL;DR: The paper compares the trace-sampling techniques of set sampling and time sampling using the multi-billion reference traces of A.A. Borg et al. (1990) and applies both techniques to multi-megabyte caches, where sampling is most valuable, to find that set sampling meets the 10% sampling goal, while time sampling does not.
Quantifying software performance, reliability and security
TL;DR: In this paper, an architecture-based unified hierarchical model for software performance, reliability, security and cache behavior prediction is proposed, which employs discrete time Markov chains (DTMCs) to model software systems and provides expressions for predicting the overall behavior of the system based on its architecture as well as the characteristics of individual components.
112
Performance analysis and its impact on design
Pradip Bose,Thomas M. Conte +1 more
TL;DR: This work focuses on architectural performance, typically measured in cycles per instruction, and covers some of the advances in dealing with modern problems in performance analysis.
109
References
Evaluation techniques for storage hierarchies
TL;DR: A new and efficient method of determining, in one pass of an address trace, performance measures for a large class of demand-paged, multilevel storage systems utilizing a variety of mapping schemes and replacement algorithms.
1.4K
Available instruction-level parallelism for superscalar and superpipelined machines
Norman P. Jouppi,David W. Wall +1 more
- 01 Apr 1989
TL;DR: A parameterizable code reorganization and simulation system was developed and used to measure instruction-level parallelism and the average degree of superpipelining metric is introduced, suggesting that this metric is already high for many machines.
A class of compatible cache consistency protocols and their support by the IEEE futurebus
P. Sweazey,Alan Jay Smith +1 more
- 01 May 1986
TL;DR: This paper defines a class of compatible consistency protocols supported by the current IEEE Futurebus design, referred to as the MOESI class of protocols, which has the property that any system component can select (dynamically) any action permitted by any protocol in the class, and be assured that consistency is maintained throughout the system.
•Book
Available instruction-level parallelism for superscalar and superpipelined machines
Norman P. Jouppi,David W. Wall +1 more
- 01 Mar 1995
TL;DR: A parameterizable code reorganization and simulation system was developed and used to measure instruction-level parallelism and the average degree of superpipelining metric is introduced, suggesting that this metric is already high for many machines.
285
A case for direct-mapped caches
TL;DR: Direct-mapped caches are defined, and it is shown that trends toward larger cache sizes and faster hit times favor their use.
Related Papers (5)
John L. Hennessy,David A. Patterson +1 more
- 01 Dec 1989
Wen-Hann Wang,Jean-Loup Baer +1 more
- 01 Apr 1990
[...]