Conference
Computing Frontiers
About: Computing Frontiers is an academic conference that publishes mainly in the areas of computer science and cache. Over its lifetime, the conference has published 834 papers, which have received 12365 citations.
Topics: Computer science, Cache, Scalability, Speedup, Multi-core processor
Papers
14 Apr 2004
TL;DR: The short Computer Architecture News note that coined the phrase "Memory Wall" is reviewed, including the motivation behind the note, the context in which it was written, and the controversy it sparked.
Abstract: This paper looks at the evolution of the "Memory Wall" problem over the past decade. It begins by reviewing the short Computer Architecture News note that coined the phrase, including the motivation behind the note, the context in which it was written, and the controversy it sparked. What has changed over the years? Are we hitting the Memory Wall? And if so, for what types of applications?
439 citations
3 May 2006
TL;DR: This work introduces a performance model for Cell and applies it to several key scientific computing kernels: dense matrix multiply, sparse matrix vector multiply, stencil computations, and 1D/2D FFTs, and proposes modest microarchitectural modifications that could significantly increase the efficiency of double-precision calculations.
Abstract: The slowing pace of commodity microprocessor performance improvements combined with ever-increasing chip power demands has become of utmost concern to computational scientists. As a result, the high performance computing community is examining alternative architectures that address the limitations of modern cache-based designs. In this work, we examine the potential of using the forthcoming STI Cell processor as a building block for future high-end computing systems. Our work contains several novel contributions. First, we introduce a performance model for Cell and apply it to several key scientific computing kernels: dense matrix multiply, sparse matrix vector multiply, stencil computations, and 1D/2D FFTs. The difficulty of programming Cell, which requires assembly level intrinsics for the best performance, makes this model useful as an initial step in algorithm design and evaluation. Next, we validate the accuracy of our model by comparing results against published hardware results, as well as our own implementations on the Cell full system simulator. Additionally, we compare Cell performance to benchmarks run on leading superscalar (AMD Opteron), VLIW (Intel Itanium2), and vector (Cray X1E) architectures. Our work also explores several different mappings of the kernels and demonstrates a simple and effective programming model for Cell's unique architecture. Finally, we propose modest microarchitectural modifications that could significantly increase the efficiency of double-precision calculations. Overall results demonstrate the tremendous potential of the Cell architecture for scientific computations in terms of both raw performance and power efficiency.
373 citations
3 May 2006
TL;DR: It is argued that the benefits of heterogeneous CMPs are bolstered by the usage of a dynamic assignment policy, i.e., a runtime mechanism which observes the behavior of the running threads and exploits thread migration between the cores.
Abstract: In a multi-programmed computing environment, threads of execution exhibit different runtime characteristics and hardware resource requirements. Not only do the behaviors of distinct threads differ, but each thread may also present diversity in its performance and resource usage over time. A heterogeneous chip multiprocessor (CMP) architecture consists of processor cores and caches of varying size and complexity. Prior work has shown that heterogeneous CMPs can meet the needs of a multi-programmed computing environment better than a homogeneous CMP system. In fact, the use of a combination of cores with different caches and instruction issue widths better accommodates threads with different computational requirements. A central issue in the design and use of heterogeneous systems is to determine an assignment of tasks to processors which better exploits the hardware resources in order to improve performance. In this paper we argue that the benefits of heterogeneous CMPs are bolstered by the usage of a dynamic assignment policy, i.e., a runtime mechanism which observes the behavior of the running threads and exploits thread migration between the cores. We validate our analysis by means of simulation. Specifically, our model assumes a combination of Alpha EV5 and Alpha EV6 processors and of integer and floating point programs from the SPEC2000 benchmark suite. We show that a dynamic assignment can outperform a static one by 20% to 40% on average and by as much as 80% in extreme cases, depending on the degree of multithreading simulated.
257 citations
14 May 2013
TL;DR: An experimental study of a novel computing system (algorithm plus platform) that carries out quantum annealing, a type of adiabatic quantum computation, to solve optimization problems is described, and the system is compared with three conventional software solvers on instances from three NP-hard problem domains.
Abstract: This paper describes an experimental study of a novel computing system (algorithm plus platform) that carries out quantum annealing, a type of adiabatic quantum computation, to solve optimization problems. We compare this system to three conventional software solvers, using instances from three NP-hard problem domains. We also describe experiments to learn how performance of the quantum annealing algorithm depends on input.
241 citations
20 May 2014
TL;DR: A general framework for integrating FPGAs into the cloud is proposed, and a prototype of the framework is implemented based on OpenStack, Linux-KVM and Xilinx FPGAs. The prototype enables isolation between multiple processes in multiple VMs, precise quantitative acceleration resource allocation, and priority-based workload scheduling.
Abstract: Cloud computing is becoming a major trend for delivering and accessing infrastructure on demand via the network. Meanwhile, the usage of FPGAs (Field Programmable Gate Arrays) for computation acceleration has made significant inroads into multiple application domains due to their ability to achieve high throughput and predictable latency, while providing programmability, low power consumption and time-to-value. Many types of workloads, e.g. databases, big data analytics, and high performance computing, can be and have been accelerated by FPGAs. As more and more workloads are being deployed in the cloud, it is appropriate to consider how to make FPGAs and their capabilities available in the cloud. However, such integration is non-trivial due to issues related to FPGA resource abstraction and sharing, compatibility with applications and accelerator logics, and security, among others. In this paper, a general framework for integrating FPGAs into the cloud is proposed and a prototype of the framework is implemented based on OpenStack, Linux-KVM and Xilinx FPGAs. The prototype enables isolation between multiple processes in multiple VMs, precise quantitative acceleration resource allocation, and priority-based workload scheduling. Experimental results demonstrate the effectiveness of this prototype, an acceptable overhead, and good scalability when hosting multiple VMs and processes.
228 citations
Performance Metrics
| Year | Papers |
|---|---|
| 2021 | 31 |
| 2020 | 39 |
| 2019 | 60 |
| 2018 | 59 |
| 2017 | 63 |
| 2016 | 66 |
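
The figures on this page can be combined into a few summary numbers. The following is a minimal sketch, not part of the original listing: it assumes the lifetime totals quoted in the About line (834 publications, 12365 citations) and the yearly counts in the table above, and all variable names are illustrative.

```python
# Yearly paper counts copied from the table above.
papers_by_year = {2021: 31, 2020: 39, 2019: 60, 2018: 59, 2017: 63, 2016: 66}

# Lifetime totals quoted in the "About" line of this page.
total_publications = 834
total_citations = 12365

# Papers published in the years covered by the table.
papers_2016_2021 = sum(papers_by_year.values())

# Fraction of the conference's lifetime output that falls in those years.
share_of_lifetime = papers_2016_2021 / total_publications

# Mean citations per publication over the conference's lifetime.
citations_per_paper = total_citations / total_publications

# Year with the most papers among those listed.
peak_year = max(papers_by_year, key=papers_by_year.get)

print(f"Papers 2016-2021: {papers_2016_2021}")
print(f"Share of lifetime output: {share_of_lifetime:.1%}")
print(f"Mean citations per publication: {citations_per_paper:.2f}")
print(f"Peak year in table: {peak_year}")
```

Under these assumptions, the six listed years account for 318 papers, and the lifetime mean is roughly 14.8 citations per publication. The mean is skewed by a handful of highly cited papers such as those listed above, so it should not be read as a typical per-paper figure.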