Computing Frontiers

Conference Tools

Papers published on a yearly basis

Papers

Proceedings Article•10.1145/977091.977115•

Reflections on the memory wall

Sally A. McKee¹•Institutions (1)

Cornell University¹

14 Apr 2004

TL;DR: The short Computer Architecture News note that coined the phrase "Memory Wall" is reviewed, including the motivation behind the note, the context in which it was written, and the controversy it sparked.

Abstract: This paper looks at the evolution of the "Memory Wall" problem over the past decade. It begins by reviewing the short Computer Architecture News note that coined the phrase, including the motivation behind the note, the context in which it was written, and the controversy it sparked. What has changed over the years? Are we hitting the Memory Wall? And if so, for what types of applications?

439 citations

Proceedings Article•10.1145/1128022.1128027•

The potential of the cell processor for scientific computing

Samuel Williams¹, John Shalf¹, Leonid Oliker¹, Shoaib Kamil¹, Parry Husbands¹, Katherine Yelick¹•Institutions (1)

Lawrence Berkeley National Laboratory¹

3 May 2006

TL;DR: This work introduces a performance model for Cell and applies it to several key scientific computing kernels: dense matrix multiply, sparse matrix vector multiply, stencil computations, and 1D/2D FFTs, and proposes modest microarchitectural modifications that could significantly increase the efficiency of double-precision calculations.

Abstract: The slowing pace of commodity microprocessor performance improvements combined with ever-increasing chip power demands has become of utmost concern to computational scientists. As a result, the high performance computing community is examining alternative architectures that address the limitations of modern cache-based designs. In this work, we examine the potential of using the forthcoming STI Cell processor as a building block for future high-end computing systems. Our work contains several novel contributions. First, we introduce a performance model for Cell and apply it to several key scientific computing kernels: dense matrix multiply, sparse matrix vector multiply, stencil computations, and 1D/2D FFTs. The difficulty of programming Cell, which requires assembly level intrinsics for the best performance, makes this model useful as an initial step in algorithm design and evaluation. Next, we validate the accuracy of our model by comparing results against published hardware results, as well as our own implementations on the Cell full system simulator. Additionally, we compare Cell performance to benchmarks run on leading superscalar (AMD Opteron), VLIW (Intel Itanium2), and vector (Cray X1E) architectures. Our work also explores several different mappings of the kernels and demonstrates a simple and effective programming model for Cell's unique architecture. Finally, we propose modest microarchitectural modifications that could significantly increase the efficiency of double-precision calculations. Overall results demonstrate the tremendous potential of the Cell architecture for scientific computations in terms of both raw performance and power efficiency.

373 citations

Proceedings Article•10.1145/1128022.1128029•

Dynamic thread assignment on heterogeneous multiprocessor architectures

Michela Becchi¹, Patrick Crowley¹•Institutions (1)

Washington University in St. Louis¹

3 May 2006

TL;DR: It is argued that the benefits of heterogeneous CMPs are bolstered by the usage of a dynamic assignment policy, i.e., a runtime mechanism which observes the behavior of the running threads and exploits thread migration between the cores.

Abstract: In a multi-programmed computing environment, threads of execution exhibit different runtime characteristics and hardware resource requirements. Not only do the behaviors of distinct threads differ, but each thread may also present diversity in its performance and resource usage over time. A heterogeneous chip multiprocessor (CMP) architecture consists of processor cores and caches of varying size and complexity. Prior work has shown that heterogeneous CMPs can meet the needs of a multi-programmed computing environment better than a homogeneous CMP system. In fact, the use of a combination of cores with different caches and instruction issue widths better accommodates threads with different computational requirements. A central issue in the design and use of heterogeneous systems is to determine an assignment of tasks to processors which better exploits the hardware resources in order to improve performance. In this paper we argue that the benefits of heterogeneous CMPs are bolstered by the usage of a dynamic assignment policy, i.e., a runtime mechanism which observes the behavior of the running threads and exploits thread migration between the cores. We validate our analysis by means of simulation. Specifically, our model assumes a combination of Alpha EV5 and Alpha EV6 processors and of integer and floating point programs from the SPEC2000 benchmark suite. We show that a dynamic assignment can outperform a static one by 20% to 40% on average and by as much as 80% in extreme cases, depending on the degree of multithreading simulated.

257 citations

Proceedings Article•10.1145/2482767.2482797•

Experimental evaluation of an adiabiatic quantum system for combinatorial optimization

Catherine C. McGeoch¹, Cong Wang²•Institutions (2)

Amherst College¹, Simon Fraser University²

14 May 2013

TL;DR: An experimental study is described of a novel computing system (algorithm plus platform) that carries out quantum annealing, a type of adiabatic quantum computation, to solve optimization problems; the system is compared to three conventional software solvers on instances from three NP-hard problem domains.

Abstract: This paper describes an experimental study of a novel computing system (algorithm plus platform) that carries out quantum annealing, a type of adiabatic quantum computation, to solve optimization problems. We compare this system to three conventional software solvers, using instances from three NP-hard problem domains. We also describe experiments to learn how performance of the quantum annealing algorithm depends on input.

241 citations

Proceedings Article•10.1145/2597917.2597929•

Enabling FPGAs in the cloud

Fei Chen¹, Yi Shan², Yu Zhang¹, Yu Wang², Hubertus Franke, Xiaotao Chang¹, Kun Wang³•Institutions (3)

IBM¹, Tsinghua University², Microsoft³

20 May 2014

TL;DR: A general framework for integrating FPGAs into the cloud is proposed and a prototype of the framework is implemented based on OpenStack, Linux-KVM and Xilinx FPGA, which enables isolation between multiple processes in multiple VMs, precise quantitative acceleration resource allocation, and priority-based workload scheduling.

Abstract: Cloud computing is becoming a major trend for delivering and accessing infrastructure on demand via the network. Meanwhile, the usage of FPGAs (Field Programmable Gate Arrays) for computation acceleration has made significant inroads into multiple application domains due to their ability to achieve high throughput and predictable latency, while providing programmability, low power consumption and time-to-value. Many types of workloads, e.g. databases, big data analytics, and high performance computing, can be and have been accelerated by FPGAs. As more and more workloads are being deployed in the cloud, it is appropriate to consider how to make FPGAs and their capabilities available in the cloud. However, such integration is non-trivial due to issues related to FPGA resource abstraction and sharing, compatibility with applications and accelerator logics, and security, among others. In this paper, a general framework for integrating FPGAs into the cloud is proposed and a prototype of the framework is implemented based on OpenStack, Linux-KVM and Xilinx FPGAs. The prototype enables isolation between multiple processes in multiple VMs, precise quantitative acceleration resource allocation, and priority-based workload scheduling. Experimental results demonstrate the effectiveness of this prototype, an acceptable overhead, and good scalability when hosting multiple VMs and processes.

228 citations

Year	Papers
2021	31
2020	39
2019	60
2018	59
2017	63
2016	66
