Topic

Instructions per second

About: Instructions per second is a research topic. To date, 207 publications on this topic have received 4,739 citations. The topic is also known as: IPS.

Papers

Proceedings Article•10.1109/ISCA.2018.00014•

Firesim: FPGA-accelerated cycle-exact scale-out system simulation in the public cloud

Sagar Karandikar¹, Howard Mao¹, Donggyu Kim¹, David Biancolin¹, Alon Amid¹, Dayeol Lee¹, Nathan Pemberton¹, Emmanuel Amaro¹, Colin Schmidt¹, Aditya Chopra¹, Qijing Huang¹, Kyle Kovacs¹, Borivoje Nikolic¹, Randy H. Katz¹, Jonathan Bachrach¹, Krste Asanovic¹•Institutions (1)

University of California¹

2 Jun 2018

TL;DR: FireSim is presented, an open-source simulation platform that enables cycle-exact microarchitectural simulation of large scale-out clusters by combining FPGA-accelerated simulation of silicon-proven RTL designs with a scalable, distributed network simulation.

Abstract: We present FireSim, an open-source simulation platform that enables cycle-exact microarchitectural simulation of large scale-out clusters by combining FPGA-accelerated simulation of silicon-proven RTL designs with a scalable, distributed network simulation. Unlike prior FPGA-accelerated simulation tools, FireSim runs on Amazon EC2 F1, a public cloud FPGA platform, which greatly improves usability, provides elasticity, and lowers the cost of large-scale FPGA-based experiments. We describe the design and implementation of FireSim and show how it can provide sufficient performance to run modern applications at scale, to enable true hardware-software co-design. As an example, we demonstrate automatically generating and deploying a target cluster of 1,024 3.2 GHz quad-core server nodes, each with 16 GB of DRAM, interconnected by a 200 Gbit/s network with 2 microsecond latency, which simulates at a 3.4 MHz processor clock rate (less than 1,000x slowdown over real-time). In aggregate, this FireSim instantiation simulates 4,096 cores and 16 TB of memory, runs ∼14 billion instructions per second, and harnesses 12.8 million dollars worth of FPGAs---at a total cost of only ∼$100 per simulation hour to the user. We present several examples to show how FireSim can be used to explore various research directions in warehouse-scale machine design, including modeling networks with high-bandwidth and low-latency, integrating arbitrary RTL designs for a variety of commodity and specialized datacenter nodes, and modeling a variety of datacenter organizations, as well as reusing the scale-out FireSim infrastructure to enable fast, massively parallel cycle-exact single-node microarchitectural experimentation.
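
The aggregate figures in this abstract follow from the per-node numbers, and a short Python sketch can make the arithmetic explicit. The constants are taken from the abstract; the function name `firesim_aggregates` is hypothetical, and the closing comparison against "∼14 billion instructions per second" rests on the assumption (not stated in the abstract) that each simulated core retires roughly one instruction per cycle.

```python
# Per-node and clock figures quoted in the FireSim abstract.
NODES = 1024
CORES_PER_NODE = 4
DRAM_PER_NODE_GB = 16
TARGET_CLOCK_HZ = 3.2e9   # clock rate of the modeled server nodes
SIM_CLOCK_HZ = 3.4e6      # rate at which the simulation actually advances


def firesim_aggregates():
    """Derive the cluster-wide figures from the per-node numbers."""
    total_cores = NODES * CORES_PER_NODE
    total_dram_tb = NODES * DRAM_PER_NODE_GB / 1024
    # Slowdown relative to real time: target clock over simulated clock.
    slowdown = TARGET_CLOCK_HZ / SIM_CLOCK_HZ
    # Simulated core-cycles per wall-clock second across the whole cluster.
    # ASSUMPTION: at about one instruction per cycle per core, this is also
    # the approximate aggregate instructions-per-second figure.
    core_cycles_per_second = total_cores * SIM_CLOCK_HZ
    return {
        "total_cores": total_cores,
        "total_dram_tb": total_dram_tb,
        "slowdown": slowdown,
        "core_cycles_per_second": core_cycles_per_second,
    }


if __name__ == "__main__":
    for name, value in firesim_aggregates().items():
        print(f"{name}: {value:,.1f}")
```

The results (4,096 cores, 16 TB, a slowdown of about 941x, and roughly 13.9 billion core-cycles per second) are consistent with the abstract's "less than 1,000x slowdown" and "∼14 billion instructions per second".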

232 citations

Journal Article•10.1109/JSSC.2011.2170635•

An 8-Bit, 40-Instructions-Per-Second Organic Microprocessor on Plastic Foil

Kris Myny¹, E. van Veenendaal, Gerwin H. Gelinck, Jan Genoe¹, Wim Dehaene¹, Paul Heremans¹•Institutions (1)

Katholieke Universiteit Leuven¹

01 Jan 2012-IEEE Journal of Solid-state Circuits

TL;DR: An 8-bit microprocessor made from plastic electronic technology directly on flexible plastic foil can execute user-defined programs; its near transparency, mechanical flexibility, and low power consumption make it attractive for integration on everyday objects, where it could be programmed as a calculator, timer, or game controller.

Abstract: Forty years after the first silicon microprocessors, we demonstrate an 8-bit microprocessor made from plastic electronic technology directly on flexible plastic foil. The operation speed is today limited to 40 instructions per second. The power consumption is as low as 100 μW. The ALU-foil operates at a supply voltage of 10 V and back-gate voltage of 50 V. The microprocessor can execute user-defined programs: we demonstrate the execution of the multiplication of two 4-bit numbers and the calculation of the moving average of a string of incoming 6-bit numbers. To execute such dedicated tasks on the microprocessor, we create small plastic circuits that generate the sequences of appropriate instructions. The near transparency, mechanical flexibility, and low power consumption of the processor are attractive features for integration on everyday objects, where it could be programmed as, amongst other items, a calculator, timer, or game controller.
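
The two headline numbers in this abstract, 40 instructions per second and 100 μW, can be combined into a rough time and energy cost per instruction. The Python sketch below is a back-of-the-envelope estimate only: it assumes the 100 μW figure applies while the processor runs at its 40 IPS limit, which the abstract does not state explicitly, and the function names are hypothetical.

```python
# Figures quoted in the abstract above.
POWER_W = 100e-6               # "as low as 100 μW"
INSTRUCTIONS_PER_SECOND = 40   # "limited to 40 instructions per second"


def time_per_instruction_s(ips):
    """Average wall-clock time spent on one instruction, in seconds."""
    return 1.0 / ips


def energy_per_instruction_j(power_w, ips):
    """Energy per instruction in joules.

    ASSUMPTION: the processor draws power_w continuously while
    executing at ips instructions per second.
    """
    return power_w / ips


if __name__ == "__main__":
    t = time_per_instruction_s(INSTRUCTIONS_PER_SECOND)
    e = energy_per_instruction_j(POWER_W, INSTRUCTIONS_PER_SECOND)
    print(f"{t * 1e3:.1f} ms per instruction")
    print(f"{e * 1e6:.2f} microjoules per instruction")
```

Under that assumption each instruction takes about 25 ms and costs about 2.5 μJ.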

203 citations

Proceedings Article•10.1145/1362622.1362688•

Bounding energy consumption in large-scale MPI programs

Barry Rountree¹, David K. Lowenthal¹, Shelby Funk¹, Vincent W. Freeh², Bronis R. de Supinski³, Martin Schulz³•Institutions (3)

University of Georgia¹, North Carolina State University², Lawrence Livermore National Laboratory³

10 Nov 2007

TL;DR: A system is developed that determines a bound on the energy savings for an application; it is applied to three scientific programs, two of which exhibit load imbalance: particle simulation and UMT2K.

Abstract: Power is now a first-order design constraint in large-scale parallel computing. Used carefully, dynamic voltage scaling can execute parts of a program at a slower CPU speed to achieve energy savings with a relatively small (possibly zero) time delay. However, the problem of when to change frequencies in order to optimize energy savings is NP-complete, which has led to many heuristic energy-saving algorithms. To determine how closely these algorithms approach optimal savings, we developed a system that determines a bound on the energy savings for an application. Our system uses a linear programming solver that takes as inputs the application communication trace and the cluster power characteristics and then outputs a schedule that realizes this bound. We apply our system to three scientific programs, two of which exhibit load imbalance---particle simulation and UMT2K. Results from our bounding technique show particle simulation is more amenable to energy savings than UMT2K.

185 citations

Book Chapter•10.1007/3-540-36574-5_10•

An Overview of Cache Optimization Techniques and Cache-Aware Numerical Algorithms

Markus Kowarschik¹, Christian Weiß²•Institutions (2)

University of Erlangen-Nuremberg¹, Technische Universität München²

01 Jan 2003-Lecture Notes in Computer Science

TL;DR: This chapter focuses on optimization techniques for enhancing the performance of caches, which hierarchical memory designs use to hide the low bandwidth and high latency of main memory relative to the floating-point performance of CPUs.

Abstract: In order to mitigate the impact of the growing gap between CPU speed and main memory performance, today’s computer architectures implement hierarchical memory structures. The idea behind this approach is to hide both the low main memory bandwidth and the latency of main memory accesses which is slow in contrast to the floating-point performance of the CPUs. Usually, there is a small and expensive high speed memory sitting on top of the hierarchy which is usually integrated within the processor chip to provide data with low latency and high bandwidth; i.e., the CPU registers. Moving further away from the CPU, the layers of memory successively become larger and slower. The memory components which are located between the processor core and main memory are called cache memories or caches. They are intended to contain copies of main memory blocks to speed up accesses to frequently needed data [378], [392]. The next lower level of the memory hierarchy is the main memory which is large but also comparatively slow. While external memory such as hard disk drives or remote memory components in a distributed computing environment represent the lower end of any common hierarchical memory design, this paper focuses on optimization techniques for enhancing cache performance.

180 citations

Proceedings Article•10.5555/800263.809185•

The Yorktown Simulation Engine: Introduction

Gregory Francis Pfister¹•Institutions (1)

IBM¹

1 Jan 1982

TL;DR: The Yorktown Simulation Engine is a special-purpose, highly-parallel programmable machine for the gate-level simulation of logic that can simulate up to one million gates at a speed of over two billion gate simulations per second.

Abstract: The Yorktown Simulation Engine (YSE) is a special-purpose, highly-parallel programmable machine for the gate-level simulation of logic. It can simulate up to one million gates at a speed of over two billion gate simulations per second; it is estimated that the IBM 3081 processor could have been simulated on the YSE at a rate of 1000 instructions per second. This is far beyond the capabilities of existing register-level software simulators. The YSE has been designed and is being constructed at the IBM T. J. Watson Research Center. This paper introduces the YSE and describes its top-level architecture.
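
The throughput figures in this abstract can be related to one another with simple division. The Python sketch below is only a rough reading of the quoted numbers: "over two billion" is treated as exactly two billion, so the ratios are approximate, and the function names are hypothetical.

```python
# Figures quoted in the abstract above.
GATE_SIMS_PER_SECOND = 2e9          # "over two billion gate simulations per second"
MAX_GATES = 1_000_000               # "up to one million gates"
EST_INSTRUCTIONS_PER_SECOND = 1000  # estimated rate for a simulated IBM 3081


def gate_sims_per_instruction(gate_sims_per_second, instructions_per_second):
    """Gate simulations available per simulated target instruction."""
    return gate_sims_per_second / instructions_per_second


def full_passes_per_second(gate_sims_per_second, gates):
    """How many times per second every gate of a full design can be simulated."""
    return gate_sims_per_second / gates


if __name__ == "__main__":
    per_instr = gate_sims_per_instruction(
        GATE_SIMS_PER_SECOND, EST_INSTRUCTIONS_PER_SECOND
    )
    passes = full_passes_per_second(GATE_SIMS_PER_SECOND, MAX_GATES)
    print(f"{per_instr:,.0f} gate simulations per simulated instruction")
    print(f"{passes:,.0f} passes over a one-million-gate design per second")
```

On these numbers, the 1000 instructions-per-second estimate corresponds to roughly two million gate simulations per simulated instruction, or about 2,000 passes per second over a design that fills the machine's one-million-gate capacity.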

175 citations

Performance Metrics

Papers: 207

Citations: 2,448

No. of papers in the topic in previous years
Year	Papers
2022	1
2021	1
2020	5
2019	6
2018	9
2017	10
