Manycore processor

Papers

Journal Article • DOI: 10.1109/MM.2018.112130359

Loihi: A Neuromorphic Manycore Processor with On-Chip Learning

Michael Davies¹, Narayan Srinivasa, Tsung-Han Lin¹, Gautham N. Chinya¹, Cao Yongqiang¹, Sri Harsha Choday¹, Georgios D. Dimou, Prasad Joshi¹, Nabil Imam¹, Shweta Jain¹, Yuyun Liao¹, Chit-Kwan Lin¹, Andrew Lines¹, Ruokun Liu¹, Deepak A. Mathaikutty¹, Steven McCoy¹, Arnab Paul¹, Jonathan Tse¹, Guruguhanathan Venkataramanan¹, Yi-Hsin Weng¹, Andreas Wild¹, Yoon Seok Yang¹, Hong Wang¹

Intel¹

16 Jan 2018 - IEEE Micro

TL;DR: Loihi is a 60-mm² chip fabricated in Intel's 14-nm process that advances the state-of-the-art modeling of spiking neural networks in silicon, and can solve LASSO optimization problems with over three orders of magnitude superior energy-delay-product compared to conventional solvers running on a CPU iso-process/voltage/area.

Abstract: Loihi is a 60-mm² chip fabricated in Intel's 14-nm process that advances the state-of-the-art modeling of spiking neural networks in silicon. It integrates a wide range of novel features for the field, such as hierarchical connectivity, dendritic compartments, synaptic delays, and, most importantly, programmable synaptic learning rules. Running a spiking convolutional form of the Locally Competitive Algorithm, Loihi can solve LASSO optimization problems with over three orders of magnitude superior energy-delay-product compared to conventional solvers running on a CPU iso-process/voltage/area. This provides an unambiguous example of spike-based computation, outperforming all known conventional solutions.

3,591 citations
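
The Loihi abstract refers to solving LASSO problems with the Locally Competitive Algorithm (LCA). As a rough illustration of that problem class only, the sketch below runs a conventional, non-spiking LCA discretized with forward-Euler steps in plain Python. It is not Loihi's spiking convolutional implementation, and the names `soft_threshold` and `lca_lasso`, the step size, and the tiny dictionary are all assumptions made for the example.

```python
def soft_threshold(u, lam):
    """Soft-thresholding: shrink u toward zero by lam, mapping small values to 0."""
    if u > lam:
        return u - lam
    if u < -lam:
        return u + lam
    return 0.0


def lca_lasso(phi, s, lam, step=0.1, iters=500):
    """Approximately minimize 0.5*||s - phi a||^2 + lam*||a||_1 with LCA dynamics.

    phi is a list of rows (m x n); its columns are assumed to have unit norm.
    """
    m, n = len(phi), len(phi[0])
    # Feed-forward drive b = phi^T s.
    b = [sum(phi[r][i] * s[r] for r in range(m)) for i in range(n)]
    # Gram matrix G = phi^T phi; its off-diagonal entries act as lateral inhibition.
    g = [[sum(phi[r][i] * phi[r][j] for r in range(m)) for j in range(n)]
         for i in range(n)]
    u = [0.0] * n  # internal node states
    for _ in range(iters):
        a = [soft_threshold(x, lam) for x in u]
        for i in range(n):
            inhibition = sum(g[i][j] * a[j] for j in range(n)) - a[i]
            # Forward-Euler step of  tau * du/dt = b - u - (G - I) a.
            u[i] += step * (b[i] - u[i] - inhibition)
    return [soft_threshold(x, lam) for x in u]


if __name__ == "__main__":
    # Hypothetical 2x2 dictionary with unit-norm columns.
    phi = [[1.0, 0.6],
           [0.0, 0.8]]
    print(lca_lasso(phi, s=[1.0, 0.0], lam=0.1))
```

With an identity dictionary the dynamics reduce to soft-thresholding the input, which is a convenient sanity check for the sketch.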

Proceedings Article • DOI: 10.1145/1669112.1669172

McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures

Sheng Li¹, Jung Ho Ahn², Richard Strong³, Jay B. Brockman¹, Dean M. Tullsen³, Norman P. Jouppi⁴

University of Notre Dame¹, Seoul National University², University of California, San Diego³, Hewlett-Packard⁴

12 Dec 2009

TL;DR: Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node for both common in-order and out-of-order manycore designs shows that when die cost is not taken into account, clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account, configuring clusters with 4 cores gives the best EDA2P and EDAP.

Abstract: This paper introduces McPAT, an integrated power, area, and timing modeling framework that supports comprehensive design space exploration for multicore and manycore processor configurations ranging from 90nm to 22nm and beyond. At the microarchitectural level, McPAT includes models for the fundamental components of a chip multiprocessor, including in-order and out-of-order processor cores, networks-on-chip, shared caches, integrated memory controllers, and multiple-domain clocking. At the circuit and technology levels, McPAT supports critical-path timing modeling, area modeling, and dynamic, short-circuit, and leakage power modeling for each of the device types forecast in the ITRS roadmap including bulk CMOS, SOI, and double-gate transistors. McPAT has a flexible XML interface to facilitate its use with many performance simulators. Combined with a performance simulator, McPAT enables architects to consistently quantify the cost of new ideas and assess tradeoffs of different architectures using new metrics like energy-delay-area² product (EDA2P) and energy-delay-area product (EDAP). This paper explores the interconnect options of future manycore processors by varying the degree of clustering over generations of process technologies. Clustering will bring interesting tradeoffs between area and performance because the interconnects needed to group cores into clusters incur area overhead, but many applications can make good use of them due to synergies of cache sharing. Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node for both common in-order and out-of-order manycore designs shows that when die cost is not taken into account, clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account, configuring clusters with 4 cores gives the best EDA2P and EDAP.

2,657 citations
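
The metrics named in the McPAT abstract are simple products: energy-delay product (EDP), energy-delay-area product (EDAP), and energy-delay-area² product (EDA2P). The following sketch shows how adding area to the metric can change which design ranks best. The `DesignPoint` type, the helper names, and every number in `HYPOTHETICAL_POINTS` are invented for illustration; they are not McPAT outputs or results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignPoint:
    name: str
    energy_j: float   # energy for the workload, joules
    delay_s: float    # execution time, seconds
    area_mm2: float   # die area, mm^2


def edp(p):
    """Energy-delay product."""
    return p.energy_j * p.delay_s


def edap(p):
    """Energy-delay-area product."""
    return edp(p) * p.area_mm2


def eda2p(p):
    """Energy-delay-area^2 product."""
    return edp(p) * p.area_mm2 ** 2


def best(points, metric):
    """Return the design point that minimizes the given metric."""
    return min(points, key=metric)


# Made-up values chosen only so that the ranking flips once area is included.
HYPOTHETICAL_POINTS = [
    DesignPoint("8-core clusters", energy_j=1.0, delay_s=1.0, area_mm2=120.0),
    DesignPoint("4-core clusters", energy_j=1.1, delay_s=1.0, area_mm2=100.0),
]

if __name__ == "__main__":
    for metric in (edp, edap, eda2p):
        print(metric.__name__, "->", best(HYPOTHETICAL_POINTS, metric).name)
```

Lower is better for all three metrics, so `best` uses `min`; squaring the area in EDA2P weights die cost more heavily than EDAP does.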

Journal Article • DOI: 10.1145/2445572.2445577

The McPAT Framework for Multicore and Manycore Architectures: Simultaneously Modeling Power, Area, and Timing

Sheng Li¹, Jung Ho Ahn², Richard Strong³, Jay B. Brockman⁴, Dean M. Tullsen³, Norman P. Jouppi¹

Hewlett-Packard¹, Seoul National University², University of California, San Diego³, University of Notre Dame⁴

01 Apr 2013 - ACM Transactions on Architecture and Code Optimization

TL;DR: Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks for manycore designs at the 22nm technology shows that 8-core clustering gives the best energy-delay product, whereas when die area is taken into account, 4-core clusters give the best EDA2P and EDAP.

Abstract: This article introduces McPAT, an integrated power, area, and timing modeling framework that supports comprehensive design space exploration for multicore and manycore processor configurations ranging from 90nm to 22nm and beyond. At the microarchitectural level, McPAT includes models for the fundamental components of a complete chip multiprocessor, including in-order and out-of-order processor cores, networks-on-chip, shared caches, and integrated system components such as memory controllers and Ethernet controllers. At the circuit level, McPAT supports detailed modeling of critical-path timing, area, and power. At the technology level, McPAT models timing, area, and power for the device types forecast in the ITRS roadmap. McPAT has a flexible XML interface to facilitate its use with many performance simulators. Combined with a performance simulator, McPAT enables architects to accurately quantify the cost of new ideas and assess trade-offs of different architectures using new metrics such as Energy-Delay-Area² Product (EDA2P) and Energy-Delay-Area Product (EDAP). This article explores the interconnect options of future manycore processors by varying the degree of clustering over generations of process technologies. Clustering will bring interesting trade-offs between area and performance because the interconnects needed to group cores into clusters incur area overhead, but many applications can make good use of them due to synergies from cache sharing. Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks for manycore designs at the 22nm technology shows that 8-core clustering gives the best energy-delay product, whereas when die area is taken into account, 4-core clustering gives the best EDA2P and EDAP.

233 citations

Proceedings Article • DOI: 10.1145/2872362.2872414

OpenPiton: An Open Source Manycore Research Framework

Jonathan Balkind¹, Michael McKeown¹, Yaosheng Fu¹, Tri Nguyen¹, Yanqi Zhou¹, Alexey Lavrov¹, Mohammad Shahrad¹, Adi Fuchs¹, Samuel Payne², Xiaohua Liang¹, Matthew Matl¹, David Wentzlaff¹

Princeton University¹, Nvidia²

25 Mar 2016

TL;DR: OpenPiton is the world's first open source, general-purpose, multithreaded manycore processor and framework that leverages the industry-hardened OpenSPARC T1 core with modifications and builds upon it with a scratch-built, scalable uncore, creating a flexible, modern manycore design.

Abstract: Industry is building larger, more complex, manycore processors on the back of strong institutional knowledge, but academic projects face difficulties in replicating that scale. To alleviate these difficulties and to develop and share knowledge, the community needs open architecture frameworks for simulation, synthesis, and software exploration which support extensibility, scalability, and configurability, alongside an established base of verification tools and supported software. In this paper we present OpenPiton, an open source framework for building scalable architecture research prototypes from 1 core to 500 million cores. OpenPiton is the world's first open source, general-purpose, multithreaded manycore processor and framework. OpenPiton leverages the industry-hardened OpenSPARC T1 core with modifications and builds upon it with a scratch-built, scalable uncore, creating a flexible, modern manycore design. In addition, OpenPiton provides synthesis and backend scripts for ASIC and FPGA to enable other researchers to bring their designs to implementation. OpenPiton provides a complete verification infrastructure of over 8000 tests, is supported by mature software tools, runs full-stack multiuser Debian Linux, and is written in industry-standard Verilog. Multiple implementations of OpenPiton have been created, including a taped-out 25-core implementation in IBM's 32nm process and multiple Xilinx FPGA prototypes.

209 citations

Proceedings Article • DOI: 10.1109/HPEC.2013.6670342

A clustered manycore processor architecture for embedded and accelerated applications

Benoît Dupont de Dinechin, Renaud Ayrignac, Pierre-Edouard Beaucamps, Patrice Couvert, Benoit Ganne, Pierre Guironnet de Massas, Francois Jacquet, Samuel Jones, Nicolas Morey Chaisemartin, Frederic Riss, Thierry Strudel

1 Sep 2013

TL;DR: This work demonstrates that the MPPA-256 processor clustered manycore architecture is effective on two different classes of applications: embedded computing, with the implementation of a professional H.264 video encoder that runs in real-time at low power; and high-performance computing, with the acceleration of a financial option pricing application.

Abstract: The Kalray MPPA-256 processor integrates 256 user cores and 32 system cores on a chip with 28nm CMOS technology. Each core implements a 32-bit 5-issue VLIW architecture. These cores are distributed across 16 compute clusters of 16+1 cores, and 4 quad-core I/O subsystems. Each compute cluster and I/O subsystem owns a private address space, while communication and synchronization between them is ensured by data and control Networks-On-Chip (NoC). The MPPA-256 processor is also fitted with a variety of I/O controllers, in particular DDR, PCI, Ethernet, Interlaken and GPIO. We demonstrate that the MPPA-256 processor clustered manycore architecture is effective on two different classes of applications: embedded computing, with the implementation of a professional H.264 video encoder that runs in real-time at low power; and high-performance computing, with the acceleration of a financial option pricing application. In the first case, a cyclostatic dataflow programming environment is used, which automates application distribution over the execution resources. In the second case, an explicit parallel programming model based on POSIX processes, threads, and NoC-specific IPC is used.

202 citations
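
The core counts quoted in the MPPA-256 abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes one reading of the abstract: that the "+1" core in each compute cluster and the cores of the quad-core I/O subsystems are the system cores. The abstract does not spell this out, but the reading reproduces its totals. The function and constant names are hypothetical.

```python
COMPUTE_CLUSTERS = 16
USER_CORES_PER_CLUSTER = 16
SYSTEM_CORES_PER_CLUSTER = 1   # assumed meaning of the "+1" in "16+1 cores"
IO_SUBSYSTEMS = 4
CORES_PER_IO_SUBSYSTEM = 4     # "quad-core", assumed to count as system cores


def mppa256_core_counts():
    """Tally user and system cores under the assumptions stated above."""
    user = COMPUTE_CLUSTERS * USER_CORES_PER_CLUSTER
    system = (COMPUTE_CLUSTERS * SYSTEM_CORES_PER_CLUSTER
              + IO_SUBSYSTEMS * CORES_PER_IO_SUBSYSTEM)
    return {"user": user, "system": system, "total": user + system}


if __name__ == "__main__":
    print(mppa256_core_counts())
```

Under these assumptions the tally matches the 256 user cores and 32 system cores stated in the abstract.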

Performance Metrics

Number of papers in the topic in previous years
Year	Papers
2021	6
2020	5
2019	13
2018	10
2017	20
2016	15