Bit-serial architecture

Papers published on a yearly basis

Papers

Proceedings Article•10.1109/ISCA.2018.00040•

Neural cache: bit-serial in-cache acceleration of deep neural networks

Charles Eckert¹, Xiaowei Wang¹, Jingcheng Wang¹, Arun Subramaniyan¹, Ravi Iyer², Dennis Sylvester¹, David Blaauw¹, Reetuparna Das¹•Institutions (2)

University of Michigan¹, Intel²

2 Jun 2018

TL;DR: The Neural Cache architecture re-purposes cache structures as massively parallel compute units for deep neural network inference and can execute convolutional, fully connected, and pooling layers entirely in-cache.

Abstract: This paper presents the Neural Cache architecture, which re-purposes cache structures to transform them into massively parallel compute units capable of running inferences for Deep Neural Networks. Techniques to do in-situ arithmetic in SRAM arrays, create efficient data mappings, and reduce data movement are proposed. The Neural Cache architecture is capable of fully executing convolutional, fully connected, and pooling layers in-cache. The proposed architecture also supports quantization in-cache. Our experimental results show that, for the Inception v3 model, the proposed architecture can improve inference latency by 18.3X over a state-of-the-art multi-core CPU (Xeon E5) and by 7.7X over a server-class GPU (Titan Xp). Neural Cache improves inference throughput by 12.4X over CPU (2.2X over GPU), while reducing power consumption by 50% over CPU (53% over GPU).

320 citations
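
The bit-serial, massively parallel arithmetic described in the Neural Cache abstract can be illustrated with a small software model. The sketch below is not the paper's implementation: it assumes operands are stored in a transposed (bit-sliced) layout, so that one step processes the same bit position of every operand at once, and the helper names `transpose_to_bit_slices`, `bit_serial_add`, and `from_bit_slices` are hypothetical.

```python
def transpose_to_bit_slices(values, width):
    """Return `width` integers; slice i packs bit i of every value (lane k -> bit k)."""
    slices = []
    for i in range(width):
        packed = 0
        for lane, value in enumerate(values):
            packed |= ((value >> i) & 1) << lane
        slices.append(packed)
    return slices


def bit_serial_add(a_slices, b_slices):
    """Add all lanes at once, one bit position per step, LSB first."""
    carry = 0  # one carry bit per lane, packed into a single integer
    out = []
    for a, b in zip(a_slices, b_slices):
        out.append(a ^ b ^ carry)               # full-adder sum, all lanes
        carry = (a & b) | (carry & (a ^ b))     # full-adder carry, all lanes
    out.append(carry)  # final carry-out becomes an extra MSB slice
    return out


def from_bit_slices(slices, lanes):
    """Undo the transposition: rebuild one integer per lane."""
    return [
        sum(((s >> lane) & 1) << i for i, s in enumerate(slices))
        for lane in range(lanes)
    ]


a = [3, 200, 17, 255]
b = [5, 100, 17, 255]
sums = from_bit_slices(
    bit_serial_add(transpose_to_bit_slices(a, 8), transpose_to_bit_slices(b, 8)),
    len(a),
)
print(sums)  # [8, 300, 34, 510]
```

The loop runs once per bit of precision regardless of how many lanes are packed into each slice, which is the trade-off bit-serial designs make: more steps per operation in exchange for very wide parallelism.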

Journal Article•10.1364/AO.31.003213•

Bit-serial architecture for optical computing.

Vincent P. Heuring¹, Harry F. Jordan¹, Jonathan P. Pratt¹•Institutions (1)

University of Colorado Boulder¹

10 Jun 1992-Applied Optics

TL;DR: The design of a complete, stored-program digital optical computer is described, and a fully functional, proof-of-principle prototype can be achieved by using LiNbO₃ directional couplers as logic elements and fiber-optic delay lines as memory elements.

Abstract: The design of a complete, stored-program digital optical computer is described. A fully functional, proof-of-principle prototype can be achieved by using LiNbO₃ directional couplers as logic elements and fiber-optic delay lines as memory elements. The key design issues are computation in a realm where propagation delays are much greater than logic delays and implementation of circuits without flip-flops. The techniques developed to address these issues yield architectures that do not change as their clocking speed is scaled upward and the size is scaled downward proportionally; these are called speed-scalable architectures. Signal amplitude restoration and resynchronization are accomplished by the novel technique of switching in a fresh copy of the system clock. Device characteristics that are important to the proof-of-principle demonstration are discussed, including the special properties and limitations that are important when designing with them. Design principles are exemplified by the design of an n-bit counter. Following this, the design for a stored-program bit-serial computer is described. We estimate that the described prototype architecture can be operated in the 100-MHz region with off-the-shelf components, and in the 0.1-1-THz region with foreseeable future components.

60 citations
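
The n-bit counter built from delay-line memory, mentioned in the abstract above, can be sketched as a software model. This is an illustration under stated assumptions rather than the authors' circuit: the delay line is modelled as a `collections.deque` in which bits emerge LSB first, one per clock tick, and the names `make_delay_line`, `increment`, and `read` are hypothetical.

```python
from collections import deque


def make_delay_line(n_bits, value=0):
    """Model a recirculating delay line holding `value`, LSB at the output end."""
    return deque((value >> i) & 1 for i in range(n_bits))


def increment(line):
    """One full circulation: add 1 using a single half adder and one carry bit."""
    carry = 1  # inject the +1 at the LSB time slot
    for _ in range(len(line)):
        bit = line.popleft()        # bit leaves the delay line
        line.append(bit ^ carry)    # half-adder sum re-enters the line
        carry = bit & carry         # half-adder carry, held for the next bit time
    return line


def read(line):
    """Interpret the circulating bits as an unsigned integer (for inspection only)."""
    return sum(bit << i for i, bit in enumerate(line))


counter = make_delay_line(4)
for _ in range(17):
    increment(counter)
print(read(counter))  # 1, because a 4-bit counter wraps after 16 increments
```

Only one bit is visible to the logic at any moment, so the arithmetic hardware stays the same size no matter how many bits the counter holds; only the length of the delay line grows.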

Proceedings Article•10.1109/FCCM.2007.38•

Integer Factorization Based on Elliptic Curve Method: Towards Better Exploitation of Reconfigurable Hardware

G. de Meulenaer¹, F. Gosset¹, G.M. de Dormale¹, Jean-Jacques Quisquater¹•Institutions (1)

Université catholique de Louvain¹

23 Apr 2007

TL;DR: This work explores another approach, based on the exploitation of embedded multipliers available in modern FPGAs and the use of high-performance FPGAs; the proposed architecture exhibits a 15-fold improvement in throughput/hardware cost ratio over previously published results.

Abstract: Currently, the best known algorithm for factorizing the modulus of the RSA public key cryptosystem is the Number Field Sieve. One of its important phases usually combines a sieving technique and a method for checking smoothness of mid-size numbers. For this factorization, the Elliptic Curve Method (ECM) is an attractive solution. As ECM is highly regular and many parallel computations are required, hardware-based platforms were shown to be more cost-effective than software solutions. The few papers dealing with implementation of ECM on FPGA are all based on bit-serial architectures. They use only general-purpose logic and low-cost FPGAs which appear as the best performance/cost solution. This work explores another approach, based on the exploitation of embedded multipliers available in modern FPGAs and the use of high-performance FPGAs. The proposed architecture - based on a fully parallel and pipelined modular multiplier circuit - exhibits a 15-fold improvement in throughput/hardware cost ratio over previously published results.

50 citations
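
To make the contrast between bit-serial and fully parallel modular multiplication more concrete, here is a minimal sketch of a bit-serial modular multiplier. The abstract does not say which algorithm the earlier bit-serial ECM designs used, so the interleaved shift-add-reduce scheme below is a generic stand-in chosen for illustration, and `bit_serial_modmul` is a hypothetical name.

```python
def bit_serial_modmul(a, b, m, width):
    """Compute (a * b) % m by scanning the bits of `a`, MSB first.

    Assumes 0 <= a < 2**width and 0 <= b < m.
    Each loop iteration models one step that consumes a single bit of `a`.
    """
    r = 0
    for i in reversed(range(width)):
        r <<= 1                  # double the running result
        if (a >> i) & 1:
            r += b               # conditionally add the multiplicand
        # r was below m before this step, so it is now below 3*m:
        # at most two subtractions bring it back into range.
        if r >= m:
            r -= m
        if r >= m:
            r -= m
    return r


a, b, m = 123456789, 987654321, 1000000007
print(bit_serial_modmul(a, b, m, 32) == (a * b) % m)  # True
```

A design like this needs `width` steps per multiplication but only an adder and a comparator, whereas a fully parallel, pipelined multiplier spends far more hardware to deliver a result every cycle.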

Journal Article•10.1109/MDT.2003.1246161•

Design and characterization of null convention self-timed multipliers

Satish K. Bandapati¹, Scott C. Smith¹, M. Choi¹•Institutions (1)

University of Missouri¹

01 Nov 2003-IEEE Design & Test of Computers

TL;DR: This study serves as a good reference for designers who wish to accomplish high-performance, low-power implementations of clockless digital VLSI circuits.

Abstract: We present various 4-bit × 4-bit unsigned multipliers designed using the delay-insensitive null convention logic (NCL) paradigm. They represent bit-serial, iterative, and fully parallel multiplication architectures. NCL is a self-timed logic paradigm in which control is inherent in each datum. NCL follows the so-called weak conditions of Seitz's delay-insensitive signaling scheme. Like other delay-insensitive logic methods, the NCL paradigm assumes that forks in wires are isochronic. NCL uses symbolic completeness of expression to achieve delay-insensitive behavior. Simulation results show a large variance in circuit performance in terms of power, area, and speed. This study serves as a good reference for designers who wish to accomplish high-performance, low-power implementations of clockless digital VLSI circuits.

35 citations
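
The bit-serial end of the design space described in the abstract above can be sketched as a behavioural model of a 4-bit × 4-bit unsigned multiplier that takes one multiplier bit in and emits one product bit out per step. This models only the arithmetic: it does not represent NCL signalling, dual-rail encoding, or the authors' circuits, and `serial_multiply` is a hypothetical name.

```python
def serial_multiply(x, y, width=4):
    """Unsigned multiply: y enters one bit per step (LSB first),
    and the product leaves one bit per step (LSB first)."""
    limit = 1 << width
    if not (0 <= x < limit and 0 <= y < limit):
        raise ValueError("operands must fit in `width` bits")

    acc = 0              # running partial sum kept inside the multiplier
    product_bits = []
    for step in range(2 * width):
        # After the last bit of y, feed zeros to flush the high product bits.
        y_bit = (y >> step) & 1 if step < width else 0
        acc += x * y_bit              # x gated by the incoming bit, then added
        product_bits.append(acc & 1)  # lowest bit is final: emit it
        acc >>= 1                     # shift the partial sum for the next step
    return sum(bit << i for i, bit in enumerate(product_bits))


print(serial_multiply(3, 5))    # 15
print(serial_multiply(15, 15))  # 225
```

The model takes `2 * width` steps to produce a full product, while a fully parallel array multiplier would produce it in one pass through a much larger circuit, which is consistent with the large variance in power, area, and speed the abstract reports across architectures.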

Proceedings Article•10.1109/AFRCON.2002.1146859•

Neural network implementation on a FPGA

Y. Chen¹, W. P. du Plessis¹•Institutions (1)

University of Pretoria¹

2 Oct 2002

TL;DR: This work implemented a feedforward neural network on a FPGA (field programmable gate array) to find the minimum precision required to maintain a recognition rate of at least 95% for two characters within an optical character recognition application.

Abstract: This work implemented a feedforward neural network on a FPGA (field programmable gate array). A study was conducted to find the minimum precision required to maintain a recognition rate of at least 95% for two characters within an optical character recognition application. To reduce the circuit size, the bit-serial architecture was realised to perform the arithmetic operation. This resulted in an optimal use of the FPGA resources.

33 citations

Year	Papers
2018	1
2015	2
2014	2
2013	2
2012	1
2011	1

Related Topics (5)

Performance Metrics