TL;DR: This work presents an open-source catalog of RISC-V cores for FPGAs. The cores have been wrapped as drop-in compatible processing elements that can be used either standalone or integrated into the TaPaSCo SoC composition framework.
Abstract: With the increasing popularity of RISC-V in academia and industry, an ever-growing number of open-source implementations of the instruction set have become available. However, comparing the cores to one another is not easy, as they employ different interconnects, build systems, and so on. This work presents an open-source catalog of RISC-V cores for use on FPGAs. All of these cores have been wrapped as drop-in compatible processing elements and can be used either standalone or integrated into the TaPaSCo SoC composition framework. By using TaPaSCo, details of the bitstream generation flow and user-space interfaces are abstracted away, allowing the user to focus on the needs of the concrete application when exploring the RISC-V landscape. All of the catalog's cores have been synthesized for a number of hardware platforms and are evaluated against each other using state-of-the-art embedded processor benchmarks such as Dhrystone, Embench, and CoreMark. The results show a high degree of performance variability among the cores. The slowest cores achieve less than 100 MHz on large UltraScale+ devices, while better FPGA-optimized cores run in excess of 500 MHz. Likewise, the benchmarks show a wide spread of performance, ranging from less than 0.5 CoreMark/MHz up to over 2.5 CoreMark/MHz.
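Because the catalog reports both clock frequency and per-MHz CoreMark scores, the absolute score of a core is simply their product. The short Python sketch below illustrates this; pairing the lowest per-MHz score with the lowest frequency (and the highest with the highest) is a hypothetical illustration of the extremes of the quoted ranges, not a measured result for any specific core in the catalog.

```python
def coremark_score(coremark_per_mhz: float, freq_mhz: float) -> float:
    """Absolute CoreMark score (iterations per second) from a per-MHz figure."""
    return coremark_per_mhz * freq_mhz

# Hypothetical extremes built from the ranges quoted in the abstract;
# the abstract does not say these values belong to the same cores.
slow = coremark_score(0.5, 100.0)   # ~0.5 CoreMark/MHz at ~100 MHz
fast = coremark_score(2.5, 500.0)   # ~2.5 CoreMark/MHz at ~500 MHz

print(f"slow core: {slow:.0f} CoreMark")
print(f"fast core: {fast:.0f} CoreMark")
print(f"spread:    {fast / slow:.0f}x")
```

Because the abstract states these values as bounds ("less than", "over") and does not tie them to particular cores, the actual throughput gap between two specific cores may be larger or smaller than this illustrative 25x.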
TL;DR: A novel RISC-V processor micro-architecture is presented that achieves high performance with very low power requirements and can outperform many existing commercial and open-source cores.
Abstract: The design of high-performance processors with very low power requirements is the primary goal of many contemporary and future applications. This brief presents a novel processor micro-architecture that meets these requirements. The micro-architecture is based on the RISC-V Instruction Set Architecture (ISA). The core is implemented and verified on a Xilinx Virtex-7 FPGA board with a resource requirement of 7617 LUTs and 2319 FFs. The core achieves a Dhrystone score of 1.71 DMIPS/MHz, which is higher than that of the ARM Cortex-M3 (1.50 DMIPS/MHz) and ARM Cortex-M4 (1.52 DMIPS/MHz). The CoreMark benchmark was also run on this core, yielding 4.13 CoreMark/MHz. Physical design results obtained with commercial tools show that the core can achieve a maximum frequency of 198.02 MHz with an area of 0.036 mm² and a power requirement of 17.36 µW/MHz in UMC 40 nm technology. In UMC 90 nm, the core consumes a dynamic power of 19.75 µW/MHz, which is 36% and 40% lower than that of the ARM Cortex-M3 and Cortex-M4, respectively, and also lower than that of many other cores. The results show that this core can outperform many existing commercial and open-source cores.
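The per-MHz figures in this abstract can be turned into absolute estimates with simple arithmetic. The Python sketch below uses the conventional Dhrystone normalization (1 DMIPS = 1757 Dhrystones per second, the VAX 11/780 reference), compares the reported DMIPS/MHz values, and scales the per-MHz numbers to the reported 198.02 MHz maximum frequency. Treating the FPGA-measured DMIPS/MHz and CoreMark/MHz as valid for the 40 nm implementation, and assuming power scales linearly with frequency, are assumptions made only for illustration.

```python
VAX_DHRYSTONES_PER_SEC = 1757  # Dhrystone rate of the VAX 11/780, defined as 1 DMIPS

def dmips_per_mhz(dhrystones_per_sec: float, freq_mhz: float) -> float:
    """Normalize a raw Dhrystone rate to DMIPS/MHz."""
    return dhrystones_per_sec / VAX_DHRYSTONES_PER_SEC / freq_mhz

# Values quoted in the abstract.
core_dmips_mhz = 1.71
core_coremark_mhz = 4.13
cortex_m3, cortex_m4 = 1.50, 1.52
f_max_mhz = 198.02          # UMC 40 nm physical-design result
power_uw_per_mhz = 17.36    # UMC 40 nm

print(f"vs Cortex-M3: {core_dmips_mhz / cortex_m3:.3f}x")
print(f"vs Cortex-M4: {core_dmips_mhz / cortex_m4:.3f}x")

# Assumption: per-MHz metrics measured on the FPGA scale linearly up to f_max.
print(f"est. DMIPS at f_max:    {core_dmips_mhz * f_max_mhz:.1f}")
print(f"est. CoreMark at f_max: {core_coremark_mhz * f_max_mhz:.1f}")
print(f"est. power at f_max:    {power_uw_per_mhz * f_max_mhz / 1000:.2f} mW")
```

Under these assumptions the core reaches roughly 338.6 DMIPS and about 3.44 mW at 198.02 MHz, and its DMIPS/MHz advantage is about 14% over the Cortex-M3 and 12.5% over the Cortex-M4.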
TL;DR: In this paper, the performance and power consumption of the RISC-V architecture, as implemented in the Microchip PolarFire SoC, are compared using application and kernel benchmarks with those of other architectures, including the ARM Cortex-A9 SHREC Space Processor, the ARM Cortex-A53 Boeing High Performance Space Computing platform, and the Power e5500 BAE Systems RAD5545 processor.
Abstract: When designing embedded systems, especially for space computing, finding the ideal balance between size, weight, power, and cost (SWaP-C) is a primary goal of the processor selection process. One variable that can significantly affect the tradeoff between performance and power consumption is the processor architecture. Widely adopted architectures such as the ARM Cortex-A series have gained popularity due to their favorable combination of high performance and low power consumption. The RISC-V architecture presents a compelling alternative, in part due to its modular instruction set, collaborative development approach, and open-source nature. The recent introduction of a RISC-V processor in the Microchip PolarFire SoC enables performance and power consumption comparisons with competing architectures using application and kernel benchmarks. For application benchmarking, this research employs several image-processing applications, including a histogram equalizer, a Sobel filter, and an image tiler, to characterize real-world device performance. To gain additional insight into each processor's architectural characteristics, kernel benchmarks that perform common sensor-processing operations, such as matrix multiplication and convolution, are used. In addition, the CoreMark synthetic benchmark suite is used to help quantify overall performance. This study considers several architectures and space-grade computer facsimiles, including the ARM Cortex-A9 SHREC Space Processor, the ARM Cortex-A53 Boeing High Performance Space Computing platform, and the Power e5500 BAE Systems RAD5545 processor. Both single- and multi-core performance are considered. The PolarFire SoC achieves approximately 3.13 CoreMarks per MHz and 15.63 CoreMarks per milliwatt, demonstrating competitive performance and power consumption under single-threaded workloads.
However, RISC-V shows mixed results in kernel and application benchmarks that use multiprocessing, with execution times that are average at best. Additionally, while matrix multiplication and addition yield high parallel efficiencies, matrix convolution and transpose are less efficient. Dynamic energy consumption for the PolarFire SoC was generally average, but the platform achieves significant reductions in dynamic energy consumption under increased parallel workloads in some tests. The variability of dynamic energy consumption was also very low for the PolarFire SoC in most benchmarks. While the RISC-V architecture does not deliver ideal benchmark results, it provides a competitive balance between performance and power consumption, and future extensions to the instruction set should further enhance its potential for space applications.
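Two of the metrics in this abstract can be related with a little arithmetic. Dividing CoreMarks per MHz by CoreMarks per milliwatt cancels the score and leaves milliwatts per MHz, so if both reported figures come from the same single-threaded run (an assumption; the abstract does not state the measurement conditions), they imply a power cost per MHz. Parallel efficiency, used for the matrix kernels, is conventionally the speedup divided by the core count, T1 / (p * Tp). The Python sketch below shows both; the kernel timings are hypothetical.

```python
def implied_mw_per_mhz(coremark_per_mhz: float, coremark_per_mw: float) -> float:
    """Power per MHz implied by two CoreMark ratios, assuming both come from one run.

    (CoreMark / MHz) / (CoreMark / mW) = mW / MHz
    """
    return coremark_per_mhz / coremark_per_mw

def parallel_efficiency(t_serial: float, t_parallel: float, cores: int) -> float:
    """Speedup divided by core count: T1 / (p * Tp). 1.0 means ideal scaling."""
    return t_serial / (cores * t_parallel)

# PolarFire SoC figures quoted in the abstract.
print(f"{implied_mw_per_mhz(3.13, 15.63) * 1000:.0f} µW/MHz")

# Hypothetical kernel timings (seconds) on 1 and 4 cores, for illustration only.
print(f"well-scaling kernel:   {parallel_efficiency(8.0, 2.1, 4):.2f}")
print(f"poorly-scaling kernel: {parallel_efficiency(8.0, 4.0, 4):.2f}")
```

Under the same-run assumption, the two CoreMark figures imply roughly 200 µW per MHz for the PolarFire SoC's single-threaded run; an efficiency near 1.0 corresponds to the well-parallelized matrix multiplication and addition, while values well below 1.0 correspond to kernels such as convolution and transpose.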
TL;DR: An extension of the RVCoreP soft processor is proposed to support the RISC-V M-extension instructions; the resulting RV32IM core performs 1.87 and 3.13 times better than the RV32I baseline with a radix-4 and a DSP-based multiplier, respectively.
Abstract: RISC-V, an open instruction set architecture, is attracting the attention of soft processor developers. Implementing only the basic 32-bit integer instruction set of RISC-V, defined as RV32I, might be satisfactory for embedded systems. However, multiplication and division instructions are not part of RV32I; they are instead defined in the M-extension. Several research projects have proposed both RV32I and RV32IM processors. However, there is no indication of how much performance can be improved by adding the M-extension to RV32I. In other words, it is unclear when the M-extension should be added to a soft processor and how much the hardware resource requirements will increase.
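To make concrete what RV32I lacks, the Python sketch below models the eight RV32M instructions on 32-bit register values, including the ISA's defined results for division by zero and signed overflow (the M-extension does not trap in these cases). It is a behavioral reference model written for illustration, not code from the RVCoreP project; the helper names `s32` and `rv32m` are hypothetical.

```python
MASK = 0xFFFF_FFFF

def s32(x: int) -> int:
    """Interpret a 32-bit pattern as a signed integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x8000_0000 else x

def rv32m(op: str, rs1: int, rs2: int) -> int:
    """Return the 32-bit result pattern of an RV32M instruction (hypothetical model)."""
    a, b = s32(rs1), s32(rs2)          # signed views of the operands
    ua, ub = rs1 & MASK, rs2 & MASK    # unsigned views of the operands
    if op == "mul":
        return (a * b) & MASK                      # low 32 bits of the product
    if op == "mulh":
        return ((a * b) >> 32) & MASK              # high bits, signed x signed
    if op == "mulhsu":
        return ((a * ub) >> 32) & MASK             # high bits, signed x unsigned
    if op == "mulhu":
        return ((ua * ub) >> 32) & MASK            # high bits, unsigned x unsigned
    if op in ("div", "rem"):
        if b == 0:                                 # divide by zero does not trap
            return MASK if op == "div" else ua     # quotient = -1, remainder = dividend
        if a == -(1 << 31) and b == -1:            # signed overflow
            return ua if op == "div" else 0        # quotient = dividend, remainder = 0
        q = abs(a) // abs(b)                       # truncate toward zero, as in C
        q = q if (a < 0) == (b < 0) else -q
        return (q if op == "div" else a - q * b) & MASK
    if op in ("divu", "remu"):
        if ub == 0:                                # quotient = 2**32 - 1, remainder = dividend
            return MASK if op == "divu" else ua
        return (ua // ub if op == "divu" else ua % ub) & MASK
    raise ValueError(f"unknown op {op!r}")

# A few spot checks against the behavior defined by the RISC-V specification.
print(s32(rv32m("div", 7, -2)))          # -3 (truncates toward zero)
print(s32(rv32m("rem", 7, -2)))          # 1 (sign follows the dividend)
print(s32(rv32m("div", 5, 0)))           # -1 (division by zero)
print(hex(rv32m("mulhu", MASK, MASK)))   # 0xfffffffe
```

On an RV32I-only core, each of these operations has to be emulated in software (typically by a compiler runtime routine), which is the cost the M-extension removes.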
In this paper, we propose an extension of the RVCoreP soft processor (which implements only the RV32I instruction set) to support the RISC-V M-extension instructions. A simple fork-join method is used to expand the execution capability to support the M-extension instructions and to allow possible future enhancements. We then benchmark the design using the Dhrystone, CoreMark, and Embench programs. We found that RV32IM performs 1.87 and 3.13 times better with a radix-4 and a DSP-based multiplier, respectively. In addition, our RV32IM implementation is 13% better than an equivalent RISC-V processor.
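As a rough illustration of what "radix-4" means for the multiplier, the Python sketch below models an iterative multiplier that retires two multiplier bits per step, so a 32 x 32-bit product takes 16 iterations instead of the 32 that a radix-2 shift-and-add loop needs. It assumes a generic textbook scheme (plain radix-4 digits with precomputed multiples, no Booth recoding); the abstract does not describe how RVCoreP's radix-4 unit is organized, and the function name is hypothetical.

```python
def radix4_multiply(a: int, b: int, width: int = 32) -> tuple[int, int]:
    """Unsigned iterative radix-4 multiply that consumes 2 bits of b per step.

    Returns (full 2*width-bit product, number of iterations).
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    # Precomputed multiples 0*a .. 3*a; forming 3*a costs one extra addition.
    multiples = (0, a, 2 * a, 3 * a)
    product, steps = 0, 0
    for i in range(0, width, 2):
        digit = (b >> i) & 0b11           # next radix-4 digit of the multiplier
        product += multiples[digit] << i  # add digit * a at its bit position
        steps += 1
    return product, steps

p, n = radix4_multiply(0xFFFF_FFFF, 0xFFFF_FFFF)
print(hex(p), n)   # 0xfffffffe00000001 16
```

A multiplier mapped onto FPGA DSP blocks typically produces the product in a short, fixed-latency pipeline rather than over many iterations, which is one plausible reason the DSP variant benefits more; the abstract itself does not break the speedup down.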
TL;DR: stdcbench is a C benchmark for small systems that tries to give a more balanced reflection of performance than Whetstone, Dhrystone, and CoreMark, and is intended to be usable with a wide range of C implementations.
Abstract: Benchmark programs are useful for measuring performance. Benchmarks written in C effectively measure the performance of a C implementation consisting of hardware, compiler, and standard library. For small systems (i.e., systems with just a few KB of memory), three well-known and widely used benchmarks are Whetstone, Dhrystone, and CoreMark. However, all three have shortcomings. Whetstone scores depend heavily on the performance of floating-point functions from the standard library. Dhrystone scores depend heavily on the performance of just a few string-processing functions from the standard library. CoreMark intentionally avoids using the standard library, and its scores depend heavily on the performance of matrix multiplication. Each of the three thus depends heavily on a single aspect of the C implementation, so optimizations targeting that aspect have a huge effect on its score. stdcbench is a benchmark for small systems that tries to give a more balanced reflection of performance. It is intended to be usable with a wide range of C implementations for small systems. We present the design of stdcbench and discuss a few benchmark results, including comparisons with Dhrystone and CoreMark.
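The argument that a benchmark dominated by one aspect rewards narrow optimizations can be quantified with Amdahl's law: if a fraction f of the run time is spent in one library function and that function is made s times faster, the overall speedup is 1 / ((1 - f) + f / s). The Python sketch below compares a hypothetical benchmark that spends 90% of its time in one string function with a more balanced one that spends 10% there; both fractions and the 4x function speedup are invented for illustration and are not measurements of Dhrystone, CoreMark, or stdcbench.

```python
def amdahl_speedup(fraction: float, local_speedup: float) -> float:
    """Overall speedup when `fraction` of run time is accelerated by `local_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

# Hypothetical: one string routine made 4x faster by a targeted optimization.
for name, f in (("single-aspect benchmark", 0.9), ("balanced benchmark", 0.1)):
    print(f"{name}: score improves {amdahl_speedup(f, 4.0):.2f}x")
```

The same library optimization inflates the single-aspect score by about 3.08x but the balanced score by only about 1.08x, which is the kind of distortion a more balanced benchmark aims to avoid.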