TL;DR: This work introduces multithreaded programming model for reconfigurable computing based on a unified virtual-memory image for both software and hardware application parts and addresses the challenge of achieving seamless hardware-software interfacing and portability with minimal performance penalties.
Abstract: Ideally, reconfigurable-system programmers and designers should code algorithms and write hardware accelerators independently of the underlying platform. To realize this scenario, the authors propose a portable, hardware-agnostic programming paradigm, which delegates platform-specific tasks to a system-level virtualization layer. This layer supports a chosen programming model and hides platform details from users much as general-purpose computers do. We introduce multithreaded programming model for reconfigurable computing based on a unified virtual-memory image for both software and hardware application parts. We also address the challenge of achieving seamless hardware-software interfacing and portability with minimal performance penalties.
TL;DR: In this paper, a system, method and article of manufacture for processing instructions of an embedded application is described, where a microprocessor is emulated in reconfigurable logic, and control functions are also implemented in reconfigureable logic.
Abstract: A system, method and article of manufacture are provided for processing instructions of an embedded application. A microprocessor is emulated in reconfigurable logic. Control functions are also implemented in reconfigurable logic. The emulated microprocessor processes the instructions. Each instruction is processed in a minimum number of clock cycles required for accessing an external instruction and data memory.
TL;DR: A high-speed constant-time hardware implementation of NTRUEncrypt Short Vector Encryption Scheme (SVES), fully compliant with the IEEE 1363.1 Standard Specification for Public Key Cryptographic Techniques Based on Hard Problems over Lattices is presented.
Abstract: In this paper, we present a high-speed constant-time hardware implementation of NTRUEncrypt Short Vector Encryption Scheme (SVES), fully compliant with the IEEE 1363.1 Standard Specification for Public Key Cryptographic Techniques Based on Hard Problems over Lattices. Our implementation follows an earlier proposed Post-Quantum Cryptography (PQC) Hardware Application Programming Interface (API), which facilitates its fair comparison with implementations of other PQC schemes. The paper contains the detailed flow and block diagrams, timing analysis, as well as results in terms of latency (in clock cycles), maximum clock frequency, and resource utilization in modern high-performance Field Programmable Gate Arrays (FPGAs). Our design takes full advantage of the ability to parallelize the major operation of NTRU, polynomial multiplication, in hardware. As a result, the execution time bottleneck shifts to the hash function, SHA-256, which is sequential in nature and as a result cannot be easily sped up in hardware. The obtained FPGA results for NTRU Encrypt SVES are compared with the equivalent results for Classic McEliece, a competing, well-established Post-Quantum Cryptography encryption scheme, with a long history of unsuccessful attempts at breaking. Our code for NTRUEncrypt SVES is being made open-source to speed-up further design-space exploration and benchmarking on multiple hardware platforms.
TL;DR: A new mechanism which eliminates the trust dependence on third party processes is proposed, consisting of a self-provisioning stage, where keys are generated internal to the FPGA and never exposed externally, coupled with a secure update mechanism which allows updates to be governed by a policy defined by the secure hardware application.
Abstract: Modern CPU designs are beginning to incorporate secure hardware features, but leave developers with little control over both the set of features and when and whether updates are available. Reconfigurable logic (e.g., FPGAs) has been proposed as an alternative as it is both hardware, so can have similar capabilities at a reasonable performance degradation, and programmable, allowing customization of the secure hardware. This programmability, however, opens new attack vectors that allow an adversary to re-program the FPGA. Past attempts to solve this rely on a party maintaining a shared key with the FPGA, but these business processes to keep that key secret have been shown to be quite vulnerable. In this paper, we propose a new mechanism which eliminates the trust dependence on third party processes. This new mechanism consists of a self-provisioning stage, where keys are generated internal to the FPGA and never exposed externally, coupled with a secure update mechanism which allows updates to be governed by a policy defined by the secure hardware application. To demonstrate, we fully implemented these mechanisms on a Xilinx Zynq UltraScale+ FPGA along with an example secure co-processor with remote attestation with a flexible root of trust (in contrast to Intel SGX which fixes the root of trust to be Intel). Our performance evaluation of two applications, a password manager and a contact matching application, illustrates using FPGAs is practical.
TL;DR: A universal hardware Application Programming Interface (API) for authenticated ciphers is proposed, intended to meet the requirements of all algorithms submitted to the CAESAR competition, and composed of the specification of an interface of the authenticated cipher core, and the communication protocol describing the exact format of all inputs and outputs.
Abstract: In this paper, we propose a universal hardware Application Programming Interface (API) for authenticated ciphers. In particular, our API is intended to meet the requirements of all algorithms submitted to the CAESAR competition. Two major parts of the API, the interface and the communication protocol, were developed with the goal of reducing any potential biases in benchmarking of authenticated ciphers in hardware. Our high-speed implementation of the proposed hardware API includes universal, open-source pre-processing and post-processing units, common for all CAESAR candidates and the current standards, such as AES-GCM and AES-CCM. Apart from the full documentation, examples, and the source code of the pre-processing and post-processing units, we have made available in public domain a) a universal testbench to verify the functionality of any CAESAR candidate implemented using our hardware API, b) a Python script used to automatically generate test vectors for this testbench, c) VHDL wrappers used to determine the maximum clock frequency and the resource utilization of all implementations, and d) RTL VHDL source codes of high-speed implementations of AES and the Keccak Permutation F, which may be used as building blocks in implementations of related ciphers. We hope that the existence of these resources will substantially reduce the time necessary to develop hardware implementations of all CAESAR candidates for the purpose of evaluation, comparison, and future deployment in real products. 1 Motivation The CAESAR competition [1], launched in 2014, aims at identifying a portfolio of future authenticated ciphers with security, performance, and flexibility exceeding that of the current standards, such as AES-GCM [2] and AES-CCM [3]. Although security is commonly accepted to be the most important criterion in all cryptographic contests, it is rarely by itself sufficient to determine a winner. This is because multiple candidates generally offer adequate security, and a tradeoff between security and performance must be investigated. The focus of this paper is to facilitate the comparison of modern authenticated ciphers in terms of their performance and cost in hardware, and in particular in FPGAs, All Programmable Systems on Chip, and ASICs. As a starting point for such a comparison we propose defining hardware API, composed of the specification of an interface of the authenticated cipher core, and the communication protocol describing the exact format of all inputs and outputs, as well as the timing dependencies among all data and control signals passing through the specified interface. Similarly to the case of previous contests, software implementations of the CAESAR candidates are being compared using a uniform API, clearly defined in the call for submissions [1]. So far, no similar hardware API has been proposed, not to mention accepted by the cryptographic community. As a result any attempt at the comparison of existing hardware implementations is highly dependent on specific assumptions about the hardware API, made independently by various hardware designers. These assumptions can have potentially a very high influence on all major performance measures of the developed implementations. Additionally, a hardware API is typically much more difficult to modify than a software API, making any last minute standardization efforts and code adjustments highly inefficient and questionable. Therefore, there is a clear need for a proposal regarding a uniform hardware API, which could be further modified and improved using feedback from the cryptographic community, and eventually endorsed by the CAESAR Committee, and adopted by majority of future hardware developers. Our goal is to address this issue by providing the exact specification of the proposed interface, as well as multiple supporting materials, such as open-source codes of pre-processing and post-processing units, a universal testbench, and uniform ways of generating optimized results. 2 Proposed Features The proposed features of our hardware API are as follows: – inputs of arbitrary size in bytes (but a multiple of a byte only) – size of the entire message/ciphertext does not need to be known before the encryption/decryption starts (unless required by the algorithm itself) – wide range of data port widths, 8 ≤ w ≤ 256 – independent data and key inputs – simple high-level communication protocol – support for the burst mode – possible overlap among processing the current input block, reading the next input block, and storing the previous output block – storing decrypted messages internally, until the result of authentication is known – support for encryption and decryption within the same core – ability to communicate with very simple, passive devices, such as FIFOs – ease of extension to support existing communication interfaces and protocols, such as AMBA-AXI4 – a de-facto standard for the System-on-Chip (SoC) buses [4], and PCI Express – high-bandwidth serial communication between PCs and hardware accelerator boards [5].