TL;DR: A new FPGA architecture (reconfigurable datapath architecture, rDPA) for word-oriented datapaths is presented, which has been developed to support a variety of Xputer architectures.
Abstract: A new FPGA architecture (reconfigurable datapath architecture, rDPA) for word-oriented datapaths is presented, which has been developed to support a variety of Xputer architectures. In contrast to von Neumann machines, an Xputer architecture strongly supports the concept of the “soft ALU” (reconfigurable ALU). Fine-grained parallelism is achieved by using simple reconfigurable processing elements called datapath units (DPUs). The word-oriented datapath simplifies the mapping of applications onto the architecture. The architecture supports pipelining, is extendable to almost arbitrarily large arrays, and is dynamically reconfigurable in-system. The programming environment allows automatic mapping of the operators from high-level descriptions. The corresponding scheduling techniques for I/O operations are explained. The rDPA can be used as a reconfigurable ALU for bus-oriented host-based systems as well as for rapid prototyping of high-speed datapaths.
TL;DR: CoDe-X performs profiling-driven host/accelerator partitioning at the first level, for performance optimization, and resource-driven sequential/structural partitioning of the accelerator source code at the second level, to optimize the utilization of its reconfigurable resources.
Abstract: Presents the parallelizing compilation environment CoDe-X (Co-Design for Xputers) for the simultaneous programming of Xputer-based accelerators and their hosts. This paper introduces its hardware/software co-design strategies at two levels of partitioning. CoDe-X performs profiling-driven host/accelerator partitioning at the first level, for performance optimization, and resource-driven sequential/structural partitioning of the accelerator source code at the second level, to optimize the utilization of its reconfigurable resources. To stress the significance of this application development methodology, the paper first gives an introduction to the underlying hardware platform.
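To make the idea of profiling-driven host/accelerator partitioning more concrete, here is a minimal sketch under a deliberately simple, assumed cost model: each candidate task carries a profiled host time, an estimated accelerator time, a data-transfer cost, and a resource (area) demand. The `Task` fields, the `partition` function, the greedy gain-first strategy, and the fixed `area_budget` (a loose stand-in for the accelerator's limited reconfigurable resources) are all hypothetical illustrations, not the algorithm CoDe-X actually uses.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Task:
    """Hypothetical partitioning candidate (e.g. a loop nest)."""
    name: str
    host_time: float      # profiled execution time on the host
    accel_time: float     # estimated execution time on the accelerator
    transfer_time: float  # estimated host <-> accelerator data transfer cost
    area: int             # reconfigurable resources the task would occupy


def partition(tasks: List[Task], area_budget: int) -> Tuple[List[str], List[str]]:
    """Greedy sketch: move tasks with the largest profiled gain to the
    accelerator while they still fit; all other tasks stay on the host."""
    def gain(t: Task) -> float:
        # Time saved by moving the task, including transfer overhead.
        return t.host_time - (t.accel_time + t.transfer_time)

    accel: List[str] = []
    host: List[str] = []
    used = 0
    for t in sorted(tasks, key=gain, reverse=True):
        if gain(t) > 0 and used + t.area <= area_budget:
            accel.append(t.name)
            used += t.area
        else:
            host.append(t.name)
    return accel, host


# Illustrative numbers only (not measurements from any paper).
tasks = [
    Task("fir_loop", host_time=40.0, accel_time=4.0, transfer_time=6.0, area=30),
    Task("io_parse", host_time=5.0, accel_time=3.0, transfer_time=4.0, area=10),
    Task("matmul", host_time=60.0, accel_time=8.0, transfer_time=10.0, area=50),
    Task("stencil", host_time=25.0, accel_time=2.0, transfer_time=3.0, area=40),
]
accel, host = partition(tasks, area_budget=100)
```

With these made-up numbers, `matmul` and `fir_loop` go to the accelerator; `stencil` would also gain but no longer fits, and `io_parse` loses time once transfers are counted, so both stay on the host.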
TL;DR: An operating system (OS) for custom computing machines (CCMs) based on the Xputer paradigm that raises programming and using CCMs to levels close to those of modern OSes for sequential von Neumann processors.
Abstract: The paper presents an operating system (OS) for custom computing machines (CCMs) based on the Xputer paradigm. Custom computing tries to combine traditional computing with programmable hardware, attempting to gain the benefits of both adaptive software and optimized hardware. The OS, running as an extension to the actual host OS, allows greater flexibility in deciding which parts of the application should run on the configurable hardware as structural code and which on the host hardware as conventional software. This decision can be made late (at run time) and dynamically, in contrast to the early partitioning at compile time currently used on CCMs. Thus the CCM can be used concurrently by multiple users or applications without any of them needing knowledge of the others. This raises programming and using CCMs to levels close to those of modern OSes for sequential von Neumann processors.
TL;DR: The MoM-3 is introduced as a reconfigurable accelerator for high-performance computing at a moderate price; it uses a new machine paradigm to trigger its operations, which makes it especially suited to scientific algorithms, where the hardware structure can be configured to match the structure of the algorithm.
Abstract: This paper introduces the MoM-3 as a reconfigurable accelerator for high-performance computing at a moderate price. By using a new machine paradigm to trigger the operations in the MoM-3, this accelerator is especially suited to scientific algorithms, where the hardware structure can be configured to match the structure of the algorithm. The MoM-3 efficiently uses reconfigurable logic devices to provide fine-grain parallelism, and multiple address generators to keep the complete memory bandwidth free for data transfers (instead of fetching address computation instructions). Speed-up factors of up to 82, compared to state-of-the-art workstations, are demonstrated by means of an Ising spin system simulation example. Adding the MoM-3 as an accelerator makes it possible to achieve supercomputer performance from a low-cost workstation.

1. Introduction

Scientific computing poses the greatest challenges to modern workstations and even supercomputers. Many different computer architectures have been presented that take into account characteristics common to many scientific algorithms. Vector processors [4] speed up operations on large arrays of data by using pipelining techniques. Parallel multiprocessor architectures [15] benefit from the fact that many operations on large amounts of data are independent of each other. This allows these operations to be distributed onto different processors (or processing elements) and executed in parallel. But all of these architectures basically still follow the von Neumann machine paradigm with a fixed instruction set, where the sequence of instructions triggers the accesses to data in memory and the data manipulations.

The Map-oriented Machine 3 (MoM-3) is an architecture based on the Xputer machine paradigm [3]. Instead of a hardwired ALU with a fixed instruction set, an Xputer has a reconfigurable ALU based on field-programmable devices.
All data manipulations performed in the loop bodies of an algorithm are combined into a set of compound operators. Each compound operator matches a single loop body and takes several data words as input to produce a number of resulting data words. The compound operators are configured into the field-programmable devices. After configuration, an Xputer’s “instruction set” consists only of the compound operators required by the algorithm actually running on the Xputer. Combining several operations of a high-level language description into one compound operator allows pipelining and fine-grain parallelism to be introduced to a larger extent than is possible in fixed instruction set processors. For example, intermediate results can be passed along the pipeline instead of being written back to the register file after every instruction. Since many scientific algorithms compute array indices in several nested loops, the sequence of data addresses in a program trace shows a regular pattern. This leads to the idea of having complex address generators compute such address sequences from a small parameter set that describes the address pattern. And instead of an instruction sequencer acting as centralized control to trigger the operations in the reconfigurable ALU, the address generators themselves serve as decentralized control: they automatically activate the appropriate compound operator each time a new set of input data has been fetched from memory and the previous results have been written back. This so-called data sequencing mechanism directly matches the loop structure of the algo-
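The data sequencing mechanism described above can be sketched in a few lines of Python. The sketch assumes a hypothetical parameter set (`base`, `row_stride`, `col_step`, `n_rows`, `n_cols`) for a 2-D scan, a flat list standing in for memory, and an example compound operator that fuses a three-point weighted sum and a division into one step; none of these names come from the MoM-3 paper. The point it illustrates is that no address arithmetic instructions are executed: the generators produce the addresses, and each new operand set fires the compound operator exactly once.

```python
from typing import Callable, Iterator, List, Sequence


def address_generator(base: int, row_stride: int, col_step: int,
                      n_rows: int, n_cols: int) -> Iterator[int]:
    """Yield linear memory addresses of a nested-loop 2-D scan, computed
    only from a small (hypothetical) parameter set."""
    for r in range(n_rows):
        for c in range(n_cols):
            yield base + r * row_stride + c * col_step


def data_sequencer(memory: List[int], src: Iterator[int], dst: Iterator[int],
                   window: Sequence[int],
                   compound_op: Callable[[Sequence[int]], int]) -> int:
    """Decentralized control: each address pair from the generators fetches
    a window of input words, fires the compound operator once, and writes
    the result back. Returns the number of operator activations."""
    fired = 0
    for a_in, a_out in zip(src, dst):
        operands = [memory[a_in + off] for off in window]  # fetch inputs
        memory[a_out] = compound_op(operands)              # fire and write back
        fired += 1
    return fired


# 4x6 source array at address 0, 4x6 destination array at address 24.
mem = list(range(24)) + [0] * 24
# Scan the interior columns 1..4 of every row (the window needs c-1 and c+1).
src = address_generator(base=1, row_stride=6, col_step=1, n_rows=4, n_cols=4)
dst = address_generator(base=25, row_stride=6, col_step=1, n_rows=4, n_cols=4)
# Example compound operator: one loop body "(left + 2*center + right) // 4".
n = data_sequencer(mem, src, dst, window=(-1, 0, 1),
                   compound_op=lambda w: (w[0] + 2 * w[1] + w[2]) // 4)
```

Because the source here is a linear ramp, each smoothed interior value equals its center word, and the border columns of the destination stay untouched.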
TL;DR: This book presents Sphinx, a high-level synthesis system for DSP ASICs (N. Ramakrishna, M.A. Bayoumi), and the MARS high-level DSP synthesis system (C.-Y. Wang, K.K. Parhi), and explores the algorithmic design space using high-level synthesis.
Abstract: Foreword. Preface. 1. Sphinx: a High Level Synthesis System for DSP ASIC N. Ramakrishna, M.A. Bayoumi. 2. Synthesizing Optimal Application-Specific DSP Architectures C.H. Gebotys. 3. Synthesis of Multiple Bus Architectures for DSP Applications B.S. Haroun, M.I. Elmasry. 4. Exploring the Algorithmic Design Space using High Level Synthesis M. Potkonjak, J. Rabaey. 5. The MARS High-Level DSP Synthesis System C.-Y. Wang, K.K. Parhi. 6. High Performance Architecture Synthesis System P. Duncan, S. Swamy, S. Sprouse, D. Potasz, R. Jain. 7. Modeling Data Flow and Control Flow for DSP System Synthesis M.F.X.B. Swaaij, F.H.M. Franssen, F.V.M. Catthoor, H.J. DeMan. 8. Automatic Synthesis of Vision Automata B. Zavidovique, C. Fortunel, G. Quenot, A. Safir, J. Serot, F. Verdier. 9. Architectures and Building Blocks for Data Stream DSP Processors G.A. Jullien. 10. A General Purpose Xputer Architecture derived from DSP and Image Processing A. Ast, R.W. Hartenstein, H. Reinig, K. Schmidt, M. Weber. Index.