TL;DR: The MAJC architecture enhances application performance by exploiting parallelism at multiple levels (instruction, data, thread, and process) and treats all data types similarly.
Abstract: The MAJC architecture enhances application performance by exploiting parallelism at multiple levels: instruction, data, thread, and process. Supporting vertical multithreading, speculative multithreading, and chip multiprocessors, the scalable VLIW architecture is also capable of advanced speculation and predication and treats all data types similarly.
TL;DR: This dissertation investigates hardware support for Thread-Level Speculation (TLS), a technique that lets the compiler optimistically create parallel threads despite uncertainty as to whether those threads are actually independent; the results confirm TLS as a promising way to exploit the naturally multithreaded processing resources of future computer systems.
Abstract: Novel architectures that support multithreading, such as chip multiprocessors, have become increasingly commonplace over the past decade: examples include the Sun MAJC, IBM Power4, Alpha 21464, Intel Xeon, and HP PA-8800. However, only workloads composed of independent threads can take advantage of these processors; to improve the performance of a single application, that application must be transformed into a parallel version. Unfortunately, parallelization is extremely difficult: the compiler must prove that potential threads are independent, which is not possible for many general-purpose programs (e.g., spreadsheets, web software, and graphics codes) due to their abundant use of pointers, complex control flow, and complex data structures. This dissertation investigates hardware support for Thread-Level Speculation (TLS), a technique which empowers the compiler to optimistically create parallel threads despite uncertainty as to whether those threads are actually independent.
The basic idea behind the approach to thread-level speculation investigated in this dissertation is as follows. First, the compiler uses its global knowledge of control flow to decide how to break a program into speculative threads and how to transform and optimize the code for speculative execution; new architected instructions serve as the interface between software and hardware to manage this new form of parallel processing. Second, hardware support performs the run-time tasks of tracking data dependences between speculative threads, buffering speculative state from the regular memory system, and recovering from failed speculation. The hardware support for TLS presented in this dissertation is unique because it scales seamlessly both within and beyond chip boundaries, allowing this single unified design to apply to a wide variety of multithreaded processors and to larger systems that use those processors as building blocks. Overall, this cooperative and unified approach has many advantages over previous approaches that focus on a specific scale of underlying architecture, or use either software or hardware in isolation.
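The three run-time tasks named above (tracking dependences, buffering speculative state, and recovering from failed speculation) can be illustrated with a toy software model. The sketch below is only an assumption-laden illustration, not the dissertation's hardware design: the names `Epoch`, `run_speculatively`, `histogram_body`, and `histogram_demo` are hypothetical, the "parallel" phase is simulated sequentially, and conflicts are detected conservatively by address rather than by value.

```python
class Epoch:
    """One speculative thread (hypothetical name); here, one loop iteration."""

    def __init__(self, index, body):
        self.index = index
        self.body = body
        self.read_set = set()      # addresses loaded from shared memory
        self.write_buffer = {}     # speculative stores, hidden from memory

    def run(self, memory):
        # Start (or restart after a squash) with empty speculative state.
        self.read_set = set()
        self.write_buffer = {}

        def load(addr):
            if addr in self.write_buffer:      # forward the epoch's own store
                return self.write_buffer[addr]
            self.read_set.add(addr)            # track a potential dependence
            return memory[addr]

        def store(addr, value):
            self.write_buffer[addr] = value    # buffer instead of writing memory

        self.body(self.index, load, store)


def run_speculatively(memory, body, n_epochs):
    """Run n_epochs iterations speculatively; return the number of squashes."""
    epochs = [Epoch(i, body) for i in range(n_epochs)]

    # Phase 1: every epoch executes against the same pre-loop memory image.
    # Memory is not modified here because all stores are buffered.
    for epoch in epochs:
        epoch.run(memory)

    # Phase 2: commit in original program order.
    committed_writes = set()
    squashes = 0
    for epoch in epochs:
        # Conservative check: an earlier epoch committed a store to an address
        # this epoch loaded, so the loaded value may be stale.
        if epoch.read_set & committed_writes:
            squashes += 1
            epoch.run(memory)                  # recover by re-executing
        memory.update(epoch.write_buffer)      # commit buffered stores
        committed_writes.update(epoch.write_buffer)
    return squashes


def histogram_body(i, load, store):
    """Loop body with a data-dependent (pointer-like) access pattern."""
    k = load(("key", i))
    store(("count", k), load(("count", k)) + 1)


def histogram_demo(keys):
    """Build a small memory image, run the loop, return (counts, squashes)."""
    memory = {("key", i): k for i, k in enumerate(keys)}
    for k in set(keys):
        memory[("count", k)] = 0
    squashes = run_speculatively(memory, histogram_body, len(keys))
    counts = {k: memory[("count", k)] for k in set(keys)}
    return counts, squashes
```

Calling `histogram_demo([0, 1, 0, 2])` in this model squashes only the third epoch, because it loaded the counter that the first epoch later committed; the other epochs commit without re-execution, and the final counts match a sequential run of the loop. A compiler could not prove these iterations independent, since the counter touched depends on run-time data, which is the situation the abstract describes.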
This dissertation: (i) defines the roles of compiler and hardware support for TLS, as well as the interface between them; (ii) presents the design and evaluation of a unified mechanism for supporting thread-level speculation which can handle arbitrary memory access patterns and which is appropriate for any scale of architecture with parallel threads; and (iii) provides a comprehensive evaluation of techniques for enhancing value communication between speculative threads, and quantifies the impact of compiler optimization on these techniques. All proposed mechanisms and techniques are evaluated in detail using a fully automatic, feedback-directed compilation infrastructure and a realistic simulation platform. For the regions of code that are speculatively parallelized by the compiler and executed on the baseline hardware support, the performance of two of the 15 general-purpose applications studied improves by more than twofold and that of nine others by more than 25%; the performance of four of the six numeric applications studied improves by more than twofold, and that of the other two by more than 60%. These results confirm TLS as a promising way to exploit the naturally multithreaded processing resources of future computer systems.
TL;DR: The MAJC 5200 is a dual 32b microprocessor system-on-a-chip, utilizing 0.22 μm CMOS with all-Cu interconnect, delivering 6 GFLOPS and 13 GOPS at 500 MHz.
Abstract: The MAJC 5200 is a dual 32b microprocessor system-on-a-chip, utilizing 0.22 μm CMOS with all-Cu interconnect. Two CPUs, delivering 6 GFLOPS and 13 GOPS at 500 MHz, are tightly coupled through a shared, coherent, 4-way set associative 16 KB data cache, and an on-chip 4 GB/s switch. Each CPU is a 4-issue VLIW engine.
TL;DR: The newly introduced Microprocessor Architecture for Java Computing (MAJC) supports parallelism in a hierarchy of levels: multiprocessors on chip, vertical microthreading, instruction-level parallelism via a very long instruction word (VLIW) architecture, and SIMD.
Abstract: The newly introduced Microprocessor Architecture for Java Computing (MAJC) supports parallelism in a hierarchy of levels: multiprocessors on chip, vertical microthreading, instruction-level parallelism via a very long instruction word (VLIW) architecture, and SIMD. The first implementation, MAJC-5200, includes some key features of MAJC to realize a high-performance multimedia processor. Two CPUs running at 500 MHz are integrated into the chip to provide 6.16 GFLOPS and 12.33 GOPS, with high-speed interfaces providing a peak input-output (I/O) data rate of more than 4.8 GBytes/second. The chip is suitable for a number of applications including graphics/multimedia processing for high-end set-top boxes, digital voice processing for telecommunications, and advanced imaging.