TL;DR: In a previous paper as mentioned in this paper, we reported the successful use of graph coloring techniques for doing global register allocation in an experimental PL/I optimizing compiler, when the compiler cannot color the...
Abstract: In a previous paper we reported the successful use of graph coloring techniques for doing global register allocation in an experimental PL/I optimizing compiler. When the compiler cannot color the ...
TL;DR: This paper describes two improvements to Chaitin-style graph coloring register allocators, and provides a detailed description of optimistic coloring and rematerialization, and presents experimental data to show the performance of several versions of the register allocator on a suite of FORTRAN programs.
Abstract: We describe two improvements to Chaitin-style graph coloring register allocators. The first, optimistic coloring, uses a stronger heuristic to find a k-coloring for the interference graph. The second extends Chaitin's treatment of rematerialization to handle a larger class of values. These techniques are complementary. Optimistic coloring decreases the number of procedures that require spill code and reduces the amount of spill code when spilling is unavoidable. Rematerialization lowers the cost of spilling some values. This paper describes both of the techniques and our experience building and using register allocators that incorporate them. It provides a detailed description of optimistic coloring and rematerialization. It presents experimental data to show the performance of several versions of the register allocator on a suite of FORTRAN programs. It discusses several insights that we discovered only after repeated implementation of these allocators.
TL;DR: A fundamentally new approach to global register allocation is presented that optimally allocates registers and optimally places spill code, significantly decreasing spill code overhead compared with the traditional graph-coloring approach.
Abstract: This paper presents a fundamentally new approach to global register allocation that optimally allocates registers and optimally places spill code, significantly decreasing spill code overhead compared with the traditional graph-coloring approach. The Optimal Register Allocation (ORA) approach formulates global register allocation as a 0–1 integer programming problem, incorporating all aspects of register allocation within a unified framework, including copy elimination, live range splitting, rematerialization, callee and caller register spilling, special instruction-operand requirements, and paired registers. A prototype ORA allocator is built into the Gnu C Compiler (GCC). For the SPEC92 integer benchmarks, the ORA allocator actually produces a net decrease of more than 100 million cycles across the entire benchmark set, because the dynamic copies the ORA allocator removes exceed the dynamic loads and stores that are inserted. In contrast, the GCC allocator and a Chaitin-style graph-coloring allocator each cause a net increase of more than 1 billion cycles. Because global register allocation is NP-complete, optimal register allocation has been considered intractable. However, the run-time complexity of the ORA approach is shown experimentally to be O(n3). A profile-guided hybrid allocation approach is proposed that uses the ORA allocator for the performance critical regions in the performance critical functions, while using a graph-coloring allocator for the non-critical functions and regions. An ORA-GCC hybrid allocator takes an average of 4.6 seconds per function to produce an allocation that is within 1% of optimal for 97% of the SPEC92 integer benchmark functions, showing that the hybrid allocator is practical as an advanced optimization for performance-critical codes.
TL;DR: This paper describes funct on materzakation as an optimization concept in object-oriented databases and concludes with a quantitative analysis of function materialization based on a sample performance benchmark obtained from the experimental object base system GOM.
Abstract: We describe funct$on materzakation as an optimization concept in object-oriented databases. Exploiting the object-oriented paradigm—namely class lficatton, object identzty, and encapsalatzon-—facilitates a rather easy incorporation of function materialization mto (existing) object-oriented systems. Furthermore, the exploitation of encapsulation (information hiding) and object identity provides for additional performance tuning measures which drastically decrease the rematerialization overhead incurred by updates in the object base. The paper concludes with a quantitative analysis of function materialization based on a sample performance benchmark obtained from our experimental object base system GOM.
TL;DR: In this paper, the authors propose hardware techniques for optimizations of hyperdimensional computing, in a synthesizable VHDL library, to enable co-located implementation of both learning and classification tasks on only a small portion of Xilinx(R) UltraScale(TM) FPGAs.
Abstract: Brain-inspired hyperdimensional (HD) computing models neural activity patterns of the very size of the brain's circuits with points of a hyperdimensional space, that is, with hypervectors. Hypervectors are $D$-dimensional (pseudo)random vectors with independent and identically distributed (i.i.d.) components constituting ultra-wide holographic words: $D = 10,000$ bits, for instance. At its very core, HD computing manipulates a set of seed hypervectors to build composite hypervectors representing objects of interest. It demands memory optimizations with simple operations for an e cient hardware realization. In this paper, we propose hardware techniques for optimizations of HD computing, in a synthesizable VHDL library, to enable co-located implementation of both learning and classification tasks on only a small portion of Xilinx(R) UltraScale(TM) FPGAs: (1) We propose simple logical operations to rematerialize the hypervectors on the fly rather than loading them from memory. These operations massively reduce the memory footprint by directly computing the composite hypervectors whose individual seed hypervectors do not need to be stored in memory. (2) Bundling a series of hypervectors over time requires a multibit counter per every hypervector component. We instead propose a binarized back-to-back bundling without requiring any counters. This truly enables on-chip learning with minimal resources as every hypervector component remains binary over the course of training to avoid otherwise multibit component. (3) For every classification event, an associative memory is in charge of finding the closest match between a set of learned hypervectors and a query hypervector by using a distance metric. This operator is proportional to [...]