TL;DR: In this paper, a large capacity (8 memory banks of 524K error-corrected 36 bit words stored) pipelined (8 deep request pipeline) random access memory store simultaneously (to the limit of bank addressing conflicts) services intermixed requests from an internal exerciser plus ported requestors (up to 10) of plural types (3 types), which requestors are not of the same interface cycle time (30 nsec vs. 60 nsec).
Abstract: A large capacity (8 memory banks of 524K error-corrected 36 bit words stored) high performance (latency as low as 240 nanoseconds, 12.8 gigabits/second aggregate data transfer capability with up to 11.4 gigabits/second utilized) pipelined (8 deep request pipeline) random access memory store simultaneously (to the limit of bank addressing conflicts) services intermixed requests from an internal exerciser plus ported requestors (up to 10) of plural types (3 types), which requestors are not of the same interface cycle time (30 nsec vs. 60 nsec). In addition to such nonuniform interface cycle times, the bit-width of the data transfer interfaces (ports) to the requestors of plural types is also not uniform, but is actually wider (4 interface words of 36 bits each = 144 bits) to faster (30 nanosecond) requestors than is that data transfer bit-width (2 interface words = 72 bits) to slower (60 nsec) requestors. Three differing individual requestor bandwidths = data-transfer-bit-width/data-transfer-period (144 bits/30 nanosecond, 72 bits/120 nanosecond or 248 bits/240 nanosecond) are intermixedly and simultaneously (to the limit of addressing possibility) supported on 8 memory banks, each of which outputs 144 bits per 90 nanoseconds.
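As a quick consistency check on the figures quoted in this abstract, the following sketch recomputes the aggregate bank bandwidth, the fast-port bandwidth, and the total capacity. The variable names are illustrative, and reading "524K" as 2**19 = 524,288 words is an assumption rather than something the abstract states.

```python
# Sanity check of the figures quoted in the abstract (names are illustrative).
BANKS = 8
BANK_OUTPUT_BITS = 144        # bits delivered per bank cycle
BANK_CYCLE_NS = 90            # bank cycle time in nanoseconds
WORDS_PER_BANK = 2 ** 19      # assumption: "524K" means 524,288 words
WORD_BITS = 36


def gbit_per_s(bits, period_ns):
    """Bits per nanosecond is numerically equal to gigabits per second."""
    return bits / period_ns


aggregate_gbps = BANKS * gbit_per_s(BANK_OUTPUT_BITS, BANK_CYCLE_NS)
fast_port_gbps = gbit_per_s(144, 30)          # 144-bit port at 30 ns
utilization = 11.4 / aggregate_gbps           # quoted utilized rate vs. peak
total_words = BANKS * WORDS_PER_BANK
total_data_bits = total_words * WORD_BITS

print(f"aggregate bank bandwidth: {aggregate_gbps:.1f} Gbit/s")
print(f"fast port bandwidth:      {fast_port_gbps:.1f} Gbit/s")
print(f"peak utilization:         {utilization:.1%}")
print(f"capacity:                 {total_words} words, {total_data_bits} data bits")
```

Eight banks at 144 bits per 90 nanoseconds give 12.8 gigabits/second, which matches the aggregate figure in the abstract.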
TL;DR: In this paper, the authors present a method and structure for implementing a 64/8 ECC algorithm on a SIMM using a computer which has a 32-bit bus and is configured with a 36-bit wide memory.
Abstract: The present invention relates to a method and structure for implementing a 64/8 ECC algorithm on a SIMM using a computer which has a 32-bit bus and is configured with a 36-bit wide memory. This is accomplished by writing two successive 4 byte words from the system to latches, to form an 8 byte quad word, and writing 8 check bits utilizing the entire 64 bits of the quad word. One-half of the quad word (i.e., 32 bits) together with 4 of the 8 check bits for a total of 36 bits is stored at one address location in memory, and the remaining 32 bits of the quad word, together with the remaining 4 check bits, are stored at another, preferably the successive 36 bit, address location in memory. When the quad word and check bits are read from the memory, they are read serially, i.e., the first 32 bits and 4 associated check bits are read and latched, followed by the second 32 bits and the 4 associated check bits being read and combined with the first 32 bits of data and 4 check bits so as to essentially "reconstitute" the original 64-bit quad word with 8 check bits. From the "reconstituted" 64-bit data word and 8 check bits, the error correction is performed. The 64-bit quad word with the corrected data is latched and asserted successively on the data bus as two 32-bit words. Also, preferably logic and circuitry to perform a read-modify-write (R-M-W) function are provided.
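The split-and-reconstitute step described in this abstract can be sketched as below. This is only an illustration: the function names, the choice to put the check nibble in the top 4 bits of each 36-bit word, and the assignment of the low data half to the first address are all assumptions, and the 8 check bits are treated as an opaque input because the abstract does not specify the 64/8 code itself.

```python
MASK32 = 0xFFFFFFFF
MASK4 = 0xF


def split_quad_word(data64, check8):
    """Split 64 data bits + 8 check bits into two 36-bit memory words.

    Assumed layout (not from the source): bits 35..32 hold a check nibble,
    bits 31..0 hold data; the low data half goes to the first address.
    """
    low_data = data64 & MASK32
    high_data = (data64 >> 32) & MASK32
    word0 = ((check8 & MASK4) << 32) | low_data
    word1 = (((check8 >> 4) & MASK4) << 32) | high_data
    return word0, word1


def reconstitute(word0, word1):
    """Rebuild the 64-bit quad word and its 8 check bits from two reads."""
    data64 = ((word1 & MASK32) << 32) | (word0 & MASK32)
    check8 = (((word1 >> 32) & MASK4) << 4) | ((word0 >> 32) & MASK4)
    return data64, check8


# Round trip with arbitrary values; check8 stands in for real ECC check bits.
quad = 0x0123456789ABCDEF
check = 0xA5
w0, w1 = split_quad_word(quad, check)
assert w0 < (1 << 36) and w1 < (1 << 36)
assert reconstitute(w0, w1) == (quad, check)
```

Error correction would then operate on the reconstituted 64 data bits and 8 check bits as a unit, before the corrected quad word is driven onto the bus as two 32-bit words.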
TL;DR: The Gauss-Laguerre quadrature is proposed as a numerical method for calculating the correction factor integrals that occur in spreading resistance calculations; correcting an entire profile of 57 data points takes 1.0 min of CPU time on an IBM 1130 or 0.4 sec on a UNIVAC 1100/10, a factor of 6 to 8 less than with adaptive Simpson's rule.
Abstract: The Gauss-Laguerre quadrature is proposed as a numerical method for calculating the correction factor integrals that occur in spreading resistance calculations. The method is very efficient in terms of computation time and memory storage, requiring only 33 integrand values for each integral evaluation. The accuracy of the method has been investigated for a variety of graded structures, and found to be better than 5%. As a test of its practical utility, the method has been used in the correction of the spreading resistance profile of a practical buried layer structure, and it has been found that the CPU time taken to correct the entire profile of 57 data points is 1.0 min on an IBM 1130 System with a 16K word (16 bit) memory or 0.4 sec on a UNIVAC 1100/10 Multiprocessor System with a 393K word (36 bit) memory. These times are a factor of 6 to 8 less than those required by using the previously proposed adaptive Simpson's rule to compute the correction factor integrals.
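For readers unfamiliar with the technique, here is a minimal, self-contained sketch of an n-point Gauss-Laguerre rule, which approximates the integral of exp(-x) f(x) over [0, infinity) by a weighted sum of f at the roots of the Laguerre polynomial L_n. The node search by bisection on interlaced roots, the function names, and the test integrand cos(x) are choices made here for illustration; the abstract does not give the paper's implementation or the actual correction factor integrand.

```python
import math


def laguerre(k, x):
    """Evaluate the Laguerre polynomial L_k(x) by the three-term recurrence."""
    p_prev, p = 0.0, 1.0
    for j in range(k):
        p_prev, p = p, ((2 * j + 1 - x) * p - j * p_prev) / (j + 1)
    return p


def gauss_laguerre(n):
    """Return nodes and weights of the n-point Gauss-Laguerre rule.

    Roots of L_k are bracketed by the roots of L_(k-1) (interlacing),
    so each one is located by plain bisection.
    """
    roots = []
    for k in range(1, n + 1):
        # 4k + 3 is used as an upper bound on the largest root of L_k.
        brackets = [0.0] + roots + [4.0 * k + 3.0]
        new_roots = []
        for lo, hi in zip(brackets, brackets[1:]):
            lo_positive = laguerre(k, lo) > 0
            for _ in range(100):
                mid = 0.5 * (lo + hi)
                if (laguerre(k, mid) > 0) == lo_positive:
                    lo = mid
                else:
                    hi = mid
            new_roots.append(0.5 * (lo + hi))
        roots = new_roots
    weights = [x / ((n + 1) ** 2 * laguerre(n + 1, x) ** 2) for x in roots]
    return roots, weights


def integrate(f, nodes, weights):
    """Approximate the integral of exp(-x) * f(x) over [0, infinity)."""
    return sum(w * f(x) for x, w in zip(nodes, weights))


# 33 integrand evaluations per integral, as quoted in the abstract.
nodes, weights = gauss_laguerre(33)
approx = integrate(math.cos, nodes, weights)   # exact value is 1/2
print(f"33-point estimate: {approx:.12f}")
```

The nodes and weights depend only on n, so they can be computed once and reused for every integral in a profile, which is consistent with the low per-integral cost the abstract reports.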
TL;DR: A storage control system transfers data between a first memory 10 and a second circulating store 12 that has periodically accessible sectors 62 separated by normally unutilized guard bands 61, so that information may additionally be stored in and retrieved from the guard bands.
Abstract: 1,265,756. Storage control systems. GENERAL ELECTRIC CO. 6 June, 1969 [14 June, 1968], No. 28844/69. Heading G4C. A storage control system controls the transfer of data between a first memory 10 (Fig. 2) and second circulating store 12 having periodically accessible sectors 62 separated by normally unutilized guard bands 61 such that information may be additionally stored in and retrieved from the guard bands. To initiate transfer of data a processor executes a "connect" instruction to retrieve a peripheral control word from magnetic core memory 10 and enter it in extended memory controller 16. A decoder 42 receiving bits 18, 19 of the 36 bit peripheral control word feeds a signal to register 46 holding bits 0-17 of the word (that is the address in memory 10 of a data control word) so that the data control word at that address, comprising two 36 bit words, is read one word at a time into decoder 46. Each data control word comprises four parts: (a) the working store cell address of the information to be transferred, (b) the auxiliary store sector address of the information to be transferred, (c) the working store address of the next data control word to be retrieved and (d) a function portion. The latter signifies whether data is to be stored in or retrieved from magnetic disc or drum memory 12 and whether a "normal mode" or "mode 1" operation is to be performed, i.e. whether sectors or guard bands are to be accessed. During "normal mode" operation blocks of 64 data words are transferred in blocks of four whilst during a "mode 1" operation 8 data words are transferred. The function portion of the data control word is fed to decoder 46 controlling a main memory control 44, sync. control 48, write amplifier 68, track address selection matrix 50, data transfer control matrix 156 and mode control 43. To transfer data from memory 10 to memory 12, sync. control 48 compares the address of memory 12 as it rotates with the address of the data control word during which time four words are transferred into holding registers 174. At equality of address the contents of the registers 174 are transferred in parallel via gates 172 into shift registers 64, the contents of which are fed serially to write amplifier 68 to be entered into the memory. To transfer data from memory 12 to memory 10, sync. control 48 compares the addresses of the memory 10 and the data control word, at equality read amplifier 66 being enabled to serially read in four words to shift register 64. The contents of the register 64 are transferred in parallel to registers 174 and then to memory control 18 together with an address signal for storage in main memory 10.
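The field selection on the 36 bit peripheral control word described above can be illustrated with a small bit-extraction helper. This is a hedged sketch: it assumes bit 0 is the most significant bit of the word, which the abstract does not state, and the names `field`, `dcw_address`, and `decoder_bits` are hypothetical.

```python
WORD_BITS = 36


def field(word, first_bit, last_bit):
    """Extract bits first_bit..last_bit (inclusive) from a 36-bit word.

    Assumption: bit 0 is the most significant bit of the word.
    """
    width = last_bit - first_bit + 1
    shift = WORD_BITS - 1 - last_bit
    return (word >> shift) & ((1 << width) - 1)


# Build an example control word: an 18-bit address followed by two decoder bits.
pcw = (0o123456 << 18) | (0b10 << 16)

dcw_address = field(pcw, 0, 17)    # address of the data control word
decoder_bits = field(pcw, 18, 19)  # the two bits routed to decoder 42

print(oct(dcw_address), bin(decoder_bits))
```

If the hardware instead numbers bits from the least significant end, only the `shift` computation changes.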
TL;DR: The design and implementation of an address generator for stream-based computation that can generate addresses by a 1, 2 or 3-dimensional mapping from a linear data string in memory is described.
Abstract: This paper describes the design and implementation of an address generator for stream-based computation. The unit can generate addresses by a 1, 2 or 3-dimensional mapping from a linear data string in memory. A processing unit will get the required data in a continuous stream without empty time slots, even when switching between addressing algorithms. Each algorithm is specified by a set of parameters loaded into FIFOs in background. The unit is specified by VHDL, simulated, synthesized and implemented on an FPGA of type Xilinx Virtex-II Pro. A speed of 144 MHz is obtained for generating 36 bit addresses. Ideas for expanding the flexibility of the unit are discussed.
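To make the 1, 2 or 3-dimensional mapping concrete, the following software sketch generates a stream of linear addresses from a base address, per-dimension counts, and per-dimension strides. It is a behavioural illustration only: the parameter names and the count/stride parameterisation are assumptions, not the parameter set of the hardware unit described in the paper.

```python
from itertools import chain, product

ADDR_MASK = (1 << 36) - 1   # addresses are 36 bits wide


def stream_addresses(base, counts, strides):
    """Yield linear addresses for a 1-, 2- or 3-dimensional access pattern.

    counts and strides are given outermost dimension first (assumed
    parameterisation, chosen here for illustration).
    """
    if len(counts) != len(strides) or not 1 <= len(counts) <= 3:
        raise ValueError("expected 1 to 3 matching counts and strides")
    for index in product(*(range(c) for c in counts)):
        offset = sum(i * s for i, s in zip(index, strides))
        yield (base + offset) & ADDR_MASK


# A 2 x 3 window read from a row-major array with a row pitch of 8 words.
window = list(stream_addresses(100, (2, 3), (8, 1)))
print(window)   # [100, 101, 102, 108, 109, 110]

# Switching patterns back to back without a gap in the output stream.
combined = list(chain(stream_addresses(0, (4,), (2,)),
                      stream_addresses(100, (2, 3), (8, 1))))
print(combined)
```

Chaining two generators mimics, in software terms, the gap-free switch between addressing algorithms that the unit achieves by preloading parameters into FIFOs.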