About: Uncacheable speculative write combining is a research topic. Over its lifetime, 15 publications have appeared on this topic, receiving 280 citations.
TL;DR: A write-combining buffer combines data from separate data write operations into cache-line-sized buffer units for uncacheable types of data, such as frame buffer data.
Abstract: The write-combining buffer combines data from separate data write operations into cache-line-sized buffer units for uncacheable types of data, such as frame buffer data. The write-combining buffer is implemented within a microprocessor having a data cache unit storing cacheable data within cache lines. The data cache unit includes components and circuitry provided for efficiently inputting and outputting cache-line-sized units of data. By combining many uncacheable data write operations within a single cache-line-sized buffer, the circuitry and techniques employed for processing cache lines are exploited in the processing of uncacheable data as well. A particular implementation is described wherein uncacheable data units corresponding to graphics write operations within an out-of-order microprocessor are combined into cache-line-sized buffers, then transmitted to a frame buffer using a burst-mode eviction. Processor ordering requirements are ignored and global observability is relaxed for the graphics write operations. If the cache-line-sized buffer is not full when evicted, then a sequence of one or more burst-mode partial writes is employed to evict all data within the cache-line-sized buffer. If partial writes are employed, no delay between the partial writes is required.
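The combining and eviction behavior described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuit: the class name `WriteCombiningBuffer`, the single-buffer design, the 32-byte default line size, and the `evictions` log are all hypothetical. A full line is evicted as one burst; a partially filled line is evicted as one partial write per contiguous run of valid bytes.

```python
LINE_SIZE = 32  # assumed cache-line size in bytes (not given in the abstract)


class WriteCombiningBuffer:
    """Toy model of a single cache-line-sized write-combining buffer."""

    def __init__(self, line_size=LINE_SIZE):
        self.line_size = line_size
        self.base = None                      # line-aligned address being combined
        self.data = bytearray(line_size)
        self.valid = [False] * line_size      # per-byte valid flags
        self.evictions = []                   # (kind, address, bytes) bus transactions

    def write(self, addr, payload):
        """Combine an uncacheable write into the buffer, byte by byte."""
        for i, byte in enumerate(payload):
            a = addr + i
            base = a - (a % self.line_size)
            if self.base is not None and base != self.base:
                self.evict()                  # write targets a different line
            if self.base is None:
                self.base = base
            self.data[a - base] = byte
            self.valid[a - base] = True
        if all(self.valid):
            self.evict()                      # buffer is full: burst it out

    def evict(self):
        """Flush the buffer: one burst if full, else a run of partial writes."""
        if self.base is None:
            return
        if all(self.valid):
            self.evictions.append(("burst", self.base, bytes(self.data)))
        else:
            start = None
            for off in range(self.line_size + 1):
                on = off < self.line_size and self.valid[off]
                if on and start is None:
                    start = off
                elif not on and start is not None:
                    chunk = bytes(self.data[start:off])
                    self.evictions.append(("partial", self.base + start, chunk))
                    start = None
        self.base = None
        self.data = bytearray(self.line_size)
        self.valid = [False] * self.line_size


wc = WriteCombiningBuffer(line_size=8)
for i in range(8):
    wc.write(0x100 + i, bytes([i]))           # eight 1-byte writes fill one line
wc.write(0x200, b"\x01\x02")
wc.write(0x204, b"\x03")
wc.evict()                                    # forced eviction of a partial line
```

In this run, the first eight writes produce a single burst transaction, while the forced eviction of the second line produces two partial writes, one for each contiguous run of valid bytes.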
TL;DR: A processor includes a decoder to decode instructions and a circuit that, in response to a decoded instruction, detects an incoming write-back or write-through streaming store instruction that misses a cache and allocates a buffer in write-combining mode.
Abstract: A processor is disclosed. The processor includes a decoder to decode instructions and a circuit that, in response to a decoded instruction, detects an incoming write-back or write-through streaming store instruction that misses a cache and allocates a buffer in write-combining mode. In response to a second decoded instruction, the circuit detects either an uncacheable speculative write-combining store instruction or a second write-back or write-through streaming store instruction that hits the buffer, and merges the second decoded instruction with the buffer.
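The allocate-on-miss and merge-on-hit decisions in this abstract can be sketched as a small state model. Everything named here is an assumption for illustration: the class `StoreUnit`, the store kind labels (`WB_STREAM`, `WT_STREAM`, `USWC`), the 64-byte line size, and the return strings. Cases the abstract does not describe are reported as `not_modeled` rather than guessed.

```python
LINE = 64  # assumed line size in bytes


class StoreUnit:
    """Toy model of the allocate/merge decision for streaming stores."""

    def __init__(self, cached_lines=()):
        self.cached_lines = set(cached_lines)   # line bases present in the cache
        self.wc_buffers = {}                    # line base -> {offset: byte}

    def store(self, kind, addr, payload):
        base = addr - addr % LINE
        offset = addr - base
        # Simplifying assumption: a store never crosses a line boundary.
        assert offset + len(payload) <= LINE
        if base in self.wc_buffers:
            # A USWC or streaming store that hits the buffer is merged into it.
            if kind in ("USWC", "WB_STREAM", "WT_STREAM"):
                for i, byte in enumerate(payload):
                    self.wc_buffers[base][offset + i] = byte
                return "merged"
        elif kind in ("WB_STREAM", "WT_STREAM") and base not in self.cached_lines:
            # A streaming store that misses the cache allocates a buffer
            # in write-combining mode.
            self.wc_buffers[base] = {offset + i: b for i, b in enumerate(payload)}
            return "allocated"
        return "not_modeled"                    # behavior not covered by the abstract


unit = StoreUnit()
first = unit.store("WB_STREAM", 0x1000, b"\xaa")   # misses the cache
second = unit.store("USWC", 0x1004, b"\xbb")       # hits the allocated buffer
third = unit.store("USWC", 0x2000, b"\x01")        # no buffer for this line
```

Here `first` is `"allocated"`, `second` is `"merged"`, and `third` is `"not_modeled"`, since the abstract does not say what happens to an uncacheable speculative write-combining store that finds no buffer.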
TL;DR: A system and method that enable a graphics processor to operate with a CPU that reorders write instructions, without requiring expensive hardware and without significantly reducing the performance of the driver running on the CPU.
Abstract: A system and method are disclosed for enabling a graphics processor to operate with a CPU that reorders write instructions, without requiring expensive hardware and without significantly reducing the performance of the driver operating on the CPU. The invention allows the graphics processor to evaluate the data sent to it by software running on the CPU in its intended and proper order, even if the CPU transmits the data to the graphics processor in an order different from that generated by the software. The invention works regardless of the particular write reordering technique used by the CPU, and is a very low-cost addition to the graphics processor, requiring only a few registers and a small state machine. The invention identifies the number of "holes" in the reordered write instructions, and when the number of holes becomes zero, a set of received data is made available for execution by the graphics processor.
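The hole-counting idea can be illustrated as follows. This is a hedged sketch, not the patent's register-level design: the class `HoleTracker`, the notion of numbered "slots", and the dictionary used to hold pending data are assumptions made for readability. A hole is a slot below the highest slot received so far that has not yet arrived; data is released only when no holes remain.

```python
class HoleTracker:
    """Toy model: release received writes only when no holes remain."""

    def __init__(self):
        self.base = 0          # first slot not yet released
        self.pending = {}      # slot -> data received but not yet released
        self.released = []     # data made available, in software order

    def receive(self, slot, data):
        """Accept one write; assumes slot >= self.base and no duplicates."""
        self.pending[slot] = data
        highest = max(self.pending)
        # Slots spanned so far minus slots actually received.
        holes = (highest - self.base + 1) - len(self.pending)
        if holes == 0:
            for s in range(self.base, highest + 1):
                self.released.append(self.pending.pop(s))
            self.base = highest + 1
        return holes


tracker = HoleTracker()
hole_counts = [
    tracker.receive(2, "c"),   # slots 0 and 1 are missing
    tracker.receive(0, "a"),   # slot 1 is still missing
    tracker.receive(1, "b"),   # no holes: a, b, c released in order
    tracker.receive(3, "d"),   # arrives in order: released immediately
]
```

Because the tracker only counts holes and never inspects how the writes were reordered, the same logic applies regardless of the CPU's reordering technique, which matches the claim in the abstract.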
TL;DR: A virtual machine system and a method for sharing a graphics card among virtual machines, which enable the guest operating systems (GOSs) to access the real graphics card and also enable switching among a plurality of virtual machines.
Abstract: The present invention provides a virtual machine system and a method for sharing a graphics card amongst virtual machines. The virtual machine monitor (VMM) of the virtual machine system is provided with a resource-converting module, which converts data exchanged between the graphics card and the graphics card drive module of a guest operating system (GOS) in the foreground based on a resource-converting table, and also intercepts accesses to the real graphics card by a GOS in the background and then responds to its operations on the graphics card. The VMM is further provided with a switching module, which alters the state of a virtual machine (VM) based on a command for switching the VM, saves the graphics card state before the VM is switched to the background, and restores the stored graphics card state to the graphics card when the VM is switched back to the foreground. Further, the GOSs each comprise a graphics card drive module corresponding to the real graphics card for accessing the real graphics card. The systems and the methods according to the present invention enable the GOSs to access the real graphics card, and also enable switching among a plurality of virtual machines.
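The save, restore, and intercept behavior described here can be sketched in a few lines. This is a loose illustration under several assumptions: the class `VMM`, its methods, the dictionary standing in for the card's registers, and the per-VM register-name mapping used as a "resource-converting table" are all hypothetical, and a VM that has never been in the foreground simply inherits the current card state.

```python
class VMM:
    """Toy model of foreground/background graphics card sharing."""

    def __init__(self, card_state, convert_table):
        self.card = dict(card_state)    # stands in for the real card's registers
        self.table = convert_table      # vm -> {guest register: real register}
        self.saved = {}                 # vm -> card state saved while in background
        self.foreground = None

    def switch_to(self, vm):
        """Save the outgoing VM's card state and restore the incoming VM's."""
        if self.foreground is not None:
            self.saved[self.foreground] = dict(self.card)
        if vm in self.saved:
            self.card = dict(self.saved[vm])
        self.foreground = vm

    def write(self, vm, reg, value):
        """Forward foreground writes to the card; intercept background ones."""
        real = self.table[vm].get(reg, reg)          # convert guest -> real resource
        if vm != self.foreground:
            self.saved.setdefault(vm, {})[real] = value   # answer from saved state
            return "intercepted"
        self.card[real] = value
        return "forwarded"


vmm = VMM({"mode": 0}, {"A": {}, "B": {}})
vmm.switch_to("A")
r1 = vmm.write("A", "mode", 1)      # A is in the foreground
vmm.switch_to("B")
r2 = vmm.write("B", "mode", 2)      # B is now in the foreground
r3 = vmm.write("A", "mode", 7)      # A is in the background
mode_while_b = vmm.card["mode"]
vmm.switch_to("A")
mode_after_restore = vmm.card["mode"]
```

In this run the background write from A never reaches the card while B is in the foreground, and it takes effect only when A's saved state is restored on the switch back.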
TL;DR: A graphics request stream is transferred from a host processor to a graphics card via a host bus so that the stream traverses the host bus no more than once.
Abstract: A graphics request stream is transferred from a host processor to a graphics card via a host bus so that the stream traverses the host bus no more than once. To that end, the graphics card has a graphics card memory, and the host processor has a host memory configured in a first memory configuration. The graphics card memory may be configured in the first memory configuration, and the graphics request stream is received directly in a message from the host processor (via the host bus). Upon receipt by the graphics card, the graphics request stream is written to the graphics card memory.