About: Direct memory access is a research topic. Over its lifetime, 2878 publications have appeared on this topic, receiving 30886 citations. The topic is also known as: DMA.
TL;DR: This book covers Linux device driver development, including the role of the device driver, the classes of devices and modules, and how mounting and unmounting works.
Abstract: Preface. Chapter 1. An Introduction to Device Drivers: The Role of the Device Driver; Splitting the Kernel; Classes of Devices and Modules; Security Issues; Version Numbering; License Terms; Joining the Kernel Development Community; Overview of the Book. Chapter 2. Building and Running Modules: Kernel Modules Versus Applications; Compiling and Loading; The Kernel Symbol Table; Initialization and Shutdown; Using Resources; Automatic and Manual Configuration; Doing It in User Space; Backward Compatibility; Quick Reference. Chapter 3. Char Drivers: The Design of scull; Major and Minor Numbers; File Operations; The file Structure; open and release; scull's Memory Usage; A Brief Introduction to Race Conditions; read and write; Playing with the New Devices; The Device Filesystem; Backward Compatibility; Quick Reference. Chapter 4. Debugging Techniques: Debugging by Printing; Debugging by Querying; Debugging by Watching; Debugging System Faults; Debuggers and Related Tools. Chapter 5. Enhanced Char Driver Operations: ioctl; Blocking I/O; poll and select; Asynchronous Notification; Seeking a Device; Access Control on a Device File; Backward Compatibility; Quick Reference. Chapter 6. Flow of Time: Time Intervals in the Kernel; Knowing the Current Time; Delaying Execution; Task Queues; Kernel Timers; Backward Compatibility; Quick Reference. Chapter 7. Getting Hold of Memory: The Real Story of kmalloc; Lookaside Caches; get_free_page and Friends; vmalloc and Friends; Boot-Time Allocation; Backward Compatibility; Quick Reference. Chapter 8. Hardware Management: I/O Ports and I/O Memory; Using I/O Ports; Using Digital I/O Ports; Using I/O Memory; Backward Compatibility; Quick Reference. Chapter 9. Interrupt Handling: Overall Control of Interrupts; Preparing the Parallel Port; Installing an Interrupt Handler; Implementing a Handler; Tasklets and Bottom-Half Processing; Interrupt Sharing; Interrupt-Driven I/O; Race Conditions; Backward Compatibility; Quick Reference. Chapter 10. Judicious Use of Data Types: Use of Standard C Types; Assigning an Explicit Size to Data Items; Interface-Specific Types; Other Portability Issues; Linked Lists; Quick Reference. Chapter 11. kmod and Advanced Modularization: Loading Modules on Demand; Intermodule Communication; Version Control in Modules; Backward Compatibility; Quick Reference. Chapter 12. Loading Block Drivers: Registering the Driver; The Header File blk.h; Handling Requests: A Simple Introduction; Handling Requests: The Detailed View; How Mounting and Unmounting Works; The ioctl Method; Removable Devices; Partitionable Devices; Interrupt-Driven Block Drivers; Backward Compatibility; Quick Reference. Chapter 13. mmap and DMA: Memory Management in Linux; The mmap Device Operation; The kiobuf Interface; Direct Memory Access and Bus Mastering; Backward Compatibility; Quick Reference. Chapter 14. Network Drivers: How snull Is Designed; Connecting to the Kernel; The net_device Structure in Detail; Opening and Closing; Packet Transmission; Packet Reception; The Interrupt Handler; Changes in Link State; The Socket Buffers; MAC Address Resolution; Custom ioctl Commands; Statistical Information; Multicasting; Backward Compatibility; Quick Reference. Chapter 15. Overview of Peripheral Buses: The PCI Interface; A Look Back: ISA; PC/104 and PC/104+; Other PC Buses; SBus; NuBus; External Buses; Backward Compatibility; Quick Reference. Chapter 16. Physical Layout of the Kernel Source: Booting the Kernel; Before Booting; The init Process; The kernel Directory; The fs Directory; The mm Directory; The net directory; ipc and lib; include and arch; Drivers. Glossary. Index.
TL;DR: This paper presents nn-X, a scalable, low-power coprocessor for real-time execution of deep neural networks. It achieves a peak performance of 227 G-ops/s while consuming less than 4 watts, a performance-per-power improvement of 10 to 100 times over conventional mobile and desktop processors.
Abstract: Deep networks are state-of-the-art models used for understanding the content of images, videos, audio and raw input data. Current computing systems are not able to run deep network models in real-time with low power consumption. In this paper we present nn-X: a scalable, low-power coprocessor for enabling real-time execution of deep neural networks. nn-X is implemented on programmable logic devices and comprises an array of configurable processing elements called collections. These collections perform the most common operations in deep networks: convolution, subsampling and non-linear functions. The nn-X system includes 4 high-speed direct memory access interfaces to DDR3 memory and two ARM Cortex-A9 processors. Each port is capable of a sustained throughput of 950 MB/s in full duplex. nn-X is able to achieve a peak performance of 227 G-ops/s, a measured performance in deep learning applications of up to 200 G-ops/s while consuming less than 4 watts of power. This translates to a performance per power improvement of 10 to 100 times that of conventional mobile and desktop processors.
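The headline figures in this abstract can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted above; the variable and function names are illustrative, and it assumes that "each port" refers to one of the 4 DMA interfaces. Because the abstract gives power only as an upper bound (less than 4 watts), the efficiency values are lower bounds.

```python
# Figures quoted in the nn-X abstract; all names here are illustrative.
DMA_PORTS = 4
PORT_MB_PER_S = 950      # sustained throughput per port, per direction
PEAK_GOPS = 227
MEASURED_GOPS = 200
POWER_W = 4              # the abstract says "less than 4 watts"


def aggregate_bandwidth_mb_s(ports: int, per_port_mb_s: int) -> int:
    """Total sustained bandwidth in one direction across all ports."""
    return ports * per_port_mb_s


def gops_per_watt(gops: float, watts: float) -> float:
    """Performance per unit power; a lower bound when watts is an upper bound."""
    return gops / watts


one_way = aggregate_bandwidth_mb_s(DMA_PORTS, PORT_MB_PER_S)
print(one_way)                                  # 3800 MB/s per direction
print(gops_per_watt(PEAK_GOPS, POWER_W))        # 56.75 G-ops/s per watt (peak)
print(gops_per_watt(MEASURED_GOPS, POWER_W))    # 50.0 G-ops/s per watt (measured)
```

Under these assumptions the four ports together sustain about 3.8 GB/s in each direction, and the measured efficiency is at least 50 G-ops/s per watt.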
TL;DR: In this paper, a high speed digital video network apparatus is implemented on a single integrated circuit chip, and includes a network protocol processing system interconnection, compression/decompression circuits, and encoder/decoder circuits.
Abstract: A high speed digital video network apparatus is implemented on a single integrated circuit chip, and includes a network protocol processing system interconnection, compression/decompression circuits, and encoder/decoder circuits. The interconnection includes packet conversion logic which converts between a network protocol, such as Asynchronous Transfer Mode (ATM) packets, and the data protocol used to handle large data streams, such as Moving Picture Experts Group (MPEG) packets. The interconnection further includes a Virtual Channel Memory (VCM) for storing ATM cells for segmentation and reassembly, a Direct Memory Access (DMA) controller for connecting the VCM to the compression/decompression circuits, a Parallel Cell Interface (PCI) for connecting the VCM to an ATM network, a Pacing Rate Unit (PRU) for automatically reducing the maximum transmission rate in response to a sensed congestion condition in the network, and a Reduced Instruction Set Computer (RISC) microprocessor for controlling the DMA controller and transfers between the memory, a host and the ATM network, for performing segmentation and reassembly of Conversion Sublayer Payload Data Units (CS-PDUs), and for performing conversion between the ATM protocol and the MPEG protocol. The compression/decompression and decoder/encoder circuits may utilize MPEG to compress digitized images and motion video into compact data streams that can be moved across networks with bandwidths too narrow to accommodate the uncompressed data. The operating program for the RISC microprocessor is stored in a volatile Instruction Random Access Memory (IRAM) in the form of firmware which can be downloaded at initialization.
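The segmentation and reassembly step mentioned in this abstract can be illustrated with a minimal sketch. It relies on one standard fact, that an ATM cell carries 48 bytes of payload behind a 5-byte header, and on several simplifying assumptions: cell headers are omitted, the last cell is zero-padded, and the original length is passed out of band rather than carried in a trailer as a real adaptation layer would do. The function names are hypothetical and do not come from the source.

```python
from typing import List

CELL_PAYLOAD = 48  # payload bytes in a standard 53-byte ATM cell


def segment(pdu: bytes) -> List[bytes]:
    """Split a PDU into 48-byte cell payloads, zero-padding the last one."""
    cells = []
    for offset in range(0, len(pdu), CELL_PAYLOAD):
        chunk = pdu[offset:offset + CELL_PAYLOAD]
        cells.append(chunk.ljust(CELL_PAYLOAD, b"\x00"))
    return cells


def reassemble(cells: List[bytes], length: int) -> bytes:
    """Concatenate cell payloads and strip the padding.

    The length is supplied by the caller here (a simplification).
    """
    return b"".join(cells)[:length]


pdu = bytes(range(100))            # a 100-byte example payload
cells = segment(pdu)
print(len(cells))                  # 3 cells: 48 + 48 + 4 bytes of data
print(reassemble(cells, len(pdu)) == pdu)   # True
```

In the apparatus described above this work is done by the RISC microprocessor with cells held in the VCM; the sketch only shows the data transformation, not that hardware path.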
TL;DR: In this paper, a DMA engine is provided that is suitable for higher performance System On a Chip (SOC) devices that have multiple concurrent on-chip/off-chip memory spaces.
Abstract: A DMA engine is provided that is suitable for higher performance System On a Chip (SOC) devices that have multiple concurrent on-chip/off-chip memory spaces. The DMA engine operates with either logical or physical addressing and provides both random and sequential mapping from logical addresses to physical addresses, while supporting frequent context switching among a large number of logical address spaces. Embodiments of the present invention utilize per-direction (source-destination) queuing and an internal switch to support non-blocking concurrent transfer of data in multiple directions. A caching technique can be incorporated to reduce the overhead of address translation.
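One way to picture the logical-to-physical mapping and the translation cache described in this abstract is the sketch below. It is not the patented design: the page size, the per-context page tables, the LRU replacement policy, and every name in the code are assumptions chosen for illustration.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page granularity


class Translator:
    """Per-context logical-to-physical page mapping with a small LRU cache."""

    def __init__(self, cache_entries: int = 4):
        self.tables = {}              # context id -> {logical page: physical page}
        self.cache = OrderedDict()    # (context, logical page) -> physical page
        self.cache_entries = cache_entries
        self.table_lookups = 0        # counts the slow-path lookups

    def map_page(self, ctx: int, lpage: int, ppage: int) -> None:
        self.tables.setdefault(ctx, {})[lpage] = ppage
        self.cache.pop((ctx, lpage), None)   # drop any stale cached entry

    def translate(self, ctx: int, laddr: int) -> int:
        lpage, offset = divmod(laddr, PAGE_SIZE)
        key = (ctx, lpage)
        if key in self.cache:
            self.cache.move_to_end(key)      # mark as most recently used
            ppage = self.cache[key]
        else:
            self.table_lookups += 1
            ppage = self.tables[ctx][lpage]  # raises KeyError if unmapped
            self.cache[key] = ppage
            if len(self.cache) > self.cache_entries:
                self.cache.popitem(last=False)   # evict least recently used
        return ppage * PAGE_SIZE + offset


t = Translator()
t.map_page(ctx=1, lpage=0, ppage=7)   # "random" mapping: pages need not be contiguous
t.map_page(ctx=1, lpage=1, ppage=3)
print(t.translate(1, 4100))   # 12292 (page 1, offset 4 -> physical page 3)
print(t.translate(1, 4200))   # 12392, served from the cache
print(t.table_lookups)        # 1
```

Keying the cache on the context identifier as well as the logical page lets entries from different logical address spaces coexist, which is one plausible way to keep context switches cheap.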
TL;DR: In this paper, a digital computer capable of incorporating multiple central processing units (CPUs) uses an address and data bus to connect each CPU to between one and fifteen intelligent composite memory and input/output modules (MIOs).
Abstract: A digital computer capable of incorporating multiple central processing units (CPUs) uses an address and data bus to connect each CPU to between one and fifteen intelligent composite memory and input/output modules (MIOs). Data is transferred synchronously over the bus between each MIO and the CPU during the first phase of a three-phase clocking cycle. During the second phase, data on one or more low-speed serial data channels within each MIO is transferred between the MIO and external devices. During the third phase, data on a high-speed direct memory access (DMA) channel is transferred between the MIO and one or more external devices. Additional CPUs can be connected to the first CPU through an inter-processor buffer module (IPB), which attaches to the first CPU's bus at one end and, through a second bus, to the additional CPU at the other. The IPB may be a software-modifiable MIO and can store data addressable by both interconnected CPUs. The additional CPU in turn connects over its own bus to between one and fifteen additional MIOs or IPBs, allowing CPUs and their associated MIOs and IPBs to be cascaded. Because all data transfers between the MIOs and external devices occur in phases separate from the first phase, in which the CPU communicates with the MIOs and IPBs, the computational speed of any CPU is independent of the quantity of data transferred between the MIOs and IPBs and their associated external devices or additional CPUs.
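The claim that CPU speed is independent of the I/O load follows from the fixed time slots of the three-phase cycle, and a toy simulation makes this concrete. The sketch assumes that each phase moves at most one word and that phases rotate in a fixed order; the phase labels and function names are illustrative and are not taken from the source.

```python
from itertools import cycle, islice

# One clocking cycle = three phases, always in this order (labels are illustrative).
PHASES = ("cpu_bus", "serial_io", "dma")


def schedule(ticks: int) -> list:
    """Return the phase active at each of the first `ticks` clock phases."""
    return list(islice(cycle(PHASES), ticks))


def run(ticks: int, cpu_words: int, dma_words: int) -> dict:
    """Count words moved per channel, assuming one word per phase at most."""
    pending = {"cpu_bus": cpu_words, "serial_io": 0, "dma": dma_words}
    moved = {"cpu_bus": 0, "serial_io": 0, "dma": 0}
    for phase in schedule(ticks):
        if pending[phase] > 0:
            pending[phase] -= 1
            moved[phase] += 1
    return moved


idle = run(9, cpu_words=3, dma_words=0)
busy = run(9, cpu_words=3, dma_words=100)
print(idle["cpu_bus"], busy["cpu_bus"])   # 3 3: CPU transfers unaffected by DMA load
print(busy["dma"])                        # 3: DMA uses only its own phase
```

Because the DMA channel can only ever use its own slot, a backlog of 100 DMA words delays nothing on the CPU bus; the cost of this design is that an idle phase cannot be lent to another channel.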