TL;DR: The Click IP router can forward 64-byte packets at 73,000 packets per second, just 10% slower than Linux alone, and is easy to extend by adding additional elements, which are demonstrated with augmented configurations.
Abstract: Click is a new software architecture for building flexible and configurable routers. A Click router is assembled from packet processing modules called elements. Individual elements implement simple router functions like packet classification, queueing, scheduling, and interfacing with network devices. Complete configurations are built by connecting elements into a graph; packets flow along the graph's edges. Several features make individual elements more powerful and complex configurations easier to write, including pull processing, which models packet flow driven by transmitting interfaces, and flow-based router context, which helps an element locate other interesting elements.We demonstrate several working configurations, including an IP router and an Ethernet bridge. These configurations are modular---the IP router has 16 elements on the forwarding path---and easy to extend by adding additional elements, which we demonstrate with augmented configurations. On commodity PC hardware running Linux, the Click IP router can forward 64-byte packets at 73,000 packets per second, just 10% slower than Linux alone.
TL;DR: This paper examines fundamental limits to control plane propagation latency on an upcoming Internet2 production deployment, then expands the scope to over 100 publicly available WAN topologies and finds that one controller location is often sufficient to meet existing reaction-time requirements.
Abstract: Network architectures such as Software-Defined Networks (SDNs) move the control logic off packet processing devices and onto external controllers. These network architectures with decoupled control planes open many unanswered questions regarding reliability, scalability, and performance when compared to more traditional purely distributed systems. This paper opens the investigation by focusing on two specific questions: given a topology, how many controllers are needed, and where should they go? To answer these questions, we examine fundamental limits to control plane propagation latency on an upcoming Internet2 production deployment, then expand our scope to over 100 publicly available WAN topologies. As expected, the answers depend on the topology. More surprisingly, one controller location is often sufficient to meet existing reaction-time requirements (though certainly not fault tolerance requirements).
TL;DR: The RMT (reconfigurable match tables) model is proposed, a new RISC-inspired pipelined architecture for switching chips, and the essential minimal set of action primitives to specify how headers are processed in hardware are identified.
Abstract: In Software Defined Networking (SDN) the control plane is physically separate from the forwarding plane. Control software programs the forwarding plane (e.g., switches and routers) using an open interface, such as OpenFlow. This paper aims to overcomes two limitations in current switching chips and the OpenFlow protocol: i) current hardware switches are quite rigid, allowing ``Match-Action'' processing on only a fixed set of fields, and ii) the OpenFlow specification only defines a limited repertoire of packet processing actions. We propose the RMT (reconfigurable match tables) model, a new RISC-inspired pipelined architecture for switching chips, and we identify the essential minimal set of action primitives to specify how headers are processed in hardware. RMT allows the forwarding plane to be changed in the field without modifying hardware. As in OpenFlow, the programmer can specify multiple match tables of arbitrary width and depth, subject only to an overall resource limit, with each table configurable for matching on arbitrary fields. However, RMT allows the programmer to modify all header fields much more comprehensively than in OpenFlow. Our paper describes the design of a 64 port by 10 Gb/s switch chip implementing the RMT model. Our concrete design demonstrates, contrary to concerns within the community, that flexible OpenFlow hardware switch implementations are feasible at almost no additional cost or power.
TL;DR: Netmap as discussed by the authors is a framework that enables commodity operating systems to handle the millions of packets per seconds traversing 110 Gbit/s links, without requiring custom hardware or changes to applications.
Abstract: Many applications (routers, traffic monitors, firewalls, etc) need to send and receive packets at line rate even on very fast links In this paper we present netmap, a novel framework that enables commodity operating systems to handle the millions of packets per seconds traversing 110 Gbit/s links, without requiring custom hardware or changes to applications
In building netmap, we identified and successfully reduced or removed three main packet processing costs: per-packet dynamic memory allocations, removed by preallocating resources; system call overheads, amortized over large batches; and memory copies, eliminated by sharing buffers and metadata between kernel and userspace, while still protecting access to device registers and other kernel memory areas Separately, some of these techniques have been used in the past The novelty in our proposal is not only that we exceed the performance of most of previouswork, but also that we provide an architecture that is tightly integrated with existing operating system primitives, not tied to specific hardware, and easy to use and maintain
Netmap has been implemented in FreeBSD and Linux for several 1 and 10 Gbit/s network adapters In our prototype, a single core running at 900 MHz can send or receive 1488 Mpps (the peak packet rate on 10 Gbit/s links) This is more than 20 times faster than conventional APIs Large speedups (5× and more) are also achieved on user-space Click and other packet forwarding applications using a libpcap emulation library running on top of netmap
TL;DR: The novelty in the proposal is not only that it exceeds the performance of most of previous work, but also that it provides an architecture that is tightly integrated with existing operating system primitives, not tied to specific hardware, and easy to use and maintain.
Abstract: Many applications (routers, traffic monitors, firewalls, etc.) need to send and receive packets at line rate even on very fast links. In this paper we present netmap, a novel framework that enables commodity operating systems to handle the millions of packets per seconds traversing 1..10 Gbit/s links, without requiring custom hardware or changes to applications.
In building netmap, we identified and successfully reduced or removed three main packet processing costs: per-packet dynamic memory allocations, removed by preallocating resources; system call overheads, amortized over large batches; and memory copies, eliminated by sharing buffers and metadata between kernel and userspace, while still protecting access to device registers and other kernel memory areas. Separately, some of these techniques have been used in the past. The novelty in our proposal is not only that we exceed the performance of most of previouswork, but also that we provide an architecture that is tightly integrated with existing operating system primitives, not tied to specific hardware, and easy to use and maintain.
Netmap has been implemented in FreeBSD and Linux for several 1 and 10 Gbit/s network adapters. In our prototype, a single core running at 900 MHz can send or receive 14.88 Mpps (the peak packet rate on 10 Gbit/s links). This is more than 20 times faster than conventional APIs. Large speedups (5× and more) are also achieved on user-space Click and other packet forwarding applications using a libpcap emulation library running on top of netmap.