Bisection bandwidth

Papers published on a yearly basis

Papers

Proceedings Article•10.5555/1855711.1855730•

Hedera: dynamic flow scheduling for data center networks

Mohammad Al-Fares¹, Sivasankar Radhakrishnan¹, Barath Raghavan², Nelson Huang¹, Amin Vahdat¹

University of California, San Diego¹, Williams College²

28 Apr 2010

TL;DR: Hedera is a scalable, dynamic flow scheduling system that adaptively schedules a multi-stage switching fabric to use aggregate network resources efficiently. In simulation, it delivers bisection bandwidth that is 96% of optimal and up to 113% better than static load-balancing methods.

Abstract: Today's data centers offer tremendous aggregate bandwidth to clusters of tens of thousands of machines. However, because of limited port densities in even the highest-end switches, data center topologies typically consist of multi-rooted trees with many equal-cost paths between any given pair of hosts. Existing IP multipathing protocols usually rely on per-flow static hashing and can cause substantial bandwidth losses due to long-term collisions. In this paper, we present Hedera, a scalable, dynamic flow scheduling system that adaptively schedules a multi-stage switching fabric to efficiently utilize aggregate network resources. We describe our implementation using commodity switches and unmodified hosts, and show that for a simulated 8,192-host data center, Hedera delivers bisection bandwidth that is 96% of optimal and up to 113% better than static load-balancing methods.
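The collision problem the abstract attributes to per-flow static hashing can be illustrated with a minimal sketch. This is not Hedera's scheduler or any switch's actual hash function: the names `ecmp_path` and `path_loads`, the use of SHA-256, and the four-flow workload are all assumptions made for illustration. Each flow is pinned to a path by a hash of its 5-tuple, so two long-lived flows that land on the same path keep sharing it while other paths may sit idle.

```python
import hashlib
from collections import Counter

def ecmp_path(flow, n_paths):
    """Pick a path index from a stable hash of the flow's 5-tuple (hypothetical hash choice)."""
    key = "|".join(map(str, flow)).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % n_paths

def path_loads(flows, n_paths):
    """Count how many flows static hashing places on each equal-cost path."""
    counts = Counter(ecmp_path(f, n_paths) for f in flows)
    return [counts.get(p, 0) for p in range(n_paths)]

# Hypothetical workload: 4 long-lived TCP flows over 4 equal-cost paths.
# Tuple layout (assumed): (src_ip, dst_ip, protocol, src_port, dst_port).
flows = [("10.0.0.%d" % i, "10.0.1.%d" % i, 6, 40000 + i, 5001) for i in range(4)]
loads = path_loads(flows, 4)

# An ideal placement would put exactly one flow on each path;
# any path with a load above 1 is a collision that persists for the flow's lifetime.
print("flows per path:", loads, "max:", max(loads))
```

Because the mapping depends only on the flow's header fields, rerunning the placement never moves a flow. That is why such collisions are long-term, and it is the gap a dynamic scheduler aims to close.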

1,811 citations

Journal Article•10.1109/JSSC.2007.910957•

An 80-Tile Sub-100-W TeraFLOPS Processor in 65-nm CMOS

Sriram R. Vangal¹, Jason Howard¹, Greg Ruhl¹, Saurabh Dighe¹, H. Wilson¹, James W. Tschanz¹, D. Finan¹, A. Singh¹, Tiju Jacob¹, Shailendra Jain¹, Vasantha Erraguntla¹, Clark Roberts¹, Yatin Hoskote¹, Nitin Borkar¹, Shekhar Borkar¹

Intel¹

28 Jan 2008-IEEE Journal of Solid-State Circuits

TL;DR: This paper describes an integrated network-on-chip architecture containing 80 tiles arranged as an 8x10 2-D array of floating-point cores and packet-switched routers, both designed to operate at 4 GHz.

Abstract: This paper describes an integrated network-on-chip architecture containing 80 tiles arranged as an 8x10 2-D array of floating-point cores and packet-switched routers, both designed to operate at 4 GHz. Each tile has two pipelined single-precision floating-point multiply accumulators (FPMAC) which feature a single-cycle accumulation loop for high throughput. The on-chip 2-D mesh network provides a bisection bandwidth of 2 Terabits/s. The 15-FO4 design employs mesochronous clocking, fine-grained clock gating, dynamic sleep transistors, and body-bias techniques. In a 65-nm eight-metal CMOS process, the 275 mm² custom design contains 100 M transistors. The fully functional first silicon achieves over 1.0 TFLOPS of performance on a range of benchmarks while dissipating 97 W at 4.27 GHz and 1.07 V supply.
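The bisection bandwidth of a 2-D mesh such as the one in the abstract can be estimated by counting the links that a straight cut through the middle of the array severs. The sketch below assumes a plain mesh without wraparound links and considers only straight cuts along an even dimension. The function names and the per-link rate `ASSUMED_LINK_GBPS` are hypothetical: the abstract gives neither the link width nor whether both directions are counted, so the rate is chosen purely to show the arithmetic.

```python
def mesh_bisection_links(rows, cols):
    """Links crossed by the cheapest straight cut that halves a rows x cols mesh."""
    candidates = []
    if cols % 2 == 0:
        candidates.append(rows)  # cut between the two middle columns: one link per row
    if rows % 2 == 0:
        candidates.append(cols)  # cut between the two middle rows: one link per column
    if not candidates:
        raise ValueError("this sketch only handles meshes with an even dimension")
    return min(candidates)

def mesh_bisection_bandwidth(rows, cols, link_gbps, bidirectional=True):
    """Aggregate bandwidth (Gb/s) across the bisection for a given per-direction link rate."""
    directions = 2 if bidirectional else 1
    return mesh_bisection_links(rows, cols) * directions * link_gbps

# Assumption for illustration only: 128 Gb/s per link per direction.
ASSUMED_LINK_GBPS = 128.0

links = mesh_bisection_links(8, 10)
total_gbps = mesh_bisection_bandwidth(8, 10, ASSUMED_LINK_GBPS)
print(links, "links across the cut,", total_gbps / 1000, "Tb/s under the assumed link rate")
```

For an 8x10 array the cheapest straight cut splits the 10-wide dimension and crosses 8 links. Under the assumed rate and counting both directions, the total is on the order of the 2 Terabits/s quoted above, though the actual link parameters should be taken from the paper itself.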

740 citations

Journal Article•10.1109/MM.2007.4378783•

A 5-GHz Mesh Interconnect for a Teraflops Processor

Hoskote, Vangal, Singh, Borkar

01 Jan 2007-IEEE Micro

493 citations

Journal Article•10.1145/844128.844154•

Scalability and accuracy in a large-scale network emulator

Amin Vahdat¹, Kenneth Yocum¹, Kevin Walsh¹, Priya Mahadevan¹, Dejan Kostic¹, Jeffrey S. Chase¹, David Becker¹

Duke University¹

9 Dec 2002

TL;DR: ModelNet is a scalable Internet emulation environment that uses novel techniques to balance emulation accuracy against scalability. The current prototype can accurately subject thousands of instances of a distributed application to Internet-like conditions with gigabits of bisection bandwidth.

Abstract: This paper presents ModelNet, a scalable Internet emulation environment that enables researchers to deploy unmodified software prototypes in a configurable Internet-like environment and subject them to faults and varying network conditions. Edge nodes running user-specified OS and application software are configured to route their packets through a set of ModelNet core nodes, which cooperate to subject the traffic to the bandwidth, congestion constraints, latency, and loss profile of a target network topology. This paper describes and evaluates the ModelNet architecture and its implementation, including novel techniques to balance emulation accuracy against scalability. The current ModelNet prototype is able to accurately subject thousands of instances of a distributed application to Internet-like conditions with gigabits of bisection bandwidth. Experiments with several large-scale distributed services demonstrate the generality and effectiveness of the infrastructure.

492 citations

Proceedings Article•10.1145/2890955.2890968•

HULA: Scalable Load Balancing Using Programmable Data Planes

Naga Praveen Kumar Katta¹, Mukesh Hira², Changhoon Kim, Anirudh Sivaraman³, Jennifer Rexford¹

Princeton University¹, VMware², Massachusetts Institute of Technology³

14 Mar 2016

TL;DR: HULA is a data-plane load-balancing algorithm designed for emerging programmable switches and programmed in P4, showing that it can run on such chipsets without custom hardware. In simulation, it outperforms a scalable extension to CONGA in average flow completion time.

Abstract: Datacenter networks employ multi-rooted topologies (e.g., Leaf-Spine, Fat-Tree) to provide large bisection bandwidth. These topologies use a large degree of multipathing, and need a data-plane load-balancing mechanism to effectively utilize their bisection bandwidth. The canonical load-balancing mechanism is equal-cost multi-path routing (ECMP), which spreads traffic uniformly across multiple paths. Motivated by ECMP's shortcomings, congestion-aware load-balancing techniques such as CONGA have been developed. These techniques have two limitations. First, because switch memory is limited, they can only maintain a small amount of congestion-tracking state at the edge switches, and do not scale to large topologies. Second, because they are implemented in custom hardware, they cannot be modified in the field. This paper presents HULA, a data-plane load-balancing algorithm that overcomes both limitations. First, instead of having the leaf switches track congestion on all paths to a destination, each HULA switch tracks congestion for the best path to a destination through a neighboring switch. Second, we design HULA for emerging programmable switches and program it in P4 to demonstrate that HULA could be run on such programmable chipsets, without requiring custom hardware. We evaluate HULA extensively in simulation, showing that it outperforms a scalable extension to CONGA in average flow completion time (1.6x at 50% load, 3x at 90% load).
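The idea of keeping only the best path per destination, rather than state for every path, can be sketched roughly as follows. This is a simplified Python model, not the authors' P4 program. The class `HulaSwitch`, the `on_probe` method, the probe fields, and the update rule (take the bottleneck as the maximum of the advertised utilization and the local link utilization, and always accept updates from the current best hop) are assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class HulaSwitch:
    """Per-destination best-hop table for one switch (hypothetical model)."""
    link_util: dict                                  # neighbor -> utilization of the local link to it
    best_hop: dict = field(default_factory=dict)     # destination -> neighbor on the best known path
    best_util: dict = field(default_factory=dict)    # destination -> bottleneck utilization of that path

    def on_probe(self, dest, neighbor, path_util):
        """Process a probe from `neighbor` advertising `path_util` toward `dest`."""
        # Bottleneck of the path through this neighbor, including our own link to it.
        util = max(path_util, self.link_util[neighbor])
        current = self.best_hop.get(dest)
        # Adopt the path if it is the first one seen, if it is strictly better,
        # or if it refreshes the hop already in use (even when it got worse).
        if current is None or neighbor == current or util < self.best_util[dest]:
            self.best_hop[dest] = neighbor
            self.best_util[dest] = util
        return self.best_util[dest]

# Hypothetical leaf switch with two uplinks, s1 and s2.
sw = HulaSwitch(link_util={"s1": 0.2, "s2": 0.1})
sw.on_probe("torB", "s1", 0.5)   # first probe: s1 becomes the best hop
sw.on_probe("torB", "s2", 0.3)   # less congested path: s2 replaces s1
print(sw.best_hop["torB"], sw.best_util["torB"])
```

The table holds one entry per destination regardless of how many paths exist, which is the property the abstract contrasts with tracking congestion on all paths at the edge.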

431 citations

Year	Papers
2022	1
2021	13
2020	9
2019	12
2018	4
2017	13
