LADS: optimizing data transfers using layout-aware data scheduling
Youngjae Kim, Scott Atchley, Geoffroy Vallée, Galen M. Shipman
- 16 Feb 2015
- pp 67-80
TL;DR: This paper identifies the issues that lead to congestion on the path of an end-to-end data transfer in the terabit network environment, and presents a new bulk data movement framework called LADS for terabit networks.
Abstract: While future terabit networks hold the promise of significantly improving big-data motion among geographically distributed data centers, significant challenges must be overcome even on today's 100 gigabit networks to realize high end-to-end performance. Multiple bottlenecks exist along the end-to-end path from source to sink. Data storage infrastructure at both the source and sink, and its interplay with the wide-area network, are increasingly the bottleneck to achieving high performance. In this paper, we identify the issues that lead to congestion on the path of an end-to-end data transfer in the terabit network environment, and we present a new bulk data movement framework called LADS for terabit networks. LADS exploits the underlying storage layout at each endpoint to maximize throughput without negatively impacting the performance of shared storage resources for other users. LADS also uses the Common Communication Interface (CCI) instead of the sockets interface so that it can use zero-copy, OS-bypass hardware when available. It can further improve data transfer performance when the end systems are congested by buffering data on flash storage at the source. Our evaluations show that LADS can avoid congested storage elements within the shared storage resource, improving both I/O bandwidth and data transfer rates across high-speed networks.
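The abstract does not spell out how LADS turns knowledge of the storage layout into a transfer order. As a rough illustration only, the sketch below assumes a file striped round-robin across an explicit list of storage targets (Lustre-style RAID-0 striping), builds one queue of chunks per target, and issues chunks round-robin across targets while deferring targets flagged as congested. The names `build_queues` and `schedule` and the static `congested` set are hypothetical; this is a single-threaded simplification, not the authors' algorithm.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple

Chunk = Tuple[int, int]  # (file offset, length in bytes)


def build_queues(file_size: int, stripe_size: int,
                 targets: List[int]) -> Dict[int, Deque[Chunk]]:
    """Map each stripe-sized chunk to the storage target that holds it.

    Assumed layout model (hypothetical): stripe i lives on
    targets[i % len(targets)], i.e. RAID-0-style round-robin striping."""
    queues: Dict[int, Deque[Chunk]] = {}
    for stripe, offset in enumerate(range(0, file_size, stripe_size)):
        length = min(stripe_size, file_size - offset)  # last chunk may be short
        target = targets[stripe % len(targets)]
        queues.setdefault(target, deque()).append((offset, length))
    return queues


def schedule(queues: Dict[int, Deque[Chunk]],
             congested: Set[int]) -> List[Tuple[int, Chunk]]:
    """Return an issue order that round-robins across targets.

    Uncongested targets are served first so work keeps flowing to idle
    targets; chunks on congested targets are deferred to the end.
    The input queues are copied, not consumed."""
    order: List[Tuple[int, Chunk]] = []
    healthy = sorted(t for t in queues if t not in congested)
    busy = sorted(t for t in queues if t in congested)
    for group in (healthy, busy):
        pending = {t: deque(queues[t]) for t in group}
        while pending:
            for t in list(pending):  # snapshot keys so we can delete
                order.append((t, pending[t].popleft()))
                if not pending[t]:
                    del pending[t]
    return order


if __name__ == "__main__":
    q = build_queues(file_size=20, stripe_size=4, targets=[7, 2, 5])
    for target, (offset, length) in schedule(q, congested={2}):
        print(f"target {target}: offset={offset} length={length}")
```

Here congestion is a fixed input for clarity; a transfer tool following the idea in the abstract would have to detect congested storage elements at run time and re-plan the order accordingly.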
Citations
Predicting Output Performance of a Petascale Supercomputer
Bing Xie, Yezhou Huang, Jeffrey S. Chase, Jong Youl Choi, Scott Klasky, Jay Lofstead, Sarp Oral
- 26 Jun 2017
TL;DR: A predictive model of output performance for supercomputer file systems under production load is developed for Titan and its Lustre-based multi-stage write path, using feature transformations to capture non-linear relationships.
TRIO: Burst Buffer Based I/O Orchestration
Teng Wang, Sarp Oral, Michael Pritchard, Bin Wang, Weikuan Yu
- 08 Sep 2015
TL;DR: This paper proposes a burst-buffer-based I/O orchestration framework, named TRIO, that intercepts and reshapes bursty writes into more sequential write traffic to the storage servers, and demonstrates that TRIO efficiently utilizes storage bandwidth and reduces the average job I/O time by 37% for data-intensive applications in typical checkpointing scenarios.
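As a hedged illustration of the write-reshaping idea in the TRIO summary, the sketch below assumes each buffered write is tagged with its destination storage server, groups writes per server, sorts each group by offset, and merges contiguous extents so that each server receives sequential requests. The `reshape` function and the `(server, offset, length)` record format are hypothetical and do not reflect TRIO's actual implementation; writes are assumed not to overlap.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

Write = Tuple[int, int, int]  # (server_id, offset, length), hypothetical format


def reshape(writes: List[Write]) -> Dict[int, List[Tuple[int, int]]]:
    """Group buffered writes by server, sort by offset, merge contiguous extents."""
    per_server: DefaultDict[int, List[Tuple[int, int]]] = defaultdict(list)
    for server, offset, length in writes:
        per_server[server].append((offset, length))

    result: Dict[int, List[Tuple[int, int]]] = {}
    for server in sorted(per_server):
        merged: List[Tuple[int, int]] = []
        for offset, length in sorted(per_server[server]):
            # Extend the previous extent if this write starts where it ends.
            if merged and merged[-1][0] + merged[-1][1] == offset:
                merged[-1] = (merged[-1][0], merged[-1][1] + length)
            else:
                merged.append((offset, length))
        result[server] = merged
    return result


if __name__ == "__main__":
    bursty = [(1, 8, 4), (0, 4, 4), (1, 0, 4), (0, 0, 4), (1, 4, 4), (0, 12, 4)]
    print(reshape(bursty))
```

Six interleaved writes become one sequential extent for server 1 and two for server 0, which is the kind of traffic pattern the summary says TRIO aims to deliver to storage servers.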
SciSpace: A scientific collaboration workspace for geo-distributed HPC data centers
TL;DR: SciSpace provides a global view of information shared across multiple geo-distributed HPC data centers in a single workspace. It supports native data access for high performance when data must be read or written in a native data center namespace, and it is evaluated using real scientific datasets and applications.
Optimizing End-to-End Big Data Transfers over Terabits Network Infrastructure
TL;DR: This paper identifies the issues that lead to congestion on the path of an end-to-end data transfer in the terabit network environment, and presents a new bulk data movement framework for terabit networks, called LADS, which can avoid congested storage elements within the shared storage resource, improving input/output bandwidth, and data transfer rates across the high speed networks.
References
Ceph: a scalable, high-performance distributed file system
Sage A. Weil, Scott A. Brandt, Ethan L. Miller, Darrell D. E. Long, Carlos Maltzahn
- 06 Nov 2006
TL;DR: Performance measurements under a variety of workloads show that Ceph has excellent I/O performance and scalable metadata management, supporting more than 250,000 metadata operations per second.
•Proceedings Article
GPFS: A Shared-Disk File System for Large Computing Clusters
Frank B. Schmuck, Roger L. Haskin
- 28 Jan 2002
TL;DR: GPFS is IBM's parallel, shared-disk file system for cluster computers, available on the RS/6000 SP parallel supercomputer and on Linux clusters; the paper discusses how distributed locking and recovery techniques were extended to scale to large clusters.
The Globus Striped GridFTP Framework and Server
William Allcock, John Bresnahan, Rajkumar Kettimuthu, Michael Link, Catalin Dumitrescu, Ioan Raicu, Ian Foster
- 12 Nov 2005
TL;DR: It is argued that this combination of performance and modular structure makes the Globus GridFTP framework both a good foundation on which to build tools and applications and a unique testbed for the study of innovative data management techniques and network protocols.
•Proceedings Article
Scalable performance of the Panasas parallel file system
Brent B. Welch, Marc Unangst, Zainul Abbasi, Garth A. Gibson, Brian Mueller, Jason Small, Jim Zelenka, Bin Zhou
- 26 Feb 2008
TL;DR: Performance measures of I/O, metadata, and recovery operations for storage clusters that range in size from 10 to 120 storage nodes, 1 to 12 metadata nodes, and with file system client counts ranging from 1 to 100 compute nodes are presented.
Managing Variability in the IO Performance of Petascale Storage Systems
Jay Lofstead, Fang Zheng, Qing Liu, Scott Klasky, Ron A. Oldfield, Todd Kordenbrock, Karsten Schwan, Matthew Wolf
- 13 Nov 2010
TL;DR: These measurements motivate developing a 'managed' IO approach using adaptive algorithms varying the IO system workload based on current levels and use areas, which achieves higher overall performance and less variability in both a typical usage environment and with artificially introduced levels of 'noise'.