I/O bound

Papers

Proceedings Article•10.1145/2486767.2486771

Scalable I/O-bound parallel incremental gradient descent for big data analytics in GLADE

Chengie Qin¹, Florin Rusu¹•Institutions (1)

23 Jun 2013

TL;DR: The paper provides empirical evidence that its scalable and efficient parallel solution for incremental gradient descent in GLADE is limited only by the physical hardware characteristics, effectively uses the available resources, and achieves maximum scalability.

Abstract: Incremental gradient descent is a general technique to solve a large class of convex optimization problems arising in many machine learning tasks. GLADE is a parallel infrastructure for big data analytics providing a generic task specification interface. In this paper, we present a scalable and efficient parallel solution for incremental gradient descent in GLADE. We provide empirical evidence that our solution is limited only by the physical hardware characteristics, uses effectively the available resources, and achieves maximum scalability. When deployed in the cloud, our solution has the potential to dramatically reduce the cost of complex analytics over massive datasets.

23 citations

Journal Article•10.1145/1462586.1462592

Compute Bound and I/O Bound Cellular Automata Simulations on FPGA Logic

Syed Shariyar Murtaza¹, Alfons G. Hoekstra¹, Peter M. A. Sloot¹•Institutions (1)

University of Amsterdam¹

01 Jan 2009-ACM Transactions on Reconfigurable Technology and Systems

TL;DR: This article presents a methodology to categorize a specified CA algorithm as compute bound or I/O bound, and gives a rigorous analysis of each case, identifying the parameters that control the mapping process and are defined by both the Cellular Automata algorithm and the given FPGA hardware specifications.

Abstract: FPGA-based computation engines have been used as Cellular Automata accelerators in the scientific community for some time now. With the recent availability of more advanced FPGA logic it becomes necessary to better understand the mapping of Cellular Automata to these systems. There are many trade-offs to consider when mapping a Cellular Automata algorithm from an abstract system to the physical implementation using FPGA logic. The trade-offs include both the available FPGA resources and the Cellular Automata algorithm's execution time. The most important aspect is to fully understand the behavior of the specified CA algorithm in terms of its execution times, which are either compute bound or I/O bound. In this article, we present a methodology to categorize a specified CA algorithm as compute bound or I/O bound. We take the methodology further by presenting rigorous analysis for each of the two cases, identifying the various parameters that control the mapping process and are defined both by the Cellular Automata algorithm and the given FPGA hardware specifications. This methodology helps to predict the performance of running Cellular Automata algorithms on specific FPGA hardware and to determine optimal values for the various parameters that control the mapping process. The model is validated for both compute and I/O bound two-dimensional Cellular Automata algorithms. We find that our model predictions are accurate within 7%.

14 citations

Journal Article•10.1016/J.PARCO.2011.12.002

Investigation into scaling I/O bound streaming applications productively with an all-FPGA cluster

Andrew G. Schmidt¹, Siddhartha Datta¹, Ashwin A. Mendon¹, Ron Sass¹•Institutions (1)

University of North Carolina at Charlotte¹

1 Aug 2012

TL;DR: This work presents an investigation into accelerating I/O bound streaming applications through the coupling of custom computing cores, a hardware filesystem, and an integrated on-chip and off-chip network on the all-FPGA node cluster.

Abstract: The Reconfigurable Computing Cluster project is exploring novel parallel computing architectures in high performance computing with FPGA devices. Although there are no discrete microprocessors in the system, highly-integrated FPGAs (with embedded processors) are capable of hosting Linux-based systems and can run arbitrary MPI applications. This work presents an investigation into accelerating I/O bound streaming applications through the coupling of custom computing cores, a hardware filesystem, and an integrated on-chip and off-chip network on the all-FPGA node cluster. Such an infrastructure enables productivity by minimizing hardware design while maintaining high performance. A hardware implementation of the BLASTn algorithm is used to demonstrate the performance gains and scalability of the custom computing cores across the Spirit cluster. Results show linear speedup across multiple nodes while supporting productivity by eliminating modifications to the original hardware core when scaling up to 512 parallel cores on the cluster.

12 citations

Proceedings Article•10.1109/ICCCNT.2013.6726680

Resource aware scheduling in Hadoop for heterogeneous workloads based on load estimation

B. Sutariya Kapil¹, S Sowmya Kamath¹•Institutions (1)

National Institute of Technology, Karnataka¹

4 Jul 2013

TL;DR: A new scheduling algorithm for Hadoop-based distributed systems is proposed, based on the classification of workloads to assign a specific category to a particular cluster according to the current load of the cluster, which increases the performance of both CPU and I/O resources in a cluster under heterogeneous workloads.

Abstract: Currently, most cloud-based applications require large-scale data processing capability. Data to be processed is growing at a rate much faster than available computing power. Hadoop is used to enable distributed processing on large clusters of commodity hardware. In large clusters, the workloads may be heterogeneous in nature, that is, I/O bound, CPU bound, or network-intensive jobs that demand different types of resources in order to run simultaneously on a large cluster. Hadoop's job scheduling is based on FIFO, where parallelization based on job type is not taken into account for scheduling. In this paper, we propose a new scheduling algorithm for Hadoop-based distributed systems, based on the classification of workloads to assign a specific category to a particular cluster according to the current load of the cluster. The proposed scheduler increases the performance of both CPU and I/O resources in a cluster under heterogeneous workloads by approximately 12% when compared to Hadoop's FIFO scheduler.

11 citations

Proceedings Article•10.1109/CANOPIEHPC51917.2020.00009

Containers for Massive Ensemble of I/O Bound Hierarchical Coupled Simulations

Wael R. Elwasif¹, Ross Whitfield¹, Jin Myung Park¹, Mark Cianciosa¹•Institutions (1)

Oak Ridge National Laboratory¹

1 Nov 2020

TL;DR: In this paper, the authors describe a hierarchical simulation structure using the Integrated Plasma Simulator (IPS) that enables the flexible execution of coupled simulations at the system, node, and core level using the same coupling abstraction and API.

Abstract: We present our experience using containers to scale up a massive ensemble of coupled I/O bound workloads on the NERSC Cori supercomputer. We describe the design of a hierarchical simulation structure using the Integrated Plasma Simulator (IPS) that enables the flexible execution of coupled simulations at the system, node, and core level using the same coupling abstraction and API. The hierarchical design allows for the node-level execution to be efficiently executed using containers while not impacting the structure of the simulation at the system level. We demonstrate the viability of the approach by presenting experimental results from applications in coupled fusion plasma simulations that illustrate the performance impact of using containers to deploy the node-level workloads, in conjunction with the user mountable XFS file systems to ameliorate the load on the Lustre parallel file system. We also present results from production runs showing the ability of the ensemble simulations to scale to hundreds of Cori Haswell nodes, with little or no overhead.

4 citations

Performance Metrics

No. of papers in the topic in previous years
Year	Papers
2020	1
2016	1
2013	4
2012	1
2010	1
2009	1