Kaihao Ma
The Chinese University of Hong Kong
9 Papers
2 Citations
Kaihao Ma is an academic researcher at The Chinese University of Hong Kong. The author has contributed to research in computer science and deep learning, has an h-index of 3, and has co-authored 5 publications.
Papers
Seastar: vertex-centric programming for graph neural networks
Yidi Wu, Kaihao Ma, Zhenkun Cai, Tatiana Jin, Boyang Li, Chenguang Zheng, James Cheng, Fan Yu +7 more
- 21 Apr 2021
TL;DR: Seastar is a vertex-centric programming model for GNN training on GPUs that provides idiomatic Python constructs, making it easy to develop novel homogeneous and heterogeneous GNN models.
62
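To illustrate what "vertex-centric" means here, the following is a minimal plain-Python sketch, assuming a simple model in which the user writes one function that sees a single vertex and its in-neighbours, and a runtime applies that function to every vertex. It does not reproduce Seastar's actual API or its GPU code generation; `gcn_like_vertex_fn` and `run_vertex_program` are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List

Graph = Dict[int, List[int]]        # vertex id -> list of in-neighbour ids
Features = Dict[int, List[float]]   # vertex id -> feature vector


def gcn_like_vertex_fn(v: int, in_nbrs: List[int], h: Features) -> List[float]:
    """Hypothetical user function: mean of the vertex's own and its neighbours' features."""
    members = [v] + in_nbrs
    dim = len(h[v])
    return [sum(h[u][d] for u in members) / len(members) for d in range(dim)]


def run_vertex_program(
    graph: Graph,
    h: Features,
    fn: Callable[[int, List[int], Features], List[float]],
) -> Features:
    """Hypothetical runtime: apply fn to every vertex, all reading the old features h."""
    return {v: fn(v, nbrs, h) for v, nbrs in graph.items()}


graph = {0: [1, 2], 1: [0], 2: []}
h = {0: [1.0, 0.0], 1: [3.0, 2.0], 2: [2.0, 4.0]}
out = run_vertex_program(graph, h, gcn_like_vertex_fn)
print(out)  # {0: [2.0, 2.0], 1: [2.0, 1.0], 2: [2.0, 4.0]}
```

The point of the style is that the model author only writes per-vertex logic; how the per-vertex calls are batched and executed (on a GPU, in Seastar's case) is left to the system.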
Posted Content
TensorOpt: Exploring the Tradeoffs in Distributed DNN Training with Auto-Parallelism
TL;DR: The authors propose FT, an efficient algorithm that searches for a set of optimal parallelization strategies so that different objectives can be traded off against each other. This lets FT adapt to different scenarios, for example by minimizing memory consumption when the number of devices is limited.
43
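The idea of keeping a set of strategies that trade off objectives, rather than a single "best" one, can be sketched as a Pareto frontier over two costs. This is a minimal sketch assuming each candidate strategy has already been scored with a per-device memory cost and a per-iteration time; the strategy names and numbers are hypothetical toy values, and the brute-force filter below is not FT's actual search algorithm.

```python
from typing import List, NamedTuple, Optional


class Strategy(NamedTuple):
    name: str          # hypothetical strategy label
    memory_gb: float   # hypothetical peak per-device memory
    time_s: float      # hypothetical per-iteration time


def pareto_frontier(strategies: List[Strategy]) -> List[Strategy]:
    """Keep strategies that no other strategy beats on both memory and time."""
    frontier = []
    for s in strategies:
        dominated = any(
            o.memory_gb <= s.memory_gb
            and o.time_s <= s.time_s
            and (o.memory_gb < s.memory_gb or o.time_s < s.time_s)
            for o in strategies
        )
        if not dominated:
            frontier.append(s)
    return sorted(frontier, key=lambda s: s.memory_gb)


def pick(frontier: List[Strategy], memory_budget_gb: float) -> Optional[Strategy]:
    """Fastest frontier strategy that fits the memory budget, or None."""
    feasible = [s for s in frontier if s.memory_gb <= memory_budget_gb]
    return min(feasible, key=lambda s: s.time_s) if feasible else None


candidates = [
    Strategy("data-parallel", 30.0, 1.0),
    Strategy("model-parallel", 8.0, 2.5),
    Strategy("hybrid-a", 14.0, 1.4),
    Strategy("hybrid-b", 16.0, 1.6),  # dominated by hybrid-a
]
front = pareto_frontier(candidates)
print([s.name for s in front])   # ['model-parallel', 'hybrid-a', 'data-parallel']
print(pick(front, 16.0).name)    # hybrid-a
```

Keeping the whole frontier means the choice can be re-made cheaply when the constraint changes: a tight memory budget selects a memory-lean strategy, while a larger budget selects a faster one.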
PPS: Fair and efficient black-box scheduling for multi-tenant GPU clusters
Kaihao Ma, Zhenkun Cai, Xi-Hong Yan, Yang Zhang, Zhi Liu, Yihui Feng, Chao Li, Wei Lin, James Cheng +8 more
TL;DR: The authors propose PPS, a probabilistic prediction-based scheduler for multi-tenant GPU clusters. PPS achieves high utilization and fairness by predicting future cluster status from job history statistics, treating jobs as black boxes that require no detailed job information.
2
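One simple way to predict future cluster status from job history alone is to estimate, for each running job, the probability that it finishes within a time horizon, conditioned on how long it has already run. The sketch below assumes that approach using an empirical distribution of past job durations; it is an illustrative stand-in, not the prediction method PPS actually uses, and the function names, fallback rule, and toy data are hypothetical.

```python
from typing import List, Tuple


def prob_finish_within(history: List[float], elapsed: float, horizon: float) -> float:
    """Empirical P(duration <= elapsed + horizon | duration > elapsed)."""
    survivors = [d for d in history if d > elapsed]
    if not survivors:
        return 1.0  # hypothetical fallback: the job has outlived every recorded job
    done = sum(1 for d in survivors if d <= elapsed + horizon)
    return done / len(survivors)


def expected_free_gpus(
    running: List[Tuple[int, float]], history: List[float], horizon: float
) -> float:
    """Expected GPUs released within the horizon; running = [(gpus, elapsed), ...]."""
    return sum(g * prob_finish_within(history, elapsed, horizon) for g, elapsed in running)


history = [10.0, 20.0, 30.0, 40.0, 60.0, 120.0]  # toy past durations (minutes)
running = [(4, 15.0), (8, 50.0)]                 # toy running jobs: (gpus, elapsed minutes)
print(expected_free_gpus(running, history, horizon=20.0))
```

Because only durations and GPU counts are used, the jobs stay black boxes: the estimate needs no model architecture, epoch count, or other job internals.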
FEC: Efficient Deep Recommendation Model Training with Flexible Embedding Communication
Kaihao Ma, Xiao Yan, Zhenkun Cai, Yuzhen Huang, Yidi Wu, James Cheng +5 more
- 13 Jun 2023
TL;DR: The authors propose two strategies, embedding tiering and pre-fetching, to improve the efficiency of distributed training of embedding-based deep recommendation models (EDRMs).
2
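As a rough illustration of embedding tiering, the sketch below assumes a frequency-based split: the most frequently accessed embedding IDs form a "hot" tier that is served from a local replica, and the remaining "cold" lookups require communication. The selection rule, capacity parameter, and function names are hypothetical and may differ from FEC's actual design; pre-fetching is not modelled here.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def choose_hot_ids(sample_batches: Iterable[List[int]], hot_capacity: int) -> Set[int]:
    """Pick the hot_capacity most frequently accessed embedding IDs."""
    counts = Counter(i for batch in sample_batches for i in batch)
    return {i for i, _ in counts.most_common(hot_capacity)}


def split_lookups(batch: List[int], hot: Set[int]) -> Tuple[List[int], List[int]]:
    """Hot IDs are served locally; cold IDs would need remote communication."""
    local = [i for i in batch if i in hot]
    remote = [i for i in batch if i not in hot]
    return local, remote


samples = [[1, 2, 3, 1], [1, 4, 2], [1, 2, 5]]   # toy access trace
hot = choose_hot_ids(samples, hot_capacity=2)
local, remote = split_lookups([1, 2, 3, 9, 1], hot)
print(sorted(hot), local, remote)  # [1, 2] [1, 2, 1] [3, 9]
```

Because embedding accesses in recommendation workloads tend to be skewed, even a small hot tier can absorb a large share of lookups, which is what makes tiering attractive for reducing communication.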
Posted Content
Elastic deep learning in multi-tenant GPU cluster
TL;DR: The authors study how to support elasticity, i.e., the ability to dynamically adjust the degree of parallelism (the number of GPUs), for deep neural network (DNN) training.
2