Fathom: Reference Workloads for Modern Deep Learning Methods
TL;DR: Fathom as discussed by the authors is a collection of eight archetypal deep learning workloads for study, ranging from the familiar deep convolutional neural network of Krizhevsky et al. to the more exotic memory networks from Facebook's AI research group.
read more
Abstract: Deep learning has been popularized by its recent successes on challenging artificial intelligence problems. One of the reasons for its dominance is also an ongoing challenge: the need for immense amounts of computational power. Hardware architects have responded by proposing a wide array of promising ideas, but to date, the majority of the work has focused on specific algorithms in somewhat narrow application domains. While their specificity does not diminish these approaches, there is a clear need for more flexible solutions. We believe the first step is to examine the characteristics of cutting edge models from across the deep learning community.
Consequently, we have assembled Fathom: a collection of eight archetypal deep learning workloads for study. Each of these models comes from a seminal work in the deep learning community, ranging from the familiar deep convolutional neural network of Krizhevsky et al., to the more exotic memory networks from Facebook's AI research group. Fathom has been released online, and this paper focuses on understanding the fundamental performance characteristics of each model. We use a set of application-level modeling tools built around the TensorFlow deep learning framework in order to analyze the behavior of the Fathom workloads. We present a breakdown of where time is spent, the similarities between the performance profiles of our models, an analysis of behavior in inference and training, and the effects of parallelism on scaling.
read more
Chat with Paper
AI Agents for this Paper
Find similar papers on Google Scholar, PubMed and Arxiv
Write a critical review of this paper
Analyze citations of this paper to find unaddressed research gaps
Citations
•Posted Content
In-Datacenter Performance Analysis of a Tensor Processing Unit
Norman P. Jouppi,Cliff Young,Nishant Patil,David A. Patterson,Gaurav Agrawal,Raminder Bajwa,Sarah Bates,Suresh Bhatia,Nan Boden,Albert T. Borchers,Rick Boyle,Pierre-luc Cantin,Clifford Chao,Christopher Aaron Clark,Jeremy Coriell,Michael J. Daley,Matt Dau,Jeffrey Dean,Ben Gelb,Tara Vazir Ghaemmaghami,Rajendra Gottipati,William John Gulland,Robert Hagmann,C. Richard Ho,Doug Hogberg,John Hu,Robert Hundt,D. Hurt,Julian Ibarz,Aaron Jaffey,Alek Jaworski,Alexander Kaplan,Khaitan Harshit,Andy Koch,Naveen Kumar,Steve Lacy,James Laudon,James Law,Diemthu Le,Chris Leary,Zhuyuan Liu,Kyle Lucke,Alan Lundin,Gordon MacKean,Adriana Maggiore,Maire Mahony,Kieran Miller,Rahul Nagarajan,Ravi Narayanaswami,Ray Ni,Kathy Nix,Thomas Norrie,Mark Omernick,Narayana Penukonda,Andrew Everett Phelps,Jonathan Ross,Matt Ross,Amir Salek,Emad Samadiani,Chris Severn,Gregory Sizikov,Matthew Snelham,Jed Souter,Dan Steinberg,Andy Swing,Mercedes Tan,Gregory Michael Thorson,Bo Tian,Horia Toma,Erick Tuttle,Vijay K. Vasudevan,Richard Walter,Walter Wang,Eric Wilcox,Doe Hyun Yoon +74 more
TL;DR: This paper evaluates a custom ASIC-called a Tensor Processing Unit (TPU)-deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN) and compares it to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the samedatacenters.
In-Datacenter Performance Analysis of a Tensor Processing Unit
Norman P. Jouppi,Cliff Young,Nishant Patil,David A. Patterson,Gaurav Agrawal,Raminder Bajwa,Sarah Bates,Suresh Bhatia,Nan Boden,Albert T. Borchers,Rick Boyle,Pierre-luc Cantin,Clifford Chao,Christopher Aaron Clark,Jeremy Coriell,Michael J. Daley,Matt Dau,Jeffrey Dean,Ben Gelb,Tara Vazir Ghaemmaghami,Rajendra Gottipati,William John Gulland,Robert Hagmann,C. Richard Ho,Doug Hogberg,John Hu,Robert Hundt,D. Hurt,Julian Ibarz,Aaron Jaffey,Alek Jaworski,Alexander Kaplan,Khaitan Harshit,Daniel Killebrew,Andy Koch,Naveen Kumar,Steve Lacy,James Laudon,James Law,Diemthu Le,Chris Leary,Zhuyuan Liu,Kyle Lucke,Alan Lundin,Gordon MacKean,Adriana Maggiore,Maire Mahony,Kieran Miller,Rahul Nagarajan,Ravi Narayanaswami,Ray Ni,Kathy Nix,Thomas Norrie,Mark Omernick,Narayana Penukonda,Andrew Everett Phelps,Jonathan Ross,Matt Ross,Amir Salek,Emad Samadiani,Chris Severn,Gregory Sizikov,Matthew Snelham,Jed Souter,Dan Steinberg,Andy Swing,Mercedes Tan,Gregory Michael Thorson,Bo Tian,Horia Toma,Erick Tuttle,Vijay K. Vasudevan,Richard Walter,Walter Wang,Eric Wilcox,Doe Hyun Yoon +75 more
- 24 Jun 2017
TL;DR: The Tensor Processing Unit (TPU) as discussed by the authors is a custom ASIC deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN) using a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS).
MLPerf inference benchmark
Vijay Janapa Reddi,Christine Cheng,David Kanter,Peter Mattson,Guenther Schmuelling,Carole-Jean Wu,Brian M. Anderson,Maximilien Breughe,Mark Charlebois,William Chou,Ramesh Chukka,Cody Coleman,Sam Davis,Pan Deng,Greg Diamos,Jared Duke,Dave Fick,J. Scott Gardner,Itay Hubara,Sachin Satish Idgunji,Thomas B. Jablin,Jeff Jiao,Tom St. John,Pankaj Kanwar,David Lee,Jeffery Liao,Anton Lokhmotov,Francisco Massa,Peng Meng,Paulius Micikevicius,Colin Osborne,Gennady Pekhimenko,Arun Tejusve Raghunath Rajan,Dilip Sequeira,Ashish Sirasao,Fei Sun,Hanlin Tang,Michael Thomson,Frank Wei,Ephrem C. Wu,Lingjie Xu,Koichi Yamada,Bing Yu,George Yuan,Aaron Zhong,Peizhao Zhang,Yuchen Zhou +46 more
- 30 May 2020
TL;DR: This paper presents the benchmarking method for evaluating ML inference systems, MLPerf Inference, and prescribes a set of rules and best practices to ensure comparability across systems with wildly differing architectures.
554
Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks
Amirali Boroumand,Saugata Ghose,Youngsok Kim,Rachata Ausavarungnirun,Eric Shiu,Rahul Thakur,Dae Hyun Kim,Aki Kuusela,Allan Knies,Parthasarathy Ranganathan,Onur Mutlu +10 more
- 19 Mar 2018
TL;DR: This work comprehensively analyzes the energy and performance impact of data movement for several widely-used Google consumer workloads, and finds that processing-in-memory (PIM) can significantly reduceData movement for all of these workloads by performing part of the computation close to memory.
368
The Architectural Implications of Facebook's DNN-Based Personalized Recommendation
Udit Gupta,Carole-Jean Wu,Xiaodong Wang,Maxim Naumov,Brandon Reagen,David Brooks,Bradford Cottel,Kim Hazelwood,Mark Hempstead,Bill Jia,Hsien-Hsin S. Lee,Andrey Malevich,Dheevatsa Mudigere,Mikhail Smelyanskiy,Liang Xiong,Xuan Zhang +15 more
- 01 Feb 2020
TL;DR: A set of real-world, production-scale DNNs for personalized recommendation coupled with relevant performance metrics for evaluation are presented and in-depth analysis is conducted that underpins future system design and optimization for at-scale recommendation.
318
References
Deep Residual Learning for Image Recognition
Kaiming He,Xiangyu Zhang,Shaoqing Ren,Jian Sun +3 more
- 27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
•Posted Content
Deep Residual Learning for Image Recognition
TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
117.9K
•Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan,Andrew Zisserman +1 more
- 04 Sep 2014
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
102.6K
•Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
Alex Krizhevsky,Ilya Sutskever,Geoffrey E. Hinton +2 more
- 03 Dec 2012
TL;DR: The state-of-the-art performance of CNNs was achieved by Deep Convolutional Neural Networks (DCNNs) as discussed by the authors, which consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
ImageNet: A large-scale hierarchical image database
Jia Deng,Wei Dong,Richard Socher,Li-Jia Li,Kai Li,Li Fei-Fei +5 more
- 20 Jun 2009
TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Related Papers (5)
Kaiming He,Xiangyu Zhang,Shaoqing Ren,Jian Sun +3 more
- 27 Jun 2016
Karen Simonyan,Andrew Zisserman +1 more
- 04 Sep 2014