Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV).
Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, Rory Sayres
223
TL;DR: Researchers introduce Concept Activation Vectors (CAVs) to interpret deep learning models, enabling quantitative testing of concept importance through directional derivatives, and demonstrate their use in image classification and a medical application to explore hypotheses and generate insights.
Abstract: The interpretation of deep learning models is a challenge due to their size, complexity, and often opaque internal state. In addition, many systems, such as image classifiers, operate on low-level features rather than high-level concepts. To address these challenges, we introduce Concept Activation Vectors (CAVs), which provide an interpretation of a neural net's internal state in terms of human-friendly concepts. The key idea is to view the high-dimensional internal state of a neural net as an aid, not an obstacle. We show how to use CAVs as part of a technique, Testing with CAVs (TCAV), that uses directional derivatives to quantify the degree to which a user-defined concept is important to a classification result (for example, how sensitive a prediction of zebra is to the presence of stripes). Using the domain of image classification as a testing ground, we describe how CAVs may be used to explore hypotheses and generate insights for a standard image classification network as well as a medical application.
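The CAV and TCAV steps can be illustrated with a minimal, dependency-free sketch. It assumes layer activations are already available as plain vectors. The TCAV paper trains a linear classifier to separate activations of concept examples from activations of random examples and takes the vector normal to its decision boundary as the CAV; here logistic regression fitted by plain stochastic gradient descent stands in for that classifier. The conceptual sensitivity of an input is the directional derivative of the class logit along the CAV, and the TCAV score is the fraction of class inputs for which that derivative is positive. The toy head (`HEAD_W`, `logit_grad`), the synthetic "stripes" data, and all helper names are hypothetical illustrations, and the sketch omits the significance testing against random concepts that the full method performs.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_cav(concept_acts, random_acts, lr=0.1, epochs=200):
    """Fit a logistic-regression boundary between concept (label 1) and random
    (label 0) activations by SGD; return its unit-length normal vector (the CAV),
    which points toward the concept side. Logistic regression is an assumed
    stand-in for whichever linear classifier is used in practice."""
    data = [(a, 1.0) for a in concept_acts] + [(a, 0.0) for a in random_acts]
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            err = sigmoid(dot(w, x) + b) - y  # d(log-loss)/d(logit)
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    norm = math.sqrt(dot(w, w)) or 1.0
    return [wi / norm for wi in w]


# Hypothetical stand-in for the layers above layer l: the class-k logit is
# h(a) = sum_i HEAD_W[i] * tanh(a[i]) for a layer-l activation a.
HEAD_W = [2.0, -1.0, 0.5]


def logit_grad(act):
    """Analytic gradient of the toy logit h with respect to the activation."""
    return [w * (1.0 - math.tanh(a) ** 2) for w, a in zip(HEAD_W, act)]


def tcav_score(class_acts, cav, grad_fn=logit_grad):
    """Fraction of class-k activations whose directional derivative of the
    logit along the CAV (the conceptual sensitivity) is positive."""
    positive = sum(1 for a in class_acts if dot(grad_fn(a), cav) > 0)
    return positive / len(class_acts)


if __name__ == "__main__":
    rng = random.Random(0)

    def noise():
        return rng.gauss(0.0, 0.1)

    # Synthetic layer-l activations: concept examples (e.g. "stripes") are
    # shifted along dimension 0; random counterexamples are centred at 0.
    concept = [[1.0 + noise(), noise(), noise()] for _ in range(20)]
    randoms = [[noise(), noise(), noise()] for _ in range(20)]
    zebras = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(50)]

    cav = train_cav(concept, randoms)
    print("CAV:", [round(c, 3) for c in cav])
    print("TCAV score:", tcav_score(zebras, cav))
```

On this toy data the learned CAV should point mostly along dimension 0, and because the hypothetical head increases the logit along that direction, the TCAV score should be close to 1.0. In a real network, `logit_grad` would be replaced by automatic differentiation of the class logit with respect to the chosen layer's activations.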
Citations
Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI
Alejandro Barredo Arrieta, Natalia Díaz-Rodríguez, Javier Del Ser, Adrien Bennetot, Siham Tabik, Alberto Barbado, Salvador García, Sergio Gil-Lopez, Daniel Molina, Richard Benjamins, Raja Chatila, Francisco Herrera
TL;DR: This paper presents a taxonomy of recent contributions to the explainability of different machine learning models, including those aimed at explaining deep learning methods, for which a second, dedicated taxonomy is built and examined in detail.
4.7K
Explaining Deep Neural Networks and Beyond: A Review of Methods and Applications
Wojciech Samek, Grégoire Montavon, Sebastian Lapuschkin, Christopher J. Anders, Klaus-Robert Müller
TL;DR: The authors provide a timely overview of explainable AI with a focus on post-hoc explanations, explain its theoretical foundations, and test interpretability algorithms from both a theoretical and a comparative-evaluation perspective using extensive simulations.
709
A Survey on Neural Network Interpretability
Yu Zhang, Peter Tino, Ales Leonardis, Ke Tang
- 24 Aug 2021
TL;DR: This paper gives a comprehensive review of neural network interpretability research and proposes a novel taxonomy organized along three dimensions: type of engagement (passive vs. active interpretation approaches), type of explanation, and focus (from local to global interpretability).
708
Towards Explainable Artificial Intelligence
Wojciech Samek, Klaus-Robert Müller
- 10 Sep 2019
TL;DR: This introductory paper presents recent developments and applications in deep learning, and makes a plea for a wider use of explainable learning algorithms in practice.
589
Understanding the role of individual units in a deep neural network.
TL;DR: This work presents network dissection, an analytic framework to systematically identify the semantics of individual hidden units within image classification and image generation networks, and applies it to understanding adversarial attacks and to semantic image editing.
478
References
Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
- 27 Jun 2016
TL;DR: The authors propose a residual learning framework that eases the training of networks substantially deeper than those used previously; residual networks built this way won 1st place in the ILSVRC 2015 classification task.
Going deeper with convolutions
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich
- 07 Jun 2015
TL;DR: Inception is a deep convolutional neural network architecture that achieved a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael S. Bernstein, Alexander C. Berg, Li Fei-Fei
TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection on hundreds of object categories and millions of images; it has been run annually from 2010 to the present and has attracted participation from more than fifty institutions.
Rethinking the Inception Architecture for Computer Vision
Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Zbigniew Wojna
- 27 Jun 2016
TL;DR: The authors explore ways to scale up networks that use the added computation as efficiently as possible, through suitably factorized convolutions and aggressive regularization.
27.9K
Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks
Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros
- 01 Oct 2017
TL;DR: CycleGAN learns a mapping G: X → Y such that the distribution of images from G(X) is indistinguishable from the distribution Y, using an adversarial loss.
19.5K