| 2020 | CVPR | Explaining Knowledge Distillation by Quantifying the Knowledge | 81 | |
| 2020 | CVPR | High-frequency Component Helps Explain the Generalization of Convolutional Neural Networks | 289 | |
| 2020 | CVPR Workshop | Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks | 414 | Pytorch |
| 2020 | ICLR | Knowledge consistency between neural networks and beyond | 28 | |
| 2020 | ICLR | Interpretable Complex-Valued Neural Networks for Privacy Protection | 23 | |
| 2019 | AI | Explanation in artificial intelligence: Insights from the social sciences | 3248 | |
| 2019 | NMI | Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead | 3505 | |
| 2019 | NeurIPS | Can you trust your model's uncertainty? Evaluating predictive uncertainty under dataset shift | 1052 | |
| 2019 | NeurIPS | This looks like that: deep learning for interpretable image recognition | 665 | Pytorch |
| 2019 | NeurIPS | A benchmark for interpretability methods in deep neural networks | 413 | |
| 2019 | NeurIPS | Full-gradient representation for neural network visualization | 155 | |
| 2019 | NeurIPS | On the (In)fidelity and Sensitivity of Explanations | 226 | |
| 2019 | NeurIPS | Towards Automatic Concept-based Explanations | 342 | Tensorflow |
| 2019 | NeurIPS | CXPlain: Causal explanations for model interpretation under uncertainty | 133 | |
| 2019 | CVPR | Interpreting CNNs via Decision Trees | 293 | |
| 2019 | CVPR | From Recognition to Cognition: Visual Commonsense Reasoning | 544 | Pytorch |
| 2019 | CVPR | Attention branch network: Learning of attention mechanism for visual explanation | 371 | |
| 2019 | CVPR | Interpretable and fine-grained visual explanations for convolutional neural networks | 116 | |
| 2019 | CVPR | Learning to Explain with Complemental Examples | 36 | |
| 2019 | CVPR | Revealing Scenes by Inverting Structure from Motion Reconstructions | 84 | Tensorflow |
| 2019 | CVPR | Multimodal Explanations by Predicting Counterfactuality in Videos | 26 | |
| 2019 | CVPR | Visualizing the Resilience of Deep Convolutional Network Interpretations | 2 | |
| 2019 | ICCV | U-CAM: Visual Explanation using Uncertainty based Class Activation Maps | 61 | |
| 2019 | ICCV | Towards Interpretable Face Recognition | 66 | |
| 2019 | ICCV | Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded | 163 | |
| 2019 | ICCV | Understanding Deep Networks via Extremal Perturbations and Smooth Masks | 276 | Pytorch |
| 2019 | ICCV | Explaining Neural Networks Semantically and Quantitatively | 49 | |
| 2019 | ICLR | Hierarchical interpretations for neural network predictions | 111 | Pytorch |
| 2019 | ICLR | How Important Is a Neuron? | 101 | |
| 2019 | ICLR | Visual Explanation by Interpretation: Improving Visual Feedback Capabilities of Deep Neural Networks | 56 | |
| 2018 | ICML | Extracting Automata from Recurrent Neural Networks Using Queries and Counterexamples | 169 | Pytorch |
| 2019 | ICML | Towards A Deep and Unified Understanding of Deep Neural Models in NLP | 80 | Pytorch |
| 2019 | AISTATS | Interpreting black box predictions using Fisher kernels | 80 | |
| 2019 | ACMFAT | Explaining explanations in AI | 558 | |
| 2019 | AAAI | Interpretation of neural networks is fragile | 597 | Tensorflow |
| 2019 | AAAI | Classifier-agnostic saliency map extraction | 23 | |
| 2019 | AAAI | Can You Explain That? Lucid Explanations Help Human-AI Collaborative Image Retrieval | 11 | |
| 2019 | AAAI Workshop | Unsupervised Learning of Neural Networks to Explain Neural Networks | 28 | |
| 2019 | AAAI Workshop | Network Transplanting | 4 | |
| 2019 | CSUR | A Survey of Methods for Explaining Black Box Models | 3088 | |
| 2019 | JVCIR | Interpretable convolutional neural networks via feedforward design | 134 | Keras |
| 2019 | ExplainAI | The (Un)reliability of saliency methods | 515 | |
| 2019 | NAACL | Attention is not Explanation | 920 | |
| 2019 | EMNLP | Attention is not not Explanation | 667 | |
| 2019 | arxiv | Attention Interpretability Across NLP Tasks | 129 | |
| 2019 | arxiv | Interpretable CNNs | 2 | |
| 2018 | ICLR | Towards better understanding of gradient-based attribution methods for deep neural networks | 775 | |
| 2018 | ICLR | Learning how to explain neural networks: PatternNet and PatternAttribution | 342 | |
| 2018 | ICLR | On the importance of single directions for generalization | 282 | Pytorch |
| 2018 | ICLR | Detecting statistical interactions from neural network weights | 148 | Pytorch |
| 2018 | ICLR | Interpretable counting for visual question answering | 55 | Pytorch |
| 2018 | CVPR | Interpretable Convolutional Neural Networks | 677 | |
| 2018 | CVPR | Tell me where to look: Guided attention inference network | 454 | Chainer |
| 2018 | CVPR | Multimodal Explanations: Justifying Decisions and Pointing to the Evidence | 349 | Caffe |
| 2018 | CVPR | Transparency by design: Closing the gap between performance and interpretability in visual reasoning | 180 | Pytorch |
| 2018 | CVPR | Net2vec: Quantifying and explaining how concepts are encoded by filters in deep neural networks | 186 | |
| 2018 | CVPR | What have we learned from deep representations for action recognition? | 52 | |
| 2018 | CVPR | Learning to Act Properly: Predicting and Explaining Affordances from Images | 57 | |
| 2018 | CVPR | Teaching Categories to Human Learners with Visual Explanations | 64 | Pytorch |
| 2018 | CVPR | What do deep networks like to see? | 36 | |
| 2018 | CVPR | Interpret Neural Networks by Identifying Critical Data Routing Paths | 73 | Tensorflow |
| 2018 | ECCV | Deep clustering for unsupervised learning of visual features | 2056 | Pytorch |
| 2018 | ECCV | Explainable neural computation via stack neural module networks | 164 | Tensorflow |
| 2018 | ECCV | Grounding visual explanations | 184 | |
| 2018 | ECCV | Textual explanations for self-driving vehicles | 196 | |
| 2018 | ECCV | Interpretable basis decomposition for visual explanation | 228 | Pytorch |
| 2018 | ECCV | Convnets and imagenet beyond accuracy: Understanding mistakes and uncovering biases | 147 | |
| 2018 | ECCV | VQA-E: Explaining, elaborating, and enhancing your answers for visual questions | 71 | |
| 2018 | ECCV | Choose Your Neuron: Incorporating Domain Knowledge through Neuron-Importance | 41 | Pytorch |
| 2018 | ECCV | Diverse feature visualizations reveal invariances in early layers of deep neural networks | 23 | Tensorflow |
| 2018 | ECCV | ExplainGAN: Model Explanation via Decision Boundary Crossing Transformations | 36 | |
| 2018 | ICML | Interpretability beyond feature attribution: Quantitative testing with concept activation vectors | 1130 | Tensorflow |
| 2018 | ICML | Learning to explain: An information-theoretic perspective on model interpretation | 421 | |
| 2018 | ACL | Did the Model Understand the Question? | 171 | Tensorflow |
| 2018 | FITEE | Visual interpretability for deep learning: a survey | 731 | |
| 2018 | NeurIPS | Sanity Checks for Saliency Maps | 1353 | |
| 2018 | NeurIPS | Explanations based on the missing: Towards contrastive explanations with pertinent negatives | 443 | Tensorflow |
| 2018 | NeurIPS | Towards robust interpretability with self-explaining neural networks | 648 | Pytorch |
| 2018 | NeurIPS | Attacks meet interpretability: Attribute-steered detection of adversarial samples | 142 | |
| 2018 | NeurIPS | DeepPINK: reproducible feature selection in deep neural networks | 125 | Keras |
| 2018 | NeurIPS | Representer point selection for explaining deep neural networks | 182 | Tensorflow |
| 2018 | NeurIPS Workshop | Interpretable convolutional filters with SincNet | 97 | |
| 2018 | AAAI | Anchors: High-precision model-agnostic explanations | 1517 | |
| 2018 | AAAI | Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients | 537 | Tensorflow |
| 2018 | AAAI | Deep learning for case-based reasoning through prototypes: A neural network that explains its predictions | 396 | Tensorflow |
| 2018 | AAAI | Interpreting CNN Knowledge via an Explanatory Graph | 199 | Matlab |
| 2018 | AAAI | Examining CNN Representations with respect to Dataset Bias | 88 | |
| 2018 | WACV | Grad-CAM++: Generalized gradient-based visual explanations for deep convolutional networks | 1459 | |
| 2018 | IJCV | Top-down neural attention by excitation backprop | 778 | |
| 2018 | TPAMI | Interpreting deep visual representations via network dissection | 252 | |
| 2018 | DSP | Methods for interpreting and understanding deep neural networks | 2046 | |
| 2018 | Access | Peeking inside the black-box: A survey on Explainable Artificial Intelligence (XAI) | 3110 | |
| 2018 | JAIR | Learning Explanatory Rules from Noisy Data | 440 | Tensorflow |
| 2018 | MIPRO | Explainable artificial intelligence: A survey | 794 | |
| 2018 | BMVC | RISE: Randomized input sampling for explanation of black-box models | 657 | |
| 2018 | arxiv | Distill-and-Compare: Auditing Black-Box Models Using Transparent Model Distillation | 194 | |
| 2018 | arxiv | Manipulating and measuring model interpretability | 496 | |
| 2018 | arxiv | How convolutional neural networks see the world - A survey of convolutional neural network visualization methods | 211 | |
| 2018 | arxiv | Revisiting the importance of individual units in cnns via ablation | 93 | |
| 2018 | arxiv | Computationally Efficient Measures of Internal Neuron Importance | 10 | |
| 2017 | ICML | Understanding Black-box Predictions via Influence Functions | 2062 | Pytorch |
| 2017 | ICML | Axiomatic attribution for deep networks | 3654 | Keras |
| 2017 | ICML | Learning Important Features Through Propagating Activation Differences | 2835 | |
| 2017 | ICLR | Visualizing deep neural network decisions: Prediction difference analysis | 674 | Caffe |
| 2017 | ICLR | Exploring LOTS in Deep Neural Networks | 34 | |
| 2017 | NeurIPS | A Unified Approach to Interpreting Model Predictions | 11511 | |
| 2017 | NeurIPS | Real time image saliency for black box classifiers | 483 | Pytorch |
| 2017 | NeurIPS | SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability | 473 | |
| 2017 | CVPR | Mining Object Parts from CNNs via Active Question-Answering | 29 | |
| 2017 | CVPR | Network dissection: Quantifying interpretability of deep visual representations | 1254 | |
| 2017 | CVPR | Improving Interpretability of Deep Neural Networks with Semantic Information | 118 | |
| 2017 | CVPR | MDNet: A Semantically and Visually Interpretable Medical Image Diagnosis Network | 307 | Torch |
| 2017 | CVPR | Making the V in VQA matter: Elevating the role of image understanding in Visual Question Answering | 1686 | |
| 2017 | CVPR | Knowing when to look: Adaptive attention via a visual sentinel for image captioning | 1392 | Torch |
| 2017 | CVPR Workshop | Interpretable 3D human action analysis with temporal convolutional networks | 539 | |
| 2017 | ICCV | Grad-CAM: Visual explanations from deep networks via gradient-based localization | 13006 | Pytorch |
| 2017 | ICCV | Interpretable Explanations of Black Boxes by Meaningful Perturbation | 1293 | Pytorch |
| 2017 | ICCV | Interpretable Learning for Self-Driving Cars by Visualizing Causal Attention | 323 | |
| 2017 | ICCV | Understanding and comparing deep neural networks for age and gender classification | 130 | |
| 2017 | ICCV | Learning to disambiguate by asking discriminative questions | 26 | |
| 2017 | IJCAI | Right for the right reasons: Training differentiable models by constraining their explanations | 429 | |
| 2017 | AAAI | Growing Interpretable Part Graphs on ConvNets via Multi-Shot Learning | 67 | Matlab |
| 2017 | ACL | Visualizing and Understanding Neural Machine Translation | 179 | |
| 2017 | EMNLP | A causal framework for explaining the predictions of black-box sequence-to-sequence models | 192 | |
| 2017 | CVPR Workshop | Looking under the hood: Deep neural network visualization to interpret whole-slide image analysis outcomes for colorectal polyps | 47 | |
| 2017 | survey | Interpretability of deep learning models: a survey of results | 345 | |
| 2017 | arxiv | SmoothGrad: removing noise by adding noise | 1479 | |
| 2017 | arxiv | Interpretable & explorable approximations of black box models | 259 | |
| 2017 | arxiv | Distilling a neural network into a soft decision tree | 520 | Pytorch |
| 2017 | arxiv | Towards interpretable deep neural networks by leveraging adversarial examples | 111 | |
| 2017 | arxiv | Explainable artificial intelligence: Understanding, visualizing and interpreting deep learning models | 1279 | |
| 2017 | arxiv | Contextual Explanation Networks | 77 | Pytorch |
| 2017 | arxiv | Challenges for transparency | 142 | |
| 2017 | ACMSOPP | DeepXplore: Automated whitebox testing of deep learning systems | 1144 | |
| 2017 | CEURW | What does explainable AI really mean? A new conceptualization of perspectives | 518 | |
| 2017 | TVCG | ActiVis: Visual Exploration of Industry-Scale Deep Neural Network Models | 346 | |
| 2016 | NeurIPS | Synthesizing the preferred inputs for neurons in neural networks via deep generator networks | 659 | Caffe |
| 2016 | NeurIPS | Understanding the effective receptive field in deep convolutional neural networks | 1356 | |
| 2016 | CVPR | Inverting Visual Representations with Convolutional Networks | 626 | |
| 2016 | CVPR | Visualizing and Understanding Deep Texture Representations | 147 | |
| 2016 | CVPR | Analyzing Classifiers: Fisher Vectors and Deep Neural Networks | 191 | |
| 2016 | ECCV | Generating Visual Explanations | 613 | Caffe |
| 2016 | ECCV | Design of kernels in convolutional neural networks for image classification | 24 | |
| 2016 | ICML | Understanding and improving convolutional neural networks via concatenated rectified linear units | 510 | Caffe |
| 2016 | ICML | Visualizing and comparing AlexNet and VGG using deconvolutional layers | 126 | |
| 2016 | EMNLP | Rationalizing Neural Predictions | 738 | Pytorch |
| 2016 | IJCV | Visualizing deep convolutional neural networks using natural pre-images | 508 | Matlab |
| 2016 | IJCV | Visualizing Object Detection Features | 38 | Caffe |
| 2016 | KDD | Why should I trust you?: Explaining the predictions of any classifier | 11742 | |
| 2016 | TVCG | Visualizing the hidden activity of artificial neural networks | 309 | |
| 2016 | TVCG | Towards better analysis of deep convolutional neural networks | 474 | |
| 2016 | NAACL | Visualizing and understanding neural models in NLP | 650 | Torch |
| 2016 | arxiv | Understanding neural networks through representation erasure | 492 | |
| 2016 | arxiv | Grad-CAM: Why did you say that? | 398 | |
| 2016 | arxiv | Investigating the influence of noise and distractors on the interpretation of neural networks | 108 | |
| 2016 | arxiv | Attentive Explanations: Justifying Decisions and Pointing to the Evidence | 88 | |
| 2016 | arxiv | The Mythos of Model Interpretability | 3786 | |
| 2016 | arxiv | Multifaceted feature visualization: Uncovering the different types of features learned by each neuron in deep neural networks | 317 | |
| 2015 | ICLR | Striving for Simplicity: The All Convolutional Net | 4645 | Pytorch |
| 2015 | CVPR | Understanding deep image representations by inverting them | 1942 | Matlab |
| 2015 | ICCV | Understanding deep features with computer-generated imagery | 156 | Caffe |
| 2015 | ICML Workshop | Understanding Neural Networks Through Deep Visualization | 2038 | Tensorflow |
| 2015 | AAS | Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model | 749 | |
| 2014 | ECCV | Visualizing and Understanding Convolutional Networks | 18604 | Pytorch |
| 2014 | ICLR | Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps | 6142 | Pytorch |
| 2013 | ICCV | HOGgles: Visualizing object detection features | 352 | |