Open Access Proceedings Article
Training Set Debugging Using Trusted Items.
Xuezhou Zhang,Xiaojin Zhu,Stephen J. Wright +2 more
- 29 Apr 2018
pp 4482-4489
TL;DR: In this paper, the authors propose an approach that uses a few trusted items to identify bugs in the training set and suggest appropriate changes to the labels, improving the learning of machine learning models as a step toward trustworthy machine learning.
Abstract: Training set bugs are flaws in the data that adversely affect machine learning. The training set is usually too large for manual inspection, but one may have the resources to verify a few trusted items. The set of trusted items may not by itself be adequate for learning, so we propose an algorithm that uses these items to identify bugs in the training set and thus improves learning. Specifically, our approach seeks the smallest set of changes to the training set labels such that the model learned from this corrected training set predicts labels of the trusted items correctly. We flag the items whose labels are changed as potential bugs, whose labels can be checked for veracity by human experts. To find the bugs in this way is a challenging combinatorial bilevel optimization problem, but it can be relaxed into a continuous optimization problem. Experiments on toy and real data demonstrate that our approach can identify training set bugs effectively and suggest appropriate changes to the labels. Our algorithm is a step toward trustworthy machine learning.
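The combinatorial formulation in the abstract can be illustrated on a toy problem. The following is a minimal sketch, not the authors' algorithm: it brute-forces the search over label flips instead of using the continuous relaxation, it assumes binary 0/1 labels, and it uses a 1-nearest-neighbor classifier as a stand-in learner. The names `predict_1nn`, `smallest_label_fix`, and `max_flips` are hypothetical and introduced only for this example.

```python
from itertools import combinations

import numpy as np


def predict_1nn(X_train, y_train, X_query):
    """Stand-in learner (assumption): label each query by its nearest training point."""
    # Squared Euclidean distances, shape (n_query, n_train).
    d = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[d.argmin(axis=1)]


def smallest_label_fix(X, y, X_trusted, y_trusted, max_flips=2):
    """Brute-force search for the smallest set of binary label flips such that
    the model built from the corrected labels predicts every trusted item
    correctly. Returns a tuple of flagged training indices, or None if no
    fix with at most `max_flips` flips exists."""
    for k in range(max_flips + 1):              # try smaller flip sets first
        for idx in combinations(range(len(y)), k):
            y_fix = y.copy()
            flips = np.array(idx, dtype=int)
            y_fix[flips] = 1 - y_fix[flips]     # flip the candidate labels
            preds = predict_1nn(X, y_fix, X_trusted)
            if np.array_equal(preds, y_trusted):
                return idx                      # flagged as potential bugs
    return None


# Toy data: the point at x = 2.0 (index 2) is mislabeled as class 1.
X = np.array([[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]])
y = np.array([0, 0, 1, 1, 1, 1])

# Two verified (trusted) items.
X_trusted = np.array([[2.2], [9.4]])
y_trusted = np.array([0, 1])

flagged = smallest_label_fix(X, y, X_trusted, y_trusted)
print(flagged)  # (2,)
```

The exhaustive search grows combinatorially with the training set size, which is why the abstract describes relaxing the problem into a continuous optimization; the sketch only shows what "smallest set of label changes consistent with the trusted items" means on a case small enough to enumerate.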
Citations
Semantically Equivalent Adversarial Rules for Debugging NLP models
Marco Tulio Ribeiro,Sameer Singh,Carlos Guestrin +2 more
- 01 Jul 2018
TL;DR: This work presents semantically equivalent adversaries (SEAs), semantic-preserving perturbations that induce changes in the model’s predictions, and generalizes them into rules that induce adversaries on many instances.
O2U-Net: A Simple Noisy Label Detection Approach for Deep Neural Networks
Jinchi Huang,Lie Qu,Rongfei Jia,Binqiang Zhao +3 more
- 01 Oct 2019
TL;DR: This paper proposes a novel noisy label detection approach, named O2U-net, for deep neural networks without human annotations, which only requires adjusting the hyper-parameters of the deep network to make its status transfer from overfitting to underfitting (O2U) cyclically.
De-Pois: An Attack-Agnostic Defense against Data Poisoning Attacks
TL;DR: De-Pois is an attack-agnostic defense against data poisoning attacks in which a mimic model is trained to imitate the behavior of the target model trained on clean samples.
AutoTrainer: An Automatic DNN Training Problem Detection and Repair System
Xiaoyu Zhang,Juan Zhai,Shiqing Ma,Chao Shen +3 more
- 22 May 2021
TL;DR: AutoTrainer is a DNN training monitoring and automatic repair tool that supports detecting and automatically repairing five commonly seen training problems. During training, it periodically checks the training status and detects potential problems.
Trust-Based Cloud Machine Learning Model Selection for Industrial IoT and Smart City Services
TL;DR: The proposed approach is an intelligent polynomial-time heuristic that maximizes the level of trust of ML models by selecting and switching between a subset of the ML models from a superset of models, while respecting the given reconfiguration budget/rate and reducing the cloud communication overhead.