Training Set Debugging Using Trusted Items.

Open Access • Proceedings Article

- 29 Apr 2018

pp 4482-4489

36

TL;DR: The authors propose an approach that uses a small set of trusted items to identify bugs in a training set and to suggest appropriate label changes, improving the learned model as a step toward trustworthy machine learning.

Abstract: Training set bugs are flaws in the data that adversely affect machine learning. The training set is usually too large for manual inspection, but one may have the resources to verify a few trusted items. The set of trusted items may not by itself be adequate for learning, so we propose an algorithm that uses these items to identify bugs in the training set and thus improves learning. Specifically, our approach seeks the smallest set of changes to the training set labels such that the model learned from this corrected training set predicts labels of the trusted items correctly. We flag the items whose labels are changed as potential bugs, whose labels can be checked for veracity by human experts. To find the bugs in this way is a challenging combinatorial bilevel optimization problem, but it can be relaxed into a continuous optimization problem. Experiments on toy and real data demonstrate that our approach can identify training set bugs effectively and suggest appropriate changes to the labels. Our algorithm is a step toward trustworthy machine learning.
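The objective stated in the abstract (the smallest set of label changes after which the retrained model classifies every trusted item correctly) can be illustrated on a toy problem. The sketch below is not the authors' algorithm: it assumes binary labels, swaps in a 1-nearest-neighbor learner on one-dimensional points, and solves the combinatorial problem by brute force, whereas the paper relaxes it into a continuous optimization problem. The names `predict_1nn` and `smallest_label_fix` and all data values are hypothetical.

```python
from itertools import combinations


def predict_1nn(train_x, train_y, x):
    """Return the label of the training point closest to x (1-NN).

    Ties are broken in favor of the lowest index.
    """
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def smallest_label_fix(train_x, train_y, trusted_x, trusted_y):
    """Brute-force the smallest set of label flips that satisfies the trusted items.

    Returns the set of training indices whose binary labels must be flipped
    so that 1-NN, "retrained" on the corrected labels, predicts every trusted
    item correctly. Returns None if no flip set works. Cost is exponential
    in the training set size, so this is only viable for toy data.
    """
    n = len(train_x)
    for k in range(n + 1):  # try 0 flips, then 1, then 2, ...
        for flips in combinations(range(n), k):
            fixed = list(train_y)
            for i in flips:
                fixed[i] = 1 - fixed[i]  # assumes labels in {0, 1}
            if all(predict_1nn(train_x, fixed, x) == y
                   for x, y in zip(trusted_x, trusted_y)):
                return set(flips)
    return None


# Toy data: the point at x = 2.0 sits in the "0" cluster but is labeled 1.
train_x = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
train_y = [0, 0, 1, 1, 1, 1]

# Two verified (trusted) items, one near each cluster.
trusted_x = [2.1, 10.4]
trusted_y = [0, 1]

flagged = smallest_label_fix(train_x, train_y, trusted_x, trusted_y)
print("Indices flagged as potential bugs:", flagged)
```

The flagged indices play the role of the potential bugs that the abstract says would be handed to human experts for verification. The exhaustive search is what makes the exact problem impractical at scale and motivates a continuous relaxation.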

Citations

Proceedings Article • 10.18653/V1/P18-1079

Semantically Equivalent Adversarial Rules for Debugging NLP models

Marco Tulio Ribeiro, +2 more

- 01 Jul 2018

TL;DR: This work presents semantically equivalent adversaries (SEAs): semantic-preserving perturbations that change the model's predictions on many instances that are semantically very similar.

669

Proceedings Article • 10.1109/ICCV.2019.00342

O2U-Net: A Simple Noisy Label Detection Approach for Deep Neural Networks

Jinchi Huang, +3 more

- 01 Oct 2019

TL;DR: This paper proposes O2U-Net, a noisy label detection approach for deep neural networks that needs no human annotations; it only requires adjusting the network's hyper-parameters so that its status transfers cyclically from overfitting to underfitting (O2U).

314

Journal Article • 10.1109/TIFS.2021.3080522

De-Pois: An Attack-Agnostic Defense against Data Poisoning Attacks

Jian Chen, +4 more

- 14 May 2021

- IEEE Transactions on Information Forensi...

TL;DR: De-Pois is an attack-agnostic defense against data poisoning attacks in which a mimic model is trained to imitate the behavior of the target model trained on clean samples.

108

Proceedings Article • 10.1109/ICSE43902.2021.00043

AutoTrainer: An Automatic DNN Training Problem Detection and Repair System

Xiaoyu Zhang, +3 more

- 22 May 2021

TL;DR: AutoTrainer is a DNN training monitoring and automatic repair tool that detects and automatically repairs five commonly seen training problems; it periodically checks the training status to detect potential problems.

67

Journal Article • 10.1109/JIOT.2020.3022323

Trust-Based Cloud Machine Learning Model Selection for Industrial IoT and Smart City Services

Basheer Qolomany, +4 more

- 15 Feb 2021

- IEEE Internet of Things Journal

TL;DR: The proposed approach is a polynomial-time heuristic that maximizes the trustworthiness of ML models by selecting and switching between a subset of models drawn from a superset, while respecting the given reconfiguration budget/rate and reducing the cloud communication overhead.

49