Training Complex Models with Multi-Task Weak Supervision.

doi:10.1609/AAAI.V33I01.33014763

Open Access, Journal Article

Alexander Ratner, +5 more

- 17 Jul 2019

- Vol. 33, Iss: 01, pp 4763-4771

Citations: 247

TL;DR: This work shows that solving a matrix completion-style problem recovers the accuracies of multi-task weak supervision sources given their dependency structure, without any labeled data, which yields higher-quality supervision for training an end model.

Abstract: As machine learning models continue to increase in complexity, collecting large hand-labeled training sets has become one of the biggest roadblocks in practice. Instead, weaker forms of supervision that provide noisier but cheaper labels are often used. However, these weak supervision sources have diverse and unknown accuracies, may output correlated labels, and may label different tasks or apply at different levels of granularity. We propose a framework for integrating and modeling such weak supervision sources by viewing them as labeling different related sub-tasks of a problem, which we refer to as the multi-task weak supervision setting. We show that by solving a matrix completion-style problem, we can recover the accuracies of these multi-task sources given their dependency structure, but without any labeled data, leading to higher-quality supervision for training an end model. Theoretically, we show that the generalization error of models trained with this approach improves with the number of unlabeled data points, and characterize the scaling with respect to the task and dependency structures. On three fine-grained classification problems, we show that our approach leads to average gains of 20.2 points in accuracy over a traditional supervised approach, 6.8 points over a majority vote baseline, and 4.1 points over a previously proposed weak supervision method that models tasks separately.
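The idea of recovering source accuracies without labeled data can be illustrated with a much simpler setting than the one the paper treats. The sketch below is not the paper's matrix completion algorithm. It is a rank-one special case, under assumptions that are not in the abstract: a single binary task, exactly three sources that are conditionally independent given the true label, every source better than chance, and no abstentions. All function names (`simulate_votes`, `mean_product`, `estimate_accuracies`) are hypothetical and exist only for this example.

```python
import math
import random


def simulate_votes(n, accuracies, seed=0):
    """Draw n hidden labels y in {-1, +1} and one noisy vote per source.

    Source i reports y with probability accuracies[i] and -y otherwise,
    independently of the other sources given y (an assumption of this sketch).
    """
    rng = random.Random(seed)
    labels, votes = [], []
    for _ in range(n):
        y = rng.choice((-1, 1))
        labels.append(y)
        votes.append([y if rng.random() < p else -y for p in accuracies])
    return labels, votes


def mean_product(votes, i, j):
    """Empirical second moment E[v_i * v_j], computed without any labels."""
    return sum(row[i] * row[j] for row in votes) / len(votes)


def estimate_accuracies(votes):
    """Estimate P(v_i == y) for exactly three sources from agreements only.

    Under conditional independence, E[v_i v_j] = a_i * a_j, where
    a_i = E[v_i y] = 2 * P(v_i == y) - 1. Three pairwise moments therefore
    determine each a_i up to sign; the sign is fixed by assuming a_i > 0.
    """
    m01 = mean_product(votes, 0, 1)
    m02 = mean_product(votes, 0, 2)
    m12 = mean_product(votes, 1, 2)
    a = [
        math.sqrt(abs(m01 * m02 / m12)),
        math.sqrt(abs(m01 * m12 / m02)),
        math.sqrt(abs(m02 * m12 / m01)),
    ]
    # Clamp to 1 so sampling noise cannot produce an accuracy above 100%.
    return [(1 + min(ai, 1.0)) / 2 for ai in a]


if __name__ == "__main__":
    # The hidden labels are generated but never passed to the estimator.
    _, votes = simulate_votes(50000, [0.9, 0.8, 0.7], seed=0)
    print(estimate_accuracies(votes))
```

The estimator only ever sees the votes, so the recovered accuracies come entirely from how often the sources agree with one another. The estimates could then be used to weight the sources when producing training labels for an end model, rather than treating every vote equally as a majority vote would.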



