Top 1615 papers published in the topic of Generalization in 2019

Showing papers on "Generalization" published in 2019

Proceedings Article•10.1109/CVPR.2019.01091•

Meta-Learning With Differentiable Convex Optimization

Kwonjoon Lee¹, Subhransu Maji², Avinash Ravichandran³, Stefano Soatto³•Institutions (3)

University of California, San Diego¹, University of Massachusetts Amherst², Amazon.com³

15 Jun 2019

TL;DR: The objective is to learn feature embeddings that generalize well under a linear classification rule for novel categories and this work exploits two properties of linear classifiers: implicit differentiation of the optimality conditions of the convex problem and the dual formulation of the optimization problem.

Abstract: Many meta-learning approaches for few-shot learning rely on simple base learners such as nearest-neighbor classifiers. However, even in the few-shot regime, discriminatively trained linear predictors can offer better generalization. We propose to use these predictors as base learners to learn representations for few-shot learning and show they offer better tradeoffs between feature size and performance across a range of few-shot recognition benchmarks. Our objective is to learn feature embeddings that generalize well under a linear classification rule for novel categories. To efficiently solve the objective, we exploit two properties of linear classifiers: implicit differentiation of the optimality conditions of the convex problem and the dual formulation of the optimization problem. This allows us to use high-dimensional embeddings with improved generalization at a modest increase in computational overhead. Our approach, named MetaOptNet, achieves state-of-the-art performance on miniImageNet, tieredImageNet, CIFAR-FS, and FC100 few-shot learning benchmarks.

1,668 citations

Proceedings Article•

Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss

Kaidi Cao¹, Colin Wei², Adrien Gaidon³, Nikos Arechiga⁴, Tengyu Ma²•Institutions (4)

Tsinghua University¹, Stanford University², Intel³, Toyota⁴

1 Jan 2019

TL;DR: A theoretically-principled label-distribution-aware margin (LDAM) loss motivated by minimizing a margin-based generalization bound is proposed that replaces the standard cross-entropy objective during training and can be applied with prior strategies for training with class-imbalance such as re-weighting or re-sampling.

Abstract: Deep learning algorithms can fare poorly when the training dataset suffers from heavy class-imbalance but the testing criterion requires good generalization on less frequent classes. We design two novel methods to improve performance in such scenarios. First, we propose a theoretically-principled label-distribution-aware margin (LDAM) loss motivated by minimizing a margin-based generalization bound. This loss replaces the standard cross-entropy objective during training and can be applied with prior strategies for training with class-imbalance such as re-weighting or re-sampling. Second, we propose a simple, yet effective, training schedule that defers re-weighting until after the initial stage, allowing the model to learn an initial representation while avoiding some of the complications associated with re-weighting or re-sampling. We test our methods on several benchmark vision tasks including the real-world imbalanced dataset iNaturalist 2018. Our experiments show that either of these methods alone can already improve over existing techniques and their combination achieves even better performance gains.

1,431 citations

Posted Content•

Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks

Sanjeev Arora¹, Simon S. Du², Wei Hu¹, Zhiyuan Li¹, Ruosong Wang²•Institutions (2)

Princeton University¹, Carnegie Mellon University²

24 Jan 2019-arXiv: Learning

TL;DR: This paper analyzes training and generalization for a simple 2-layer ReLU net with random initialization, and provides the following improvements over recent works: a tighter characterization of training speed, an explanation for why training a neural net with random labels leads to slower training, and a data-dependent complexity measure.

Abstract: Recent works have cast some light on the mystery of why deep nets fit any data and generalize despite being very overparametrized. This paper analyzes training and generalization for a simple 2-layer ReLU net with random initialization, and provides the following improvements over recent works: (i) Using a tighter characterization of training speed than recent papers, an explanation for why training a neural net with random labels leads to slower training, as originally observed in [Zhang et al. ICLR'17]. (ii) Generalization bound independent of network size, using a data-dependent complexity measure. Our measure distinguishes clearly between random labels and true labels on MNIST and CIFAR, as shown by experiments. Moreover, recent papers require sample complexity to increase (slowly) with the size, while our sample complexity is completely independent of the network size. (iii) Learnability of a broad class of smooth functions by 2-layer ReLU nets trained via gradient descent. The key idea is to track dynamics of training and generalization via properties of a related kernel.

700 citations

Journal Article•10.1007/S00521-018-3521-2•

An approach toward decision-making and medical diagnosis problems using the concept of spherical fuzzy sets

Tahir Mahmood¹, Kifayat Ullah¹, Qaisar Khan¹, Naeem Jan¹•Institutions (1)

International Islamic University, Islamabad¹

01 Nov 2019-Neural Computing and Applications

TL;DR: The concept of spherical fuzzy set (SFS) and T-spherical fuzzy set (T-SFS) is introduced as a generalization of FS, IFS and PFS, and its novelty is shown by examples and graphical comparison with earlier established concepts.

Abstract: Human opinion cannot be restricted to yes or no as depicted by conventional fuzzy set (FS) and intuitionistic fuzzy set (IFS) but it can be yes, abstain, no and refusal as explained by picture fuzzy set (PFS). In this article, the concept of spherical fuzzy set (SFS) and T-spherical fuzzy set (T-SFS) is introduced as a generalization of FS, IFS and PFS. The novelty of SFS and T-SFS is shown by examples and graphical comparison with early established concepts. Some operations of SFSs and T-SFSs along with spherical fuzzy relations are defined, and related results are conferred. Medical diagnostics and decision-making problem are discussed in the environment of SFSs and T-SFSs as practical applications.

688 citations

Posted Content•

Fantastic Generalization Measures and Where to Find Them

Yiding Jiang¹, Behnam Neyshabur¹, Hossein Mobahi¹, Dilip Krishnan¹, Samy Bengio¹•Institutions (1)

Google¹

04 Dec 2019-arXiv: Learning

TL;DR: This work presents the first large scale study of generalization in deep networks, investigating more than 40 complexity measures taken from both theoretical bounds and empirical studies and showing surprising failures of some measures as well as promising measures for further research.

Abstract: Generalization of deep networks has been of great interest in recent years, resulting in a number of theoretically and empirically motivated complexity measures. However, most papers proposing such measures study only a small set of models, leaving open the question of whether the conclusion drawn from those experiments would remain valid in other settings. We present the first large scale study of generalization in deep networks. We investigate more than 40 complexity measures taken from both theoretical bounds and empirical studies. We train over 10,000 convolutional networks by systematically varying commonly used hyperparameters. Hoping to uncover potentially causal relationships between each measure and generalization, we analyze carefully controlled experiments and show surprising failures of some measures as well as promising measures for further research.

462 citations

Posted Content•

Domain Generalization via Model-Agnostic Learning of Semantic Features

Qi Dou¹, Daniel Coelho de Castro¹, Konstantinos Kamnitsas¹, Ben Glocker¹•Institutions (1)

Imperial College London¹

29 Oct 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work investigates the challenging problem of domain generalization, i.e., training a model on multi-domain source data such that it can directly generalize to target domains with unknown statistics, and adopts a model-agnostic learning paradigm with gradient-based meta-train and meta-test procedures to expose the optimization to domain shift.

Abstract: Generalization capability to unseen domains is crucial for machine learning models when deploying to real-world conditions. We investigate the challenging problem of domain generalization, i.e., training a model on multi-domain source data such that it can directly generalize to target domains with unknown statistics. We adopt a model-agnostic learning paradigm with gradient-based meta-train and meta-test procedures to expose the optimization to domain shift. Further, we introduce two complementary losses which explicitly regularize the semantic structure of the feature space. Globally, we align a derived soft confusion matrix to preserve general knowledge about inter-class relationships. Locally, we promote domain-independent class-specific cohesion and separation of sample features with a metric-learning component. The effectiveness of our method is demonstrated with new state-of-the-art results on two common object recognition benchmarks. Our method also shows consistent improvement on a medical image segmentation task.

460 citations

Proceedings Article•10.1109/CVPR.2019.01026•

Multi-Adversarial Discriminative Deep Domain Generalization for Face Presentation Attack Detection

Rui Shao¹, Xiangyuan Lan¹, Jiawei Li², Pong C. Yuen¹•Institutions (2)

Southwest Baptist University¹, Hong Kong Baptist University²

15 Jun 2019

TL;DR: This work proposes to learn a generalized feature space via a novel multi-adversarial discriminative deep domain generalization framework under a dual-force triplet-mining constraint, which ensures that the learned feature space is discriminating and shared by multiple source domains, and thus more generalized to new face presentation attacks.

Abstract: Face presentation attacks have become an increasingly critical issue in the face recognition community. Many face anti-spoofing methods have been proposed, but they cannot generalize well on "unseen" attacks. This work focuses on improving the generalization ability of face anti-spoofing methods from the perspective of the domain generalization. We propose to learn a generalized feature space via a novel multi-adversarial discriminative deep domain generalization framework. In this framework, a multi-adversarial deep domain generalization is performed under a dual-force triplet-mining constraint. This ensures that the learned feature space is discriminative and shared by multiple source domains, and thus is more generalized to new face presentation attacks. An auxiliary face depth supervision is incorporated to further enhance the generalization ability. Extensive experiments on four public datasets validate the effectiveness of the proposed method.

456 citations

Proceedings Article•

Domain Generalization via Model-Agnostic Learning of Semantic Features

Qi Dou¹, Daniel Coelho de Castro¹, Konstantinos Kamnitsas¹, Ben Glocker¹•Institutions (1)

Imperial College London¹

29 Oct 2019

TL;DR: In this paper, the authors adopt a model-agnostic learning paradigm with gradient-based meta-train and meta-test procedures to expose the optimization to domain shift and introduce two complementary losses which explicitly regularize the semantic structure of the feature space.

Abstract: Generalization capability to unseen domains is crucial for machine learning models when deploying to real-world conditions. We investigate the challenging problem of domain generalization, i.e., training a model on multi-domain source data such that it can directly generalize to target domains with unknown statistics. We adopt a model-agnostic learning paradigm with gradient-based meta-train and meta-test procedures to expose the optimization to domain shift. Further, we introduce two complementary losses which explicitly regularize the semantic structure of the feature space. Globally, we align a derived soft confusion matrix to preserve general knowledge of inter-class relationships. Locally, we promote domain-independent class-specific cohesion and separation of sample features with a metric-learning component. The effectiveness of our method is demonstrated with new state-of-the-art results on two common object recognition benchmarks. Our method also shows consistent improvement on a medical image segmentation task.

404 citations

Journal Article•10.1038/S42256-021-00302-5•

DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators

Lu Lu, Pengzhan Jin, George Em Karniadakis

08 Oct 2019-arXiv: Learning

TL;DR: This work proposes deep operator networks (DeepONets) to learn operators accurately and efficiently from a relatively small dataset, and demonstrates that DeepONet significantly reduces the generalization error compared to the fully-connected networks.

Abstract: While it is widely known that neural networks are universal approximators of continuous functions, a less known and perhaps more powerful result is that a neural network with a single hidden layer can approximate accurately any nonlinear continuous operator. This universal approximation theorem is suggestive of the potential application of neural networks in learning nonlinear operators from data. However, the theorem guarantees only a small approximation error for a sufficiently large network, and does not consider the important optimization and generalization errors. To realize this theorem in practice, we propose deep operator networks (DeepONets) to learn operators accurately and efficiently from a relatively small dataset. A DeepONet consists of two sub-networks, one for encoding the input function at a fixed number of sensors $x_i, i=1,\dots,m$ (branch net), and another for encoding the locations for the output functions (trunk net). We perform systematic simulations for identifying two types of operators, i.e., dynamic systems and partial differential equations, and demonstrate that DeepONet significantly reduces the generalization error compared to the fully-connected networks. We also derive theoretically the dependence of the approximation error in terms of the number of sensors (where the input function is defined) as well as the input function type, and we verify the theorem with computational results. More importantly, we observe high-order error convergence in our computational tests, namely polynomial rates (from half order to fourth order) and even exponential convergence with respect to the training dataset size.

324 citations

Proceedings Article•

The role of over-parametrization in generalization of neural networks

Behnam Neyshabur, Zhiyuan Li¹, Srinadh Bhojanapalli², Yann LeCun³, Nathan Srebro²•Institutions (3)

Princeton University¹, Toyota Technological Institute at Chicago², New York University³

1 Jan 2019

315 citations

Journal Article•10.1137/18M118236X•

Maximum principle preserving exponential time differencing schemes for the nonlocal Allen Cahn equation

Qiang Du¹, Lili Ju², Xiao Li³, Xiao Li⁴, Xiao Li⁵, Zhonghua Qiao³•Institutions (5)

Columbia University¹, Ocean University of China², Hong Kong Polytechnic University³, University of South Carolina⁴, China Academy of Engineering Physics⁵

30 Apr 2019-SIAM Journal on Numerical Analysis

TL;DR: The nonlocal Allen–Cahn equation, a generalization of the classic Allen–Cahn equation obtained by replacing the Laplacian with a parameterized nonlocal diffusion operator, satisfies the maximum principle.

Abstract: The nonlocal Allen–Cahn equation, a generalization of the classic Allen–Cahn equation by replacing the Laplacian with a parameterized nonlocal diffusion operator, satisfies the maximum principle ...

Proceedings Article•10.1109/CVPR.2019.00019•

Striking the Right Balance With Uncertainty

Salman Khan¹, Munawar Hayat², Syed Waqas Zamir, Jianbing Shen³, Ling Shao•Institutions (3)

Australian National University¹, University of Canberra², Beijing Institute of Technology³

15 Jun 2019

TL;DR: This paper demonstrates that the Bayesian uncertainty estimates directly correlate with the rarity of classes and the difficulty level of individual samples, and presents a novel framework for uncertainty based class imbalance learning that efficiently utilizes sample and class uncertainty information to learn robust features and more generalizable classifiers.

Abstract: Learning unbiased models on imbalanced datasets is a significant challenge. Rare classes tend to get a concentrated representation in the classification space which hampers the generalization of learned boundaries to new test examples. In this paper, we demonstrate that the Bayesian uncertainty estimates directly correlate with the rarity of classes and the difficulty level of individual samples. Subsequently, we present a novel framework for uncertainty based class imbalance learning that follows two key insights: First, classification boundaries should be extended further away from a more uncertain (rare) class to avoid over-fitting and enhance its generalization. Second, each sample should be modeled as a multi-variate Gaussian distribution with a mean vector and a covariance matrix defined by the sample's uncertainty. The learned boundaries should respect not only the individual samples but also their distribution in the feature space. Our proposed approach efficiently utilizes sample and class uncertainty information to learn robust features and more generalizable classifiers. We systematically study the class imbalance problem and derive a novel loss formulation for max-margin learning based on Bayesian uncertainty measure. The proposed method shows significant performance improvements on six benchmark datasets for face verification, attribute prediction, digit/object classification and skin lesion detection.

Journal Article•10.1073/PNAS.1802705116•

Optimal errors and phase transitions in high-dimensional generalized linear models

Jean Barbier, Florent Krzakala¹, Nicolas Macris², Léo Miolane¹, Lenka Zdeborová¹•Institutions (2)

Centre national de la recherche scientifique¹, École Polytechnique Fédérale de Lausanne²

01 Mar 2019-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: In this paper, the mutual information (or free entropy) from which the Bayes-optimal estimation and generalization errors of generalized linear models (GLMs) are deduced is analyzed.

Abstract: Generalized linear models (GLMs) are used in high-dimensional machine learning, statistics, communications, and signal processing. In this paper we analyze GLMs when the data matrix is random, as relevant in problems such as compressed sensing, error-correcting codes, or benchmark models in neural networks. We evaluate the mutual information (or “free entropy”) from which we deduce the Bayes-optimal estimation and generalization errors. Our analysis applies to the high-dimensional limit where both the number of samples and the dimension are large and their ratio is fixed. Nonrigorous predictions for the optimal errors existed for special cases of GLMs, e.g., for the perceptron, in the field of statistical physics based on the so-called replica method. Our present paper rigorously establishes those decades-old conjectures and brings forward their algorithmic interpretation in terms of performance of the generalized approximate message-passing algorithm. Furthermore, we tightly characterize, for many learning problems, regions of parameters for which this algorithm achieves the optimal performance and locate the associated sharp phase transitions separating learnable and nonlearnable regions. We believe that this random version of GLMs can serve as a challenging benchmark for multipurpose algorithms.

Journal Article•10.1109/COMST.2018.2889329•

Optimizing Bloom Filter: Challenges, Solutions, and Comparisons

Lailong Luo¹, Deke Guo¹, Richard T. B. Ma², Ori Rottenstreich³, Xueshan Luo¹•Institutions (3)

National University of Defense Technology¹, National University of Singapore², Technion – Israel Institute of Technology³

01 Jan 2019-IEEE Communications Surveys and Tutorials

TL;DR: In this article, a survey of the existing literature on BF optimization, covering more than 60 variants, is presented, and a comprehensive analysis and qualitative comparison are conducted from the perspectives of BF components.

Abstract: Bloom filter (BF) has been widely used to support membership query, i.e., to judge whether a given element ${x}$ is a member of a given set ${S}$ or not. Recent years have seen an explosion of BF designs, owing to its space efficiency and constant-time membership queries. The existing reviews or surveys mainly focus on the applications of BF, but fall short in covering the current trends, thereby lacking intrinsic understanding of their design philosophy. To this end, this survey provides an overview of BF and its variants, with an emphasis on the optimization techniques. Basically, we survey the existing variants from two dimensions, i.e., performance and generalization. To improve the performance, dozens of variants devote themselves to reducing the false positives and implementation costs. Besides, tens of variants generalize the BF framework in more scenarios by diversifying the input sets and enriching the output functionalities. To summarize the existing efforts, we conduct an in-depth study of the existing literature on BF optimization, covering more than 60 variants. We unearth the design philosophy of these variants and elaborate how the employed optimization techniques improve BF. Furthermore, comprehensive analysis and qualitative comparison are conducted from the perspectives of BF components. Lastly, we highlight the future trends of designing BFs. This is, to the best of our knowledge, the first survey that accomplishes such goals.

Proceedings Article•10.18653/V1/P19-1485•

MultiQA: An Empirical Investigation of Generalization and Transfer in Reading Comprehension

Alon Talmor¹, Jonathan Berant²•Institutions (2)

Allen Institute for Artificial Intelligence¹, Tel Aviv University²

31 May 2019

TL;DR: This paper proposes MultiQA, a BERT-based model trained on multiple reading comprehension (RC) datasets, which achieves state-of-the-art performance on five RC datasets.

Abstract: A large number of reading comprehension (RC) datasets has been created recently, but little analysis has been done on whether they generalize to one another, and the extent to which existing datasets can be leveraged for improving performance on new ones. In this paper, we conduct such an investigation over ten RC datasets, training on one or more source RC datasets, and evaluating generalization, as well as transfer to a target RC dataset. We analyze the factors that contribute to generalization, and show that training on a source RC dataset and transferring to a target dataset substantially improves performance, even in the presence of powerful contextual representations from BERT (Devlin et al., 2019). We also find that training on multiple source RC datasets leads to robust generalization and transfer, and can reduce the cost of example collection for a new RC dataset. Following our analysis, we propose MultiQA, a BERT-based model, trained on multiple RC datasets, which leads to state-of-the-art performance on five RC datasets. We share our infrastructure for the benefit of the research community.

Proceedings Article•10.1109/CVPR.2019.00263•

ContextDesc: Local Descriptor Augmentation With Cross-Modality Context

Zixin Luo¹, Tianwei Shen¹, Lei Zhou¹, Jiahui Zhang², Yao Yao¹, Shiwei Li¹, Tian Fang, Long Quan¹•Institutions (2)

Hong Kong University of Science and Technology¹, Tsinghua University²

8 Apr 2019

TL;DR: This paper proposes a unified learning framework that leverages and aggregates cross-modality contextual information, including visual context from high-level image representation and geometric context from 2D keypoint distribution.

Abstract: Most existing studies on learning local features focus on the patch-based descriptions of individual keypoints while neglecting the spatial relations established from their keypoint locations. In this paper, we go beyond the local detail representation by introducing context awareness to augment off-the-shelf local feature descriptors. Specifically, we propose a unified learning framework that leverages and aggregates the cross-modality contextual information, including (i) visual context from high-level image representation, and (ii) geometric context from 2D keypoint distribution. Moreover, we propose an effective N-pair loss that eschews the empirical hyper-parameter search and improves the convergence. The proposed augmentation scheme is lightweight compared with the raw local feature description, yet it improves remarkably on several large-scale benchmarks with diversified scenes, demonstrating both strong practicality and generalization ability in geometric matching applications.

Proceedings Article•

Control Batch Size and Learning Rate to Generalize Well: Theoretical and Empirical Evidence

Fengxiang He¹, Tongliang Liu¹, Dacheng Tao¹•Institutions (1)

University of Sydney¹

1 Jan 2019

TL;DR: A PAC-Bayes generalization bound for neural networks trained by SGD is proved; the bound correlates positively with the ratio of batch size to learning rate, which provides the theoretical foundation for the training strategy.

Abstract: Deep neural networks have achieved dramatic success based on the optimization method of stochastic gradient descent (SGD). However, it is still not clear how to tune hyper-parameters, especially batch size and learning rate, to ensure good generalization. This paper reports both theoretical and empirical evidence for a training strategy: the ratio of batch size to learning rate should be kept from growing too large in order to achieve good generalization. Specifically, we prove a PAC-Bayes generalization bound for neural networks trained by SGD, which has a positive correlation with the ratio of batch size to learning rate. This correlation builds the theoretical foundation of the training strategy. Furthermore, we conduct a large-scale experiment to verify the correlation and training strategy. We trained 1,600 models based on the architectures ResNet-110 and VGG-19 with the datasets CIFAR-10 and CIFAR-100 while strictly controlling unrelated variables. Accuracies on the test sets are collected for the evaluation. Spearman's rank-order correlation coefficients and the corresponding $p$ values on 164 groups of the collected data demonstrate that the correlation is statistically significant, which fully supports the training strategy.

Journal Article•10.1109/TII.2018.2826064•

A Learning Framework of Adaptive Manipulative Skills From Human to Robot

Chenguang Yang¹, Chao Zeng¹, Yang Cong², Ning Wang³, Min Wang¹•Institutions (3)

South China University of Technology¹, Chinese Academy of Sciences², University of Plymouth³

01 Feb 2019-IEEE Transactions on Industrial Informatics

TL;DR: A new framework to facilitate robot skill generalization is proposed, in that the learned skills are first segmented into a sequence of subskills automatically, then each individual subskill is encoded and regulated accordingly.

Abstract: Robots are often required to generalize the skills learned from human demonstrations to fulfil new task requirements. However, skill generalization is difficult to realize in the following situations: the skill for a complex multistep task includes a number of features; some special constraints are imposed on the robots during the process of task reproduction; and a completely new situation arises that is quite different from the one in which demonstrations were given to the robot. This work proposes a new framework to facilitate robot skill generalization. The basic idea is that the learned skills are first segmented into a sequence of subskills automatically, then each individual subskill is encoded and regulated accordingly. Specifically, we adapt each set of the segmented movement trajectories individually instead of the whole movement profiles, thus making it more convenient to realize skill generalization. In addition, human limb stiffness estimated from surface electromyographic signals is considered in the framework for the realization of human-to-robot variable impedance control skill transfer, as well as the generalization of both movement trajectories and stiffness profiles. An experimental study has been performed to verify the effectiveness of the proposed framework.

Posted Content•

Deep Reinforcement Learning meets Graph Neural Networks: exploring a routing optimization use case

Paul Almasan, José Suárez-Varela, Arnau Badia-Sampera, Krzysztof Rusek, Pere Barlet-Ros, Albert Cabellos-Aparicio

16 Oct 2019-arXiv: Networking and Internet Architecture

TL;DR: This paper proposes to use Graph Neural Networks (GNN) in combination with DRL, and its novel DRL+GNN architecture is able to learn, operate and generalize over arbitrary network topologies.

Abstract: Recent advances in Deep Reinforcement Learning (DRL) have shown a significant improvement in decision-making problems. The networking community has started to investigate how DRL can provide a new breed of solutions to relevant optimization problems, such as routing. However, most of the state-of-the-art DRL-based networking techniques fail to generalize: they can only operate over network topologies seen during training, but not over new topologies. The reason behind this important limitation is that existing DRL networking solutions use standard neural networks (e.g., fully connected), which are unable to learn graph-structured information. In this paper we propose to use Graph Neural Networks (GNN) in combination with DRL. GNN have been recently proposed to model graphs, and our novel DRL+GNN architecture is able to learn, operate and generalize over arbitrary network topologies. To showcase its generalization capabilities, we evaluate it on an Optical Transport Network (OTN) scenario, where the agent needs to allocate traffic demands efficiently. Our results show that our DRL+GNN agent is able to achieve outstanding performance in topologies unseen during training.

Posted Content•

Generalizing to unseen domains via distribution matching

Isabela Albuquerque, Joao Monteiro, Mohammad Darvishi, Tiago H. Falk, Ioannis Mitliagkas

03 Nov 2019-arXiv: Learning

TL;DR: This work focuses on domain generalization: a formalization where the data generating process at test time may yield samples from never-before-seen domains (distributions), and relies on a simple lemma to derive a generalization bound for this setting.

Abstract: Supervised learning results typically rely on assumptions of i.i.d. data. Unfortunately, those assumptions are commonly violated in practice. In this work, we tackle this problem by focusing on domain generalization: a formalization where the data generating process at test time may yield samples from never-before-seen domains (distributions). Our work relies on a simple lemma: by minimizing a notion of discrepancy between all pairs from a set of given domains, we also minimize the discrepancy between any pairs of mixtures of domains. Using this result, we derive a generalization bound for our setting. We then show that low risk over unseen domains can be achieved by representing the data in a space where (i) the training distributions are indistinguishable, and (ii) relevant information for the task at hand is preserved. Minimizing the terms in our bound yields an adversarial formulation which estimates and minimizes pairwise discrepancies. We validate our proposed strategy on standard domain generalization benchmarks, outperforming a number of recently introduced methods. Notably, we tackle a real-world application where the underlying data corresponds to multi-channel electroencephalography time series from different subjects, each considered as a distinct domain.

Posted Content•

Why ResNet Works? Residuals Generalize

Fengxiang He¹, Tongliang Liu¹, Dacheng Tao¹•Institutions (1)

University of Sydney¹

02 Apr 2019-arXiv: Machine Learning

TL;DR: According to the obtained generalization bound, regularization terms should be used in practice to keep the norms of the weight matrices from growing too large and so ensure good generalization, which justifies the technique of weight decay.

Abstract: Residual connections significantly boost the performance of deep neural networks. However, there are few theoretical results that address the influence of residuals on the hypothesis complexity and the generalization ability of deep neural networks. This paper studies the influence of residual connections on the hypothesis complexity of the neural network in terms of the covering number of its hypothesis space. We prove that the upper bound of the covering number is the same as that of chain-like neural networks, if the total numbers of the weight matrices and nonlinearities are fixed, no matter whether they are in the residuals or not. This result demonstrates that residual connections may not increase the hypothesis complexity of the neural network compared with the chain-like counterpart. Based on the upper bound of the covering number, we then obtain an $\mathcal{O}(1/\sqrt{N})$ margin-based multi-class generalization bound for ResNet, as an exemplary case of any deep neural network with residual connections. Generalization guarantees for similar state-of-the-art neural network architectures, such as DenseNet and ResNeXt, are straightforward. From our generalization bound, a practical recommendation follows: to achieve good generalization, regularization terms should be used to keep the norms of the weight matrices from growing too large, which justifies the standard technique of weight decay.
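
The weight-decay recommendation in this abstract can be illustrated with a minimal sketch. It is not the paper's experiment: the three-element weight vector, the constant outward-pushing gradient, and the names `sgd_step`, `l2_norm` and `run` are all hypothetical, chosen only to show that adding a decay term to each update keeps the weight norm bounded while plain updates let it grow.

```python
import math


def sgd_step(weights, grads, lr=0.1, weight_decay=0.0):
    """One SGD step; weight_decay adds the gradient of (weight_decay / 2) * ||w||^2."""
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


def l2_norm(weights):
    """Euclidean norm of a flat list of weights."""
    return math.sqrt(sum(w * w for w in weights))


def run(weight_decay, steps=50):
    """Apply repeated updates with a toy gradient that pushes weights away from zero."""
    w = [1.0, -2.0, 3.0]  # hypothetical starting weights
    for _ in range(steps):
        # Hypothetical loss gradient: always points toward larger magnitude.
        grads = [-0.1 if x > 0 else 0.1 for x in w]
        w = sgd_step(w, grads, lr=0.1, weight_decay=weight_decay)
    return l2_norm(w)


norm_plain = run(weight_decay=0.0)  # norm grows without the penalty
norm_decay = run(weight_decay=0.5)  # penalty pulls the norm back toward a small value
```

In this toy setting the decayed weights settle near a small fixed magnitude, while the undecayed weights keep drifting outward.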

Posted Content•

Measuring Compositional Generalization: A Comprehensive Method on Realistic Data

Daniel Keysers¹, Nathanael Schärli¹, Nathan Scales¹, Hylke Buisman¹, Daniel Furrer¹, Sergii Kashubin, Nikola Momchev¹, Danila Sinopalnikov¹, Lukasz Stafiniak¹, Tibor Tihon¹, Dmitry Tsarkov², Xiao Wang, Marc van Zee¹, Olivier Bousquet¹•Institutions (2)

Google¹, University of Manchester²

20 Dec 2019-arXiv: Learning

TL;DR: This article proposes a method to systematically construct compositional generalization benchmarks by maximizing compound divergence while guaranteeing a small atom divergence between train and test sets, and quantitatively compares this method to other approaches for creating such benchmarks.

Abstract: State-of-the-art machine learning methods exhibit limited compositional generalization. At the same time, there is a lack of realistic benchmarks that comprehensively measure this ability, which makes it challenging to find and evaluate improvements. We introduce a novel method to systematically construct such benchmarks by maximizing compound divergence while guaranteeing a small atom divergence between train and test sets, and we quantitatively compare this method to other approaches for creating compositional generalization benchmarks. We present a large and realistic natural language question answering dataset that is constructed according to this method, and we use it to analyze the compositional generalization ability of three machine learning architectures. We find that they fail to generalize compositionally and that there is a surprisingly strong negative correlation between compound divergence and accuracy. We also demonstrate how our method can be used to create new compositionality benchmarks on top of the existing SCAN dataset, which confirms these findings.
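
The abstract does not say how divergence between train and test sets is computed. As a rough sketch, assume each set is summarized by a frequency distribution over atoms (or compounds) and that divergence is one minus a Chernoff coefficient with parameter `alpha`. Both this definition and the `alpha` values used below are assumptions made for illustration, and `chernoff_divergence` is a hypothetical helper name.

```python
def chernoff_divergence(p, q, alpha):
    """Return 1 - sum_k p_k^alpha * q_k^(1 - alpha) for two discrete distributions.

    p and q map items (atoms or compounds) to probabilities that sum to 1.
    The result is 0 for identical distributions and 1 for disjoint ones.
    """
    keys = set(p) | set(q)
    coefficient = sum(
        (p.get(k, 0.0) ** alpha) * (q.get(k, 0.0) ** (1.0 - alpha)) for k in keys
    )
    return 1.0 - coefficient


# Hypothetical atom frequencies: train and test use the same building blocks.
train_atoms = {"who": 0.5, "directed": 0.5}
test_atoms = {"who": 0.5, "directed": 0.5}

# Hypothetical compound frequencies: test combines the atoms in unseen ways.
train_compounds = {"who directed": 1.0}
test_compounds = {"directed who": 1.0}

atom_div = chernoff_divergence(train_atoms, test_atoms, alpha=0.5)
compound_div = chernoff_divergence(train_compounds, test_compounds, alpha=0.1)
```

A split in the spirit of the abstract would keep `atom_div` small while making `compound_div` as large as possible.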

Book Chapter•10.1007/978-3-319-73074-5_5•

Generalization Error in Deep Learning

Daniel Jakubovitz¹, Raja Giryes¹, Miguel R. D. Rodrigues²•Institutions (2)

Tel Aviv University¹, University College London²

08 Apr 2019-arXiv: Learning

TL;DR: This chapter provides an overview of the existing theory and bounds for the characterization of the generalization error of deep neural networks, combining both classical and more recent theoretical and empirical results.

Abstract: Deep learning models have lately shown great performance in various fields such as computer vision, speech recognition, speech translation, and natural language processing. However, alongside their state-of-the-art performance, it is still generally unclear what is the source of their generalization ability. Thus, an important question is what makes deep neural networks able to generalize well from the training set to new data. In this chapter, we provide an overview of the existing theory and bounds for the characterization of the generalization error of deep neural networks, combining both classical and more recent theoretical and empirical results.

Proceedings Article•10.1109/CVPR.2019.01218•

Generalizing Eye Tracking With Bayesian Adversarial Learning

Kang Wang¹, Rui Zhao¹, Hui Su², Qiang Ji¹•Institutions (2)

Rensselaer Polytechnic Institute¹, IBM²

15 Jun 2019

TL;DR: This work adds an adversarial component to a traditional CNN-based gaze estimator so that it learns features that are gaze-responsive yet generalize across appearance and pose variations, and extends the point-estimation-based deterministic model to a Bayesian framework so that gaze estimation can be performed using all parameters.

Abstract: Existing appearance-based gaze estimation approaches that use CNNs generalize poorly. By systematically studying this issue, we identify three major factors: 1) appearance variations; 2) head pose variations; and 3) over-fitting caused by point estimation. To improve the generalization performance, we propose to incorporate adversarial learning and Bayesian inference into a unified framework. In particular, we first add an adversarial component to a traditional CNN-based gaze estimator so that we can learn features that are gaze-responsive but can generalize to appearance and pose variations. Next, we extend the point-estimation-based deterministic model to a Bayesian framework so that gaze estimation can be performed using all parameters instead of only one set of parameters. Besides improved performance on several benchmark datasets, the proposed method also enables online adaptation of the model to new subjects/environments, demonstrating its potential for practical real-time eye tracking applications.

Journal Article•10.1088/1751-8121/AB4C8B•

A jamming transition from under- to over-parametrization affects generalization in deep learning

Stefano Spigler¹, Mario Geiger¹, Stéphane d'Ascoli², Levent Sagun¹, Giulio Biroli², Matthieu Wyart¹•Institutions (2)

École Polytechnique Fédérale de Lausanne¹, École Normale Supérieure²

22 Nov 2019-Journal of Physics A

TL;DR: It is argued that in fully-connected networks a phase transition delimits the over- and under-parametrized regimes where fitting can or cannot be achieved, and it is shown that this transition is sharp for the hinge loss.

Journal Article•10.1177/0956797619863663•

Searching for rewards like a child means less generalization and more directed exploration.

Eric Schulz¹, Charley M. Wu², Azzurra Ruggeri², Björn Meder²•Institutions (2)

Harvard University¹, Max Planck Society²

25 Oct 2019-Psychological Science

TL;DR: A predictive model of search is built to disentangle the unique contributions of three hypotheses about developmental differences, yielding robust and recoverable parameter estimates which indicate that children generalize less and rely on directed exploration more than adults.

Abstract: How do children and adults differ in their search for rewards? We considered three different hypotheses that attribute developmental differences to (a) children’s increased random sampling, (b) mor...

Proceedings Article•

When to Trust Your Model: Model-Based Policy Optimization

Michael Janner¹, Justin Fu¹, Marvin Zhang¹, Sergey Levine²•Institutions (2)

University of California, Berkeley¹, Google²

1 Jun 2019

TL;DR: This paper studies the role of model usage in policy optimization both theoretically and empirically, and shows that a simple procedure of using short model-generated rollouts branched from real data has the benefits of more complicated model-based algorithms.

Abstract: Designing effective model-based reinforcement learning algorithms is difficult because the ease of data generation must be weighed against the bias of model-generated data. In this paper, we study the role of model usage in policy optimization both theoretically and empirically. We first formulate and analyze a model-based reinforcement learning algorithm with a guarantee of monotonic improvement at each step. In practice, this analysis is overly pessimistic and suggests that real off-policy data is always preferable to model-generated on-policy data, but we show that an empirical estimate of model generalization can be incorporated into such analysis to justify model usage. Motivated by this analysis, we then demonstrate that a simple procedure of using short model-generated rollouts branched from real data has the benefits of more complicated model-based algorithms without the usual pitfalls. In particular, this approach surpasses the sample efficiency of prior model-based methods, matches the asymptotic performance of the best model-free algorithms, and scales to horizons that cause other model-based methods to fail entirely.
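
The "short model-generated rollouts branched from real data" procedure can be sketched as follows. This is an illustration of the idea only, not the authors' implementation: `branched_rollouts`, `toy_model` and `toy_policy` are hypothetical names, the one-dimensional dynamics are made up, and a real system would use a learned dynamics model and feed the synthetic transitions to a policy optimizer.

```python
import random


def branched_rollouts(real_states, model, policy, n_starts, horizon, rng):
    """Generate short model rollouts, each starting from a state seen in real data."""
    synthetic = []
    for _ in range(n_starts):
        state = rng.choice(real_states)  # branch point drawn from real experience
        for _ in range(horizon):  # a short horizon limits compounding model error
            action = policy(state)
            next_state, reward = model(state, action)
            synthetic.append((state, action, reward, next_state))
            state = next_state
    return synthetic


def toy_model(state, action):
    """Hypothetical stand-in for a learned dynamics and reward model."""
    next_state = state + action
    return next_state, -abs(next_state)


def toy_policy(state):
    """Hypothetical policy that moves the state halfway toward zero."""
    return -0.5 * state


rng = random.Random(0)
real_states = [1.0, -2.0, 4.0]  # states assumed to come from a real replay buffer
data = branched_rollouts(real_states, toy_model, toy_policy, n_starts=5, horizon=3, rng=rng)
```

Every rollout begins at a real state and runs for only `horizon` model steps, which is the trade-off the abstract describes between cheap model-generated data and model bias.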

Proceedings Article•

Learning Action Representations for Reinforcement Learning

Yash Chandak¹, Georgios Theocharous², James Kostas¹, Scott M. Jordan¹, Philip S. Thomas¹•Institutions (2)

University of Massachusetts Amherst¹, Adobe Systems²

1 Feb 2019

TL;DR: This paper shows how a policy can be decomposed into a component that acts in a low-dimensional space of action representations and another component that transforms these representations into actual actions; the representations improve generalization over large, finite action sets by allowing the agent to infer the outcomes of actions similar to actions already taken.

Abstract: Most model-free reinforcement learning methods leverage state representations (embeddings) for generalization, but either ignore structure in the space of actions or assume the structure is provided a priori. We show how a policy can be decomposed into a component that acts in a low-dimensional space of action representations and a component that transforms these representations into actual actions. These representations improve generalization over large, finite action sets by allowing the agent to infer the outcomes of actions similar to actions already taken. We provide an algorithm to both learn and use action representations and provide conditions for its convergence. The efficacy of the proposed method is demonstrated on large-scale real-world problems.
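
The two-component decomposition can be sketched with a toy example. The assumptions are loud ones: the action embeddings are fixed by hand here, whereas the paper learns them; the mapping to a discrete action is a plain nearest-neighbour lookup chosen for illustration; and `internal_policy`, `nearest_action` and `act` are hypothetical names.

```python
def internal_policy(state):
    """Hypothetical policy component acting in a 2-D action-representation space."""
    return (0.9 * state, 0.1)


def nearest_action(point, action_embeddings):
    """Map a point in representation space to the discrete action embedded closest to it."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(action_embeddings, key=lambda name: sq_dist(point, action_embeddings[name]))


def act(state, action_embeddings):
    """Full policy: choose a representation first, then transform it into an actual action."""
    return nearest_action(internal_policy(state), action_embeddings)


# Hand-written embeddings for illustration only; similar actions would sit close together.
embeddings = {"left": (-1.0, 0.0), "right": (1.0, 0.0), "up": (0.0, 1.0)}

chosen = [act(s, embeddings) for s in (1.0, -1.0, 0.0)]
```

Because nearby embeddings lead to the same or similar choices, what the agent learns about one action carries over to its neighbours in the representation space.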

Proceedings Article•

Hierarchically structured meta-learning

Huaxiu Yao¹, Ying Wei², Junzhou Huang², Zhenhui Li³•Institutions (3)

Pennsylvania State University¹, Tencent², Penn State College of Information Sciences and Technology³

24 May 2019

TL;DR: This paper proposes a hierarchically structured meta-learning (HSML) algorithm that explicitly tailors the transferable knowledge to different clusters of tasks, which not only addresses task uncertainty and heterogeneity but also preserves knowledge generalization among a cluster of similar tasks.

Abstract: In order to learn quickly with few samples, meta-learning utilizes prior knowledge learned from previous tasks. However, a critical challenge in meta-learning is task uncertainty and heterogeneity, which cannot be handled by globally sharing knowledge among tasks. In this paper, based on gradient-based meta-learning, we propose a hierarchically structured meta-learning (HSML) algorithm that explicitly tailors the transferable knowledge to different clusters of tasks. Inspired by the way human beings organize knowledge, we resort to a hierarchical task clustering structure to cluster tasks. As a result, the proposed approach not only addresses the challenge by customizing knowledge to different clusters of tasks, but also preserves knowledge generalization among a cluster of similar tasks. In addition, to handle changing task relationships, we extend the hierarchical structure to a continual learning environment. The experimental results show that our approach can achieve state-of-the-art performance in both toy-regression and few-shot image classification problems.

Posted Content•

Physics-informed Autoencoders for Lyapunov-stable Fluid Flow Prediction.

N. Benjamin Erichson, Michael Muehlebach, Michael W. Mahoney

26 May 2019-arXiv: Computational Physics

TL;DR: This work investigates whether it is possible to include physics-informed prior knowledge for improving the model quality, and focuses on the stability of an equilibrium, one of the most basic properties a dynamic system can have, via the lens of Lyapunov analysis.

Abstract: In addition to providing high-profile successes in computer vision and natural language processing, neural networks also provide an emerging set of techniques for scientific problems. Such data-driven models, however, typically ignore physical insights from the scientific system under consideration. Among other things, a physics-informed model formulation should encode some degree of stability or robustness or well-conditioning (in that a small change of the input will not lead to drastic changes in the output), characteristic of the underlying scientific problem. We investigate whether it is possible to include physics-informed prior knowledge for improving the model quality (e.g., generalization performance, sensitivity to parameter tuning, or robustness in the presence of noisy data). To that end, we focus on the stability of an equilibrium, one of the most basic properties a dynamic system can have, via the lens of Lyapunov analysis. For the prototypical problem of fluid flow prediction, we show that models preserving Lyapunov stability improve the generalization error and reduce the prediction uncertainty.
