Scispace (Formerly Typeset)
Showing papers on "Unsupervised learning" published in 2019
Posted Content•
Momentum Contrast for Unsupervised Visual Representation Learning

Kaiming He1, Haoqi Fan1, Yuxin Wu1, Saining Xie1, Ross Girshick1 •
Facebook1
13 Nov 2019-arXiv: Computer Vision and Pattern Recognition
TL;DR: This article proposed Momentum Contrast (MoCo) for unsupervised visual representation learning, which enables building a large and consistent dictionary on-the-fly that facilitates contrastive learning.
Abstract: We present Momentum Contrast (MoCo) for unsupervised visual representation learning. From a perspective on contrastive learning as dictionary look-up, we build a dynamic dictionary with a queue and a moving-averaged encoder. This enables building a large and consistent dictionary on-the-fly that facilitates contrastive unsupervised learning. MoCo provides competitive results under the common linear protocol on ImageNet classification. More importantly, the representations learned by MoCo transfer well to downstream tasks. MoCo can outperform its supervised pre-training counterpart in 7 detection/segmentation tasks on PASCAL VOC, COCO, and other datasets, sometimes surpassing it by large margins. This suggests that the gap between unsupervised and supervised representation learning has been largely closed in many vision tasks.

7,913 citations
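
The queue and moving-averaged encoder described in the MoCo abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `momentum_update` and `KeyQueue`, the momentum values, and the use of plain Python lists in place of network parameters are all hypothetical.

```python
from collections import deque


def momentum_update(key_params, query_params, m=0.999):
    """Move each key-encoder parameter toward the query encoder's value.

    Computes k <- m * k + (1 - m) * q element-wise (m is an assumed value).
    """
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


class KeyQueue:
    """Fixed-size FIFO dictionary of encoded keys (hypothetical helper)."""

    def __init__(self, max_size):
        # A deque with maxlen drops the oldest entries automatically.
        self.keys = deque(maxlen=max_size)

    def enqueue(self, batch_keys):
        self.keys.extend(batch_keys)

    def __len__(self):
        return len(self.keys)


# Toy "parameters": the key encoder slowly follows the query encoder.
key_params = [0.0, 1.0]
query_params = [1.0, 1.0]
key_params = momentum_update(key_params, query_params, m=0.9)

# Toy queue of four keys; the second batch pushes out the oldest key.
queue = KeyQueue(max_size=4)
queue.enqueue([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
queue.enqueue([[0.2, 0.8], [0.9, 0.1]])
```

Because the queue size is independent of the batch size, the dictionary can be much larger than one mini-batch, while the slow update keeps the stored keys roughly consistent with one another.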

Posted Content•
Contrastive Multiview Coding

Yonglong Tian1, Dilip Krishnan2, Phillip Isola1•
Massachusetts Institute of Technology1, Google2
13 Jun 2019-arXiv: Computer Vision and Pattern Recognition
TL;DR: Key properties of the multiview contrastive learning approach are analyzed, finding that the contrastive loss outperforms a popular alternative based on cross-view prediction, and that the more views the authors learn from, the better the resulting representation captures underlying scene semantics.
Abstract: Humans view the world through many sensory channels, e.g., the long-wavelength light channel, viewed by the left eye, or the high-frequency vibrations channel, heard by the right ear. Each view is noisy and incomplete, but important factors, such as physics, geometry, and semantics, tend to be shared between all views (e.g., a "dog" can be seen, heard, and felt). We investigate the classic hypothesis that a powerful representation is one that models view-invariant factors. We study this hypothesis under the framework of multiview contrastive learning, where we learn a representation that aims to maximize mutual information between different views of the same scene but is otherwise compact. Our approach scales to any number of views, and is view-agnostic. We analyze key properties of the approach that make it work, finding that the contrastive loss outperforms a popular alternative based on cross-view prediction, and that the more views we learn from, the better the resulting representation captures underlying scene semantics. Our approach achieves state-of-the-art results on image and video unsupervised learning benchmarks. Code is released at: this http URL.

1,951 citations
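
A contrastive objective between two views, as discussed in the abstract above, can be sketched with an InfoNCE-style loss. This is a common form chosen for illustration and may differ from the loss used in the paper; the function names, the temperature value, and the toy embeddings are assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: small when the anchor scores the positive highest."""
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, n) / temperature for n in negatives]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    return log_sum - logits[0]


view1 = [1.0, 0.0]                        # embedding of one view of a scene
view2_same = [0.9, 0.1]                   # another view of the same scene
view2_other = [[0.0, 1.0], [-1.0, 0.0]]   # views of other scenes

# Loss when the true pair is treated as the positive.
aligned = contrastive_loss(view1, view2_same, view2_other)
# Loss when an unrelated view is (wrongly) treated as the positive.
mismatched = contrastive_loss(view1, view2_other[0], [view2_same, view2_other[1]])
```

Minimizing such a loss pulls embeddings of different views of the same scene together and pushes embeddings of different scenes apart.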

Posted Content•10.1101/622803•
Biological Structure and Function Emerge from Scaling Unsupervised Learning to 250 Million Protein Sequences

Alexander Rives1, Siddharth Goyal2, Joshua Meier2, Demi Guo2, Myle Ott2, C. Lawrence Zitnick2, Jerry Ma2, Rob Fergus1,2 •
New York University1, Facebook2
29 Apr 2019-bioRxiv
TL;DR: This work uses unsupervised learning to train a deep contextual language model on 86 billion amino acids across 250 million protein sequences spanning evolutionary diversity, enabling state-of-the-art supervised prediction of mutational effect and secondary structure, and improving state-of-the-art features for long-range contact prediction.
Abstract: In the field of artificial intelligence, a combination of scale in data and model capacity enabled by unsupervised learning has led to major advances in representation learning and statistical generation. In biology, the anticipated growth of sequencing promises unprecedented data on natural sequence diversity. Learning the natural distribution of evolutionary protein sequence variation is a logical step toward predictive and generative modeling for biology. To this end we use unsupervised learning to train a deep contextual language model on 86 billion amino acids across 250 million sequences spanning evolutionary diversity. The resulting model maps raw sequences to representations of biological properties without labels or prior domain knowledge. The learned representation space organizes sequences at multiple levels of biological granularity from the biochemical to proteomic levels. Learning recovers information about protein structure: secondary structure and residue-residue contacts can be extracted by linear projections from learned representations. With small amounts of labeled data, the ability to identify tertiary contacts is further improved. Learning on full sequence diversity rather than individual protein families increases recoverable information about secondary structure. We show the networks generalize by adapting them to variant activity prediction from sequences only, with results that are comparable to a state-of-the-art variant predictor that uses evolutionary and structurally derived features.

1,577 citations

Book Chapter•10.1007/978-3-030-58621-8_45•
Contrastive Multiview Coding

Yonglong Tian1, Dilip Krishnan2, Phillip Isola1•
Massachusetts Institute of Technology1, Google2
13 Jun 2019
TL;DR: In this paper, a multiview contrastive learning framework is proposed that learns a representation maximizing mutual information between different views of the same scene while remaining otherwise compact; it achieves state-of-the-art results on image and video unsupervised learning benchmarks.
Abstract: Humans view the world through many sensory channels, e.g., the long-wavelength light channel, viewed by the left eye, or the high-frequency vibrations channel, heard by the right ear. Each view is noisy and incomplete, but important factors, such as physics, geometry, and semantics, tend to be shared between all views (e.g., a "dog" can be seen, heard, and felt). We investigate the classic hypothesis that a powerful representation is one that models view-invariant factors. We study this hypothesis under the framework of multiview contrastive learning, where we learn a representation that aims to maximize mutual information between different views of the same scene but is otherwise compact. Our approach scales to any number of views, and is view-agnostic. We analyze key properties of the approach that make it work, finding that the contrastive loss outperforms a popular alternative based on cross-view prediction, and that the more views we learn from, the better the resulting representation captures underlying scene semantics. Our approach achieves state-of-the-art results on image and video unsupervised learning benchmarks. Code is released at: this http URL.

1,525 citations

Journal Article•10.1016/J.MEDIA.2019.01.010•
f-AnoGAN: Fast unsupervised anomaly detection with generative adversarial networks.

Thomas Schlegl1, Philipp Seeböck1, Sebastian M. Waldstein1, Georg Langs1, Ursula Schmidt-Erfurth1 •
Medical University of Vienna1
01 May 2019-Medical Image Analysis
TL;DR: This work presents fast AnoGAN (f-AnoGAN), a generative adversarial network (GAN) based unsupervised learning approach that identifies anomalous images and image segments, which can serve as imaging biomarker candidates.

1,419 citations

Proceedings Article•
Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations

Vincent Sitzmann1, Michael Zollhoefer1, Gordon Wetzstein1•
Stanford University1
4 Jun 2019
TL;DR: This work proposes Scene Representation Networks (SRNs), a continuous, 3D-structure-aware scene representation that encodes both geometry and appearance, and evaluates them on novel view synthesis, few-shot reconstruction, joint shape and appearance interpolation, and unsupervised discovery of a non-rigid face model.
Abstract: Unsupervised learning with generative models has the potential of discovering rich representations of 3D scenes. While geometric deep learning has explored 3D-structure-aware representations of scene geometry, these models typically require explicit 3D supervision. Emerging neural scene representations can be trained only with posed 2D images, but existing methods ignore the three-dimensional structure of scenes. We propose Scene Representation Networks (SRNs), a continuous, 3D-structure-aware scene representation that encodes both geometry and appearance. SRNs represent scenes as continuous functions that map world coordinates to a feature representation of local scene properties. By formulating the image formation as a differentiable ray-marching algorithm, SRNs can be trained end-to-end from only 2D images and their camera poses, without access to depth or shape. This formulation naturally generalizes across scenes, learning powerful geometry and appearance priors in the process. We demonstrate the potential of SRNs by evaluating them for novel view synthesis, few-shot reconstruction, joint shape and appearance interpolation, and unsupervised discovery of a non-rigid face model.

1,387 citations

Journal Article•10.1038/S41586-019-1157-8•
All-optical spiking neurosynaptic networks with self-learning capabilities.

Johannes Feldmann1, Nathan Youngblood2, C.D. Wright3, Harish Bhaskaran2, Wolfram H. P. Pernice1 •
University of Münster1, University of Oxford2, University of Exeter3
08 May 2019-Nature
TL;DR: An optical version of a brain-inspired neurosynaptic system, using wavelength division multiplexing techniques, is presented that is capable of supervised and unsupervised learning.
Abstract: Software implementations of brain-inspired computing underlie many important computational tasks, from image processing to speech recognition, artificial intelligence and deep learning applications. Yet, unlike real neural tissue, traditional computing architectures physically separate the core computing functions of memory and processing, making fast, efficient and low-energy computing difficult to achieve. To overcome such limitations, an attractive alternative is to design hardware that mimics neurons and synapses. Such hardware, when connected in networks or neuromorphic systems, processes information in a way more analogous to brains. Here we present an all-optical version of such a neurosynaptic system, capable of supervised and unsupervised learning. We exploit wavelength division multiplexing techniques to implement a scalable circuit architecture for photonic neural networks, successfully demonstrating pattern recognition directly in the optical domain. Such photonic neurosynaptic networks promise access to the high speed and high bandwidth inherent to optical systems, thus enabling the direct processing of optical telecommunication and visual data.

1,347 citations

Proceedings Article•
Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations

Francesco Locatello1, Stefan Bauer1, Mario Lucic2, Gunnar Rätsch3, Sylvain Gelly2, Bernhard Schölkopf1, Olivier Bachem2 •
Max Planck Society1, Google2, Swiss Institute of Bioinformatics3
24 May 2019
TL;DR: The authors show that the unsupervised learning of disentangled representations is fundamentally impossible without inductive biases on both the models and the data, and suggest that future work on disentanglement learning should be explicit about the role of inductive bias and (implicit) supervision.
Abstract: The key idea behind the unsupervised learning of disentangled representations is that real-world data is generated by a few explanatory factors of variation which can be recovered by unsupervised learning algorithms. In this paper, we provide a sober look at recent progress in the field and challenge some common assumptions. We first theoretically show that the unsupervised learning of disentangled representations is fundamentally impossible without inductive biases on both the models and the data. Then, we train more than 12,000 models covering most prominent methods and evaluation metrics in a reproducible large-scale experimental study on seven different data sets. We observe that while the different methods successfully enforce properties "encouraged" by the corresponding losses, well-disentangled models seemingly cannot be identified without supervision. Furthermore, increased disentanglement does not seem to lead to a decreased sample complexity of learning for downstream tasks. Our results suggest that future work on disentanglement learning should be explicit about the role of inductive biases and (implicit) supervision, investigate concrete benefits of enforcing disentanglement of the learned representations, and consider a reproducible experimental setup covering several data sets.

1,296 citations

Posted Content•
Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey

Longlong Jing1, Yingli Tian1•
City University of New York1
16 Feb 2019-arXiv: Computer Vision and Pattern Recognition
TL;DR: This survey reviews self-supervised learning, a subset of unsupervised learning methods that learn general image and video features from large-scale unlabeled data without using any human-annotated labels.
Abstract: Large-scale labeled data are generally required to train deep neural networks in order to obtain better performance in visual feature learning from images or videos for computer vision applications. To avoid the extensive cost of collecting and annotating large-scale datasets, as a subset of unsupervised learning methods, self-supervised learning methods are proposed to learn general image and video features from large-scale unlabeled data without using any human-annotated labels. This paper provides an extensive review of deep learning-based self-supervised general visual feature learning methods from images or videos. First, the motivation, general pipeline, and terminologies of this field are described. Then the common deep neural network architectures that are used for self-supervised learning are summarized. Next, the main components and evaluation metrics of self-supervised learning methods are reviewed, followed by the commonly used image and video datasets and the existing self-supervised visual feature learning methods. Finally, quantitative performance comparisons of the reviewed methods on benchmark datasets are summarized and discussed for both image and video feature learning. The paper concludes by listing a set of promising future directions for self-supervised visual feature learning.

1,133 citations

Journal Article•10.1038/S41576-018-0088-9•
Challenges in unsupervised clustering of single-cell RNA-seq data.

Vladimir Yu. Kiselev1, Tallulah S. Andrews1, Martin Hemberg1•
Wellcome Trust Sanger Institute1
01 May 2019-Nature Reviews Genetics
TL;DR: This Review discusses the multiple algorithmic options for clustering scRNA-seq data, including various technical, biological and computational considerations.
Abstract: Single-cell RNA sequencing (scRNA-seq) allows researchers to collect large catalogues detailing the transcriptomes of individual cells. Unsupervised clustering is of central importance for the analysis of these data, as it is used to identify putative cell types. However, there are many challenges involved. We discuss why clustering is a challenging problem from a computational point of view and what aspects of the data make it challenging. We also consider the difficulties related to the biological interpretation and annotation of the identified clusters.

1,114 citations

Proceedings Article•10.1109/CVPR.2019.00233•
Domain Generalization by Solving Jigsaw Puzzles

Fabio Maria Carlucci1, Antonio D'Innocente2, Silvia Bucci3, Barbara Caputo, Tatiana Tommasi4 •
Huawei1, Sapienza University of Rome2, Istituto Italiano di Tecnologia3, Polytechnic University of Turin4
15 Jun 2019
TL;DR: This model learns the semantic labels in a supervised fashion, and broadens its understanding of the data by learning from self-supervised signals how to solve a jigsaw puzzle on the same images, which helps the network to learn the concepts of spatial correlation while acting as a regularizer for the classification task.
Abstract: Human adaptability relies crucially on the ability to learn and merge knowledge both from supervised and unsupervised learning: the parents point out a few important concepts, but then the children fill in the gaps on their own. This is particularly effective, because supervised learning can never be exhaustive and thus learning autonomously allows one to discover invariances and regularities that help to generalize. In this paper we propose to apply a similar approach to the task of object recognition across domains: our model learns the semantic labels in a supervised fashion, and broadens its understanding of the data by learning from self-supervised signals how to solve a jigsaw puzzle on the same images. This secondary task helps the network to learn the concepts of spatial correlation while acting as a regularizer for the classification task. Multiple experiments on the PACS, VLCS, Office-Home and digits datasets confirm our intuition and show that this simple method outperforms previous domain generalization and adaptation solutions. An ablation study further illustrates the inner workings of our approach.
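
The jigsaw pretext task mentioned in the abstract above can be sketched as cutting an image into tiles, shuffling them with a known permutation, and using the permutation index as a free label. This is a minimal sketch under stated assumptions: the function `make_jigsaw_sample`, the toy 4x4 image, and the 2x2 grid are hypothetical, and the paper's own grid size and permutation set are not reproduced here.

```python
import itertools
import random


def make_jigsaw_sample(image, grid, permutations, rng):
    """Cut a square image into grid x grid tiles and shuffle them.

    Returns the shuffled tiles and the index of the permutation used,
    which serves as the self-supervised label.
    """
    size = len(image) // grid
    tiles = [
        [row[c * size:(c + 1) * size] for row in image[r * size:(r + 1) * size]]
        for r in range(grid)
        for c in range(grid)
    ]
    label = rng.randrange(len(permutations))
    shuffled = [tiles[i] for i in permutations[label]]
    return shuffled, label


# Toy 4x4 "image" whose pixel values are 0..15 in row-major order.
image = [[4 * r + c for c in range(4)] for r in range(4)]
# All 24 orderings of the four 2x2 tiles.
permutations = list(itertools.permutations(range(4)))
tiles, label = make_jigsaw_sample(image, 2, permutations, random.Random(0))
```

A network trained to predict `label` from `tiles` needs no human annotation, so this objective can be added alongside the supervised classification loss.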
Journal Article•10.1038/S41576-019-0122-6•
Deep learning: new computational modelling techniques for genomics

Gökcen Eraslan1, Žiga Avsec1, Julien Gagneur1, Fabian J. Theis1•
Technische Universität München1
01 Jul 2019-Nature Reviews Genetics
TL;DR: This Review describes different deep learning techniques and how they can be applied to extract biologically relevant information from large, complex genomic data sets.
Abstract: As a data-driven science, genomics largely utilizes machine learning to capture dependencies in data and derive novel biological hypotheses. However, the ability to extract new insights from the exponentially increasing volume of genomics data requires more expressive machine learning models. By effectively leveraging large data sets, deep learning has transformed fields such as computer vision and natural language processing. Now, it is becoming the method of choice for many genomics modelling tasks, including predicting the impact of genetic variation on gene regulatory mechanisms such as DNA accessibility and splicing. This Review describes different deep learning techniques and how they can be applied to extract biologically relevant information from large, complex genomic data sets.
Posted Content•
MAD-GAN: Multivariate Anomaly Detection for Time Series Data with Generative Adversarial Networks

Dan Li, Dacheng Chen, Lei Shi, Baihong Jin, Jonathan Goh, See-Kiong Ng 
15 Jan 2019-arXiv: Learning
TL;DR: The proposed MAD-GAN framework considers the entire variable set concurrently to capture the latent interactions amongst the variables and is effective in reporting anomalies caused by various cyber-intrusions in these complex real-world systems.
Abstract: The prevalence of networked sensors and actuators in many real-world systems such as smart buildings, factories, power plants, and data centers generates substantial amounts of multivariate time series data for these systems. The rich sensor data can be continuously monitored for intrusion events through anomaly detection. However, conventional threshold-based anomaly detection methods are inadequate due to the dynamic complexities of these systems, while supervised machine learning methods are unable to exploit the large amounts of data due to the lack of labeled data. On the other hand, current unsupervised machine learning approaches have not fully exploited the spatial-temporal correlation and other dependencies amongst the multiple variables (sensors/actuators) in the system for detecting anomalies. In this work, we propose an unsupervised multivariate anomaly detection method based on Generative Adversarial Networks (GANs). Instead of treating each data stream independently, our proposed MAD-GAN framework considers the entire variable set concurrently to capture the latent interactions amongst the variables. We also fully exploit both the generator and discriminator produced by the GAN, using a novel anomaly score called DR-score to detect anomalies by discrimination and reconstruction. We have tested our proposed MAD-GAN using two recent datasets collected from real-world CPS: the Secure Water Treatment (SWaT) and the Water Distribution (WADI) datasets. Our experimental results showed that the proposed MAD-GAN is effective in reporting anomalies caused by various cyber-intrusions in these complex real-world systems.
Journal Article•10.1016/J.PHYSREP.2019.03.001•
A high-bias, low-variance introduction to Machine Learning for physicists

Pankaj Mehta1, Marin Bukov2, Ching-Hao Wang1, Alexandre G. R. Day1, Charles C. Richardson1, Charles K. Fisher, David J. Schwab3 •
Boston University1, University of California, Berkeley2, City University of New York3
30 May 2019-Physics Reports
TL;DR: The review begins by covering fundamental concepts in ML and modern statistics such as the bias-variance tradeoff, overfitting, regularization, generalization, and gradient descent before moving on to more advanced topics in both supervised and unsupervised learning.
Journal Article•10.1038/S41586-019-1335-8•
Unsupervised word embeddings capture latent knowledge from materials science literature

Vahe Tshitoyan1,2, John Dagdelen2,3, Leigh Weston2, Alexander Dunn2,3, Ziqin Rong2, Olga Kononova3, Kristin A. Persson2,3, Gerbrand Ceder2,3, Anubhav Jain2 •
Google1, Lawrence Berkeley National Laboratory2, University of California, Berkeley3
03 Jul 2019-Nature
TL;DR: It is shown that materials science knowledge present in the published literature can be efficiently encoded as information-dense word embeddings (vector representations of words) without human labelling or supervision, suggesting that latent knowledge regarding future discoveries is to a large extent embedded in past publications.
Abstract: The overwhelming majority of scientific knowledge is published as text, which is difficult to analyse by either traditional statistical analysis or modern machine learning methods. By contrast, the main source of machine-interpretable data for the materials research community has come from structured property databases, which encompass only a small fraction of the knowledge present in the research literature. Beyond property values, publications contain valuable knowledge regarding the connections and relationships between data items as interpreted by the authors. To improve the identification and use of this knowledge, several studies have focused on the retrieval of information from scientific literature using supervised natural language processing, which requires large hand-labelled datasets for training. Here we show that materials science knowledge present in the published literature can be efficiently encoded as information-dense word embeddings (vector representations of words) without human labelling or supervision. Without any explicit insertion of chemical knowledge, these embeddings capture complex materials science concepts such as the underlying structure of the periodic table and structure-property relationships in materials. Furthermore, we demonstrate that an unsupervised method can recommend materials for functional applications several years before their discovery. This suggests that latent knowledge regarding future discoveries is to a large extent embedded in past publications. Our findings highlight the possibility of extracting knowledge and relationships from the massive body of scientific literature in a collective manner, and point towards a generalized approach to the mining of scientific literature.
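
One simple way to query word embeddings like those described above is to rank candidate words by cosine similarity to a keyword vector. The sketch below is an illustration only: the three-dimensional vectors, the word names, and the helper functions are invented for the example, and the paper's actual ranking procedure may differ.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, embeddings):
    """Return words sorted from most to least similar to the query vector."""
    return sorted(
        embeddings,
        key=lambda word: cosine_similarity(query, embeddings[word]),
        reverse=True,
    )


# Toy vectors invented for illustration; real embeddings are learned from text.
embeddings = {
    "material_a": [0.9, 0.1, 0.0],
    "material_b": [0.1, 0.9, 0.0],
    "material_c": [0.7, 0.3, 0.1],
}
application = [1.0, 0.0, 0.0]  # hypothetical vector for an application keyword
ranking = rank_by_similarity(application, embeddings)
```

Here `ranking` lists the candidate words in order of decreasing similarity to the keyword vector, which is the kind of ordering a recommendation based on embeddings would start from.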
Proceedings Article•10.1109/CVPR.2019.00202•
Revisiting Self-Supervised Visual Representation Learning

Alexander Kolesnikov1, Xiaohua Zhai1, Lucas Beyer1•
Google1
15 Jun 2019
TL;DR: This study revisits numerous previously proposed self-supervised models, conducts a thorough large-scale study, and uncovers multiple crucial insights, including that standard recipes for CNN design do not always translate to self-supervised representation learning.
Abstract: Unsupervised visual representation learning remains a largely unsolved problem in computer vision research. Among a big body of recently proposed approaches for unsupervised learning of visual representations, a class of self-supervised techniques achieves superior performance on many challenging benchmarks. A large number of the pretext tasks for self-supervised learning have been studied, but other important aspects, such as the choice of convolutional neural networks (CNN), have not received equal attention. Therefore, we revisit numerous previously proposed self-supervised models, conduct a thorough large scale study and, as a result, uncover multiple crucial insights. We challenge a number of common practices in self-supervised visual representation learning and observe that standard recipes for CNN design do not always translate to self-supervised representation learning. As part of our study, we drastically boost the performance of previously proposed techniques and outperform previously published state-of-the-art results by a large margin. We will release the code for reproducing our experiments when the anonymity requirements are lifted.
Proceedings Article•10.1109/CVPR.2019.01252•
Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation

Anurag Ranjan1, Varun Jampani2, Lukas Balles2, Kihwan Kim2, Deqing Sun3, Jonas Wulff4, Michael J. Black1 •
Max Planck Society1, Nvidia2, University of Tübingen3, Massachusetts Institute of Technology4
15 Jun 2019
TL;DR: In this article, the authors propose a competitive collaboration framework that facilitates the coordinated training of multiple specialized neural networks to solve complex low-level vision problems, such as single view depth prediction, camera motion estimation, optical flow, and segmentation of a video into the static scene and moving regions.
Abstract: We address the unsupervised learning of several interconnected problems in low-level vision: single view depth prediction, camera motion estimation, optical flow, and segmentation of a video into the static scene and moving regions. Our key insight is that these four fundamental vision problems are coupled through geometric constraints. Consequently, learning to solve them together simplifies the problem because the solutions can reinforce each other. We go beyond previous work by exploiting geometry more explicitly and segmenting the scene into static and moving regions. To that end, we introduce Competitive Collaboration, a framework that facilitates the coordinated training of multiple specialized neural networks to solve complex problems. Competitive Collaboration works much like expectation-maximization, but with neural networks that act as both competitors to explain pixels that correspond to static or moving regions, and as collaborators through a moderator that assigns pixels to be either static or independently moving. Our novel method integrates all these problems in a common framework and simultaneously reasons about the segmentation of the scene into moving objects and the static background, the camera motion, depth of the static scene structure, and the optical flow of moving objects. Our model is trained without any supervision and achieves state-of-the-art performance among joint unsupervised methods on all sub-problems.
Proceedings Article•10.1109/ICCV.2019.00610•
Local Aggregation for Unsupervised Learning of Visual Embeddings

Chengxu Zhuang1, Alex Zhai1, Daniel L. K. Yamins1•
Stanford University1
29 Mar 2019
TL;DR: In this paper, the authors train an embedding function to maximize a metric of local aggregation, causing similar data instances to move together in the embedding space, while allowing dissimilar instances to separate.
Abstract: Unsupervised approaches to learning in neural networks are of substantial interest for furthering artificial intelligence, both because they would enable the training of networks without the need for large numbers of expensive annotations, and because they would be better models of the kind of general-purpose learning deployed by humans. However, unsupervised networks have long lagged behind the performance of their supervised counterparts, especially in the domain of large-scale visual recognition. Recent developments in training deep convolutional embeddings to maximize non-parametric instance separation and clustering objectives have shown promise in closing this gap. Here, we describe a method that trains an embedding function to maximize a metric of local aggregation, causing similar data instances to move together in the embedding space, while allowing dissimilar instances to separate. This aggregation metric is dynamic, allowing soft clusters of different scales to emerge. We evaluate our procedure on several large-scale visual recognition datasets, achieving state-of-the-art unsupervised transfer learning performance on object recognition in ImageNet, scene recognition in Places 205, and object detection in PASCAL VOC.
Posted Content•
HoloGAN: Unsupervised learning of 3D representations from natural images

Thu Nguyen-Phuoc1, Chuan Li, Lucas Theis2, Christian Richardt1, Yong-Liang Yang1 •
University of Bath1, Twitter2
02 Apr 2019-arXiv: Computer Vision and Pattern Recognition
TL;DR: HoloGAN is the first generative model that learns 3D representations from natural images in an entirely unsupervised manner and is shown to be able to generate images with similar or higher visual quality than other generative models.
Abstract: We propose a novel generative adversarial network (GAN) for the task of unsupervised learning of 3D representations from natural images. Most generative models rely on 2D kernels to generate images and make few assumptions about the 3D world. These models therefore tend to create blurry images or artefacts in tasks that require a strong 3D understanding, such as novel-view synthesis. HoloGAN instead learns a 3D representation of the world, and learns to render this representation in a realistic manner. Unlike other GANs, HoloGAN provides explicit control over the pose of generated objects through rigid-body transformations of the learnt 3D features. Our experiments show that using explicit 3D features enables HoloGAN to disentangle 3D pose and identity, which is further decomposed into shape and appearance, while still being able to generate images with similar or higher visual quality than other generative models. HoloGAN can be trained end-to-end from unlabelled 2D images only. In particular, we do not require pose labels, 3D shapes, or multiple views of the same objects. This shows that HoloGAN is the first generative model that learns 3D representations from natural images in an entirely unsupervised manner.
Journal Article•10.1038/S41467-019-11786-6•
A critique of pure learning and what artificial neural networks can learn from animal brains.

Anthony M. Zador1•
Cold Spring Harbor Laboratory1
21 Aug 2019-Nature Communications
TL;DR: It is suggested that for AI to learn from animal brains, it is important to consider that animal behaviour results from brain connectivity specified in the genome through evolution, and not due to unique learning algorithms.
Abstract: Artificial neural networks (ANNs) have undergone a revolution, catalyzed by better supervised learning algorithms. However, in stark contrast to young animals (including humans), training such networks requires enormous numbers of labeled examples, leading to the belief that animals must rely instead mainly on unsupervised learning. Here we argue that most animal behavior is not the result of clever learning algorithms, supervised or unsupervised, but is encoded in the genome. Specifically, animals are born with highly structured brain connectivity, which enables them to learn very rapidly. Because the wiring diagram is far too complex to be specified explicitly in the genome, it must be compressed through a "genomic bottleneck". The genomic bottleneck suggests a path toward ANNs capable of rapid learning.
Proceedings Article•10.1109/CVPR.2019.00140•
Unsupervised Deep Tracking

Ning Wang1, Yibing Song2, Chao Ma3, Wengang Zhou1, Wei Liu2, Houqiang Li1 •
University of Science and Technology of China1, Tencent2, Shanghai Jiao Tong University3
15 Jun 2019
TL;DR: The proposed unsupervised tracker achieves the baseline accuracy of fully supervised trackers, which require complete and accurate labels during training, and exhibits a potential in leveraging unlabeled or weakly labeled data to further improve the tracking accuracy.
Abstract: We propose an unsupervised visual tracking method in this paper. Different from existing approaches using extensive annotated data for supervised learning, our CNN model is trained on large-scale unlabeled videos in an unsupervised manner. Our motivation is that a robust tracker should be effective in both the forward and backward predictions (i.e., the tracker can forward localize the target object in successive frames and backtrace to its initial position in the first frame). We build our framework on a Siamese correlation filter network, which is trained using unlabeled raw videos. Meanwhile, we propose a multiple-frame validation method and a cost-sensitive loss to facilitate unsupervised learning. Without bells and whistles, the proposed unsupervised tracker achieves the baseline accuracy of fully supervised trackers, which require complete and accurate labels during training. Furthermore, our unsupervised framework shows potential for leveraging unlabeled or weakly labeled data to further improve the tracking accuracy.
Journal Article•10.1609/AAAI.V33I01.33018001•
Depth Prediction without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos

Vincent Casser1, Soeren Pirk2, Reza Mahjourian3, Anelia Angelova2•
Harvard University1, Google2, University of Texas at Austin3
17 Jul 2019
TL;DR: This work addresses unsupervised learning of scene depth and robot ego-motion where supervision is provided by monocular videos, as cameras are the cheapest, least restrictive and most ubiquitous sensor for robotics.
Abstract: Learning to predict scene depth from RGB inputs is a challenging task both for indoor and outdoor robot navigation. In this work we address unsupervised learning of scene depth and robot ego-motion where supervision is provided by monocular videos, as cameras are the cheapest, least restrictive and most ubiquitous sensor for robotics. Previous work in unsupervised image-to-depth learning has established strong baselines in the domain. We propose a novel approach which produces higher quality results, is able to model moving objects and is shown to transfer across data domains, e.g. from outdoors to indoor scenes. The main idea is to introduce geometric structure in the learning process, by modeling the scene and the individual objects; camera ego-motion and object motions are learned from monocular videos as input. Furthermore an online refinement method is introduced to adapt learning on the fly to unknown domains. The proposed approach outperforms all state-of-the-art approaches, including those that handle motion e.g. through learned flow. Our results are comparable in quality to the ones which used stereo as supervision and significantly improve depth prediction on scenes and datasets which contain a lot of object motion. The approach is of practical relevance, as it allows transfer across environments, by transferring models trained on data collected for robot navigation in urban scenes to indoor navigation settings. The code associated with this paper can be found at https://sites.google.com/view/struct2depth.
Posted Content•
Self-labelling via simultaneous clustering and representation learning

Yuki M. Asano1, Christian Rupprecht1, Andrea Vedaldi1•
University of Oxford1
13 Nov 2019-arXiv: Computer Vision and Pattern Recognition
TL;DR: The proposed novel and principled learning formulation is able to self-label visual data so as to train highly competitive image representations without manual labels and yields the first self-supervised AlexNet that outperforms the supervised Pascal VOC detection baseline.
Abstract: Combining clustering and representation learning is one of the most promising approaches for unsupervised learning of deep neural networks. However, doing so naively leads to ill-posed learning problems with degenerate solutions. In this paper, we propose a novel and principled learning formulation that addresses these issues. The method is obtained by maximizing the information between labels and input data indices. We show that this criterion extends standard cross-entropy minimization to an optimal transport problem, which we solve efficiently for millions of input images and thousands of labels using a fast variant of the Sinkhorn-Knopp algorithm. The resulting method is able to self-label visual data so as to train highly competitive image representations without manual labels. Our method achieves state-of-the-art representation learning performance for AlexNet and ResNet-50 on SVHN, CIFAR-10, CIFAR-100 and ImageNet and yields the first self-supervised AlexNet that outperforms the supervised Pascal VOC detection baseline. Code and models are available.
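Example (illustrative, not from the paper): the balanced label assignment that the abstract solves with Sinkhorn-Knopp can be sketched with the textbook version of that iteration. This is a minimal sketch and not the authors' fast variant; the function name `sinkhorn_knopp`, the sharpness parameter `lam`, and the input layout (one row of label scores per image) are assumptions made for illustration.

```python
import math


def sinkhorn_knopp(scores, n_iters=200, lam=1.0):
    """Textbook Sinkhorn-Knopp scaling (hypothetical helper, not the paper's code).

    scores: N rows, each holding K label scores (e.g. log-probabilities).
    Returns an N x K soft assignment whose rows each sum to 1 and whose
    columns each sum to approximately N / K (equal-sized label groups).
    """
    n, k = len(scores), len(scores[0])
    # Turn scores into a strictly positive kernel; lam controls sharpness.
    q = [[math.exp(lam * s) for s in row] for row in scores]
    for _ in range(n_iters):
        # Column step: rescale so every label receives total mass N / K.
        col_sums = [sum(q[i][j] for i in range(n)) for j in range(k)]
        q = [[q[i][j] * (n / k) / col_sums[j] for j in range(k)]
             for i in range(n)]
        # Row step: rescale so every sample distributes total mass 1.
        normalized = []
        for row in q:
            total = sum(row)
            normalized.append([v / total for v in row])
        q = normalized
    return q
```

Without the column step, assigning every image to a single label would be a trivially optimal but degenerate solution; the equal-mass constraint on labels is what rules it out in this sketch.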
Journal Article•10.1016/J.MEDIA.2019.07.006•
Unsupervised learning of probabilistic diffeomorphic registration for images and surfaces.

Adrian V. Dalca1,2,3, Guha Balakrishnan1, John V. Guttag1, Mert R. Sabuncu3 •
Massachusetts Institute of Technology1, Harvard University2, Cornell University3
12 Jul 2019-Medical Image Analysis
TL;DR: A probabilistic generative model is presented and an unsupervised learning-based inference algorithm is derived that uses insights from classical registration methods and makes use of recent developments in convolutional neural networks (CNNs).
Proceedings Article•10.5220/0007364503720380•
Improving Unsupervised Defect Segmentation by Applying Structural Similarity to Autoencoders

Paul Bergmann, Sindy Löwe1, Michael Fauser, David Sattlegger, Carsten Steger •
University of Amsterdam1
25 Feb 2019
TL;DR: This work proposes to use a perceptual loss function based on structural similarity which examines inter-dependencies between local image regions, taking into account luminance, contrast and structural information, instead of simply comparing single pixel values.
Abstract: Convolutional autoencoders have emerged as popular methods for unsupervised defect segmentation on image data. Most commonly, this task is performed by thresholding a per-pixel reconstruction error based on an ℓp-distance. This procedure, however, leads to large residuals whenever the reconstruction includes slight localization inaccuracies around edges. It also fails to reveal defective regions that have been visually altered when intensity values stay roughly consistent. We show that these problems prevent these approaches from being applied to complex real-world scenarios and that they cannot be easily avoided by employing more elaborate architectures such as variational or feature matching autoencoders. We propose to use a perceptual loss function based on structural similarity that examines inter-dependencies between local image regions, taking into account luminance, contrast, and structural information, instead of simply comparing single pixel values. It achieves significant performance gains on a challenging real-world dataset of nanofibrous materials and a novel dataset of two woven fabrics over state-of-the-art approaches for unsupervised defect segmentation that use per-pixel reconstruction error metrics.
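Example (illustrative, not from the paper): the structural similarity index that the abstract contrasts with per-pixel error can be sketched for a single pair of patches. This is a minimal sketch under stated assumptions: patches are flat lists of intensities in [0, 1], statistics use population (divide-by-n) variance, no Gaussian windowing is applied, and the helper names `ssim_patch` and `mean_squared_error` are hypothetical. The constants 0.01 and 0.03 are the commonly used defaults for SSIM.

```python
def mean_squared_error(x, y):
    """Per-pixel squared error averaged over the patch (hypothetical helper)."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def ssim_patch(x, y, dynamic_range=1.0):
    """Structural similarity between two equally sized flat patches.

    Combines luminance (means), contrast (variances) and structure
    (covariance) in one score; identical patches score 1.
    """
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Small stabilizing constants, scaled by the intensity range.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# A textured patch and a flat patch with the same mean intensity.
textured = [0.2, 0.8, 0.2, 0.8]
flat = [0.5, 0.5, 0.5, 0.5]
```

Comparing `ssim_patch(textured, flat)` with `mean_squared_error(textured, flat)` shows the design point: the two patches share the same mean intensity, so the difference lies entirely in contrast and structure, which are the terms the similarity score weighs explicitly.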
Posted Content•
Machine Learning Methods Economists Should Know About

Susan Athey1, Guido W. Imbens1•
Stanford University1
24 Mar 2019-arXiv: Econometrics
TL;DR: Newly developed methods at the intersection of ML and econometrics, methods that typically perform better than either off-the-shelf ML or more traditional econometric methods when applied to particular classes of problems, are highlighted.
Abstract: We discuss the relevance of the recent Machine Learning (ML) literature for economics and econometrics. First we discuss the differences in goals, methods and settings between the ML literature and the traditional econometrics and statistics literatures. Then we discuss some specific methods from the machine learning literature that we view as important for empirical researchers in economics. These include supervised learning methods for regression and classification, unsupervised learning methods, as well as matrix completion methods. Finally, we highlight newly developed methods at the intersection of ML and econometrics, methods that typically perform better than either off-the-shelf ML or more traditional econometric methods when applied to particular classes of problems, problems that include causal inference for average treatment effects, optimal policy estimation, and estimation of the counterfactual effect of price changes in consumer choice models.
Proceedings Article•10.1109/CVPR.2019.00108•
Unsupervised Event-Based Learning of Optical Flow, Depth, and Egomotion

Alex Zihao Zhu1, Liangzhe Yuan1, Kenneth Chaney1, Kostas Daniilidis1•
University of Pennsylvania1
15 Jun 2019
TL;DR: A novel framework for unsupervised learning for event cameras that learns motion information from only the event stream in the form of a discretized volume that maintains the temporal distribution of the events is proposed.
Abstract: In this work, we propose a novel framework for unsupervised learning for event cameras that learns motion information from only the event stream. In particular, we propose an input representation of the events in the form of a discretized volume that maintains the temporal distribution of the events, which we pass through a neural network to predict the motion of the events. This motion is used to attempt to remove any motion blur in the event image. We then propose a loss function applied to the motion compensated event image that measures the motion blur in this image. We train two networks with this framework, one to predict optical flow, and one to predict egomotion and depths, and evaluate these networks on the Multi Vehicle Stereo Event Camera dataset, along with qualitative results from a variety of different scenes.
Journal Article•10.1016/J.INS.2019.05.042•
Combining unsupervised and supervised learning in credit card fraud detection

Fabrizio Carcillo1, Yann-Aël Le Borgne1, Olivier Caelen, Yacine Kessaci, Frédéric Oblé, Gianluca Bontempi1 •
Université libre de Bruxelles1
16 May 2019-Information Sciences
TL;DR: This paper presents a hybrid technique that combines supervised and unsupervised techniques to improve the fraud detection accuracy and shows that the combination is efficient and does indeed improve the accuracy of the detection.
Journal Article•10.1016/J.COMPAG.2018.12.006•
Current and future applications of statistical machine learning algorithms for agricultural machine vision systems

Tanzeel U. Rehman1, Md. Sultan Mahmud2, Young K. Chang2, Jian Jin1, Jaemyung Shin2 •
Purdue University1, Dalhousie University2
01 Jan 2019-Computers and Electronics in Agriculture
TL;DR: This review surveys current applications of statistical machine learning techniques in agricultural machine vision systems, analyses each technique's potential for specific applications, and presents an overview of instructive examples in different agricultural areas.
Proceedings Article•10.1109/ICCV.2019.00305•
Unsupervised Pre-Training of Image Features on Non-Curated Data

Mathilde Caron1, Piotr Bojanowski1, Julien Mairal, Armand Joulin1•
Facebook1
27 Oct 2019
TL;DR: This work proposes a new unsupervised approach which leverages self-supervision and clustering to capture complementary statistics from large-scale data, and validates the approach on 96 million images from YFCC100M, achieving state-of-the-art results among unsupervised methods on standard benchmarks.
Abstract: Pre-training general-purpose visual features with convolutional neural networks without relying on annotations is a challenging and important task. Most recent efforts in unsupervised feature learning have focused on either small or highly curated datasets like ImageNet, whereas using uncurated raw datasets was found to decrease the feature quality when evaluated on a transfer task. Our goal is to bridge the performance gap between unsupervised methods trained on curated data, which are costly to obtain, and massive raw datasets that are easily available. To that effect, we propose a new unsupervised approach which leverages self-supervision and clustering to capture complementary statistics from large-scale data. We validate our approach on 96 million images from YFCC100M, achieving state-of-the-art results among unsupervised methods on standard benchmarks, which confirms the potential of unsupervised learning when only uncurated data are available. We also show that pre-training a supervised VGG-16 with our method achieves 74.9% top-1 classification accuracy on the validation set of ImageNet, which is an improvement of +0.8% over the same network trained from scratch. Our code is available at https://github.com/facebookresearch/DeeperCluster.