Scispace (Formerly Typeset)
Showing papers on "Unsupervised learning" published in 2016
Gaussian Processes For Machine Learning

Tanja Hueber
1 Jan 2016
TL;DR: The gaussian processes for machine learning is universally compatible with any devices to read, and is available in the digital library an online access to it is set as public so you can get it instantly.
Abstract: Thank you for downloading gaussian processes for machine learning. As you may know, people have search numerous times for their chosen readings like this gaussian processes for machine learning, but end up in harmful downloads. Rather than enjoying a good book with a cup of tea in the afternoon, instead they juggled with some malicious bugs inside their computer. gaussian processes for machine learning is available in our digital library an online access to it is set as public so you can get it instantly. Our books collection hosts in multiple locations, allowing you to get the most less latency time to download any of our books like this one. Kindly say, the gaussian processes for machine learning is universally compatible with any devices to read.

10,041 citations

Proceedings Article•
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

Alec Radford, Luke Metz, Soumith Chintala1•
Facebook1
1 Jan 2016
TL;DR: Deep convolutional generative adversarial networks (DCGANs), a class of CNNs with certain architectural constraints, learn a hierarchy of representations from object parts to scenes in both the generator and discriminator, making them a strong candidate for unsupervised learning.
Abstract: In recent years, supervised learning with convolutional networks (CNNs) has seen huge adoption in computer vision applications. Comparatively, unsupervised learning with CNNs has received less attention. In this work we hope to help bridge the gap between the success of CNNs for supervised learning and unsupervised learning. We introduce a class of CNNs called deep convolutional generative adversarial networks (DCGANs), that have certain architectural constraints, and demonstrate that they are a strong candidate for unsupervised learning. Training on various image datasets, we show convincing evidence that our deep convolutional adversarial pair learns a hierarchy of representations from object parts to scenes in both the generator and discriminator. Additionally, we use the learned features for novel tasks - demonstrating their applicability as general image representations.

7,389 citations

Posted Content•
Least Squares Generative Adversarial Networks

Xudong Mao1, Qing Li1, Haoran Xie2, Raymond Y. K. Lau, Zhen Wang3, Stephen Paul Smolley •
City University of Hong Kong1, University of Hong Kong2, Northwestern Polytechnical University3
13 Nov 2016-arXiv: Computer Vision and Pattern Recognition
TL;DR: This paper proposes the Least Squares Generative Adversarial Networks (LSGANs), which adopt the least squares loss function for the discriminator, and shows that minimizing the LSGAN objective function amounts to minimizing the Pearson $\chi^2$ divergence.
Abstract: Unsupervised learning with generative adversarial networks (GANs) has proven hugely successful. Regular GANs hypothesize the discriminator as a classifier with the sigmoid cross entropy loss function. However, we found that this loss function may lead to the vanishing gradients problem during the learning process. To overcome such a problem, we propose in this paper the Least Squares Generative Adversarial Networks (LSGANs) which adopt the least squares loss function for the discriminator. We show that minimizing the objective function of LSGAN yields minimizing the Pearson $\chi^2$ divergence. There are two benefits of LSGANs over regular GANs. First, LSGANs are able to generate higher quality images than regular GANs. Second, LSGANs perform more stable during the learning process. We evaluate LSGANs on five scene datasets and the experimental results show that the images generated by LSGANs are of better quality than the ones generated by regular GANs. We also conduct two comparison experiments between LSGANs and regular GANs to illustrate the stability of LSGANs.

3,650 citations
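The least squares loss described in the LSGAN abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the target codes `a`, `b`, and `c` (the values the discriminator should output for fake data, for real data, and the value the generator wants for fake data) and their defaults of 0, 1, and 1 are assumptions that do not appear in the abstract, and the function names are hypothetical.

```python
def lsgan_discriminator_loss(d_real, d_fake, a=0.0, b=1.0):
    """Least squares discriminator loss.

    d_real, d_fake: raw discriminator outputs on real and generated samples.
    a, b: assumed target codes for fake and real data respectively.
    """
    real_term = sum((d - b) ** 2 for d in d_real) / len(d_real)
    fake_term = sum((d - a) ** 2 for d in d_fake) / len(d_fake)
    return 0.5 * real_term + 0.5 * fake_term


def lsgan_generator_loss(d_fake, c=1.0):
    """Least squares generator loss; c is the assumed code the generator aims for."""
    return 0.5 * sum((d - c) ** 2 for d in d_fake) / len(d_fake)


def lsgan_generator_grad(d_output, c=1.0):
    """Gradient of 0.5 * (d_output - c)^2 with respect to d_output."""
    return d_output - c


# A perfect discriminator has zero loss; the generator is then penalized.
perfect_d_loss = lsgan_discriminator_loss([1.0, 1.0], [0.0, 0.0])
generator_penalty = lsgan_generator_loss([0.0, 0.0])
```

Because the per-sample gradient `d_output - c` grows linearly with the distance from the target code, it does not shrink toward zero for samples far from the target, which is one way to read the abstract's claim about avoiding vanishing gradients.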

Book Chapter•10.1007/978-3-319-46466-4_5•
Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles

Mehdi Noroozi1, Paolo Favaro1•
University of Bern1
8 Oct 2016
TL;DR: This work introduces the context-free network (CFN), a siamese-ennead convolutional neural network that is pre-trained to solve jigsaw puzzles on a large dataset without human annotation; the resulting features are later transferred via fine-tuning to a different, smaller, labeled dataset for object detection and classification.
Abstract: We propose a novel unsupervised learning approach to build features suitable for object detection and classification. The features are pre-trained on a large dataset without human annotation and later transferred via fine-tuning on a different, smaller and labeled dataset. The pre-training consists of solving jigsaw puzzles of natural images. To facilitate the transfer of features to other tasks, we introduce the context-free network (CFN), a siamese-ennead convolutional neural network. The features correspond to the columns of the CFN and they process image tiles independently (i.e., free of context). The later layers of the CFN then use the features to identify their geometric arrangement. Our experimental evaluations show that the learned features capture semantically relevant content. We pre-train the CFN on the training set of the ILSVRC2012 dataset and transfer the features on the combined training and validation set of Pascal VOC 2007 for object detection (via fast RCNN) and classification. These features outperform all current unsupervised features with 51.8% for detection and 68.6% for classification, and reduce the gap with supervised learning (56.5% and 78.2% respectively).

3,016 citations
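The jigsaw pretext task described in the abstract above can be sketched as a data-preparation step: cut an image into a 3x3 grid of tiles, reorder the tiles with a permutation drawn from a fixed set, and use the index of that permutation as the training label. This is a simplified sketch under stated assumptions: the image is a plain list of pixel rows, the permutation set is supplied by the caller, and the function names are hypothetical rather than taken from the paper.

```python
import random


def split_into_tiles(image, grid=3):
    """Split an image (list of pixel rows) into grid*grid tiles, row-major order.

    Assumes the image height and width are divisible by `grid`.
    """
    height, width = len(image), len(image[0])
    tile_h, tile_w = height // grid, width // grid
    tiles = []
    for r in range(grid):
        for c in range(grid):
            rows = image[r * tile_h:(r + 1) * tile_h]
            tiles.append([row[c * tile_w:(c + 1) * tile_w] for row in rows])
    return tiles


def make_jigsaw_example(image, permutations, rng):
    """Return (shuffled_tiles, label), where label indexes the permutation used."""
    tiles = split_into_tiles(image)
    label = rng.randrange(len(permutations))
    shuffled = [tiles[i] for i in permutations[label]]
    return shuffled, label


# Toy usage: a 3x3 "image" whose pixel values double as tile ids.
toy_image = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
toy_permutations = [tuple(range(9)), tuple(reversed(range(9)))]
toy_tiles, toy_label = make_jigsaw_example(toy_image, toy_permutations, random.Random(0))
```

A network trained on such pairs has to predict `label` from the shuffled tiles alone, so no human annotation is involved; each tile would be fed to a separate column, matching the abstract's description of processing tiles independently.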

Posted Content•
Variational Graph Auto-Encoders

Thomas Kipf, Max Welling
21 Nov 2016-arXiv: Machine Learning
TL;DR: The variational graph auto-encoder (VGAE) is introduced, a framework for unsupervised learning on graph-structured data based on the variational auto-encoder (VAE) that can naturally incorporate node features, which significantly improves predictive performance on a number of benchmark datasets.
Abstract: We introduce the variational graph auto-encoder (VGAE), a framework for unsupervised learning on graph-structured data based on the variational auto-encoder (VAE). This model makes use of latent variables and is capable of learning interpretable latent representations for undirected graphs. We demonstrate this model using a graph convolutional network (GCN) encoder and a simple inner product decoder. Our model achieves competitive results on a link prediction task in citation networks. In contrast to most existing models for unsupervised learning on graph-structured data and link prediction, our model can naturally incorporate node features, which significantly improves predictive performance on a number of benchmark datasets.

2,955 citations
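The two components named in the VGAE abstract above, a graph convolutional encoder and an inner product decoder, can be sketched in NumPy as below. This is an untrained forward pass only, under stated assumptions: a two-layer encoder with symmetric adjacency normalization, randomly initialized weights, and hypothetical function and parameter names; none of these details are given in the abstract.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetrically normalize A + I (assumed GCN-style normalization)."""
    a_tilde = adj + np.eye(adj.shape[0])
    deg_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return a_tilde * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]


def gcn_encode(adj, features, w0, w_mu, w_log_sigma):
    """Two-layer encoder returning the mean and log standard deviation per node."""
    a_hat = normalize_adjacency(adj)
    hidden = np.maximum(a_hat @ features @ w0, 0.0)  # shared ReLU layer
    return a_hat @ hidden @ w_mu, a_hat @ hidden @ w_log_sigma


def sample_latent(mu, log_sigma, rng):
    """Reparameterized sample z = mu + sigma * noise."""
    return mu + np.exp(log_sigma) * rng.standard_normal(mu.shape)


def inner_product_decode(z):
    """Edge probabilities sigmoid(z_i . z_j) for every node pair."""
    return 1.0 / (1.0 + np.exp(-(z @ z.T)))


# Toy usage: a 3-node path graph with one-hot node features.
rng = np.random.default_rng(0)
adj = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
features = np.eye(3)
w0 = 0.1 * rng.standard_normal((3, 4))
w_mu = 0.1 * rng.standard_normal((4, 2))
w_log_sigma = 0.1 * rng.standard_normal((4, 2))
mu, log_sigma = gcn_encode(adj, features, w0, w_mu, w_log_sigma)
edge_probs = inner_product_decode(sample_latent(mu, log_sigma, rng))
```

Training would compare `edge_probs` against the observed adjacency matrix and add a KL term on the latent distribution; that step is omitted here.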

Proceedings Article•
Pixel recurrent neural networks

Aaron van den Oord1, Nal Kalchbrenner1, Koray Kavukcuoglu1•
Google1
19 Jun 2016
TL;DR: A deep neural network is presented that sequentially predicts the pixels in an image along the two spatial dimensions and encodes the complete set of dependencies in the image to achieve log-likelihood scores on natural images that are considerably better than the previous state of the art.
Abstract: Modeling the distribution of natural images is a landmark problem in unsupervised learning. This task requires an image model that is at once expressive, tractable and scalable. We present a deep neural network that sequentially predicts the pixels in an image along the two spatial dimensions. Our method models the discrete probability of the raw pixel values and encodes the complete set of dependencies in the image. Architectural novelties include fast two-dimensional recurrent layers and an effective use of residual connections in deep recurrent networks. We achieve log-likelihood scores on natural images that are considerably better than the previous state of the art. Our main results also provide benchmarks on the diverse ImageNet dataset. Samples generated from the model appear crisp, varied and globally coherent.

2,250 citations

Posted Content•
Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles

Mehdi Noroozi1, Paolo Favaro1•
University of Bern1
30 Mar 2016-arXiv: Computer Vision and Pattern Recognition
TL;DR: A novel unsupervised learning approach builds features suitable for object detection and classification by training a network to solve jigsaw puzzles; to facilitate the transfer of features to other tasks, the paper introduces the context-free network (CFN), a siamese-ennead convolutional neural network.
Abstract: In this paper we study the problem of image representation learning without human annotation. By following the principles of self-supervision, we build a convolutional neural network (CNN) that can be trained to solve Jigsaw puzzles as a pretext task, which requires no manual labeling, and then later repurposed to solve object classification and detection. To maintain the compatibility across tasks we introduce the context-free network (CFN), a siamese-ennead CNN. The CFN takes image tiles as input and explicitly limits the receptive field (or context) of its early processing units to one tile at a time. We show that the CFN includes fewer parameters than AlexNet while preserving the same semantic learning capabilities. By training the CFN to solve Jigsaw puzzles, we learn both a feature mapping of object parts as well as their correct spatial arrangement. Our experimental evaluations show that the learned features capture semantically relevant content. Our proposed method for learning visual representations outperforms state of the art methods in several transfer learning benchmarks.

2,145 citations

Proceedings Article•
Return of frustratingly easy domain adaptation

Baochen Sun1, Jiashi Feng2, Kate Saenko1•
University of Massachusetts Lowell1, National University of Singapore2
12 Feb 2016
TL;DR: Correlation alignment (CORAL) minimizes domain shift by aligning the second-order statistics of the source and target distributions without requiring any target labels, and can be implemented in four lines of Matlab code.
Abstract: Unlike human learning, machine learning often fails to handle changes between training (source) and test (target) input distributions. Such domain shifts, common in practical scenarios, severely damage the performance of conventional machine learning methods. Supervised domain adaptation methods have been proposed for the case when the target data have labels, including some that perform very well despite being "frustratingly easy" to implement. However, in practice, the target domain is often unlabeled, requiring unsupervised adaptation. We propose a simple, effective, and efficient method for unsupervised domain adaptation called CORrelation ALignment (CORAL). CORAL minimizes domain shift by aligning the second-order statistics of source and target distributions, without requiring any target labels. Even though it is extraordinarily simple–it can be implemented in four lines of Matlab code–CORAL performs remarkably well in extensive evaluations on standard benchmark datasets.

2,132 citations

Posted Content•
Generative Adversarial Imitation Learning

Jonathan Ho1, Stefano Ermon2•
OpenAI1, Stanford University2
10 Jun 2016-arXiv: Learning
TL;DR: A new general framework for directly extracting a policy from data, as if it were obtained by reinforcement learning following inverse reinforcement learning, is proposed and a certain instantiation of this framework draws an analogy between imitation learning and generative adversarial networks.
Abstract: Consider learning a policy from example expert behavior, without interaction with the expert or access to reinforcement signal. One approach is to recover the expert's cost function with inverse reinforcement learning, then extract a policy from that cost function with reinforcement learning. This approach is indirect and can be slow. We propose a new general framework for directly extracting a policy from data, as if it were obtained by reinforcement learning following inverse reinforcement learning. We show that a certain instantiation of our framework draws an analogy between imitation learning and generative adversarial networks, from which we derive a model-free imitation learning algorithm that obtains significant performance gains over existing model-free methods in imitating complex behaviors in large, high-dimensional environments.

2,000 citations

Posted Content•
Tutorial on Variational Autoencoders

Carl Doersch
19 Jun 2016-arXiv: Machine Learning
TL;DR: This tutorial introduces the intuitions behind VAEs, explains the mathematics behind them, and describes some empirical behavior.
Abstract: In just three years, Variational Autoencoders (VAEs) have emerged as one of the most popular approaches to unsupervised learning of complicated distributions. VAEs are appealing because they are built on top of standard function approximators (neural networks), and can be trained with stochastic gradient descent. VAEs have already shown promise in generating many kinds of complicated data, including handwritten digits, faces, house numbers, CIFAR images, physical models of scenes, segmentation, and predicting the future from static images. This tutorial introduces the intuitions behind VAEs, explains the mathematics behind them, and describes some empirical behavior. No prior knowledge of variational Bayesian methods is assumed.

1,923 citations

Posted Content•
Pixel Recurrent Neural Networks

Aaron van den Oord1, Nal Kalchbrenner1, Koray Kavukcuoglu1•
Google1
25 Jan 2016-arXiv: Computer Vision and Pattern Recognition
TL;DR: This paper presents a deep neural network that sequentially predicts the pixels in an image along the two spatial dimensions, modeling the discrete probability of the raw pixel values and encoding the complete set of dependencies in the image.
Abstract: Modeling the distribution of natural images is a landmark problem in unsupervised learning. This task requires an image model that is at once expressive, tractable and scalable. We present a deep neural network that sequentially predicts the pixels in an image along the two spatial dimensions. Our method models the discrete probability of the raw pixel values and encodes the complete set of dependencies in the image. Architectural novelties include fast two-dimensional recurrent layers and an effective use of residual connections in deep recurrent networks. We achieve log-likelihood scores on natural images that are considerably better than the previous state of the art. Our main results also provide benchmarks on the diverse ImageNet dataset. Samples generated from the model appear crisp, varied and globally coherent.
Journal Article•10.1109/TNNLS.2015.2424995•
Extreme Learning Machine for Multilayer Perceptron

Jiexiong Tang1, Chenwei Deng1, Guang-Bin Huang2•
Beijing Institute of Technology1, Nanyang Technological University2
01 Apr 2016-IEEE Transactions on Neural Networks
TL;DR: Extensive experiments on various widely used classification data sets show that the proposed algorithm achieves better and faster convergence than the existing state-of-the-art hierarchical learning methods, and multiple applications in computer vision further confirm the generality and capability of the proposed learning scheme.
Abstract: Extreme learning machine (ELM) is an emerging learning algorithm for the generalized single hidden layer feedforward neural networks, of which the hidden node parameters are randomly generated and the output weights are analytically computed. However, due to its shallow architecture, feature learning using ELM may not be effective for natural signals (e.g., images/videos), even with a large number of hidden nodes. To address this issue, in this paper, a new ELM-based hierarchical learning framework is proposed for multilayer perceptron. The proposed architecture is divided into two main components: 1) self-taught feature extraction followed by supervised feature classification and 2) they are bridged by random initialized hidden weights. The novelties of this paper are as follows: 1) unsupervised multilayer encoding is conducted for feature extraction, and an ELM-based sparse autoencoder is developed via $\ell _{1}$ constraint. By doing so, it achieves more compact and meaningful feature representations than the original ELM; 2) by exploiting the advantages of ELM random feature mapping, the hierarchically encoded outputs are randomly projected before final decision making, which leads to a better generalization with faster learning speed; and 3) unlike the greedy layerwise training of deep learning (DL), the hidden layers of the proposed framework are trained in a forward manner. Once the previous layer is established, the weights of the current layer are fixed without fine-tuning. Therefore, it has much better learning efficiency than the DL. Extensive experiments on various widely used classification data sets show that the proposed algorithm achieves better and faster convergence than the existing state-of-the-art hierarchical learning methods. Furthermore, multiple applications in computer vision further confirm the generality and capability of the proposed learning scheme.
Posted Content•
Reinforcement Learning with Unsupervised Auxiliary Tasks

Max Jaderberg1, Volodymyr Mnih1, Wojciech Marian Czarnecki2, Tom Schaul1, Joel Z. Leibo1, David Silver1, Koray Kavukcuoglu1 •
Google1, Jagiellonian University2
16 Nov 2016-arXiv: Learning
TL;DR: The agent introduced in this paper significantly outperforms the previous state of the art on Atari, averaging 880% expert human performance, and on a challenging suite of first-person, three-dimensional Labyrinth tasks, where it achieves a mean learning speedup of 10× and averages 87% expert human performance.
Abstract: Deep reinforcement learning agents have achieved state-of-the-art results by directly maximising cumulative reward. However, environments contain a much wider variety of possible training signals. In this paper, we introduce an agent that also maximises many other pseudo-reward functions simultaneously by reinforcement learning. All of these tasks share a common representation that, like unsupervised learning, continues to develop in the absence of extrinsic rewards. We also introduce a novel mechanism for focusing this representation upon extrinsic rewards, so that learning can rapidly adapt to the most relevant aspects of the actual task. Our agent significantly outperforms the previous state-of-the-art on Atari, averaging 880% expert human performance, and a challenging suite of first-person, three-dimensional Labyrinth tasks leading to a mean speedup in learning of 10× and averaging 87% expert human performance on Labyrinth.
Proceedings Article•
Density estimation using Real NVP

Laurent Dinh1, Jascha Sohl-Dickstein2, Samy Bengio2•
Université de Montréal1, Google2
27 May 2016
TL;DR: The authors extend the space of probabilistic models using real-valued non-volume preserving transformations, a set of powerful invertible and learnable transformations, resulting in an unsupervised learning algorithm with exact log-likelihood computation, exact sampling, exact inference of latent variables, and an interpretable latent space.
Abstract: Unsupervised learning of probabilistic models is a central yet challenging problem in machine learning. Specifically, designing models with tractable learning, sampling, inference and evaluation is crucial in solving this task. We extend the space of such models using real-valued non-volume preserving (real NVP) transformations, a set of powerful invertible and learnable transformations, resulting in an unsupervised learning algorithm with exact log-likelihood computation, exact sampling, exact inference of latent variables, and an interpretable latent space. We demonstrate its ability to model natural images on four datasets through sampling, log-likelihood evaluation and latent variable manipulations.
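The invertible transformations with exact log-likelihood mentioned in the Real NVP abstract above are commonly built from affine coupling layers, which can be sketched as follows. This is a toy illustration under stated assumptions: the `scale` and `shift` functions are fixed stand-ins for learned neural networks, the input is split into two halves by hand, and all names are hypothetical.

```python
import math


def scale(x1):
    """Hypothetical stand-in for a learned log-scale network s(x1)."""
    return [math.tanh(v) for v in x1]


def shift(x1):
    """Hypothetical stand-in for a learned translation network t(x1)."""
    return [0.5 * v for v in x1]


def coupling_forward(x1, x2):
    """y1 = x1, y2 = x2 * exp(s(x1)) + t(x1); also returns log|det Jacobian|."""
    s, t = scale(x1), shift(x1)
    y2 = [b * math.exp(si) + ti for b, si, ti in zip(x2, s, t)]
    log_det = sum(s)  # the Jacobian is triangular, so this is exact
    return x1, y2, log_det


def coupling_inverse(y1, y2):
    """Exact inverse: x2 = (y2 - t(y1)) * exp(-s(y1))."""
    s, t = scale(y1), shift(y1)
    x2 = [(b - ti) * math.exp(-si) for b, si, ti in zip(y2, s, t)]
    return y1, x2


# Round trip on a 4-dimensional input split into two halves.
x1, x2 = [0.3, -1.2], [2.0, 0.5]
y1, y2, log_det = coupling_forward(x1, x2)
recovered_x1, recovered_x2 = coupling_inverse(y1, y2)
```

Because the first half passes through unchanged, the inverse never has to invert `scale` or `shift` themselves, and the log-determinant is just the sum of the log-scales. This is what makes both sampling and density evaluation exact in this kind of layer.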
Journal Article•10.1109/TIE.2016.2519325•
An Intelligent Fault Diagnosis Method Using Unsupervised Feature Learning Towards Mechanical Big Data

Yaguo Lei1, Feng Jia1, Jing Lin1, Saibo Xing1, Steven X. Ding2 •
Xi'an Jiaotong University1, University of Duisburg-Essen2
19 Jan 2016-IEEE Transactions on Industrial Electronics
TL;DR: Inspired by unsupervised feature learning, this paper proposes a two-stage method for intelligent machine fault diagnosis that learns features directly from raw vibration signals and then classifies health conditions, reducing the need for human labor and making it easier to handle big data.
Abstract: Intelligent fault diagnosis is a promising tool to deal with mechanical big data due to its ability in rapidly and efficiently processing collected signals and providing accurate diagnosis results. In traditional intelligent diagnosis methods, however, the features are manually extracted depending on prior knowledge and diagnostic expertise. Such processes take advantage of human ingenuity but are time-consuming and labor-intensive. Inspired by the idea of unsupervised feature learning that uses artificial intelligence techniques to learn features from raw data, a two-stage learning method is proposed for intelligent diagnosis of machines. In the first learning stage of the method, sparse filtering, an unsupervised two-layer neural network, is used to directly learn features from mechanical vibration signals. In the second stage, softmax regression is employed to classify the health conditions based on the learned features. The proposed method is validated by a motor bearing dataset and a locomotive bearing dataset, respectively. The results show that the proposed method obtains fairly high diagnosis accuracies and is superior to the existing methods for the motor bearing dataset. Because of learning features adaptively, the proposed method reduces the need of human labor and makes intelligent fault diagnosis handle big data more easily.
Proceedings Article•
Unsupervised Learning for Physical Interaction through Video Prediction

Chelsea Finn1, Ian Goodfellow2, Sergey Levine1•
University of California, Berkeley1, OpenAI2
23 May 2016
TL;DR: An action-conditioned video prediction model is developed that explicitly models pixel motion, by predicting a distribution over pixel motion from previous frames, and is partially invariant to object appearance, enabling it to generalize to previously unseen objects.
Abstract: A core challenge for an agent learning to interact with the world is to predict how its actions affect objects in its environment. Many existing methods for learning the dynamics of physical interactions require labeled object information. However, to scale real-world interaction learning to a variety of scenes and objects, acquiring labeled data becomes increasingly impractical. To learn about physical object motion without labels, we develop an action-conditioned video prediction model that explicitly models pixel motion, by predicting a distribution over pixel motion from previous frames. Because our model explicitly predicts motion, it is partially invariant to object appearance, enabling it to generalize to previously unseen objects. To explore video prediction for real-world interactive agents, we also introduce a dataset of 59,000 robot interactions involving pushing motions, including a test set with novel objects. In this dataset, accurate prediction of videos conditioned on the robot's future actions amounts to learning a "visual imagination" of different futures based on different courses of action. Our experiments show that our proposed method produces more accurate video predictions both quantitatively and qualitatively, when compared to prior methods.
Journal Article•10.1109/TPAMI.2015.2496141•
Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks

Alexey Dosovitskiy1, Philipp Fischer1, Jost Tobias Springenberg1, Martin Riedmiller1, Thomas Brox1 •
University of Freiburg1
01 Sep 2016-IEEE Transactions on Pattern Analysis and Machine Intelligence
TL;DR: In this article, a convolutional network is trained on unlabeled data to discriminate between surrogate classes, each formed by applying a variety of transformations to a randomly sampled seed image patch; the resulting feature representation is not class specific, but provides robustness to the transformations applied during training.
Abstract: Deep convolutional networks have proven to be very successful in learning task specific features that allow for unprecedented performance on various computer vision tasks. Training of such networks follows mostly the supervised learning paradigm, where sufficiently many input-output pairs are required for training. Acquisition of large training sets is one of the key challenges, when approaching a new task. In this paper, we aim for generic feature learning and present an approach for training a convolutional network using only unlabeled data. To this end, we train the network to discriminate between a set of surrogate classes. Each surrogate class is formed by applying a variety of transformations to a randomly sampled ‘seed’ image patch. In contrast to supervised network training, the resulting feature representation is not class specific. It rather provides robustness to the transformations that have been applied during training. This generic feature representation allows for classification results that outperform the state of the art for unsupervised learning on several popular datasets (STL-10, CIFAR-10, Caltech-101, Caltech-256). While features learned with our approach cannot compete with class specific features from supervised training on a classification task, we show that they are advantageous on geometric matching problems, where they also outperform the SIFT descriptor.
Proceedings Article•10.1109/CVPR.2016.556•
Joint Unsupervised Learning of Deep Representations and Image Clusters

Jianwei Yang1, Devi Parikh1, Dhruv Batra1•
Virginia Tech1
13 Apr 2016
TL;DR: A recurrent framework for joint unsupervised learning of deep representations and image clusters integrates the two processes into a single model with a unified weighted triplet loss function and optimizes it end-to-end, yielding both more powerful representations and more precise image clusters.
Abstract: In this paper, we propose a recurrent framework for joint unsupervised learning of deep representations and image clusters. In our framework, successive operations in a clustering algorithm are expressed as steps in a recurrent process, stacked on top of representations output by a Convolutional Neural Network (CNN). During training, image clusters and representations are updated jointly: image clustering is conducted in the forward pass, while representation learning in the backward pass. Our key idea behind this framework is that good representations are beneficial to image clustering and clustering results provide supervisory signals to representation learning. By integrating two processes into a single model with a unified weighted triplet loss function and optimizing it end-to-end, we can obtain not only more powerful representations, but also more precise image clusters. Extensive experiments show that our method outperforms the state-of-the-art on image clustering across a variety of image datasets. Moreover, the learned representations generalize well when transferred to other tasks. The source code can be downloaded from https://github.com/jwyang/joint-unsupervised-learning.
Book Chapter•10.1007/978-3-319-46448-0_32•
Shuffle and Learn: Unsupervised Learning Using Temporal Order Verification

Ishan Misra1, C. Lawrence Zitnick2, Martial Hebert1•
Carnegie Mellon University1, Facebook2
8 Oct 2016
TL;DR: This paper presents an approach for learning a visual representation from the raw spatiotemporal signals in videos using a Convolutional Neural Network, and shows that the method captures information that is temporally varying, such as human pose.
Abstract: In this paper, we present an approach for learning a visual representation from the raw spatiotemporal signals in videos. Our representation is learned without supervision from semantic labels. We formulate our method as an unsupervised sequential verification task, i.e., we determine whether a sequence of frames from a video is in the correct temporal order. With this simple task and no semantic labels, we learn a powerful visual representation using a Convolutional Neural Network (CNN). The representation contains complementary information to that learned from supervised image datasets like ImageNet. Qualitative results show that our method captures information that is temporally varying, such as human pose. When used as pre-training for action recognition, our method gives significant gains over learning without external data on benchmark datasets like UCF101 and HMDB51. To demonstrate its sensitivity to human pose, we show results for pose estimation on the FLIC and MPII datasets that are competitive, or better than approaches using significantly more supervision. Our method can be combined with supervised representations to provide an additional boost in accuracy.
Posted Content•
Matching Networks for One Shot Learning

Oriol Vinyals1, Charles Blundell1, Timothy P. Lillicrap1, Koray Kavukcuoglu1, Daan Wierstra1 •
Google1
13 Jun 2016-arXiv: Learning
TL;DR: This work employs ideas from metric learning based on deep neural features and from recent advances that augment neural networks with external memories to learn a network that maps a small labelled support set and an unlabelled example to its label, obviating the need for fine-tuning to adapt to new class types.
Abstract: Learning from a few examples remains a key challenge in machine learning. Despite recent advances in important domains such as vision and language, the standard supervised deep learning paradigm does not offer a satisfactory solution for learning new concepts rapidly from little data. In this work, we employ ideas from metric learning based on deep neural features and from recent advances that augment neural networks with external memories. Our framework learns a network that maps a small labelled support set and an unlabelled example to its label, obviating the need for fine-tuning to adapt to new class types. We then define one-shot learning problems on vision (using Omniglot, ImageNet) and language tasks. Our algorithm improves one-shot accuracy on ImageNet from 87.6% to 93.2% and from 88.0% to 93.8% on Omniglot compared to competing approaches. We also demonstrate the usefulness of the same model on language modeling by introducing a one-shot task on the Penn Treebank.
Posted Content•
Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning

William Lotter1, Gabriel Kreiman1, David D. Cox2•
Boston Children's Hospital1, Harvard University2
25 May 2016-arXiv: Learning
TL;DR: In this article, a predictive neural network is proposed to predict future frames in a video sequence, with each layer in the network making local predictions and only forwarding deviations from those predictions to subsequent network layers.
Abstract: While great strides have been made in using deep learning algorithms to solve supervised learning tasks, the problem of unsupervised learning - leveraging unlabeled examples to learn about the structure of a domain - remains a difficult unsolved challenge. Here, we explore prediction of future frames in a video sequence as an unsupervised learning rule for learning about the structure of the visual world. We describe a predictive neural network ("PredNet") architecture that is inspired by the concept of "predictive coding" from the neuroscience literature. These networks learn to predict future frames in a video sequence, with each layer in the network making local predictions and only forwarding deviations from those predictions to subsequent network layers. We show that these networks are able to robustly learn to predict the movement of synthetic (rendered) objects, and that in doing so, the networks learn internal representations that are useful for decoding latent object parameters (e.g. pose) that support object recognition with fewer training views. We also show that these networks can scale to complex natural image streams (car-mounted camera videos), capturing key aspects of both egocentric movement and the movement of objects in the visual scene, and the representation learned in this setting is useful for estimating the steering angle. Altogether, these results suggest that prediction represents a powerful framework for unsupervised learning, allowing for implicit learning of object and scene structure.
Proceedings Article•
Learning to Navigate in Complex Environments

Piotr Mirowski1, Razvan Pascanu1, Fabio Viola2, Hubert Soyer3, Andrew J. Ballard4, Andrea Banino1, Misha Denil5, Ross Goroshin6, Laurent Sifre1, Koray Kavukcuoglu1, Dharshan Kumaran1, Raia Hadsell1 •
Google1, University of Palermo2, National Institute of Informatics3, University of Cambridge4, University of Oxford5, Courant Institute of Mathematical Sciences6
11 Nov 2016
TL;DR: This work considers jointly learning the goal-driven reinforcement learning problem with auxiliary depth prediction and loop closure classification tasks and shows that data efficiency and task performance can be dramatically improved by relying on additional auxiliary tasks leveraging multimodal sensory inputs.
Abstract: Learning to navigate in complex environments with dynamic elements is an important milestone in developing AI agents. In this work we formulate the navigation question as a reinforcement learning problem and show that data efficiency and task performance can be dramatically improved by relying on additional auxiliary tasks leveraging multimodal sensory inputs. In particular we consider jointly learning the goal-driven reinforcement learning problem with auxiliary depth prediction and loop closure classification tasks. This approach can learn to navigate from raw sensory input in complicated 3D mazes, approaching human-level performance even under conditions where the goal location changes frequently. We provide detailed analysis of the agent behaviour, its ability to localise, and its network activity dynamics, showing that the agent implicitly learns key navigation abilities.
Journal Article•10.1103/PHYSREVB.94.195105•
Discovering phase transitions with unsupervised learning

Lei Wang1•
Chinese Academy of Sciences1
02 Nov 2016-Physical Review B
TL;DR: This work shows that unsupervised learning techniques can be readily used to identify phases and phase transitions of many-body systems by using principal component analysis to extract relevant low-dimensional representations of the original data and clustering analysis to identify distinct phases in the feature space.
Abstract: Unsupervised learning is a discipline of machine learning which aims at discovering patterns in large data sets or classifying the data into several categories without being trained explicitly. We show that unsupervised learning techniques can be readily used to identify phases and phase transitions of many-body systems. Starting with raw spin configurations of a prototypical Ising model, we use principal component analysis to extract relevant low-dimensional representations of the original data and use clustering analysis to identify distinct phases in the feature space. This approach successfully finds physical concepts such as the order parameter and structure factor to be indicators of a phase transition. We discuss the future prospects of discovering more complex phases and phase transitions using unsupervised learning techniques.
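The PCA step described in this abstract can be illustrated with a minimal sketch. It is not the paper's code: the spin samples below come from a hypothetical toy generator (`sample_toy_spins`, aligned spins with independent random flips) rather than Monte Carlo sampling of an Ising model, and the sizes and flip probabilities are arbitrary assumptions. The sketch only shows that the leading principal component of raw spin configurations can track the magnetization, the kind of order-parameter-like quantity the abstract refers to.

```python
import numpy as np

def sample_toy_spins(n_samples, n_sites, flip_prob, rng):
    """Toy stand-in for Ising samples (an assumption, not the paper's data).

    Each sample starts fully aligned (all +1 or all -1), then every spin is
    flipped independently with probability flip_prob. A small flip_prob mimics
    an ordered phase; flip_prob = 0.5 gives fully random spins.
    """
    sign = rng.choice([-1, 1], size=(n_samples, 1))
    flips = rng.random((n_samples, n_sites)) < flip_prob
    return np.where(flips, -sign, sign)

def first_principal_component(X):
    """Project the centered data onto its leading principal axis via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[0]

rng = np.random.default_rng(0)
ordered = sample_toy_spins(200, 64, 0.05, rng)     # mostly aligned spins
disordered = sample_toy_spins(200, 64, 0.5, rng)   # random spins
X = np.vstack([ordered, disordered]).astype(float)

pc1 = first_principal_component(X)
magnetization = X.mean(axis=1)

# The sign of a principal axis is arbitrary, so compare absolute values.
corr = abs(np.corrcoef(pc1, magnetization)[0, 1])
print(f"|corr(PC1, magnetization)| = {corr:.3f}")
print("mean |PC1|, ordered:   ", np.abs(pc1[:200]).mean())
print("mean |PC1|, disordered:", np.abs(pc1[200:]).mean())
```

In this toy setting the magnitude of the first component separates the two groups of samples, so a clustering step such as the one the abstract mentions could be run directly on that one-dimensional feature.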
Book Chapter•10.1007/978-3-319-46448-0_1•
CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples

Filip Radenovic1, Giorgos Tolias1, Ondřej Chum1•
Czech Technical University in Prague1
8 Oct 2016
TL;DR: This work proposes to fine-tune CNN for image retrieval from a large collection of unordered images in a fully automated manner and shows that both hard positive and hard negative examples enhance the final performance in particular object retrieval with compact codes.
Abstract: Convolutional Neural Networks (CNNs) achieve state-of-the-art performance in many computer vision tasks. However, this achievement is preceded by extreme manual annotation in order to perform either training from scratch or fine-tuning for the target task. In this work, we propose to fine-tune CNN for image retrieval from a large collection of unordered images in a fully automated manner. We employ state-of-the-art retrieval and Structure-from-Motion (SfM) methods to obtain 3D models, which are used to guide the selection of the training data for CNN fine-tuning. We show that both hard positive and hard negative examples enhance the final performance in particular object retrieval with compact codes.
Proceedings Article•
Multi-task Sequence to Sequence Learning

Minh-Thang Luong1, Quoc V. Le1, Ilya Sutskever1, Oriol Vinyals1, Lukasz Kaiser1 •
Google1
1 Jan 2016
TL;DR: The results show that training on a small amount of parsing and image caption data can improve the translation quality between English and German by up to 1.5 BLEU points over strong single-task baselines on the WMT benchmarks, and reveal interesting properties of the two unsupervised learning objectives, autoencoder and skip-thought, in the MTL context.
Abstract: Sequence to sequence learning has recently emerged as a new paradigm in supervised learning. To date, most of its applications focused on only one task and not much work explored this framework for multiple tasks. This paper examines three multi-task learning (MTL) settings for sequence to sequence models: (a) the one-to-many setting - where the encoder is shared between several tasks such as machine translation and syntactic parsing, (b) the many-to-one setting - useful when only the decoder can be shared, as in the case of translation and image caption generation, and (c) the many-to-many setting - where multiple encoders and decoders are shared, which is the case with unsupervised objectives and translation. Our results show that training on a small amount of parsing and image caption data can improve the translation quality between English and German by up to 1.5 BLEU points over strong single-task baselines on the WMT benchmarks. Furthermore, we have established a new state-of-the-art result in constituent parsing with 93.0 F1. Lastly, we reveal interesting properties of the two unsupervised learning objectives, autoencoder and skip-thought, in the MTL context: autoencoder helps less in terms of perplexities but more on BLEU scores compared to skip-thought.
Journal Article•10.1109/TNNLS.2015.2404803•
Machine Learning Methods for Attack Detection in the Smart Grid

Mete Ozay1, Inaki Esnaola2, Fatos T. Yarman Vural3, Sanjeev R. Kulkarni2, H. Vincent Poor2 •
University of Birmingham1, Princeton University2, Middle East Technical University3
01 Aug 2016-IEEE Transactions on Neural Networks
TL;DR: Experimental analyses show that machine learning algorithms can detect attacks with performances higher than attack detection algorithms that employ state vector estimation methods in the proposed attack detection framework.
Abstract: Attack detection problems in the smart grid are posed as statistical learning problems for different attack scenarios in which the measurements are observed in batch or online settings. In this approach, machine learning algorithms are used to classify measurements as being either secure or attacked. An attack detection framework is provided to exploit any available prior knowledge about the system and surmount constraints arising from the sparse structure of the problem in the proposed approach. Well-known batch and online learning algorithms (supervised and semisupervised) are employed with decision- and feature-level fusion to model the attack detection problem. The relationships between statistical and geometric properties of attack vectors employed in the attack scenarios and learning algorithms are analyzed to detect unobservable attacks using statistical learning methods. The proposed algorithms are examined on various IEEE test systems. Experimental analyses show that machine learning algorithms can detect attacks with performances higher than attack detection algorithms that employ state vector estimation methods in the proposed attack detection framework.
Journal Article•10.3389/FNCOM.2016.00094•
Toward an Integration of Deep Learning and Neuroscience.

Adam H. Marblestone1, Greg Wayne2, Konrad P. Kording3•
Massachusetts Institute of Technology1, Google2, Rehabilitation Institute of Chicago3
14 Sep 2016-Frontiers in Computational Neuroscience
TL;DR: In this paper, the authors argue that a range of implementations of credit assignment through multiple layers of neurons are compatible with our current knowledge of neural circuitry, and that the brain's specialized systems can be interpreted as enabling efficient optimization for specific problem classes.
Abstract: Neuroscience has focused on the detailed implementation of computation, studying neural codes, dynamics and circuits. In machine learning, however, artificial neural networks tend to eschew precisely designed codes, dynamics or circuits in favor of brute force optimization of a cost function, often using simple and relatively uniform initial architectures. Two recent developments have emerged within machine learning that create an opportunity to connect these seemingly divergent perspectives. First, structured architectures are used, including dedicated systems for attention, recursion and various forms of short- and long-term memory storage. Second, cost functions and training procedures have become more complex and are varied across layers and over time. Here we think about the brain in terms of these ideas. We hypothesize that (1) the brain optimizes cost functions, (2) the cost functions are diverse and differ across brain locations and over development, and (3) optimization operates within a pre-structured architecture matched to the computational problems posed by behavior. In support of these hypotheses, we argue that a range of implementations of credit assignment through multiple layers of neurons are compatible with our current knowledge of neural circuitry, and that the brain's specialized systems can be interpreted as enabling efficient optimization for specific problem classes. Such a heterogeneously optimized system, enabled by a series of interacting cost functions, serves to make learning data-efficient and precisely targeted to the needs of the organism. We suggest directions by which neuroscience could seek to refine and test these hypotheses.
Journal Article•10.1109/JAS.2016.7508798•
Traffic signal timing via deep reinforcement learning

Li Li1, Yisheng Lv2, Fei-Yue Wang2•
Tsinghua University1, Chinese Academy of Sciences2
10 Jul 2016-IEEE/CAA Journal of Automatica Sinica
TL;DR: This paper proposes a set of algorithms to design signal timing plans via deep reinforcement learning; the core idea is to set up a deep neural network to learn the Q-function of reinforcement learning from the sampled traffic state/control inputs and the corresponding traffic system performance output.
Abstract: In this paper, we propose a set of algorithms to design signal timing plans via deep reinforcement learning. The core idea of this approach is to set up a deep neural network (DNN) to learn the Q-function of reinforcement learning from the sampled traffic state/control inputs and the corresponding traffic system performance output. Based on the obtained DNN, we can find the appropriate signal timing policies by implicitly modeling the control actions and the change of system states. We explain the possible benefits and implementation tricks of this new approach. The relationships between this new approach and some existing approaches are also carefully discussed.
Journal Article•10.1109/TKDE.2016.2545658•
Label Distribution Learning

Xin Geng1•
Southeast University1
01 Jul 2016-IEEE Transactions on Knowledge and Data Engineering
TL;DR: This paper proposes six working LDL algorithms in three ways: problem transformation, algorithm adaptation, and specialized algorithm design, and results show clear advantages of the specialized algorithms, which indicates the importance of special design for the characteristics of the LDL problem.
Abstract: Although multi-label learning can deal with many problems with label ambiguity, it does not fit some real applications well where the overall distribution of the importance of the labels matters. This paper proposes a novel learning paradigm named label distribution learning (LDL) for such kind of applications. The label distribution covers a certain number of labels, representing the degree to which each label describes the instance. LDL is a more general learning framework which includes both single-label and multi-label learning as its special cases. This paper proposes six working LDL algorithms in three ways: problem transformation, algorithm adaptation, and specialized algorithm design. In order to compare the performance of the LDL algorithms, six representative and diverse evaluation measures are selected via a clustering analysis, and the first batch of label distribution datasets are collected and made publicly available. Experimental results on one artificial and 15 real-world datasets show clear advantages of the specialized algorithms, which indicates the importance of special design for the characteristics of the LDL problem.
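As a rough illustration of the label distribution idea in this abstract, the sketch below represents a label distribution as a vector of non-negative description degrees that sum to 1. The label names, the helper functions, and the choice to encode single-label and multi-label cases as uniform distributions over the relevant labels are all assumptions made for the example; the abstract itself only states that both cases are special cases of LDL.

```python
import numpy as np

LABELS = ["sky", "water", "cloud", "building"]  # hypothetical label set

def is_label_distribution(d, tol=1e-9):
    """True if every description degree is non-negative and they sum to 1."""
    d = np.asarray(d, dtype=float)
    return bool(np.all(d >= 0) and abs(d.sum() - 1.0) <= tol)

def from_relevant_labels(relevant, labels=LABELS):
    """Spread the description degree uniformly over the relevant labels.

    This uniform encoding is an illustrative assumption for showing how
    single-label and multi-label annotations fit the same representation.
    """
    mask = np.array([lab in relevant for lab in labels], dtype=float)
    return mask / mask.sum()

single_label = from_relevant_labels({"sky"})           # one relevant label
multi_label = from_relevant_labels({"sky", "water"})   # two relevant labels
general = np.array([0.5, 0.3, 0.15, 0.05])             # graded description degrees

for name, d in [("single", single_label), ("multi", multi_label), ("general", general)]:
    print(name, d, is_label_distribution(d))
```

The third vector is the case that neither single-label nor multi-label annotation can express: every label describes the instance to a different degree.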
Book Chapter•10.1007/978-3-319-46448-0_48•
Ambient Sound Provides Supervision for Visual Learning

Andrew Owens1, Jiajun Wu1, Josh H. McDermott1, William T. Freeman1,2, Antonio Torralba1 •
Massachusetts Institute of Technology1, Google2
8 Oct 2016
TL;DR: This work trains a convolutional neural network to predict a statistical summary of the sound associated with a video frame, and shows that this representation is comparable to that of other state-of-the-art unsupervised learning methods.
Abstract: The sound of crashing waves, the roar of fast-moving cars – sound conveys important information about the objects in our surroundings. In this work, we show that ambient sounds can be used as a supervisory signal for learning visual models. To demonstrate this, we train a convolutional neural network to predict a statistical summary of the sound associated with a video frame. We show that, through this process, the network learns a representation that conveys information about objects and scenes. We evaluate this representation on several recognition tasks, finding that its performance is comparable to that of other state-of-the-art unsupervised learning methods. Finally, we show through visualizations that the network learns units that are selective to objects that are often associated with characteristic sounds.
...
