Open Access · Posted Content
Motion2Vec: Semi-Supervised Representation Learning from Surgical Videos
TL;DR: Motion2Vec learns a motion-centric representation of surgical video demonstrations by grouping them into action segments/sub-goals/options in a semi-supervised manner.
Abstract: Learning meaningful visual representations in an embedding space can facilitate generalization in downstream tasks such as action segmentation and imitation. In this paper, we learn a motion-centric representation of surgical video demonstrations by grouping them into action segments/sub-goals/options in a semi-supervised manner. We present Motion2Vec, an algorithm that learns a deep embedding feature space from video observations by minimizing a metric learning loss in a Siamese network: images from the same action segment are pulled together and pushed away from randomly sampled images of other segments, while the temporal ordering of the images is respected. After pre-training the Siamese network, the embeddings are iteratively segmented with a recurrent neural network for a given parametrization of the embedding space. We use only a small set of labeled video segments to semantically align the embedding space, and we assign pseudo-labels to the remaining unlabeled data by inference with the learned model parameters. We demonstrate the use of this representation to imitate surgical suturing motions from publicly available videos of the JIGSAWS dataset. The results give 85.5% segmentation accuracy on average, suggesting an improvement over several state-of-the-art baselines, while kinematic pose imitation gives a position error of 0.94 cm per observation on the test set. Videos, code and data are available at this https URL
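The pull/push objective described in the abstract can be illustrated with a triplet loss, one standard metric-learning loss for Siamese-style embeddings. The sketch below is a minimal, framework-free illustration under stated assumptions, not the authors' implementation: embeddings are plain Python lists rather than outputs of a trained network, the margin (0.2) is arbitrary, and the temporal constraint (drawing the positive from within `max_offset` frames of the anchor in the same segment) is a hypothetical stand-in for how temporal ordering might be respected. The names `triplet_loss`, `sample_triplet` and `batch_loss` are illustrative only.

```python
import random


def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: zero once the positive is closer to the
    anchor than the negative by at least `margin` (squared distances)."""
    return max(0.0, squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin)


def sample_triplet(segments, max_offset=2, rng=random):
    """Draw (anchor, positive, negative) from time-ordered action segments.

    `segments` is a list of segments (at least two); each segment is a list
    of embeddings in temporal order. Hypothetical temporal constraint: the
    positive comes from the same segment, within `max_offset` frames of the
    anchor. The negative is a random frame from a different segment.
    """
    seg_idx = rng.randrange(len(segments))
    seg = segments[seg_idx]
    i = rng.randrange(len(seg))
    lo, hi = max(0, i - max_offset), min(len(seg) - 1, i + max_offset)
    # Prefer a different nearby frame; fall back to the anchor itself
    # when the segment has a single frame.
    candidates = [j for j in range(lo, hi + 1) if j != i] or [i]
    positive = seg[rng.choice(candidates)]
    other = rng.choice([k for k in range(len(segments)) if k != seg_idx])
    negative = rng.choice(segments[other])
    return seg[i], positive, negative


def batch_loss(segments, n_triplets=32, margin=0.2, seed=0):
    """Mean triplet loss over randomly sampled triplets."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_triplets):
        a, p, n = sample_triplet(segments, rng=rng)
        total += triplet_loss(a, p, n, margin)
    return total / n_triplets
```

For example, two well-separated segments such as `[[[0.0, 0.0], [0.1, 0.0]], [[5.0, 5.0], [5.1, 5.0]]]` give a batch loss of zero, whereas an embedding that collapses every frame to the same point gives a loss equal to the margin. Minimizing this quantity (with gradients through the embedding network, which this sketch omits) is what pulls same-segment frames together and pushes other segments away.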