Motion2Vec: Semi-Supervised Representation Learning from Surgical Videos

Open Access • Posted Content

- 31 May 2020

4

TL;DR: Motion2Vec learns a motion-centric representation of surgical video demonstrations by grouping them into action segments/sub-goals/options in a semi-supervised manner.

Abstract: Learning meaningful visual representations in an embedding space can facilitate generalization in downstream tasks such as action segmentation and imitation. In this paper, we learn a motion-centric representation of surgical video demonstrations by grouping them into action segments/sub-goals/options in a semi-supervised manner. We present Motion2Vec, an algorithm that learns a deep embedding feature space from video observations by minimizing a metric learning loss in a Siamese network: images from the same action segment are pulled together and pushed away from randomly sampled images of other segments, while respecting the temporal ordering of the images. After pre-training the Siamese network, the embeddings are iteratively segmented with a recurrent neural network for a given parametrization of the embedding space. We use only a small set of labeled video segments to semantically align the embedding space, and assign pseudo-labels to the remaining unlabeled data by inference on the learned model parameters. We demonstrate the use of this representation to imitate surgical suturing motions from publicly available videos of the JIGSAWS dataset. Results give 85.5% segmentation accuracy on average, suggesting a performance improvement over several state-of-the-art baselines, while kinematic pose imitation gives a 0.94-centimeter position error per observation on the test set. Videos, code and data are available at this https URL
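
The pull-together/push-apart objective described in the abstract can be illustrated with a generic triplet loss. The sketch below is not the authors' implementation: the abstract does not give the exact loss, so a hinge triplet loss on squared Euclidean distances is assumed, and the names `triplet_loss`, `sample_triplets`, and the `margin` parameter are hypothetical. It also ignores the temporal-ordering constraint mentioned in the abstract.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss on squared Euclidean distances, averaged over the batch.

    Assumed form (not taken from the paper): max(0, d(a, p) - d(a, n) + margin).
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=1)  # distance to same-segment frame
    d_neg = np.sum((anchor - negative) ** 2, axis=1)  # distance to other-segment frame
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


def sample_triplets(labels, rng):
    """For each frame, pick a random positive (same segment label, different frame)
    and a random negative (different segment label). Frames without a valid
    positive or negative are skipped."""
    labels = np.asarray(labels)
    indices = np.arange(len(labels))
    triplets = []
    for i, lab in enumerate(labels):
        pos = np.flatnonzero((labels == lab) & (indices != i))
        neg = np.flatnonzero(labels != lab)
        if len(pos) == 0 or len(neg) == 0:
            continue
        triplets.append((i, int(rng.choice(pos)), int(rng.choice(neg))))
    return np.array(triplets)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: 6 frames with 2-D embeddings and hypothetical segment labels.
    embeddings = rng.normal(size=(6, 2))
    segment_labels = [0, 0, 0, 1, 1, 1]
    t = sample_triplets(segment_labels, rng)
    loss = triplet_loss(embeddings[t[:, 0]], embeddings[t[:, 1]], embeddings[t[:, 2]])
    print(f"triplet loss on toy batch: {loss:.3f}")
```

In a Siamese setup, the three embedding batches would come from the same network applied to the anchor, positive, and negative images, and the loss would be minimized with respect to the network weights; here the embeddings are fixed arrays so the arithmetic can be checked by hand.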

Citations

•Journal Article•10.1109/LRA.2021.3062308

A Kinematic Bottleneck Approach for Pose Regression of Flexible Surgical Instruments Directly From Images

Luca Sestini, +4 more

- 25 Feb 2021

TL;DR: In this paper, a self-supervised image-based method was proposed for real-time 3D pose estimation of surgical instruments using a flexible robotized endoscope; it showed promising results on semi-synthetic, phantom and in-vivo datasets.

26

•Journal Article•10.1109/LRA.2021.3062308

A Kinematic Bottleneck Approach For Pose Regression of Flexible Surgical Instruments directly from Images

Luca Sestini, +4 more

- 28 Feb 2021

- arXiv: Robotics

TL;DR: In this paper, a self-supervised image-based method for real-time 3D pose estimation of surgical instruments is proposed. But this method requires a large number of manually annotated images for efficient learning.

11

•Posted Content

Curiosity-driven Intuitive Physics Learning.

Tejas Gaikwad, +1 more

- 16 May 2021

- arXiv: Artificial Intelligence

TL;DR: In this article, a model for curiosity-driven learning and inference for real-world AI agents is proposed based on the arousal of curiosity, deriving from observations along discontinuities in the fundamental macroscopic solid-body physics parameters, i.e., shape constancy, spatial-temporal continuity, and object permanence.

•Book

Learning in control

Edward Grant

- 02 Jan 1993

TL;DR: The probabilistic forward dynamics models can be employed to control complex musculoskeletal robots on an antagonistic pair of pneumatic artificial muscles using only one-step-ahead predictions of the forward model and incorporating model uncertainty.

References

Journal Article•10.1162/NECO.1997.9.8.1735

Long short-term memory

Sepp Hochreiter, +1 more

- 01 Nov 1997

- Neural Computation

TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

99K

Journal Article•10.1109/5.18626

A tutorial on hidden Markov models and selected applications in speech recognition

Lawrence R. Rabiner

- 01 Feb 1989

TL;DR: In this paper, the authors provide an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and give practical details on methods of implementation of the theory along with a description of selected applications of HMMs to distinct problems in speech recognition.

24.3K

•Posted Content

Distributed Representations of Words and Phrases and their Compositionality

Tomas Mikolov, +4 more

- 16 Oct 2013

- arXiv: Computation and Language

TL;DR: This paper presents several extensions of the Skip-gram model, which learns high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships; the extensions improve both the quality of the vectors and the training speed.

22.9K

•Proceedings Article

Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

John Lafferty, +2 more

- 28 Jun 2001

TL;DR: This work presents iterative parameter estimation algorithms for conditional random fields and compares the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.

15.4K

•Proceedings Article•10.1109/CVPR.2015.7298682

FaceNet: A Unified Embedding for Face Recognition and Clustering

Florian Schroff, +2 more

- 12 Mar 2015

- arXiv: Computer Vision and Pattern Recog...

TL;DR: FaceNet uses a deep convolutional network trained to directly optimize the embedding itself, rather than an intermediate bottleneck layer as in previous deep learning approaches, and achieves state-of-the-art face recognition performance using only 128 bytes per face.

14.2K