Motion2Vec: Semi-Supervised Representation Learning from Surgical Videos

doi:10.1109/ICRA40945.2020.9197324

Open Access • Proceedings Article

Ajay Kumar Tanwani, +5 more

- 31 May 2020

pp 2174-2181

TL;DR: This paper learns a motion-centric representation of surgical video demonstrations by grouping them into action segments/subgoals/options in a semi-supervised manner and demonstrates the use of this representation to imitate surgical suturing kinematic motions from publicly available videos of the JIGSAWS dataset.

Abstract: Learning meaningful visual representations in an embedding space can facilitate generalization in downstream tasks such as action segmentation and imitation. In this paper, we learn a motion-centric representation of surgical video demonstrations by grouping them into action segments/subgoals/options in a semi-supervised manner. We present Motion2Vec, an algorithm that learns a deep embedding feature space from video observations by minimizing a metric learning loss in a Siamese network: images from the same action segment are pulled together while being pushed away from randomly sampled images of other segments, and the temporal ordering of the images is respected. After pre-training the Siamese network, the embeddings are iteratively segmented with a recurrent neural network for a given parametrization of the embedding space. We use only a small set of labeled video segments to semantically align the embedding space, and we assign pseudo-labels to the remaining unlabeled data by inference on the learned model parameters. We demonstrate the use of this representation to imitate surgical suturing kinematic motions from publicly available videos of the JIGSAWS dataset. Results give 85.5% segmentation accuracy on average, suggesting a performance improvement over several state-of-the-art baselines, while kinematic pose imitation gives a position error of 0.94 cm per observation on the test set. Videos, code and data are available at: https://sites.google.com/view/motion2vec
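The Siamese metric-learning objective described in the abstract can be illustrated with a minimal, framework-free Python sketch. It assumes a triplet margin loss as one common instance of such an objective (the abstract does not name the exact loss), fixed per-frame embeddings given as plain lists, and a per-frame action-segment ID. All function names, the margin of 0.2, and the temporal `window` used to pick positives are hypothetical choices for illustration, not taken from the Motion2Vec paper or its released code.

```python
# Hypothetical sketch of a Siamese/triplet metric-learning objective;
# not the Motion2Vec reference implementation.
import random
from typing import List, Sequence, Tuple

Embedding = Sequence[float]
Triplet = Tuple[int, int, int]  # (anchor, positive, negative) frame indices


def squared_distance(a: Embedding, b: Embedding) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor: Embedding, positive: Embedding,
                 negative: Embedding, margin: float = 0.2) -> float:
    """Hinge loss: zero once the negative is at least `margin` farther
    (in squared distance) from the anchor than the positive is."""
    return max(0.0, squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin)


def sample_triplets(segment_ids: List[int], window: int, n: int,
                    rng: random.Random, max_tries: int = 10_000) -> List[Triplet]:
    """Sample n frame-index triplets from one video.

    Positive: another frame of the anchor's segment at most `window` frames
    away (an assumed way of respecting temporal ordering).
    Negative: a uniformly random frame from any other segment.
    """
    num_frames = len(segment_ids)
    triplets: List[Triplet] = []
    for _ in range(max_tries):
        if len(triplets) == n:
            break
        a = rng.randrange(num_frames)
        lo, hi = max(0, a - window), min(num_frames - 1, a + window)
        positives = [p for p in range(lo, hi + 1)
                     if p != a and segment_ids[p] == segment_ids[a]]
        negatives = [q for q in range(num_frames)
                     if segment_ids[q] != segment_ids[a]]
        if positives and negatives:  # skip anchors with no valid pair
            triplets.append((a, rng.choice(positives), rng.choice(negatives)))
    if len(triplets) < n:
        raise ValueError("could not sample enough valid triplets")
    return triplets


def mean_triplet_loss(embeddings: Sequence[Embedding], triplets: List[Triplet],
                      margin: float = 0.2) -> float:
    """Average triplet loss over a batch of sampled triplets."""
    return sum(triplet_loss(embeddings[a], embeddings[p], embeddings[q], margin)
               for a, p, q in triplets) / len(triplets)


if __name__ == "__main__":
    segment_ids = [0, 0, 0, 1, 1, 1]  # two action segments, three frames each
    separated = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.0],
                 [1.0, 1.0], [1.1, 1.0], [1.2, 1.0]]
    collapsed = [[0.0, 0.0]] * 6
    triplets = sample_triplets(segment_ids, window=2, n=8, rng=random.Random(0))
    print(mean_triplet_loss(separated, triplets))  # 0.0: segments already far apart
    print(mean_triplet_loss(collapsed, triplets))  # about 0.2: every triplet pays the margin
```

In a real pipeline the embeddings would be produced by the Siamese network and this loss would be minimized by gradient descent; restricting positives to a temporal window is only one assumed way of respecting temporal ordering, and a collapsed embedding is penalized by exactly the margin per triplet.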

Citations

Journal Article • 10.1126/scirobotics.abj2908

Autonomous robotic laparoscopic surgery for intestinal anastomosis

Hamed Saeidi, +7 more

- 26 Jan 2022

- Science Robotics

TL;DR: An enhanced autonomous strategy allows laparoscopic soft tissue surgery in end-to-end anastomosis of the small bowel and demonstrates that surgical robots exhibiting high levels of autonomy have the potential to improve consistency, patient outcomes, and access to a standard surgical technique.

Journal Article • 10.1109/TBME.2021.3054828

Gesture Recognition in Robotic Surgery: A Review

Beatrice van Amsterdam, +2 more

- 26 Jan 2021

- IEEE Transactions on Biomedical Engineer...

TL;DR: This article reviews the state of the art in methods for automatic recognition of fine-grained gestures in robotic surgery, focusing on recent data-driven approaches, and outlines the open questions and future research directions.

Journal Article • 10.1016/J.IJSU.2021.106151

A systematic review on artificial intelligence in robot-assisted surgery.

Andrea Moglia, +5 more

- 01 Nov 2021

- International Journal of Surgery

TL;DR: In this article, a literature search was conducted on PubMed, Web of Science, Scopus, and IEEE Xplore according to the PRISMA 2020 statement to identify and discuss current limitations and challenges.

Journal Article • 10.1109/LRA.2020.3010746

Efficiently Calibrating Cable-Driven Surgical Robots with RGBD Fiducial Sensing and Recurrent Neural Networks

Minho Hwang, +7 more

- 19 Mar 2020

- arXiv: Robotics

TL;DR: A novel approach to efficiently calibrate robotic surgical assistants by placing 3D-printed fiducial coordinate frames on the arm and end-effector that are tracked using RGBD sensing, and by considering 13 modeling approaches to measure the coupling and history-dependent effects between joints.
