Motion2Vec: Semi-Supervised Representation Learning from Surgical Videos
Ajay Kumar Tanwani,Pierre Sermanet,Andy Yan,Raghav Anand,Mariano Phielipp,Ken Goldberg +5 more
- 31 May 2020
pp 2174-2181
45
TL;DR: This paper learns a motion-centric representation of surgical video demonstrations by grouping them into action segments/subgoals/options in a semi-supervised manner and demonstrates the use of this representation to imitate surgical suturing kinematic motions from publicly available videos of the JIGSAWS dataset.
Abstract: Learning meaningful visual representations in an embedding space can facilitate generalization in downstream tasks such as action segmentation and imitation. In this paper, we learn a motion-centric representation of surgical video demonstrations by grouping them into action segments/subgoals/options in a semi-supervised manner. We present Motion2Vec, an algorithm that learns a deep embedding feature space from video observations by minimizing a metric learning loss in a Siamese network: images from the same action segment are pulled together and pushed away from randomly sampled images of other segments, while respecting the temporal ordering of the images. After pre-training the Siamese network, the embeddings are iteratively segmented with a recurrent neural network for a given parametrization of the embedding space. We use only a small set of labeled video segments to semantically align the embedding space, and assign pseudo-labels to the remaining unlabeled data by inference on the learned model parameters. We demonstrate the use of this representation to imitate surgical suturing kinematic motions from publicly available videos of the JIGSAWS dataset. Results give 85.5% segmentation accuracy on average, suggesting a performance improvement over several state-of-the-art baselines, while kinematic pose imitation gives 0.94 centimeter error in position per observation on the test set. Videos, code and data are available at: https://sites.google.com/view/motion2vec
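The metric learning step described in the abstract can be illustrated with a triplet margin loss, one common choice of metric learning loss. The sketch below is an assumption for illustration, not the authors' implementation: the function names, the margin value, and the toy 2-D embeddings are all hypothetical, and the temporal-ordering constraint mentioned in the abstract is not modeled.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two embedding vectors (plain lists)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (assumed form, hypothetical margin).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def sample_triplet(embeddings, labels, rng):
    """Pick an anchor, a positive from the same action segment,
    and a random negative from a different segment.

    Assumes every label occurs at least twice and that at least
    two distinct labels are present.
    """
    a = rng.randrange(len(labels))
    same = [i for i, lab in enumerate(labels) if lab == labels[a] and i != a]
    other = [i for i, lab in enumerate(labels) if lab != labels[a]]
    return embeddings[a], embeddings[rng.choice(same)], embeddings[rng.choice(other)]


# Toy data: four 2-D embeddings from two hypothetical action segments.
embeddings = [[0.0, 0.0], [0.1, 0.0], [3.0, 4.0], [3.1, 4.0]]
labels = ["reach", "reach", "pull", "pull"]

rng = random.Random(0)
anchor, positive, negative = sample_triplet(embeddings, labels, rng)
loss = triplet_loss(anchor, positive, negative)
```

In a real pipeline the embeddings would come from the Siamese network and the loss would be minimized by gradient descent; here the vectors are fixed so that only the pull-together, push-apart behavior of the loss is shown.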
Citations
Autonomous robotic laparoscopic surgery for intestinal anastomosis
Hamed Saeidi,Justin D. Opfermann,Michael Kam,Shuju Wei,Simon Leonard,Michael H. Hsieh,J. Kang,Axel Krieger +7 more
TL;DR: An enhanced autonomous strategy allows laparoscopic soft tissue surgery in end-to-end anastomosis of the small bowel and demonstrates that surgical robots exhibiting high levels of autonomy have the potential to improve consistency, patient outcomes, and access to a standard surgical technique.
219
Gesture Recognition in Robotic Surgery: A Review
TL;DR: This article reviews the state of the art in methods for automatic recognition of fine-grained gestures in robotic surgery, focusing on recent data-driven approaches, and outlines the open questions and future research directions.
A systematic review on artificial intelligence in robot-assisted surgery.
Andrea Moglia,Konstantinos Georgiou,Evangelos Georgiou,Richard M. Satava,Alfred Cuschieri +5 more
TL;DR: In this article, a literature search was conducted on PubMed, Web of Science, Scopus, and IEEE Xplore according to the PRISMA 2020 statement to identify and discuss current limitations and challenges.
90
Efficiently Calibrating Cable-Driven Surgical Robots with RGBD Fiducial Sensing and Recurrent Neural Networks
Minho Hwang,Brijen Thananjeyan,Samuel Paradis,Daniel Seita,Jeffrey Ichnowski,Danyal Fer,Thomas P. Low,Ken Goldberg +7 more
TL;DR: A novel approach to efficiently calibrate robotic surgical assistants by placing 3D printed fiducial coordinate frames on the arm and end-effector, tracking them using RGBD sensing, and considering 13 modeling approaches to measure the coupling and history-dependent effects between joints.
34
Efficiently Calibrating Cable-Driven Surgical Robots With RGBD Fiducial Sensing and Recurrent Neural Networks
Minho Hwang,Brijen Thananjeyan,Samuel Paradis,Daniel Seita,Jeffrey Ichnowski,Danyal Fer,Thomas P. Low,Ken Goldberg +7 more
- 21 Jul 2020
TL;DR: In this article, a 3D printed fiducial coordinate frame is placed on the arm and end-effector that is tracked using RGBD sensing to measure the coupling and history-dependent effects between joints.
33
References
Long short-term memory
TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
99K
Generative Adversarial Nets
Ian Goodfellow,Jean Pouget-Abadie,Mehdi Mirza,Bing Xu,David Warde-Farley,Sherjil Ozair,Aaron Courville,Yoshua Bengio +7 more
- 08 Dec 2014
TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously trained: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
A tutorial on hidden Markov models and selected applications in speech recognition
Lawrence R. Rabiner
- 01 Feb 1989
TL;DR: In this paper, the authors provide an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and give practical details on methods of implementation of the theory along with a description of selected applications of HMMs to distinct problems in speech recognition.
•Proceedings Article
Distributed Representations of Words and Phrases and their Compositionality
Tomas Mikolov,Ilya Sutskever,Kai Chen,Greg S. Corrado,Jeffrey Dean +4 more
- 05 Dec 2013
TL;DR: This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.