Shelf-Supervised Mesh Prediction in the Wild
Yufei Ye, Shubham Tulsiani, Abhinav Gupta
- 11 Feb 2021
- pp 8843-8852
TL;DR: This paper proposes a learning-based approach that can train from unstructured image collections, supervised only by segmentation outputs from off-the-shelf recognition systems (i.e., "shelf-supervised").
Abstract: We aim to infer the 3D shape and pose of an object from a single image, and we propose a learning-based approach that can train from unstructured image collections, supervised only by segmentation outputs from off-the-shelf recognition systems (i.e., ‘shelf-supervised’). We first infer a volumetric representation in a canonical frame, along with the camera pose. We enforce that this representation is geometrically consistent with both the image appearance and the masks, and that the synthesized novel views are indistinguishable from the image collection. The coarse volumetric prediction is then converted to a mesh-based representation, which is further refined in the predicted camera frame. These two steps allow both shape-pose factorization from image collections and per-instance reconstruction with finer detail. We evaluate the method on both synthetic and real-world datasets and demonstrate its scalability on 50 categories in the wild, an order of magnitude more classes than existing works.
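The mask-consistency idea in the abstract (render the canonical-frame volume from the predicted camera and compare it with the off-the-shelf segmentation) can be illustrated with a small silhouette check. This is a minimal NumPy sketch under stated assumptions, not the authors' renderer: it assumes an orthographic camera, a pose given as azimuth/elevation, an occupancy grid spanning [-1, 1]^3, nearest-voxel sampling, and a binary cross-entropy loss. The names `rotation_from_az_el`, `render_silhouette`, and `mask_loss` are hypothetical.

```python
import numpy as np


def rotation_from_az_el(azimuth, elevation):
    """Hypothetical camera rotation: azimuth about y, then elevation about x."""
    ca, sa = np.cos(azimuth), np.sin(azimuth)
    ce, se = np.cos(elevation), np.sin(elevation)
    r_az = np.array([[ca, 0.0, sa], [0.0, 1.0, 0.0], [-sa, 0.0, ca]])
    r_el = np.array([[1.0, 0.0, 0.0], [0.0, ce, -se], [0.0, se, ce]])
    return r_el @ r_az  # maps canonical coordinates to camera coordinates


def render_silhouette(occupancy, rotation, image_size=32, num_samples=64):
    """Orthographic silhouette of a canonical-frame occupancy grid.

    Assumption: occupancy[i, j, k] holds the probability that the voxel at
    (x, y, z) = grid[i], grid[j], grid[k] is filled, with the grid covering
    [-1, 1] on every axis.
    """
    d = occupancy.shape[0]
    lin = np.linspace(-1.0, 1.0, image_size)
    ys, xs = np.meshgrid(lin, lin, indexing="ij")  # rows = y, columns = x
    zs = np.linspace(-1.0, 1.0, num_samples)       # samples along each ray
    # Camera-frame sample points, shape (H, W, S, 3).
    pts_cam = np.stack(
        np.broadcast_arrays(xs[..., None], ys[..., None], zs[None, None, :]),
        axis=-1,
    )
    # Camera -> canonical frame: x_can = R^T x_cam (row-vector form: x @ R).
    pts_can = pts_cam @ rotation
    # Nearest-voxel lookup; samples outside the cube count as empty.
    idx = np.round((pts_can + 1.0) / 2.0 * (d - 1)).astype(int)
    inside = np.all((idx >= 0) & (idx < d), axis=-1)
    idx = np.clip(idx, 0, d - 1)
    probs = occupancy[idx[..., 0], idx[..., 1], idx[..., 2]] * inside
    # A ray is background only if every sample along it is empty.
    return 1.0 - np.prod(1.0 - probs, axis=-1)


def mask_loss(silhouette, mask, eps=1e-6):
    """Binary cross-entropy between a rendered silhouette and a 0/1 mask."""
    s = np.clip(silhouette, eps, 1.0 - eps)
    return float(-np.mean(mask * np.log(s) + (1.0 - mask) * np.log(1.0 - s)))
```

For example, a box elongated along x produces a large silhouette from the front (`rotation_from_az_el(0.0, 0.0)`) and a small one after a 90° azimuth turn, so a mask taken from one view penalizes a wrong predicted pose. In actual training the same computation would live in an autodiff framework; the product-along-the-ray rule is differentiable with respect to the occupancy values, though nearest-voxel lookup is not differentiable with respect to the camera.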
Citations
A Metaverse: Taxonomy, Components, Applications, and Open Challenges
- 01 Jan 2022
TL;DR: In this article, the authors divide the concepts and essential techniques necessary for realizing the Metaverse into three components (i.e., hardware, software, and contents), rather than taking a marketing or hardware approach, to conduct a comprehensive analysis.
RealFusion: 360° Reconstruction of Any Object from a Single Image
Luke Melas-Kyriazi, Iro Laina, Christian Rupprecht, Andrea Vedaldi
- 21 Feb 2023
TL;DR: This work takes an off-the-shelf conditional image generator based on diffusion and engineers a prompt that encourages it to “dream up” novel views of the object, then fuses the given input view, the conditional prior, and other regularizers into a final, consistent reconstruction.
SparseFusion: Distilling View-conditioned Diffusion for 3D Reconstruction
TL;DR: SparseFusion distills a 3D-consistent scene representation from a view-conditioned latent diffusion model, which is then used to recover a plausible 3D representation.
EpiGRAF: Rethinking training of 3D GANs
Ivan Sergeevich Skorokhodov, Sergey Tulyakov, Yiqun Wang, Peter Wonka
- 21 Jun 2022
TL;DR: EpiGRAF proposes a patch sampling strategy based on an annealed beta distribution to stabilize training and accelerate the convergence of a high-resolution 3D generator with state-of-the-art image quality.
SparseFusion: Distilling View-Conditioned Diffusion for 3D Reconstruction
Zhizhuo Zhou, Shubham Tulsiani
- 01 Jun 2023
TL;DR: SparseFusion unifies recent advances in neural rendering and probabilistic image generation to generate accurate and realistic 3D representations from sparse views.