PyTorchGeoNodes: Enabling Differentiable Shape Programs for 3D Shape Reconstruction

doi:10.48550/arxiv.2404.10620

Journal Article

Sinisa Stekovic, +4 more

- 16 Apr 2024

- arXiv.org

- Vol. abs/2404.10620

TL;DR: PyTorchGeoNodes enables differentiable shape programs for 3D shape reconstruction, allowing for semantic reasoning about reconstructed objects and a low memory footprint.

Abstract: We propose PyTorchGeoNodes, a differentiable module for reconstructing 3D objects from images using interpretable shape programs. In comparison to traditional CAD model retrieval methods, the use of shape programs for 3D reconstruction allows for reasoning about the semantic properties of reconstructed objects, editing, low memory footprint, etc. However, the utilization of shape programs for 3D scene understanding has been largely neglected in past works. As our main contribution, we enable gradient-based optimization by introducing a module that translates shape programs designed in Blender, for example, into efficient PyTorch code. We also provide a method that relies on PyTorchGeoNodes and is inspired by Monte Carlo Tree Search (MCTS) to jointly optimize discrete and continuous parameters of shape programs and reconstruct 3D objects for input scenes. In our experiments, we apply our algorithm to reconstruct 3D objects in the ScanNet dataset and evaluate our results against CAD model retrieval-based reconstructions. Our experiments indicate that our reconstructions match the input scenes well while enabling semantic reasoning about reconstructed objects.
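The joint optimization of discrete and continuous parameters described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the one-dimensional `cabinet_boards` program, its parameter names, and the loss are hypothetical and are not part of PyTorchGeoNodes. Exhaustive enumeration of the discrete choice replaces the MCTS-inspired search, and central finite differences stand in for autograd gradients so that the example runs in plain Python.

```python
def cabinet_boards(num_shelves, height):
    """Hypothetical shape program: z-positions of the horizontal boards
    (bottom, evenly spaced shelves, top) of a cabinet."""
    n = num_shelves + 1  # number of compartments
    return [height * i / n for i in range(n + 1)]


def chamfer_1d(a, b):
    """Symmetric mean squared nearest-neighbour distance between 1D point sets."""
    d_ab = sum(min((x - y) ** 2 for y in b) for x in a) / len(a)
    d_ba = sum(min((x - y) ** 2 for x in a) for y in b) / len(b)
    return d_ab + d_ba


def refine_height(num_shelves, target, height=1.0, lr=0.1, steps=300, eps=1e-4):
    """Gradient descent on the continuous parameter for a fixed discrete choice.
    Central finite differences are a stand-in for autograd."""
    def loss(h):
        return chamfer_1d(cabinet_boards(num_shelves, h), target)

    for _ in range(steps):
        grad = (loss(height + eps) - loss(height - eps)) / (2 * eps)
        height -= lr * grad
    return height, loss(height)


def fit_cabinet(target, shelf_options=range(0, 5)):
    """Try every discrete option (a stand-in for tree search), refine the
    continuous parameter for each, and keep the lowest-loss combination."""
    best = None
    for k in shelf_options:
        h, l = refine_height(k, target)
        if best is None or l < best[2]:
            best = (k, h, l)
    return best


# Synthetic observation generated by the same toy program.
target = cabinet_boards(num_shelves=2, height=1.8)
k, h, l = fit_cabinet(target)
print(k, round(h, 3))
```

Enumeration is only feasible here because there is a single discrete parameter with five options; with many interacting discrete parameters the number of combinations grows quickly, which is the situation a guided search is meant to address.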

Figures

Figure 1. Qualitative result on a cabinet. Our search algorithm recovers the correct parameters of the 'Cabinet' program, which include the number of shelves, the existence of doors, the size of the cabinet, and the leg parameters.

Table 4. Quantitative results on our synthetic dataset for the cabinet category. GeoCode [4] was overfitted to data from the same distribution and therefore reaches very high performance, but it still benefits from the additional refinement enabled by our PyTorchGeoNodes. Our genetic algorithm outperforms coordinate descent and performs comparably to the GeoCode baseline.

Figure 7. Qualitative results on tables. For each pair, the left image shows one of the input views and the right image shows our projection of the recovered shape parameters. Our recovered shape parameters are accurate: the dimensions of the tables vary greatly in our validation set, yet we are able to reconstruct these measurements accurately. In addition, we accurately model the existence and position of the middle support, the existence and measurements of the internal cabinet, and the shape of the top board.

Figure 6. Qualitative results on chairs. For each pair, the left image shows one of the input views and the right image shows our projection of the recovered shape parameters. The results are accurate: we accurately model the thickness of the legs; the existence, position, and measurements of the leg supports; the existence, rotations, and measurements of the star-shaped legs; and the existence and measurements of the armrests.

Figure 8. Gradient-based optimization of continuous parameters of a shape program for the sofa category. From an initial estimate of the parameters of the object, we can perform gradient descent on the parameters based on a 3D geometric loss term. In contrast to methods that directly optimize the reconstructed mesh, PyTorchGeoNodes allows optimization in the parameter space, which has several benefits. The resulting shapes in this example show that individual parameters can be scaled independently, targeting only specific parts of the shape geometry while preserving the compactness of the 3D shape.

Figure 5. Qualitative results on sofas. For each pair, the left image shows one of the input views and the right image shows our projection of the recovered shape parameters. Our results are accurate: the dimensions of the sofas vary greatly in our validation set, yet we are able to reconstruct these measurements accurately. In addition, we accurately model the existence and measurements of the armrests and of the L-extensions.
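The idea in the Figure 8 caption, gradient descent on program parameters rather than on mesh vertices, can be sketched in plain PyTorch. This is a minimal illustration under stated assumptions, not the PyTorchGeoNodes API: `box_points`, the three box parameters, and the target dimensions are hypothetical, and a brute-force Chamfer distance plays the role of the 3D geometric loss. The choice of Adam as the optimizer is also an assumption.

```python
import torch

torch.manual_seed(0)

# Fixed samples in the unit cube. Scaling them by (width, depth, height)
# acts as a tiny, hypothetical differentiable shape program.
unit_points = torch.rand(256, 3)


def box_points(params):
    """Map the three continuous parameters to a point set on/in the box."""
    return unit_points * params


def chamfer(a, b):
    """Symmetric mean squared nearest-neighbour distance between point sets."""
    d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(dim=-1)
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()


# Hypothetical observation: points sampled from a 2.0 x 0.9 x 0.8 box.
target = torch.rand(256, 3) * torch.tensor([2.0, 0.9, 0.8])

# Initial estimate of the parameters, refined by gradient descent.
params = torch.tensor([1.0, 1.0, 1.0], requires_grad=True)
optimizer = torch.optim.Adam([params], lr=0.05)

initial_loss = chamfer(box_points(params), target).item()
for _ in range(300):
    optimizer.zero_grad()
    loss = chamfer(box_points(params), target)
    loss.backward()  # gradients flow through the shape program to the parameters
    optimizer.step()
final_loss = chamfer(box_points(params), target).item()

print(params.detach(), initial_loss, final_loss)
```

Only three numbers are optimized, so the result remains a box at every step; optimizing vertex positions directly would offer no such guarantee.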

References

•Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

- 01 Jan 2015

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

138.5K

•Posted Content

ShapeNet: An Information-Rich 3D Model Repository

Angel X. Chang, +12 more

- 09 Dec 2015

- arXiv: Graphics

TL;DR: ShapeNet contains 3D models from a multitude of semantic categories, organized under the WordNet taxonomy. It is a collection of datasets providing many semantic annotations for each 3D model, such as consistent rigid alignments, parts and bilateral symmetry planes, physical sizes, and keywords, as well as other planned annotations.

4.8K

•Proceedings Article•10.1109/CVPR.2017.261

ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes

Angela Dai, +5 more

- 21 Jul 2017

TL;DR: This work introduces ScanNet, an RGB-D video dataset containing 2.5M views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations, and shows that using this data helps achieve state-of-the-art performance on several 3D scene understanding tasks.

4.7K

•Journal Article•10.1109/TCIAIG.2012.2186810

A Survey of Monte Carlo Tree Search Methods

Cameron Browne, +9 more

- 03 Feb 2012

- IEEE Transactions on Computational Intel...

TL;DR: A survey of the literature to date on Monte Carlo tree search, intended to provide a snapshot of the state of the art after the first five years of MCTS research. It outlines the core algorithm's derivation, imparts some structure on the many variations and enhancements that have been proposed, and summarizes the results from the key game and nongame domains.

3.5K

•Book Chapter•10.1007/978-3-030-01252-6_4

Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images

Nanyang Wang, +5 more

- 08 Sep 2018

TL;DR: In this paper, the authors propose an end-to-end deep learning architecture that produces a 3D shape in triangular mesh from a single color image by progressively deforming an ellipsoid, leveraging perceptual features extracted from the input image.

1.6K
