GenHeld: Generating and Editing Handheld Objects

doi:10.48550/arxiv.2406.05059

Preprint

Chun-jia Min, +1 more

- 07 Jun 2024

TL;DR: GenHeld generates and edits handheld objects from 3D hand models or 2D images. It selects an object conditioned on the hand model or image, positions and orients it to form a plausible grasp, and edits images to add or replace held objects.

Abstract: Grasping is an important human activity that has long been studied in robotics, computer vision, and cognitive science. Most existing works study grasping from the perspective of synthesizing hand poses conditioned on 3D or 2D object representations. We propose GenHeld to address the inverse problem of synthesizing held objects conditioned on a 3D hand model or a 2D image. Given a 3D model of a hand, GenHeld 3D can select a plausible held object from a large dataset using compact object representations called object codes. The selected object is then positioned and oriented to form a plausible grasp without changing the hand pose. If only a 2D hand image is available, GenHeld 2D can edit this image to add or replace a held object. GenHeld 2D operates by combining the abilities of GenHeld 3D with diffusion-based image editing. Experiments show that our method outperforms baselines and achieves high quality and plausibility of held object synthesis in both 3D and 2D.

Figures

Figure 5: GenHeld2D enables us to add or replace held objects in 2D hand images. We do this by first lifting hand images to a 3D hand and object using GenHeld3D. This is then followed by 2D keypoint projection and alignment to create a 3D guidance image that is used to edit the image.
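
The "2D keypoint projection and alignment" step named in the Figure 5 caption can be illustrated with a minimal sketch. The caption does not give the camera model or the alignment method, so everything below is an assumption: a pinhole camera with hypothetical intrinsics `focal`, `cx`, `cy`, and a least-squares fit of a uniform scale plus translation between projected and detected keypoints. The function names are illustrative and are not taken from the paper.

```python
from typing import List, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def project_points(points: List[Point3], focal: float,
                   cx: float, cy: float) -> List[Point2]:
    """Pinhole projection of camera-space 3D points (z > 0) to pixels.

    The intrinsics (focal, cx, cy) are assumed, not from the paper.
    """
    return [(focal * x / z + cx, focal * y / z + cy) for x, y, z in points]


def fit_scale_translation(src: List[Point2],
                          dst: List[Point2]) -> Tuple[float, float, float]:
    """Least-squares s, tx, ty minimizing sum |s * src + t - dst|^2."""
    n = len(src)
    msx = sum(p[0] for p in src) / n
    msy = sum(p[1] for p in src) / n
    mdx = sum(p[0] for p in dst) / n
    mdy = sum(p[1] for p in dst) / n
    # Correlate the centered point sets to recover the uniform scale.
    num = sum((s[0] - msx) * (d[0] - mdx) + (s[1] - msy) * (d[1] - mdy)
              for s, d in zip(src, dst))
    den = sum((s[0] - msx) ** 2 + (s[1] - msy) ** 2 for s in src)
    scale = num / den
    # The optimal translation maps the source centroid onto the target one.
    return scale, mdx - scale * msx, mdy - scale * msy


def apply_alignment(points: List[Point2], scale: float,
                    tx: float, ty: float) -> List[Point2]:
    """Apply the fitted scale and translation to 2D points."""
    return [(scale * x + tx, scale * y + ty) for x, y in points]
```

In this sketch, the projected keypoints of the 3D hand would be passed as `src` and the keypoints detected in the input image as `dst`; the fitted transform would then place the rendered guidance image over the original hand.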

Figure 13: Our Object Shape code can also work with the YCB [10] objects.

Figure 6: Ablation study of the Object Selection Network with YCB [10] objects. We observe that object selection is an important step to ensure the plausibility of generated held objects.

Table 1: Grasping quality comparison. GenHeld3D outperforms existing work.

Figure 15: Non-grasping hand pose rejection algorithm (Sec. C). If the radius of the maximum inscribed ball is below a threshold, we reject that hand pose.
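
The threshold rule in the Figure 15 caption can be sketched as follows. The caption states only that a pose is rejected when the maximum inscribed ball radius falls below a threshold. How the ball is found is not described here, so this sketch assumes a brute-force search over a hypothetical set of candidate centers sampled inside the region enclosed by the hand. The names `candidate_centers` and `min_radius` are illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def max_inscribed_radius(hand_points: Sequence[Point],
                         candidate_centers: Sequence[Point]) -> float:
    """Largest ball radius, over the candidates, that touches no hand point.

    For each candidate center the radius is its distance to the nearest
    hand surface point. Candidates are assumed to lie inside the region
    enclosed by the hand (an assumption of this sketch).
    """
    return max(min(math.dist(center, p) for p in hand_points)
               for center in candidate_centers)


def is_grasping_pose(hand_points: Sequence[Point],
                     candidate_centers: Sequence[Point],
                     min_radius: float) -> bool:
    """Reject (return False) when the inscribed radius is below the threshold."""
    return max_inscribed_radius(hand_points, candidate_centers) >= min_radius
```

A flat or tightly closed hand leaves little free space between palm and fingers, so its inscribed radius is small and the pose is filtered out before object selection.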

Figure 14: Details of the Object Selection network.

References

•Proceedings Article

Attention is All you Need

Ashish Vaswani, +7 more

- 12 Jun 2017

TL;DR: This paper proposed a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely and achieved state-of-the-art performance on English-to-French translation.

94.2K

Principal Component Analysis.

Heng Tao Shen

- 01 Jan 2009

TL;DR: The paper focuses on the use of principal component analysis in typical chemometric areas but the results are generally applicable.

15.8K

•Posted Content

Denoising Diffusion Probabilistic Models

Jonathan Ho, +2 more

- 19 Jun 2020

- arXiv: Learning

TL;DR: High quality image synthesis results are presented using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics, which naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding.

11.7K

•Posted Content

GANs Trained by a Two Time-Scale Update Rule Converge to a Nash Equilibrium

Martin Heusel, +5 more

- 26 Jun 2017

- arXiv: Learning

TL;DR: In this article, a two time-scale update rule (TTUR) was proposed for training GANs with stochastic gradient descent on arbitrary GAN loss functions, which has an individual learning rate for both the discriminator and the generator.

9.2K

•Posted Content

ShapeNet: An Information-Rich 3D Model Repository

Angel X. Chang, +12 more

- 09 Dec 2015

- arXiv: Graphics

TL;DR: ShapeNet contains 3D models from a multitude of semantic categories and organizes them under the WordNet taxonomy, a collection of datasets providing many semantic annotations for each 3D model such as consistent rigid alignments, parts and bilateral symmetry planes, physical sizes, keywords, as well as other planned annotations.

4.8K

...
