Preprint, DOI: 10.48550/arxiv.2406.05059
GenHeld: Generating and Editing Handheld Objects
Chun-jia Min, Srinath Sridhar
- 07 Jun 2024
TL;DR: GenHeld generates and edits handheld objects from 3D hand models or 2D images. It selects an object based on the hand model or image, positions and orients it to form a plausible grasp, and edits images to add or replace held objects.
Abstract: Grasping is an important human activity that has long been studied in robotics, computer vision, and cognitive science. Most existing works study grasping from the perspective of synthesizing hand poses conditioned on 3D or 2D object representations. We propose GenHeld to address the inverse problem of synthesizing held objects conditioned on a 3D hand model or a 2D image. Given a 3D model of a hand, GenHeld 3D can select a plausible held object from a large dataset using compact object representations called object codes. The selected object is then positioned and oriented to form a plausible grasp without changing the hand pose. If only a 2D hand image is available, GenHeld 2D can edit this image to add or replace a held object. GenHeld 2D operates by combining the abilities of GenHeld 3D with diffusion-based image editing. Experiments show that GenHeld outperforms baselines and synthesizes high-quality, plausible held objects in both 3D and 2D.
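The abstract does not say how object codes are used for selection. As a minimal sketch, assume a network has already predicted an object code from the hand, and that selection reduces to retrieving the dataset object whose code is nearest in Euclidean distance. The function name `select_object`, the distance metric, and the toy 2D codes are all assumptions made for illustration, not the authors' implementation.

```python
import math


def select_object(predicted_code, dataset_codes):
    """Return (index, distance) of the dataset code nearest to predicted_code.

    Hypothetical helper: assumes object codes are fixed-length vectors and
    that Euclidean distance is a meaningful similarity measure between them.
    """
    best_index, best_dist = -1, math.inf
    for i, code in enumerate(dataset_codes):
        d = math.dist(predicted_code, code)  # Euclidean distance
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist


# Toy example with made-up 2D codes for three dataset objects.
dataset_codes = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]]
index, dist = select_object([0.9, 1.2], dataset_codes)
print(index)
```

With a large dataset, the linear scan would normally be replaced by a vectorized or indexed nearest-neighbor search. The retrieval logic stays the same.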
Figures

Figure 5: GenHeld2D enables us to add or replace held objects in 2D hand images. We first lift the hand image to a 3D hand and object using GenHeld3D. This is followed by 2D keypoint projection and alignment to create a 3D guidance image that is used to edit the image.
![Figure 13: Our Object Shape code can also work with the YCB [10] objects.](/figures/figure13-1-6wkcifv5f3i9.png)
Figure 13: Our Object Shape code can also work with the YCB [10] objects.
![Figure 6: Ablation study of the Object Selection Network with YCB [10] objects. We observe that object selection is an important step to ensure the plausibility of generated held objects.](/figures/figure6-1-rx64z2hlfjh7.png)
Figure 6: Ablation study of the Object Selection Network with YCB [10] objects. We observe that object selection is an important step to ensure the plausibility of generated held objects. 
Table 1: Grasping quality comparison. GenHeld3D outperforms existing work. 
Figure 15: Non-grasping hand pose rejection algorithm (Sec. C). If the radius of the maximum inscribed ball is below a threshold, we reject that hand pose.
Figure 14: Details of the Object Selection network.
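The rejection rule in the Figure 15 caption can be sketched as follows. The caption gives only the criterion, so everything else here is an assumption: the hand is represented by surface points, the caller supplies candidate ball centers sampled inside the region enclosed by the hand, and the inscribed radius at a center is taken as the distance to the nearest hand point. The names `max_inscribed_ball_radius` and `is_grasping_pose` are hypothetical.

```python
import math


def max_inscribed_ball_radius(hand_points, candidate_centers):
    """Largest radius of a ball, centered at a candidate, containing no hand point.

    Assumes candidate_centers were sampled inside the region enclosed by the
    hand; centers far outside the hand would give meaninglessly large radii.
    """
    return max(
        min(math.dist(center, point) for point in hand_points)
        for center in candidate_centers
    )


def is_grasping_pose(hand_points, candidate_centers, min_radius):
    """Keep the pose only if the largest empty ball reaches the threshold."""
    return max_inscribed_ball_radius(hand_points, candidate_centers) >= min_radius


# Toy hand: six points at unit distance from the origin (arbitrary units).
hand_points = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
candidate_centers = [(0, 0, 0), (0.5, 0, 0)]
print(is_grasping_pose(hand_points, candidate_centers, min_radius=0.8))
```

Intuitively, a flat or tightly closed hand leaves no room for an object, so its largest empty ball is small and the pose is rejected.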
References
•Proceedings Article
Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
- 12 Jun 2017
TL;DR: This paper proposed a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely and achieved state-of-the-art performance on English-to-French translation.
Principal Component Analysis.
Heng Tao Shen
- 01 Jan 2009
TL;DR: The paper focuses on the use of principal component analysis in typical chemometric areas but the results are generally applicable.
•Posted Content
Denoising Diffusion Probabilistic Models
TL;DR: High quality image synthesis results are presented using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics, which naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding.
•Posted Content
GANs Trained by a Two Time-Scale Update Rule Converge to a Nash Equilibrium
Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, Günter Klambauer, Sepp Hochreiter
TL;DR: In this article, a two time-scale update rule (TTUR) was proposed for training GANs with stochastic gradient descent on arbitrary GAN loss functions, which has an individual learning rate for both the discriminator and the generator.
•Posted Content
ShapeNet: An Information-Rich 3D Model Repository
Angel X. Chang, Thomas Funkhouser, Leonidas J. Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, Jianxiong Xiao, Li Yi, Fisher Yu
TL;DR: ShapeNet contains 3D models from a multitude of semantic categories and organizes them under the WordNet taxonomy, a collection of datasets providing many semantic annotations for each 3D model such as consistent rigid alignments, parts and bilateral symmetry planes, physical sizes, keywords, as well as other planned annotations.