Implicit Autoencoder for Point Cloud Self-supervised Representation Learning
Siming Yan, Zhenpei Yang, Haoxiang Li, Li Guan, Hao Kang, Gang Hua, Qixing Huang
- 03 Jan 2022
TL;DR: The Implicit Autoencoder (IAE) is a simple yet effective method that addresses the challenge of autoencoding point clouds by replacing the point cloud decoder with an implicit decoder. The implicit decoder outputs a continuous representation that is shared among different point cloud samplings of the same model.
Abstract: This paper advocates the use of implicit surface representation in autoencoder-based self-supervised 3D representation learning. The most popular and accessible 3D representation, i.e., point clouds, involves discrete samples of the underlying continuous 3D surface. This discretization process introduces sampling variations on the 3D shape, making it challenging to develop transferable knowledge of the true 3D geometry. In the standard autoencoding paradigm, the encoder is compelled to encode not only the 3D geometry but also information on the specific discrete sampling of the 3D shape into the latent code. This is because the point cloud reconstructed by the decoder is considered unacceptable unless there is a perfect mapping between the original and the reconstructed point clouds. This paper introduces the Implicit AutoEncoder (IAE), a simple yet effective method that addresses the sampling variation issue by replacing the commonly-used point-cloud decoder with an implicit decoder. The implicit decoder reconstructs a continuous representation of the 3D shape, independent of the imperfections in the discrete samples. Extensive experiments demonstrate that the proposed IAE achieves state-of-the-art performance across various self-supervised learning benchmarks.
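The sampling variation issue described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the unit sphere, the signed distance function as the implicit target, the Chamfer distance as the point-cloud reconstruction measure, and the helper names `sample_sphere`, `chamfer_distance`, and `sphere_sdf` are all assumptions chosen for illustration. The sketch shows that two point clouds drawn from the same surface differ as explicit reconstruction targets, while an implicit target evaluated at fixed query points depends only on the shape.

```python
import numpy as np


def sample_sphere(n, seed):
    """Draw n points uniformly from the surface of the unit sphere."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)


def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean nearest-neighbor distance in both directions."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # shape (len(a), len(b))
    return d.min(axis=1).mean() + d.min(axis=0).mean()


def sphere_sdf(queries):
    """Signed distance to the unit sphere: negative inside, positive outside."""
    return np.linalg.norm(queries, axis=1) - 1.0


# Two different discrete samplings of the same continuous shape.
cloud_a = sample_sphere(256, seed=0)
cloud_b = sample_sphere(256, seed=1)

# Explicit target: a point-cloud decoder trained on cloud_a is penalized
# for reproducing cloud_b, even though both describe the same sphere.
explicit_gap = chamfer_distance(cloud_a, cloud_b)

# Implicit target (assumed here to be a signed distance function): the
# supervision at fixed query points is a property of the shape, so it is
# the same whichever sampling was fed to the encoder.
queries = np.random.default_rng(2).uniform(-1.5, 1.5, size=(512, 3))
target_for_a = sphere_sdf(queries)
target_for_b = sphere_sdf(queries)
implicit_gap = np.abs(target_for_a - target_for_b).max()

print(f"Chamfer distance between the two samplings: {explicit_gap:.4f}")
print(f"Max difference between implicit targets:    {implicit_gap:.4f}")
```

The first number is strictly positive, while the second is exactly zero by construction. That is the intuition behind swapping the decoder: the latent code no longer has to account for which points happened to be sampled.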
Figures

Table 1: Classification results on ScanObjectNN and ModelNet40 datasets. The number of model parameters (#Params), GFLOPS, and overall accuracy (%) are reported. The FULL section presents the results of fine-tuning the pre-trained models. The LINEAR section reports the results of training a linear SVM. To establish a fair comparison, we present the performance of IAE using two distinct architectures: DGCNN and the Transformer-like Point-M2AE. These architectures are commonly adopted in prior methods. The methods within the same block share the same architecture. ‘w/o mesh’ indicates that pre-training does not involve mesh data.
Figure 1: The Sampling Variation Problem. Given a continuous 3D shape, there are infinitely many ways to sample a point cloud. The proposed Implicit AutoEncoder (IAE) learns a latent representation of the true 3D geometry independent of the specific discrete sampling process. By alleviating the sampling variation problem, IAE improves existing point-cloud self-supervised representation learning methods in various downstream tasks.
![Table 3](/figures/table3-1-2rv5y2aw8rqp.png)
Table 3: Semantic Segmentation Results on S3DIS [3] 6-Fold. We show Overall Accuracy (OA) and mean Intersection over Union (mIoU) across six folds.
![Table 2](/figures/table2-1-4d5hfg4nlj1b.png)
Table 2: 3D Object Detection Results. We fine-tune our pre-trained model on ScanNetV2 [11] and SUN RGB-D [56] validation sets using VoteNet [45] and CAGroup3D [64]. We show mean Average Precision (mAP) across all semantic classes with 3D IoU thresholds of 0.25 and 0.5. Methods in the second section denote self-supervised methods.
![Table 4](/figures/table4-1-4pzq4csts603.png)
Table 4: Cross-Domain Generalizability between ShapeNet [5] and ScanNet [11]. For the 3D object detection task, we report mAP at IoU=0.25 on the SUN RGB-D dataset [56]. For ModelNet40 [68] linear evaluation, we report the classification accuracy.
Figure 3: Label efficiency training. We pre-train our model on ScanNet and then fine-tune on ScanNet and SUN RGB-D separately. During fine-tuning, different percentages of labeled data are used. Our pre-trained model outperforms training from scratch and achieves nearly the same result with only 60% of the labeled data.
Citations
Masked Autoencoders for Point Cloud Self-supervised Learning
Yatian Pang, Wenxiao Wang, Francis E. H. Tay, Wei Li, Yonghong Tian, Li Yuan
- 13 Mar 2022
TL;DR: A simple architecture based entirely on standard Transformers can surpass dedicated Transformer models trained with supervised learning, which suggests that unified architectures from language and images can feasibly be applied to point clouds.
Masked Discrimination for Self-Supervised Learning on Point Clouds
Haotian Liu, Mu Cai, Yong Jae Lee
- 21 Mar 2022
TL;DR: The key idea is to represent the point cloud as discrete occupancy values, and perform simple binary classification between masked object points and sampled noise points as the proxy task, which is robust to the point sampling variance in point clouds, and facilitates learning rich representations.
Segment Any Point Cloud Sequences by Distilling Vision Foundation Models
TL;DR: Seal is a novel framework that harnesses vision foundation models (VFMs) for segmenting diverse automotive point cloud sequences. It achieves significant performance gains over existing methods across 20 different few-shot fine-tuning tasks on all eleven tested point cloud datasets.
Ponder: Point Cloud Pre-training via Neural Rendering
TL;DR: This paper proposes self-supervised learning of point cloud representations through differentiable neural rendering, motivated by the observation that informative point cloud features should encode rich geometry and appearance cues and be able to render realistic images.
PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm
Haoyi Zhu, Honghui Yang, Xiaoyang Wu, Di Huang, Sha Zhang, Xianglong He, Tong He, Hengshuang Zhao, Chunhua Shen, Yu Qiao, Wanli Ouyang
TL;DR: A novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation by differentiable neural rendering is introduced, thereby establishing a pathway to 3D foundational models.
References
•Journal Article
Visualizing Data using t-SNE
TL;DR: t-SNE is a technique that visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map. It is a variation of Stochastic Neighbor Embedding that is much easier to optimize and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
Are we ready for autonomous driving? The KITTI vision benchmark suite
Andreas Geiger, Philip Lenz, Raquel Urtasun
- 16 Jun 2012
TL;DR: The autonomous driving platform is used to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry/SLAM and 3D object detection, revealing that methods ranking high on established datasets such as Middlebury perform below average when being moved outside the laboratory to the real world.
•Posted Content
A Simple Framework for Contrastive Learning of Visual Representations
TL;DR: It is shown that composition of data augmentations plays a critical role in defining effective predictive tasks, and introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning.
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
Charles R. Qi, Hao Su, Kaichun Mo, Leonidas J. Guibas
- 21 Jul 2017
TL;DR: This paper designs a novel type of neural network that directly consumes point clouds, which well respects the permutation invariance of points in the input and provides a unified architecture for applications ranging from object classification, part segmentation, to scene semantic parsing.
Representation Learning: A Review and New Perspectives
TL;DR: Recent work in the area of unsupervised feature learning and deep learning is reviewed, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks.