Implicit Autoencoder for Point Cloud Self-supervised Representation Learning

- 03 Jan 2022

53

TL;DR: This paper introduces the Implicit Autoencoder (IAE), a simple yet effective method that addresses the challenge of autoencoding point clouds by replacing the point cloud decoder with an implicit decoder. The implicit decoder outputs a continuous representation that is shared among different point cloud samplings of the same model.

Figures

Table 1: Classification results on the ScanObjectNN and ModelNet40 datasets. The number of model parameters (#Params), GFLOPS, and overall accuracy (%) are reported. The FULL section presents the results of fine-tuning the pre-trained models. The LINEAR section reports the results of training a linear SVM. To establish a fair comparison, we present the performance of IAE using two distinct architectures: DGCNN and the Transformer-like Point-M2AE. These architectures are commonly adopted in prior methods. The methods within the same block share the same architecture. ‘w/o mesh’ indicates that pre-training does not involve mesh data.

Figure 1: The Sampling Variation Problem. Given a continuous 3D shape, there are infinitely many ways to sample a point cloud. The proposed Implicit AutoEncoder (IAE) learns a latent representation of the true 3D geometry independent of the specific discrete sampling process. By alleviating the sampling variation problem, IAE improves existing point-cloud self-supervised representation learning methods in various downstream tasks.
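The sampling variation problem described in Figure 1 can be illustrated with a small sketch. This is not the paper's implementation: the unit sphere, the helper names (`sample_sphere`, `chamfer`, `sphere_sdf`), and the use of a signed distance function as the implicit target are all assumptions chosen for illustration. Two independent samplings of the same surface differ as point sets, so an explicit point-cloud reconstruction target changes with the sampling, while the implicit target depends only on the underlying shape.

```python
import math
import random


def sample_sphere(n, rng):
    """Draw n points uniformly from the surface of the unit sphere."""
    pts = []
    for _ in range(n):
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        norm = math.sqrt(x * x + y * y + z * z)
        pts.append((x / norm, y / norm, z / norm))
    return pts


def chamfer(a, b):
    """Symmetric Chamfer distance: mean nearest-neighbour distance in both directions."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)


def sphere_sdf(p):
    """Signed distance to the unit sphere (hypothetical implicit target)."""
    return math.sqrt(sum(c * c for c in p)) - 1.0


rng = random.Random(0)
cloud_a = sample_sphere(256, rng)  # one sampling of the shape
cloud_b = sample_sphere(256, rng)  # another sampling of the same shape

# Explicit targets disagree: the two clouds are different point sets.
explicit_gap = chamfer(cloud_a, cloud_b)

# Implicit targets agree: the function is defined by the shape, not by the sampling,
# so every surface point from either cloud lies on its zero level set.
max_surface_error = max(abs(sphere_sdf(p)) for p in cloud_a + cloud_b)
```

Here `explicit_gap` is strictly positive even though both clouds describe the same sphere, whereas `max_surface_error` is zero up to floating-point rounding.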

Table 3: Semantic Segmentation Results on S3DIS [3] 6-Fold. We show Overall Accuracy (OA) and mean Intersection over Union (mIoU) across six folds.
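For readers unfamiliar with the two metrics named in the Table 3 caption, the following is a minimal sketch of how OA and mIoU are commonly computed from a confusion matrix. The function names and the toy 3-class matrix are hypothetical, and how the paper aggregates scores across the six folds is not specified here.

```python
def overall_accuracy(conf):
    """OA: fraction of points whose predicted class equals the true class.

    conf[i][j] counts points of true class i predicted as class j.
    """
    total = sum(sum(row) for row in conf)
    correct = sum(conf[i][i] for i in range(len(conf)))
    return correct / total


def mean_iou(conf):
    """mIoU: per-class TP / (TP + FP + FN), averaged over classes that occur."""
    k = len(conf)
    ious = []
    for c in range(k):
        tp = conf[c][c]
        fp = sum(conf[r][c] for r in range(k)) - tp  # predicted c, true class differs
        fn = sum(conf[c]) - tp                       # true c, predicted otherwise
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both labels and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Toy confusion matrix for three classes (illustrative numbers only).
conf = [
    [8, 2, 0],
    [1, 6, 1],
    [0, 1, 9],
]
oa = overall_accuracy(conf)   # 23 correct out of 28 points
miou = mean_iou(conf)         # mean of 8/11, 6/11 and 9/11
```

mIoU penalizes both false positives and false negatives per class, so it is typically lower than OA and less dominated by frequent classes.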

Table 2: 3D Object Detection Results. We fine-tune our pre-trained model on ScanNetV2 [11] and SUN RGB-D [56] validation sets using VoteNet [45] and CAGroup3D [64]. We show mean Average Precision (mAP) across all semantic classes with 3D IoU thresholds of 0.25 and 0.5. Methods in the second section denote self-supervised methods.
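The 3D IoU thresholds in the Table 2 caption decide whether a predicted box counts as a match for a ground-truth box. The sketch below assumes axis-aligned boxes given as (min corner, max corner) pairs; this is a simplification for illustration, since detection benchmarks may use oriented boxes, and the function names are hypothetical.

```python
def box_volume(box):
    """Volume of an axis-aligned box ((x0, y0, z0), (x1, y1, z1)); zero if degenerate."""
    lo, hi = box
    vol = 1.0
    for i in range(3):
        vol *= max(0.0, hi[i] - lo[i])
    return vol


def iou_3d(a, b):
    """Intersection over union of two axis-aligned 3D boxes."""
    lo = tuple(max(a[0][i], b[0][i]) for i in range(3))
    hi = tuple(min(a[1][i], b[1][i]) for i in range(3))
    inter = box_volume((lo, hi))  # zero when the boxes do not overlap
    union = box_volume(a) + box_volume(b) - inter
    return inter / union if union > 0 else 0.0


gt = ((0.0, 0.0, 0.0), (2.0, 2.0, 2.0))
pred = ((1.0, 0.0, 0.0), (3.0, 2.0, 2.0))  # same size, shifted by half its width

iou = iou_3d(gt, pred)       # intersection 4, union 12
match_at_025 = iou >= 0.25   # counts as a match at the looser threshold
match_at_05 = iou >= 0.5     # does not count at the stricter threshold
```

The same prediction can therefore be a true positive at IoU 0.25 and a false positive at IoU 0.5, which is why mAP at the 0.5 threshold is the stricter of the two reported numbers.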

Table 4: Cross-Domain Generalizability between ShapeNet [5] and ScanNet [11]. For the 3D object detection task, we report mAP at IoU=0.25 on the SUN RGB-D dataset [56]. For ModelNet40 [68] linear evaluation, we report the classification accuracy.

Figure 3: Label efficiency training. We pre-train our model on ScanNet and then fine-tune on ScanNet and SUN RGB-D separately. During fine-tuning, different percentages of labeled data are used. Our pre-trained model outperforms training from scratch and achieves nearly the same result with only 60% labeled data.

Citations

Proceedings Article•10.48550/arXiv.2203.06604

Masked Autoencoders for Point Cloud Self-supervised Learning

Yatian Pang, +5 more

- 13 Mar 2022

TL;DR: A simple architecture based entirely on standard Transformers can surpass dedicated Transformer models trained with supervised learning, which suggests that unified architectures from language and images can feasibly be applied to point clouds.

291

Proceedings Article•10.48550/arXiv.2203.11183

Masked Discrimination for Self-Supervised Learning on Point Clouds

Haotian Liu, +2 more

- 21 Mar 2022

TL;DR: The key idea is to represent the point cloud as discrete occupancy values, and perform simple binary classification between masked object points and sampled noise points as the proxy task, which is robust to the point sampling variance in point clouds, and facilitates learning rich representations.

109

Journal Article•10.48550/arXiv.2306.09347

Segment Any Point Cloud Sequences by Distilling Vision Foundation Models

You-Chen Liu, +7 more

- 15 Jun 2023

- arXiv.org

TL;DR: Seal is a novel framework that harnesses vision foundation models (VFMs) for segmenting diverse automotive point cloud sequences. It achieves significant performance gains over existing methods across 20 different few-shot fine-tuning tasks on all eleven tested point cloud datasets.

42

Journal Article•10.48550/arXiv.2301.00157

Ponder: Point Cloud Pre-training via Neural Rendering

Di Huang, +4 more

- 31 Dec 2022

- arXiv.org

TL;DR: This paper proposes self-supervised learning of point cloud representations through differentiable neural rendering, motivated by the observation that informative point cloud features should encode rich geometry and appearance cues and be able to render realistic images.

29

Journal Article•10.48550/arxiv.2310.08586

PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm

Haoyi Zhu, +10 more

- 12 Oct 2023

- arXiv.org

TL;DR: A novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation by differentiable neural rendering is introduced, thereby establishing a pathway to 3D foundational models.

22

References

•Journal Article

Visualizing Data using t-SNE

Laurens van der Maaten, +1 more

- 01 Jan 2008

- Journal of Machine Learning Research

TL;DR: A new technique called t-SNE that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map, a variation of Stochastic Neighbor Embedding that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.

45.8K

Proceedings Article•10.1109/CVPR.2012.6248074

Are we ready for autonomous driving? The KITTI vision benchmark suite

Andreas Geiger, +2 more

- 16 Jun 2012

TL;DR: The autonomous driving platform is used to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry/SLAM and 3D object detection, revealing that methods ranking high on established datasets such as Middlebury perform below average when being moved outside the laboratory to the real world.

16.3K

•Posted Content

A Simple Framework for Contrastive Learning of Visual Representations

Ting Chen, +3 more

- 13 Feb 2020

- arXiv: Learning

TL;DR: It is shown that composition of data augmentations plays a critical role in defining effective predictive tasks, and introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning.

16.3K

•Proceedings Article•10.1109/CVPR.2017.16

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

Charles R. Qi, +3 more

- 21 Jul 2017

TL;DR: This paper designs a novel type of neural network that directly consumes point clouds, which well respects the permutation invariance of points in the input and provides a unified architecture for applications ranging from object classification, part segmentation, to scene semantic parsing.

15.7K

•Journal Article•10.1109/TPAMI.2013.50

Representation Learning: A Review and New Perspectives

Yoshua Bengio, +2 more

- 01 Aug 2013

- IEEE Transactions on Pattern Analysis an...

TL;DR: Recent work in the area of unsupervised feature learning and deep learning is reviewed, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks.

14.3K
