Modeling Point Clouds With Self-Attention and Gumbel Subset Sampling

doi:10.1109/CVPR.2019.00344

Open Access • Proceedings Article • 10.1109/CVPR.2019.00344

Jiancheng Yang, +6 more

- 01 Jun 2019

- pp 3323-3332

530

TL;DR: This work develops Point Attention Transformers (PATs), using a parameter-efficient Group Shuffle Attention (GSA) to replace the costly Multi-Head Attention, and proposes an end-to-end learnable and task-agnostic sampling operation, named Gumbel Subset Sampling (GSS), to select a representative subset of input points.

Abstract: Geometric deep learning is increasingly important thanks to the popularity of 3D sensors. Inspired by recent advances in the NLP domain, the self-attention transformer is introduced to consume point clouds. We develop Point Attention Transformers (PATs), using a parameter-efficient Group Shuffle Attention (GSA) to replace the costly Multi-Head Attention. We demonstrate its ability to process size-varying inputs, and prove its permutation equivariance. Moreover, prior work uses heuristics that depend on the input data (e.g., Furthest Point Sampling) to hierarchically select subsets of input points. We therefore propose, for the first time, an end-to-end learnable and task-agnostic sampling operation, named Gumbel Subset Sampling (GSS), to select a representative subset of input points. Equipped with Gumbel-Softmax, it produces a "soft" continuous subset in the training phase, and a "hard" discrete subset in the test phase. By selecting representative subsets in a hierarchical fashion, the networks learn a stronger representation of the input sets at lower computation cost. Experiments on classification and segmentation benchmarks show the effectiveness and efficiency of our methods. Furthermore, we propose a novel application, processing event camera streams as point clouds, and achieve state-of-the-art performance on the DVS128 Gesture Dataset.
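The "soft" versus "hard" selection described in the abstract can be illustrated with a minimal sketch of the Gumbel-Softmax trick in plain Python. This is not the paper's GSS layer: the per-point `scores`, the temperature `tau`, and the helper names (`gumbel_noise`, `gumbel_softmax`, `select_point`) are assumptions made for illustration, and only a single output point is selected rather than a full subset.

```python
import math
import random


def gumbel_noise(rng):
    """Draw one sample from the standard Gumbel(0, 1) distribution."""
    u = rng.random()
    # Clamp away from 0 and 1 so both logarithms stay finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(scores, tau, rng, hard):
    """Return selection weights over the input points for one output slot."""
    perturbed = [(s + gumbel_noise(rng)) / tau for s in scores]
    if hard:
        # Test-phase style: a one-hot weight vector at the argmax.
        best = max(range(len(perturbed)), key=perturbed.__getitem__)
        return [1.0 if i == best else 0.0 for i in range(len(perturbed))]
    # Training-phase style: continuous weights that sum to 1.
    return softmax(perturbed)


def select_point(points, weights):
    """Weighted combination of the input points."""
    dim = len(points[0])
    return [sum(w * p[d] for w, p in zip(weights, points)) for d in range(dim)]


# Hypothetical toy input: four 3D points with made-up selection scores.
points = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
scores = [0.1, 2.0, 0.5, 1.0]
rng = random.Random(0)

soft_w = gumbel_softmax(scores, tau=1.0, rng=rng, hard=False)
hard_w = gumbel_softmax(scores, tau=1.0, rng=rng, hard=True)
soft_point = select_point(points, soft_w)  # a blend of all input points
hard_point = select_point(points, hard_w)  # exactly one of the input points
```

In the soft case the selected "point" is a convex combination of all inputs, which keeps the operation differentiable with respect to the scores; in the hard case it coincides with one actual input point. Lowering `tau` pushes the soft weights closer to one-hot.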

Citations

10.1609/aaai.v35i2.16232

Boundary-Aware Geometric Encoding for Semantic Segmentation of Point Clouds

Jingyu Gong, +6 more

TL;DR: This paper proposes a boundary-aware geometric encoding method for 3D point cloud segmentation, incorporating a boundary prediction module and geometric convolution operation to improve feature extraction and aggregation, achieving state-of-the-art performance on ScanNet v2 and S3DIS benchmarks.

Journal Article•10.1109/LRA.2021.3097268

Geometry Guided Network for Point Cloud Registration

Taewon Min, +2 more

- 14 Jul 2021

TL;DR: The Geometry Guided Network (G$^2$Net) for point cloud registration uses spherical positional encoding and a geometry consistency loss to learn globally unique point features by assigning global geometric positional information to irregular 3D points.

•Journal Article•10.1109/TIP.2020.3019925

Point2SpatialCapsule: Aggregating Features and Spatial Relationships of Local Regions on Point Clouds Using Spatial-Aware Capsules

Xin Wen, +3 more

- 07 Sep 2020

- IEEE Transactions on Image Processing

TL;DR: A novel deep learning network for aggregating features and spatial relationships of local regions on point clouds, which aims to learn more discriminative shape representation and outperforms the state-of-the-art methods in the 3D shape classification, retrieval and segmentation tasks under the well-known ModelNet and ShapeNet datasets.

Journal Article•10.1109/tmm.2023.3304896

Robust Geometry-Dependent Attack for 3D Point Clouds

Dai Zong Liu, +2 more

- IEEE Transactions on Multimedia

TL;DR: A novel Geometry-Dependent Attack (GDA), which aims to generate more robust adversarial point clouds with lower perturbation costs by capturing and preserving the geometry-guided topology information.

Journal Article•10.1007/s11831-024-10108-4

The Applications of 3D Input Data and Scalability Element by Transformer Based Methods: A Review

Abubakar Sulaiman Gezawa, +3 more

- 23 Apr 2024

- Archives of Computational Methods in Eng...

References

•Proceedings Article•10.1109/CVPR.2016.90

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

- 27 Jun 2016

TL;DR: The authors propose a residual learning framework that eases the training of networks substantially deeper than those used previously; it won 1st place in the ILSVRC 2015 classification task.

198.7K

•Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

- 01 Jan 2015

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

138.5K

•Proceedings Article

Attention is All you Need

Ashish Vaswani, +7 more

- 12 Jun 2017

TL;DR: This paper proposes a simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.

94.2K

•Posted Content

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

- 22 Dec 2014

- arXiv: Learning

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.

82.5K

•Posted Content

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, +3 more

- 11 Oct 2018

- arXiv: Computation and Language

TL;DR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

81.7K

Related Papers (5)

PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

Dynamic Graph CNN for Learning on Point Clouds

3D ShapeNets: A deep representation for volumetric shapes

PointConv: Deep Convolutional Networks on 3D Point Clouds