TL;DR: This paper proposes EWS, an extremely weakly supervised framework for binary semantic image segmentation that uses one-pixel foreground annotations to achieve competitive results at low computational cost, without background annotations or manual tuning of the contrastive loss threshold.
Abstract:
• Binary segmentation with sparse one-pixel annotations, even a single one per dataset.
• Operates without requiring background annotations.
• Novel contrastive loss using class-of-interest one-pixel annotations.
• Dynamic computation of the contrastive loss hyperparameter based on image features.
Despite recent advances, Unsupervised Semantic Segmentation (USS) methods still lag significantly behind supervised approaches, particularly in binary semantic segmentation. This gap arises because, without supervision, USS methods struggle to distinguish foreground from background image regions, especially when the foreground contains small or uncommon objects. Our proposed Extremely Weakly Supervised Binary Semantic Segmentation (EWS) framework addresses this issue. EWS requires only minimal supervision: a small set of one-pixel annotations, each explicitly belonging to the foreground class, for the entire image dataset. Our approach leverages these one-pixel annotations and employs two contrastive losses to map vision transformer features into well-separated foreground and background feature clusters. Additionally, we propose a novel loss function that eliminates the need to tune the contrastive loss threshold by computing it dynamically from the similarity between the input image features. Even with a single one-pixel annotation, EWS achieves competitive results in binary segmentation tasks while maintaining low computational cost, making it an efficient solution for critical segmentation applications.
GitHub Repo: https://github.com/matJTzimas/EWS
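The idea of separating foreground from background features around a single annotated pixel, with a threshold derived from the image's own feature similarities, can be illustrated with a minimal sketch. This is not the EWS loss or the authors' code: the abstract does not give the threshold formula, so the mean cosine similarity used here is a stand-in assumption, and the names `cosine`, `dynamic_threshold`, and `split_by_prototype` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def dynamic_threshold(sims):
    # ASSUMPTION: the mean similarity stands in for the paper's
    # dynamically computed threshold, whose formula is not in the abstract.
    return sum(sims) / len(sims)

def split_by_prototype(features, annotated_index):
    """Label each patch feature as foreground (True) or background (False)
    by comparing it with the feature at the annotated pixel."""
    prototype = features[annotated_index]
    sims = [cosine(f, prototype) for f in features]
    tau = dynamic_threshold(sims)
    mask = [s > tau for s in sims]
    return mask, tau

# Toy patch features; index 0 plays the role of the one-pixel annotation.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
mask, tau = split_by_prototype(features, annotated_index=0)
print(mask, round(tau, 3))
```

In this toy case the two patches that point in the same direction as the annotated feature fall above the threshold and the other two fall below it, with no manually chosen cut-off. A training loss would use such a split to pull features together or push them apart rather than to produce a mask directly.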
TL;DR: OctMamba is a unified framework for point cloud geometry compression that jointly models spatial, channel, and topological redundancies with linear complexity, outperforming transformer-based and global Mamba baselines and achieving state-of-the-art performance on LiDAR and dynamic human point cloud benchmarks.
Abstract:
• Jointly models spatial, channel, and topological redundancies, moving beyond conventional spatial-only designs.
• Embeds Mamba layers locally within specialized subcomponents instead of using them as a global backbone, enabling structured context modeling.
• Achieves efficient long-range modeling with linear complexity, yielding a smaller model and faster decoding while outperforming baselines.
Existing learned point cloud compression frameworks face two major limitations: (1) they focus almost exclusively on spatial redundancy, and (2) they rely on architectures built around local-global transformers or global Mamba blocks. Transformers incur quadratic complexity, while global Mamba lacks the granularity to capture structured correlations across multiple dimensions. We propose OctMamba, the first unified framework to jointly exploit spatial, channel, and topological redundancies, the latter two being dimensions previously overlooked in point cloud geometry compression. Our approach introduces a new architectural principle: embedding Mamba modules within specialized subcomponents rather than applying them globally, challenging existing design paradigms. OctMamba combines two modules: Spatial-Channel Coupled Grouping Mamba (SCCGM) for spatial-channel fusion and Local Graph CNN-Mamba (LGCM) for topological encoding. This design enables efficient long-range modeling with linear complexity, delivering a smaller model and faster decoding while outperforming transformer-based and global Mamba baselines. On SemanticKITTI, OctMamba reduces bitrate by 60.2% over GPCC (D1 PSNR) and achieves state-of-the-art performance across LiDAR and dynamic human point cloud benchmarks with practical speed and scalability. By introducing multi-dimensional redundancy modeling, OctMamba has the potential to influence future research on efficient point cloud compression. Source code will be released.
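The linear-complexity claim rests on the state-space recurrence underlying Mamba-style layers, which can be illustrated with a minimal sketch. This is a generic scalar, time-invariant state-space model, not OctMamba's SCCGM or LGCM modules. It omits Mamba's input-dependent (selective) parameters, and the names `ssm_scan` and `ssm_quadratic` are hypothetical. The sketch computes the same output in two ways: a single pass over the sequence, and an explicit sum over all earlier positions, whose cost grows quadratically with sequence length, as pairwise attention does.

```python
def ssm_scan(xs, a, b, c):
    """Linear-time recurrence: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t."""
    h = 0.0
    ys = []
    for x in xs:          # one constant-cost update per element: O(L)
        h = a * h + b * x
        ys.append(c * h)
    return ys

def ssm_quadratic(xs, a, b, c):
    """Same output via an explicit sum over all earlier inputs: O(L^2)."""
    return [
        sum(c * a ** (t - k) * b * xs[k] for k in range(t + 1))
        for t in range(len(xs))
    ]

xs = [1.0, 2.0, 3.0]
print(ssm_scan(xs, a=0.5, b=1.0, c=2.0))       # [2.0, 5.0, 8.5]
print(ssm_quadratic(xs, a=0.5, b=1.0, c=2.0))  # [2.0, 5.0, 8.5]
```

Both functions return the same values, but the scan carries one running state instead of revisiting every earlier position, which is why long-range context can be modeled at a cost linear in sequence length.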