Proceedings Article, DOI: 10.1145/3503161.3547922
Deep Evidential Learning with Noisy Correspondence for Cross-modal Retrieval
10 Oct 2022
TL;DR: This paper proposes a generalized Deep Evidential Cross-modal Learning framework (DECL), which integrates a novel Cross-modal Evidential Learning paradigm (CEL) and a Robust Dynamic Hinge loss (RDH) with positive and negative learning.
Abstract: Cross-modal retrieval has been a compelling topic in the multimodal community. Recently, to mitigate the high cost of data collection, co-occurring pairs (e.g., image and text) have been collected from the Internet to form large-scale cross-modal datasets, e.g., Conceptual Captions. However, this unavoidably introduces noise (i.e., mismatched pairs) into the training data, dubbed noisy correspondence. Such noise makes the supervision information unreliable and uncertain, and remarkably degrades performance. Besides, most existing methods focus training on hard negatives, which amplifies the unreliability caused by noise. To address these issues, we propose a generalized Deep Evidential Cross-modal Learning framework (DECL), which integrates a novel Cross-modal Evidential Learning paradigm (CEL) and a Robust Dynamic Hinge loss (RDH) with positive and negative learning. CEL captures and learns the uncertainty brought by noise to improve the robustness and reliability of cross-modal retrieval. Specifically, the bidirectional evidence based on cross-modal similarity is first modeled and parameterized into the Dirichlet distribution, which not only provides accurate uncertainty estimation but also imparts resilience to perturbations against noisy correspondence. To address the amplification problem, RDH smoothly increases the hardness of the negatives it focuses on, thus achieving higher robustness under high noise. Extensive experiments are conducted on three image-text benchmark datasets, i.e., Flickr30K, MS-COCO, and Conceptual Captions, to verify the effectiveness and efficiency of the proposed method. The code is available at https://github.com/QinYang79/DECL.
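The step of turning cross-modal similarities into Dirichlet-based uncertainty can be illustrated with a minimal sketch. It follows the standard evidential (subjective-logic) construction, where evidence e_k gives Dirichlet parameters alpha_k = e_k + 1, belief b_k = e_k / S, and uncertainty u = K / S with S = sum(alpha). The evidence function `exp(tanh(s) / temperature)`, the `temperature` value, and both function names are assumptions made for illustration, not details confirmed by the abstract; the authors' repository is the reference for the actual method.

```python
import math


def dirichlet_uncertainty(similarities, temperature=0.1):
    """Map one query's similarities to K candidates into beliefs and uncertainty.

    Assumption: evidence is exp(tanh(s) / temperature), a hypothetical
    non-negative evidence function chosen only for this sketch.
    """
    evidence = [math.exp(math.tanh(s) / temperature) for s in similarities]
    alpha = [e + 1.0 for e in evidence]        # Dirichlet parameters
    strength = sum(alpha)                      # Dirichlet strength S
    num_candidates = len(similarities)         # K
    belief = [e / strength for e in evidence]  # belief mass per candidate
    uncertainty = num_candidates / strength    # mass not assigned to any candidate
    return belief, uncertainty


def bidirectional_uncertainty(sim_matrix, temperature=0.1):
    """Compute uncertainty in both retrieval directions.

    Rows are treated as image-to-text queries and columns as
    text-to-image queries (an assumed layout for this sketch).
    """
    i2t = [dirichlet_uncertainty(row, temperature)[1] for row in sim_matrix]
    t2i = [dirichlet_uncertainty(list(col), temperature)[1] for col in zip(*sim_matrix)]
    return i2t, t2i


# Toy similarity matrix: the first two images have one clearly matching text,
# the third image is equally similar to every text.
sim = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.2],
    [0.3, 0.3, 0.3],
]
i2t_u, t2i_u = bidirectional_uncertainty(sim)
print("image-to-text uncertainty:", i2t_u)
print("text-to-image uncertainty:", t2i_u)
```

In this toy case the ambiguous third row receives a higher uncertainty than the first row, which is the kind of signal an evidential approach could use to treat a possibly mismatched pair more cautiously.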
Citations
Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation
Zhuohang Dang, Minnan Luo, Chengyou Jia, Guang Dai, Xiaojun Chang, Jingdong Wang
TL;DR: The self-reinforcing errors mitigation framework (SREM) alleviates noisy correspondences and improves cross-modal retrieval performance by refining sample filtration and leveraging negative matches.
Bias Mitigation and Representation Optimization for Noise-Robust Cross-modal Retrieval
Yu Liu, Haipeng Chen, Guihe Qin, Jincai Song, Xun Yang
TL;DR: This paper proposes BMRO, a framework for bias mitigation and representation optimization in noise-robust cross-modal retrieval, utilizing a Bias Estimator and Adaptive Representation Optimizer to enhance accurate sample division and tailored optimization strategies for clean and noisy samples.
ROAD: Robust Unsupervised Domain Adaptation with Noisy Labels
Yanglin Feng, Hongyuan Zhu, Dezhong Peng, Xiaocui Peng, Peng Hu
- 26 Oct 2023
TL;DR: This paper proposes a robust unsupervised domain adaptation framework (ROAD) that prevents the network from overfitting noisy labels so that it captures accurate discrimination knowledge for domain adaptation. It also proposes a Robust Adaptive Weighted Learning mechanism (RSWL) that assigns each sample a weight based on its reliability, so the model focuses more on reliable samples and less on unreliable ones, thereby mining robust discrimination knowledge against noisy labels in the source domain.
EVIL: Evidential Inference Learning for Trustworthy Semi-Supervised Medical Image Segmentation
Yingyu Chen, Ziyuan Yang, Chenyu Shen, Zhiwen Wang, Qing Yang, Yi Zhang
- 18 Apr 2023
TL;DR: EVIL introduces a novel semi-supervised medical image segmentation framework that effectively utilizes uncertainty quantification and consistency regularization for accurate segmentation with few labeled data.
Semantic Embedding Uncertainty Learning for Image and Text Matching
Yan Wang, Yunzhi Su, Wenhui Li, Chenggang Yan, Bolun Zheng, Xuanya Li, Anjin Liu
- 01 Jul 2023
TL;DR: A novel Semantic Embedding Uncertainty Learning (SEUL) is proposed, which represents the embedding uncertainty of image and text as Gaussian distributions and simultaneously learns the salient embedding and uncertainty in the common space.
References
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
TL;DR: This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.
Microsoft COCO: Common Objects in Context
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, C. Lawrence Zitnick
- 06 Sep 2014
TL;DR: A new dataset that aims to advance the state of the art in object recognition by placing it in the context of the broader question of scene understanding, built by gathering images of complex everyday scenes containing common objects in their natural context.
Momentum Contrast for Unsupervised Visual Representation Learning
Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, Ross Girshick
- 14 Jun 2020
TL;DR: This paper proposes Momentum Contrast (MoCo) for unsupervised visual representation learning, which enables building a large and consistent dictionary on-the-fly that facilitates contrastive learning.
From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
TL;DR: This work proposes to use the visual denotations of linguistic expressions to define novel denotational similarity metrics, which are shown to be at least as beneficial as distributional similarities for two tasks that require semantic inference.
Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning
Piyush Sharma, Nan Ding, Sebastian Goodman, Radu Soricut
- 01 Jul 2018
TL;DR: The Conceptual Captions dataset contains an order of magnitude more images than the MS-COCO dataset and represents a wider variety of both images and image caption styles.