Journal Article, DOI: 10.1109/TIP.2020.2966082
A Multimodal Saliency Model for Videos With High Audio-Visual Correspondence
203
TL;DR: The proposed MMS model captures the influence of audio, which the latest deep-learning-based saliency models do not consider; combining it with deep saliency models yields an average performance gain of 5%.
Abstract: Audio information has been ignored by most current visual attention prediction studies. However, sound can influence visual attention, and this influence has been widely investigated and proven by many psychological studies. In this paper, we propose a novel multimodal saliency (MMS) model for videos containing scenes with high audio-visual correspondence. In such scenes, humans tend to be attracted by the sound sources, and it is also possible to localize the sound sources via cross-modal analysis. Specifically, we first detect spatial and temporal saliency maps from the visual modality by using a novel method based on the free energy principle. We then detect an audio saliency map from both the audio and visual modalities by localizing the moving-sounding objects with cross-modal kernel canonical correlation analysis, the first such approach in the literature. Finally, we propose a new two-stage adaptive audio-visual saliency fusion method that integrates the spatial, temporal, and audio saliency maps into the final audio-visual saliency map. The proposed MMS model captures the influence of audio, which the latest deep-learning-based saliency models do not consider. To take advantage of both deep saliency modeling and audio-visual saliency modeling, we combine deep saliency models with the MMS model via late fusion and obtain an average performance gain of 5%. Experimental results on audio-visual attention databases show that the proposed models incorporating audio cues significantly outperform state-of-the-art image and video saliency models that use only the visual modality.
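A minimal sketch of two ideas from the abstract, under stated assumptions; it is not the authors' method. Instead of cross-modal kernel CCA, it scores each pixel by the absolute Pearson correlation between that pixel's motion-magnitude trace and a per-frame audio-energy trace (with one scalar feature per modality, the first canonical correlation of linear CCA reduces to exactly this quantity). It then late-fuses a visual saliency map with the audio map by a fixed weighted average of min-max normalized maps. The inputs `motion` and `audio_energy`, all function names, and the fixed weights are hypothetical, and the paper's two-stage adaptive fusion is not reproduced.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def audio_saliency(motion, audio_energy):
    """motion[t][r][c]: hypothetical per-pixel motion magnitude in frame t;
    audio_energy[t]: per-frame audio energy. Returns |corr| per pixel, a
    scalar-feature stand-in for cross-modal CCA (not kernel CCA)."""
    rows, cols = len(motion[0]), len(motion[0][0])
    return [[abs(pearson([frame[r][c] for frame in motion], audio_energy))
             for c in range(cols)] for r in range(rows)]


def normalize(m):
    """Min-max normalize a 2-D map to [0, 1] (all zeros if the map is flat)."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def late_fuse(maps, weights):
    """Late fusion: weighted average of normalized maps (fixed, hypothetical weights)."""
    norm = [normalize(m) for m in maps]
    total = sum(weights)
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(w * n[r][c] for w, n in zip(weights, norm)) / total
             for c in range(cols)] for r in range(rows)]


if __name__ == "__main__":
    # 4 frames of a 2x2 motion-magnitude video; only pixel (0, 0) pulses with the audio.
    motion = [[[0, 1], [2, 0]], [[1, 1], [2, 0]], [[0, 0], [2, 1]], [[1, 0], [2, 1]]]
    audio_map = audio_saliency(motion, [0, 1, 0, 1])
    print(audio_map)  # [[1.0, 0.0], [0.0, 0.0]]
    visual_map = [[0.0, 1.0], [0.5, 0.0]]  # hypothetical visual saliency map
    print(late_fuse([visual_map, audio_map], [0.5, 0.5]))  # [[0.5, 0.5], [0.25, 0.0]]
```

With equal weights, the audio-synchronized pixel at (0, 0) ends up as salient as the visual peak at (0, 1), which is the kind of effect an audio cue is meant to add; an adaptive scheme would choose the weights per video instead of fixing them.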
Citations
Perceptual image quality assessment: a survey
Guangtao Zhai, Xiongkuo Min
TL;DR: This survey provides a general overview of classical algorithms and recent progress in the field of perceptual image quality assessment and describes the performance of state-of-the-art quality measures for visual signals.
556
Enhanced YOLO v3 Tiny Network for Real-Time Ship Detection From Visual Image
TL;DR: Wang et al. proposed an enhanced YOLO v3 tiny network for real-time ship detection that can be used in video surveillance to accurately classify and localize six types of ships (ore carrier, bulk cargo carrier, general cargo ship, container ship, fishing boat, and passenger ship).
120
Perceptual Quality Assessment of Low-light Image Enhancement
TL;DR: Low-light image enhancement algorithms (LIEA) can brighten images captured in dark or back-lit conditions; however, they may also introduce various distortions, such as structural damage and color shift.
74
Multimodality in VR: A Survey
31 Jan 2022
TL;DR: This paper surveys multimodal experiences in VR, reviewing the body of work that addresses multimodality in VR and its role and benefits in user experience.
RIHOOP: Robust Invisible Hyperlinks in Offline and Online Photographs
TL;DR: Li et al. proposed an end-to-end neural network with an encoder that hides messages and a decoder that extracts them, making the hyperlinks invisible to human eyes but detectable by camera-equipped mobile devices.
67
References
Static and space-time visual saliency detection by self-resemblance
Hae Jong Seo, Peyman Milanfar
TL;DR: A novel unified framework for both static and space-time saliency detection produces a saliency map in which each pixel indicates the statistical likelihood of saliency of a feature matrix given its surrounding feature matrices.
746
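The self-resemblance idea can be sketched as follows, assuming raw 3x3 intensity patches as each pixel's feature (Seo and Milanfar use local steering kernel features and a matrix cosine similarity instead) and a hypothetical bandwidth `sigma=0.5`. Each pixel's saliency is 1 / Σ_j exp((−1 + ρ(F_i, F_j)) / σ²), where ρ is the cosine similarity between pixel i's feature and that of each in-bounds neighbor j in a 3x3 window, i itself included. A pixel that resembles its surroundings accumulates a large denominator and therefore a low score. Function names are illustrative only.

```python
import math


def patch(img, r, c, radius=1):
    """Flattened (2*radius+1)^2 intensity patch around (r, c); indices are clamped at the border.
    Raw patches are an assumption standing in for the paper's steering-kernel features."""
    h, w = len(img), len(img[0])
    return [img[min(max(r + dr, 0), h - 1)][min(max(c + dc, 0), w - 1)]
            for dr in range(-radius, radius + 1)
            for dc in range(-radius, radius + 1)]


def cosine(u, v):
    """Cosine similarity (taken as 1.0 when both vectors are zero, 0.0 when only one is)."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 and nv == 0.0:
        return 1.0
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def self_resemblance(img, sigma=0.5, window=1):
    """s_i = 1 / sum_j exp((-1 + rho(F_i, F_j)) / sigma^2) over in-bounds neighbors j (including i)."""
    h, w = len(img), len(img[0])
    feats = [[patch(img, r, c) for c in range(w)] for r in range(h)]
    sal = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = 0.0
            for rr in range(max(r - window, 0), min(r + window, h - 1) + 1):
                for cc in range(max(c - window, 0), min(c + window, w - 1) + 1):
                    rho = cosine(feats[r][c], feats[rr][cc])
                    total += math.exp((-1.0 + rho) / sigma ** 2)
            sal[r][c] = 1.0 / total
    return sal


if __name__ == "__main__":
    img = [[1.0] * 5 for _ in range(5)]
    img[2][2] = 10.0  # one bright pixel on a flat background
    sal = self_resemblance(img)
    best = max(((r, c) for r in range(5) for c in range(5)), key=lambda p: sal[p[0]][p[1]])
    print(best)  # (2, 2)
```

The bandwidth matters: with a very large `sigma` every exponential approaches 1, the score degenerates into one over the neighborhood size, and border pixels (which have fewer in-bounds neighbors) would win, so small bandwidths are the regime where the likelihood interpretation holds.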
Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model
TL;DR: Cornia et al. proposed a convolutional long short-term memory (LSTM) network that iteratively refines the predicted saliency map by focusing on the most salient regions of the input image.
669
Using free energy principle for blind image quality assessment
TL;DR: A new no-reference (NR) image quality assessment (IQA) metric is proposed that combines the recently revealed free-energy-based brain theory with classical human visual system (HVS)-inspired features; following free energy theory, it predicts the image that the HVS perceives from a distorted image.
653
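The free-energy idea behind this metric can be illustrated with a heavily simplified stand-in: an internal "generative model" predicts each interior pixel from its four neighbors, and the Shannon entropy of the quantized prediction residual serves as a proxy for how much of the image the model fails to explain. The four-neighbor mean predictor, the quantization `step`, and the function names are assumptions made for illustration; the cited metric uses a richer, fitted predictive model and combines this term with HVS-inspired features, neither of which is reproduced here.

```python
import math
import random
from collections import Counter


def prediction_residuals(img):
    """Residuals of a hypothetical 4-neighbor mean predictor over interior pixels."""
    h, w = len(img), len(img[0])
    residuals = []
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            pred = (img[r - 1][c] + img[r + 1][c] + img[r][c - 1] + img[r][c + 1]) / 4.0
            residuals.append(img[r][c] - pred)
    return residuals


def residual_entropy(img, step=1.0):
    """Shannon entropy (bits) of the residuals quantized to bins of width `step`."""
    residuals = prediction_residuals(img)
    counts = Counter(round(e / step) for e in residuals)
    n = len(residuals)
    return sum((k / n) * math.log2(n / k) for k in counts.values())


if __name__ == "__main__":
    ramp = [[float(r + c) for c in range(8)] for r in range(8)]  # perfectly predictable
    rng = random.Random(0)
    noisy = [[v + rng.uniform(-5.0, 5.0) for v in row] for row in ramp]
    print(residual_entropy(ramp))  # 0.0
    print(residual_entropy(noisy) > 0.0)  # True
```

A linear ramp is predicted exactly by the neighbor mean, so its residual entropy is 0 bits; adding noise leaves structure the predictor cannot explain, and the entropy rises.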
Saliency Detection: A Boolean Map Approach
Jianming Zhang, Stan Sclaroff
01 Dec 2013
TL;DR: A novel Boolean Map based Saliency (BMS) model, grounded in the Gestalt principle of figure-ground segregation, consistently achieves state-of-the-art performance compared with ten leading methods on five eye-tracking datasets.
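The Boolean-map idea can be sketched on a single grayscale channel. The image is thresholded at several levels; for each Boolean map and its complement, the 4-connected regions that never touch the image border (the "surrounded" figure regions suggested by figure-ground segregation) are marked, and averaging these attention maps gives the saliency map. The grayscale input, the caller-supplied `thresholds`, and the helper names are assumptions; the original BMS model works on color channels and adds normalization and post-processing steps that this sketch omits.

```python
from collections import deque


def surrounded_regions(mask):
    """1.0 where a True pixel lies in a 4-connected True region that never touches the border."""
    h, w = len(mask), len(mask[0])
    reached = [[False] * w for _ in range(h)]
    queue = deque()
    # Seed a breadth-first search from every True pixel on the image border.
    for r in range(h):
        for c in range(w):
            if mask[r][c] and (r in (0, h - 1) or c in (0, w - 1)):
                reached[r][c] = True
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= rr < h and 0 <= cc < w and mask[rr][cc] and not reached[rr][cc]:
                reached[rr][cc] = True
                queue.append((rr, cc))
    # Whatever True pixel the border flood fill could not reach is "surrounded".
    return [[1.0 if mask[r][c] and not reached[r][c] else 0.0 for c in range(w)]
            for r in range(h)]


def boolean_map_saliency(img, thresholds):
    """Average the surrounded-region maps of each thresholded Boolean map and its complement."""
    h, w = len(img), len(img[0])
    acc = [[0.0] * w for _ in range(h)]
    count = 0
    for t in thresholds:
        bmap = [[v > t for v in row] for row in img]
        complement = [[not v for v in row] for row in bmap]
        for mask in (bmap, complement):
            att = surrounded_regions(mask)
            for r in range(h):
                for c in range(w):
                    acc[r][c] += att[r][c]
            count += 1
    return [[v / count for v in row] for row in acc]


if __name__ == "__main__":
    img = [[0.0] * 5 for _ in range(5)]
    img[2][2] = 1.0  # a bright spot in a dark frame
    print(boolean_map_saliency(img, thresholds=[0.5])[2][2])  # 0.5
```

The bright center is surrounded in the thresholded map, while the dark complement touches the border everywhere, so the center scores 0.5 (one of two maps) and every other pixel scores 0.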
Visual Saliency Based on Scale-Space Analysis in the Frequency Domain
TL;DR: A new bottom-up paradigm for detecting visual saliency is proposed, characterized by a scale-space analysis of the amplitude spectrum of natural images, and it is shown that the convolution of the image amplitude spectrum with a low-pass Gaussian kernel of an appropriate scale is equivalent to an image saliency detector.
625
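The frequency-domain idea can be illustrated in one dimension: smooth the amplitude spectrum with a low-pass Gaussian kernel, keep the original phase, invert the transform, and take the squared magnitude. Smoothing flattens the sharp spectral peaks produced by a repeated pattern, so a one-off event stands out. This is a 1-D, single-scale sketch with a naive O(N²) DFT and a hypothetical `sigma` (measured in frequency bins); the cited method operates on 2-D images and chooses among several kernel scales, which is omitted here. Function names are illustrative.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def smooth_circular(values, sigma):
    """Circular convolution with a normalized Gaussian kernel truncated at 3*sigma."""
    radius = max(1, int(3 * sigma))
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-d * d / (2 * sigma * sigma)) for d in offsets]
    total = sum(weights)
    n = len(values)
    return [sum(w * values[(k + d) % n] for w, d in zip(weights, offsets)) / total
            for k in range(n)]


def hft_saliency_1d(x, sigma=2.0):
    """Smooth |X| with a Gaussian, keep the phase of X, invert, and square the magnitude."""
    X = dft(x)
    amp = [abs(v) for v in X]
    phase = [cmath.phase(v) for v in X]
    smoothed = smooth_circular(amp, sigma)
    recon = idft([a * cmath.exp(1j * p) for a, p in zip(smoothed, phase)])
    return [abs(v) ** 2 for v in recon]


if __name__ == "__main__":
    n = 32
    signal = [2.0 * math.cos(2 * math.pi * 4 * t / n) for t in range(n)]  # repeated pattern
    signal[10] += 1.5  # one-off event, smaller than the pattern's crests
    sal = hft_saliency_1d(signal)
    print(max(range(n), key=lambda t: sal[t]))  # 10
```

In the raw signal the cosine crests (height 2) exceed the spike (height 1.5), yet after the amplitude spectrum is smoothed the spike at index 10 becomes the most salient sample, because the periodic pattern's energy was concentrated in two spectral peaks that the low-pass kernel suppresses.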
Related Papers
Hamed R. Tavakoli, Ali Borji, Juho Kannala, Esa Rahtu
02 Jun 2020
Xiongkuo Min, Guangtao Zhai, Chunjia Hu, Ke Gu
01 Dec 2015
Antoine Coutrot, Nathalie Guyader
01 Jan 2016