SibNet: Sibling Convolutional Encoder for Video Captioning

doi:10.1145/3240508.3240667

Proceedings Article

Sheng Liu, +2 more

- 15 Oct 2018

- pp 1425-1434

119

TL;DR: This work introduces a novel Sibling Convolutional Encoder (SibNet) for video captioning, which utilizes a two-branch architecture to collaboratively encode videos.

Citations

•Proceedings Article•10.1109/CVPR42600.2020.01329

Object Relational Graph With Teacher-Recommended Learning for Video Captioning

Ziqi Zhang, +6 more

- 14 Jun 2020

TL;DR: This paper proposes an object relational graph (ORG) based encoder, which captures more detailed interaction features to enrich the visual representation, and a teacher-recommended learning (TRL) method that uses a successful external language model (ELM) to integrate abundant linguistic knowledge into the caption model.

410

•Journal Article•10.48550/arXiv.2205.14100

GIT: A Generative Image-to-text Transformer for Vision and Language

Jianfeng Wang, +8 more

- 27 May 2022

TL;DR: This paper designs and trains GIT, a Generative Image-to-text Transformer, to unify vision-language tasks such as image/video captioning and question answering, and presents a new scheme of generation-based image classification and scene text recognition, achieving decent performance on standard benchmarks.

340

•Proceedings Article•10.1109/ICCV.2019.00273

Controllable Video Captioning With POS Sequence Guidance Based on Gated Fusion Network

Bairui Wang, +5 more

- 01 Oct 2019

TL;DR: A gating strategy is proposed to dynamically and adaptively incorporate the global syntactic POS information into the decoder for generating each word, which not only boosts the video captioning performance but also improves the diversity of the generated captions.

183

•Journal Article•10.1609/AAAI.V34I07.6918

Heuristic Black-Box Adversarial Attacks on Video Recognition Models

Zhipeng Wei, +6 more

- 03 Apr 2020

TL;DR: A heuristic black-box adversarial attack model is proposed that generates adversarial perturbations only on selected frames and regions, which significantly reduces the computation cost and leads to a more than 28% reduction in query numbers for the untargeted attack on both datasets.

141

•Proceedings Article•10.1145/3343031.3351088

Black-box Adversarial Attacks on Video Recognition Models

Linxi Jiang, +4 more

- 15 Oct 2019

TL;DR: In this paper, the authors proposed the first black-box video attack framework, called V-BAD, which is equivalent to estimating the projection of the adversarial gradient on a selected subspace.

137

References

•Proceedings Article•10.1109/CVPR.2016.90

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

- 27 Jun 2016

TL;DR: This paper proposes a residual learning framework to ease the training of networks that are substantially deeper than those used previously; it won 1st place on the ILSVRC 2015 classification task.

198.7K

•Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

- 01 Jan 2015

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

138.5K

•Journal Article•10.1162/NECO.1997.9.8.1735

Long short-term memory

Sepp Hochreiter, +1 more

- 01 Nov 1997

- Neural Computation

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

99K

•Proceedings Article

Attention is All you Need

Ashish Vaswani, +7 more

- 12 Jun 2017

TL;DR: This paper proposes a simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.

94.2K

•Posted Content

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

- 22 Dec 2014

- arXiv: Learning

TL;DR: This article introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.

82.5K

Related Papers (5)

MSR-VTT: A Large Video Description Dataset for Bridging Video and Language

Describing Videos by Exploiting Temporal Structure

Collecting Highly Parallel Data for Paraphrase Evaluation

Deep Residual Learning for Image Recognition

Sequence to Sequence -- Video to Text