Deep Video Codec Control

Preprint, doi:10.48550/arxiv.2308.16215

Christoph Reich, +4 more

- 01 Jan 2023

TL;DR: Standard video codecs designed for minimizing video distortion w.r.t. human quality assessment significantly degrade deep vision model performance. This paper presents the first end-to-end learnable deep video codec control that considers both bandwidth constraints and downstream deep vision performance.

Abstract: Standardized lossy video coding is at the core of almost all real-world video processing pipelines. Rate control is used to enable standard codecs to adapt to different network bandwidth conditions or storage constraints. However, standard video codecs (e.g., H.264) and their rate control modules aim to minimize video distortion w.r.t. human quality assessment. We demonstrate empirically that standard-coded videos vastly deteriorate the performance of deep vision models. To overcome the deterioration of vision performance, this paper presents the first end-to-end learnable deep video codec control that considers both bandwidth constraints and downstream deep vision performance, while adhering to existing standardization. We demonstrate that our approach better preserves downstream deep vision performance than traditional approaches.

Citations

doi:10.1109/cvprw63382.2024.00580

A Perspective on Deep Vision Performance with Standard Image and Video Codecs

Christoph Reich, +4 more

TL;DR: This study examines the impact of standardized image and video codecs (JPEG, H.264) on deep vision performance, finding significant accuracy deterioration across various tasks, including semantic segmentation, localization, and dense prediction, with compression rates reducing accuracy by up to 80%.
