Block-matching algorithm

Papers published on a yearly basis

Papers

Proceedings Article

Visual odometry

David Nister¹, Oleg Naroditsky¹, James R. Bergen¹•Institutions (1)

Sarnoff Corporation¹

1 Jan 2004

TL;DR: A system that estimates the motion of a stereo head or a single moving camera from video input in real time with low delay; the motion estimates are used for navigational purposes.

Abstract: We present a system that estimates the motion of a stereo head or a single moving camera based on video input. The system operates in real-time with low delay and the motion estimates are used for navigational purposes. The front end of the system is a feature tracker. Point features are matched between pairs of frames and linked into image trajectories at video rate. Robust estimates of the camera motion are then produced from the feature tracks using a geometric hypothesize-and-test architecture. This generates what we call visual odometry, i.e. motion estimates from visual input alone. No prior knowledge of the scene nor the motion is necessary. The visual odometry can also be used in conjunction with information from other sources such as GPS, inertia sensors, wheel encoders, etc. The pose estimation method has been applied successfully to video from aerial, automotive and handheld platforms. We focus on results with an autonomous ground vehicle. We give examples of camera trajectories estimated purely from images over previously unseen distances and periods of time.

1,916 citations
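
The hypothesize-and-test idea in the abstract above can be illustrated with a much simpler motion model than a full camera pose. The following is a minimal sketch, not the authors' estimator: it assumes the motion between two frames is a pure 2D image translation, and the function name `estimate_translation_ransac`, the threshold, and the iteration count are all hypothetical choices made for illustration.

```python
import random

def estimate_translation_ransac(matches, threshold=1.0, iterations=100, seed=0):
    """Robustly estimate a 2D translation from point matches.

    matches: list of ((x0, y0), (x1, y1)) pairs linking a feature in one
    frame to its match in the next. Assumption: pure translation model.
    """
    if not matches:
        raise ValueError("need at least one match")
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # Hypothesize: a single match fully defines a translation.
        (x0, y0), (x1, y1) = rng.choice(matches)
        dx, dy = x1 - x0, y1 - y0
        # Test: collect the matches that agree with this hypothesis.
        inliers = [
            m for m in matches
            if abs(m[1][0] - m[0][0] - dx) <= threshold
            and abs(m[1][1] - m[0][1] - dy) <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refine: average the displacement over the winning inlier set.
    n = len(best_inliers)
    dx = sum(m[1][0] - m[0][0] for m in best_inliers) / n
    dy = sum(m[1][1] - m[0][1] for m in best_inliers) / n
    return (dx, dy), best_inliers

# Eight consistent matches (shift of +3, -1) plus two gross mismatches.
matches = [((float(i), float(2 * i)), (i + 3.0, 2 * i - 1.0)) for i in range(8)]
matches += [((0.0, 0.0), (40.0, 40.0)), ((5.0, 5.0), (-30.0, 12.0))]
motion, inliers = estimate_translation_ransac(matches)
print(motion, len(inliers))
```

The two mismatched pairs each support only their own hypothesis, so they are outvoted by the consistent matches and excluded from the final average.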

Proceedings Article•10.1109/ICCV.2015.515

Sequence to Sequence -- Video to Text

Subhashini Venugopalan¹, Marcus Rohrbach², Jeff Donahue², Raymond J. Mooney¹, Trevor Darrell², Kate Saenko³•Institutions (3)

University of Texas at Austin¹, University of California, Berkeley², University of Massachusetts Lowell³

7 Dec 2015

TL;DR: This paper proposes an end-to-end sequence-to-sequence model for generating video captions, which learns the temporal structure of the sequence of frames as well as the sequence model of the generated sentences, i.e. a language model.

Abstract: Real-world videos often have complex dynamics; methods for generating open-domain video descriptions should be sensitive to temporal structure and allow both input (sequence of frames) and output (sequence of words) of variable length. To approach this problem we propose a novel end-to-end sequence-to-sequence model to generate captions for videos. For this we exploit recurrent neural networks, specifically LSTMs, which have demonstrated state-of-the-art performance in image caption generation. Our LSTM model is trained on video-sentence pairs and learns to associate a sequence of video frames with a sequence of words in order to generate a description of the event in the video clip. Our model is naturally able to learn the temporal structure of the sequence of frames as well as the sequence model of the generated sentences, i.e. a language model. We evaluate several variants of our model that exploit different visual features on a standard set of YouTube videos and two movie description datasets (M-VAD and MPII-MD).

1,669 citations

Journal Article•10.1007/BF01210504

Automatic partitioning of full-motion video

Hong-Jiang Zhang¹, Atreyi Kankanhalli¹, Stephen W. Smoliar¹•Institutions (1)

National University of Singapore¹

03 Jan 1993 - Multimedia Systems

TL;DR: A twin-comparison approach has been developed to solve the problem of detecting transitions implemented by special effects, and a motion analysis algorithm is applied to determine whether an actual transition has occurred.

Abstract: Partitioning a video source into meaningful segments is an important step for video indexing. We present a comprehensive study of a partitioning system that detects segment boundaries. The system is based on a set of difference metrics and it measures the content changes between video frames. A twin-comparison approach has been developed to solve the problem of detecting transitions implemented by special effects. To eliminate the false interpretation of camera movements as transitions, a motion analysis algorithm is applied to determine whether an actual transition has occurred. A technique for determining the threshold for a difference metric and a multi-pass approach to improve the computation speed and accuracy have also been developed.

1,398 citations
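
The abstract above names a twin-comparison approach but does not spell out its rule. The sketch below follows the common two-threshold reading of that name, which is an assumption here: a high threshold flags abrupt cuts, and a low threshold opens a candidate gradual transition that is confirmed only if the accumulated difference reaches the high threshold. Summing consecutive-frame differences as the accumulated value, and the names `twin_comparison`, `t_cut`, and `t_start`, are illustrative simplifications rather than the paper's definitions.

```python
def twin_comparison(diffs, t_cut, t_start):
    """Label shot boundaries from a list of frame-difference values.

    diffs[i] is a difference metric between frame i and frame i + 1.
    Returns ("cut", i) and ("gradual", start, end) tuples.
    """
    events = []
    start, accumulated = None, 0.0

    def close_run(end):
        # Confirm a candidate run only if its accumulated change is large.
        nonlocal start, accumulated
        if start is not None and accumulated >= t_cut:
            events.append(("gradual", start, end))
        start, accumulated = None, 0.0

    for i, d in enumerate(diffs):
        if d >= t_cut:
            close_run(i - 1)
            events.append(("cut", i))      # abrupt change in one step
        elif d >= t_start:
            if start is None:
                start = i                  # candidate gradual transition begins
            accumulated += d
        else:
            close_run(i - 1)               # difference fell back to normal
    close_run(len(diffs) - 1)
    return events

diffs = [0.1, 0.1, 0.9, 0.1, 0.3, 0.3, 0.3, 0.1, 0.3, 0.1]
events = twin_comparison(diffs, t_cut=0.8, t_start=0.2)
print(events)  # [('cut', 2), ('gradual', 4, 6)]
```

The isolated moderate difference at index 8 opens a candidate but never accumulates enough change, so it is discarded. The motion analysis step that the abstract uses to reject camera movement is not modeled here.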

Journal Article•10.1109/83.503915

Extraction of high-resolution frames from video sequences

Richard R. Schultz¹, Robert L. Stevenson²•Institutions (2)

University of North Dakota¹, University of Notre Dame²

01 Jun 1996 - IEEE Transactions on Image Processing

TL;DR: A novel observation model based on motion compensated subsampling is proposed for a video sequence and Bayesian restoration with a discontinuity-preserving prior image model is used to extract a high-resolution video still given a short low-resolution sequence.

Abstract: The human visual system appears to be capable of temporally integrating information in a video sequence in such a way that the perceived spatial resolution of a sequence appears much higher than the spatial resolution of an individual frame. While the mechanisms in the human visual system that do this are unknown, the effect is not too surprising given that temporally adjacent frames in a video sequence contain slightly different, but unique, information. This paper addresses the use of both the spatial and temporal information present in a short image sequence to create a single high-resolution video frame. A novel observation model based on motion compensated subsampling is proposed for a video sequence. Since the reconstruction problem is ill-posed, Bayesian restoration with a discontinuity-preserving prior image model is used to extract a high-resolution video still given a short low-resolution sequence. Estimates computed from a low-resolution image sequence containing a subpixel camera pan show dramatic visual and quantitative improvements over bilinear, cubic B-spline, and Bayesian single frame interpolations. Visual and quantitative improvements are also shown for an image sequence containing objects moving with independent trajectories. Finally, the video frame extraction algorithm is used for the motion-compensated scan conversion of interlaced video data, with a visual comparison to the resolution enhancement obtained from progressively scanned frames.

1,134 citations
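
Motion-compensated models such as the one in the abstract above need a displacement estimate between frames. The abstract does not say which estimator is used, so the following is only a minimal sketch of exhaustive block matching, the topic of this page, assuming integer-pixel displacements and a sum-of-absolute-differences (SAD) cost. The helper names `sad` and `match_block`, the block size, and the search range are hypothetical.

```python
def sad(frame_a, frame_b, ay, ax, by, bx, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(frame_a[ay + r][ax + c] - frame_b[by + r][bx + c])
               for r in range(size) for c in range(size))

def match_block(ref, cur, top, left, size=4, search=3):
    """Find (dy, dx, cost) so that the block of `cur` at (top, left)
    best matches the block of `ref` at (top + dy, left + dx)."""
    h, w = len(ref), len(ref[0])
    best = (float("inf"), 0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidates that would fall outside the reference frame.
            if 0 <= y <= h - size and 0 <= x <= w - size:
                cost = sad(cur, ref, top, left, y, x, size)
                if cost < best[0]:
                    best = (cost, dy, dx)
    return best[1], best[2], best[0]

def f(y, x):
    # Synthetic, non-repeating intensity pattern used only for this demo.
    return 3 * y * y + 5 * x * x + x * y

ref = [[f(y, x) for x in range(12)] for y in range(12)]
# The current frame shows the same pattern moved down 1 row and left 2 columns.
cur = [[f(y - 1, x + 2) for x in range(12)] for y in range(12)]
print(match_block(ref, cur, 4, 4))  # (-1, 2, 0)
```

A subpixel estimate, which super-resolution depends on, would need interpolation or a refinement step on top of this integer search.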

Journal Article•10.1145/2461912.2461966

Phase-based video motion processing

Neal Wadhwa¹, Michael Rubinstein¹, Frédo Durand¹, William T. Freeman¹•Institutions (1)

Massachusetts Institute of Technology¹

21 Jul 2013

TL;DR: Introduces a technique to manipulate small movements in videos based on an analysis of motion in complex-valued image pyramids; compared with the previous Eulerian Video Magnification method, it supports larger amplification factors and is significantly less sensitive to noise.

Abstract: We introduce a technique to manipulate small movements in videos based on an analysis of motion in complex-valued image pyramids. Phase variations of the coefficients of a complex-valued steerable pyramid over time correspond to motion, and can be temporally processed and amplified to reveal imperceptible motions, or attenuated to remove distracting changes. This processing does not involve the computation of optical flow, and in comparison to the previous Eulerian Video Magnification method it supports larger amplification factors and is significantly less sensitive to noise. These improved capabilities broaden the set of applications for motion processing in videos. We demonstrate the advantages of this approach on synthetic and natural video sequences, and explore applications in scientific analysis, visualization and video enhancement.

902 citations
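
The link between phase and motion in the abstract above can be seen in one dimension with a plain discrete Fourier transform. This is a simplified sketch, not the paper's method: the paper works with local phase in a complex steerable pyramid, whereas a global DFT as assumed here only handles a single global shift. The function names `dft`, `idft`, and `magnify_shift` are hypothetical.

```python
import cmath
import math

def dft(signal):
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def magnify_shift(reference, moved, alpha):
    """Amplify the per-coefficient phase change between two 1D frames."""
    out = []
    for r, m in zip(dft(reference), dft(moved)):
        if abs(r) < 1e-9:
            out.append(m)  # phase is meaningless for (near-)empty coefficients
            continue
        delta = cmath.phase(m * r.conjugate())  # phase change caused by motion
        out.append(abs(m) * cmath.exp(1j * (cmath.phase(r) + alpha * delta)))
    return idft(out)

n = 32
reference = [math.cos(2 * math.pi * t / n) for t in range(n)]
moved = [math.cos(2 * math.pi * (t - 0.25) / n) for t in range(n)]   # 0.25-sample shift
magnified = magnify_shift(reference, moved, alpha=4.0)
expected = [math.cos(2 * math.pi * (t - 1.0) / n) for t in range(n)]  # 1-sample shift
print(max(abs(a - b) for a, b in zip(magnified, expected)))
```

Multiplying the phase change by 4 turns the quarter-sample shift into a one-sample shift, so the printed maximum error is at the level of floating-point rounding. No optical flow is computed at any point, which mirrors the claim in the abstract.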

Year	Papers
2025	1
2024	3
2023	16
2022	34
2021	9
2020	11
