1. What are the two categories of recent video super-resolution methods?
Recent video super-resolution (VSR) methods fall into two categories: sliding-window methods and recurrent methods. Sliding-window methods, such as the CNN-based VSR methods [15, 25, 13, 16, 18, 11], take several consecutive frames as input, traverse them with a sliding window, and predict the SR image of the center frame. However, they incur high computational cost and see only a limited number of input frames, which makes long-term dependencies difficult to handle. Recurrent methods, such as those proposed by Huang et al. [8] and Chan et al. [4], instead use the high-quality images reconstructed at previous time steps, or their features, to generate the high-quality image at the current time step. These methods make better use of temporal information and employ high-order grid connections and flow-guided alignment for improved performance.
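The difference between the two categories can be illustrated with a minimal sketch. This is a toy illustration, not code from any of the cited methods: frames are plain floats standing in for images, and `sliding_window_vsr`, `recurrent_vsr`, `window_mean`, and `running_sum` are hypothetical names introduced here.

```python
from typing import Callable, List, Sequence, Tuple

Frame = float  # toy stand-in for an image; a real model would use tensors


def sliding_window_vsr(
    frames: Sequence[Frame],
    model: Callable[[List[Frame]], Frame],
    radius: int = 2,
) -> List[Frame]:
    """Each output is predicted from the 2*radius+1 frames around it only."""
    last = len(frames) - 1
    outputs = []
    for t in range(len(frames)):
        # Clamp indices at the sequence borders (replicate the edge frames).
        window = [frames[min(max(i, 0), last)]
                  for i in range(t - radius, t + radius + 1)]
        outputs.append(model(window))
    return outputs


def recurrent_vsr(
    frames: Sequence[Frame],
    cell: Callable[[Frame, Frame], Tuple[Frame, Frame]],
    init_state: Frame,
) -> List[Frame]:
    """Each output reuses a state carried over from all previous time steps."""
    state = init_state
    outputs = []
    for frame in frames:
        sr, state = cell(frame, state)
        outputs.append(sr)
    return outputs


def window_mean(window: List[Frame]) -> Frame:
    """Toy 'model': average of the frames inside the window."""
    return sum(window) / len(window)


def running_sum(frame: Frame, state: Frame) -> Tuple[Frame, Frame]:
    """Toy 'cell': the state accumulates every frame seen so far."""
    new_state = state + frame
    return new_state, new_state
```

In the sliding-window loop, the model is re-run on overlapping windows and never sees frames outside the window. In the recurrent loop, each frame is processed once and information from arbitrarily distant past frames can reach the current output through the state.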
2. What is the critical problem in reference-based super-resolution?
The critical problem in reference-based super-resolution (RefSR) is accurately aligning the reference image (Ref image) with the low-resolution image (LR image). This alignment is crucial for fusing their image features in a subsequent step to generate high-quality super-resolved (SR) images. Inaccurate alignment can lead to poor fusion of image features, resulting in lower quality SR images. Various methods have been proposed to address this problem, such as estimating optical flows between the images (Zheng et al. [24]), using patch matching (Zhang et al. [22]), adopting attention mechanisms for feature fusion (Yang et al. [20]), and proposing an aligned attention method (Wang et al. [17]). Huang et al. [9] also decouple the RefSR task into two sub-tasks to reduce misuse and underuse of the Ref feature. Lee et al. [12] further integrate RefSR with Video Super-Resolution (VSR) in their RefVSR method.
3. What is the main purpose of RefVSR?
RefVSR integrates reference-based SR and video SR to generate high-quality SR images. Using the reference input I^Ref, RefVSR propagates scene image features and compensates for motion, yielding enriched features and high-quality SR images. However, it has two drawbacks: it does not exploit all the information available in its inputs, and it propagates a confidence map that is less well-founded than its other components. The rule for updating the confidence map appears heuristic, and the features from the two streams are fused by a single module.
4. How does deformable convolution aid in Ref feature alignment?
Deformable convolution (DCN) is employed in Ref feature alignment to compensate for errors in the estimated optical flow. Combining optical flow with DCN, as demonstrated by Huang et al. [9], allows more fine-grained alignment and adaptively compensates for the field of view (FoV) difference between the images: an offset to the optical flow is computed and adjusted according to the image content. The approach embeds the Ref and LR images into feature maps with a shared encoder, extracts 3×3 patches with a stride of 1, and calculates the cosine distance between pairs of feature patches. The matching index and confidence map are then determined from these distances, resulting in a refined alignment. This method has proven more effective than warping based on optical flow alone, as it warps the Ref features more accurately. Overall, DCN improves the accuracy and sharpness of textures taken from Ref frames, contributing to better Ref feature alignment.
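The patch-matching step described above (3×3 patches, stride 1, cosine comparison, matching index, confidence map) can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the feature maps are assumed to have already been produced by the shared encoder, zero padding is an assumption made so that every pixel gets a patch, and the function names `extract_patches` and `match_patches` are hypothetical. The sketch uses cosine similarity (higher means a better match), which ranks candidates in the reverse order of cosine distance.

```python
import numpy as np


def extract_patches(feat: np.ndarray, k: int = 3) -> np.ndarray:
    """Extract k x k patches with stride 1 from a (C, H, W) feature map.

    Zero padding (an assumption of this sketch) gives one patch per pixel.
    Returns an array of shape (H * W, C * k * k).
    """
    c, h, w = feat.shape
    p = k // 2
    padded = np.pad(feat, ((0, 0), (p, p), (p, p)))
    # Shape (C, H, W, k, k): one k x k window per spatial position.
    windows = np.lib.stride_tricks.sliding_window_view(
        padded, (k, k), axis=(1, 2)
    )
    return windows.transpose(1, 2, 0, 3, 4).reshape(h * w, c * k * k)


def match_patches(lr_feat: np.ndarray, ref_feat: np.ndarray,
                  k: int = 3, eps: float = 1e-8):
    """For each LR patch, find the most similar Ref patch.

    Returns (index, confidence), both of shape (H_lr, W_lr):
    index holds the flat position of the best Ref patch, and
    confidence holds the cosine similarity of that match.
    """
    lr = extract_patches(lr_feat, k)
    ref = extract_patches(ref_feat, k)
    # Normalize each patch so that the dot product is the cosine similarity.
    lr = lr / (np.linalg.norm(lr, axis=1, keepdims=True) + eps)
    ref = ref / (np.linalg.norm(ref, axis=1, keepdims=True) + eps)
    sim = lr @ ref.T  # shape (H_lr * W_lr, H_ref * W_ref)
    index = sim.argmax(axis=1)
    confidence = sim.max(axis=1)
    h, w = lr_feat.shape[1:]
    return index.reshape(h, w), confidence.reshape(h, w)
```

The full similarity matrix grows with the product of the two feature-map sizes, so a practical implementation would likely restrict the search region or work at reduced resolution. The index map says where to fetch Ref features from, and the confidence map indicates how much each fetched feature can be trusted during fusion.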