TL;DR: A new feature extractor, Bi-Weighted Oriented Optical Flow (Bi-WOOF), is proposed to encode the essential expressiveness of the apex frame of a video; the proposed technique achieves state-of-the-art F1-score recognition performance.
Abstract: Despite recent interest and advances in facial micro-expression research, there is still plenty of room for improvement in micro-expression recognition. Conventional feature extraction approaches for micro-expression video use either the whole video sequence or a part of it for representation. However, given the high-speed video capture of micro-expressions (100–200 fps), are all frames necessary to provide a sufficiently meaningful representation? Is the luxury of data a bane to accurate recognition? A novel proposition is presented in this paper, whereby we use only two images per video: the apex frame and the onset frame. The apex frame of a video contains the highest intensity of expression change among all frames, while the onset frame, with its neutral expression, is the ideal reference frame. A new feature extractor, Bi-Weighted Oriented Optical Flow (Bi-WOOF), is proposed to encode the essential expressiveness of the apex frame. We evaluated the proposed method on five micro-expression databases: CAS(ME)², CASME II, SMIC-HS, SMIC-NIR and SMIC-VIS. Our experiments lend credence to our hypothesis, with the proposed technique achieving state-of-the-art F1-score recognition performance of 0.61 and 0.62 on the high-frame-rate CASME II and SMIC-HS databases, respectively.
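The two weighting stages of Bi-WOOF can be sketched as follows, assuming the horizontal and vertical optical-flow fields u and v between the onset and apex frames have already been computed (flow estimation is not shown). In this minimal pure-Python sketch, each pixel votes into an orientation histogram with its flow magnitude (local weight), and each block histogram is scaled by the block's mean optical strain (global weight). The function names, the 2 × 2 grid and the 8-bin histogram are illustrative assumptions, not the paper's settings.

```python
import math


def optical_strain(u, v):
    """Optical strain magnitude of flow fields u, v (lists of rows) via finite differences."""
    h, w = len(u), len(u[0])

    def ddx(f, y, x):  # central difference, one-sided at the borders
        x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
        return (f[y][x1] - f[y][x0]) / max(x1 - x0, 1)

    def ddy(f, y, x):
        y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
        return (f[y1][x] - f[y0][x]) / max(y1 - y0, 1)

    strain = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            exx, eyy = ddx(u, y, x), ddy(v, y, x)
            exy = 0.5 * (ddy(u, y, x) + ddx(v, y, x))
            strain[y][x] = math.sqrt(exx ** 2 + eyy ** 2 + 2 * exy ** 2)
    return strain


def bi_woof(u, v, grid=2, n_bins=8):
    """Bi-weighted orientation histograms over a grid x grid partition of the face.

    Local weight: flow magnitude of each pixel.
    Global weight: mean optical strain of each block.
    """
    h, w = len(u), len(u[0])
    strain = optical_strain(u, v)
    feature = []
    for gy in range(grid):
        for gx in range(grid):
            ys = range(gy * h // grid, (gy + 1) * h // grid)
            xs = range(gx * w // grid, (gx + 1) * w // grid)
            hist = [0.0] * n_bins
            strain_sum = 0.0
            for y in ys:
                for x in xs:
                    angle = math.atan2(v[y][x], u[y][x]) % (2 * math.pi)
                    b = min(int(angle / (2 * math.pi) * n_bins), n_bins - 1)
                    hist[b] += math.hypot(u[y][x], v[y][x])  # local (magnitude) weight
                    strain_sum += strain[y][x]
            block_weight = strain_sum / (len(ys) * len(xs))  # global (strain) weight
            feature.extend(value * block_weight for value in hist)
    return feature
```

The output is one n_bins-long histogram per block, concatenated in row-major block order, so a 2 × 2 grid with 8 bins yields a 32-dimensional feature.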
TL;DR: To model features of normal video events, the newly emergent one-class Extreme Learning Machine (OCELM) is introduced as the data description algorithm, with a tremendous reduction in training time, which makes the approach easier to update and more applicable to fast learning from rapidly generated surveillance data.
TL;DR: The experimental results on 80 videos from two datasets indicate the superior performance of the proposed key frame extraction approach, which aims to find the representative frames of the video and filter out similar frames from the representative frame set.
Abstract: Key frame extraction is an efficient way to create a video summary, which helps users quickly comprehend the video content. Generally, the key frames should be representative of the video content while also being diverse, to reduce redundancy. Based on the assumption that the video data lie near a subspace of a high-dimensional space, a new approach, named key frame extraction in the summary space, is proposed in this paper. The proposed approach aims to find the representative frames of the video and filter out similar frames from the representative frame set. First, the video data are mapped to a high-dimensional space, called the summary space. Then, a new representation is learned for each frame by analyzing the intrinsic structure of the summary space. Specifically, the learned representation reflects the representativeness of the frame and is used to select representative frames. Next, the perceptual hash algorithm is employed to measure the similarity of the representative frames. As a result, the key frame set is obtained after filtering out similar frames from the representative frame set. Finally, the video summary is constructed by arranging the key frames in temporal order. Additionally, the ground truth, created by filtering out similar frames from human-created summaries, is used to evaluate the quality of the video summary. Compared with several traditional approaches, the experimental results on 80 videos from two datasets indicate the superior performance of our approach.
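The similarity-filtering step can be illustrated with a small sketch. It uses the average hash (aHash), a simple member of the perceptual-hash family, as a stand-in, since the abstract does not say which perceptual hash is used; the Hamming-distance threshold `max_distance` and all function names are hypothetical.

```python
def average_hash(gray, size=8):
    """Average hash of a grayscale frame (list of rows); assumes height, width >= size."""
    h, w = len(gray), len(gray[0])
    cells = []
    for i in range(size):  # downsample to size x size by block averaging
        for j in range(size):
            ys = range(i * h // size, (i + 1) * h // size)
            xs = range(j * w // size, (j + 1) * w // size)
            cells.append(sum(gray[y][x] for y in ys for x in xs) / (len(ys) * len(xs)))
    mean = sum(cells) / len(cells)
    return [1 if c > mean else 0 for c in cells]  # one bit per cell


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def filter_similar(frames, max_distance=5):
    """Keep a frame only if its hash differs enough from every frame kept so far."""
    kept, hashes = [], []
    for idx, frame in enumerate(frames):
        hsh = average_hash(frame)
        if all(hamming(hsh, k) > max_distance for k in hashes):
            kept.append(idx)
            hashes.append(hsh)
    return kept
```

Because the hash only records whether each cell is brighter than the frame mean, small global brightness changes leave it unchanged, which is what makes near-duplicate representative frames cheap to detect.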
TL;DR: A novel diamond search approach is proposed to overcome the problems encountered by several existing block matching algorithms, especially the full search algorithm, with respect to peak signal-to-noise ratio, the number of search points examined, and computational complexity.
Abstract: Motion estimation is a process used to estimate motion vectors between two or more images with a high degree of temporal redundancy. It is commonly used in video compression to attain high compression ratios, as well as in several object tracking applications. In this paper, a novel diamond search approach is proposed to overcome the problems encountered by several existing block matching algorithms, especially the full search algorithm, with respect to peak signal-to-noise ratio, the number of search points examined, and computational complexity. Simulation results show that the proposed algorithm performs well compared with existing algorithms. Experimentally, 88–99% of the motion vectors lie inside a circle with a radius of 3 pixels centered on the zero-motion position. The proposed algorithm has been applied to examples from various standards, such as MPEG-1 and MPEG-4.
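For reference, the classic diamond search that such methods build on repeats a large diamond search pattern (LDSP) until the best point stays at the center, then refines once with a small diamond search pattern (SDSP), exploiting the center bias of motion vectors noted above. The sketch below is that textbook baseline, not the paper's novel variant; the block size `n`, search range `p` and helper names are assumptions.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in cur and the block displaced by (dx, dy) in ref."""
    return sum(abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
               for i in range(n) for j in range(n))


# Offsets (dx, dy); the centre comes first so that ties keep the current centre.
LDSP = [(0, 0), (0, -2), (1, -1), (2, 0), (1, 1), (0, 2), (-1, 1), (-2, 0), (-1, -1)]
SDSP = [(0, 0), (0, -1), (1, 0), (0, 1), (-1, 0)]


def diamond_search(cur, ref, bx, by, n=4, p=7):
    """Return (motion_vector, number_of_distinct_points_evaluated)."""
    h, w = len(ref), len(ref[0])
    costs = {}

    def valid(dx, dy):
        return (abs(dx) <= p and abs(dy) <= p
                and 0 <= bx + dx and bx + dx + n <= w
                and 0 <= by + dy and by + dy + n <= h)

    def cost(mv):
        if mv not in costs:  # evaluate each candidate only once
            costs[mv] = sad(cur, ref, bx, by, mv[0], mv[1], n)
        return costs[mv]

    def best_of(center, pattern):
        cands = [(center[0] + ox, center[1] + oy) for ox, oy in pattern]
        return min((mv for mv in cands if valid(*mv)), key=cost)

    center = (0, 0)
    while True:  # LDSP until the minimum stays at the centre
        best = best_of(center, LDSP)
        if best == center:
            break
        center = best
    return best_of(center, SDSP), len(costs)  # final SDSP refinement
```

Since the centre wins ties, every LDSP move strictly lowers the SAD, so the loop always terminates; with p = 7 the count it returns can be compared against the 225 points of full search.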
TL;DR: The results show that stereo vision distance measurement using Semi-Global Block Matching performs well, with the best result having an error of less than 1% at a distance of 1 m.
Abstract: Stereo vision has become an attractive research topic in recent decades. Many applications, such as autonomous cars, 3D movies and 3D object generation, are built using this technique. The advantage of using two cameras in stereo vision is the disparity map between the images, from which the distance of an object can be estimated. Distance measurement is a crucial parameter for an autonomous car, and the distance between corresponding points in the left and right images must be measured precisely to obtain an accurate distance. One of the most challenging tasks in stereo vision is finding corresponding points between the left and right images (stereo matching). This paper proposes distance measurement using stereo vision with the Semi-Global Block Matching algorithm for stereo matching. The object is captured using a calibrated stereo camera, and the image pair is then optimized using a WLS filter to reduce noise. The output of the algorithm is then converted to metric units for distance measurement. The results show that stereo vision distance measurement using Semi-Global Block Matching performs well; the best result obtained in this work has an error of less than 1% at a distance of 1 m.
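Once SGBM yields a disparity map, the metric distance for a rectified stereo pair follows from Z = f·B/d, with focal length f in pixels, baseline B in meters and disparity d in pixels. A minimal sketch, with hypothetical camera parameters:

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth Z = f * B / d (meters) for a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def percent_error(measured, true):
    return abs(measured - true) / true * 100.0


# Hypothetical camera: f = 700 px, B = 0.12 m.
z_exact = depth_from_disparity(84, 700, 0.12)     # about 1.0 m
z_off_by_one = depth_from_disparity(85, 700, 0.12)
err = percent_error(z_off_by_one, 1.0)            # about 1.18 %
```

With these assumed parameters, an object at 1 m has a disparity of 84 px, and a one-pixel disparity error already gives roughly 1.2% error, which suggests why sub-pixel disparity precision matters for sub-1% accuracy. Note that OpenCV's StereoSGBM returns fixed-point disparities scaled by 16, so they must be divided by 16 before applying the formula.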
TL;DR: A context-aware adaptive pattern-based ME algorithm is proposed for multimedia IoT platforms to improve video compression, with up to 40% and 36% reductions in encoding time for the low-delay main and random access main (RA-main) profiles, respectively, in HEVC test model 16.10.
Abstract: Delivering video data with fast-responding transmission and high-resolution output through cost-effective video processing is desirable in many applications, including Internet of Things (IoT) applications. With the rapid development of IoT smart sensor applications, real-time processing of huge amounts of video data has become necessary, which makes video compression technology essential. Motion estimation (ME) is necessary for improving quality, but it has high computational complexity in video compression systems. The present article, therefore, proposes a context-aware adaptive pattern-based ME algorithm for multimedia IoT platforms to improve video compression. In the proposed algorithm, motions are classified as large or small based on the distortion value. Accordingly, either the small diamond search pattern (SDSP) or the large diamond search pattern (LDSP) is chosen at every step of ME, allowing large and small motions to be processed adaptively. Compared with conventional fast algorithms, the experimental results demonstrate up to 40% and 36% reductions in encoding time for the low-delay main (LB-main) and random access main (RA-main) profiles, respectively, in the HEVC test model 16.10 encoder, with bit-rate losses of 0.071% and 0.246% for the two profiles, ensuring video quality and search precision.
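The per-step pattern choice can be sketched as below, assuming a caller-supplied distortion function `cost(dx, dy)` (for example, SAD) and a hypothetical distortion `threshold` separating large from small motion; the paper's actual classification rule and threshold are not given in the abstract.

```python
# Offsets (dx, dy) around the current centre (centre itself excluded).
LDSP = [(0, -2), (1, -1), (2, 0), (1, 1), (0, 2), (-1, 1), (-2, 0), (-1, -1)]
SDSP = [(0, -1), (1, 0), (0, 1), (-1, 0)]


def adaptive_pattern_search(cost, threshold, max_steps=64):
    """Use LDSP while the distortion is large, SDSP once it is small."""
    center = (0, 0)
    best = cost(*center)
    small_only = False
    for _ in range(max_steps):
        use_large = best > threshold and not small_only  # "large motion" class
        pattern = LDSP if use_large else SDSP
        cands = [(center[0] + ox, center[1] + oy) for ox, oy in pattern]
        cand = min(cands, key=lambda mv: cost(*mv))
        cand_cost = cost(*cand)
        if cand_cost < best:
            center, best = cand, cand_cost
        elif use_large:
            small_only = True  # large step found nothing better: refine with SDSP
        else:
            break  # SDSP found nothing better: converged
    return center, best
```

The design choice is that large steps are taken only while the distortion suggests the block is still far from its match, so small motions never pay for the nine-point pattern.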
TL;DR: Two hardware architectures of the diamond pattern search algorithm for HEVC video coding, using sequential and parallel techniques, are proposed; they can meet real-time processing requirements for FHD video at 30 frames per second.
Abstract: High efficiency video coding (HEVC) is the latest video coding standard and aims to replace the H.264/AVC standard owing to its high coding performance, which makes it particularly suitable for high-definition video. However, this performance comes with high computational complexity, due principally to the motion estimation (ME) algorithm. As in H.264/AVC, ME in HEVC is a computationally demanding part that takes the largest share of the total encoding time. Hence, many fast algorithms have been proposed to reduce computation, but most do not study how they can be implemented efficiently in hardware. In this paper, two hardware architectures of the diamond pattern search algorithm for HEVC video coding, using sequential and parallel techniques, are proposed. These architectures are based on parallel processing techniques. The sequential and parallel VHDL codes have been verified and can operate at a high frequency on a Virtex-7 field-programmable gate array (FPGA). Compared with other designs, our parallel design makes more efficient use of the available FPGA resources. Our architecture can meet real-time processing requirements for FHD video at 30 frames per second.
TL;DR: This paper proposes two hybrid motion estimation algorithms, Artificial Bee Colony with Differential Evolution and Harmony Search with Differential Evolution, which outperform other algorithms on various parameters.
TL;DR: Experimental results show that the proposed method achieves good real-time video stabilization for a vehicle camera moving at various speeds, and better stabilization performance than other methods on high-vibrating frames when both real-time processing and acceptable stabilization results are considered.
Abstract: Most previous methods of real-time video stabilization are only effective for low-vibrating frames, which are usually captured by an in-vehicle camera moving at low speed. To overcome their ineffectiveness on high-vibrating frames, this paper presents a real-time video stabilization system for video sequences captured by a fast-moving in-vehicle camera without additional sensors. The proposed method is composed of four parts: frame-shaking judgment, feature classification, estimation of global motion and rotation angle, and frame compensation. Feature points and their motion vectors are used to judge whether the current frame is shaking, and then a conversion matrix is deduced through perspective projection to classify these feature points as background or foreground. Next, the optical flows of the background feature points are mapped to polar coordinates to obtain the representative optical-flow cluster of the background. Finally, this cluster is used to calculate the global motion and rotation angle for compensation, followed by Kalman filtering to provide better video stabilization. Experimental results show that the proposed method achieves good real-time video stabilization for a vehicle camera moving at various speeds, and better stabilization performance than other methods on high-vibrating frames when both real-time processing and acceptable stabilization results are considered.
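The final smoothing stage can be illustrated with a scalar Kalman filter applied to one component of the accumulated global motion trajectory. This sketch assumes a constant-position model with hypothetical noise parameters `q` and `r`; the paper's actual state model is not specified in the abstract. The correction returned for each frame is the difference between the smoothed and raw trajectories.

```python
from itertools import accumulate


def kalman_smooth(measurements, q=1e-3, r=0.25):
    """Scalar Kalman filter with a constant-position model."""
    x, p = measurements[0], 1.0
    out = [x]
    for z in measurements[1:]:
        p += q             # predict: state unchanged, uncertainty grows by q
        k = p / (p + r)    # Kalman gain
        x += k * (z - x)   # correct with the new measurement
        p *= 1 - k
        out.append(x)
    return out


def compensation(frame_motion, q=1e-3, r=0.25):
    """Per-frame correction = smoothed trajectory - raw trajectory."""
    raw = list(accumulate(frame_motion))  # cumulative global motion
    smooth = kalman_smooth(raw, q, r)
    return [s - t for s, t in zip(smooth, raw)]
```

A larger r relative to q trusts the measurements less and smooths more aggressively, at the cost of lagging behind intentional camera motion.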
TL;DR: Experimental results show the improvement of the proposed approach over other block matching algorithms in terms of the performance measures used.
Abstract: Block matching (BM) motion estimation plays an essential role in video coding applications. BM approaches are used for data compression, which is achieved by removing the temporal redundancy in video sequences. In the BM process, each video frame is subdivided into macroblocks, and each macroblock in the current frame is compared with candidate blocks in the previous frame. The main objective is to minimize the sum of absolute differences (SAD). In this work, some modifications have been made to the conventional artificial bee colony algorithm to improve conventional BM systems. An initial pattern is used in the proposed algorithm to reduce the computational cost, which is measured in terms of search points and convergence time. Experimental results show the improvement of the proposed approach over other block matching algorithms in terms of the performance measures used.
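The SAD criterion and the search-point cost that these methods try to reduce can be made concrete with an exhaustive full-search baseline. This sketch does not implement the modified artificial bee colony algorithm; the block size `n`, search range `p` and function names are assumptions.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    return sum(abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
               for i in range(n) for j in range(n))


def full_search(cur, ref, bx, by, n=4, p=2):
    """Check every displacement in [-p, p]^2; return (mv, min_sad, search_points)."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost, points = None, float("inf"), 0
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            inside = 0 <= bx + dx and bx + dx + n <= w and 0 <= by + dy and by + dy + n <= h
            if not inside:
                continue  # candidate block would leave the reference frame
            points += 1
            c = sad(cur, ref, bx, by, dx, dy, n)
            if c < best_cost:
                best_mv, best_cost = (dx, dy), c
    return best_mv, best_cost, points
```

Full search always costs (2p + 1)² SAD evaluations per block when the window fits in the frame, which is the figure that pattern-based and metaheuristic searches try to cut.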
TL;DR: A new and improved iterative and adaptive search strategy for block-based motion estimation is presented, along with its efficient hardware implementation, achieving significantly improved results in terms of both algorithmic metrics (PSNR) and hardware performance (speed, area).
Abstract: Motion estimation (ME) plays an important part in the functioning of a video codec by identifying and reducing the temporal redundancies between successive frames of a video sequence. Block matching algorithms (BMAs) have been accepted as one of the best approaches for motion estimation due to their efficiency and ease of implementation. This paper presents a new and improved iterative and adaptive search strategy for block-based motion estimation, along with its efficient hardware implementation. Since demand for streaming video services on mobile devices is expected to grow, designing a finely tuned algorithm with dedicated, efficient hardware would provide significant benefits. The present motion estimation algorithm is adaptive in nature: it takes into consideration the motion content of the current frame while predicting the motion vector. The adaptive nature of the search eases the complexity of motion estimation, and the algorithm exploits the correlation among the motion vectors of neighboring blocks to lower the number of search positions. Traditionally, such adaptive algorithms are executed by CPU cores running a software stack. Since software involves significant overheads such as cache fetches, branches and stalls, the efficiency of the proposed algorithm can be overshadowed by the hardware platform. To avoid this, a compact hardware architecture was developed that compares favorably with other existing architectures. The VLSI design for the proposed algorithm presented in this work handles the generation of the adaptive search pattern and uses an interleaved memory organization to increase operational speed. An effective data reuse scheme and the use of the minimum number of processing elements required for parallelization reduce the on-chip area.
Working at a frequency of 243 MHz, the proposed design can process 66 720p HD (1280 × 720) frames per second while occupying an area of 38.2K gate equivalents. Hence, the proposed design can be incorporated in video codecs for commercial devices such as camcorders, smartphones, and other portable, battery-powered consumer video devices. The proposed method achieves significantly improved results in terms of both algorithmic metrics (PSNR) and hardware performance (speed, area).
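To relate the quoted throughput to a per-block budget, the available clock cycles per block can be computed directly. This assumes 16 × 16 blocks, which the abstract does not state; `cycles_per_block` is a hypothetical helper.

```python
def cycles_per_block(clock_hz, frames_per_s, width, height, block=16):
    """Clock cycles available per block for real-time processing."""
    blocks_per_frame = (width // block) * (height // block)
    return clock_hz / (frames_per_s * blocks_per_frame)


# 243 MHz, 66 frames/s of 1280 x 720 video, 16 x 16 blocks assumed.
budget = cycles_per_block(243e6, 66, 1280, 720)
```

A 1280 × 720 frame holds 80 × 45 = 3600 such blocks, so the design would have roughly 1,023 cycles per block, under this block-size assumption, to generate the adaptive pattern and evaluate its candidates.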
TL;DR: Efficient decoding of video content that may involve intra block copy operations, such as copying pixel data from one region of a frame to another region of the same frame, is described.
Abstract: Efficient decoding of video content that may involve intra block copy operations, such as copying pixel data from one region of a frame to another region of the same frame, is described. For example, a method to decode the video content may involve identifying the video frame in which an intra block copy operation is to be performed before the intra block copy operation is initiated. A video decoder may prefetch the pixel data from the source region into a local buffer with low memory latency, so that the source pixel data to be copied into the destination blocks in the video frame are readily available. Thus, costly and time-consuming memory accesses may be avoided, and in turn the video decoding pipeline may operate smoothly without stalling.
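The prefetch idea can be sketched on a frame stored as a list of rows: the whole source region is read into a local buffer before anything is written, which also keeps the copy correct when the source and destination regions overlap. The function name and argument order are hypothetical; a real decoder would operate on hardware buffers rather than Python lists.

```python
def intra_block_copy(frame, src_x, src_y, dst_x, dst_y, w, h):
    """Copy a w x h region of frame to another position in the same frame."""
    # Prefetch: read the whole source region into a local buffer first.
    buffer = [row[src_x:src_x + w] for row in frame[src_y:src_y + h]]
    # Write: fill the destination region from the buffer, not from the frame.
    for i, row in enumerate(buffer):
        frame[dst_y + i][dst_x:dst_x + w] = row
    return frame
```

Reading from the buffer rather than the live frame matters in the overlapping case: a naive row-by-row copy would read rows it had already overwritten.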
TL;DR: A video key frame extraction algorithm based on a sliding window, the global Gist feature and the SURF local feature point detection algorithm is designed and implemented; results show that the extracted key frames are of high quality and basically cover the main content of the original video.
Abstract: With the rapid development of the Internet and P2P technology, multimedia resources are growing steadily and are widely used. Since network traffic is increasing sharply, selecting information of interest for large numbers of Internet users is challenging. Technologies and applications such as video search, fast video browsing, video indexing and storage are therefore in great demand. Behind these technologies and applications lies an important problem: how to quickly browse massive video data and obtain the main content of a video. To solve this problem, various key frame extraction algorithms have been proposed. Because video content is diverse, different videos have different characteristics, so designing a general-purpose key frame extraction algorithm that solves the problem is not realistic. The main trend is to design key frame extraction algorithms based on the characteristics of the video itself. In this article, we mainly focus on videos with edited boundaries and shot transitions. For this kind of video, we have designed and implemented a video key frame extraction algorithm based on a sliding window, the global Gist feature and the SURF local feature point detection algorithm. In this algorithm, we use the Gist feature to construct the global scene information of each frame, and the SURF key point detection algorithm to extract local key points as the local feature of each frame. Then, sliding-window-based shot segmentation and a shot merging algorithm are applied to divide the original video into several shots. After that, we select the most representative frames in each shot as key frames. Finally, we evaluate the results of the algorithm from subjective and objective perspectives. Results show that the key frames extracted by the algorithm are of high quality and basically cover the main content of the original video.
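The sliding-window shot segmentation and per-shot key frame selection can be sketched as follows, assuming each frame has already been reduced to a feature vector (a stand-in for the combined Gist/SURF description). The window size, distance threshold, Euclidean distance and the rule "key frame = frame closest to the shot mean" are illustrative assumptions; shot merging is omitted.

```python
import math


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vec(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def segment_shots(features, window=3, threshold=1.0):
    """Start a new shot when a frame is far from the mean of the preceding window."""
    boundaries = [0]
    for i in range(1, len(features)):
        start = max(boundaries[-1], i - window)  # window never crosses a boundary
        if dist(features[i], mean_vec(features[start:i])) > threshold:
            boundaries.append(i)
    ends = boundaries[1:] + [len(features)]
    return list(zip(boundaries, ends))  # half-open (begin, end) ranges


def key_frames(features, shots):
    """Pick, per shot, the frame closest to the shot's mean feature."""
    keys = []
    for b, e in shots:
        centre = mean_vec(features[b:e])
        keys.append(min(range(b, e), key=lambda i: dist(features[i], centre)))
    return keys
```

Comparing against a window mean rather than only the previous frame makes the boundary test less sensitive to a single noisy frame inside a shot.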
TL;DR: This paper presents a fast and effective feature-based method for real-time video stabilization in vehicle video recorder systems, which shows good video stabilization performance.
Abstract: This paper presents a fast and effective feature-based method for real-time video stabilization in vehicle video recorder systems. Corresponding feature points are first obtained from two consecutive frames, and optical flows are then calculated from these points. Next, the optical flows are mapped to polar coordinates to form clusters and remove incorrect optical flows. The resulting clusters are used to estimate the global motion and rotation angle. Finally, the global motion and rotation angle are smoothed and then compensated to obtain the stabilized video. Experimental results show that the proposed method has good video stabilization performance.
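The polar-coordinate clustering step can be sketched as follows: each flow vector is binned by angle and magnitude, the most populated bin is taken as the background cluster, and its mean vector is used as the global motion. The bin counts, magnitude step and function names are assumptions, and rotation-angle estimation is omitted.

```python
import math


def global_motion(flows, n_angle_bins=16, mag_step=1.0):
    """flows: list of (dx, dy) optical-flow vectors of matched feature points.

    Returns (gx, gy, n_rejected): mean vector of the dominant polar cluster
    and the number of flows discarded as outliers.
    """
    clusters = {}
    for dx, dy in flows:
        angle = math.atan2(dy, dx) % (2 * math.pi)
        a_bin = min(int(angle / (2 * math.pi) * n_angle_bins), n_angle_bins - 1)
        m_bin = int(math.hypot(dx, dy) / mag_step)
        clusters.setdefault((a_bin, m_bin), []).append((dx, dy))
    dominant = max(clusters.values(), key=len)  # largest cluster = background
    gx = sum(v[0] for v in dominant) / len(dominant)
    gy = sum(v[1] for v in dominant) / len(dominant)
    return gx, gy, len(flows) - len(dominant)
```

Grouping in polar form lets one threshold on direction and one on length reject foreground flows and mismatches without fitting a full motion model.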
TL;DR: Through extensive numerical simulations, it is demonstrated that the proposed framework not only extends the video sensor lifetime by 54% but also improves end-to-end video quality by 35% in terms of Mean Squared Error (MSE).
Abstract: In this paper, we propose an energy-efficient joint video encoding and transmission framework for network lifetime extension under an end-to-end video quality constraint in Wireless Video Sensor Networks (WVSN). This framework integrates an energy-efficient and adaptive intra-only video encoding scheme based on the H.264/AVC standard that outputs two service-differentiated macroblock categories, namely the Region Of Interest and the Background. Empirical models describing the physical behavior of the measured energies and distortions during the video encoding and transmission phases are derived. These models enable the video source node to dynamically adapt its video encoder's configuration to meet the desired quality while extending the network lifetime. An energy-efficient and reliable multipath multi-priority routing protocol is proposed to route the encoded streams to the sink, taking into account the remaining energy, the congestion level and the packet loss rates of the intermediate nodes. Moreover, this protocol interacts with the application layer to bypass congestion situations and continuously feed it with current statistics. Through extensive numerical simulations, we demonstrate that the proposed framework not only extends the video sensor lifetime by 54% but also improves end-to-end video quality by 35% in terms of Mean Squared Error (MSE).
TL;DR: The experimental results show that PSNR improves by about 10 dB on average and that the proposed scheme significantly improves video quality compared with existing VECDH schemes.
Abstract: The video error concealment with data hiding (VECDH) method aims to conceal video errors caused by transmission using auxiliary data extracted directly from the received video file. It can effectively reduce the error propagated between spatially/temporally correlated macroblocks. The information embedded at the sender side should capture the video characteristics well, and the retrieved data should be capable of correcting video errors. Existing VECDH algorithms often embed the required information into the corresponding video frames to achieve transparency. However, at the receiver side, the reconstruction process may lose important information, which can result in seriously distorted video. To improve concealment performance, we propose in this paper an efficient VECDH algorithm based on compressed sensing (CS). In the proposed method, the frame features to be embedded in every video frame are generated from CS measurements of the frame residuals and scrambled with other frame features to form the marked data. The marked data are embedded into the corresponding frames by modulating color triples, because this has the least impact on the carriers. At the receiver, the extracted data are used to reconstruct the residuals to conceal errors, and error positions are located using set theory. Since CS can sample a sparse signal at a rate lower than the Shannon–Nyquist rate, the original signal can, in theory, be reconstructed very well. This indicates that the proposed method can benefit from CS and therefore achieve better error concealment. The experimental results show that PSNR improves by about 10 dB on average and that the proposed scheme significantly improves video quality compared with existing VECDH schemes.
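Since results here and in several other abstracts are reported as PSNR, a minimal PSNR computation for 8-bit images is shown below. Because PSNR = 10·log10(255²/MSE), a gain of 10 dB corresponds to a tenfold reduction in mean squared error. The function name and list-of-rows image format are assumptions of this sketch.

```python
import math


def psnr(original, distorted, max_value=255):
    """PSNR in dB between two equally sized images given as lists of rows."""
    diffs = [(a - b) ** 2
             for row_o, row_d in zip(original, distorted)
             for a, b in zip(row_o, row_d)]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * math.log10(max_value ** 2 / mse)
```

For example, two 8-bit images that differ by 10 gray levels everywhere have MSE = 100 and a PSNR of about 28.13 dB.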
TL;DR: A novel block matching algorithm named efficient direction-oriented search is proposed, which dynamically switches between search regions based on the location of the minimum distortion error; for directional motion sequences it even outperforms the full search algorithm at a significantly lower computational cost.
Abstract: Motion estimation is one of the most crucial and time-consuming components of video compression methods. Much research has been done to reduce its computational complexity, but at the expense of block-matching performance. A novel block matching algorithm named efficient direction-oriented search is proposed. The proposed algorithm first switches dynamically between search regions based on the location of the minimum distortion error, and the search region dimension is also made adaptive for faster convergence. The computational complexity is then reduced by a proposed diamond search pattern with horizontal and vertical wings and two hexagon-shaped direction-oriented search patterns inclined at ±45°. For further speed-up of the search process, partial distortion calculations are employed, and a method for selecting optimal threshold values based on distortion statistics for different partial distortion calculations is presented. The performance of the proposed algorithm is evaluated on video sequences containing slow, medium, fast and directional motion content. The experimental results indicate that a significant speed-up can be achieved while maintaining better peak signal-to-noise ratio performance. For directional motion video sequences, the proposed method even outperforms the full search algorithm, at a significantly lower computational cost.
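The early-termination idea behind partial distortion calculations can be sketched with classic partial distortion elimination (PDE), which abandons a candidate as soon as its running SAD reaches the best SAD found so far. This row-wise version is a generic illustration; the paper's own partial distortion scheme and threshold selection may differ.

```python
def sad_with_pde(cur_block, ref_block, best_so_far):
    """Row-wise partial SAD.

    Returns (sad, rows_evaluated), or (None, rows_evaluated) if the candidate
    was abandoned because it cannot beat best_so_far.
    """
    total = 0
    for row, (rc, rr) in enumerate(zip(cur_block, ref_block), start=1):
        total += sum(abs(a - b) for a, b in zip(rc, rr))
        if total >= best_so_far:
            return None, row  # already no better than the current best: stop early
    return total, len(cur_block)
```

Because SAD only grows as rows are added, stopping early never discards the true best candidate; it only saves work on candidates that are already worse.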
TL;DR: A two-step approach is proposed for the recovery of cardiac MR images in the presence of free-breathing motion, showing improved structural similarity index (SSIM), peak signal-to-noise ratio (PSNR), and mean square error (MSE) at different acceleration factors.
Abstract: Transform-domain sparsity in Magnetic Resonance Imaging (MRI) has recently been used, in conjunction with compressed sensing (CS) theory, to reduce acquisition time. Respiratory motion during an MR scan results in strong blurring and ghosting artifacts in the recovered MR images. To improve the quality of the recovered images, motion needs to be estimated and corrected. In this article, a two-step approach is proposed for the recovery of cardiac MR images in the presence of free-breathing motion. In the first step, compressively sampled MR images are recovered by solving an optimization problem with a gradient descent algorithm. The ℓ0-norm-based regularizer used in the optimization problem is approximated by a hyperbolic tangent function. In the second step, a block matching algorithm known as Adaptive Rood Pattern Search (ARPS) is used to estimate and correct respiratory motion among the recovered images. The framework is tested on free-breathing simulated and in vivo 2D cardiac cine MRI data. Simulation results show improved structural similarity index (SSIM), peak signal-to-noise ratio (PSNR), and mean square error (MSE) for the proposed method at different acceleration factors. Experimental results also provide a comparison between k-t FOCUSS with MEMC and the proposed method.
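The ARPS step can be sketched as below, assuming a caller-supplied matching cost `cost(dx, dy)` and the motion vector of the left neighboring block as the prediction. The rood arm length is the larger component of the predicted vector (a fixed arm of 2 is used when no prediction exists), followed by unit-rood refinement. This is a simplified textbook sketch, not the paper's code; for brevity it re-evaluates repeated points.

```python
def arps(cost, pred_mv=None):
    """Adaptive Rood Pattern Search; pred_mv is the left neighbour's MV, if any."""
    if pred_mv is None:
        arm, cands = 2, []
    else:
        arm = max(abs(pred_mv[0]), abs(pred_mv[1]))
        cands = [tuple(pred_mv)]  # the predicted vector is also a candidate
    cands += [(0, 0), (arm, 0), (-arm, 0), (0, arm), (0, -arm)]
    center = min(cands, key=lambda mv: cost(*mv))
    # Refine with the unit-size rood pattern until the centre is the best point.
    while True:
        cx, cy = center
        rood = [(cx, cy), (cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)]
        best = min(rood, key=lambda mv: cost(*mv))
        if best == center:
            return center
        center = best
```

When neighboring blocks move coherently, as with breathing motion across adjacent image regions, the prediction lands near the optimum and only a couple of refinement steps are needed.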
TL;DR: A novel unsupervised video object segmentation approach via distractor-aware online adaptation (DOA) is proposed that achieves state-of-the-art results on two benchmark datasets, DAVIS 2016 and FBMS-59.
Abstract: Unsupervised video object segmentation, which must segment objects without any prior information about them, is a crucial application in video analysis. It becomes tremendously challenging when multiple objects occur and interact in a given video clip. In this paper, a novel unsupervised video object segmentation approach via distractor-aware online adaptation (DOA) is proposed. DOA models spatial-temporal consistency in video sequences by capturing background dependencies from adjacent frames. Instance proposals are generated by an instance segmentation network for each frame and then selected, using motion information, as positives and, if they exist, hard negatives. To obtain high-quality hard negatives, a block matching algorithm is then applied to preceding frames to track the associated hard negatives. General negatives are also introduced in case there are no hard negatives in the sequence, and experiments demonstrate that both kinds of negatives (distractors) are complementary. Finally, DOA is conducted using the positive, negative and hard negative masks to update the foreground/background segmentation. The proposed approach achieves state-of-the-art results on two benchmark datasets, DAVIS 2016 and FBMS-59.
TL;DR: The mathematics of bilinear interpolation are utilized for the selection of the candidate motion vectors that minimize the error criterion, by estimating local minima in the error surface with arbitrary accuracy.
Abstract: The present paper focuses on high-accuracy block-based sub-pixel motion estimation using a straightforward error minimization approach. In particular, the mathematics of bilinear interpolation are used to select the candidate motion vectors that minimize the error criterion, by estimating local minima in the error surface with arbitrary accuracy. The implemented approach favors optimum accuracy over computational load, making it ideal as a benchmark for faster methods to compare against; however, it is not best suited to real-time critical applications (e.g., video compression). Other video processing tasks that rely on motion vectors and require high resolution/accuracy can also take advantage of the proposed solution (and its simplicity in terms of underlying theoretical complexity), such as motion-compensated filtering for super-resolution image enhancement and motion analysis in sensitive areas (e.g., high-speed video monitoring, medical imaging, motion analysis in sport science, big-data visual surveillance). The proposed method is thoroughly evaluated using both real video and synthetic motion sequences from still images, adopting well-tested block-based motion estimation evaluation procedures. The assessment includes comparisons with a number of existing block-based methods with respect to PSNR and SSIM metrics over ground-truth samples. The evaluation considers both the original (arbitrary-accuracy) and the truncated motion vectors (rounded to the nearest half, quarter or eighth of a pixel), revealing superior performance with more accurate motion vector estimation. In this context, the degree to which sub-pixel motion estimation methods actually produce sub-pixel motion vectors is investigated, and the implications thereof are discussed.
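The core operation, evaluating the block error at a fractional displacement through bilinear interpolation, can be sketched as follows. Note that this sketch locates the minimum by grid search at a hypothetical quarter-pixel `step`, whereas the paper estimates the local minimum of the error surface analytically with arbitrary accuracy; function names are assumptions.

```python
import math


def bilinear(img, x, y):
    """Sample img (list of rows) at real-valued (x, y) by bilinear interpolation."""
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    fx, fy = x - x0, y - y0
    x1, y1 = min(x0 + 1, len(img[0]) - 1), min(y0 + 1, len(img) - 1)
    top = (1 - fx) * img[y0][x0] + fx * img[y0][x1]
    bottom = (1 - fx) * img[y1][x0] + fx * img[y1][x1]
    return (1 - fy) * top + fy * bottom


def subpixel_sad(cur, ref, bx, by, n, dx, dy):
    """SAD of the n x n block at (bx, by) against ref displaced by real (dx, dy)."""
    return sum(abs(cur[by + i][bx + j] - bilinear(ref, bx + j + dx, by + i + dy))
               for i in range(n) for j in range(n))


def refine(cur, ref, bx, by, n, mv, step=0.25):
    """Search a fractional grid within +/-1 pixel of an integer MV."""
    k = int(1 / step)
    offsets = [i * step for i in range(-k, k + 1)]
    return min(((mv[0] + ox, mv[1] + oy) for oy in offsets for ox in offsets),
               key=lambda v: subpixel_sad(cur, ref, bx, by, n, *v))
```

Grid refinement caps accuracy at `step`, which is exactly the truncation (half, quarter or eighth pixel) the abstract compares against its arbitrary-accuracy vectors.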
TL;DR: Various block-matching motion estimation algorithms are discussed, such as Full Search (FS, or Exhaustive Search), Three-Step Search (TSS), New Three-Step Search (NTSS), Four-Step Search (FSS), Diamond Search (DS), etc.
Abstract: Motion estimation has traditionally been used only in video encoding; however, it can also be used to solve various real-life problems. Nowadays, researchers from different fields are turning towards motion estimation, and it has become an important problem in many video applications. It is a very important part of video compression, providing bit-rate reduction and improved coding efficiency. The process of motion estimation is used to improve compression quality, and it also reduces computation time. Block-based motion estimation algorithms are used because they require less memory to process a video file, and they also reduce computational complexity. In this article, various block-matching motion estimation algorithms are discussed, such as Full Search (FS, or Exhaustive Search), Three-Step Search (TSS), New Three-Step Search (NTSS), Four-Step Search (FSS), Diamond Search (DS), etc.
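As an example of one of the surveyed methods, the Three-Step Search (TSS) can be sketched for a search range of ±7, assuming a caller-supplied matching cost `cost(dx, dy)`. With p = 7 it examines 25 distinct points, compared with 225 for full search; the function name and return format are assumptions.

```python
def three_step_search(cost, p=7):
    """Return (motion_vector, number_of_distinct_points_evaluated)."""
    cache = {}

    def c(mv):
        if mv not in cache:  # evaluate each point only once
            cache[mv] = cost(*mv)
        return cache[mv]

    step = (p + 1) // 2  # 4, 2, 1 for p = 7
    center = (0, 0)
    while step >= 1:
        cx, cy = center
        ring = [(cx + i * step, cy + j * step)
                for j in (-1, 0, 1) for i in (-1, 0, 1) if (i, j) != (0, 0)]
        center = min([center] + ring, key=c)  # centre first, so ties keep it
        step //= 2
    return center, len(cache)
```

TSS assumes the error surface decreases monotonically towards the optimum; when it does not, the coarse first step can lock onto a local minimum, which is what NTSS, FSS and DS try to mitigate with center-biased patterns.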
TL;DR: Two block matching algorithms based on Differential Evolution and Harmony Search are discussed, and a new algorithm is proposed by hybridizing them to obtain better results.
Abstract: In video compression, the most efficient technique for motion estimation is Block Matching and there are many algorithms to implement it. In this paper, two such algorithms are discussed, which are based on Differential Evolution (DE) and Harmony Search (HS), and a new algorithm is proposed by hybridizing these two algorithms to get better results. In the proposed algorithm, pitch adjustment operation of HS is replaced by mutation and crossover operations of DE.
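The hybridization can be sketched for a two-dimensional integer motion vector: a new harmony is first built by memory consideration, and the pitch-adjustment step of HS is replaced by DE/rand/1 mutation with binomial crossover. The parameter values (`hms`, `hmcr`, `f`, `cr`), the DE variant and the worst-replacement rule are assumptions, not the paper's settings.

```python
import random


def hs_de_search(cost, p=7, hms=6, iters=40, hmcr=0.9, f=0.5, cr=0.7, seed=0):
    """Harmony search over integer MVs in [-p, p]^2 with DE replacing pitch adjustment."""
    rng = random.Random(seed)

    def clip(v):
        return max(-p, min(p, int(round(v))))

    memory = [(rng.randint(-p, p), rng.randint(-p, p)) for _ in range(hms)]
    for _ in range(iters):
        # HS memory consideration (or random selection), per dimension.
        base = [rng.choice(memory)[d] if rng.random() < hmcr else rng.randint(-p, p)
                for d in range(2)]
        # DE/rand/1 mutation + binomial crossover instead of pitch adjustment.
        r1, r2, r3 = rng.sample(memory, 3)
        mutant = [r1[d] + f * (r2[d] - r3[d]) for d in range(2)]
        j_rand = rng.randrange(2)
        trial = tuple(clip(mutant[d]) if (rng.random() < cr or d == j_rand) else base[d]
                      for d in range(2))
        worst = max(memory, key=lambda mv: cost(*mv))
        if cost(*trial) < cost(*worst):  # HS rule: replace the worst harmony
            memory[memory.index(worst)] = trial
    return min(memory, key=lambda mv: cost(*mv))
```

The DE difference vector scales with the spread of the harmony memory, so steps shrink automatically as the memory converges, whereas classic pitch adjustment uses a fixed bandwidth.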
TL;DR: An improved block-matching technique is proposed that incorporates a chaotic sine-cosine optimization algorithm along with a fitness approximation (FA) strategy; results demonstrate potential improvements over other competing schemes.
Abstract: Motion estimation (ME) plays an important role in achieving a low bit rate in video coding. The selection of the optimal motion vector (MV) has a significant impact on the quality of the compressed video. The block-matching (BM) algorithm is one of the most widely accepted ME techniques for estimating the motion between successive frames. In any BM technique, the MVs for the current frame are obtained over a predefined search region in the previous frame by minimizing a matching criterion. However, computing these matching criteria is highly expensive in terms of computational time. Hence, block-based ME (BME) can be formulated as an optimization problem that aims to find the best-matched block within a specified search region. In this context, an improved block-matching technique is proposed that incorporates a chaotic sine-cosine optimization algorithm along with a fitness approximation (FA) strategy. The proposed approach has been compared with several other BM techniques in terms of different parameters, namely the peak signal-to-noise ratio (PSNR), the PSNR degradation ratio (\(D_{PSNR}\)) and the number of search points. The analysis of the results demonstrates that the proposed method yields potential improvements over other competing schemes.
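Two of the evaluation parameters named above can be stated concretely. PSNR is \(10\log_{10}(255^2/\mathrm{MSE})\) for 8-bit frames. For \(D_{PSNR}\), the sketch assumes the definition common in the block-matching literature, the percentage PSNR loss of a method X relative to Full Search, \(D_{PSNR} = (PSNR_{FS} - PSNR_{X}) / PSNR_{FS} \times 100\); the abstract does not spell this formula out, so treat it as an assumption.

```python
import math


def psnr(orig, recon, peak=255):
    """PSNR (dB) between two equally sized grayscale frames given as lists of rows."""
    count = sum(len(row) for row in orig)
    mse = sum((a - b) ** 2
              for row_o, row_r in zip(orig, recon)
              for a, b in zip(row_o, row_r)) / count
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / mse)


def d_psnr(psnr_full_search, psnr_method):
    """Assumed definition: PSNR degradation (%) of a fast method relative to Full Search."""
    return (psnr_full_search - psnr_method) / psnr_full_search * 100
```

For example, a method reaching 27 dB on a sequence where Full Search reaches 30 dB has \(D_{PSNR} = 10\%\) under this definition.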
TL;DR: A modified diamond search pattern (MDS) algorithm is proposed for motion estimation that uses a small diamond-shaped search pattern in the initial step and a large diamond-shaped (LDS) pattern in further steps; it performs better than DS and CDS in terms of average search points and average computation time.
Abstract: Object tracking is one of the main fields within computer vision. Among the various methods/approaches for object detection and tracking, the background subtraction approach makes object detection easier. The proposed block matching algorithm is then applied to the detected object to generate the motion vectors. The existing diamond search (DS) and cross diamond search (CDS) algorithms are studied, and experiments are carried out on various standard video data sets and user-defined data sets. Based on the study and analysis of these two existing algorithms, a modified diamond search pattern (MDS) algorithm is proposed that uses a small diamond-shaped search pattern in the initial step and a large diamond-shaped (LDS) pattern in further steps for motion estimation. The initial search pattern consists of five points in a small diamond shape and gradually grows into a large diamond shape, based on the point with the minimum cost function. The algorithm ends with the small diamond pattern. The proposed MDS algorithm finds smaller motion vectors with fewer search points than the existing DS and CDS algorithms. Further, object detection is carried out using the background subtraction approach, and finally the MDS motion estimation algorithm is used to track the object in color video sequences. The experiments are carried out using different video data sets containing a single object. The results are evaluated and compared using evaluation parameters such as the average number of search points per frame and the average computation time per frame. The experimental results show that MDS performs better than DS and CDS in terms of average search points and average computation time.
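The MDS procedure described above (start with a five-point small diamond, grow into large diamonds while the minimum keeps moving, finish with a small diamond) can be sketched as follows. This is one plausible reading, not the authors' code: the early exit when the centre of the first small diamond is already the best point, the tie-breaking in favour of the centre, and the names SDSP, LDSP and mds_search are assumptions.

```python
SDSP = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]      # small diamond, 5 points
LDSP = [(0, 0), (2, 0), (-2, 0), (0, 2), (0, -2),
        (1, 1), (1, -1), (-1, 1), (-1, -1)]             # large diamond, 9 points


def sad(cur, ref, bx, by, dx, dy, n):
    return sum(abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
               for i in range(n) for j in range(n))


def mds_search(cur, ref, bx, by, n, p=7):
    """Hypothetical modified diamond search: SDSP first, LDSP while moving, SDSP last."""
    h, w = len(ref), len(ref[0])
    cache = {}  # distinct evaluated points

    def cost(mv):
        dx, dy = mv
        if max(abs(dx), abs(dy)) > p or not (
                0 <= bx + dx <= w - n and 0 <= by + dy <= h - n):
            return float("inf")  # outside the search window or the frame
        if mv not in cache:
            cache[mv] = sad(cur, ref, bx, by, dx, dy, n)
        return cache[mv]

    def best_around(centre, pattern):
        # min() keeps the first minimum, so ties favour the centre (listed first).
        return min(((centre[0] + ox, centre[1] + oy) for ox, oy in pattern), key=cost)

    centre = (0, 0)
    best = best_around(centre, SDSP)
    if best == centre:                   # small or zero motion: done after 5 points
        return centre, cost(centre), len(cache)
    centre = best
    while True:                          # grow into large diamonds
        best = best_around(centre, LDSP)
        if best == centre:
            break
        centre = best
    best = best_around(centre, SDSP)     # final small-diamond refinement
    return best, cost(best), len(cache)
```

Every move strictly lowers the cost, so the large-diamond loop always terminates; a stationary block costs only the five initial points.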
TL;DR: A hierarchy-based block matching method that facilitates the transmission of high bit-rate videos over standard communication methods is proposed; it operates in the frequency domain and examines the similarities within a chosen subset of frequency components, which significantly reduces the total number of comparisons and mathematical computations required per block.
Abstract: Although advances in hardware solutions are growing exponentially along with communication channel capacity, high-quality video encoders for real-time applications are still considered an open area of research. Most researchers interested in video encoders focus their investigations on motion estimation and block matching algorithms. Many algorithms that aim to reduce the total number of required mathematical operations compared with Full Search have been proposed. However, their results often converge to local minima, and a significant amount of computation is still required. Therefore, in this research, a hierarchy-based block matching method that facilitates the transmission of high bit-rate videos over standard communication methods is proposed. The proposed algorithm operates in the frequency domain, where it examines the similarities within a chosen subset of frequency components, which significantly reduces the total number of comparisons and mathematical computations required per block.
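The abstract does not say which frequency subset is compared or how the hierarchy is organised. As an assumption-laden illustration only, the sketch below compares the k×k lowest-frequency coefficients of an orthonormal 2-D DCT-II for every candidate, keeps the few best candidates, and re-ranks only those on the full coefficient set. The choice of the DCT, the `keep` parameter and all function names are hypothetical.

```python
import math


def dct2_low(block, k):
    """First k x k (lowest-frequency) coefficients of the orthonormal 2-D DCT-II
    of a square n x n block, flattened row by row."""
    n = len(block)
    scale = [math.sqrt(1 / n)] + [math.sqrt(2 / n)] * (n - 1)
    out = []
    for u in range(k):
        for v in range(k):
            s = sum(block[y][x]
                    * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                    * math.cos((2 * x + 1) * v * math.pi / (2 * n))
                    for y in range(n) for x in range(n))
            out.append(scale[u] * scale[v] * s)
    return out


def get_block(frame, x, y, n):
    return [row[x:x + n] for row in frame[y:y + n]]


def l1(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))


def freq_hier_search(cur, ref, bx, by, n, p, k_coarse=2, keep=5):
    """Level 1: rank all candidates on k_coarse^2 low-frequency coefficients.
    Level 2: re-rank the `keep` best candidates on all n^2 coefficients."""
    h, w = len(ref), len(ref[0])
    target = get_block(cur, bx, by, n)
    t_coarse = dct2_low(target, k_coarse)
    ranked = []
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            if 0 <= bx + dx <= w - n and 0 <= by + dy <= h - n:
                cand = get_block(ref, bx + dx, by + dy, n)
                ranked.append((l1(t_coarse, dct2_low(cand, k_coarse)), (dx, dy)))
    ranked.sort()
    t_full = dct2_low(target, n)
    return min((mv for _, mv in ranked[:keep]),
               key=lambda mv: l1(t_full, dct2_low(
                   get_block(ref, bx + mv[0], by + mv[1], n), n)))
```

With k_coarse = 2 and 8×8 blocks, the first level computes 4 of the 64 coefficients per candidate, which is where the saving in comparisons and arithmetic would come from in this simplified setting.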
TL;DR: An extensive evaluation of a QoE-aware video rate evolution model based on buffer state changes shows improved stability, average video rate and system utilisation for the modified players, together with reduced start-up delay and convergence time.
Abstract: HTTP adaptive video streaming matches video quality to the capacity of a changing context. A variety of schemes that rely on buffer state dynamics for video rate selection have been proposed. However, these schemes are predominantly based on heuristics, and appropriate models describing the relationship between video rate and buffer levels have not received sufficient attention. In this paper, we present a QoE-aware video rate evolution model based on buffer state changes. The scheme is evaluated within a real-world Internet environment. The results of an extensive evaluation show an improvement in the stability, average video rate and system utilisation, while at the same time a reduction in the start-up delay and convergence time is achieved by the modified players.
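The abstract does not give the equations of the proposed model. For orientation only, the sketch below shows the kind of heuristic buffer-to-rate mapping that such models aim to improve on: a reservoir/cushion rate map in the style of buffer-based adaptation (BBA) schemes. It is a swapped-in baseline, not the paper's QoE-aware model, and the reservoir and cushion values are hypothetical.

```python
def select_rate(buffer_s, rates, reservoir_s=5.0, cushion_s=20.0):
    """Heuristic buffer-based rate map (BBA-style baseline, hypothetical parameters).
    buffer_s: current buffer level in seconds; rates: available video rates."""
    rates = sorted(rates)
    if buffer_s <= reservoir_s:               # buffer nearly empty: safest rate
        return rates[0]
    if buffer_s >= reservoir_s + cushion_s:   # buffer comfortably full: top rate
        return rates[-1]
    # In between: interpolate linearly and round down to an available rate.
    frac = (buffer_s - reservoir_s) / cushion_s
    target = rates[0] + frac * (rates[-1] - rates[0])
    return max(r for r in rates if r <= target)
```

A map like this reacts only to the instantaneous buffer level through hand-tuned thresholds, which is the kind of heuristic the abstract contrasts with an explicit model of how video rate and buffer state evolve together.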
TL;DR: This paper proposes a configurable FPGA hardware implementation of the two-stage algorithm for real-time, low-power processing; it mitigates flicker artifacts at a frame rate of > 30 fps for a resolution of $1280\times 1088$ on a Xilinx Virtex-6 LX240T FPGA.
Abstract: Flicker mitigation is a class of computer vision algorithms that reduces the amplitude modulation effect of pulsed LED light sources captured with an unsynchronized, discrete-exposure image sensor. These disturbing effects prevent the legal admission of digital side-mirror systems in cars and of traffic sign recognition for digital speed-limit signs. Flicker can be detected and mitigated using a bidirectional dense optical flow block matching algorithm between successive frames of a video sequence, followed by a threshold-based classification. In this paper, we propose a configurable hardware implementation of this two-stage algorithm for FPGAs, for real-time, low-power processing. The proposed architecture mitigates flicker artifacts at a frame rate of > 30 fps for a resolution of $1280\times 1088$ on a Xilinx Virtex-6 LX240T FPGA.
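The second stage, threshold-based classification, can be illustrated at block level. The sketch assumes that the bidirectional block matching has already produced, for one block of the current frame, its matched blocks in the previous and next frames. The mean-intensity test, the averaging used for mitigation, the function names and the threshold value are all assumptions for illustration, not the published classifier.

```python
def mean(block):
    values = [v for row in block for v in row]
    return sum(values) / len(values)


def mitigate_block(prev_blk, cur_blk, next_blk, threshold=20.0):
    """Hypothetical flicker test on motion-compensated blocks.
    Flicker: the previous and next matches agree, but the current block deviates
    from both. Returns (is_flicker, output_block)."""
    m_prev, m_cur, m_next = mean(prev_blk), mean(cur_blk), mean(next_blk)
    neighbours_agree = abs(m_prev - m_next) <= threshold
    cur_deviates = abs(m_cur - m_prev) > threshold and abs(m_cur - m_next) > threshold
    if neighbours_agree and cur_deviates:
        # Mitigate by replacing the block with the average of its two matches.
        fixed = [[(a + b) / 2 for a, b in zip(rp, rn)]
                 for rp, rn in zip(prev_blk, next_blk)]
        return True, fixed
    return False, cur_blk
```

Requiring the previous and next matches to agree keeps genuine scene changes, where the brightness also changes in the following frame, from being classified as flicker.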
TL;DR: This paper presents a review of motion estimation based on block matching algorithms and also includes an analytical study of fixed and variable block matching algorithms.
Abstract: With recent advances in video technology, there is an increasing need for a more reliable, efficient and robust generic framework for video processing and analysis. In this regard, motion estimation has for many years been a demanding area of research because of its diverse uses in real-time applications. Motion estimation using block matching algorithms is used in many video processing applications. This paper presents a review of motion estimation based on block matching algorithms and also includes an analytical study of fixed and variable block matching algorithms.
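The difference between fixed and variable block matching can be shown with a quadtree split, one common way to realise variable block sizes (assumed here; the review may cover other schemes). A block keeps a single motion vector if its best full-search SAD per pixel is at most a threshold; otherwise it is split into four quadrants that are matched independently. `thresh`, `min_n` and the function names are illustrative.

```python
def sad(cur, ref, x, y, dx, dy, n):
    return sum(abs(cur[y + i][x + j] - ref[y + dy + i][x + dx + j])
               for i in range(n) for j in range(n))


def best_mv(cur, ref, x, y, n, p):
    """Fixed-size full search for one n x n block; returns (cost, (dx, dy))."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            if 0 <= x + dx <= w - n and 0 <= y + dy <= h - n:
                c = sad(cur, ref, x, y, dx, dy, n)
                if best is None or c < best[0]:
                    best = (c, (dx, dy))
    return best


def variable_block_me(cur, ref, x, y, n, p, min_n=4, thresh=2.0):
    """Quadtree variable block-size ME: returns a list of (x, y, size, mv)."""
    cost, mv = best_mv(cur, ref, x, y, n, p)
    if cost / (n * n) <= thresh or n <= min_n:
        return [(x, y, n, mv)]
    half = n // 2
    leaves = []
    for oy in (0, half):
        for ox in (0, half):
            leaves += variable_block_me(cur, ref, x + ox, y + oy, half, p, min_n, thresh)
    return leaves
```

Fixed-size matching corresponds to calling `best_mv` on a regular grid; the variable scheme spends extra vectors only where a single vector fits the block poorly, for example at the boundary of a moving object.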
TL;DR: Li et al. proposed an image region copy-paste tampering detection method based on Local Intensity Order Pattern (LIOP) features and block matching; the LIOP features cope better than other features with rotation, scaling, JPEG (Joint Photographic Experts Group) compression, noise addition and similar operations.
Abstract: The invention is mainly aimed at the field of digital image forensics (evidence collection), and in particular relates to an image region copy-paste tampering detection method based on LIOP (Local Intensity Order Pattern) features and block matching. A feature-point-based method and a block-partitioning-based method are combined, blending the advantages of the two categories of methods. First, LIOP features are selected for image feature extraction because they cope better than other features with rotation, scaling, JPEG (Joint Photographic Experts Group) compression, noise addition and the like. After the features are matched, a new matching-pair representation model is used to express and screen the matching pairs; redundant matching pairs are removed, which improves accuracy and lowers computational complexity. Based on the matching pairs, the image is segmented and features are extracted block by block; a block matching algorithm is then used to match the tampered regions, and finally accurate localization is carried out. The algorithm achieves high detection accuracy and performs well on various types of copy-paste tampering, including rotation, scaling, noise addition and compression.
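LIOP descriptors are too involved for a short example, so the sketch below shows only the block-matching idea the method builds on, using the classic exact-match baseline for copy-paste detection: every overlapping b×b block is keyed by its pixel values (hashing here, instead of the usual lexicographic sort), and a shift vector shared by many identical block pairs indicates a copied region. This is a swapped-in textbook technique, not the LIOP-based method; unlike the patented approach, it does not handle rotation, scaling, noise or compression. `b`, `min_shift` and `min_count` are illustrative.

```python
from collections import defaultdict


def detect_copy_paste(img, b=4, min_shift=4, min_count=5):
    """Exact-match block-based copy-paste detection (classic baseline, not LIOP).
    Returns {shift (dx, dy): number of identical block pairs with that shift}."""
    h, w = len(img), len(img[0])
    groups = defaultdict(list)  # block contents -> top-left corners (raster order)
    for y in range(h - b + 1):
        for x in range(w - b + 1):
            key = tuple(img[y + i][x + j] for i in range(b) for j in range(b))
            groups[key].append((x, y))
    shifts = defaultdict(int)
    for corners in groups.values():
        for i, (x1, y1) in enumerate(corners):
            for x2, y2 in corners[i + 1:]:
                dx, dy = x2 - x1, y2 - y1
                if abs(dx) + abs(dy) >= min_shift:  # ignore near-overlapping blocks
                    shifts[(dx, dy)] += 1
    return {s: c for s, c in shifts.items() if c >= min_count}
```

A 6×6 region pasted elsewhere yields 3 × 3 = 9 identical 4×4 block pairs sharing one shift vector, which the `min_count` filter keeps while isolated coincidences are discarded.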
TL;DR: A new method is proposed that generates higher-quality side information, and therefore better compression; its motion compensation is optimized for perceptual quality metrics and yields better side information than the conventional MSE- or SAD-based motion compensation currently used in the literature.
Abstract: In the prevailing video coding paradigm, the encoder has the complex task of exploiting both the spatial and the temporal redundancies present in the video sequence. As a result, almost all video encoders are five to ten times more complex than their decoders. In a video compression process, one of the main tasks at the encoder side is motion estimation, which extracts the temporal correlation between frames. Distributed video coding (DVC) offers an approach that can lead to low-complexity encoders and higher-complexity decoders. DVC is a new paradigm in video compression based on the information-theoretic ideas of the Slepian-Wolf and Wyner-Ziv theorems. Wyner-Ziv coding is naturally robust against transmission errors and can be used for joint source and channel coding. Side information is one of the key components of the Wyner-Ziv decoder: better side information generation results in better performance of the Wyner-Ziv coder. In this paper, we propose a new method that generates side information of better quality and thus yields better compression. We use HVS (human visual system) based image quality metrics as our quality criterion. The motion estimation used in the decoder is modified according to these metrics so that finer side information can be obtained. The motion compensation is optimized for perceptual quality metrics and leads to better side information generation than the conventional MSE (mean squared error) or SAD (sum of absolute differences) based motion compensation currently used in the literature. Better motion compensation means better compression.
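The idea of swapping the matching criterion inside decoder-side motion estimation can be sketched with symmetric bidirectional motion-compensated interpolation between two key frames, with the block cost passed in as a function. The abstract does not name the HVS-based metric; a single-window SSIM is used here purely as a stand-in, alongside SAD for comparison. Function names, block size and search range are hypothetical, and frame dimensions are assumed to be multiples of the block size.

```python
def get_block(frame, x, y, n):
    return [row[x:x + n] for row in frame[y:y + n]]


def sad_cost(a, b):
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def ssim_cost(a, b, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """1 - SSIM, with the whole block treated as a single window (simplified)."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    vx = sum((p - mx) * (p - mx) for p in xs) / m
    vy = sum((q - my) * (q - my) for q in ys) / m
    cov = sum((p - mx) * (q - my) for p, q in zip(xs, ys)) / m
    ssim = ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx * mx + my * my + c1) * (vx + vy + c2))
    return 1 - ssim


def side_information(prev, nxt, n=4, p=2, cost=sad_cost):
    """Estimate the frame halfway between two key frames. For each n x n block,
    search symmetric vectors: prev displaced by -(dx, dy), nxt by +(dx, dy);
    keep the pair with the lowest cost and average it."""
    h, w = len(prev), len(prev[0])

    def inside(x, y):
        return 0 <= x <= w - n and 0 <= y <= h - n

    si = [[0.0] * w for _ in range(h)]
    for y in range(0, h - n + 1, n):
        for x in range(0, w - n + 1, n):
            best = None
            for dy in range(-p, p + 1):
                for dx in range(-p, p + 1):
                    if not (inside(x - dx, y - dy) and inside(x + dx, y + dy)):
                        continue
                    bp = get_block(prev, x - dx, y - dy, n)
                    bn = get_block(nxt, x + dx, y + dy, n)
                    c = cost(bp, bn)
                    if best is None or c < best[0]:
                        best = (c, bp, bn)
            _, bp, bn = best
            for i in range(n):
                for j in range(n):
                    si[y + i][x + j] = (bp[i][j] + bn[i][j]) / 2
    return si
```

Only the `cost` argument changes between the SAD-based and the perceptual variant, which mirrors the abstract's point that the motion search itself stays in place while its quality criterion is replaced.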