TL;DR: A novel observation model based on motion compensated subsampling is proposed for a video sequence and Bayesian restoration with a discontinuity-preserving prior image model is used to extract a high-resolution video still given a short low-resolution sequence.
Abstract: The human visual system appears to be capable of temporally integrating information in a video sequence in such a way that the perceived spatial resolution of a sequence appears much higher than the spatial resolution of an individual frame. While the mechanisms in the human visual system that do this are unknown, the effect is not too surprising given that temporally adjacent frames in a video sequence contain slightly different, but unique, information. This paper addresses the use of both the spatial and temporal information present in a short image sequence to create a single high-resolution video frame. A novel observation model based on motion compensated subsampling is proposed for a video sequence. Since the reconstruction problem is ill-posed, Bayesian restoration with a discontinuity-preserving prior image model is used to extract a high-resolution video still given a short low-resolution sequence. Estimates computed from a low-resolution image sequence containing a subpixel camera pan show dramatic visual and quantitative improvements over bilinear, cubic B-spline, and Bayesian single frame interpolations. Visual and quantitative improvements are also shown for an image sequence containing objects moving with independent trajectories. Finally, the video frame extraction algorithm is used for the motion-compensated scan conversion of interlaced video data, with a visual comparison to the resolution enhancement obtained from progressively scanned frames.
TL;DR: The algorithm has been designed mainly for 50 Hz to 75 Hz frame rate up-conversion with applications in a multimedia environment, but it can also be used in advanced television receivers to remove artifacts due to low scan rate.
Abstract: A frame interpolation algorithm for frame rate up-conversion of progressive image sequences is proposed. The algorithm is based on simple motion compensation and linear interpolation. A motion vector is searched for each pixel in the interpolated image and the resulting motion field is median filtered to remove inconsistent vectors. Averaging along the motion trajectory is used to produce the interpolated pixel values. The main novelty of the proposed method is the motion compensation algorithm which has been designed with low computational complexity as an important criterion. Subsampled blocks are used in block matching and the vector search range is constrained to the most likely motion vectors. Simulation results show that good visual quality has been obtained with moderate complexity. The algorithm has been designed mainly for 50 Hz to 75 Hz frame rate up-conversion with applications in a multimedia environment, but it can also be used in advanced television receivers to remove artifacts due to low scan rate.
TL;DR: A book on video compression that covers compression techniques, motion compensation and estimation, experiments on current motion estimation techniques, the motion compensation hypothesis, and new results on fast search algorithms.
Abstract: Preface. 1. The Problem of Video Compression. 2. Video Compression Techniques. 3. Motion Compensation and Estimation. 4. Experiments on Current Motion Estimation Techniques. 5. The Motion Compensation Hypothesis. 6. Fast Search Algorithms: New Results. 7. Conclusions and Research Directions. Bibliography. Index.
TL;DR: In this article, a method of splicing two compressed video signals which have been encoded according to the standard adopted by the Moving Picture Experts Group (MPEG) determines an amount of null information that is to be inserted between the two video signals in order to ensure that an input buffer of an MPEG decoder does not overflow after receiving the spliced video signals.
Abstract: A method of splicing two compressed video signals which have been encoded according to the standard adopted by the Moving Picture Experts Group (MPEG) determines an amount of null information that is to be inserted between the two video signals in order to ensure that an input buffer of an MPEG decoder does not overflow after receiving the spliced video signals. The method allows a splice to occur after any access unit (picture) in the first compressed video signal. The amount of null information is determined from the data rates of the first and second compressed video signals and the amount of new data which is provided to the buffer before the data is retrieved from the buffer for both the first and second video signals. The video signals are spliced by inserting the null information, as sequence stuffing bits into a buffer immediately after the selected picture in the first video signal. The second video signal is transmitted to the buffer immediately after these stuffing bits.
TL;DR: In this article, a motion-compensated interframe prediction is achieved by determining motion vectors of respective pixels according to representative motion vectors with weighting and determining accurate motion vectors between video frames, dividing each frame into areas optimal to an objective figure, size and location and performing effective encoding and decoding of the motion vectors.
Abstract: In a video-coding and video-decoding device, motion-compensated interframe prediction is achieved by determining motion vectors of respective pixels according to representative motion vectors with weighting, determining accurate motion vectors between video frames, dividing each frame into areas optimal to an objective figure, size and location, and performing effective encoding and decoding of the motion vectors. According to the invention, a motion-compensated interframe predicting portion generates a predicted video-frame by varying codable area according to a reference video-frame received from a frame memory portion and an input video frame and obtains side-information. A motion vector searching portion searches a motion vector. An effective-area selecting portion selects a valid or invalid mask depending upon a position of a processable object and divides a processable area of the input video frame into suitable areas. A variable-area predicted-frame generating portion generates a predicted frame by affine transformation and translational displacement. An area-dividing pattern deciding portion outputs side-information such as the predicted image, motion vectors and divided areas. A side-information coding portion encodes an additional motion vector as a difference from an average basic-motion vector or predictively encodes a motion vector from a median value of three neighboring motion-vectors. A difference between the input video-frame and the predicted video-frame from the predicting portion is encoded, transferred and stored.
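The median-based predictive coding of motion vectors mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say which three neighbours are used or how the residual is entropy coded, so the choice of left, above and above-right neighbours, the component-wise median, and the function names `predict_mv`, `encode_mv` and `decode_mv` are all assumptions made for illustration.

```python
from statistics import median


def predict_mv(left, above, above_right):
    """Component-wise median of three neighbouring motion vectors (assumed neighbours)."""
    xs = (left[0], above[0], above_right[0])
    ys = (left[1], above[1], above_right[1])
    return (median(xs), median(ys))


def encode_mv(mv, left, above, above_right):
    """Return the residual that would be sent instead of the raw vector."""
    px, py = predict_mv(left, above, above_right)
    return (mv[0] - px, mv[1] - py)


def decode_mv(residual, left, above, above_right):
    """Rebuild the vector from the residual and the same three neighbours."""
    px, py = predict_mv(left, above, above_right)
    return (residual[0] + px, residual[1] + py)
```

Because the decoder forms the same prediction from vectors it has already decoded, only the residual needs to be transmitted, and it tends to be small where the motion field is smooth.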
TL;DR: In this article, a speech detection algorithm locates specific words in the audio portion data of the video tape and passes the locations where the specific words are found to the video analysis algorithm.
Abstract: A method and apparatus to automatically index the locations of specified events on a video tape. The events, for example, include touchdowns, fumbles and other football-related events. An index to the locations where these events occur is created by using both speech detection and video analysis algorithms. A speech detection algorithm locates specific words in the audio portion data of the video tape. Locations where the specific words are found are passed to the video analysis algorithm. A range around each of the locations is established. Each range is segmented into shots using a histogram technique. The video analysis algorithm analyzes each segmented range for certain video features using line extraction techniques to identify the event. The final product of the video analysis is a set of pointers (or indexes) to the locations of the events in the video tape.
TL;DR: In this article, a digital motion video processing circuit can capture, play back and manipulate digital motion video information using the system memory of a computer as a data buffer for holding compressed video data from the circuit.
Abstract: A digital motion video processing circuit can capture, play back and manipulate digital motion video information using the system memory of a computer as a data buffer for holding compressed video data from the circuit. The system memory may be accessed by the circuit over a standard bus. A controller in the circuit directs data flow between an input/output port, which transfers a stream of pixel data, and the standard bus. The controller directs data to and from either the standard bus or the input/output port through processing circuitry for compression, decompression, scaling and buffering. The standard bus may be a peripheral component interconnect (PCI) bus. The motion video processing circuit has a data path including pixel data and timing data indicative of a size of an image defined by the pixel data. The timing data is used and/or generated by each component which processes the pixel data, thereby enabling each component to process the pixel data without prior knowledge of the image format. By having processors for handling two compression formats for motion video data connected to this data path, conversion between compression formats, such as between MPEG and Motion JPEG, can be performed.
TL;DR: In this article, an error signal suppressor containing two filters is applied to the video compression error signal to reduce or eliminate random and high frequency impulse noise, which reduces the overall video compression bitrate by as much as 10% to 20% and provides corresponding increases in video compression and transmission efficiency.
Abstract: A video compression error signal in a video compression scheme is affected by random and high frequency impulse noise. An error signal suppressor containing two filters is applied to the video compression error signal. The first filter reduces or eliminates random noise. The second filter eliminates high frequency impulse noise. Random and high frequency noise is reduced or eliminated from frequencies that are unimportant to human visual perception. The error signal suppressor reduces the overall video compression bitrate by as much as 10% to 20%, which provides corresponding increases in video compression and transmission efficiency. The error signal suppressor is used in video compression encoding schemes such as MPEG to reduce random and high frequency noise.
TL;DR: Results of the simulations described in this paper show that the presented scheme enables up to 45% bitrate reductions compared to the ITU-T Rec. H.263 video coder while achieving the same objective quality of coded video.
Abstract: This paper presents novel methods for motion compensated prediction and prediction error coding of video sequences. The scheme utilises segmentation of the video frames into nonregularly shaped segments composed of small square blocks which can be encoded with a very low number of bits. This two-step segmentation is obtained by quadtree-like splitting of image blocks followed by a motion assisted merging algorithm which yields segments characterised by uniformity of motion. Motion fields are compactly encoded using 2-D separable orthonormal polynomials. The number and order of these polynomials is established for each segment separately in an adaptive manner. To improve the efficiency of coding of the residual error, after motion compensated prediction the proposed scheme utilises the spatial properties of the prediction frame available in the encoder and the decoder. Results of the simulations described in this paper show that the presented scheme enables up to 45% bitrate reductions compared to the ITU-T Rec. H.263 video coder while achieving the same objective quality of coded video.
TL;DR: In this article, a reference picture is hierarchically searched for the best match to the current macroblock, and a motion vector between the current macroblock and the best match macroblock is then constructed.
Abstract: Temporal compression of a digital video data stream by hierarchically searching in at least one search unit for pixels in a reference picture to find a best match for the current macroblock. This is followed by constructing a motion vector between the current macroblock and the best match macroblock in the reference picture.
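As a rough illustration of a hierarchical best-match search followed by motion vector construction, the sketch below runs a coarse search on 2:1 decimated pictures and then refines the result by one pixel at full resolution. The abstract does not specify the matching criterion, the number of levels or the search ranges, so the sum of absolute differences (SAD), the two levels, the unfiltered decimation and every function name here are assumptions. Pictures are lists of rows with even dimensions, and the block position is assumed to be even.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between a block in cur and a block in ref."""
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
    return total


def search(cur, ref, cx, cy, size, center, radius):
    """Return the (dx, dy) within `radius` of `center` that minimises SAD."""
    h, w = len(ref), len(ref[0])
    best, best_cost = None, None
    for dy in range(center[1] - radius, center[1] + radius + 1):
        for dx in range(center[0] - radius, center[0] + radius + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + size > w or ry + size > h:
                continue  # candidate block falls outside the reference picture
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best


def decimate(img):
    """Keep every second pixel in both directions (no low-pass filter, for brevity)."""
    return [row[::2] for row in img[::2]]


def hierarchical_mv(cur, ref, cx, cy, size=8, radius=4):
    """Coarse search at half resolution, then a +/-1 pixel refinement at full resolution."""
    coarse = search(decimate(cur), decimate(ref),
                    cx // 2, cy // 2, size // 2, (0, 0), radius // 2)
    return search(cur, ref, cx, cy, size, (coarse[0] * 2, coarse[1] * 2), 1)
```

The returned pair is the displacement from the current macroblock to its best match in the reference picture. A practical coder would normally low-pass filter before decimating so that the coarse level is less sensitive to aliasing.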
TL;DR: In this article, a method for determining quantization level versus bit-rate characteristics of raw video signals in video frames during a pre-encoding phase for video technologies such as MPEG and MPEG-2 is presented.
Abstract: A system and method for determining quantization level versus bit-rate characteristics of raw video signals in video frames during a pre-encoding phase for video technologies such as MPEG and MPEG-2. During a pre-encoding phase, various quantization levels are assigned to various parts of a frame, and the frame is then pre-encoded to determine a bit-rate for each quantization level used in the pre-encoding phase. Depending on the embodiment, quantization levels are assigned in one of many ways: checkerboard style, block style or any other distribution that avoids statistical anomalies. The method and system repeat the pre-encoding for plural frames, recording all quantization level versus bit-rate statistics on a frame by frame basis. These statistics are then used during encoding or re-encoding of a digital video to control the number of bits allocated to one segment of the digital video as compared to another segment, based on a target quality and target storage size for each segment. The resulting encoded digital video is stored on a digital storage medium, such as a compact disc.
TL;DR: In this article, a system for playing video data ahead of corresponding audio data in order to help maintain synchronization between the audio data and the video data is presented, where an adaptive offset time is applied to the initial start time of a decoded frame of video data.
Abstract: A system for playing video data ahead of corresponding audio data in order to help maintain synchronization between the audio data and the video data. Two software objects or filters are used to process the video data. An initial start time of the video data is determined and, if possible, the frame of video data is decoded or else it is selectively dropped in order to help maintain synchronization. An adaptive offset time is applied to the initial start time of a decoded frame of video data in order to produce an adjusted start time for the decoded frame. The offset time can be adapted to include a refresh offset related to sweep delays in computer monitors, a target offset which helps to build in a play-ahead margin for future late frames, and an earliness offset which is diminishing over time to help smooth transitions due to changing apparent video processing power. Additionally, the playing of video data can be slowed down in response to a low condition of the audio buffer. This avoids sound breaks and also helps to maintain synchronization.
TL;DR: In this paper, a video cut (containing a frame) designated by a user is automatically and correctly extracted directly from a video image under playing operation at high speed, where a judgement is made as to whether or not the frame of the video image is designated by the user.
Abstract: Only a video cut (containing a frame) designated by a user is automatically and correctly extracted directly from a video image under playing operation at high speed. A judgement is made as to whether or not the frame of the video image is designated by the user. When such a user designation is made, a change point is detected in the video cut containing this designated frame (reverse playing direction, forward playing direction). While the frame image immediately before this change point is displayed, the playing operation of the video image is brought into a pause state. Accordingly, only the video cut desired by the user can be extracted during a single playing operation of the video image, and the video image can be edited with high efficiency.
TL;DR: In this paper, the macroblocks of received video frames are identified, selected, processed and stored to facilitate later combination into a single fully intra-coded composite video frame suitable for use during VTR trick play operation.
Abstract: Method and apparatus for generating a fully intra-coded video frame from a received progressive refresh bitstream representing a series of inter-coded video frames. Intra-coded macroblocks of received video frames are identified, selected, processed and stored to facilitate later combination into a single fully intra-coded composite video frame suitable for use during VTR trick play operation. As part of the intra-coded macroblock selection process, in various embodiments, large sets of adjacent macroblocks are given priority over previously selected macroblocks that correspond to non-adjacent positions within a video frame or which correspond to a smaller set of adjacent video frames. As part of the macroblock processing performed prior to storage of selected intra-coded macroblocks, the amount of data used to represent each intra-coded macroblock is reduced and the macroblocks are processed so as to be represented in a consistent manner which facilitates the subsequent combination of intra-coded macroblocks from different frames into a single low resolution frame suitable for use during trick play operation.
TL;DR: Simulation results on different sequences show that the proposed algorithm is effective for detecting abrupt scene changes in MPEG video.
Abstract: A new scene change detection algorithm is developed for MPEG encoded video sequences. The proposed method is simple in that it requires only the bit-rate information at the macroblock level and the number of various motion-predicted blocks for detection. This information can be extracted easily from the encoded bit-stream without decompression. Simulation results on different sequences show that the proposed algorithm is effective for detecting abrupt scene changes in MPEG video.
TL;DR: In this paper, a full-motion video memory on a video controller card is segmented into four groups, each storing one scaled-down frame (or field) of video data, so that the memory acts as a four-frame first-in first-out (FIFO) buffer when video is captured on a hard disk.
Abstract: In a preferred embodiment, when full-motion video data is to be captured on a hard disk, a full-motion video memory on a video controller card has its addresses segmented into four groups, where each group can store one scaled-down frame (or field) of video data. The video memory is arranged to effectively act as a four-frame, first-in first-out (FIFO) buffer. The holding time of a single frame of data (i.e., four times the conventional holding time) in the video memory is sufficient to allow for the unpredictable variations in the hard drive timing so that frames are not arbitrarily dropped by worst case timing/accessing times of the hard drive. Hence, the average bandwidth and timing of the hard drive, rather than the instantaneous worst case bandwidth and timing of the hard drive, is used when designing the system. Additionally, video data may be read from and written into the same frame area in the video memory as long as the read (capture) and write (video-in) pointers have been determined to not overlap while accessing the same frame area. This more efficiently utilizes the capabilities of the hard drive.
TL;DR: In this article, a system and a method for motion-compensated de-interlacing of interlaced video frames that generates high quality progressive video frames, keeps computational complexity low and requires the use of only a single image frame is provided.
Abstract: A system and a method for motion-compensated de-interlacing of interlaced video frames that generates high quality progressive video frames, keeps computational complexity low and requires the use of only a single image frame is provided. The system and method of this invention determine if global motion (camera motion) is present in the scene. If global motion is detected, it is estimated and compensated. The globally-compensated image is then analyzed to determine whether local motion is present. If local motion is detected, the image pixels affected by the local motion are interpolated using motion-adaptive techniques. If no local motion and no global motion is detected, the image pixels are interleaved.
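The last two branches of the decision flow above (interpolate where local motion is detected, interleave otherwise) can be sketched as follows. Global motion estimation and compensation are omitted, the per-pixel `moving` mask is taken as given, and plain vertical line averaging stands in for the unspecified "motion-adaptive techniques", so all of these are assumptions rather than the method of the abstract. Fields are lists of rows of equal size.

```python
def weave(top_field, bottom_field):
    """Interleave the lines of two fields into one progressive frame."""
    frame = []
    for top_line, bottom_line in zip(top_field, bottom_field):
        frame.append(list(top_line))
        frame.append(list(bottom_line))
    return frame


def line_average(top_field):
    """Fill each missing line with the average of its vertical neighbours."""
    frame = []
    for k, line in enumerate(top_field):
        frame.append(list(line))
        # The last missing line has no neighbour below, so reuse the line above.
        below = top_field[k + 1] if k + 1 < len(top_field) else line
        frame.append([(a + b) // 2 for a, b in zip(line, below)])
    return frame


def deinterlace(top_field, bottom_field, moving):
    """Use interpolated pixels where `moving` is true and woven pixels elsewhere."""
    woven = weave(top_field, bottom_field)
    interpolated = line_average(top_field)
    return [
        [interpolated[y][x] if moving[y][x] else value
         for x, value in enumerate(row)]
        for y, row in enumerate(woven)
    ]
```

With an all-false mask the result is a pure weave of the two fields; with an all-true mask the bottom field is ignored and every missing line is interpolated from the top field.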
TL;DR: In this paper, the authors explore the benefits of encoding the motion vectors with other accuracies, and of encoding different motion vectors with different accuracies within the same frame, and derive expressions for the encoding rates for both motion vectors and difference frames, in terms of the accuracies.
Abstract: In block-based motion-compensated video coding, a fixed-resolution motion field with one motion vector per image block is used to improve the prediction of the frame to be coded. All motion vectors are encoded with the same fixed accuracy, typically 1 or 1/2 pixel accuracy. In this work, we explore the benefits of encoding the motion vectors with other accuracies, and of encoding different motion vectors with different accuracies within the same frame. To do this, we analytically model the effect of motion vector accuracy and derive expressions for the encoding rates for both motion vectors and difference frames, in terms of the accuracies. Minimizing these expressions leads to simple formulas that indicate how accurately to encode the motion vectors in a classical block-based motion-compensated video coder. These formulas also show that the motion vectors must be encoded more accurately where more texture is present, and less accurately when there is much interframe noise. We implement video coders based on our analysis and present experimental results on real video frames. These results suggest that our equations are accurate, and that significant bit rate savings can be achieved when our optimal motion vector accuracies are used.
TL;DR: A new hierarchical chromatic block matching algorithm using an image pyramid yields high-quality dense depth maps from color stereo images, and active color illumination considerably improves the quality of the matching results.
Abstract: Stereo is a well-known technique for obtaining depth information from digital images. Nevertheless, this technique still suffers from a lack of accuracy and/or the long computation time needed to match stereo images. A new hierarchical algorithm using an image pyramid for obtaining dense depth maps from color stereo images is presented. We show that matching results of high quality are obtained when using the new hierarchical chromatic block matching algorithm. Most stereo matching algorithms cannot compute correct dense depth maps in homogeneous image regions. This paper shows that using an active color illumination will considerably improve the quality of the matching results. We present results for synthetic and for real images.
TL;DR: In this article, the quantization parameters for a block transform based video compression algorithm can be controlled by a quantizer selector so as to control compressed video frame size, where the selection of the appropriate quantization parameter for the nth macroblock of a current frame is based on the cumulative number of compressed bits appearing in the first n-1 macroblocks of the current frame and a previous frame.
Abstract: In a video image compression and transmission system, quantization parameters for a block transform based video compression algorithm can be controlled by a quantizer selector so as to control compressed video frame size. The selection of the appropriate quantization parameter for the nth macroblock of a current frame is based on the cumulative number of compressed bits appearing in the first n-1 macroblocks of a current frame and a previous frame. By controlling the quantization parameter in such a manner, the overall system reacts more quickly to changes in complexity in the video sequence and allocates bits more accurately to different parts of the video frame according to a past history of bit allocation. To efficiently utilize the bandwidth of a transmission medium (such as POTS), a bit count of the contents of the transmit buffer is sent to a buffer regulator in a video controller where it is compared to a low water mark threshold. If the bit count falls below the threshold, an uncompressed video frame is scheduled for compression by a video compressor. By using the low water mark threshold, latency in the overall system is reduced and an efficient use of transmission medium bandwidth is achieved.
TL;DR: In this article, a system for detecting a point of change between video shots from a video having a plurality of succeeding frames is presented, which includes video playback apparatus for playing a video chronologically one frame at a time and a display for displaying the video.
Abstract: A system for detecting a point of change between video shots from a video having a plurality of succeeding frames. The system includes video playback apparatus for playing a video chronologically one frame at a time, and a display for displaying the video. The system also includes a processing device for calculating a feature quantity of video image data for each frame, determining a first correlation coefficient between a feature quantity of a current frame and a feature quantity calculated from an immediately preceding frame and determining a second correlation coefficient between the feature quantity of the current frame and a feature quantity of at least two frames preceding the current frame, and indicating on the display a point of change between video shots when the first correlation coefficient and the second correlation coefficient are out of predetermined allowable ranges. The correlation coefficients of each frame are stored and can be used by the processing device to dynamically change a reference used for detecting a point of change between video shots. The change in the reference is performed based on the stored correlation coefficients or feature quantities of past frames.
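The two-coefficient test described above can be sketched in a few lines. The abstract does not define the feature quantity, the correlation measure or the allowable ranges, so this sketch assumes that each frame is summarised by a histogram-like list of numbers, that the Pearson correlation is used, that the second comparison is against the frame exactly two positions back, and that "out of range" means falling below a fixed `threshold`. The dynamic adjustment of the reference is not modelled.

```python
from math import sqrt


def correlation(a, b):
    """Pearson correlation of two equal-length feature vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # Flat feature vectors: treat identical vectors as fully correlated.
        return 1.0 if a == b else 0.0
    return cov / sqrt(var_a * var_b)


def detect_shot_changes(features, threshold=0.5):
    """Return frame indices where both correlation coefficients fall below threshold."""
    changes = []
    for t in range(2, len(features)):
        first = correlation(features[t], features[t - 1])   # vs. previous frame
        second = correlation(features[t], features[t - 2])  # vs. two frames back
        if first < threshold and second < threshold:
            changes.append(t)
    return changes
```

For a cut between frames 2 and 3, only frame 3 is reported: the frame after it correlates well with its immediate predecessor, so the first coefficient is back within range.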
TL;DR: In this paper, a four-buffer MPEG decoder is provided for decoding MPEG video frames, including I-frames, P-frames and B-frames; the decoding, displaying and discarding of I-frame and P-frame are handled by a four buffer frame controller and control method.
Abstract: A four-buffer MPEG decoder is provided for decoding MPEG video frames. A four-buffer frame controller and control method manage the four frame buffers including decoding, displaying and discarding of I-frames, P-frames and B-frames so that video data decoding is accelerated. The four-buffer frame controller and control method free one frame buffer when the frame buffer contains obsolete data, defined as data which is no longer useful for decoding additional frames and for which storage is not necessary for displaying pictures in a correct temporal order. One example of an obsolete frame is a B-frame that is displayed. Another example is a P-frame or I-frame which is no longer used for motion compensation and has been displayed.
TL;DR: In this paper, the authors proposed a motion compensated (MC) coding of video and a prediction scheme which allows fast and compact encoding of motion vector fields retaining at the same time very low prediction error.
Abstract: This invention relates to motion compensated (MC) coding of video and to a MC prediction scheme which allows fast and compact encoding of motion vector fields retaining at the same time very low prediction error. By reducing prediction error and number of bits needed for representation of motion vector field, substantial savings of bit rate are achieved. Reduction of bit rate needed to represent motion field is achieved by merging segments in video frames, by adaptation of motion field model and by utilization of motion field model based on orthogonal polynomials.
TL;DR: In this article, a method for displaying a video sequence depicting motion of at least one video object by sequential presentation of a plurality of frames is presented, where a low resolution representation of the video object is stored, used to generate and display frames, and updated, and a quality metric may be calculated to determine when updating is necessary.
Abstract: A method for displaying a video sequence depicting motion of at least one video object by sequential presentation of a plurality of frames includes storing a low resolution representation of the video object, generating and displaying a first frame of the sequence based on the low resolution representation, updating the low resolution representation, and generating and displaying a subsequent frame of the sequence based on the updated representation. The method may include calculating a quality metric to determine when updating is necessary.
TL;DR: A complex-valued discrete wavelet transform is used to decompose each frame into a subsampled directionally bandpass filtered hierarchy and is defined so that at each level there is an approximate correspondence between local translation and coefficient phase shift.
Abstract: This paper describes a new wavelet-based approach to the motion estimation problem for digital video. A complex-valued discrete wavelet transform is used to decompose each frame into a subsampled directionally bandpass filtered hierarchy. The transform is defined so that at each level there is an approximate correspondence between local translation and coefficient phase shift. This relationship is used to estimate motion within each orientation subband. The estimates are combined over all orientations and scales using a coarse-to-fine refinement strategy to produce a fractional-pel accurate motion field with a directional confidence measure. The technique is suitable for video compression schemes and can also be used for stereo vision and image registration.
TL;DR: In this article, a variable coefficient, non-separable spatio-temporal interpolation filter is used to deinterlace an interlaced video signal to produce a progressive video signal.
Abstract: A variable coefficient, non-separable spatio-temporal interpolation filter is used to deinterlace an interlaced video signal to produce a progressive video signal. The interlaced video signal is input to a video memory which in turn provides a reference and plurality of offset video signals representing the pixel being interpolated and spatially and temporally neighboring pixels. A coefficient index, transmitted with the interlaced video as an auxiliary signal, or derived from motion vectors transmitted with the interlaced video, or derived directly from the interlaced video signal, is applied to a coefficient memory to select a set of filter coefficients. The reference and offset video signals are weighted together with the filter coefficients in the spatio-temporal interpolation filter, such as a FIR filter, to produce an interpolated video signal. The interpolated video signal is interleaved with the reference video signal, suitably delayed to compensate for filter processing time, to produce the progressive video signal.
TL;DR: In this paper, a method and apparatus for performing multi-stage motion estimation on an input video sequence to be encoded is presented, where an original image in the video sequence, such as a CCIR601 image, is preprocessed to generate first, second and third reduced resolution images, which may be a QQSIF image, a QSIF image and a SIF image that are 1/64 size, 1/16 size and 1/4 size, respectively, relative to the original image.
Abstract: A method and apparatus for performing multi-stage motion estimation on an input video sequence to be encoded. An original image in the video sequence, such as a CCIR601 image, is preprocessed to generate first, second and third reduced resolution images which may be a QQSIF image, a QSIF image and a SIF image, respectively, which are 1/64 size, 1/16 size and 1/4 size, respectively, relative to the original CCIR601 image. A first stage motion vector search is performed on the 1/64 size QQSIF image using a (0,0) motion vector starting point and a first search range suitable for detecting global motion. A second stage motion vector search is performed on the 1/16 size QSIF image using the (0,0) starting point and a second search range smaller than the first search range and suitable for detecting local motion. A third stage motion vector search is performed on the 1/4 size SIF image using starting points based on scaled versions of the motion vectors identified in the first and second stage searches, and a search range smaller than the first and second search ranges. A fourth stage search is then performed on the original image using the motion vectors identified in the third search stage, and a motion compensation type for the original image is determined based on the results of the fourth stage search. An early field/frame decision may be made prior to performing the fourth stage search, and may be based on a comparison of motion vectors from the first and second stage searches.
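The "scaled versions of the motion vectors" passed between stages follow from the size ratios alone, which the short sketch below works through. It assumes that "1/64 size", "1/16 size" and "1/4 size" refer to pixel count, so that each stage is decimated by 8, 4 and 2 per dimension; the `DECIMATION` table and the function names are introduced here for illustration and do not come from the abstract.

```python
# Per-dimension decimation of each stage relative to the original image
# (assumed: 1/64, 1/16 and 1/4 of the pixel count mean 8x, 4x and 2x per axis).
DECIMATION = {"QQSIF": 8, "QSIF": 4, "SIF": 2, "CCIR601": 1}


def area_fraction(stage):
    """Fraction of the original pixel count retained at a stage."""
    return 1 / DECIMATION[stage] ** 2


def scale_mv(mv, src, dst):
    """Rescale a motion vector found at stage `src` to the grid of a finer stage `dst`."""
    if DECIMATION[src] < DECIMATION[dst]:
        raise ValueError("dst must be at the same or a finer resolution than src")
    factor = DECIMATION[src] // DECIMATION[dst]
    return (mv[0] * factor, mv[1] * factor)
```

Under these assumptions a one-pixel displacement found on the QQSIF image corresponds to four pixels on the SIF image and eight pixels on the original, which is why the coarsest stage can cover global motion with a comparatively small search window.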
TL;DR: In this paper, a method (100, 200) and device (300, 400, 600) for containing and concealing errors occurring in a transmitted video bitstream is disclosed.
Abstract: A method (100, 200) and device (300, 400, 600) for containing and concealing errors occurring in a transmitted video bitstream is disclosed. Using a plurality of predetermined scanning patterns (500), particular macroblocks are chosen per frame to be transmitted as intra-macroblocks. Degradation of visual quality that is due to an extended error burst is efficiently limited (110). Concealment of areas within a video sequence that are affected by short error bursts and/or random errors is achieved by estimating lost macroblock information using remaining uncorrupted macroblocks (112, 200). For each such lost macroblock, predictions of the intensity information are generated using candidate motion vectors from selected uncorrupted neighboring macroblocks (208), and the candidate motion vector that produces the least mean-squared difference between luminance values at the boundary between the predicted macroblock and the neighboring macroblocks is selected for the concealment (210).
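The concealment step, choosing among neighbouring motion vectors by boundary matching, can be sketched as below. This is a minimal Python illustration under assumptions: the frames are plain 2-D lists of luminance values, all adjacent macroblocks are taken to be uncorrupted, every candidate vector is assumed to point inside the previous frame, and the names `boundary_msd` and `conceal_macroblock` are hypothetical.

```python
def boundary_msd(cur, pred, bx, by, size):
    """Mean-squared difference between the edge pixels of the predicted block
    and the adjacent pixels of the neighbouring blocks in the current frame."""
    h, w = len(cur), len(cur[0])
    diffs = []
    for i in range(size):
        if by > 0:            # top edge against the row above the block
            diffs.append(pred[0][i] - cur[by - 1][bx + i])
        if by + size < h:     # bottom edge against the row below the block
            diffs.append(pred[size - 1][i] - cur[by + size][bx + i])
        if bx > 0:            # left edge against the column to the left
            diffs.append(pred[i][0] - cur[by + i][bx - 1])
        if bx + size < w:     # right edge against the column to the right
            diffs.append(pred[i][size - 1] - cur[by + i][bx + size])
    return sum(d * d for d in diffs) / len(diffs)


def conceal_macroblock(cur, prev, bx, by, size, candidates):
    """Fill the lost block at (bx, by) in cur and return the chosen vector.

    candidates: motion vectors (dx, dy) taken from uncorrupted neighbours.
    Assumption: every candidate keeps the displaced block inside prev.
    """
    best_mv, best_cost, best_pred = None, float("inf"), None
    for dx, dy in candidates:
        # Motion-compensated prediction from the previous frame.
        pred = [[prev[by + y + dy][bx + x + dx] for x in range(size)]
                for y in range(size)]
        cost = boundary_msd(cur, pred, bx, by, size)
        if cost < best_cost:
            best_mv, best_cost, best_pred = (dx, dy), cost, pred
    for y in range(size):
        cur[by + y][bx:bx + size] = best_pred[y]
    return best_mv
```

The candidate with the smoothest transition across the block boundary wins, so no information from the lost macroblock itself is needed.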
TL;DR: Video data is divided into a number of frames and each frame into macroblocks; a dependency count is assigned to each macroblock, and a macroblock whose dependency count equals or exceeds a threshold is intra-coded using a low bit rate coding algorithm.
Abstract: A method and apparatus for compressing video data to improve its tolerance to error, especially over a low bit rate network. With this invention, video data is divided into a number of frames and each frame is divided into a number of macroblocks. A dependency count is assigned to each macroblock, and if the dependency count for the macroblock equals or exceeds a threshold, the macroblock is intra-coded using a low bit rate coding algorithm. If the dependency count is below the threshold, the macroblock is inter-coded.
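The threshold rule above can be written as a short Python sketch. The abstract does not say how the dependency count is maintained, so the update rule here (increment on inter-coding, reset on intra-coding) is purely an assumption for illustration, as is the name `choose_modes`.

```python
def choose_modes(dep_counts, threshold):
    """Pick a coding mode per macroblock and update the counts in place.

    Assumption (not from the source): a count grows by one each time the
    macroblock is inter-coded and is reset to zero when it is intra-coded.
    """
    modes = []
    for i, count in enumerate(dep_counts):
        if count >= threshold:
            modes.append("intra")      # refresh: stops error propagation
            dep_counts[i] = 0
        else:
            modes.append("inter")      # keeps depending on earlier frames
            dep_counts[i] = count + 1
    return modes
```

Under this assumed update rule, no macroblock can go more than `threshold` consecutive frames without an intra refresh.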
TL;DR: A new approach to detecting shot changes for video segmentation is proposed; it processes MPEG compressed video data directly and takes advantage of the information implied in the compressed data.
Abstract: Video segmentation is an elementary operation for video index construction. A video sequence is usually decomposed into several basic meaningful segments. We propose a new approach to detect shot changes for video segmentation, which is based on the processing of MPEG compressed video data. This approach takes advantage of the information implied in the compressed data. The reference ratios among video frames are analyzed to determine their similarities. A shot change is detected if the similarity degrees of a frame and its adjacent frames are low. A function is used to quantize the results into the shot change probabilities. Considering the motion variations of video contents between frames, a conversion function is designed to increase the correctness of the shot change detection. The experimental results show the performance of our approach.