TL;DR: A twin-comparison approach has been developed to solve the problem of detecting transitions implemented by special effects, and a motion analysis algorithm is applied to determine whether an actual transition has occurred.
Abstract: Partitioning a video source into meaningful segments is an important step for video indexing. We present a comprehensive study of a partitioning system that detects segment boundaries. The system is based on a set of difference metrics and it measures the content changes between video frames. A twin-comparison approach has been developed to solve the problem of detecting transitions implemented by special effects. To eliminate the false interpretation of camera movements as transitions, a motion analysis algorithm is applied to determine whether an actual transition has occurred. A technique for determining the threshold for a difference metric and a multi-pass approach to improve the computation speed and accuracy have also been developed.
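The abstract does not spell out the twin-comparison rule, so the following is only a minimal sketch under an assumed two-threshold reading: a frame difference above a high threshold marks an abrupt cut, while a run of differences above a lower threshold is accepted as a gradual transition once its accumulated difference reaches the high threshold. The names `twin_comparison`, `t_high`, and `t_low` are hypothetical, and the input is assumed to be a precomputed list of consecutive-frame differences.

```python
def twin_comparison(diffs, t_high, t_low):
    """Classify frame differences into abrupt cuts and gradual transitions.

    diffs[i] is the difference between frame i and frame i + 1.
    Returns (cuts, gradual), where gradual holds (start, end) index pairs.
    The two-threshold rule used here is an assumption, not the paper's text.
    """
    cuts, gradual = [], []
    start, acc = None, 0.0
    for i, d in enumerate(diffs):
        if d >= t_high:
            # A single large difference is treated as an abrupt cut.
            cuts.append(i)
            start, acc = None, 0.0
        elif d >= t_low:
            # A moderate difference opens or extends a candidate transition.
            if start is None:
                start, acc = i, 0.0
            acc += d
        else:
            # The candidate ends; keep it only if enough change accumulated.
            if start is not None and acc >= t_high:
                gradual.append((start, i - 1))
            start, acc = None, 0.0
    if start is not None and acc >= t_high:
        gradual.append((start, len(diffs) - 1))
    return cuts, gradual
```

With `t_high=20` and `t_low=5`, a lone difference of 50 is reported as a cut, whereas four consecutive differences of 6, 7, 8, and 6 are reported as one gradual transition because they sum to 27.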
TL;DR: Since no compression hardware is needed for the PVC to encode and decode video data, the cost and complexity of developing multimedia applications, such as video phone and multimedia e-mail systems, can be greatly reduced.
Abstract: A novel software-based video compression algorithm, the Popular Video Coder (PVC), is presented in this paper, and a video phone system, the Popular Phone, is also implemented based on the PVC. The PVC simplifies the traditional video coder by removing the transform and the motion estimation parts and modifies the quantizer and entropy coder. Two novel coding algorithms, the adaptive quantizer and the modified windowed Huffman-like coder, are used in the PVC to encode the video data with a quality picture at a high compression ratio. The video quality of the proposed coder is as good as that of the MPEG coder when the input is a low-resolution and slow-motion video, and the computational complexity of the PVC is much lower than that of the Motion Picture Expert Group (MPEG). Since no compression hardware is needed for the PVC to encode and decode video data, the cost and complexity of developing multimedia applications, such as video phone and multimedia e-mail systems, can be greatly reduced. Furthermore, some networking issues, such as error control and flow control, are discussed in connection with applying the PVC to implement the Popular Phone.
TL;DR: A motion-adaptive variable-bit-rate (VBR) video codec is considered, and a motion-classified model is developed to represent the characteristics of various classes of motion activities, including scene changes, which captures the motion of various video scenes through a first-order autoregressive process with time-varying parameters.
Abstract: A motion-adaptive variable-bit-rate (VBR) video codec is considered, and a motion-classified model is developed to represent the characteristics of various classes of motion activities, including scene changes. The codec switches between interframe, motion-compensated, and intraframe coding corresponding to low, medium, and high amounts of motion and scene changes, respectively. The model captures the motion of various video scenes by providing the statistics of VBR-coded video traffic through a first-order autoregressive process with time-varying parameters. The parameters of this model are obtained from a VBR-coded sample video sequence with the objective of matching the bit-rate distribution and the autocorrelation among the bit rates. The validity and accuracy of the model are evaluated, and the characteristics of aggregated traffic sources obtained with the model are discussed.
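A first-order autoregressive process with time-varying parameters can be sketched as follows. The exact parameterization is not given in the abstract, so this sketch assumes each motion class carries its own mean rate, correlation coefficient, and noise scale; `simulate_bit_rate` and the `(mean, a, sigma)` tuples are hypothetical names chosen for illustration.

```python
import random

def simulate_bit_rate(classes, params, seed=0):
    """Generate one bit-rate sample per frame from a class-switched AR(1) model.

    classes: motion class label for each frame.
    params:  maps a class label to (mean, a, sigma), an assumed parameter set.
    Model:   x[n] = mean + a * (x[n-1] - mean) + sigma * w[n], w[n] ~ N(0, 1).
    """
    rng = random.Random(seed)
    rates = []
    prev = None
    for label in classes:
        mean, a, sigma = params[label]
        if prev is None:
            prev = mean  # start the process at the mean of the first class
        x = mean + a * (prev - mean) + sigma * rng.gauss(0.0, 1.0)
        x = max(x, 0.0)  # a bit rate cannot be negative
        rates.append(x)
        prev = x
    return rates
```

Setting `sigma` to zero makes the switching behaviour easy to see: after a change of class, the rate moves geometrically toward the new class mean at a speed set by `a`.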
TL;DR: An apparatus is presented for reproducing video data from a record medium on which is recorded, in multiplexed form, video data, reference time data representing a reference time, and video time data representing the time at which decoding of the reproduced video data should begin.
Abstract: Apparatus for reproducing video data from a record medium on which is recorded, in multiplexed form, video data, reference time data representing a reference time, and video time data representing the time at which decoding of the video data reproduced from the record medium should begin. The reference time data is separated from the reproduced multiplexed data and used to generate timing data. The video data and video time data are temporarily stored in a video buffer, and a video time data extractor is connected to the output of the video buffer to extract the video time data from the contents of the video buffer. The video buffer also is connected to a video decoder which decodes the video data temporarily stored in the video buffer, the operation of the video decoder being controlled as a function of a comparison between the generated timing data and the extracted video time data.
TL;DR: In this article, a method is proposed for coding an input video signal with a field rate of 60 Hz derived from a motion picture film source using 2-3 pulldown: duplicate fields are detected and eliminated to produce a 24 Hz progressive video signal, which is then coded.
Abstract: A method for coding an input video signal with a field rate of 60 Hz derived from a motion picture film source using 2-3 pulldown. In the method, duplicate fields are detected in the input video signal. Each duplicate field is eliminated from the input video signal to produce a progressive video signal comprising plural frames with a frame rate of 24 Hz. Finally, the progressive video signal is coded to produce a coded video signal. Preferably, a control signal is generated in response to each duplicate field detected in the input video signal. Each control signal is then included in the coded video signal.
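The duplicate-field elimination step can be illustrated with a small sketch. It assumes fields are represented as `(parity, content)` pairs and that a duplicate is an exact repeat of the field two positions earlier; a real detector would use a difference measure with a threshold, which the abstract does not specify. The function names and the returned flag list (standing in for the control signals) are hypothetical.

```python
def pulldown_2_3(frames):
    """Expand film frames (top, bottom) into a 2-3 pulldown field sequence."""
    fields = []
    parity = 0  # 0 = top field, 1 = bottom field
    for n, frame in enumerate(frames):
        count = 2 if n % 2 == 0 else 3  # alternate two and three fields
        for _ in range(count):
            fields.append((parity, frame[parity]))
            parity ^= 1
    return fields

def remove_duplicate_fields(fields):
    """Drop repeated fields and pair the rest into progressive frames.

    Returns (frames, flags); flags[i] is True where field i was a duplicate.
    """
    kept, flags = [], []
    for i, field in enumerate(fields):
        # Assumption: a duplicate exactly equals the same-parity field before it.
        duplicate = i >= 2 and field == fields[i - 2]
        flags.append(duplicate)
        if not duplicate:
            kept.append(field)
    frames = []
    for a, b in zip(kept[0::2], kept[1::2]):
        top = a[1] if a[0] == 0 else b[1]
        bottom = b[1] if b[0] == 1 else a[1]
        frames.append((top, bottom))
    return frames, flags
```

Four film frames become ten fields under this cadence, which is how 24 frames per second map onto 60 fields per second; removing the two repeats restores the original four frames.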
TL;DR: In this article, a motion estimator identifies a motion block within several video frames and determines an approximate velocity vector using a trimmed square estimation procedure; trajectory correction based on the velocity vector is then applied to each pixel in the motion block in order to determine a revised trajectory for the pixels.
Abstract: In a video signal noise reduction system, image pixels are tracked across multiple frames and then averaged to produce respective noise reduced pixel values. A motion estimator identifies a motion block within several video frames and determines an approximate velocity vector using a trimmed square estimation procedure. Trajectory correction is applied to each pixel in the motion block based on the velocity vector in order to determine a revised trajectory for the pixels. This correction is accomplished by determining a difference in position of a motion block between successive video frames. Based upon this revised trajectory, appropriate pixels corresponding to the motion block are obtained. These pixels are used in conjunction with pixel values obtained from each of the processed frames to produce an averaged video frame. Each pixel of the averaged video frame replaces the corresponding pixel in the original frame if the difference between the original pixel and the corresponding averaged pixel is less than the median difference between all of the original and averaged pixels.
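The final replacement rule in this abstract is simple enough to sketch directly. Frames are assumed here to be flat lists of pixel values of equal length, and `selective_replace` is a hypothetical name; the motion tracking and averaging that produce the averaged frame are outside the scope of this sketch.

```python
from statistics import median

def selective_replace(original, averaged):
    """Use an averaged pixel only where it stays close to the original.

    A pixel is replaced when its absolute difference from the averaged pixel
    is less than the median of all such differences in the frame.
    """
    diffs = [abs(o - a) for o, a in zip(original, averaged)]
    limit = median(diffs)
    return [a if d < limit else o
            for o, a, d in zip(original, averaged, diffs)]
```

For original pixels `[10, 20, 30, 40, 50]` and averaged pixels `[11, 22, 30, 48, 45]`, the differences are 1, 2, 0, 8, and 5, so the median is 2 and only the first and third pixels take the averaged value.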
TL;DR: In this paper, an input frame to be motion compensated is partitioned into smaller blocks of pixel data, the full-pixel motion vector of each block is refined to half-pixel accuracy using frame-based or field-based interpolation, and the resulting sub-pixel resolution block is subjected to motion estimation and compensation to obtain a predicted block which is then coded to obtain a coded bit stream.
Abstract: An input frame to be motion compensated is partitioned into smaller blocks of pixel data. First, motion estimation is performed on each block in the full-pixel resolution. Then, the full-pixel resolution motion vector is refined to half-pixel accuracy by searching the surrounding half-pixel resolution blocks in the vertical and horizontal position with respect to the full-pixel resolution motion vector. An absolute magnitude of a horizontal component of the full-pixel resolution motion vector is examined to see if it is greater or less than a predetermined threshold to detect any significant movement. If the horizontal component absolute magnitude is less than the threshold, a frame-based interpolation will be used for forming the sub-pixel resolution block. If the horizontal component absolute magnitude is greater than the threshold, a field-based interpolation will be used instead. The sub-pixel resolution block is subjected to motion estimation and compensation to obtain a predicted block which will then be coded to obtain a coded bit stream.
TL;DR: In this article, the amount of information present in each video frame is determined by compressing the video in accordance with a spatial algorithm, such as JPEG, and the information is analyzed on a frame-by-frame basis, providing information identifying the presence of scene changes.
Abstract: Scene changes in a video sequence are detected by generating data representing the amount of information present in each video frame. This information is processed in such a way that significant changes in the information content are identified as positions where scene changes are likely to occur. The amount of information present in each video frame is determined by compressing the video in accordance with a spatial algorithm, such as JPEG. Under such compression techniques, the amount of data present after compression will vary, depending upon the amount of information present in the original scenes. Thus, for each frame, information is available identifying the amount of data present in the compressed video. This information is analysed on a frame-by-frame basis, providing information identifying the presence of scene changes. In a facility for editing the video, or editing audio for synchronisation against a video track, the scene change information may be displayed to an operator. In a security system, the occurrence of a scene change may be identified as an intruder entering an area observed by a camera. On detecting an intruder in this way, various measures may be taken, such as activating a video tape recorder.
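The idea of using compressed size as a proxy for information content can be sketched as below. As a loudly stated assumption, `zlib` from the Python standard library stands in for a spatial coder such as JPEG, frames are raw byte strings, and a scene change is flagged when the compressed size changes by more than a relative threshold; the function names and the threshold value are hypothetical.

```python
import zlib

def compressed_sizes(frames):
    """Return the compressed size in bytes of each frame (zlib as a stand-in)."""
    return [len(zlib.compress(frame)) for frame in frames]

def scene_changes(sizes, rel_threshold=0.5):
    """Return frame indices where the compressed size jumps significantly."""
    changes = []
    for i in range(1, len(sizes)):
        previous = max(sizes[i - 1], 1)  # guard against division by zero
        if abs(sizes[i] - sizes[i - 1]) / previous > rel_threshold:
            changes.append(i)
    return changes
```

A limitation of any such detector is that two different scenes with similar amounts of detail compress to similar sizes and would not be separated by this measure alone.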
TL;DR: It is noted that this is the first single-chip solution proposed that is capable of processing NTSC-resolution video in real time in 40-MHz operation.
Abstract: A motion estimation processor compatible with CCITT H.261 and MPEG (Moving Pictures Experts Group) standards is described. A half-pel precision processing unit with an exhaustive block matching unit for integer-pel precision search is introduced. The necessary processing power for the exhaustive block matching is implemented with a one-dimensional array structure using a subsampling technique. The problem of communication bandwidth to the frame memory, which is a bottleneck of half-pel precision motion estimation, is solved by introducing a candidate pixel buffer in the interprocessor data transfer. It is noted that this is the first single-chip solution proposed that is capable of processing NTSC-resolution video in real time in 40-MHz operation.
TL;DR: In this paper, the signal for the current frame is averaged, with weights, with the signals for a future and a prior frame, where the future and prior frames are given less weight as they differ more from the current frame.
Abstract: Motion video is represented by digital signals. The digital signals can be compressed by coding to reduce bitspace. Noise in the signal, however, reduces the efficiency of coding. The present invention is a system and method for reducing noise in video signals by filtering. The signal for the current frame is weightedly averaged with signals for a future and prior frame. The future and prior frames are given less weight as they differ more from the current frame. When motion compensation information is available, the motion compensated future and prior frames can be used for averaging, further improving filtering.
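The weighted temporal average can be sketched as follows. The abstract only says that neighbouring frames get less weight as they differ more from the current frame, so the specific weight function here, based on the mean absolute difference and a `scale` parameter, is an assumption, as are the function and parameter names.

```python
def temporal_filter(prev, cur, nxt, scale=10.0):
    """Average the current frame with its neighbours using adaptive weights.

    Frames are flat lists of pixel values. The current frame has weight 1;
    each neighbour has weight 1 / (1 + mad / scale), where mad is its mean
    absolute difference from the current frame (an assumed weighting).
    """
    def weight(other):
        mad = sum(abs(a - b) for a, b in zip(other, cur)) / len(cur)
        return 1.0 / (1.0 + mad / scale)

    w_prev, w_next = weight(prev), weight(nxt)
    total = 1.0 + w_prev + w_next
    return [(c + w_prev * p + w_next * n) / total
            for p, c, n in zip(prev, cur, nxt)]
```

When motion compensation information is available, the same function could be called with motion-compensated versions of `prev` and `nxt`, which is the refinement the abstract mentions.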
TL;DR: It is shown that, while for a single source loosely packed cells result in fewer macroblocks lost to cell loss, in a multiplexed network closely packed cells have a marginally better performance.
Abstract: The output from an H.261 or MPEG video coder consists of macroblocks of differing lengths. Two methods of packing the macroblocks into ATM cells are considered, depending on whether or not part of an uncompleted macroblock may be carried over to the next cell. It is shown that, while for a single source loosely packed cells result in fewer macroblocks lost to cell loss, in a multiplexed network closely packed cells have a marginally better performance.
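The two packing methods can be compared with a small sketch. It assumes a 48-byte cell payload with no adaptation-layer overhead, and it reads "loose" packing as starting a fresh cell whenever the next macroblock would not fit in the space left; a macroblock larger than one payload necessarily spans cells in either mode. The function name and the cell representation are hypothetical.

```python
def pack(macroblock_sizes, payload=48, close=True):
    """Pack macroblocks into cells; each cell lists the macroblocks it carries.

    close=True  : a partly sent macroblock continues in the next cell.
    close=False : a macroblock that does not fit starts a new cell instead.
    """
    cells = [[]]
    free = payload
    for index, size in enumerate(macroblock_sizes):
        if not close and free < size <= payload:
            # Loose packing: leave the rest of this cell unused.
            cells.append([])
            free = payload
        remaining = size
        while remaining > 0:
            if free == 0:
                cells.append([])
                free = payload
            take = min(remaining, free)
            cells[-1].append(index)
            remaining -= take
            free -= take
    return cells
```

For three 30-byte macroblocks, close packing gives two cells carrying macroblocks `[0, 1]` and `[1, 2]`, so losing the second cell damages two macroblocks; loose packing gives three cells with one macroblock each, so one lost cell damages one macroblock at the cost of sending more cells.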
TL;DR: An apparatus for expanding a compressed digital video signal representing a motion picture to provide a digital video output signal is described in this article, which includes a frame memory comprising no more than four pages, each page storing one frame.
Abstract: An apparatus for expanding a compressed digital video signal representing a motion picture to provide a digital video output signal. The compressed digital video signal comprises plural interlaced frames with a frame rate of 24 Hz. The digital video output signal comprises plural pictures with a picture rate of at least 49 Hz. The apparatus includes a frame memory comprising no more than four pages, each page storing one frame. The apparatus also includes an expander for expanding the compressed digital video signal to derive a reconstructed interlaced frame from each frame of the compressed digital video signal. Finally, the apparatus includes a controller that controls writing of each reconstructed interlaced frame into one page of the frame memory. The controller also controls reading out of the reconstructed interlaced frames stored in the pages of the frame memory to provide the pictures of the digital video output signal. Reading out is controlled to effect 2-3 pull down conversion of the reconstructed interlaced frames stored in the frame memory with a frame rate of 24 Hz to provide the pictures of the digital video output signal with a picture rate of at least 49 Hz.
TL;DR: A criterion which controls the motion estimation process in order to optimize its performance is obtained, and this criterion is applied to the split procedure of an adaptive multigrid block matching technique.
Abstract: Motion estimation and compensation techniques are widely used in video coding. This paper addresses the problem of the trade-off between the motion and the prediction error information. Under some realistic hypotheses, the transmission cost of these two components can be estimated. Therefore, we obtain a criterion which controls the motion estimation process in order to optimize its performance. As a particular application, this criterion is applied to the split procedure of an adaptive multigrid block matching technique. Simulation results are presented, showing the significant improvements due to the method.
TL;DR: Statistics and subjective tests confirm that various adaptations of an adaptive frame/field motion-compensated video coding scheme provide significant improvement as compared to purely MPEG-1 based coding.
Abstract: The second phase of the Motion Pictures Experts Group (MPEG-2) activity is in progress and is primarily aimed at coding of high resolution video with high quality at bit-rates of 4 to 9 Mbit/s. In addition, this phase is also required to address many issues including forward and backward compatibility with the first phase (MPEG-1) standard. For MPEG-2, an adaptive frame/field motion-compensated video coding scheme is proposed. This scheme builds on the proven framework of DCT and motion-compensation based techniques already optimized in MPEG-1 for coding of lower resolution video at low bit-rates. Various adaptations include techniques to improve efficiency of coding for interlaced video source as well as improving quality by better exploitation of characteristics of the video scenes. Statistics and subjective tests confirm that these adaptations provide significant improvement as compared to purely MPEG-1 based coding. We then discuss issues of compatibility with the MPEG-1 standard and of implementation complexity of the proposed scheme.
TL;DR: In this article, a method of motion-compensated video processing such as standards conversion, uses motion vectors assigned on a pixel-by-pixel basis, where pixels of an input video signal are written to locations in a video store (24) determined by the motion vector assigned to that pixel.
Abstract: Method of motion-compensated video processing such as standards conversion, uses motion vectors assigned on a pixel-by-pixel basis. The pixels of an input video signal are written to locations in a video store (24) determined by the motion vector assigned to that pixel. Multiple vectors can address a single written pixel to enable mixing of backward and forward vectors. A confidence value governs the accumulation of pixels and the later interpolation between motion compensated fields.
TL;DR: It is observed that the model captures the coded video behavior for each block type and their combination reasonably well and can be used to further study the cell-generation process of full-motion video codecs and the aggregation of such video sources at the statistical multiplexers.
Abstract: The statistical characteristics of full-motion video sources using motion-adaptive variable-bit-rate coding techniques are studied. Analytical models are developed to describe the behavior of the coded video signals based on the encoder structure. The video-compression algorithm used is in compliance with the general MPEG syntax and bit-stream definition. Statistical characteristics associated with each block type and their aggregate are presented. A composite model to represent the number of bits per field for the encoded video traffic, comprising multiple autoregressive models for the number of blocks per field and the number of bits in each coded block, is derived. The statistics measured from a sample video sequence are compared to those obtained by the model and it is observed that the model captures the coded video behavior for each block type and their combination reasonably well. This model can be used to further study the cell-generation process of full-motion video codecs and the aggregation of such video sources at the statistical multiplexers.
TL;DR: In this article, a method for estimation of global error motion vectors, which represent unwanted global picture instabilities in a picture sequence in digital video signals, is presented, using a motion vector estimator with an adaptively variable measuring time distance, as well as spatial processing and temporal processing, in order to estimate a sequence of global motion vectors.
Abstract: A method for estimation of global error motion vectors, which represent unwanted global picture instabilities in a picture sequence in digital video signals. The method uses a motion vector estimator with an adaptively variable measuring time distance, as well as spatial processing and temporal processing, in order to estimate a sequence of global motion vectors, from which a sequence of global error motion vectors is separated in order to allow stabilization of a picture sequence in digital video signals.
TL;DR: A 10.5-GOPS video encoder chip is described which implements CCITT H.261, P*64, and MPEG (Motion Picture Experts Group) (P-frame) encoding algorithms at rates up to 30 frames/s with a resolution of up to 352*288 pixels per frame (CIF format).
Abstract: A 10.5-GOPS video encoder chip is described which implements CCITT H.261, P*64, and MPEG (Motion Picture Experts Group) (P-frame) encoding algorithms (including exhaustive motion estimation) at rates up to 30 frames/s with a resolution of up to 352*288 pixels per frame (CIF format). The chip accepts input video through either a video bus or a 16-b host bus and produces the final encoded bit stream in its output FIFO. A completely self-contained and glueless interface in this chip makes it possible to directly connect it to industry standard DRAM chips (1 MB) needed for frame store. The block diagram of the encoder chip is shown, and the characteristics of the major modules are listed.
TL;DR: A general optimization approach to the problem of lowest-distortion coded video subject to some constraints on delay, rate, and buffer conditions is presented and a certain formulation of the optimization objective is motivated.
Abstract: A video encoder has the task of producing lowest-distortion coded video subject to some constraints on delay, rate, and buffer conditions. We present a general optimization approach to this problem in a framework of delayed coding and we motivate a certain formulation of the optimization objective. Two forms of distortion measures are considered, namely, the maximum distortion and the total distortion, each defined over a segment of the video to be coded. These distortion measures are chosen for their mathematical tractability and practical importance. A solution (computational algorithm) for each case is described. Subject to some conditions, the solutions may be suboptimal. Simulation results show an improved performance with this approach compared to a simple typical approach which varies the quantization scale linearly with the encoder buffer level. 1. INTRODUCTION The video coding problem addressed here can be described with the help of Fig. 1. In the figure, ii is the index for "video units" where each video unit is some subset of the video sequence being coded. For example, for coders compliant to CCITT's Recommendation
TL;DR: A 450-MOPS video decoder that decompresses both H.261 and MPEG (Motion Picture Experts Group) compressed video streams is described, which features a mix of dedicated hardware functions and programmable processors.
Abstract: A 450-MOPS video decoder that decompresses both H.261 and MPEG (Motion Picture Experts Group) compressed video streams is described. The decoder accepts bit rates up to 4 Mb/s and provides decoded frames of up to 352*288 pixels (CIF) at up to 30 frames/s operating at 45 MHz. The decoder places no restrictions on the H.261 bit streams. It decodes any combination of intra and predictive frames in QCIF or CIF format. In MPEG mode, it decodes any stream conforming to the MPEG constrained parameters, including any combination of intra, predictive, and bidirectional frames with half-pixel motion vectors. The architecture features a mix of dedicated hardware functions and programmable processors. The design methodology used for the decoder included extensive high-level modeling at two levels: a C++ behavioral model and a set of clock-cycle-accurate C models at the block level.
TL;DR: A video coding algorithm which combines the high visual quality of hybrid motion-compensated transform-based video coding techniques with the functional advantages of scalable, multi-resolution video is described.
Abstract: In this paper, we describe a video coding algorithm which combines the high visual quality of hybrid motion-compensated transform-based video coding techniques with the functional advantages of scalable, multi-resolution video. The technique produces a hierarchical video data representation by incorporating a simple frequency domain pyramid in a hybrid motion-compensated prediction/discrete cosine transform video coding algorithm. Compared to a single-layer hybrid scheme, this method has a very low penalty in coding efficiency and code complexity.
TL;DR: In this paper, feedback is introduced between a video CODEC and the intended communications channel such that the characteristics of the channel are used to drive multiple video output buffers, sharing an original temporal video reference, but having different subsequent temporal video images.
Abstract: Feedback is introduced between a video CODEC and the intended communications channel such that the characteristics of the channel are used to drive multiple video output buffers. These multiple output buffers share an original temporal video reference, but have different subsequent temporal video images. The communications channel interface then picks the subsequent video image buffer that best matches the current conditions experienced by it. By using a predictor of the channel performance, the video algorithm can be tuned to provide video output buffers with the best guess of how the buffers should be configured. A number of subsequent histories of an image are buffered until the receiving channel indicates it is ready to receive the next. Then the appropriate output buffer having the corresponding temporal change in the video is used to supply the next frame change information to the receiving station.
TL;DR: In this paper, a video processor system has separate and independent video processors for performing a variety of video processor functions required for encoding and decoding video signals; each of the separate video processors performs its own individual set of video processor functions.
Abstract: A video processor system has separate and independent video processors for performing a variety of video processor functions required for encoding and decoding video signals. Each of the separate video processors performs its own individual set of video processor functions. During the encode process the first video processor performs motion estimation to provide motion estimation information which it applies to the second video processor. The second video processor receives the motion estimation information and performs forward and inverse discrete cosine transforms, quantization and dequantization, frame addition and frame differencing, as well as run length encoding. The run length encoding operation produces run/value pairs which are then applied to the first video processor. The first video processor performs variable length encoding upon the run/value pairs. During the decoding process the first video processor performs a variable length decode and applies the variable length decoded data to the second video processor. The second video processor performs run length decoding, dequantization, inverse discrete cosine transforms and frame addition according to the received variable length data. The inverse transformed data produced by this operation is then applied to the first video processor.
TL;DR: In this paper, a method of processing a digital video signal to derive motion vectors representing motion between successive fields or frames of the video signal compares the contents of blocks of pixels in a first field or frame of the video signal with the contents of a plurality of blocks in a following field or frame, and produces for each block in the first field or frame a correlation surface representing the difference between the contents so compared in the two fields or frames.
Abstract: A method of processing a digital video signal to derive motion vectors representing motion between successive fields or frames of the video signal comprises comparing the contents of blocks of pixels in a first field or frame of the video signal with the contents of a plurality of blocks of pixels in a following field or frame, and producing for each block in the first field or frame a correlation surface representing the difference between the contents so compared in the two fields or frames. A grown correlation surface is produced for each block in the first field or frame by weighting the correlation surfaces for that block and a plurality of other blocks in an area around that block so as to accentuate features of the correlation surface for that block relative to those for the other blocks, and summing the weighted correlation surfaces. From each grown correlation surface, a motion vector is derived representing the motion of the content of the corresponding block between the two frames in dependence upon a minimum difference value represented by the grown correlation surface.
TL;DR: In this article, a video signal is converted to digital form and the data of sequential frames of the signal are arranged in a plurality of blocks of pixel data that numerically represent visual characteristics of the respective pixels of the frame image.
Abstract: In a method and apparatus for producing a signal for transmission to a receiver, a video signal is converted to digital form and the data of sequential frames of the signal are arranged in a plurality of blocks of pixel data that numerically represent visual characteristics of the respective pixels of the frame image. Each block is further organized as a matrix of pixel data. The pixel data of the blocks of a "previous" video frame, and a current video frame, are stored in a memory. A row of each block of the current video signal is compared with the corresponding row of the previous video frame, and a list is made of blocks in which the averages of the pixel data exceed a predetermined threshold. The pixel data of the listed blocks is compressed with lossy compression, and is encoded for transmission along with high definition data of a predetermined number of blocks of unchanged data. The data of the "previous" frame stored in memory is updated in memory, to continually store a replica of an image that corresponds to the image that should be currently stored in the receiving station.
TL;DR: In this article, motion vectors from one video frame to another are detected by segmenting a present frame of video data into plural blocks and then comparing a block in the present frame to a corresponding block in a preceding frame to detect rotational and zoom movement of the present block relative to the preceding block, in addition to rectilinear movement.
Abstract: Motion vectors from one video frame to another are detected by segmenting a present frame of video data into plural blocks and then comparing a block in the present frame to a corresponding block in a preceding frame to detect rotational and zoom movement of the present block relative to the preceding block, in addition to rectilinear movement.
TL;DR: In this paper, the block having the minimum absolute error is found by calculating absolute errors of 1024 (64×16) blocks in the search area and the horizontal and vertical positions of the found block are the motion vectors to be obtained.
Abstract: A method of and an apparatus for motion estimation of video data in a HDTV, capable of providing a block matching algorithm with real time processing. Even when a full search, which is complex in hardware, is implemented, absolute errors output from blocks are processed in parallel. 16 processor elements are used for calculating absolute errors of 8×8 blocks, so that 16 absolute errors can be calculated for every clock. The block having the minimum absolute error is found by calculating absolute errors of 1024 (64×16) blocks in the search area. The horizontal and vertical positions of the found block are the motion vectors to be obtained. As 16 absolute errors are calculated for every clock, it is possible to accomplish the real time processing in that each block has the size of 64 (8×8) pixels and thus the motion vectors can be found after 64 clocks. Accordingly, motion vectors of video data with a very large amount of information, for example, in HDTVs can be estimated in real time. The search area may be designed to be defined by predetermined values within the range from (-8, +7) to (-64, +63) in horizontal and/or vertical direction.
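A full-search block matcher of the kind described can be sketched in software as below. This is a sequential illustration only: it does not model the 16 parallel processor elements, it assumes the absolute error of a block is the sum of absolute pixel differences, and the function names, the list-of-rows frame layout, and the default search range are hypothetical. The displacement range runs from `-search` to `search - 1`, mirroring the asymmetric (-8, +7) style of range in the abstract.

```python
def block_error(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between a current block and a shifted one."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))

def full_search(cur, ref, bx, by, n=8, search=4):
    """Test every displacement in the search area; return (error, dx, dy).

    cur and ref are frames stored as lists of rows. Candidates that would
    read outside the reference frame are skipped.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search):
        for dx in range(-search, search):
            inside = (0 <= by + dy and by + dy + n <= height and
                      0 <= bx + dx and bx + dx + n <= width)
            if not inside:
                continue
            error = block_error(cur, ref, bx, by, dx, dy, n)
            if best is None or error < best[0]:
                best = (error, dx, dy)
    return best
```

The returned `dx` and `dy` are the horizontal and vertical components of the motion vector. Since ties are resolved in favour of the first candidate visited, the scan order matters when several positions share the minimum error.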
TL;DR: In this article, a block-by-block matching technique is used to obtain motion vectors for a given block by matching the search block from the current frame against its corresponding search region in the preceding frame.
Abstract: Motion vectors are obtained using a known block matching technique which is based on block-by-block processing between a current and a preceding frame. Whenever a value equal to the minimum absolute value of difference which has the earliest priority is found during the block matching, the closer of the two vectors derived therefrom to a predetermined motion vector is chosen. The closest vector which has survived the process of matching the search block from the current frame against its corresponding search region in the preceding frame is assigned as the motion vector of the given block and stored in a vector memory as a predetermined motion vector to repeat the search process for a subsequent block in the current frame.
TL;DR: In this paper, a method of processing an input 60 field/second video signal generated by 3232 pulldown to produce an output video signal, comprises producing from the input signal a series of progressive scan format frames, each frame corresponding to a respective one of the input fields, and comparing blocks of pixels in each progressive scan frame with blocks in the following frame to derive motion vectors representing the motion of the content of respective blocks between frames.
Abstract: A method of processing an input 60 field/second video signal generated by 3232 pulldown to produce an output video signal, comprises producing from the input signal a series of progressive scan format frames, each frame corresponding to a respective one of the input fields, and comparing blocks of pixels in each progressive scan frame with blocks of pixels in the following frame to derive motion vectors representing the motion of the content of respective blocks between frames. The motion vectors are utilized to monitor the field sequence of the input signal, and fields or frames of the output video signal are produced using input fields or progressive scan frames selected in dependence upon the field sequence of the input signal, at least some of the output fields or frames being produced by motion compensated temporal interpolation utilizing the motion vectors.
TL;DR: In this article, a motion compensated video signal processing apparatus comprises a subsampler for horizontally subsampling an input digital video signal, a motion vector processor for generating motion vectors from the subsampled video signal; and a motion compensation video processor for processing the input video signal according to the motion vectors.
Abstract: Motion compensated video signal processing apparatus comprises a subsampler for horizontally subsampling an input digital video signal, to generate a subsampled video signal; a motion vector processor for generating motion vectors from the subsampled video signal; and a motion compensated video processor for processing the input digital video signal according to the motion vectors, to generate an output digital video signal.