TL;DR: New techniques to detect and analyze periodic motion as seen from both a static and a moving camera are described and the periodicity is analyzed robustly using the 2D lattice structures inherent in similarity matrices.
Abstract: We describe new techniques to detect and analyze periodic motion as seen from both a static and a moving camera. By tracking objects of interest, we compute an object's self-similarity as it evolves in time. For periodic motion, the self-similarity measure is also periodic and we apply time-frequency analysis to detect and characterize the periodic motion. The periodicity is also analyzed robustly using the 2D lattice structures inherent in similarity matrices. A real-time system has been implemented to track and classify objects using periodicity. Examples of object classification (people, running dogs, vehicles), person counting, and nonstationary periodicity are provided.
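The self-similarity idea in this abstract can be illustrated with a minimal sketch. It assumes each tracked frame has already been reduced to a single scalar feature (a hypothetical simplification; the abstract compares the object itself over time), builds the similarity matrix, and picks the lag whose diagonal is most self-similar. This simple diagonal search stands in for the time-frequency and lattice analysis the abstract mentions and is not the authors' method.

```python
import math

def self_similarity_matrix(features):
    """S[i][j] = absolute difference between the features of frames i and j."""
    n = len(features)
    return [[abs(features[i] - features[j]) for j in range(n)] for i in range(n)]

def estimate_period(similarity, min_lag=2):
    """Return the smallest lag whose diagonal has the lowest mean dissimilarity."""
    n = len(similarity)
    best_lag, best_score = None, float("inf")
    for lag in range(min_lag, n // 2 + 1):
        diagonal = [similarity[i][i + lag] for i in range(n - lag)]
        score = sum(diagonal) / len(diagonal)
        # The small margin keeps the fundamental period rather than a multiple of it.
        if score < best_score - 1e-9:
            best_lag, best_score = lag, score
    return best_lag

# Hypothetical per-frame feature: a sinusoid that repeats every 10 frames.
signal = [math.sin(2 * math.pi * t / 10) for t in range(60)]
period = estimate_period(self_similarity_matrix(signal))
print(period)
```

For periodic motion the low-dissimilarity diagonals recur at every multiple of the period, which is the regular 2D structure the abstract refers to.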
TL;DR: This paper presents a fast and simple method using a timed motion history image (tMHI) for representing motion from the gradients in successively layered silhouettes, and demonstrates the approach with recognition of waving and overhead clapping motions to control a music synthesis program.
Abstract: This paper uses a simple method for representing motion in successively layered silhouettes that directly encode system time termed the timed Motion History Image (tMHI). This representation can be used to both (a) determine the current pose of the object and to (b) segment and measure the motions induced by the object in a video scene. These segmented regions are not "motion blobs", but instead motion regions naturally connected to the moving parts of the object of interest. This method may be used as a very general gesture recognition "toolbox". We use it to recognize waving and overhead clapping motions to control a music synthesis program.
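A minimal sketch of how a timed motion history image could be maintained is given here. The update rule is an assumption based on the commonly cited formulation (stamp silhouette pixels with the current time, clear pixels older than a fixed duration); the abstract itself does not spell it out. The names `update_tmhi`, `history`, and `duration` are illustrative.

```python
def update_tmhi(tmhi, silhouette, timestamp, duration):
    """Update a timed motion history image in place and return it.

    tmhi       : 2D list of floats; time a pixel was last in the silhouette (0 = none)
    silhouette : 2D list of 0/1 values for the current frame
    timestamp  : current system time (assumed to be greater than 0)
    duration   : how long a pixel is remembered before it is cleared
    """
    for y, row in enumerate(silhouette):
        for x, inside in enumerate(row):
            if inside:
                tmhi[y][x] = timestamp      # newest layer carries the current time
            elif tmhi[y][x] < timestamp - duration:
                tmhi[y][x] = 0.0            # forget layers that are too old
    return tmhi

# A one-row image in which a blob moves one pixel to the right per time unit.
history = [[0.0, 0.0, 0.0]]
update_tmhi(history, [[1, 0, 0]], timestamp=1.0, duration=1.5)
update_tmhi(history, [[0, 1, 0]], timestamp=2.0, duration=1.5)
update_tmhi(history, [[0, 0, 1]], timestamp=3.0, duration=1.5)
print(history)
```

The stored timestamps increase from left to right in the final row, so the spatial gradient of the image points in the direction of motion.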
TL;DR: In this article, a method and apparatus for image compression using temporal and resolution layering of compressed image frames is proposed, which allows a form of modularized decomposition of an image that supports flexible application of a variety of image enhancement techniques.
Abstract: A method and apparatus for image compression using temporal and resolution layering of compressed image frames. In particular, layered compression allows a form of modularized decomposition of an image that supports flexible application of a variety of image enhancement techniques. Further, the invention provides a number of enhancements to handle a variety of video quality and compression problems. Most of the enhancements are preferably embodied as a set of tools which can be applied to the tasks of enhancing images and compressing such images. The tools can be combined by a content developer in various ways, as desired, to optimize the visual quality and compression efficiency of a compressed data stream, particularly a layered compressed data stream. Such tools include improved image filtering techniques, motion vector representation and determination, de-interlacing and noise reduction enhancements, motion analysis, imaging device characterization and correction, an enhanced 3-2 pulldown system, frame rate methods for production, a modular bit rate technique, a multi-layer DCT structure, variable length coding optimization, an augmentation system for MPEG-2 and MPEG-4, and guide vectors for the spatial enhancement layer.
TL;DR: A learning approach to model the hand configuration space directly based on the linear behavior observed in the real motion data collected by a CyberGlove, and it is shown that the proposed model is helpful for capturing articulated motion.
Abstract: Hand motion capture is one of the most important parts of gesture interfaces. Many current approaches to this task generally involve a formidable nonlinear optimization problem in a large search space. Motion capture can be achieved more cost-efficiently when considering the motion constraints of a hand. Although some constraints can be represented as equalities or inequalities, there exist many constraints which cannot be explicitly represented. In this paper, we propose a learning approach to model the hand configuration space directly. The redundancy of the configuration space can be eliminated by finding a lower-dimensional subspace of the original space. Finger motion is modeled in this subspace based on the linear behavior observed in the real motion data collected by a CyberGlove. Employing the constrained motion model, we are able to efficiently capture finger motion from video inputs. Several experiments show that our proposed model is helpful for capturing articulated motion.
TL;DR: A new methodology to allow image mosaicing in more general cases of camera motion is presented, performed by projecting thin strips from the images onto manifolds which are adapted to the camera motion.
Abstract: Image mosaicing is commonly used to increase the visual field of view by pasting together many images or video frames. Existing mosaicing methods are based on projecting all images onto a predetermined single manifold: A plane is commonly used for a camera translating sideways, a cylinder is used for a panning camera, and a sphere is used for a camera which is both panning and tilting. While different mosaicing methods should therefore be used for different types of camera motion, more general types of camera motion, such as forward motion, are practically impossible for traditional mosaicing. A new methodology to allow image mosaicing in more general cases of camera motion is presented. Mosaicing is performed by projecting thin strips from the images onto manifolds which are adapted to the camera motion. While the limitations of existing mosaicing techniques are a result of using predetermined manifolds, the use of more general manifolds overcomes these limitations.
TL;DR: A novel approach for estimating articulated body posture and motion from monocular video sequences is proposed, characterized using real and artificially generated body postures, showing promising results.
Abstract: A novel approach for estimating articulated body posture and motion from monocular video sequences is proposed. Human pose is defined as the instantaneous two dimensional configuration (i.e. the projection onto the image plane) of a single articulated body in terms of the position of a predetermined set of joints. First, statistical segmentation of the human bodies from the background is performed and low-level visual features are found given the segmented body shape. The goal is to be able to map these generally low level visual features to body configurations. The system estimates different mappings, each one with a specific cluster in the visual feature space. Given a set of body motion sequences for training, unsupervised clustering is obtained via the Expectation Maximization algorithm. For each of the clusters, a function is estimated to build the mapping between low-level features to 2D pose. Given new visual features, a mapping from each cluster is performed to yield a set of possible poses. From this set, the system selects the most likely pose given the learned probability distribution and the visual features. The proposed approach is characterized using real and artificially generated body postures, showing promising results.
TL;DR: This work presents methods of motion analysis and algorithms for automatic camera control that mimic the actions of a human operator, using inexpensive and widely available hardware.
Abstract: We describe computationally and materially inexpensive methods for panoramic video imaging. Digitally combining images from an array of inexpensive video cameras results in a wide-field panoramic camera, from inexpensive off-the-shelf hardware. We present methods that both correct lens distortion and seamlessly merge images into a panoramic video image. Electronically selecting a region of this results in a rapidly steerable "virtual camera". Because the camera is fixed with respect to the background, simple motion analysis can be used to track objects and people of interest. We present methods of motion analysis and algorithms for automatic camera control that mimic the actions of a human operator, using inexpensive and widely available hardware.
TL;DR: An algorithm is presented for simultaneously recovering dense scene shape and scene flow by carving away hexels, or points in the 6D space of all possible shapes and flows that are inconsistent with the images captured at either time instant, or across time.
Abstract: The motion of a non-rigid scene over time imposes more constraints on its structure than those derived from images at a single time instant alone. An algorithm is presented for simultaneously recovering dense scene shape and scene flow (i.e. the instantaneous 3D motion at every point in the scene). The algorithm operates by carving away hexels, or points in the 6D space of all possible shapes and flows that are inconsistent with the images captured at either time instant, or across time. The recovered shape is demonstrated to be more accurate than that recovered using images at a single time instant. Applications of the combined scene shape and flow include motion capture for animation, retiming of videos, and non-rigid motion analysis.
TL;DR: In this article, a four-dimensional tensor product of B-splines is used to describe the motion of the heart from tagged MRI data.
Abstract: In MRI tagging, magnetic tags (spatially encoded magnetic saturation planes) are created within tissues, acting as temporary markers. Their deformation pattern provides useful qualitative and quantitative information about the functional properties of underlying tissue and allows non-invasive analysis of mechanical function. The measured displacement at a given tag point contains only unidirectional information; in order to track the full 3D motion, these data have to be combined with information from other orthogonal tag sets over all time frames. Here, we provide a method to describe the motion of the heart using a four-dimensional tensor product of B-splines. In vivo validation of this tracking algorithm is performed using different tagging sets on the same heart. Using the validation results, the appropriate control point density was determined for normal cardiac motion tracking. Since our motion fields are parametric and based on an image-plane-based Cartesian coordinate system, trajectories or other derived values (velocity, acceleration, strains ...) can be calculated for any desired point within the volume spanned by the control points. This method does not rely on specific chamber geometry, so the motion of any tagged structure can be tracked. Examples of displacement and strain analysis for both ventricles are also presented.
TL;DR: In this article, a method for authoring video documents includes the steps of inputting video data to be processed, segmenting the video data into shots by identifying breaks between the shots, subdividing the shots into sub-shots using motion analysis to provide location information for motions of objects of interest, describing boundaries for the object of interest in video data, and creating an anchorable information unit file based on the boundaries of the objects to identify portions of video data.
Abstract: A method for authoring video documents includes the steps of inputting video data to be processed, segmenting the video data into shots by identifying breaks between the shots, subdividing the shots into subshots using motion analysis to provide location information for motions of objects of interest, describing boundaries for the objects of interest in the video data such that the objects of interest are represented by the boundaries in the shots and creating an anchorable information unit file based on the boundaries of the objects of interest such that objects of interest are used to identify portions of the video data. A system is also included.
TL;DR: In this paper, a pre-recorded video of a master's swing motion is stored as first frame sequences in computer memory, and target cues indicative of motion progress are associated with each first frame sequence.
Abstract: A pre-recorded video of a master's swing motion is stored as first frame sequences in computer memory. Target cues indicative of motion progress are associated with each first frame sequence. A video recording of the student performing the swing motion is stored in computer memory as second frame sequences. Reference cues indicating motion progress of the student are inserted into or associated with each student frame. The first frames are aligned with and normalized to the second frames, and then the first frames are synchronized to corresponding second frames using the target cues and the reference cues. The corresponding first and second frame pairs are superimposed, and immediately thereafter displayed to allow the student to analyze differences between his swing motion and the master's swing motion.
TL;DR: A novel approach to camera motion analysis is proposed to index videos compressed in MPEG-1 or MPEG-2, which fits the motion vectors in the MPEG stream into the two-dimensional affine model to detect basic camera operations automatically.
Abstract: A novel approach to camera motion analysis is proposed to index videos compressed in MPEG-1 or MPEG-2. Specifically, it fits the motion vectors in the MPEG stream into the two-dimensional affine model to detect basic camera operations automatically. The proposed approach involves (1) the construction of motion vector fields (MVFs) by normalizing the types of motion vectors and filtering out noise; and (2) the qualitative interpretation of camera motions from the estimated model parameters in two levels (frame and temporal segment). Fine segmentation can also be obtained for a video, based on the homogeneity of camera motion in each unit. The advantages of our method lie in its computational efficiency and robustness to noisy environments such as false motion vectors and object motion. The proposed approach is validated by an experiment with real compressed video sequences.
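Fitting motion vectors to a two-dimensional affine model can be sketched as an ordinary least-squares problem. The sketch below is a generic fit, not the paper's procedure: it assumes the motion vectors have already been normalized and filtered, ignores MPEG sign conventions, and the mapping from parameters to "pan", "tilt", "zoom", and "rotation" in `describe` is a coarse, hypothetical interpretation.

```python
def solve3(m, v):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]

def fit_affine(points, vectors):
    """Least-squares fit of u = a0 + a1*x + a2*y and v = b0 + b1*x + b2*y."""
    basis = [(1.0, x, y) for x, y in points]
    normal = [[sum(p[i] * p[j] for p in basis) for j in range(3)] for i in range(3)]
    rhs_u = [sum(p[i] * u for p, (u, _) in zip(basis, vectors)) for i in range(3)]
    rhs_v = [sum(p[i] * v for p, (_, v) in zip(basis, vectors)) for i in range(3)]
    return solve3(normal, rhs_u), solve3(normal, rhs_v)

def describe(a, b):
    """Coarse, hypothetical reading of the six fitted parameters."""
    return {"pan": a[0], "tilt": b[0],
            "zoom": (a[1] + b[2]) / 2, "rotation": (b[1] - a[2]) / 2}

# Synthetic block centres: a zoom of 0.1 about the origin plus a horizontal shift of 2.
points = [(x, y) for x in (-16, 0, 16) for y in (-16, 0, 16)]
vectors = [(2 + 0.1 * x, 0.1 * y) for x, y in points]
a, b = fit_affine(points, vectors)
summary = describe(a, b)
print(summary)
```

A qualitative label per frame could then be obtained by thresholding these values, with the thresholds chosen empirically.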
TL;DR: This paper presents an algorithmic approach to the problem of detecting independently moving objects in 3D scenes that are viewed under camera motion; the algorithm employs both fundamental constraints and does not demand a priori availability of correspondences or flow.
Abstract: This paper presents an algorithmic approach to the problem of detecting independently moving objects in 3D scenes that are viewed under camera motion. There are two fundamental constraints that can be exploited for the problem: 1) two/multiview camera motion constraint (for instance, the epipolar/trilinear constraint) and 2) shape constancy constraint. Previous approaches to the problem either use only partial constraints, or rely on dense correspondences or flow. We employ both the fundamental constraints in an algorithm that does not demand a priori availability of correspondences or flow. Our approach uses the plane-plus-parallax decomposition to enforce the two constraints. It is also demonstrated how, for a class of scenes called sparse 3D scenes, in which genuine parallax and independent motions may be confounded, the plane-plus-parallax decomposition allows progressive introduction and verification of the fundamental constraints. Results of the algorithm on some difficult sparse 3D scenes are promising.
TL;DR: This paper describes the whole moving object extraction system with the general framework and component designs and shows their effectiveness with two test sequences.
Abstract: As the proliferation of compressed video sequences in MPEG formats continues, the ability to perform video analysis directly in the compressed domain becomes increasingly attractive. The availability of motion vectors and pixel values in coded forms can indirectly provide motion and intensity information for object analysis, avoiding the need to re-perform motion estimation. Although the embedded motion field is contaminated with matching modeling errors and measurement errors, we illustrate several motion field filtering and correction techniques to combat noisy motion fields. We strive to reconstruct smooth true motion fields with a minimal amount of decoding, reducing computational resource and time requirements. In this paper, we describe the whole moving object extraction system with the general framework and component designs and show their effectiveness with two test sequences.
TL;DR: A novel system which integrates 3D scene flow and structure recovery in order to complement each other's performance, and does not assume rigidity of the scene motion, thus allowing for non-rigid motion in the scene.
Abstract: Scene flow is the 3D motion field of points in the world. Given N (N>1) image sequences gathered with an N-eye stereo camera or N calibrated cameras, we present a novel system which integrates 3D scene flow and structure recovery in order to complement each other's performance. We do not assume rigidity of the scene motion, thus allowing for non-rigid motion in the scene. In our work, images are segmented into small regions. We assume that each small region is undergoing similar motion, represented by a 3D affine model. Nonlinear motion model fitting based on both optical flow constraints and stereo constraints is then carried out over each image region in order to simultaneously estimate 3D motion correspondences and structure. To ensure robustness, several regularization constraints are also introduced. A recursive algorithm is designed to incorporate the local and regularization constraints. Experimental results on both synthetic and real data demonstrate the effectiveness of our integrated 3D motion and structure analysis scheme.
TL;DR: In this paper some applications of motion analysis are investigated for a compact panoramic optical system (panoramic annular lens) and algorithms which can analyze this low-resolution image to yield motion information for surveillance and smoke detection are developed.
Abstract: In this paper some applications of motion analysis are investigated for a compact panoramic optical system (panoramic annular lens). Panoramic image acquisition makes multiple or mechanically controlled camera systems unnecessary for many applications. The panoramic annular lens' main advantage over other omnidirectional monitoring systems is that it is a cheap, small, compact device with no external hyperboloidal, spherical, conical or paraboloidal reflecting surface as in other panoramic optical devices. By converting the annular image captured with an NTSC camera to a rectangular one, we get a low-resolution (2.8 pixels/degree horizontally and 3 pixels/degree vertically) image. We developed algorithms which can analyze this low-resolution image to yield motion information for surveillance and smoke detection.
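Converting an annular image to a rectangular one amounts to a polar-to-Cartesian resampling. The following is a minimal sketch of the coordinate mapping only, assuming a full 360° ring centred at `center`, with panorama rows running from the inner to the outer radius; all parameter names and values are hypothetical and the paper's actual conversion may differ.

```python
import math

def unwrap_coordinates(col, row, width, height, center, r_inner, r_outer):
    """Map a pixel of the rectangular panorama back to the annular source image."""
    theta = 2 * math.pi * col / width                        # column -> azimuth angle
    radius = r_inner + (r_outer - r_inner) * row / height    # row -> distance from centre
    cx, cy = center
    return cx + radius * math.cos(theta), cy + radius * math.sin(theta)

# At 2.8 pixels/degree, a full 360 degree ring needs about 1008 columns.
width = round(2.8 * 360)
source_xy = unwrap_coordinates(0, 0, width, 100, center=(320.0, 240.0),
                               r_inner=60.0, r_outer=200.0)
print(width, source_xy)
```

Each panorama pixel would then be filled by sampling (for example, bilinearly interpolating) the annular image at the returned coordinates.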
TL;DR: A motion-based keyframe computing and selection strategy is proposed to compactly represent the content of shots and a scene change detection algorithm is presented by measuring the similarity of the representative keyframes in shots.
Abstract: We present a scheme for automatically partitioning videos into scenes. A scene is generally referred to as a group of shots taken at the same site. We first propose a motion annotation algorithm based on the analysis of spatiotemporal image volumes. The algorithm characterizes the motions within shots by extracting and analyzing the motion trajectories encoded in the temporal slices of image volumes. A motion-based keyframe computing and selection strategy is thus proposed to compactly represent the content of shots. With these techniques, we further present a scene change detection algorithm by measuring the similarity of the representative keyframes in shots.
TL;DR: A region-based approach using active contours to segment moving objects and a local motion estimation based on the level sets resulting from the segmentation process that is well adapted to track deformable objects is proposed.
Abstract: In this paper, we propose a new method for detecting and tracking moving objects using active contours. The first contribution of our work is to propose a region-based approach using active contours to segment moving objects. A criterion including constraints on the defined domains is introduced and embedded in a dynamical scheme. Discontinuities between regions are explicitly taken into account by performing the derivative of the criterion according to the distribution theory. A PDE-driven active contour is obtained and implemented with the level set method. The final segmentation provides the partition into regions that minimize the criterion and thus moving objects are detected. The second contribution of our work is to propose a local motion estimation based on the level sets resulting from the segmentation process. This local motion estimation is well adapted to track deformable objects. Finally, the method is evaluated on real sequences.
TL;DR: This paper presents a novel approach to detect and estimate distortion occurring in fingerprint video streams, and directly works on MPEG-{1, 2} encoded fingerprint video bitstreams to estimate interfield flow without decompression, and uses flow characteristics to investigate temporal behaviour of the fingerprints.
Abstract: Distortions in fingerprint images arising from the elasticity of finger skin and the pressure and movement of fingers during image capture lead to great difficulties in establishing a match between multiple images acquired from a single finger. In a single fingerprint image depicting a finger at some given instant of time, it is difficult to get any distortion information. Further, static two-dimensional or three-dimensional (electronic) copies of fingerprints can be fabricated and used to spoof remote biometric security systems since the input required by the systems is not a function of time. This paper addresses these issues, by proposing the novel use of fingerprint video sequences to investigate and exploit dynamic behaviors manifested by fingers over time during image acquisition. In particular, we present a novel approach to detect and estimate distortion occurring in fingerprint video streams. Our approach directly works on MPEG-{1, 2} encoded fingerprint video bitstreams to estimate interfield flow without decompression, and uses flow characteristics to investigate temporal behaviour of the fingerprints. The joint temporal and motion analysis leads to a novel technique to detect and characterize distortion reliably. The proposed method has been tested on the NIST 24 database and the results are very promising.
TL;DR: A robust technique for detecting nonstationary periodic motion from a moving and static camera and for discriminating motion symmetries (periodic motion classification), which applies to classifying running humans and canines.
Abstract: We describe a robust technique for detecting nonstationary periodic motion from a moving and static camera. We also describe a robust technique for discriminating motion symmetries (periodic motion classification), which we apply to classifying running humans (bipeds) and canines (quadrupeds). The system has been implemented to run in real-time (30 Hz) on standard PC workstations.
TL;DR: A technique has been developed to objectively quantify and visualize motion in the orbit during gaze using color-coding, which shows both magnitude and orientation of all flow vectors without cluttering.
Abstract: Orbital soft-tissue motion analysis aids in the localization and diagnosis of orbital disorders. A technique has been developed to objectively quantify and visualize motion in the orbit during gaze. T1-weighted MR volume sequences are acquired during gaze and soft-tissue motion is quantified using optical flow techniques. The flow field is visualized using color-coding: orientation of the flow vector is coded by hue and magnitude by saturation of the pixel. Current clinical circumstances limit MR image acquisition to short sequences and short acquisition times. The effect of these limitations on the performance of optical flow computation has been studied for four representative optical flow algorithms: on short (nine frames) and long (21 frames) simulated sequences of rotation of a magnetic resonance (MR) imaged object, on short measured MR sequences of controlled rotation of the same object and on short MR sequences of motion in the orbit. On the short simulated and motion-controlled sequences, the Lucas and Kanade algorithm showed the best performance with respect to both accuracy and robustness. These motion estimates were accurate to within 20%. Motion in the orbit ranged between 0.05 and 0.25 mm/° gaze. Color-coding was found to be attractive as a visualization technique, because it shows both magnitude and orientation of all flow vectors without cluttering.
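The color-coding described in this abstract (hue for orientation, saturation for magnitude) can be sketched with Python's standard `colorsys` module. Fixing the HSV value channel at 1.0 and normalizing by a caller-supplied `max_magnitude` are assumptions made for illustration.

```python
import colorsys
import math

def flow_to_rgb(u, v, max_magnitude):
    """Map one flow vector to RGB: hue = orientation, saturation = magnitude."""
    hue = (math.atan2(v, u) % (2 * math.pi)) / (2 * math.pi)
    saturation = min(math.hypot(u, v) / max_magnitude, 1.0)
    return colorsys.hsv_to_rgb(hue, saturation, 1.0)

print(flow_to_rgb(1.0, 0.0, max_magnitude=1.0))  # fully saturated red
print(flow_to_rgb(0.0, 0.0, max_magnitude=1.0))  # white: no motion
```

Because stationary pixels map to white, only moving tissue is colored, which is consistent with the abstract's remark that the display avoids clutter.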
TL;DR: A model-based method is presented that analyzes human walking motion using a Hidden Markov Model (HMM) and posture patterns to describe the motion type; the system is shown not only to analyze the motion characteristics of the human body but also to recognize the motion type of the input image sequences.
TL;DR: Real-time human motion analysis based on real-time inverse kinematics, which can estimate human postures with limited perceptual cues such as positions of a head, hands and feet, is presented.
Abstract: The paper presents real-time human motion analysis based on real-time inverse kinematics. Our purpose is to realize a mechanism of human-machine interaction via human gestures, and, as a first step, we have developed a computer-vision-based human motion analysis system. In general, man-machine "smart" interaction requires a real-time human full-body motion capturing system without special devices or markers. However, since such a vision-based human motion capturing system is essentially unstable and can only acquire partial information because of self-occlusion, we have to introduce a robust pose estimation strategy, or an appropriate human motion synthesis based on motion filtering. To solve this problem, we have developed a method based on inverse kinematics, which can estimate human postures with limited perceptual cues such as positions of a head, hands and feet. We outline a real-time and on-line human motion capture system and demonstrate a simple interaction system based on the motion capture system.
TL;DR: Four different localized-calibration methods developed based on the DLT (direct linear transformation) algorithm in an effort to reduce the error due to refraction in underwater motion analysis demonstrated the potential to minimize object space deformation.
Abstract: Four different localized-calibration methods were developed based on the DLT (direct linear transformation) algorithm in an effort to reduce the error due to refraction in underwater motion analysis. Their applicability in underwater motion analysis was assessed based on a simulated 3D calibration trial with 2 cameras and a hexahedral calibration frame. It was concluded from the analysis of the calibration results that (a) all methods substantially reduced the maximum reconstruction error and demonstrated the potential to minimize object space deformation, (b) localization methods based on overlapped control volumes/areas performed better than those based on distinct volumes/areas, and (c) the 2D DLT-based localization algorithm provided more accurate object space reconstruction than the 3D DLT-based algorithm.
TL;DR: A specialized motion analysis is suggested which provides an accurate, explicit model of the interpolated motion path of attentive tracking displays as well as apparent motion.
TL;DR: While estimators based on L1 are not robust in the breakdown point sense, experiments show that the proposed method is robust enough to allow accurate motion recovery over hundreds of consecutive frames.
TL;DR: The goal of tracking the user's fingertips as fast as possible in real time is adopted, so that the system could be compared with other input devices by using models such as Fitts' law.
Abstract: One trend in computing environments today is to move towards more 'natural' interaction. Another is to make hardware invisible to the user. Both these ideas converge into ubiquitous computing; the Digital Desk is an example of this idea. In this paper, we concentrate on an input device for the Digital Desk, namely the user's fingertip, which is made to act like a mouse. Tracking such an input device is common to a number of augmented reality environments and involves vision and motion analysis. However, previous attempts have focused more on the vision aspect of tracking general objects than on using the information already known about the user's hand, which is the approach taken in this paper. We adopted the goal of tracking the user's fingertips as fast as possible in real time, so that the system could be compared with other input devices by using models such as Fitts' law. Our system is shown to comply with the law adequately.
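For readers unfamiliar with Fitts' law, a minimal sketch follows. It assumes the widely used Shannon formulation, which may not be the variant used in the paper, and the coefficients `a` and `b` are hypothetical placeholders that would normally be fitted to measured pointing times for each device.

```python
import math

def index_of_difficulty(distance, width):
    """Shannon formulation: ID = log2(D / W + 1), in bits."""
    return math.log2(distance / width + 1)

def predicted_time(distance, width, a, b):
    """Fitts' law: MT = a + b * ID, with a and b fitted per input device."""
    return a + b * index_of_difficulty(distance, width)

# Hypothetical coefficients (seconds, seconds per bit); not taken from the paper.
print(index_of_difficulty(distance=7.0, width=1.0))  # 3.0
print(predicted_time(7.0, 1.0, a=0.1, b=0.2))
```

A device "complies" with the law when measured movement times are well described by a straight line in the index of difficulty; comparing the fitted slopes is one way to compare input devices.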
TL;DR: Experimental results confirm both the feasibility and the effectiveness of the proposed method, and an application example of the 3D human body posture estimation to a motion recognition system is presented.
Abstract: This paper proposes a new real-time method of estimating human postures in 3D from trinocular images. The proposed method extracts feature points of the human body by applying a type of function analysis to contours of human silhouettes. To overcome self-occlusion problems, dynamic compensation is carried out using the Kalman filter and all feature points are tracked. The 3D coordinates of the feature points are reconstructed by considering the geometrical relationship between the three cameras. Experimental results confirm both the feasibility and the effectiveness of the proposed method, and an application example of the 3D human body posture estimation to a motion recognition system is presented.
TL;DR: In this paper, a method for geometrically analyzing motion has been proposed, with the steps of: choosing a set of points having at least three individual points to define a single realization of a motion, sequentially collecting Cartesian coordinates of the sets of points at different times during the motion from a start point to an end point, and transforming the sets at the different times to a common coordinate system thereby defining a trajectory of the motion.
Abstract: A method for geometrically analyzing motion having the steps of: choosing a set of points having at least three individual points to define a single realization of a motion; sequentially collecting Cartesian coordinates of the set of points at different times during the motion from a start point to an end point; treating the collection of sets of points as a sample of the motion; and transforming the sets of points at the different times to a common coordinate system thereby defining a trajectory of the motion. In a preferred implementation of the method of the present invention, the method further has the steps of: choosing a set of points having at least three individual points to define a single realization of a motion; sequentially collecting Cartesian coordinates of the set of points at different times during the motion from a start point to an end point; treating the collection of sets of points as a sample of the motion; transforming the sets of points at the different times to a common coordinate system thereby defining a trajectory of the motion; and calculating elliptic Fourier coefficients describing the trajectory of the motion independent of any difference in the spacing of the different times.
TL;DR: In this method, an upper body orientation detection and a heuristic contour analysis are performed on the human silhouettes extracted from the trinocular images so that representative points such as the top of the head can be located.
Abstract: This paper proposes a new real-time method for estimating human postures in 3D from trinocular images. In this method, an upper body orientation detection and a heuristic contour analysis are performed on the human silhouettes extracted from the trinocular images so that representative points such as the top of the head can be located. The major joint positions are estimated based on a genetic algorithm-based learning procedure. 3D coordinates of the representative points and joints are then obtained from the two views by evaluating the appropriateness of the three views. The proposed method implemented on a personal computer runs in real-time. Experimental results show high estimation accuracies and the effectiveness of the view selection process.