TL;DR: Experimental results show that motion patterns of hand gestures can be extracted and recognized accurately using motion trajectories and applied to recognize 40 hand gestures of American Sign Language.
Abstract: We present an algorithm for extracting and classifying two-dimensional motion in an image sequence based on motion trajectories. First, a multiscale segmentation is performed to generate homogeneous regions in each frame. Regions between consecutive frames are then matched to obtain two-view correspondences. Affine transformations are computed from each pair of corresponding regions to define pixel matches. Pixel matches over consecutive image pairs are concatenated to obtain pixel-level motion trajectories across the image sequence. Motion patterns are learned from the extracted trajectories using a time-delay neural network. We apply the proposed method to recognize 40 hand gestures of American Sign Language. Experimental results show that motion patterns of hand gestures can be extracted and recognized accurately using motion trajectories.
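The concatenation step can be illustrated with a minimal sketch. It assumes the affine transformation of the region containing the tracked pixel has already been selected for each consecutive frame pair; the tuple layout, the function names, and the two example transforms are hypothetical and not taken from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]
# Affine map stored as (a, b, c, d, tx, ty):
#   x' = a*x + b*y + tx,  y' = c*x + d*y + ty
Affine = Tuple[float, float, float, float, float, float]


def apply_affine(t: Affine, p: Point) -> Point:
    """Map a pixel position into the next frame."""
    a, b, c, d, tx, ty = t
    x, y = p
    return (a * x + b * y + tx, c * x + d * y + ty)


def build_trajectory(start: Point, transforms: List[Affine]) -> List[Point]:
    """Chain the per-frame-pair pixel matches into one trajectory."""
    trajectory = [start]
    for t in transforms:
        trajectory.append(apply_affine(t, trajectory[-1]))
    return trajectory


# Hypothetical transforms: a shift of 2 pixels in x, then 3 pixels in y.
shift_x = (1.0, 0.0, 0.0, 1.0, 2.0, 0.0)
shift_y = (1.0, 0.0, 0.0, 1.0, 0.0, 3.0)
trajectory = build_trajectory((10.0, 5.0), [shift_x, shift_y])
```

A real pipeline would look up a different region, and therefore a different transform, for each pixel and frame pair; the sketch only shows how the matches are chained.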
TL;DR: In this paper, the authors developed a first-generation smart camera system that can detect people and analyze their movement in real-time, which is a leading edge application for embedded system research.
Abstract: Recent technological advances are enabling a new generation of smart cameras that represent a quantum leap in sophistication. While today's digital cameras capture images, smart cameras capture high-level descriptions of the scene and analyze what they see. These devices could support a wide variety of applications including human and animal detection, surveillance, motion analysis, and facial identification. Video processing has an insatiable demand for real-time performance. Smart cameras leverage very large-scale integration to meet this need in a low-cost, low-power system with substantial memory. Moving well beyond pixel processing and compression, these VLSI systems run a wide range of algorithms to extract meaning from streaming video. Recently, Princeton University researchers developed a first-generation smart camera system that can detect people and analyze their movement in real time. Because they push the design space in so many dimensions, these smart cameras are a leading-edge application for embedded system research.
TL;DR: An algorithm is developed that extracts the elements of human motion, including the action performed and a motion signature that captures the distinctive pattern of movement of a particular individual, and recombines them in novel ways.
Abstract: Human motion is the composite consequence of multiple elements, including the action performed and a motion signature that captures the distinctive pattern of movement of a particular individual. We develop an algorithm that is capable of extracting these motion elements and recombining them in novel ways. The algorithm analyzes motion data spanning multiple subjects performing different actions. The analysis yields a generative motion model that can synthesize new motions in the distinctive styles of these individuals. Our algorithms can also recognize people and actions from new motions by comparing motion signatures and action parameters.
TL;DR: The overall goal of the ongoing project is to develop methods for spatio-temporal analysis of relative motion within groups of moving point objects, such as GPS-tracked animals, using the analysis concept called REMO (RElative MOtion).
Abstract: The overall goal of the ongoing project is to develop methods for spatio-temporal analysis of relative motion within groups of moving point objects, such as GPS-tracked animals. Whereas recent efforts of dealing with dynamic phenomena within the GIScience community mainly concentrated on modeling and representation, this research project concentrates on the analytic task. The analysis is performed on a process level and does not use the traditional cartographic approach of comparing snapshots. The analysis concept called REMO (RElative MOtion) is based on the comparison of motion parameters of objects over time. Therefore the observation data is transformed into a 2.5-dimensional analysis matrix, featuring a time axis, an object axis and motion parameters. This matrix reveals basic searchable relative movement patterns. The current approach handles points in a pure featureless space. Case study data of GPS-observed animals and political entities in an ideological space are used for illustration purposes.
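A minimal sketch of the analysis matrix described above, assuming motion azimuth is the motion parameter and that it is discretized into eight sectors. The function names, the sector count, and the simple "several objects share a heading at the same time step" query are illustrative assumptions, not the project's implementation.

```python
import math
from typing import Dict, List, Tuple

Position = Tuple[float, float]


def azimuth_class(p: Position, q: Position, n_classes: int = 8) -> int:
    """Discretize the heading from p to q (0 = north, clockwise)."""
    angle = math.degrees(math.atan2(q[0] - p[0], q[1] - p[1])) % 360.0
    width = 360.0 / n_classes
    return int(((angle + width / 2.0) % 360.0) // width)


def remo_matrix(tracks: Dict[str, List[Position]]) -> Dict[str, List[int]]:
    """One row per object, one column per time step, azimuth class as value."""
    return {
        name: [azimuth_class(a, b) for a, b in zip(track, track[1:])]
        for name, track in tracks.items()
    }


def shared_heading_steps(matrix: Dict[str, List[int]], min_objects: int) -> List[int]:
    """Time steps at which at least min_objects have the same azimuth class."""
    n_steps = min(len(row) for row in matrix.values())
    hits = []
    for t in range(n_steps):
        counts: Dict[int, int] = {}
        for row in matrix.values():
            counts[row[t]] = counts.get(row[t], 0) + 1
        if max(counts.values()) >= min_objects:
            hits.append(t)
    return hits


# Hypothetical tracks: all three objects head east first, then diverge.
tracks = {
    "a": [(0, 0), (1, 0), (2, 0)],
    "b": [(0, 5), (1, 5), (1, 6)],
    "c": [(3, 3), (4, 3), (3, 3)],
}
matrix = remo_matrix(tracks)
shared = shared_heading_steps(matrix, min_objects=3)
```

Searching the matrix along the time axis (one object keeping its heading) or along the object axis (many objects sharing a heading) gives different pattern families; only the second is sketched here.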
TL;DR: This paper presents a method to recover the full-motion (3 rotations and 3 translations) of the head using a cylindrical model and uses the iteratively re-weighted least squares (IRLS) technique in conjunction with the image gradient to deal with non-rigid motion and occlusion.
Abstract: This paper presents a method to recover the full-motion (3 rotations and 3 translations) of the head using a cylindrical model. The robustness of the approach is achieved by a combination of three techniques. First, we use the iteratively re-weighted least squares (IRLS) technique in conjunction with the image gradient to deal with non-rigid motion and occlusion. Second, while tracking, the templates are dynamically updated to diminish the effects of self-occlusion and gradual lighting changes and keep tracking the head when most of the face is not visible. Third, because the dynamic templates may cause error accumulation, we re-register images to a reference frame when head pose is close to a reference pose. The performance of the real-time tracking program was evaluated in three separate experiments using image sequences (both synthetic and real) for which ground truth head motion is known. The real sequences included pitch and yaw as large as 40° and 75° respectively. The average recovery accuracy of the 3D rotations was found to be about 3°.
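The IRLS idea can be shown on a scalar toy problem. The sketch below estimates a robust location with Huber weights; it is not the head tracker, which applies the reweighting to a six-parameter motion estimate, and the weight function, `delta`, and the sample values are assumptions chosen for illustration.

```python
from typing import List


def irls_location(values: List[float], delta: float = 1.0, iterations: int = 20) -> float:
    """Robust location estimate by iteratively re-weighted least squares."""
    estimate = sum(values) / len(values)  # ordinary least-squares start
    for _ in range(iterations):
        # Huber weights: full weight for small residuals, reduced weight otherwise.
        weights = [
            1.0 if abs(v - estimate) <= delta else delta / abs(v - estimate)
            for v in values
        ]
        # Weighted least-squares update with the current weights.
        estimate = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return estimate


# Four consistent measurements and one gross outlier (e.g. an occluded pixel).
measurements = [1.0, 1.1, 0.9, 1.05, 10.0]
plain_mean = sum(measurements) / len(measurements)
robust = irls_location(measurements)
```

The outlier pulls the plain mean far from the consistent measurements, while the reweighted estimate stays close to them.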
TL;DR: A two-step detection/tracking method for pedestrian detection and tracking using a night vision video camera installed on the vehicle to deal with the nonrigid nature of human appearance on the road is proposed.
Abstract: This paper presents a method for pedestrian detection and tracking using a night vision video camera installed on the vehicle. To deal with the nonrigid nature of human appearance on the road, a two-step detection/tracking method is proposed. The detection phase is performed by a support vector machine (SVM) with size-normalized pedestrian candidates and the tracking phase is a combination of Kalman filter prediction and mean shift tracking. The detection phase is further strengthened by information obtained by a road detection module that provides key information for pedestrian validation. Experimental comparisons have been carried out on gray-scale SVM recognition vs. binary SVM recognition and entire body detection vs. upper body detection.
TL;DR: This paper shows how the multi-frame subspace constraints can be used for constraining the 2D correspondence estimation process itself, and shows that these constraints are valid not only for affine cameras, but also for a variety of imaging models, scene models, and motion models.
Abstract: When a rigid scene is imaged by a moving camera, the set of all displacements of all points across multiple frames often resides in a low-dimensional linear subspace. Linear subspace constraints have been used successfully in the past for recovering 3D structure and 3D motion information from multiple frames (e.g., by using the factorization method of Tomasi and Kanade (1992, International Journal of Computer Vision, 9:137–154)). These methods assume that the 2D correspondences have been precomputed. However, correspondence estimation is a fundamental problem in motion analysis. In this paper we show how the multi-frame subspace constraints can be used for constraining the 2D correspondence estimation process itself.
We show that the multi-frame subspace constraints are valid not only for affine cameras, but also for a variety of imaging models, scene models, and motion models. The multi-frame subspace constraints are first translated from constraints on correspondences to constraints directly on image measurements (e.g., image brightness quantities). These brightness-based subspace constraints are then used for estimating the correspondences, by requiring that all corresponding points across all video frames reside in the appropriate low-dimensional linear subspace.
The multi-frame subspace constraints are geometrically meaningful, and are not violated at depth discontinuities, nor when the camera-motion changes abruptly. These constraints can therefore replace heuristic constraints commonly used in optical-flow estimation, such as spatial or temporal smoothness.
TL;DR: The application of the video-based motion analysis system with surface markers to thumb kinematics is warranted and the similarities of the two different marker techniques throughout the motion cycle were high.
TL;DR: This paper proposes a new and fast FS motion estimation algorithm, which obtains faster elimination of inappropriate motion vectors using efficient matching units from localization of a complex area in image data and suggests two fast matching scan algorithms.
Abstract: To reduce the amount of computations for a full search (FS) algorithm for fast motion estimation, we propose a new and fast FS motion estimation algorithm. The computational reduction of our FS motion estimation algorithm comes from fast elimination of impossible motion vectors. We obtain faster elimination of inappropriate motion vectors using efficient matching units from localization of a complex area in image data. In this paper, we show three properties in block matching of motion estimation. We suggest two fast matching scan algorithms: one from adaptive matching scan and the other from fixed dithering order. Experimentally, we remove the unnecessary computations by about 30% with our proposed algorithm compared with the conventional fast FS algorithms.
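Early elimination of impossible motion vectors can be sketched with partial distortion elimination, a generic technique in which the sum of absolute differences (SAD) for a candidate is abandoned as soon as it reaches the best SAD found so far. This is a stand-in for illustration: the paper's adaptive matching scan and dithering order are not reproduced, and all names and the test frame are hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]


def sad_with_early_exit(block: Image, candidate: Image, best_so_far: float) -> Optional[int]:
    """Row-wise SAD; returns None once the candidate cannot beat best_so_far."""
    total = 0
    for row_b, row_c in zip(block, candidate):
        for b, c in zip(row_b, row_c):
            total += abs(b - c)
        if total >= best_so_far:
            return None  # candidate eliminated without finishing the sum
    return total


def full_search(block: Image, frame: Image, top: int, left: int,
                radius: int) -> Tuple[Tuple[int, int], float]:
    """Exhaustive search over the window, same result as a plain full search."""
    n = len(block)
    height, width = len(frame), len(frame[0])
    best_sad, best_mv = float("inf"), (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > height or x + n > width:
                continue
            candidate = [row[x:x + n] for row in frame[y:y + n]]
            sad = sad_with_early_exit(block, candidate, best_sad)
            if sad is not None:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad


# Hypothetical 6x6 frame containing one distinctive 2x2 patch at row 3, column 4.
frame = [[0] * 6 for _ in range(6)]
frame[3][4], frame[3][5], frame[4][4], frame[4][5] = 9, 8, 7, 6
block = [[9, 8], [7, 6]]
motion_vector, best = full_search(block, frame, top=2, left=2, radius=2)
```

Because a candidate is only rejected when its partial sum already matches or exceeds the current best, the returned vector is identical to that of an unpruned full search; the order in which rows are scanned determines how early the rejection happens.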
TL;DR: The inverted distance transform of the edge map is used as an edge indicator function for contour detection and the problem of background clutter can be relaxed by taking the object motion into account.
Abstract: We propose a new method for contour tracking in video. The inverted distance transform of the edge map is used as an edge indicator function for contour detection. Using the concept of topographical distance, the watershed segmentation can be formulated as a minimization. This new viewpoint gives a way to combine the results of the watershed algorithm on different surfaces. In particular, our algorithm determines the contour as a combination of the current edge map and the contour predicted from the tracking result in the previous frame. We also show that the problem of background clutter can be relaxed by taking the object motion into account. Compensating for object motion makes it possible to detect and remove spurious edges in the background. The experimental results confirm the expected advantages of the proposed method over the existing approaches.
TL;DR: An application-oriented solution which has proven accurate, reliable and efficient as demonstrated by experiments on numerous real situations is developed.
Abstract: The paper is concerned with the detection and tracking of obstacles from a camera mounted on a vehicle with a view to driver assistance. To achieve this goal, we have designed a technique entirely based on image motion analysis. We perform the robust estimation of the dominant image motion assumed to be due to the camera motion. Then by considering the outliers to the estimated dominant motion, we can straightforwardly detect obstacles in order to assist car driving. We have added to the detection step a tracking module that also relies on a motion consistency criterion. Time-to-collision is then computed for each validated obstacle. We have thus developed an application-oriented solution which has proven accurate, reliable and efficient as demonstrated by experiments on numerous real situations.
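The detection principle, treating outliers to the dominant image motion as obstacle candidates, can be sketched as follows. The paper estimates a dominant motion model robustly; this sketch swaps in a per-component median translation as the dominant motion, and the time-to-collision helper uses the apparent size growth of an object under an assumed constant closing speed. Thresholds, names, and data are hypothetical.

```python
import math
from statistics import median
from typing import List, Tuple

Flow = Tuple[float, float]


def dominant_translation(flows: List[Flow]) -> Flow:
    """Median flow as a crude robust estimate of the camera-induced motion."""
    return (median([f[0] for f in flows]), median([f[1] for f in flows]))


def obstacle_indices(flows: List[Flow], threshold: float = 1.0) -> List[int]:
    """Indices whose flow deviates from the dominant motion by more than threshold."""
    du, dv = dominant_translation(flows)
    return [i for i, (u, v) in enumerate(flows)
            if math.hypot(u - du, v - dv) > threshold]


def time_to_collision(size_prev: float, size_curr: float, dt: float) -> float:
    """Time to collision at the current frame from image-size growth.

    Assumes image size is inversely proportional to distance and the
    closing speed is constant between the two frames.
    """
    if size_curr <= size_prev:
        return math.inf  # object is not approaching
    return dt * size_prev / (size_curr - size_prev)


# Hypothetical flow vectors: four follow the camera motion, one does not.
flows = [(1.0, 0.0), (1.1, 0.0), (0.9, 0.1), (1.0, -0.1), (4.0, 2.0)]
obstacles = obstacle_indices(flows)
# Object at 20 m then 19 m, 0.1 s apart: image size grows from 19 to 20 units.
ttc = time_to_collision(19.0, 20.0, dt=0.1)
```

In the example the object is 19 m away and closing at 10 m/s at the second frame, so the expected time to collision is 1.9 s, which the size-based formula reproduces.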
TL;DR: In this article, a generative method combining statistical models and algorithms from both texture and motion analysis is proposed to model textured motion patterns in natural scenes, such as falling snow, rain, flying birds, fireworks and waterfalls.
Abstract: Natural scenes contain rich stochastic motion patterns which are characterized by the movement of a large number of small elements, such as falling snow, rain, flying birds, fireworks and waterfalls. In this paper, we call these motion patterns textured motion and present a generative method that combines statistical models and algorithms from both texture and motion analysis. The generative method includes the following three aspects. (1) Photometrically, an image is represented as a superposition of linear bases in atomic decomposition using an overcomplete dictionary, such as Gabor or Laplacian. Such base representation is known to be generic for natural images, and it is low dimensional as the number of bases is often 100 times smaller than the number of pixels. (2) Geometrically, each moving element (called a moveton), such as an individual snowflake or bird, is represented by a deformable template which is a group of several spatially adjacent bases. Such templates are learned through clustering. (3) Dynamically, the movetons are tracked through the image sequence by a stochastic algorithm maximizing a posterior probability. A classic second order Markov chain model is adopted for the motion dynamics. The sources and sinks of the movetons are modeled by birth and death maps. We adopt an EM-like stochastic gradient algorithm for inference of the hidden variables: bases, movetons, birth/death maps, parameters of the dynamics. The learned models are also verified by synthesizing random textured motion sequences which bear a similar visual appearance to the observed sequences.
TL;DR: A frame rate up-conversion algorithm using adaptive motion compensation (MC) that reduces the blocking artifacts due to block-based processing is proposed.
Abstract: We propose a new frame rate up-conversion (FRC) algorithm using the adaptive motion compensation (MC) that reduces the blocking artifacts due to block-based processing. In the proposed scheme, after conventional motion estimation (ME) between two adjacent frames is performed to construct the motion vectors for the frame to be interpolated, the motion analysis is used to determine the type of motion and the motion-compensated interpolation (MCI) is applied adaptively. Unlike conventional MCI algorithms, the proposed technique utilizes similar neighboring motion vectors to produce multiple motion trajectories. When the proposed MCI is applied, multiple motion trajectories are considered in order to increase the accuracy of the MCI. The proposed method provides high quality format conversion with significantly reduced blocking artifacts.
TL;DR: The proposed kinematic-based approach for automatic human motion analysis from IR image sequences achieves good performance in gait analysis with different view angles with respect to the walking direction, and is promising for further gait recognition.
Abstract: In an infrared (IR) image sequence of human walking, the human silhouette can be reliably extracted from the background regardless of lighting conditions and colors of the human surfaces and backgrounds in most cases. Moreover, some important regions containing skin, such as face and hands, can be accurately detected in IR image sequences. In this paper, we propose a kinematic-based approach for automatic human motion analysis from IR image sequences. The proposed approach estimates 3D human walking parameters by performing a modified least squares fit of the 3D kinematic model to the 2D silhouette extracted from a monocular IR image sequence, where continuity and symmetry of human walking and detected hand regions are also considered in the optimization function. Experimental results show that the proposed approach achieves good performance in gait analysis with different view angles with respect to the walking direction, and is promising for further gait recognition.
TL;DR: This paper proposes a new motion descriptor to capture motion pattern from video clip by transforming motion vector field to a number of directional slices of energy, which form a multi-dimensional vector, called motion texture.
Abstract: Motion is an important cue for video content perception. However, the lack of an effective motion representation is a barrier to automatic video content analysis. In this paper, we propose a new motion descriptor to capture motion patterns from a video clip. First, we transform the motion vector field to a number of directional slices of energy. Then, these slices are measured by a set of moments. As a result, a multi-dimensional vector, called motion texture, is formed. The effectiveness and efficiency of the proposed representation have been validated by motion-based shot retrieval experiments.
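One possible reading of the two steps is sketched below. The abstract does not define the slices or the moments, so everything specific here is an assumption: a slice is taken to be the squared positive projection of each motion vector onto one of a few evenly spaced directions, and the moments are the mean and variance of each slice.

```python
import math
from typing import List, Tuple

VectorField = List[List[Tuple[float, float]]]


def directional_slices(field: VectorField, n_directions: int = 4) -> List[List[List[float]]]:
    """One energy map per direction: squared positive projection of each vector."""
    slices = []
    for k in range(n_directions):
        theta = 2.0 * math.pi * k / n_directions
        dx, dy = math.cos(theta), math.sin(theta)
        slices.append([[max(0.0, u * dx + v * dy) ** 2 for (u, v) in row]
                       for row in field])
    return slices


def mean_and_variance(energy_map: List[List[float]]) -> Tuple[float, float]:
    """First two moments of one slice."""
    values = [e for row in energy_map for e in row]
    mean = sum(values) / len(values)
    variance = sum((e - mean) ** 2 for e in values) / len(values)
    return mean, variance


def motion_texture(field: VectorField, n_directions: int = 4) -> List[float]:
    """Concatenate the per-slice moments into one descriptor vector."""
    descriptor: List[float] = []
    for energy_map in directional_slices(field, n_directions):
        descriptor.extend(mean_and_variance(energy_map))
    return descriptor


# Hypothetical 2x2 motion vector field with uniform motion along +x.
field = [[(1.0, 0.0), (1.0, 0.0)], [(1.0, 0.0), (1.0, 0.0)]]
descriptor = motion_texture(field)
```

For uniform motion along +x, all the energy falls into the first slice, so only the first mean is non-zero; two clips could then be compared by a distance between their descriptor vectors.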
TL;DR: A new combination of medical knowledge, image processing and regression analysis can be used to label human motion in image sequences.
Abstract: We describe a new method for analyzing and extracting human gait motion by combining statistical methods with image processing. The periodic motion of human gait is modeled by trigonometric-polynomial interpolant functions. The gait description is derived by topological analysis, guided by medical studies, that selects areas from which joint angles are derived by regression analysis. Then, the interpolant functions are fitted to the gait data and, whilst showing fidelity to earlier medical studies, also show recognition capability. As such, a new combination of medical knowledge, image processing and regression analysis can be used to label human motion in image sequences.
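Fitting a trigonometric polynomial to periodic joint-angle data can be sketched as below, assuming the samples are uniformly spaced over exactly one gait cycle, in which case the least-squares coefficients reduce to discrete Fourier sums. The function names and the synthetic angle data are hypothetical; the paper's regression procedure is not reproduced.

```python
import math
from typing import List, Tuple

Model = Tuple[float, List[float], List[float]]


def fit_trig_polynomial(samples: List[float], order: int) -> Model:
    """Least-squares fit of a0 + sum_k (a_k cos(2*pi*k*t) + b_k sin(2*pi*k*t))."""
    n = len(samples)
    if 2 * order >= n:
        raise ValueError("order must be below half the number of samples")
    a0 = sum(samples) / n
    cos_coeffs, sin_coeffs = [], []
    for k in range(1, order + 1):
        cos_coeffs.append(2.0 / n * sum(
            y * math.cos(2.0 * math.pi * k * i / n) for i, y in enumerate(samples)))
        sin_coeffs.append(2.0 / n * sum(
            y * math.sin(2.0 * math.pi * k * i / n) for i, y in enumerate(samples)))
    return a0, cos_coeffs, sin_coeffs


def evaluate(model: Model, phase: float) -> float:
    """Evaluate the fitted function at a gait phase in [0, 1)."""
    a0, cos_coeffs, sin_coeffs = model
    value = a0
    for k, (a, b) in enumerate(zip(cos_coeffs, sin_coeffs), start=1):
        value += a * math.cos(2.0 * math.pi * k * phase)
        value += b * math.sin(2.0 * math.pi * k * phase)
    return value


# Synthetic joint angle (degrees) over one cycle, 16 uniform samples.
samples = [10.0 + 3.0 * math.cos(2.0 * math.pi * i / 16)
           + 2.0 * math.sin(4.0 * math.pi * i / 16) for i in range(16)]
model = fit_trig_polynomial(samples, order=2)
```

The fitted coefficients form a compact description of the cycle, which is what makes them usable both for comparison with reference curves and as features for recognition.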
TL;DR: Two methods for gesture analysis and mapping to music are presented, both independent of specific orientation or location of the subject, and the first deals with gestural segmentation, while the second uses pattern recognition.
Abstract: We report research performed on gesture analysis and mapping to music. Various movements were recorded using 3D optical motion capture. Using this system, we produced animations from movements/dance, and generated in parallel the soundtrack from the dancer's movements. Prior to the actual sound mapping process, we performed various motion analyses. We present here two methods, both independent of specific orientation or location of the subject. The first deals with gestural segmentation, while the second uses pattern recognition.
TL;DR: In this paper, a structural analysis based method for fault diagnosis purposes is presented, which uses the structural model of the system and utilizes the matching idea to extract the system's inherent redundant information.
Abstract: The paper presents a structural analysis based method for fault diagnosis purposes. The method uses the structural model of the system and utilizes the matching idea to extract the system's inherent redundant information. The structural model is represented by a bipartite directed graph. FDI possibilities are examined by further analysis of the obtained information. The method is illustrated by applying it to the LTI model of motion of a fixed-wing aircraft.
TL;DR: The results suggest that the body dynamics have rich complexity in phase space, and within an envelope, small nodes may exist to give variation and controllability without damaging stability.
Abstract: A novel concept for controlling a humanoid robot, global dynamics, is investigated by motion capture experiments. This concept generalises human/humanoid body motion as successive transitions of "envelopes", where body dynamics is exploited and high level control input is adopted only at the "nodes", where the body is unstable and control input is necessary. Dynamical rising is chosen for our experiment and full body motion is measured. By evaluating the coordination between joint angles, we have seen variation of motion and corresponding envelope volume (stable region within the phase space). Also, by analysis in the phase space (including variables both of positions and their time derivatives), we have seen variations not only according to boundary conditions for the motion (i.e., adopting physical restriction) but also according to experience. Our results suggest that the body dynamics have rich complexity in phase space. Within an envelope, small nodes may exist to give variation and controllability without damaging stability.
TL;DR: This work describes a gesture-tracking system using real-time local range on-demand and a method that performs range processing only when and where necessary, which results in dynamic regional range images that contain only the information needed by the system.
Abstract: This paper presents a new approach to the range data utilization in a gesture-tracking system. The use of three-dimensional data is essential for human motion analysis; however, the speed of complete range estimation prohibits its inclusion in most real-time systems. This work describes a gesture-tracking system using real-time local range on-demand. The system represents a gesture-controlled interface for interactive visual exploration of large data sets. The paper describes a method performing range processing only when necessary and where necessary. Range data is processed only for non-static regions of interest. This is accomplished by a set of filters on the color, motion, and range data. The speed-up achieved is between 1.70 and 2.15. The algorithm also includes a robust skin-color segmentation insensitive to illumination changes. Selective range processing results in dynamic regional range images that contain only information needed by the system.
TL;DR: In this paper, the position of a reflective marker was recorded while it was moved quasi-statically over a range of 2.54 mm (0.100 inches) via a linearly-translating table.
TL;DR: An ability to accurately measure gait velocity, stance width, stride length, arm swing, cadence, and stance times from multi-view, video sequences of human movement captured in a complex home environment is demonstrated.
Abstract: In this work we introduce a new model-based approach towards the 3D tracking and extraction of gait patterns in human motion. We suggest the use of a hierarchical, structural model of the human body with a novel derivation of system dynamics from hard and soft kinematic constraints. The hard constraints place physical limitations on possible model configurations while the soft constraints represent probabilistic distributions learned from previous examples of human motion. Using the parameters of the structural and dynamic models, we derive a methodology for extracting a number of gait variables at both coarse and fine resolutions with coincident robustness and precision. In particular, we demonstrate an ability to accurately measure gait velocity, stance width, stride length, arm swing, cadence, and stance times from multi-view, video sequences of human movement captured in a complex home environment.
TL;DR: A motion-based segmentation relying on an analytic representation of the motion field, which permits the extraction of important quantities such as singularities, stream-functions or velocity potentials, has the advantage of being robust, simple, and fast.
Abstract: Analyzing fluid motion is essential in a number of domains and can rarely be handled using generic computer vision techniques. In this particular application context, we address two distinct problems. First we describe a dedicated dense motion estimator. The approach relies on constraints issuing from fluid motion properties and allows us to recover dense motion fields of good quality. Secondly, we address the problem of analyzing such velocity fields. We present a kind of motion-based segmentation relying on an analytic representation of the motion field that permits the extraction of important quantities such as singularities, stream-functions or velocity potentials. The proposed method has the advantage of being robust, simple, and fast.
TL;DR: A statistical solution to the fusion problem based on variable-bandwidth kernel density estimation is presented; the fusion estimate is shown to be consistent and conservative, and experimental results on multiple motion computation support the theory.
Abstract: Vision tasks, such as motion analysis, object tracking, robot localization, and 3D modeling, often require the fusion of estimates coming from different sources. Most of the fusion algorithms, however, are not robust with respect to outliers and consider only single-source models. Their performance deteriorates when initial assumptions are not valid (e.g., the presence of outliers in the data or data corresponding to multiple motions). The paper presents a statistical solution to the fusion problem based on variable-bandwidth kernel density estimation. The fusion estimate is represented by the mode of a density function that exploits the uncertainty of the estimates to be fused. We show that the fusion estimate is consistent and conservative. Since our construction is founded on density estimation, it naturally handles outliers and multiple source models. We test the density-based fusion for the task of multiple motion computation. Superior experimental results validate our theory.
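The idea of taking the fused value as the mode of a density built from the individual estimates and their uncertainties can be sketched in one dimension. The sketch places a Gaussian kernel on each estimate with that estimate's variance as bandwidth and climbs to a mode with a plain variable-bandwidth mean-shift iteration started from every estimate. This is an assumption-laden simplification: the paper's actual mode-detection procedure is not described in the abstract, and the names and data are hypothetical.

```python
import math
from typing import List


def density(x: float, estimates: List[float], variances: List[float]) -> float:
    """Average of Gaussian kernels, one per estimate, each with its own variance."""
    return sum(
        math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
        for m, v in zip(estimates, variances)
    ) / len(estimates)


def mean_shift_mode(start: float, estimates: List[float], variances: List[float],
                    iterations: int = 100, tol: float = 1e-9) -> float:
    """Fixed-point iteration that sets the density gradient to zero."""
    x = start
    for _ in range(iterations):
        weights = [v ** -1.5 * math.exp(-0.5 * (x - m) ** 2 / v)
                   for m, v in zip(estimates, variances)]
        total = sum(weights)
        if total == 0.0:
            break
        new_x = sum(w * m for w, m in zip(weights, estimates)) / total
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


def fuse(estimates: List[float], variances: List[float]) -> float:
    """Return the highest-density mode reached from any of the estimates."""
    candidates = [mean_shift_mode(m, estimates, variances) for m in estimates]
    return max(candidates, key=lambda x: density(x, estimates, variances))


# Three agreeing sources and one outlier, all with variance 0.09.
estimates = [1.0, 1.1, 0.9, 5.0]
variances = [0.09, 0.09, 0.09, 0.09]
fused = fuse(estimates, variances)
naive_mean = sum(estimates) / len(estimates)
```

The outlier forms its own low-density mode instead of dragging the result, which is the behavior the abstract attributes to a density-based construction; a covariance-weighted average would not have this property.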
TL;DR: A fast technique for video segmentation based on interframe moving masks is presented, which appears to be particularly suitable for videoconferences or videotelephone applications based on head and shoulders sequences acquired with static video capture devices.
Abstract: A fast technique for video segmentation based on interframe moving masks is presented. The applied procedure belongs to the semi-automatic pixel-based methods also called interactive or supervised techniques. In order to initialize the algorithm and let the automatic segmentation phase start, a certain amount of user intervention is required. As is well known, a pixel-based approach allows a very accurate edge extraction of the moving object. The main advantage of this single feature method is its capability to increase the algorithm computational speed, necessary for real-time applications. For this reason the method appears to be particularly suitable for videoconferences or videotelephone applications based on head and shoulders sequences acquired with static video capture devices.
TL;DR: This work presents a simple but powerful computational model and associated algorithms based on the use of perceptual organizational principles, such as temporal coherence and spatial proximity, for motion segmentation, which can easily handle drastic illumination changes, occlusion events, and multiple moving objects, without the use of training and specific object or illumination models.
TL;DR: It is shown that the distance from the camera and strong camera motion are the main cases where motion vector based descriptors tend to overestimate or underestimate the intensity of motion activity.
Abstract: We present a psychophysical and analytical framework for the comparison of the performance of different analytical measures of motion activity in video segments with respect to a subjective ground truth. We first construct a test-set of video segments and conduct a psychophysical experiment to obtain a ground truth for the motion activity. Then we present several low-complexity motion activity descriptors computed from compressed domain block motion vectors. In the first analysis, we quantize the descriptors and show that they perform well against the ground truth. We also show that the MPEG-7 motion activity descriptor is among the best. In the second analysis, we find the pairs of video segments for which the human subjects unanimously rate one as higher activity than the other. Then we examine the specific cases where each descriptor fails to give the correct ordering. We show that the distance from the camera and strong camera motion are the main cases where motion vector based descriptors tend to overestimate or underestimate the intensity of motion activity. We finally discuss the experimental methodology and analysis methods we used and possible alternatives. We review the applications of motion activity and how the results presented here relate to those applications.
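Two low-complexity descriptors of the kind discussed above can be sketched from a list of block motion vectors: the mean magnitude and the standard deviation of magnitudes, the latter being the statistic usually associated with the MPEG-7 intensity attribute. The abstract does not list the descriptors that were compared, and no quantization thresholds are given, so none are invented here; function names and data are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import List, Tuple

MotionVector = Tuple[float, float]


def magnitudes(vectors: List[MotionVector]) -> List[float]:
    """Length of each block motion vector."""
    return [math.hypot(dx, dy) for dx, dy in vectors]


def mean_magnitude(vectors: List[MotionVector]) -> float:
    """Average motion-vector length over the frame."""
    return mean(magnitudes(vectors))


def magnitude_spread(vectors: List[MotionVector]) -> float:
    """Population standard deviation of motion-vector lengths."""
    return pstdev(magnitudes(vectors))


# Hypothetical frames: a static scene, a uniform camera pan, and mixed motion.
static_frame = [(0.0, 0.0)] * 4
pan_frame = [(3.0, 4.0)] * 4
mixed_frame = [(0.0, 0.0), (3.0, 4.0), (0.0, 0.0), (3.0, 4.0)]

pan_mean, pan_spread = mean_magnitude(pan_frame), magnitude_spread(pan_frame)
mixed_mean, mixed_spread = mean_magnitude(mixed_frame), magnitude_spread(mixed_frame)
```

The pan example shows how the two statistics can disagree under strong camera motion: the mean magnitude is high while the spread is zero, which is the kind of case the abstract identifies as a source of over- or underestimation.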
TL;DR: Off-optical axis work is a good alternative to the orthodox setup provided the instruments are placed on the side of the nondominant hand; placing them on the side of the dominant hand markedly degrades performance because of the altered monitor display angles.
Abstract: Background: During complex laparoscopic operations, the surgeon often has to use both instruments to one side of the telescope (off-optical axis work). This experimental study was undertaken to compare the orthodox versus the off-optical axis endoscopic manipulations regarding the performance parameters, motion analysis, and muscle work of the surgeon's dominant upper limb. Methods: Ten surgeons participated in the study; each sutured a 50-mm enterotomy in pig small bowel in each of three setups: (1) in-optical axis manipulation (one instrument on either side of the laparoscope), (2) off-optical axis manipulation (both instruments on one side of the laparoscope to the dominant hand of the surgeon), and (3) off-optical axis manipulation (both instruments on one side of the laparoscope on the nondominant side). The main outcome measures were the placement error score, execution time, leakage pressure, motion analysis, and telemetric electromyography parameters of the surgeon's dominant upper limb. Results: There was no significant difference in all parameters of performance, muscle work, and fatigue between setup 1 and setup 3. However, marked degradation of all parameters of performance with increased muscle work and fatigue was observed with setup 2 compared to setups 1 and 3. The reason for the deterioration with setup 2 is related to the altered "monitor display angles", which are different from the actual physical angles. With this setup, the instrument-to-target physical angle of 30° appears on the screen as ?30° and this disturbs both the manipulation and the azimuth angles, obscuring the needle–tissue entry point. In addition, the instrument casts a shadow on its medial side and this tends to obscure the exact relations between instrument, needlepoint, and the tissue. Conclusions: Off-optical axis work is a good alternative to the orthodox setup provided the instruments are placed to the nondominant hand.
The marked degradation in performance encountered during off-optical axis work to the dominant hand of the surgeon is due to the resulting altered monitor display angles. The importance of these monitor display angles in influencing task performance has been previously overlooked.
TL;DR: In this article, an implementation of the Hough transform for line detection in an image based on a one-dimensional voting array is described.
Abstract: An implementation of Hough transform is described. According to one aspect, the Hough transform is used for line detection in an image based on a one-dimensional voting array.
TL;DR: This paper proposes a new approach for motion retargeting, i.e., adjusting motion-capture data to different characters and scenes, and develops a motion-analysis algorithm that can identify the type/structure of the motion and extract useful information, such as gait phases, foot-ground constraints, important features that should be preserved during retargeting, etc.
Abstract: This paper proposes a new approach for motion retargeting, i.e., adjusting motion-capture data to different characters and scenes. In striving for universality, the existing retargeting techniques often become impractical for most real-life applications. In contrast, we did not try to create a technique that can deal with practically any motion, but concentrated on creating a practical solution that is able to produce realistic results for some subset of all motions. So, the cornerstone idea of our approach is that realistic retargeting can only be achieved if the algorithm is aware of the structure and specific features of the processed motion. We applied this idea to animation of human locomotion and developed a motion-analysis algorithm that can identify the type/structure of the motion and extract useful information, such as gait phases, foot-ground constraints, important features that should be preserved during retargeting, etc. Also, we developed an inverse kinematics-based retargeting solver that can take advantage of this information and can produce accurate and realistic animations of human locomotion.