TL;DR: The HumanID gait challenge problem is introduced to provide a means for measuring progress and characterizing the properties of gait recognition; it represents a radical departure from traditional computer vision research methodology.
Abstract: Identification of people by analysis of gait patterns extracted from video has recently become a popular research problem. However, the conditions under which the problem is "solvable" are not understood or characterized. To provide a means for measuring progress and characterizing the properties of gait recognition, we introduce the HumanID gait challenge problem. The challenge problem consists of a baseline algorithm, a set of 12 experiments, and a large data set. The baseline algorithm estimates silhouettes by background subtraction and performs recognition by temporal correlation of silhouettes. The 12 experiments are of increasing difficulty, as measured by the baseline algorithm, and examine the effects of five covariates on performance. The covariates are: change in viewing angle, change in shoe type, change in walking surface, carrying or not carrying a briefcase, and elapsed time between sequences being compared. Identification rates for the 12 experiments range from 78 percent on the easiest experiment to 3 percent on the hardest. All five covariates had statistically significant effects on performance, with walking surface and time difference having the greatest impact. The data set consists of 1,870 sequences from 122 subjects spanning five covariates (1.2 gigabytes of data). This infrastructure supports further development of gait recognition algorithms and additional experiments to understand the strengths and weaknesses of new algorithms. The more detailed the experimental results presented, the more detailed the possible meta-analysis and the greater the understanding. It is this potential from the adoption of this challenge problem that represents a radical departure from traditional computer vision research methodology.
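The recognition step of the baseline (temporal correlation of silhouettes) can be illustrated with a minimal sketch. It assumes that each silhouette is a set of (row, col) foreground pixels, that frame similarity is intersection over union, and that a probe is matched by sliding it along a gallery sequence and keeping the best mean similarity. These choices and all function names are illustrative assumptions; the abstract does not specify them.

```python
def frame_similarity(a, b):
    """Intersection over union of two silhouettes given as sets of pixels."""
    union = a | b
    if not union:
        return 1.0  # two empty silhouettes are treated as identical
    return len(a & b) / len(union)


def sequence_similarity(probe, gallery):
    """Best mean frame similarity over all temporal alignments of probe in gallery."""
    if not probe or len(probe) > len(gallery):
        return 0.0
    best = 0.0
    for offset in range(len(gallery) - len(probe) + 1):
        total = sum(frame_similarity(p, gallery[offset + i])
                    for i, p in enumerate(probe))
        best = max(best, total / len(probe))
    return best


def identify(probe, gallery_by_subject):
    """Return the subject whose gallery sequence is most similar to the probe."""
    return max(gallery_by_subject,
               key=lambda s: sequence_similarity(probe, gallery_by_subject[s]))
```

An identification rate for one experiment would then be the fraction of probe sequences for which `identify` returns the correct subject.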
TL;DR: This comprehensive volume provides a detailed discourse on the mathematical models used in computational vision from leading educators and active research experts in this field and serves as a complete reference work for professionals.
Abstract: This comprehensive volume is an essential reference tool for professional and academic researchers in the field of computer vision, image processing, and applied mathematics. Continuing rapid advances in image processing have been enhanced by the theoretical efforts of mathematicians and engineers. This marriage of mathematics and computer vision - computational vision - has resulted in a discrete approach to image processing that is more reliable when applied to practical tasks. This comprehensive volume provides a detailed discourse on the mathematical models used in computational vision from leading educators and active research experts in this field. Topical areas include: image reconstruction, segmentation and object extraction, shape modeling and registration, motion analysis and tracking, and 3D from images, geometry and reconstruction. The book also includes a study of applications in medical image analysis. Handbook of Mathematical Models in Computer Vision provides a graduate-level treatment of this subject as well as serving as a complete reference work for professionals.
TL;DR: A new method to robustly and efficiently analyze foreground when the authors detect background for a fixed camera view by using mixture of Gaussians models and multiple cues is presented.
Abstract: We present a new method to robustly and efficiently analyze foreground when we detect background for a fixed camera view by using mixture of Gaussians models and multiple cues. The background is modeled by three Gaussian mixtures as in the work of Stauffer and Grimson (1999). Then the intensity and texture information are integrated to remove shadows and to enable the algorithm to work under quick lighting changes. For foreground analysis, the same Gaussian mixture model is employed to detect the static foreground regions without using any tracking or motion information. Then the whole static regions are pushed back to the background model to avoid a common problem in background subtraction: fragmentation (one object becomes multiple parts). The method was tested on our real-time video surveillance system. It is robust and runs at about 130 fps for color images and 150 fps for grayscale images at size 160×120 on a 2GB Pentium IV machine with MMX optimization.
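The per-pixel idea behind this kind of background modeling can be sketched in a deliberately simplified form. The code below keeps a single running Gaussian per pixel rather than the three-component mixture the abstract describes, and it omits the texture cue, shadow removal, and the push-back of static regions. The class name, learning rate, and threshold are assumptions chosen for illustration.

```python
class PixelBackground:
    """Running Gaussian model of one pixel's intensity (simplified, single mode)."""

    def __init__(self, initial, variance=25.0, rate=0.05, k=2.5):
        self.mean = float(initial)
        self.var = float(variance)
        self.rate = rate  # learning rate (assumed value)
        self.k = k        # match threshold in standard deviations (assumed value)

    def update(self, value):
        """Return True if value is foreground; adapt the model only on background."""
        diff = value - self.mean
        is_foreground = diff * diff > (self.k ** 2) * self.var
        if not is_foreground:
            self.mean += self.rate * diff
            # Keep a variance floor so the model never becomes over-confident.
            self.var = max(1.0, (1 - self.rate) * self.var + self.rate * diff * diff)
        return is_foreground
```

A full mixture model would keep several such Gaussians per pixel with weights and replace the weakest one when no component matches.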
TL;DR: A feature selection algorithm demonstrated that as few as three gait features, one selected from each data type, could effectively distinguish the age groups with 100% accuracy, demonstrating considerable potential in applying SVMs in gait classification for many applications.
TL;DR: This paper considers the problem of automatically learning an activity-based semantic scene model from a stream of video data and proposes a scene model that labels regions according to an identifiable activity in each region, such as entry/exit zones, junctions, paths, and stop zones.
Abstract: This paper considers the problem of automatically learning an activity-based semantic scene model from a stream of video data. A scene model is proposed that labels regions according to an identifiable activity in each region, such as entry/exit zones, junctions, paths, and stop zones. We present several unsupervised methods that learn these scene elements and present results that show the efficiency of our approach. Finally, we describe how the models can be used to support the interpretation of moving objects in a visual surveillance environment.
TL;DR: By applying an adaptive super-resolution algorithm to the video produced by the jitter camera, it is shown that resolution can be notably enhanced for stationary or slowly moving objects, while it is improved slightly or left unchanged for objects with fast and complex motions.
Abstract: Video cameras must produce images at a reasonable frame-rate and with a reasonable depth of field. These requirements impose fundamental physical limits on the spatial resolution of the image detector. As a result, current cameras produce videos with a very low resolution. The resolution of videos can be computationally enhanced by moving the camera and applying super-resolution reconstruction algorithms. However, a moving camera introduces motion blur, which limits super-resolution quality. We analyze this effect and derive a theoretical result showing that motion blur has a substantial degrading effect on the performance of super-resolution. The conclusion is that, in order to achieve the highest resolution motion blur should be avoided. Motion blur can be minimized by sampling the space-time volume of the video in a specific manner. We have developed a novel camera, called the "jitter camera," that achieves this sampling. By applying an adaptive super-resolution algorithm to the video produced by the jitter camera, we show that resolution can be notably enhanced for stationary or slowly moving objects, while it is improved slightly or left unchanged for objects with fast and complex motions. The end result is a video that has a significantly higher resolution than the captured one.
TL;DR: A new optical-flow-based method for estimating heart motion from two-dimensional echocardiographic sequences and uses a wavelet-like algorithm for computing B-spline-weighted inner products and moments at dyadic scales to increase computational efficiency.
Abstract: The quantitative assessment of cardiac motion is a fundamental concept to evaluate ventricular malfunction. We present a new optical-flow-based method for estimating heart motion from two-dimensional echocardiographic sequences. To account for typical heart motions, such as contraction/expansion and shear, we analyze the images locally by using a local-affine model for the velocity in space and a linear model in time. The regional motion parameters are estimated in the least-squares sense inside a sliding spatiotemporal B-spline window. Robustness and spatial adaptability are achieved by estimating the model parameters at multiple scales within a coarse-to-fine multiresolution framework. We use a wavelet-like algorithm for computing B-spline-weighted inner products and moments at dyadic scales to increase computational efficiency. In order to characterize myocardial contractility and to simplify the detection of myocardial dysfunction, the radial component of the velocity with respect to a reference point is color coded and visualized inside a time-varying region of interest. The algorithm was first validated on synthetic data sets that simulate a beating heart with a speckle-like appearance of echocardiograms. The ability to estimate motion from real ultrasound sequences was demonstrated by a rotating phantom experiment. The method was also applied to a set of in vivo echocardiograms from an animal study. Motion estimation results were in good agreement with the expert echocardiographic reading.
TL;DR: This method is based on statistical modeling of an image pair using constraints on appearance and motion and is extended to video by chaining the pairwise models to produce a joint probability distribution to be maximized.
Abstract: In this paper, we propose a method for jointly computing optical flow and segmenting video while accounting for mixed pixels (matting). Our method is based on statistical modeling of an image pair using constraints on appearance and motion. Segments are viewed as overlapping regions with fractional (alpha) contributions. Bidirectional motion is estimated based on spatial coherence and similarity of segment colors. Our model is extended to video by chaining the pairwise models to produce a joint probability distribution to be maximized. To make the problem more tractable, we factorize the posterior distribution and iteratively minimize its parts. We demonstrate our method on frame interpolation
TL;DR: In this article, the authors describe methods and integrated systems for camera motion analysis and moving object analysis and methods of extracting semantics mainly from camera motion parameters in videos and video segments without shot changes.
Abstract: Methods and integrated systems for camera motion analysis and moving object analysis and methods of extracting semantics mainly from camera motion parameters in videos and video segments without shot changes are described. Typical examples of such videos are a home video taken by a digital camera and a segment, or clip, of a professional video or film. The extracted semantics can be directly used in a number of video/image understanding and management applications, such as annotation, browsing, editing, frame enhancement, key-frame extraction, panorama generation, printing, retrieval, summarization. Automatic methods of detecting and tracking moving objects that do not rely on a priori knowledge of the objects are also described. The methods can be executed in real time.
TL;DR: An automated system is described that classifies gender from human gait data, using an SVM classifier on the extracted gait patterns, and is evaluated on a considerably larger database.
Abstract: We describe an automated system that classifies gender by utilising a set of human gait data. The gender classification system consists of three stages: i) detection and extraction of the moving human body and its contour from image sequences; ii) extraction of human gait signature by the joint angles and body points; and iii) motion analysis and feature extraction for classifying gender in the gait patterns. A sequential set of 2D stick figures is used to represent the gait signature, which is the primitive data for feature generation based on motion parameters. Then, an SVM classifier is used to classify gender in the gait patterns. In experiments on a considerably larger database, a higher gender classification rate of 96% for 100 subjects was achieved.
TL;DR: A new model-based image-matching technique can potentially be used to arrive at more precise descriptions of the mechanisms of sports injuries than what has been possible without elaborate methods for three-dimensional reconstruction from uncalibrated video sequences, e.g. for knee injuries.
TL;DR: A novel model of attentive visual motion processing is presented that addresses both decomposition of the signal into constituent features as well as the re-combination, or binding, of those features into wholes.
TL;DR: A novel method to detect fire and/or flame by processing the video data generated by an ordinary camera monitoring a scene by analyzing the video in the wavelet domain is proposed.
Abstract: The paper proposes a novel method to detect fire and/or flame by processing the video data generated by an ordinary camera monitoring a scene. In addition to ordinary motion and color clues, flame and fire flicker are detected by analyzing the video in the wavelet domain. Periodic behavior in flame boundaries is detected by performing a temporal wavelet transform. Color variations in fire are detected by computing the spatial wavelet transform of moving fire-colored regions. Other clues used in the fire detection algorithm include irregularity of the boundary of the fire-colored region and the growth of such regions in time. All of the above clues are combined to reach a final decision.
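The temporal flicker cue can be illustrated with a small sketch: take the intensity of one pixel over time, compute one level of Haar detail (high-pass) coefficients, and count sign changes among the large coefficients. The Haar filter, the threshold value, and the function names are assumptions made for this example; the abstract only states that a temporal wavelet transform is used.

```python
def haar_detail(signal):
    """One-level Haar high-pass (detail) coefficients of a 1D signal."""
    return [(signal[i] - signal[i + 1]) / 2.0 for i in range(0, len(signal) - 1, 2)]


def flicker_score(signal, threshold=5.0):
    """Count sign changes among detail coefficients whose magnitude exceeds threshold."""
    strong = [d for d in haar_detail(signal) if abs(d) > threshold]
    return sum(1 for a, b in zip(strong, strong[1:]) if a * b < 0)
```

A steady pixel yields a score of zero, while a pixel on a flickering boundary yields repeated sign changes; in a full detector this score would be only one of several clues combined into the final decision.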
TL;DR: A fast online gait planning method is proposed, based on an approximate dynamical biped model whose mass is concentrated at the COG; the general solution of the equation of motion is obtained analytically, and a physically feasible reference trajectory of the whole body is generated from only the next desired foot placement.
Abstract: A fast online gait planning method is proposed. Based on an approximate dynamical biped model whose mass is concentrated at the COG, the general solution of the equation of motion is analytically obtained. The dynamical constraint on the external reaction force due to the underactuation is resolved by boundary condition relaxation, namely, by admitting some error between the desired and actually reached state. It potentially creates responsive motion which requires strong instantaneous acceleration by accepting discontinuity of the ZMP trajectory, which is designed as an exponential function. A semi-automatic continuous gait planning method is also presented. It generates a physically feasible reference trajectory of the whole body from only the next desired foot placement. The validity of the proposed method is confirmed through both simulations and experiments with a small anthropomorphic robot.
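The abstract does not state the equation of motion. For orientation only, a mass-concentrated biped model is commonly written in the linear inverted pendulum form below. The constant COG height $z$, the constant ZMP position $p$, and the symbols $x_0$, $\dot{x}_0$ for the initial COG position and velocity are assumptions of this sketch; a ZMP trajectory designed as an exponential function, as in the paper, would change the particular solution.

```latex
\begin{align}
  \ddot{x} &= \omega^{2}\,(x - p), \qquad \omega = \sqrt{g / z}, \\
  x(t) &= p + C_{1}\, e^{\omega t} + C_{2}\, e^{-\omega t}, \\
  C_{1} &= \tfrac{1}{2}\left( (x_{0} - p) + \frac{\dot{x}_{0}}{\omega} \right), \qquad
  C_{2} = \tfrac{1}{2}\left( (x_{0} - p) - \frac{\dot{x}_{0}}{\omega} \right).
\end{align}
```

Setting $t = 0$ recovers $x(0) = x_0$ and $\dot{x}(0) = \omega (C_1 - C_2) = \dot{x}_0$, which is the sense in which such a model admits a closed-form solution suitable for fast online planning.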
TL;DR: A frame-by-frame video-registration technique using a feature tracker to automatically determine control-point correspondences is proposed, which converts the spatio-temporal video into temporal information, thereby correcting for airborne platform motion and attitude errors.
Abstract: This paper investigates airborne helicopter video for estimating traffic parameters. Roll, pitch, and yaw of the helicopter make the video unstable, difficult to view, and the derived parameters less accurate. To correct this, a frame-by-frame video-registration technique using a feature tracker to automatically determine control-point correspondences is proposed. This converts the spatio-temporal video into temporal information, thereby correcting for airborne platform motion and attitude errors. The registration is robust, with the residual jitter being less than a few pixels over hundreds of frames. A simple vehicle-detection scheme identifies vehicle locations in the video, which are then tracked by the feature tracker, enabling us to estimate average velocity, instantaneous velocity, and other parameters automatically to within 10% of manual measurements. The entire process of registration, detection, tracking, and estimation takes only a few seconds for each frame. A prototype multimedia geographic information system (GIS) is created as a visualization tool for viewing the registered video, other airborne or satellite imagery, and data pertaining to georeferenced locations within a base map.
TL;DR: A direct method for recovering non-rigid object motion from its appearance in which the point correspondences are simultaneously established while estimating TPS parameters is presented.
Abstract: Thin plate spline (TPS) transformations have been applied to non-rigid shape matching with impressive results. However, existing methods often use a sparse set of point correspondences which are established prior to shape matching. A straightforward approach to finding point correspondences and computing TPS parameters imposes expensive computations, thereby motivating us to develop an efficient solution. In this paper, we present a direct method for recovering non-rigid object motion from its appearance in which the point correspondences are simultaneously established while estimating TPS parameters. The motion parameters are estimated in a stiff-to-flexible approach and the principal appearance deformations are learned that can be utilized for motion analysis and recognition. Numerous experiments demonstrate the efficiency and efficacy of the proposed algorithm in modeling the motion details of non-rigid objects undergoing shape deformation and pose variation.
TL;DR: The paper demonstrates how ultrasonic hand tracking can be used to improve the performance of a wearable, accelerometer and gyroscope based activity recognition system and introduces several methods of fusing the ultrasound and motion sensor information.
Abstract: The paper demonstrates how ultrasonic hand tracking can be used to improve the performance of a wearable, accelerometer and gyroscope based activity recognition system. Specifically, we target the recognition of manipulative gestures of the type found in assembly and maintenance tasks. We discuss how relevant information can be extracted from the ultrasonic signal despite problems with low sampling rate, occlusions and reflections that occur in this type of application. We then introduce several methods of fusing the ultrasound and motion sensor information. We evaluate our methods on an experimental data set that contains 21 different actions performed repeatedly by three different subjects during simulated bike repair. Due to the complexity of the recognition tasks, with many similar and vaguely defined actions and person-independent training, both the ultrasound and motion sensors perform poorly on their own. However, with our fusion methods recognition rates well over 90% can be achieved for most activities. In the extreme case, recognition rates go up from just over 50% for separate classifications to nearly 89% with our fusion methods.
TL;DR: The algorithm Projection Shift Analysis developed to run on mobile phones is outlined, which detects the relative motion of the camera in the two-dimensional pixel space of the image and calculates the direction of the movement with a suitable algorithm.
Abstract: Mobile devices become smaller and more powerful with each generation distributed. Because of the tiny enclosures the interaction with such devices offers limited input capabilities. In contrast there are hardly any mobile phones purchasable that do not have a built-in camera. We developed a concept of an intuitive interaction technique using optical inertial tracking on mobile phones. The key to this concept is the user moving the mobile device, which results in a moving video stream of the camera. The direction of the movement can be calculated with a suitable algorithm. This paper outlines the algorithm Projection Shift Analysis developed to run on mobile phones. 1. MOTIVATION Our approach detects the relative motion of the camera in the two-dimensional pixel space of the image. As camera movement directly results in motion of all scene components, the camera movement can be defined as the inverse movement of the scene. If there are no significant scene components moving on their own, conventional motion detection methods can be used to analyse the video stream. There are several algorithms from different fields of computer graphics and image processing used to parameterise the motion of the scene. Although they pursue different approaches, all of them would analyse the scene motion sufficiently. Due to low CPU and memory resources on mobile phones we developed the Projection Shift Analysis algorithm for motion analysis. There is a wide range of applications for the motion parameter, for example controlling a game similar to joystick interaction. Another possible application could interpret motion gestures or control the cursor like the stylus input technique on PDAs. 2. RELATED WORK In [8] Geiger et al. present concepts for mobile games and address interaction issues concerning mobile devices. The simplest approach to track objects is detecting significant features in an image, e.g. edges.
Naturally, edge detection methods like the Roberts, Prewitt, Sobel, Laplace or Canny [2] filters are used to achieve this. Motion Detection in 3D-Computer Graphics, Mixed and Augmented Reality is often referred to as Tracking. Beier et al. presented a markerless tracking system using edge detection in [1]. Comport et al. propose a robust markerless tracking system in [3]. A specialised solution to the problem of markerless tracking was published by Simon et al. in [15]. In [11] Moehring et al. present a marker-based tracking system designed especially for mobile phones. Kato and Billinghurst developed the optical marker based tracking system ARToolKit published in [9]. Foxlin et al. present a wide spectrum of optical inertial tracking systems in [5, 6, 17]. Additionally, a taxonomy of Motion Detection methods has been published in [4]. In [10] Koller presents a method to track the position of cars using an optical system. Siemens Mobile developed a game called Mozzies that is distributed with the mobile phone SX1 by default. This Symbian based game augments the background video from the camera with moths. The user can point the gun at a moth and shoot it by moving the phone and pressing the appropriate button. In [7] Geiger et al. present an interesting approach of an augmented reality version of a soccer game running on a PDA. 3. INTERACTION TECHNIQUES This chapter describes and classifies various interaction techniques used on mobile devices nowadays. There are a few main parameters that define the usability of those techniques. The reaction time between the user input and the response on output devices such as the display is a very crucial parameter. Any visual response on the output device after about 200 ms is not interpreted as a direct reaction to the user, but as a separate event. The quantity of actions a user is able to perform using a specific input technique defines the speed at which he can interact with the device.
The intuitivity of an input method strongly affects the usability
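The excerpt names the Projection Shift Analysis algorithm without giving its steps. One plausible reading of the name, offered purely as an assumption, is to project each frame onto the horizontal and vertical axes (column and row sums) and to find the 1D shift that best aligns the projections of consecutive frames. The sketch below implements that reading; the function names, the cost measure, and `max_shift` are hypothetical.

```python
def projections(image):
    """Row sums and column sums of a grayscale image given as a list of rows."""
    rows = [sum(r) for r in image]
    cols = [sum(c) for c in zip(*image)]
    return rows, cols


def best_shift(prev, curr, max_shift):
    """Shift s minimising mean |prev[i] - curr[i + s]|; ties prefer the smallest |s|."""
    best, best_cost = 0, float("inf")
    for s in sorted(range(-max_shift, max_shift + 1), key=abs):
        pairs = [(prev[i], curr[i + s])
                 for i in range(len(prev)) if 0 <= i + s < len(curr)]
        if not pairs:
            continue
        cost = sum(abs(a - b) for a, b in pairs) / len(pairs)
        if cost < best_cost:
            best, best_cost = s, cost
    return best


def estimate_motion(prev_image, curr_image, max_shift=2):
    """Return (dx, dy): estimated scene shift in pixels between two frames."""
    prev_rows, prev_cols = projections(prev_image)
    curr_rows, curr_cols = projections(curr_image)
    dx = best_shift(prev_cols, curr_cols, max_shift)
    dy = best_shift(prev_rows, curr_rows, max_shift)
    return dx, dy
```

Because the text defines camera movement as the inverse of scene movement, the camera motion estimate would be `(-dx, -dy)`. Working on two 1D projections instead of the full 2D image keeps the cost low, which matches the stated constraint of limited CPU and memory on mobile phones.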
TL;DR: The ability to automatically discover the different motion patterns in an intersection is demonstrated - the structure tensor at each pixel is interpretable as a constrained Gaussian probability density function over the derivatives measured across the entire image.
Abstract: Surveillance applications often capture video over long time periods; interpretation of this data is facilitated by background models that effectively represent the typical behavior in the scene. Capturing statistics of the spatio-temporal derivatives at each pixel can efficiently model surprisingly complicated motion patterns. Considering the video as a function of space and time, the mean 3D structure tensor at each pixel characterizes local image variation, the most common local motion, and whether that motion is consistent or ambiguous. Furthermore, this structure tensor field - the structure tensor at each pixel - is interpretable as a constrained Gaussian probability density function over the derivatives measured across the entire image. In scenes with multiple global motion patterns, a mixture model (of these global distributions) automatically factors background motion into a set of flow fields corresponding to the different motions. The models are developed online in real time and can adapt to changes in background motion. We demonstrate the ability to automatically discover the different motion patterns in an intersection.
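As a minimal sketch of the per-pixel statistic described above, the code below averages the outer products of spatio-temporal derivative samples (Ix, Iy, It) observed at one pixel and then recovers the most common local motion from the brightness constancy constraint Ix·u + Iy·v + It = 0 in the least-squares sense. The function names and the ambiguity test are assumptions for illustration; the mixture model over global motion patterns is not covered.

```python
def mean_structure_tensor(derivatives):
    """Mean 3x3 outer product of (Ix, Iy, It) samples observed at one pixel."""
    n = len(derivatives)
    tensor = [[0.0] * 3 for _ in range(3)]
    for d in derivatives:
        for i in range(3):
            for j in range(3):
                tensor[i][j] += d[i] * d[j] / n
    return tensor


def dominant_flow(tensor, eps=1e-9):
    """Least-squares flow (u, v) from brightness constancy, or None if ambiguous."""
    jxx, jxy, jxt = tensor[0]
    jyy, jyt = tensor[1][1], tensor[1][2]
    det = jxx * jyy - jxy * jxy
    if abs(det) < eps:
        return None  # aperture problem: spatial gradients span one direction only
    u = (-jxt * jyy + jyt * jxy) / det
    v = (-jyt * jxx + jxt * jxy) / det
    return u, v
```

Returning `None` when the spatial block is singular mirrors the abstract's distinction between consistent and ambiguous motion at a pixel.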
TL;DR: The approach attempts to incorporate knowledge of the statics and dynamics of human gait into the feature extraction process and uses multi-class support vector machines to distinguish different human gaits.
Abstract: This paper proposes an automatic gait recognition approach for analyzing and classifying human gait by computer vision techniques. The approach attempts to incorporate knowledge of the statics and dynamics of human gait into the feature extraction process. The width vectors of the binarized silhouette of a walking person, which capture the physical structure of the person, the motion of the limbs, and other details of the body, are chosen as the basic gait feature. In contrast to model-based approaches, the limb angle information is extracted by analyzing the variation of silhouette width, without needing a human body model. Discrete cosine analysis is used to analyze the shape and dynamic characteristics and to reduce the gait features. Multi-class support vector machines are then used to distinguish different human gaits. The performance of the proposed method is tested using different gait databases. Recognition results show that this approach is efficient.
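The width-vector feature can be sketched as follows. The code assumes a silhouette is a list of rows of 0/1 values, takes the width of each row as the span between its leftmost and rightmost foreground pixels, and reduces the vector by keeping the first few coefficients of an unnormalised DCT-II. Treating "discrete cosine analysis" as a DCT-II, the number of coefficients kept, and the function names are assumptions of this sketch.

```python
import math


def width_vector(silhouette):
    """Per-row width (rightmost minus leftmost foreground column, plus one)."""
    widths = []
    for row in silhouette:
        cols = [c for c, v in enumerate(row) if v]
        widths.append(cols[-1] - cols[0] + 1 if cols else 0)
    return widths


def dct2(values):
    """Unnormalised DCT-II of a 1D sequence."""
    n = len(values)
    return [sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                for i, v in enumerate(values))
            for k in range(n)]


def gait_feature(silhouette, keep=4):
    """Keep the first `keep` DCT coefficients of the width vector."""
    return dct2(width_vector(silhouette))[:keep]
```

Feature vectors of this kind, computed per frame or per gait cycle, would then be passed to a multi-class classifier such as an SVM.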
TL;DR: In this article, an apparatus and a method for mixing a 3D (three-dimensional) graphic image into an ordinary moving picture are provided, so that the 3D graphic virtual image and the moving picture are mixed naturally by considering the camera motion of the moving picture without the help of hardware; a user interface part (600) interfaces with a user in order to make a composite image at a predetermined area of the frames.
Abstract: PURPOSE: An apparatus and a method for mixing a 3D (three-dimensional) graphic image into an ordinary moving picture are provided to mix a 3D graphic virtual image and the moving picture naturally by considering the camera motion of the moving picture without the help of hardware. CONSTITUTION: A moving picture input part (500) receives a moving picture signal, which is a flow of continuous frames. A user interface part (600) interfaces with a user in order to make a composite image at a predetermined area of the frames. A motion analysis part (100) receives composition area information, scene change image information and the moving picture signal, and then extracts motion information of the camera on the basis of information in the frames. An image mixing part (200) creates a mixed image on the basis of the motion information.
TL;DR: This work addresses the problem of visual motion analysis and interpretation by formulating it as an inference of motion layers from a noisy and possibly sparse point set in a 4D space and locally enforce rigidity for each layer in order to infer its 3D structure and motion.
Abstract: Most approaches for motion analysis and interpretation rely on restrictive parametric models and involve iterative methods which depend heavily on initial conditions and are subject to instability. Further difficulties are encountered in image regions where motion is not smooth-typically around motion boundaries. This work addresses the problem of visual motion analysis and interpretation by formulating it as an inference of motion layers from a noisy and possibly sparse point set in a 4D space. The core of the method is based on a layered 4D representation of data and a voting scheme for affinity propagation. The inherent problem caused by the ambiguity of 2D to 3D interpretation is usually handled by adding additional constraints, such as rigidity. However, enforcing such a global constraint has been problematic in the combined presence of noise and multiple independent motions. By decoupling the processes of matching, outlier rejection, segmentation, and interpretation, we extract accurate motion layers based on the smoothness of image motion, and then locally enforce rigidity for each layer in order to infer its 3D structure and motion. The proposed framework is noniterative and consistently handles both smooth moving regions and motion discontinuities without using any prior knowledge of the motion model.
TL;DR: A method for estimating egomotion that avoids pointwise image velocity estimation as a first step and can be applied to a wide range of 3D cluttered scenes, including those for which pointwise image velocities cannot be measured because only normal velocity information is available.
Abstract: Previous methods for estimating observer motion in a rigid 3D scene assume that image velocities can be measured at isolated points. When the observer is moving through a cluttered 3D scene such as a forest, however, pointwise measurements of image velocity are more challenging to obtain because multiple depths, and hence multiple velocities, are present in most local image regions. We introduce a method for estimating egomotion that avoids pointwise image velocity estimation as a first step. In its place, the direction of motion parallax in local image regions is estimated, using a spectrum-based method, and these directions are then combined to directly estimate 3D observer motion. There are two advantages to this approach. First, the method can be applied to a wide range of 3D cluttered scenes, including those for which pointwise image velocities cannot be measured because only normal velocity information is available. Second, the egomotion estimates can be used as a posterior constraint on estimating pointwise image velocities, since known egomotion parameters constrain the candidate image velocities at each point to a one-dimensional rather than a two-dimensional space.
TL;DR: This paper introduces a method for detecting and tracking a moving object using an active camera mounted on a mobile robot; the moving object is finally decided by a combined feature set and motion analysis.
Abstract: This paper introduces a method for detecting and tracking a moving object using an active camera mounted on a mobile robot. The motion of the camera is analyzed and compensated by comparing edge features among consecutive image frames. Candidate regions of the moving object are found by differencing the transformed ith image and the t-1th image. The moving object is finally decided by a combined feature set and motion analysis. The object is tracked by matching object components in the ROI. We have experimented with detecting and tracking a moving object with a single active pan/tilt/zoom camera
TL;DR: In this paper, an active contour model, snake, was developed as a useful segmenting and tracking tool for rigid or non-rigid objects, which is designed on the basis of snake energies.
Abstract: Motion tracking and object segmentation are the most fundamental and critical problems in vision tasks such as motion analysis. An active contour model, snake, was developed as a useful segmenting and tracking tool for rigid or non-rigid objects. Snake is designed on the basis of snake energies. Segmenting and tracking can be executed successfully by energy minimization. In this research, two new paradigms for segmentation and tracking are suggested. First, because the conventional method uses only intensity information, it is difficult to separate an object from its complex background. Therefore, a new energy and design schemes should be proposed for the better segmentation of objects. Second, conventional snake can be applied in situations where the change between images is small. If a fast moving object exists in successive images, conventional snake will not operate well because the moving object may have large differences in its position or shape, between successive images. Snake's nodes may also fall into the local minima in their motion to the new positions of the target object in the succeeding image. For robust tracking, the condensation algorithm was adopted to control the parameters of the proposed snake model called "adaptive color snake model (ACSM)". The effectiveness of the ACSM is verified by appropriate simulations and experiments.
TL;DR: An approach to recognising 10 elementary gestures is proposed that can be applied to sign language recognition, works reliably in real time without relying on accurate tracking, and gives a probabilistic output that is useful in complex motion analysis.
Abstract: An approach to recognise 10 elementary gestures is proposed and it can be applied to sign language recognition. In this work, a motion gradient orientation image is extracted directly from a raw video input and transformed to a motion feature vector. This feature vector is then classified into one of the 10 elementary gestures by a sparse Bayesian classifier. A training set of 628 samples and a testing set of over 1000 samples have been obtained to evaluate the proposed method. A real-time system was built and trained with the training set. From the experiment, the reported classification accuracy is 90% and the system can run at around 25 frames per second. Compared with other recently proposed methods that involve the use of hand tracking, the system can work reliably in real time without relying on accurate tracking, and gives a probabilistic output that is useful in complex motion analysis.
TL;DR: A method of measuring, and modelling, wrist joint motion that could potentially be used to improve the kinematic performance of wrist arthroplasty designs and could be used clinically to follow disease progression or recovery following surgery.
TL;DR: An approach to increase adaptability of a recognition system, which can recognise 10 elementary gestures and be extended to sign language recognition, is proposed and the accuracy of the classifier can be boosted from less than 40% to over 80% by re-training it using 5 newly captured samples from each gesture class.
Abstract: An approach to increase adaptability of a recognition system, which can recognise 10 elementary gestures and be extended to sign language recognition, is proposed. In this work, recognition is done by first extracting a motion gradient orientation image from a raw video input and then classifying a feature vector generated from this image into one of the 10 gestures by a sparse Bayesian classifier. The classifier is designed so that it supports online incremental learning and can thus be re-trained to increase its adaptability to an input captured under a new condition. Experiments show that the accuracy of the classifier can be boosted from less than 40% to over 80% by re-training it using 5 newly captured samples from each gesture class. Apart from having better adaptability, the system can work reliably in real time and gives a probabilistic output that is useful in complex motion analysis.
TL;DR: Experimental results show that the proposed system successfully tracks vehicles ahead and provides collision-warning information on urban arterial roads at speeds of around 60 km/h, both at night and during the day.
Abstract: This paper presents the design and implementation of a real-time visual tracking system for vehicle safety applications. A novel feature-based vehicle tracking algorithm is proposed. This algorithm can automatically detect and track multiple moving objects, including cars and motorcycles, ahead of the tracking vehicle. Combined with the concept of focus of expansion (FOE) and scene analysis, the developed system can segment features of moving objects from the moving background and provide a collision warning in real time. The proposed algorithm is realized using a CMOS image sensor and Nios embedded processor architecture. The constructed stand-alone visual tracking system has been validated in actual road tests. Experimental results show that the proposed system successfully tracks vehicles ahead and provides collision-warning information on urban arterial roads at speeds of around 60 km/h, both at night and during the day.
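The abstract names the focus of expansion (FOE) without saying how it is computed. One common way to estimate it, shown here purely as an assumption-laden sketch and not as the paper's method, is a least-squares fit: under forward camera motion, the flow vector of every static background feature points away from the FOE, so each feature contributes one linear constraint. The function name `estimate_foe` and the sample data are hypothetical.

```python
def estimate_foe(points, flows):
    """Least-squares FOE from feature positions and their flow vectors.

    For a static feature at (x, y) with flow (u, v), the FOE (x0, y0)
    satisfies v * (x - x0) - u * (y - y0) = 0, i.e. v*x0 - u*y0 = v*x - u*y.
    """
    sxx = sxy = syy = bx = by = 0.0
    for (x, y), (u, v) in zip(points, flows):
        a1, a2 = v, -u                  # coefficients of x0 and y0
        rhs = v * x - u * y
        sxx += a1 * a1
        sxy += a1 * a2
        syy += a2 * a2
        bx += a1 * rhs
        by += a2 * rhs
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("flow vectors are parallel; FOE is not determined")
    # Solve the 2x2 normal equations with Cramer's rule.
    x0 = (bx * syy - by * sxy) / det
    y0 = (sxx * by - sxy * bx) / det
    return x0, y0


# Synthetic background features expanding away from (3, 4).
points = [(5.0, 4.0), (3.0, 8.0), (7.0, 7.0)]
flows = [(0.5 * (x - 3.0), 0.5 * (y - 4.0)) for x, y in points]
foe = estimate_foe(points, flows)
```

Features whose flow is inconsistent with the fitted FOE are candidates for independently moving objects, which is one way such a system could separate moving vehicles from the moving background.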
TL;DR: The phase or associated amplitude analysis of a sequence of images is improved by, first, providing quantifications in response to the phase or amplitude information, and, second, since heart motion or other motion within a body may become complex, using multiple harmonics in addition to the first harmonic or fundamental information for parametrically imaging a motion.
Abstract: The phase or associated amplitude analysis of a sequence of images is improved by, first, providing quantifications in response to the phase or amplitude information. For example, a value or values representing asynchrony between different locations through a sequence of images may provide useful diagnostic information. Second, since heart motion or other motion within a body may become complex, multiple harmonics may be used in addition to the first harmonic or fundamental information for parametrically imaging a motion. Third, where different portions of a cycle have different characteristics, such as the systolic phase and diastolic phase of a heart cycle, images associated with each of the portions may be separated from other portions. A phase or amplitude analysis of the sequence of images for each portion is handled separately.
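As a rough illustration of phase and amplitude analysis, the sketch below computes the amplitude and phase of a chosen harmonic of one location's intensity-over-time curve with a discrete Fourier sum, and quantifies asynchrony between two locations as their wrapped phase difference. The function names `harmonic` and `asynchrony`, the 16-frame cycle, and the synthetic curves are assumptions; the abstract does not specify these details.

```python
import cmath
import math


def harmonic(samples, k=1):
    """Amplitude and phase of the k-th harmonic of one cycle of samples."""
    n = len(samples)
    coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                for t, s in enumerate(samples)) / n
    return 2 * abs(coeff), cmath.phase(coeff)


def asynchrony(phase_a, phase_b):
    """Phase difference wrapped into the interval (-pi, pi]."""
    diff = (phase_a - phase_b) % (2 * math.pi)
    return diff - 2 * math.pi if diff > math.pi else diff


frames = 16
# Two locations with the same motion amplitude; location B lags A by a quarter cycle.
curve_a = [5 + 3 * math.cos(2 * math.pi * t / frames) for t in range(frames)]
curve_b = [5 + 3 * math.cos(2 * math.pi * t / frames - math.pi / 2) for t in range(frames)]

amp_a, phase_a = harmonic(curve_a)
amp_b, phase_b = harmonic(curve_b)
delay = asynchrony(phase_a, phase_b)

# Higher harmonics use the same function, for example the second harmonic:
amp_a2, _ = harmonic(curve_a, k=2)
```

Restricting `samples` to the frames of one portion of the cycle (for example, only systolic frames) would give the per-portion analysis that the abstract describes, under the same assumptions.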