TL;DR: The methods reviewed are intended for real-time surveillance, defining a diverse set of events that can trigger further analysis, including virtual fencing, speed profiling, behavior classification, anomaly detection, and object interaction.
Abstract: This paper presents a survey of trajectory-based activity analysis for visual surveillance. It describes techniques that use trajectory data to define a general set of activities that are applicable to a wide range of scenes and environments. Events of interest are detected by building a generic topographical scene description from underlying motion structure as observed over time. The scene topology is automatically learned and is distinguished by points of interest and motion characterized by activity paths. The methods we review are intended for real-time surveillance through the definition of a diverse set of events that trigger further analysis, including virtual fencing, speed profiling, behavior classification, anomaly detection, and object interaction.
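Two of the event types listed above, virtual fencing and speed profiling, can be illustrated on a single tracked trajectory. The sketch below is a minimal illustration, not the surveyed authors' implementation: it assumes image-plane coordinates, a polygonal fence, and a fixed sampling interval `dt`, and the function names are hypothetical.

```python
from math import hypot


def point_in_polygon(p, poly):
    """Ray-casting test: toggle on every polygon edge crossed by a ray from p toward +x."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def fence_entries(traj, fence):
    """Indices at which the trajectory moves from outside to inside the virtual fence."""
    entries = []
    was_inside = point_in_polygon(traj[0], fence)
    for i in range(1, len(traj)):
        now_inside = point_in_polygon(traj[i], fence)
        if now_inside and not was_inside:
            entries.append(i)
        was_inside = now_inside
    return entries


def speed_profile(traj, dt):
    """Speed between consecutive samples, in coordinate units per second."""
    return [hypot(x2 - x1, y2 - y1) / dt for (x1, y1), (x2, y2) in zip(traj, traj[1:])]
```

For example, with the square fence `[(0, 0), (10, 0), (10, 10), (0, 10)]` and the trajectory `[(-5, 5), (-1, 5), (2, 5), (5, 5), (12, 5), (8, 5)]`, `fence_entries` reports entries at indices 2 and 5, and `speed_profile(traj, 1.0)` gives `[4.0, 3.0, 3.0, 7.0, 4.0]`, so a speed limit of 5 would flag the fourth step.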
TL;DR: The authors review automated player motion tracking in team sports, noting that commercial systems such as TRAKUS, SoccerMan, TRAKPERFORMANCE, Pfinder, and Prozone still require substantial operator intervention, and outline potential applications of a fully automated motion detection system.
Abstract: Efforts at player motion tracking have traditionally involved a range of data collection techniques from live observation to post-event video analysis where player movement patterns are manually recorded and categorized to determine performance effectiveness. Due to the considerable time required to manually collect and analyse such data, research has tended to focus only on small numbers of players within predefined playing areas. Whilst notational analysis is a convenient, practical and typically inexpensive technique, the validity and reliability of the process can vary depending on a number of factors, including how many observers are used, their experience, and the quality of their viewing perspective. Undoubtedly, the application of automated tracking technology to team sports has been hampered by the inadequate video and computational facilities available at sports venues. However, the complex nature of movement inherent to many physical activities also represents a significant hurdle to overcome. Athletes tend to exhibit quick and agile movements, with many unpredictable changes in direction and also frequent collisions with other players. Each of these characteristics of player behaviour violates the assumptions of smooth movement on which computer tracking algorithms are typically based. Systems such as TRAKUS™, SoccerMan™, TRAKPERFORMANCE™, Pfinder™ and Prozone™ all provide extrinsic feedback information to coaches and athletes. However, commercial tracking systems still require a fair amount of operator intervention to process the data after capture and are often limited by the restricted capture environments that can be used and the necessity for individuals to wear tracking devices. Whilst some online tracking systems alleviate the requirements of manual tracking, to our knowledge a completely automated system suitable for sports performance is not yet commercially available.
Automatic motion tracking has been used successfully in other domains outside of elite sport performance, notably for surveillance in the military and security industries, where automatic recognition of moving objects is achievable because the objects need not be identified. The current challenge is to obtain appropriate video sequences that can robustly identify and label people over time, in a cluttered environment containing multiple interacting people. This problem is often compounded by the quality of video capture, the relative size and occlusion frequency of people, and also changes in illumination. Potential applications of an automated motion detection system are offered, such as: planning tactics and strategies; measuring team organisation; providing meaningful kinematic feedback; and objective measures of intervention effectiveness in team sports, which could benefit coaches, players, and sports scientists.
TL;DR: The high-resolution velocity estimates used for restoring the image are obtained by global motion estimation, Bezier curve fitting, and local motion estimation without resort to correspondence identification.
Abstract: Due to the sequential-readout structure of the complementary metal-oxide-semiconductor (CMOS) image sensor array, each scanline of the acquired image is exposed at a different time, resulting in the so-called electronic rolling shutter effect, which induces geometric image distortion when the object or the video camera moves during image capture. In this paper, we propose an image processing technique using a planar motion model to address the problem. Unlike previous methods that involve complex 3-D feature correspondences, we present a simple approach to the analysis of inter- and intraframe distortions. The high-resolution velocity estimates used for restoring the image are obtained by global motion estimation, Bezier curve fitting, and local motion estimation without resort to correspondence identification. Experimental results demonstrate the effectiveness of the algorithm.
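The geometry of the distortion can be illustrated in its simplest case. If the scene moves horizontally at a constant image velocity v and consecutive rows are read out t_line seconds apart, row r is captured r·t_line after row 0, so its content is displaced by v·r·t_line pixels. The sketch below undoes that skew by shifting each row back. It assumes a constant, purely horizontal, known velocity; the paper instead estimates high-resolution velocities with global motion estimation, Bezier curve fitting and local motion estimation, none of which is attempted here, and the function name is hypothetical.

```python
def correct_rolling_shutter(image, v_px_per_s, t_line_s, fill=0):
    """Undo the skew caused by constant horizontal motion under a top-to-bottom rolling shutter.

    image: 2-D list of pixel values; row r was read out r * t_line_s after row 0.
    """
    corrected = []
    for r, row in enumerate(image):
        shift = round(v_px_per_s * r * t_line_s)  # displacement accumulated by the time row r is read
        w = len(row)
        new_row = [fill] * w
        for c in range(w):
            src = c + shift  # content at column c (row-0 time) was recorded at column c + shift
            if 0 <= src < w:
                new_row[c] = row[src]
        corrected.append(new_row)
    return corrected
```

For a vertical bar skewed by one pixel per row (for example v = 1000 px/s with a hypothetical 1 ms line time), every row of the corrected image has the bar back in column 0.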
TL;DR: Two different types of visual activity analysis modules based on vehicle tracking are presented, adding real-time situational awareness to highway monitoring for high-level activity and behavior analysis.
Abstract: This paper presents two different types of visual activity analysis modules based on vehicle tracking. The highway monitoring module accurately classifies vehicles into eight different types and collects traffic flow statistics by leveraging tracking information. These statistics are continuously accumulated to maintain daily highway models that are used to categorize traffic flow in real time. The path modeling block is a more general analysis tool that learns the normal motions encountered in a scene in an unsupervised fashion. The spatiotemporal motion characteristics of these motion paths are encoded by a hidden Markov model. With the path definitions, abnormal trajectories are detected and future intent is predicted. These modules add real-time situational awareness to highway monitoring for high-level activity and behavior analysis.
TL;DR: This paper presents an approach that detects abnormal situations in crowded scenes by analyzing overall motion rather than tracking subjects one by one, and reports results on detecting collapsing events in real videos of airport escalator exits.
Abstract: Video-surveillance systems are becoming more and more autonomous in the detection and the reporting of abnormal events. In this context, this paper presents an approach to detect abnormal situations in crowded scenes by analyzing the motion aspect instead of tracking subjects one by one. The proposed approach estimates sudden changes and abnormal motion variations of a set of points of interest (POI). The number of tracked POIs is reduced using a mask that corresponds to hot areas of the constructed motion heat map. The approach detects events where local motion variation is large compared to previous events. Optical flow techniques are used to extract information such as density, direction and velocity. To demonstrate the value of the approach, we present results on the detection of collapsing events in real videos of airport escalator exits.
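The two ingredients described above, a motion heat map that restricts analysis to hot areas and a test for sudden motion variation relative to past behaviour, can be sketched as follows. This is a simplified illustration, not the authors' detector: it assumes per-frame optical-flow magnitude grids are already available, and the function names and the `fraction`, `k` and `warmup` parameters are hypothetical choices.

```python
import statistics


def heat_map(magnitude_frames):
    """Accumulate per-pixel flow magnitude over a list of 2-D frames."""
    h, w = len(magnitude_frames[0]), len(magnitude_frames[0][0])
    heat = [[0.0] * w for _ in range(h)]
    for frame in magnitude_frames:
        for r in range(h):
            for c in range(w):
                heat[r][c] += frame[r][c]
    return heat


def hot_mask(heat, fraction=0.5):
    """Keep pixels whose accumulated motion is at least `fraction` of the peak."""
    peak = max(max(row) for row in heat)
    return [[v >= fraction * peak for v in row] for row in heat]


def masked_mean(frame, mask):
    """Mean flow magnitude over the hot area of one frame."""
    vals = [v for row_v, row_m in zip(frame, mask) for v, m in zip(row_v, row_m) if m]
    return sum(vals) / len(vals) if vals else 0.0


def sudden_changes(series, k=3.0, warmup=5):
    """Flag time steps whose frame-to-frame change exceeds mean + k*std of earlier changes."""
    flags, diffs = [], []
    for t in range(1, len(series)):
        d = abs(series[t] - series[t - 1])
        if len(diffs) >= warmup:
            mu = statistics.fmean(diffs)
            sd = statistics.pstdev(diffs)
            if d > mu + k * sd:
                flags.append(t)
        diffs.append(d)
    return flags
```

Applied to a per-frame series of `masked_mean` values such as `[1.0, 1.2, 1.0, 1.3, 1.1, 1.2, 1.0, 6.0, 1.0]`, only the jump at index 7 is flagged; the return to baseline at index 8 is absorbed because the history now contains the large change.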
TL;DR: The authors propose an optic-flow-driven scheme that segments the rear view and tracks the overtaking vehicle using a camera placed on top of the side mirror, and present a validation benchmark to evaluate the viability and robustness of the system.
Abstract: Overtaking and lane changing are very dangerous driving maneuvers due to possible driver distraction and blind spots. We propose an aid system based on image processing to help the driver in these situations. The main purpose of an overtaking monitoring system is to segment the rear view and track the overtaking vehicle. We address this task with an optic-flow-driven scheme, focusing on the visual field in the side mirror by placing a camera on top of it. When driving a car, the ego-motion optic-flow pattern is very regular, i.e., all the static objects (such as trees, buildings on the roadside, or landmarks) move backwards. An overtaking vehicle, on the other hand, generates an optic-flow pattern in the opposite direction, i.e., moving forward toward the vehicle. This well-structured motion scenario facilitates the segmentation of regular motion patterns that correspond to the overtaking vehicle. Our approach is based on two main processing stages: First, the computation of optical flow in real time uses a customized digital signal processor (DSP) particularly designed for this task and, second, the tracking stage itself, based on motion pattern analysis, which we address using a standard processor. We present a validation benchmark scheme to evaluate the viability and robustness of the system using a set of overtaking vehicle sequences to determine a reliable vehicle-detection distance.
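The segmentation cue described above, background flow pointing one way and an overtaking vehicle's flow pointing the other, can be sketched on a precomputed horizontal flow field. The sketch assumes the camera is oriented so that the ego-motion flow of static scenery has a known sign in image x (`ego_sign`, taken here as negative) and that `min_speed` is a hypothetical noise threshold; the real system computes flow on a customized DSP and uses a more elaborate motion-pattern tracking stage.

```python
def overtaking_mask(flow_u, ego_sign=-1, min_speed=0.5):
    """Mark pixels whose horizontal flow opposes the ego-motion direction by more than min_speed.

    flow_u: 2-D list of horizontal flow components in pixels per frame.
    """
    return [[u * ego_sign < -min_speed for u in row] for row in flow_u]


def bounding_box(mask):
    """(top, left, bottom, right) of the marked pixels, or None if nothing is marked."""
    pts = [(r, c) for r, row in enumerate(mask) for c, on in enumerate(row) if on]
    if not pts:
        return None
    rows = [r for r, _ in pts]
    cols = [c for _, c in pts]
    return min(rows), min(cols), max(rows), max(cols)
```

The bounding box of the opposing-flow region gives a crude vehicle hypothesis that a tracking stage could then follow from frame to frame; weak forward flow below `min_speed` is ignored as noise.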
TL;DR: The authors develop automated activity analysis and summarization for eldercare video monitoring, including an adaptive learning method that estimates the physical location and moving speed of a person from a single camera view without calibration.
Abstract: In this work, we study how continuous video monitoring and intelligent video processing can be used in eldercare to assist the independent living of elders and to improve the efficiency of eldercare practice. More specifically, we develop an automated activity analysis and summarization for eldercare video monitoring. At the object level, we construct an advanced silhouette extraction, human detection and tracking algorithm for indoor environments. At the feature level, we develop an adaptive learning method to estimate the physical location and moving speed of a person from a single camera view without calibration. At the action level, we explore hierarchical decision tree and dimension reduction methods for human action recognition. We extract important ADL (activities of daily living) statistics for automated functional assessment. To test and evaluate the proposed algorithms and methods, we deployed the camera system in a real living environment for about a month and collected more than 200 hours (in excess of 600 GB) of activity monitoring videos. Our extensive tests over these massive video datasets demonstrate that the proposed automated activity analysis system is very efficient.
TL;DR: The results suggest that the proposed method is robust with respect to multi-view action recognition, scale and phase variations, and invariant analysis of silhouettes.
TL;DR: This paper presents a framework for learning a compact representation of primitive actions that can be used for video obtained from a single camera for simultaneous action recognition and viewpoint estimation and shows recognition rates on a publicly available data set previously only achieved using multiple simultaneous views.
Abstract: Researchers are increasingly interested in providing video-based, view-invariant action recognition for human motion. Addressing this problem will lead to more accurate modeling and analysis of the type of unconstrained video commonly collected in the areas of athletics and medicine. Previous viewpoint-invariant methods use multiple cameras in both the training and testing phases of action recognition or require storing many examples of a single action from multiple viewpoints. In this paper, we present a framework for learning a compact representation of primitive actions (e.g., walk, punch, kick, sit) that can be used for video obtained from a single camera for simultaneous action recognition and viewpoint estimation. Using our method, which models the low-dimensional structure of these actions relative to viewpoint, we show recognition rates on a publicly available data set previously only achieved using multiple simultaneous views.
TL;DR: A stereoscopic video generation method using motion analysis which converts motion into disparity values and considers multi-user conditions and the characteristics of the display device is proposed.
Abstract: Stereoscopic video generation methods can produce stereoscopic content from conventional video filmed with monoscopic cameras. In this paper, we propose a stereoscopic video generation method using motion analysis which converts motion into disparity values and considers multi-user conditions and the characteristics of the display device. The field of view and the maximum and minimum disparity values were calculated in the stereoscopic display characterization stage and were then applied to various types of 3D displays. After motion estimation, we used three cues to decide the scale factor of motion-to-disparity conversion. These cues were the magnitude of motion, camera movements and scene complexity. A subjective evaluation showed that the proposed method generated more satisfactory video sequences.
TL;DR: An image-based method for vehicle speed detection is presented, using a single image captured with vehicle motion for speed measurement according to the imaging geometry, camera pose, and blur extent in the image.
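Under strong simplifying assumptions the relationship between blur extent and speed reduces to a single formula. For a pinhole camera with focal length f (in pixels), a vehicle moving parallel to the image plane at distance Z, and exposure time T, a blur streak of b pixels corresponds to a scene displacement of b·Z/f, so the speed is v = b·Z / (f·T). The sketch below encodes only this fronto-parallel case; the method summarized above additionally accounts for camera pose and imaging geometry, which are not modeled here, and the function names are hypothetical.

```python
def speed_from_blur(blur_px: float, distance_m: float, focal_px: float, exposure_s: float) -> float:
    """Speed in m/s, assuming motion parallel to the image plane of a pinhole camera."""
    displacement_m = blur_px * distance_m / focal_px  # back-project the blur length into the scene
    return displacement_m / exposure_s


def mps_to_kmh(v: float) -> float:
    """Convert metres per second to kilometres per hour."""
    return v * 3.6
```

For instance, a 20-pixel blur at 10 m with a 1000-pixel focal length and a 1/100 s exposure corresponds to 0.2 m travelled in 0.01 s, i.e. 20 m/s or 72 km/h.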
TL;DR: In this paper, model-based approaches for real-time 3-D soccer ball tracking are proposed, using image sequences from multiple fixed cameras as input, and incorporating motion cues and temporal hysteresis thresholding in ball detection and employing phase-specific models to estimate ball trajectories.
Abstract: In this paper, model-based approaches for real-time 3-D soccer ball tracking are proposed, using image sequences from multiple fixed cameras as input. The main challenges include filtering false alarms, tracking through missing observations, and estimating 3-D positions from single or multiple cameras. The key innovations are: 1. incorporating motion cues and temporal hysteresis thresholding in ball detection; 2. modeling each ball trajectory as curve segments in successive virtual vertical planes so that the 3-D position of the ball can be determined from a single camera view; and 3. introducing four motion phases (rolling, flying, in possession, and out of play) and employing phase-specific models to estimate ball trajectories, which allows high-level semantics to be applied in low-level tracking. In addition, unreliable or missing ball observations are recovered using spatio-temporal constraints and temporal filtering. The system accuracy and robustness are evaluated by comparing the estimated ball positions and phases with manual ground-truth data of real soccer sequences.
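Temporal hysteresis thresholding, the first ingredient listed above, can be sketched on a per-frame detection score: a candidate is accepted only once its score reaches a high threshold, and is then kept while the score stays above a lower one. This suppresses isolated weak responses without dropping a track at every weak frame. The sketch is a causal, single-candidate simplification with hypothetical `low` and `high` values, not the authors' ball detector.

```python
def hysteresis_segments(scores, low, high):
    """Causal temporal hysteresis over a score sequence.

    A run starts when a score reaches `high` and continues while scores stay at or above `low`.
    Returns (start, end) index pairs with `end` exclusive.
    """
    segments = []
    start = None
    for t, s in enumerate(scores):
        if start is None:
            if s >= high:
                start = t  # strong evidence: open a new run
        elif s < low:
            segments.append((start, t))  # evidence fell below the weak threshold: close the run
            start = None
    if start is not None:
        segments.append((start, len(scores)))  # run still open at the end of the sequence
    return segments
```

With `low=0.3` and `high=0.8`, the scores `[0.1, 0.5, 0.9, 0.6, 0.4, 0.2, 0.95, 0.7]` yield the runs `(2, 5)` and `(6, 8)`: frames 3 and 4 are kept despite weaker scores, while frame 1 (0.5) never opens a run.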
TL;DR: A novel approach for estimating the global motion between frames uses a curve warping technique known as dynamic time warping and guarantees robustness even in the presence of sharp illumination changes and moving objects.
Abstract: The widespread diffusion of hand-held devices with video recording capabilities requires the adoption of reliable digital stabilization methods so that the acquired sequences can be enjoyed without disturbing jerkiness. To effectively remove unwanted camera movements, an estimate of the global motion between adjacent frames is necessary. This paper presents a novel approach for estimating the global motion between frames using a curve warping technique known as dynamic time warping. The proposed algorithm guarantees robustness even in the presence of sharp illumination changes and moving objects.
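The abstract does not specify which curves are warped. A common choice, assumed here, is the integral projection of each frame (column sums, for horizontal motion). The sketch aligns the projections of two frames with classic dynamic time warping and takes the median index offset along the warping path as the global shift, since the median reduces the influence of the unmatched border regions. The function names are hypothetical, and this illustrates DTW-based shift estimation in general rather than the paper's exact algorithm.

```python
from statistics import median


def column_projection(frame):
    """Integral projection: sum of each column of a 2-D intensity grid."""
    return [sum(col) for col in zip(*frame)]


def dtw_path(a, b):
    """Classic O(n*m) dynamic time warping; returns the optimal path as (i, j) index pairs."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end; on ties, min() on the tuples prefers the diagonal step
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
        path.append((i - 1, j - 1))
    path.reverse()
    return path


def estimate_shift(prev_proj, cur_proj):
    """Global shift (in samples) of cur relative to prev: median offset along the DTW path."""
    return median([j - i for i, j in dtw_path(prev_proj, cur_proj)])
```

For a projection profile shifted right by two samples, for example `[0, 0, 1, 5, 9, 5, 1, 0, 0, 0, 0, 0]` versus `[0, 0, 0, 0, 1, 5, 9, 5, 1, 0, 0, 0]`, `estimate_shift` returns 2; the stabilizer would then compensate by that amount.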
TL;DR: Results on the compression-classification tradeoff would provide valuable insight into jointly designing a system that performs video encoding at the camera front-end and action classification at the processing back-end.
Abstract: We present a compressed domain scheme that is able to recognize and localize actions at high speeds. The recognition problem is posed as performing an action video query on a test video sequence. Our method is based on computing motion similarity using compressed domain features which can be extracted with low complexity. We introduce a novel motion correlation measure that takes into account differences in motion directions and magnitudes. Our method is appearance-invariant, requires no prior segmentation, alignment or stabilization, and is able to localize actions in both space and time. We evaluated our method on a benchmark action video database consisting of six actions performed by 25 people under three different scenarios. Our proposed method achieved a classification accuracy of 90%, comparing favorably with existing methods in action classification accuracy, and is able to localize a template video of 80 x 64 pixels with 23 frames in a test video of 368 x 184 pixels with 835 frames in just 11 s, easily outperforming other methods in localization speed. We also perform a systematic investigation of the effects of various encoding options on our proposed approach. In particular, we present results on the compression-classification tradeoff, which would provide valuable insight into jointly designing a system that performs video encoding at the camera front-end and action classification at the processing back-end.
TL;DR: A fast inter mode decision is proposed that selects the best prediction mode using the spatial continuity of the motion field generated by motion vectors from 4×4 motion estimation; it saves more than 50% of the computational complexity.
Abstract: Variable-block-size motion estimation with multiple reference frames has been adopted by the new H.264 video coding standard. It achieves significantly higher coding efficiency than coding a macroblock (MB) at a regular size with a single reference frame, but it makes motion estimation at the encoder computationally expensive. Rate-distortion optimized (RDO) decision is a powerful method for choosing the best coding mode among all combinations of block sizes and reference frames, but it requires an extremely high amount of computation. In this paper, a fast inter mode decision is proposed that selects the best prediction mode using the spatial continuity of the motion field generated by motion vectors from 4×4 motion estimation. The motion continuity of each MB is determined from a motion edge map detected with the Sobel operator. Based on the motion continuity of an MB, only a small number of block sizes are selected for the motion estimation and RDO computation. Simulation results show that the algorithm can save more than 50% of the computational complexity with negligible loss of coding efficiency.
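The mode-decision idea can be sketched as follows: apply the Sobel operator to the horizontal and vertical components of the 4×4 motion-vector field (so each 16×16 MB spans a 4×4 patch of vectors), and if no motion edge falls inside a macroblock, evaluate only the large partition; otherwise evaluate smaller partitions too. The threshold, the function names and both candidate-mode lists are hypothetical; the paper's actual mapping from motion continuity to block sizes may differ.

```python
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_l1(field):
    """|Gx| + |Gy| of a 2-D grid; border cells are left at 0."""
    h, w = len(field), len(field[0])
    out = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = sum(SOBEL_X[i][j] * field[r + i - 1][c + j - 1] for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * field[r + i - 1][c + j - 1] for i in range(3) for j in range(3))
            out[r][c] = abs(gx) + abs(gy)
    return out


def motion_edge_map(mvx, mvy):
    """Combine the Sobel responses of both motion-vector components."""
    ex, ey = sobel_l1(mvx), sobel_l1(mvy)
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(ex, ey)]


def candidate_modes(edge_map, mb_row, mb_col, threshold=4):
    """Hypothetical rule: continuous motion inside the MB means only 16x16 is tried."""
    r0, c0 = 4 * mb_row, 4 * mb_col  # each MB covers 4x4 entries of the 4x4-block vector field
    strongest = max(edge_map[r][c] for r in range(r0, r0 + 4) for c in range(c0, c0 + 4))
    if strongest < threshold:
        return ["16x16"]
    return ["16x16", "16x8", "8x16", "8x8"]
```

A uniform motion field produces no edges and limits the search to 16×16, while a motion boundary (for example a moving object's edge) inside the MB re-enables the smaller partitions, which is where the complexity savings come from.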
TL;DR: It is demonstrated that the proposed focus-of-attention strategy reduces the false positives of an otherwise identical monocular pedestrian recognition system by a factor of two, at equal detection rates.
Abstract: This paper presents a novel focus-of-attention strategy for monocular pedestrian recognition. It uses Bayes' rule to estimate the posterior for the presence of a pedestrian in a certain (rectangular) image region, based on motion parallax features. This posterior is used as a parameter to control the number of regions of interest (ROIs) that are passed to subsequent verification stages. For the latter, we use a state-of-the-art pedestrian recognition scheme which consists of multiple modules in a cascade architecture. We obtain optimized settings for the control parameters of the combined cascade system by a sequential ROC convex hull technique. Experiments are conducted on image data captured from a moving vehicle in an urban environment. We demonstrate that the proposed focus-of-attention strategy reduces the false positives of an otherwise identical monocular pedestrian recognition system by a factor of two, at equal detection rates. The overall system maintains processing rates close to real-time.
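The role of Bayes' rule in the focus-of-attention stage can be sketched for the two-class case. Given likelihoods of a region's motion-parallax features under the pedestrian and background hypotheses and a prior, the posterior is P(ped | f) = p(f | ped) P(ped) / (p(f | ped) P(ped) + p(f | bg) (1 - P(ped))), and thresholding it controls how many ROIs reach the verification cascade. The likelihood values, ROI tuples, threshold and function names are hypothetical inputs; how the paper models the likelihoods is not shown here.

```python
def posterior(lik_ped: float, lik_bg: float, prior_ped: float) -> float:
    """Bayes' rule for the two-class case: P(pedestrian | features)."""
    joint_ped = lik_ped * prior_ped
    joint_bg = lik_bg * (1.0 - prior_ped)
    return joint_ped / (joint_ped + joint_bg)


def select_rois(rois, prior_ped, threshold):
    """Keep the ids of ROIs whose posterior reaches the threshold.

    rois: iterable of (roi_id, p(f | pedestrian), p(f | background)) tuples.
    """
    return [roi_id for roi_id, lp, lb in rois if posterior(lp, lb, prior_ped) >= threshold]
```

Lowering the threshold passes more ROIs to the cascade (more recall, more verification cost); raising it passes fewer, which is the trade-off the paper tunes with its ROC convex hull technique.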
TL;DR: The paper addresses the design, methodology and validation of a novel robotic gait simulator (RGS); using the same prosthetic foot as the subject, the appropriate vertical ground reaction force was realized with a proportional iterative learning controller.
Abstract: We have developed a robotic gait simulator (RGS) by leveraging a 6-degree-of-freedom parallel robot, with the goal of overcoming three significant challenges of gait simulation, including: 1) operating at near physiologically correct velocities; 2) inputting full-scale ground reaction forces; and 3) simulating motion in all three planes (sagittal, coronal and transverse). The robot will eventually be employed with cadaveric specimens, but as a means of exploring the capability of the system, we have first used it with a prosthetic foot. Gait data were recorded from one transtibial amputee using a motion analysis system and a force plate. Using the same prosthetic foot as the subject, the RGS accurately reproduced the recorded kinematics and kinetics, and the appropriate vertical ground reaction force was realized with a proportional iterative learning controller. After six gait iterations the controller reduced the root mean square (RMS) error between the simulated and in situ vertical ground reaction force to 35 N during a 1.5 s simulation of the stance phase of gait with a prosthetic foot. This paper addresses the design, methodology and validation of the novel RGS.
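A proportional (P-type) iterative learning control law updates the whole input profile between repetitions as u_{k+1}(t) = u_k(t) + L·e_k(t), where e_k is the tracking error of trial k. The sketch below demonstrates that update on a hypothetical first-order discrete plant standing in for the robot and foot-ground contact; the plant coefficients, the gain L = 0.8, the function names and the reference profile are assumptions, not values from the paper.

```python
import math


def simulate(u, a=0.2, b=1.0):
    """Hypothetical plant: y[t] = a*y[t-1] + b*u[t], with y[-1] = 0."""
    y, prev = [], 0.0
    for ut in u:
        prev = a * prev + b * ut
        y.append(prev)
    return y


def rms(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))


def p_type_ilc(reference, trials=6, gain=0.8):
    """Run P-type ILC; returns the final input and the RMS error of the initial run and each trial."""
    u = [0.0] * len(reference)
    error = [r - y for r, y in zip(reference, simulate(u))]
    history = [rms(error)]
    for _ in range(trials):
        u = [ut + gain * et for ut, et in zip(u, error)]  # learning update from the last trial's error
        error = [r - y for r, y in zip(reference, simulate(u))]
        history.append(rms(error))
    return u, history
```

With these assumed values, |1 - L·b| = 0.2 and the plant's memory is short, so the error contracts quickly: after six trials on a half-sine reference the RMS error is a small fraction of the initial one. Convergence on real hardware depends on the true dynamics and on the choice of L.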
TL;DR: This paper uses an adaptive background-foreground separation technique to extract motion information and generate silhouettes (foreground) from the input videos, and derives directionality-based feature vectors from the silhouette contours and uses the distinct data distribution of directional vectors in a vector space for clustering and recognition.
Abstract: Recent advances in computer vision and pattern recognition have fueled numerous initiatives that aim to intelligently recognize human activities. In this paper, we propose an algorithm for nonintrusive human activity recognition. We use an adaptive background-foreground separation technique to extract motion information and generate silhouettes (foreground) from the input videos. We then derive directionality-based feature vectors (directional vectors) from the silhouette contours and use the distinct data distribution of directional vectors in a vector space for clustering and recognition. We also exploit the dynamic characteristic of human motion in order to smooth decisions over time and reduce errors in activity recognition. Our approach is monocular, tolerant to moderate view changes, and can be applied to both frontal and lateral views of most activities. Experiments with short and long video sequences show robust recognition under conditions of varying view angles, zoom depths, backgrounds, and frame rates.
TL;DR: In this paper, the authors present the design and validation of a three-segment human body model aimed at reconstructing the motion trajectories of the shank, thigh and HAT (head-arms-trunk) segments in sit-to-stand motion using low-cost inertial sensors.
TL;DR: A classification of all the motion segmentation techniques into different categories according to their main principle and features is proposed, pointing out their strengths and weaknesses and suggesting further research directions.
Abstract: Motion segmentation is an essential process for many computer vision algorithms. During the last decade, a large body of work has tried to tackle this challenge; however, the performance of most methods still falls far behind human perception. In this paper the motion segmentation problem is studied, analyzing and reviewing the most important and newest techniques. We propose a classification of all these techniques into different categories according to their main principle and features. Moreover, we point out their strengths and weaknesses and finally we suggest further research directions.
TL;DR: In this article, a high-speed biplane radiography system for in-vivo assessment of joint function is provided, which can acquire stereo-pair radiographic images of nearly any motion or joint at rates from 30 to 4000 frames per second.
Abstract: A high-speed biplane radiography system for in-vivo assessment of joint function is provided. The system can acquire stereo-pair radiographic images of nearly any motion or joint at rates from 30 to 4000 frames per second. The radiographic equipment can be mounted in a gantry system that provides sufficient positioning flexibility for imaging different joints of a subject's body, along with an imaging area large enough for a variety of dynamic activities (e.g., walking, running, jumping, throwing, etc.). Three-dimensional (3D) bone positions can be determined using software for matching the bones in each X-ray image with 3D models developed from subject-specific CT (computed tomography) scans. This system can provide accurate (e.g., ±0.1 mm) assessment and direct 3D visualization of dynamic joint function, and can overcome limitations of conventional gait or motion analysis.
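The system described above recovers 3D bone poses by matching X-ray images against CT-derived models, which is more involved than point reconstruction. The underlying stereo geometry can nonetheless be illustrated with linear (DLT-style) triangulation of a single landmark seen in both views, assuming each X-ray view is modeled as a calibrated 3×4 perspective projection matrix (a point X-ray source projects perspectively). This is a different, simpler technique than the model-based matching in the abstract, and the matrices, coordinates and function names are hypothetical.

```python
def solve3(m, y):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    n = 3
    aug = [row[:] + [y[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def triangulate(p1, x1, p2, x2):
    """Least-squares 3D point from two 3x4 projection matrices and the pixel coordinates in each view."""
    rows = []
    for p, (u, v) in ((p1, x1), (p2, x2)):
        # From u = (P[0].X)/(P[2].X) and v = (P[1].X)/(P[2].X): (u*P[2] - P[0]).X = 0, etc.
        rows.append([u * p[2][k] - p[0][k] for k in range(4)])
        rows.append([v * p[2][k] - p[1][k] for k in range(4)])
    # Inhomogeneous form with X = (x, y, z, 1): A[:, :3] X = -A[:, 3]
    a = [r[:3] for r in rows]
    b = [-r[3] for r in rows]
    # Normal equations (A^T A) X = A^T b
    ata = [[sum(a[i][r] * a[i][c] for i in range(4)) for c in range(3)] for r in range(3)]
    atb = [sum(a[i][r] * b[i] for i in range(4)) for r in range(3)]
    return solve3(ata, atb)
```

With noise-free correspondences the four equations are consistent and the least-squares solution is exact; with real detections the normal equations give the point that best fits both rays in the algebraic sense.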
TL;DR: The originality of the method lies in characterizing each sequence globally to enhance robustness: a binary volume is obtained, composed of all the silhouettes of the moving person.
TL;DR: Zhang et al. proposed a saliency map model that skips unwanted areas and attends to desired areas, reflecting human preference and refusal in subsequent visual search processes.
Abstract: We propose new integrated saliency map and selective motion analysis models partly inspired by a biological visual attention mechanism. The proposed models consider not only binocular stereopsis to identify a final attention area so that the system focuses on the closer area as in human binocular vision, based on the single eye alignment hypothesis, but also both the static and dynamic features of an input scene. Moreover, the proposed saliency map model includes an affective computing process that skips an unwanted area and pays attention to a desired area, which reflects the human preference and refusal in subsequent visual search processes. In addition, we show the effectiveness of considering the symmetry feature determined by a neural network and an independent component analysis (ICA) filter which are helpful to construct an object preferable attention model. Also, we propose a selective motion analysis model by integrating the proposed saliency map with a neural network for motion analysis. The neural network for motion analysis responds selectively to rotation, expansion, contraction and planar motion of the optical flow in a selected area. Experiments show that the proposed model can generate plausible scan paths and selective motion analysis results for natural input scenes.
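The four motion types to which the motion-analysis network responds (planar translation, expansion, contraction and rotation) can be made concrete with a simple analytical decomposition of the flow inside the selected area: the mean flow gives the planar component, and the average radial and tangential components of the remaining flow, measured about the area's center, give expansion or contraction and rotation. This illustrates the quantities involved and is not the authors' neural network; the function name and its conventions (a y-up frame, counterclockwise rotation positive) are assumptions.

```python
import math


def flow_components(points, flows, center):
    """Return (tx, ty, expansion, rotation) for flow vectors `flows` sampled at `points`.

    expansion > 0 means expansion and < 0 contraction; rotation > 0 means counterclockwise
    in a y-up frame (the sign flips for y-down image coordinates).
    """
    n = len(flows)
    tx = sum(u for u, _ in flows) / n
    ty = sum(v for _, v in flows) / n
    cx, cy = center
    radial = tangential = 0.0
    used = 0
    for (x, y), (u, v) in zip(points, flows):
        rx, ry = x - cx, y - cy
        r = math.hypot(rx, ry)
        if r == 0.0:
            continue  # radial and tangential directions are undefined at the center
        du, dv = u - tx, v - ty  # residual after removing the planar translation
        radial += (du * rx + dv * ry) / r
        tangential += (rx * dv - ry * du) / r
        used += 1
    if used == 0:
        return tx, ty, 0.0, 0.0
    return tx, ty, radial / used, tangential / used
```

On four points around the origin, flow equal to the position vectors yields pure expansion, flow (-y, x) yields pure rotation, and a constant flow yields pure translation with zero expansion and rotation.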
TL;DR: The proposed integrated saliency map model includes an affective computing process that skips an unwanted area and pays attention to a desired area, which reflects the human preference and refusal in subsequent visual search processes.
TL;DR: This work addresses the drift problem for the challenging task of human motion capture and tracking in the presence of multiple moving objects where the error accumulation becomes even more problematic due to occlusions and proposes an analysis-by-synthesis framework for articulated models.
Abstract: Model-based 3D trackers estimate the position, rotation, and joint angles of a given model from video data of one or multiple cameras. They often rely on image features that are tracked over time, but the accumulation of small errors results in a drift away from the target object. In this work, we address the drift problem for the challenging task of human motion capture and tracking in the presence of multiple moving objects where the error accumulation becomes even more problematic due to occlusions. To this end, we propose an analysis-by-synthesis framework for articulated models. It combines the complementary concepts of patch-based and region-based matching to track both structured and homogeneous body parts. The performance of our method is demonstrated for rigid bodies, body parts, and full human bodies where the sequences contain fast movements, self-occlusions, multiple moving objects, and clutter. We also provide a quantitative error analysis and comparison with other model-based approaches.
TL;DR: This paper proposes a dense motion estimator dedicated to the extraction of 3-D wind fields characterizing the dynamics of a layered atmosphere, using a multilayer model that describes a stack of dynamic horizontal layers of evolving thickness.
Abstract: In this paper, we address the problem of estimating 3-D motions of a stratified atmosphere from satellite image sequences. The analysis of 3-D atmospheric fluid flows is hampered by the incomplete observation of atmospheric layers due to the sparsity of cloud systems, which makes the estimation of dense atmospheric motion fields from satellite image sequences very difficult. The recovery of the vertical component of fluid motion from a monocular sequence of image observations is a very challenging problem for which no solution exists in the literature. Based on a physically sound vertical decomposition of the atmosphere into cloud layers of different altitudes, we propose a dense motion estimator dedicated to the extraction of 3-D wind fields characterizing the dynamics of a layered atmosphere. Wind estimation is performed over the complete 3-D space, using a multilayer model describing a stack of dynamic horizontal layers of evolving thickness, interacting at their boundaries via vertical winds. The efficiency of our approach is demonstrated on synthetic and real sequences.
TL;DR: This paper presents a semiautomatic motion-labeling scheme based on force-based motion segmentation and learning-based action classification, a scheme for capturing the interactions between two players, and a method for synthesizing novel coupled motions guided by a motion transition model.
Abstract: In this paper, we deal with the problem of synthesizing novel motions of standing-up martial arts such as kickboxing, karate, and taekwondo performed by a pair of humanlike characters while reflecting their interactions. Adopting an example-based paradigm, we address three nontrivial issues embedded in this problem: motion modeling, interaction modeling, and motion synthesis. For the first issue, we present a semiautomatic motion-labeling scheme based on force-based motion segmentation and learning-based action classification. We also construct a pair of motion transition graphs, each of which represents an individual motion stream. For the second issue, we propose a scheme for capturing the interactions between two players. A dynamic Bayesian network is adopted to build a motion transition model on top of the coupled motion transition graph that is constructed from an example motion stream. For the last issue, we provide a scheme for synthesizing a novel sequence of coupled motions, guided by the motion transition model. Although the focus of the present work is on martial arts, we believe that the framework of the proposed approach can be extended to other two-player motions as well.
TL;DR: This paper addresses the problem of associating trajectories across multiple moving airborne cameras with geometric constraints on the relationship between the motion of each object across cameras without assuming any prior calibration information, and shows that, under special conditions, trajectories interrupted due to occlusion or missing detections can be repaired.
Abstract: A camera mounted on an aerial vehicle provides an excellent means to monitor large areas of a scene. Utilizing several such cameras on different aerial vehicles allows further flexibility in terms of increased visual scope and in the pursuit of multiple targets. In this paper, we address the problem of associating trajectories across multiple moving airborne cameras. We exploit geometric constraints on the relationship between the motion of each object across cameras without assuming any prior calibration information. Since multiple cameras exist, ensuring coherency in association is an essential requirement, e.g., that transitive closure is maintained between more than two cameras. To ensure such coherency, we pose the problem of maximizing the likelihood function as a k-dimensional matching and use an approximation to find the optimal assignment of association. Using the proposed error function, canonical trajectories of each object and optimal estimates of intercamera transformations (in a maximum likelihood sense) are computed. Finally, we show that, as a result of associating trajectories across the cameras, under special conditions, trajectories interrupted due to occlusion or missing detections can be repaired. Results are shown on a number of real and controlled scenarios with multiple objects observed by multiple cameras, validating our qualitative models, and, through simulation, quantitative performance is also reported.
TL;DR: This paper introduces the proof of concept for a new non-invasive FES-assisted rehabilitation system for the upper limb, called smartFES (sFES). In this system, the electrical stimulation is controlled by a biologically inspired neural inverse dynamics model, which is fed by the kinematic information associated with the execution of a planar goal-oriented movement.
Abstract: Restoration of upper limb movements in subjects recovering from stroke is an essential keystone in rehabilitative practices. Rehabilitation of arm movements, in fact, is usually far more difficult than that of the lower extremities. For these reasons, researchers are developing new methods and technologies so that the rehabilitative process can be more accurate, rapid and easily accepted by the patient. This paper introduces the proof of concept for a new non-invasive FES-assisted rehabilitation system for the upper limb, called smartFES (sFES). In this system, the electrical stimulation is controlled by a biologically inspired neural inverse dynamics model, which is fed by the kinematic information associated with the execution of a planar goal-oriented movement. More specifically, this work details two steps of the proposed system: an ad hoc markerless motion analysis algorithm for the estimation of kinematics, and a neural controller that drives a synthetic arm. The vision of the entire system is to acquire kinematics from the analysis of video sequences during planar arm movements. These kinematics are then used together with a neural inverse dynamics model that provides the patient with the electrical stimulation patterns needed to perform the movement with the assisted limb. The markerless motion tracking system aims at localizing and monitoring the arm movement by tracking its silhouette. It uses a specifically designed motion estimation method, which we named Neural Snakes, that predicts the arm contour deformation as a first step of a silhouette extraction algorithm. The starting and ending points of the arm movement feed an Artificial Neural Controller incorporating Hill's muscle model. This controller solves the inverse dynamics to obtain the FES patterns needed to move a simulated arm from the starting point to the desired point.
Both the position error with respect to the requested arm trajectory and a comparison between curvature factors have been calculated in order to determine the accuracy of the system. The proposed method has been tested on real data acquired during the execution of planar goal-oriented arm movements. The main results concern the capability of the system to accurately recreate the movement task by providing a synthetic arm model with the stimulation patterns estimated by the inverse dynamics model. In the simulation of movements with a length of ± 20 cm, the model has shown an unbiased angular error and a mean (absolute) position error of about 1.5 cm, thus confirming the ability of the system to reliably drive the model to the desired targets. Moreover, the curvature factors of the actual human movements and of the reconstructed ones are similar, which encourages future developments of the system in terms of reproducibility of the desired movements. A novel FES-assisted rehabilitation system for the upper limb is presented, and two parts of it have been designed and tested. The system includes a markerless motion estimation algorithm and a biologically inspired neural controller. The controller drives a biomechanical arm model and provides the stimulation patterns that, in a future development, could be used to drive a smart Functional Electrical Stimulation system (sFES). The system is envisioned to help in the rehabilitation of post-stroke hemiparetic patients by assisting the movement of the paretic upper limb, once it has been trained with a set of movements performed by the therapist or in virtual reality. Future work will include the application and testing of the stimulation patterns in real conditions.
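The two accuracy measures mentioned above can be sketched as follows, under assumptions not stated in the abstract. The curvature factor is taken to be the ratio between the length of the travelled path and the straight-line distance between its start and end points, a common definition in motor-control studies. The position error is taken to be the mean Euclidean distance between time-aligned samples of the requested and reconstructed trajectories. The function names and trajectories are hypothetical and are not data from the paper.

```python
import math


def path_length(points):
    """Sum of Euclidean distances between consecutive samples."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def curvature_factor(points):
    """Path length divided by the straight-line distance between the first and last samples."""
    chord = math.dist(points[0], points[-1])
    if chord == 0:
        raise ValueError("start and end points coincide")
    return path_length(points) / chord


def mean_position_error(requested, reconstructed):
    """Mean Euclidean distance between time-aligned samples of two trajectories."""
    if len(requested) != len(reconstructed):
        raise ValueError("trajectories must have the same number of samples")
    return sum(math.dist(p, q) for p, q in zip(requested, reconstructed)) / len(requested)


# Hypothetical planar trajectories in centimetres.
requested = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
reconstructed = [(0.0, 0.5), (10.0, 1.0), (20.0, 0.0)]
print(mean_position_error(requested, reconstructed))  # 0.5
print(curvature_factor(requested))  # 1.0
```

A perfectly straight reach gives a curvature factor of 1, and more curved paths give larger values. Comparing the factor of a recorded movement with that of its reconstruction is therefore a scale-free check on trajectory shape.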
TL;DR: This work is dedicated to a statistical trajectory-based approach addressing two issues related to dynamic video content understanding: recognition of events and detection of unexpected events.
Abstract: This work is dedicated to a statistical trajectory-based approach addressing two issues related to dynamic video content understanding: recognition of events and detection of unexpected events. Appropriate local differential features combining curvature and motion magnitude are defined and robustly computed on the motion trajectories in the image sequence. These features are invariant to image translation, in-the-plane rotation and spatial scaling. The temporal causality of the features is then captured by hidden Markov models dedicated to trajectory description, whose states are properly quantized values. The similarity between trajectories is expressed by exploiting this quantization-based HMM framework. Moreover, statistical techniques have been developed for parameter estimation. Evaluations of the method have been conducted on several data sets, including real trajectories obtained from sports videos, in particular Formula One and ski TV programs. The novel method compares favorably with other methods, including feature histogram comparisons, HMM/GMM modeling and SVM classification.
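The feature-extraction and HMM-scoring steps can be sketched with several simplifications, all of which are assumptions of this illustration:

- The signed turning angle between consecutive displacement vectors stands in for the paper's curvature-and-motion-magnitude feature. It is invariant to translation, in-plane rotation and scaling, but it discards motion magnitude.
- The angle is quantized into three hypothetical symbols.
- A generic discrete-observation HMM, scored with the scaled forward algorithm, replaces the paper's quantization-based HMM, in which the states themselves are quantized feature values.

The model parameters below are made up.

```python
import math


def turning_angles(points):
    """Signed turning angle (radians) at every interior sample; assumes consecutive samples differ."""
    angles = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        heading_in = math.atan2(y1 - y0, x1 - x0)
        heading_out = math.atan2(y2 - y1, x2 - x1)
        delta = heading_out - heading_in
        # Wrap the difference into [-pi, pi].
        angles.append(math.atan2(math.sin(delta), math.cos(delta)))
    return angles


def quantize(angles, threshold=0.2):
    """Map angles to symbols: 0 = right turn, 1 = roughly straight, 2 = left turn."""
    return [0 if a < -threshold else 2 if a > threshold else 1 for a in angles]


def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs) under a discrete-observation HMM.

    Assumes the observation sequence has nonzero probability under the model.
    """
    n = len(start)
    log_lik = 0.0
    alpha = None
    for t, symbol in enumerate(obs):
        if t == 0:
            alpha = [start[i] * emit[i][symbol] for i in range(n)]
        else:
            alpha = [
                sum(alpha[j] * trans[j][i] for j in range(n)) * emit[i][symbol]
                for i in range(n)
            ]
        scale = sum(alpha)
        log_lik += math.log(scale)
        # Rescale to avoid numerical underflow on long trajectories.
        alpha = [a / scale for a in alpha]
    return log_lik


# Hypothetical two-state model: state 0 favours straight motion, state 1 favours left turns.
start = [0.6, 0.4]
trans = [[0.8, 0.2], [0.3, 0.7]]
emit = [[0.1, 0.8, 0.1], [0.1, 0.2, 0.7]]
track = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1)]
symbols = quantize(turning_angles(track))
print(symbols)  # [1, 2, 2]
print(forward_log_likelihood(symbols, start, trans, emit))
```

With one such model fitted per event class, a trajectory could be assigned to the class whose model gives the highest log-likelihood. A trajectory that scores poorly under every model would be a candidate unexpected event. How the paper actually measures similarity is not reproduced here.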