TL;DR: These methods for quantifying gait pathology with commodity cameras increase access to quantitative motion analysis in clinics and at home and enable researchers to conduct large-scale studies of neurological and musculoskeletal disorders.
Abstract: Many neurological and musculoskeletal diseases impair movement, which limits people’s function and social participation. Quantitative assessment of motion is critical to medical decision-making but is currently possible only with expensive motion capture systems and highly trained personnel. Here, we present a method for predicting clinically relevant motion parameters from an ordinary video of a patient. Our machine learning models predict parameters including walking speed (r = 0.73), cadence (r = 0.79), knee flexion angle at maximum extension (r = 0.83), and Gait Deviation Index (GDI), a comprehensive metric of gait impairment (r = 0.75). These correlation values approach the theoretical limits for accuracy imposed by natural variability in these metrics within our patient population. Our methods for quantifying gait pathology with commodity cameras increase access to quantitative motion analysis in clinics and at home and enable researchers to conduct large-scale studies of neurological and musculoskeletal disorders.
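The r values quoted in this abstract are correlation coefficients between predicted and measured parameters. As a minimal sketch, assuming they are Pearson correlations (the abstract does not name the estimator), the computation could look like the following; the function name `pearson_r` and its arguments are illustrative and not taken from the paper.

```python
import math


def pearson_r(predicted, measured):
    """Pearson correlation between two equally long numeric sequences.

    Assumption: the reported r values are Pearson coefficients; this is
    an illustrative helper, not the authors' evaluation code.
    """
    n = len(predicted)
    if n != len(measured) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    # Sums of centred products and centred squares.
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_m)
```

A value near 1 means predictions rise and fall with the measured values; it does not by itself show that the predictions are unbiased.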
TL;DR: While size did not affect the joint moment prediction, the addition of noise to the dataset resulted in an improved prediction accuracy, indicating that research on appropriate augmentation techniques for biomechanical data is useful to further improve machine learning applications.
Abstract: Enhancement of activity is one major topic related to the aging society. Therefore, it is necessary to understand people's motion and identify possible risk factors during activity. Technology can be used to monitor motion patterns during daily life. In particular, the use of artificial intelligence combined with wearable sensors can simplify measurement systems and might at some point replace standard motion capture based on optical measurement technologies. Therefore, this study aims to analyze the estimation of 3D joint angles and joint moments of the lower limbs based on IMU data using a feedforward neural network. The dataset comprises optical motion capture data from former studies and newly collected IMU data. Based on the optical data, the acceleration and angular rate of inertial sensors were simulated. The data was augmented by simulating different sensor positions and orientations. In this study, gait analysis was undertaken with 30 participants using a conventional motion capture set-up based on an optoelectronic system and force plates in parallel with a custom IMU system consisting of five sensors. A mean correlation coefficient of 0.85 for the joint angles and 0.95 for the joint moments was achieved. The RMSE for the joint angle prediction was smaller than 4.8° and the nRMSE for the joint moment prediction was below 13.0%. Results were especially good in the sagittal plane. As the measured dataset is rather small, data was synthesized to complement the measured data. The enlargement of the dataset improved the prediction of the joint angles. While size did not affect the joint moment prediction, the addition of noise to the dataset resulted in an improved prediction accuracy. This indicates that research on appropriate augmentation techniques for biomechanical data is useful to further improve machine learning applications.
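The two error measures reported above can be sketched as follows. The abstract does not say how the nRMSE is normalised; the code assumes normalisation by the range of the reference signal, which is one common convention, and the function names are illustrative.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between two equally long sequences."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def nrmse_percent(predicted, reference):
    """RMSE as a percentage of the reference range.

    Assumption: range normalisation; the paper may use another
    denominator (for example the mean or the peak value).
    """
    span = max(reference) - min(reference)
    if span == 0:
        raise ValueError("reference has zero range")
    return 100.0 * rmse(predicted, reference) / span
```

For example, `nrmse_percent([1, 1], [0, 2])` gives 50.0: the RMSE is 1 and the reference range is 2.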
TL;DR: OpenPose-based motion analysis is reliable, low cost, and easier to operate than conventional methods; future work needs to quantify the error between the true joint movement and the data obtained by OpenPose, and to correct it for fixed and proportional biases.
TL;DR: A novel neural network is proposed that infers the per-pixel warp fields for video stabilization from the optical flow fields of the input video, achieving quantitatively and visually better results than the state-of-the-art optimization based and deep learning based video stabilization methods.
Abstract: We propose a novel neural network that infers the per-pixel warp fields for video stabilization from the optical flow fields of the input video. While previous learning based video stabilization methods attempt to implicitly learn frame motions from color videos, our method resorts to optical flow for motion analysis and directly learns the stabilization using the optical flow. We also propose a pipeline that uses optical flow principal components for motion inpainting and warp field smoothing, making our method robust to moving objects, occlusion and optical flow inaccuracy, which is challenging for other video stabilization methods. Our method achieves quantitatively and visually better results than the state-of-the-art optimization based and deep learning based video stabilization methods. Our method also gives a ~3x speed improvement compared to the optimization based methods.
TL;DR: An efficient and robust real-time approach for automatic vehicle detection and tracking in aerial videos that employs both detection and tracking features to enhance the final decision and achieves a fast processing speed.
Abstract: Real-time automatic detection and tracking of moving vehicles in videos acquired by airborne cameras is a challenging problem due to vehicle occlusion, camera movement, and high computational cost. This paper presents an efficient and robust real-time approach for automatic vehicle detection and tracking in aerial videos that employs both detection and tracking features to enhance the final decision. The use of Top-hat and Bottom-hat transformation aided by the morphological operation in the detection phase has been adopted. After detection, background regions are eliminated by motion feature points’ analysis of the obtained object regions using a combined technique between KLT tracker and K-means clustering. Obtained object features are clustered into separate objects based on their motion characteristic. Finally, an efficient connecting algorithm is introduced to assign the vehicle labels with their corresponding cluster trajectories. The proposed method was tested on videos taken in different scenarios. The experimental results showed that the recall, precision, and tracking accuracy of the proposed method were about 95.1%, 97.5%, and 95.2%, respectively. The method also achieves a fast processing speed. Thus, the proposed approach has superior overall performance compared to newly published approaches.
TL;DR: This study proposes a novel joint kinematic estimation method that tightly incorporates the connection between adjacent segments within a sensor fusion algorithm, to obtain drift-free joint kinematics.
Abstract: The ability to capture joint kinematics in outside-laboratory environments is clinically relevant. In order to estimate kinematics, inertial measurement units can be attached to body segments and their absolute orientations can be estimated. However, the heading part of such orientation estimates is known to drift over time, resulting in drifting joint kinematics. This study proposes a novel joint kinematic estimation method that tightly incorporates the connection between adjacent segments within a sensor fusion algorithm, to obtain drift-free joint kinematics. Drift in the joint kinematics is eliminated solely by utilizing common information in the accelerometer and gyroscope measurements of sensors placed on connecting segments. Both an optimization-based smoothing and a filtering approach were implemented. Validity was assessed on a robotic manipulator under varying measurement durations and movement excitations. Standard deviations of the estimated relative sensor orientations were below 0.89° in an optimization-based smoothing implementation for all robot trials. The filtering implementation yielded similar results after convergence. The method is proven to be applicable in biomechanics, with a prolonged gait trial of 7 minutes on 11 healthy subjects. Three-dimensional knee joint angles were estimated, with mean RMS errors of 2.14°, 1.85°, 3.66° in an optimization-based smoothing implementation and mean RMS errors of 3.08°, 2.42°, 4.47° in a filtering implementation, with respect to a gold-standard optical motion capture reference system.
TL;DR: This study compares the performance of these two kinds of neural networks on the prediction of ground reaction force and joint moments of the lower limbs during gait based on joint angles determined by optical motion capture as input data.
TL;DR: A wearable pose estimation system (WePosE), based on inertial measurement units (IMUs), for motion analysis and body tracking, which does not suffer from occlusion or lighting problems, is cost effective, and can be used in indoor and outdoor environments.
Abstract: Estimating limb pose in a wearable way may benefit multiple areas such as rehabilitation, teleoperation, human-robot interaction, gaming, and many more. Several solutions are commercially available, but they are usually expensive or not wearable/portable. We present a wearable pose estimation system (WePosE), based on inertial measurement units (IMUs), for motion analysis and body tracking. Unlike camera-based approaches, the proposed system does not suffer from occlusion or lighting problems, is cost effective, and can be used in indoor and outdoor environments. Moreover, since only accelerometers and gyroscopes are used to estimate the orientation, the system can also be used in the presence of iron and magnetic disturbances. An experimental validation using a high-precision optical tracker has been performed. Results confirmed the effectiveness of the proposed approach.
TL;DR: In this paper, a style transfer network encodes motions into two latent codes, for content and for style, each of which plays a different role in the decoding (synthesis) process.
Abstract: Transferring the motion style from one animation clip to another, while preserving the motion content of the latter, has been a long-standing problem in character animation. Most existing data-driven approaches are supervised and rely on paired data, where motions with the same content are performed in different styles. In addition, these approaches are limited to transfer of styles that were seen during training. In this paper, we present a novel data-driven framework for motion style transfer, which learns from an unpaired collection of motions with style labels, and enables transferring motion styles not observed during training. Furthermore, our framework is able to extract motion styles directly from videos, bypassing 3D reconstruction, and apply them to the 3D input motion. Our style transfer network encodes motions into two latent codes, for content and for style, each of which plays a different role in the decoding (synthesis) process. While the content code is decoded into the output motion by several temporal convolutional layers, the style code modifies deep features via temporally invariant adaptive instance normalization (AdaIN). Moreover, while the content code is encoded from 3D joint rotations, we learn a common embedding for style from either 3D or 2D joint positions, enabling style extraction from videos. Our results are comparable to the state-of-the-art, despite not requiring paired training data, and outperform other methods when transferring previously unseen styles. To our knowledge, we are the first to demonstrate style transfer directly from videos to 3D animations - an ability which enables one to extend the set of style examples far beyond motions captured by MoCap systems.
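The abstract says the style code modifies deep features through temporally invariant adaptive instance normalization (AdaIN). A minimal sketch of that operation is given below, assuming each feature channel is normalised over the time axis and then rescaled with one scale and one shift per channel. The names `adain_temporal`, `gamma` and `beta` are hypothetical; how the paper's network derives the scale and shift from the style code is not reproduced here.

```python
import statistics


def adain_temporal(content_features, gamma, beta, eps=1e-5):
    """Apply AdaIN channel by channel over the time axis.

    content_features: list of channels, each a list of values over time.
    gamma, beta: one scale and one shift per channel (assumed to come
    from the style code; hypothetical inputs for this sketch).
    """
    if not (len(content_features) == len(gamma) == len(beta)):
        raise ValueError("one gamma and one beta are needed per channel")
    out = []
    for channel, g, b in zip(content_features, gamma, beta):
        mean = statistics.fmean(channel)
        var = statistics.pvariance(channel, mu=mean)
        std = (var + eps) ** 0.5
        # The same statistics and the same (g, b) are used at every time
        # step, so the style modulation itself does not vary with time.
        out.append([g * (x - mean) / std + b for x in channel])
    return out
```

After the call, each output channel has a mean close to its `beta` and a standard deviation close to its `gamma`, regardless of the statistics of the content features.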
TL;DR: It is proposed that style translation is an effective way to transform adult motion capture data to the style of child motion, and results show that the translated adult motions are recognized as child motions significantly more often than adult motions.
Abstract: Child characters are commonly seen in leading roles in top-selling video games. Previous studies have shown that child motions are perceptually and stylistically different from those of adults. Creating motion for these characters by motion capturing children is uniquely challenging because of confusion, lack of patience and regulations. Retargeting adult motion, which is much easier to record, onto child skeletons, does not capture the stylistic differences. In this paper, we propose that style translation is an effective way to transform adult motion capture data to the style of child motion. Our method is based on CycleGAN, which allows training on a relatively small number of sequences of child and adult motions that do not even need to be temporally aligned. Our adult2child network converts short sequences of motions called motion words from one domain to the other. The network was trained using a motion capture database collected by our team containing 23 locomotion and exercise motions. We conducted a perception study to evaluate the success of style translation algorithms, including our algorithm and recently presented style translation neural networks. Results show that the translated adult motions are recognized as child motions significantly more often than adult motions.
TL;DR: Training a long short-term memory neural network on the prediction of 3D lower limb joint angles based on inertial data showed that three sensors placed on the pelvis and both shanks are sufficient, and the application of principal component analysis to the data of five sensors did not reveal improved results.
Abstract: The use of machine learning to estimate joint angles from inertial sensors is a promising approach to in-field motion analysis. In this context, the simplification of the measurements by using a small number of sensors is of great interest. Neural networks have the opportunity to estimate joint angles from a sparse dataset, which enables the reduction of sensors necessary for the determination of all three-dimensional lower limb joint angles. Additionally, the dimensions of the problem can be simplified using principal component analysis. Training a long short-term memory neural network on the prediction of 3D lower limb joint angles based on inertial data showed that three sensors placed on the pelvis and both shanks are sufficient. The application of principal component analysis to the data of five sensors did not reveal improved results. The use of longer motion sequences compared to time-normalised gait cycles seems to be advantageous for the prediction accuracy, which bridges the gap to real-time applications of long short-term memory neural networks in the future.
TL;DR: A Conditional Generative Adversarial Networks (GAN) based model is proposed to predict complex motions in UAV videos and robust motion prediction and improved MOT performance are achieved compared with state-of-the-art methods.
TL;DR: The MoCA dataset is introduced and its peculiarities are discussed, and a baseline analysis is discussed as well as examples of applications for which the dataset is well suited.
Abstract: MoCA is a bi-modal dataset in which we collect Motion Capture data and video sequences acquired from multiple views, including an ego-like viewpoint, of upper body actions in a cooking scenario. It has been collected with the specific purpose of investigating view-invariant action properties in both biological and artificial systems. Besides that, it represents an ideal test bed for research in a number of fields - including cognitive science and artificial vision - and application domains - as motor control and robotics. Compared to other benchmarks available, MoCA provides a unique compromise for research communities leveraging very different approaches to data gathering: from one extreme of action recognition in the wild - the standard practice nowadays in the fields of Computer Vision and Machine Learning - to motion analysis in very controlled scenarios - as for motor control in biomedical applications. In this work we introduce the dataset and its peculiarities, and discuss a baseline analysis as well as examples of applications for which the dataset is well suited.
TL;DR: The lower limb joint angles estimated using the extended Kalman filter with noise covariance matrices based on sensor output were generally consistent with results obtained from the optical 3D motion analysis system.
Abstract: This paper presents an extended Kalman filter for pose estimation using noise covariance matrices based on sensor output. Compact and lightweight nine-axis motion sensors are used for motion analysis in widely various fields such as medical welfare and sports. A nine-axis motion sensor includes a three-axis gyroscope, a three-axis accelerometer, and a three-axis magnetometer. Information obtained from the three sensors is useful for estimating joint angles using the Kalman filter. The extended Kalman filter is used widely for state estimation because it can estimate the status with a small computational load. However, determining the process and observation noise covariance matrices in the extended Kalman filter is complicated. The noise covariance matrices in the extended Kalman filter were found for this study based on the sensor output. Postural change appears in the gyroscope output because the rotational motion of the joints produces human movement. Therefore, the process noise covariance matrix was determined based on the gyroscope output. An observation noise covariance matrix was determined based on the accelerometer and magnetometer output because the two sensors’ outputs were used as observation values. During a laboratory experiment, the lower limb joint angles of three participants were measured using an optical 3D motion analysis system and nine-axis motion sensors while participants were walking. The lower limb joint angles estimated using the extended Kalman filter with noise covariance matrices based on sensor output were generally consistent with results obtained from the optical 3D motion analysis system. Furthermore, the lower limb joint angles were measured using nine-axis motion sensors while participants were running in place for about 100 s. The experiment results demonstrated the effectiveness of the proposed method for human pose estimation.
TL;DR: A novel subspace learning approach is developed, which pursues regularized low-rank and sparse representation for multishot person Re-ID and integrates the recurring pattern prior into the model to refine the affinities among images.
Abstract: This paper addresses the challenging problem of multishot person reidentification (Re-ID) in real world uncontrolled surveillance systems. A key issue is how to effectively represent and process the multiple data with various appearance information due to the variations of pose, occlusions, and viewpoints. To this end, this paper develops a novel subspace learning approach, which pursues regularized low-rank and sparse representation for multishot person Re-ID. For the images of a person crossing a certain camera, we assume that the appearances of those subset images with similar viewpoints against a camera draw from the same low-rank subspace, and all the images of a person under a camera lie on a union of low-rank subspaces. Based on this assumption, we propose to learn a nonnegative low-rank and sparse graph to represent the person images. Moreover, the recurring pattern prior is integrated into our model to refine the affinities among images. Extensive experiments on four public benchmark datasets yield impressive performance by improving 22.9% on imagery library for intelligent detection systems video re identification (iLIDS-VID), 42.4% on person RE-ID (PRID) dataset 2011, 39.7% and 30.6% on speech, audio, image, and video technology-SoftBio camera 3/8 and camera 5/8, respectively, and 1.6% on motion analysis and re identification set compared to the state-of-the-art methods.
TL;DR: The proposed method provides a novel approach to generating self-propelled locomotion and to designing and computing the visco-elastic parameters for energy efficacy; the system's dynamic behaviour is mainly periodic, and the desired forward motion is achieved.
Abstract: This paper studies the dynamics and motion generation of a self-propelled robotic system with a visco-elastic joint. The system is underactuated, legless and wheelless, and has potential applications in environmental inspection and operation in restricted spaces which are inaccessible to human beings, such as pipeline inspection, medical assistance and disaster rescue. Locomotion of the system relies on stick–slip effects, which interact with the frictional force of the surface in contact. The nonlinear robotic model utilizes combined tangential-wise and normal-wise vibrations for underactuated locomotion, which features a generic significance for the studies on self-propelled systems. To identify the characteristics of the visco-elastic joint and shed light on the energy efficacy, parameter dependences on stiffness and damping coefficients are thoroughly analysed. Our studies demonstrate that the dynamic behaviour of the self-propelled system is mainly periodic and desirable forward motion is achieved via identification of the variation laws of the control parameters and elaborate selection of the stiffness and damping coefficients. A motion generation strategy is developed, and an analytical two-stage motion profile is proposed based on the system response and dynamic constraint analysis, followed by a parameterization procedure to optimally generate the trajectory. The proposed method provides a novel approach to generating self-propelled locomotion, and to designing and computing the visco-elastic parameters for energy efficacy. Simulation results are presented to demonstrate the effectiveness and feasibility of the proposed model and motion generation approach.
TL;DR: The proposed video steganography scheme causes less perceptual distortion to human eyes and is robust to compression that reduces video storage.
Abstract: Steganography is a technique for concealing a message in multimedia data. Multimedia data such as videos are often compressed to reduce storage and to fit limited bandwidth. Video provides additional hiding space in the object motion of image sequences. This research proposes a video steganography scheme based on object motion and DCT-psychovisual effects for concealing the message. The proposed hiding technique embeds a secret message along the object motion of the video frames. Motion analysis is used to determine the embedding regions. The proposed scheme selects six DCT coefficients in the middle frequencies using DCT-psychovisual effects for hiding messages. A message is embedded by modifying the middle DCT coefficients using the proposed algorithm. The middle frequencies have a large hiding capacity and have relatively little effect on the video reconstruction. The performance of the proposed video steganography is evaluated in terms of video quality and robustness against MPEG compression. The experimental results show minimal distortion of the video quality. The hidden messages are robust against MPEG-4 compression, with an average NC value of 0.94. The proposed video steganography scheme causes less perceptual distortion to human eyes and is robust to compression that reduces video storage.
TL;DR: A System-on-Chip approach, implemented in Xilinx Zynq SoC is proposed that will be efficient in terms of power and resource utilization as the hardware is configured based on the property of input video.
TL;DR: This paper evaluates RNN-based machine learning techniques for assessing the fatigue factor caused by repetitive motions, using time-stamped motion data collected with infrared cameras while a subject performs one of the repetitive motions.
Abstract: Industrial Revolution 4.0 is defined as the interconnection of Information, Communications Technologies (ICT), and factory floor workers. Workers in the material handling industry are often subject to repetitive motions that cause exhaustion (or fatigue), which leads to work-related musculoskeletal disorders (WMSDs). The most common repetitive motions are lifting, pulling, pushing, carrying and walking with load. In this research, data is collected as time-stamped motion data using infrared cameras at a rate of 100 Hz while a subject performs one of the repetitive motions (e.g. lifting). The data is a combination of xyz-coordinates of 39 reflective markers. This results in 117 data points for each frame captured. Because these motions unfold over time, this data is used as input to a time-series machine learning (ML) model such as a Recurrent Neural Network (RNN). Using this model, this paper evaluates RNN-based machine learning techniques for assessing the fatigue factor caused by repetitive motions.
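The data layout described above (39 markers with x, y and z coordinates, hence 117 values per frame at 100 Hz) can be sketched as follows. The window length, the step, and the function names are assumptions made for illustration; the abstract does not state how sequences are cut before they are fed to the RNN.

```python
N_MARKERS = 39        # reflective markers per frame (from the abstract)
FRAME_RATE_HZ = 100   # capture rate (from the abstract)


def flatten_frame(markers):
    """Turn 39 (x, y, z) tuples into one 117-element feature vector."""
    if len(markers) != N_MARKERS or any(len(m) != 3 for m in markers):
        raise ValueError("expected 39 markers with 3 coordinates each")
    return [coord for marker in markers for coord in marker]


def make_sequences(frames, window_s=2.0, step_s=1.0):
    """Cut a list of flattened frames into fixed-length windows.

    window_s and step_s are hypothetical values chosen for this sketch.
    """
    win = int(window_s * FRAME_RATE_HZ)
    step = int(step_s * FRAME_RATE_HZ)
    return [frames[i:i + win] for i in range(0, len(frames) - win + 1, step)]
```

Each resulting window is a sequence of shape (time steps, 117), which is the usual input layout for a recurrent model.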
TL;DR: This work derives an inference procedure that utilizes short observation sequences of an object in motion without need for markers or learned body models and demonstrates robust part decompositions of moving objects under both 3D and 2D observation models.
Abstract: Articulated motion analysis often utilizes strong prior knowledge such as a known or trained parts model for humans. Yet, the world contains a variety of articulating objects--mammals, insects, mechanized structures--where the number and configuration of parts for a particular object is unknown in advance. Here, we relax such strong assumptions via an unsupervised, Bayesian nonparametric parts model that infers an unknown number of parts with motions coupled by a body dynamic and parameterized by SE(D), the Lie group of rigid transformations. We derive an inference procedure that utilizes short observation sequences (image, depth, point cloud or mesh) of an object in motion without need for markers or learned body models. Efficient Gibbs decompositions for inference over distributions on SE(D) demonstrate robust part decompositions of moving objects under both 3D and 2D observation models. The inferred representation permits novel analysis, such as object segmentation by relative part motion, and transfers to new observations of the same object type.
TL;DR: To analyze human motions, a framework to transform motions into the instantaneous frequency-domain using the Hilbert-Huang transform (HHT) is presented and reveals that the multivariate EMD can decompose complicated human motions into a finite number of nonlinear modes (IMFs) corresponding to distinct motion primitives.
Abstract: Motion capture data are widely used in different research fields such as medicine, entertainment, and industry. However, most motion research using motion capture data is carried out in the time domain. To understand human motion complexities, it is necessary to analyze motion data in the frequency domain. In this paper, to analyze human motions, we present a framework to transform motions into the instantaneous frequency domain using the Hilbert-Huang transform (HHT). Empirical mode decomposition (EMD), which is part of the HHT, decomposes nonstationary and nonlinear signals captured from real-world experiments into pseudo-monochromatic signals, so-called intrinsic mode functions (IMFs). Our research reveals that the multivariate EMD can decompose complicated human motions into a finite number of nonlinear modes (IMFs) corresponding to distinct motion primitives. By analyzing these decomposed motions in the Hilbert spectrum, motion characteristics can be extracted and visualized in the instantaneous frequency domain. As examples, we apply our framework to (1) a jump motion, (2) a foot-injured gait, and (3) a golf swing motion.
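The second stage of the HHT, computing an instantaneous frequency from a single mode, can be sketched as below. This is only an illustration: it skips the (multivariate) EMD step entirely and assumes the input already behaves like one IMF. The analytic signal is built with a plain discrete Fourier transform so the example needs only the standard library; the function names are illustrative and not from the paper.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real sequence via the DFT (O(n^2), small n only)."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist for even n), double the positive frequencies,
    # and zero the negative ones.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        last_positive = n // 2 - 1
    else:
        last_positive = (n - 1) // 2
    for k in range(1, last_positive + 1):
        weights[k] = 2.0
    return [
        sum(weights[k] * spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)) / n
        for t in range(n)
    ]


def instantaneous_frequency(x, fs):
    """Instantaneous frequency in Hz between consecutive samples."""
    z = analytic_signal(x)
    freqs = []
    for a, b in zip(z, z[1:]):
        # Phase advance from one sample to the next, wrapped to (-pi, pi].
        dphi = cmath.phase(b * a.conjugate())
        freqs.append(dphi * fs / (2 * math.pi))
    return freqs
```

For a pure sinusoid with a whole number of cycles in the window, the returned values sit at the sinusoid's frequency; for real motion data the estimate is only meaningful after the signal has been decomposed into IMFs.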
TL;DR: This paper analyzes two video properties that are essential for respiratory motion analysis and various signal extraction approaches and finds that pixel movement can better quantify respiratory motion than pixel intensity variation in various conditions.
Abstract: Video-based motion analysis gave rise to contactless respiration rate monitoring that measures subtle respiratory movement from a human chest or belly. In this paper, we revisit this technology via a large video benchmark that includes six categories of practical challenges. We analyze two video properties (i.e. pixel intensity variation and pixel movement) that are essential for respiratory motion analysis and various signal extraction approaches (i.e. from conventional to recent Convolutional Neural Network (CNN)-based methods). We find that pixel movement can better quantify respiratory motion than pixel intensity variation in various conditions. We also conclude that the simple conventional approach (e.g. Zerophase Component Analysis) can achieve better performance than CNN that uses data training to define the extraction of respiration signal, which thus raises a more general question whether CNN can improve video-based physiological signal measurement.
TL;DR: This study validated the use of IMUs in the measurement of turning kinematics in healthy adults compared to a camera-based 3D motion analysis system and demonstrated that the IMU sensors produced reliable kinematic measures and showed excellent reliability.
Abstract: Camera-based 3D motion analysis systems are considered to be the gold standard for movement analysis. However, using such equipment in a clinical setting is prohibitive due to the expense and time-consuming nature of data collection and analysis. Therefore, Inertial Measurement Units (IMUs) have been suggested as an alternative to measure movement in clinical settings. One area which is both important and challenging is the assessment of turning kinematics in individuals with movement disorders. This study aimed to validate the use of IMUs in the measurement of turning kinematics in healthy adults compared to a camera-based 3D motion analysis system. Data were collected from twelve participants using a Vicon motion analysis system which were compared with data from four IMUs placed on the forehead, middle thorax, and feet in order to determine accuracy and reliability. The results demonstrated that the IMU sensors produced reliable kinematic measures and showed excellent reliability (ICCs 0.80–0.98) and no significant differences were seen in paired t-tests in all parameters when comparing the two systems. This suggests that the IMU sensors provide a viable alternative to camera-based motion capture that could be used in isolation to gather data from individuals with movement disorders in clinical settings and real-life situations.
TL;DR: Sliding-window label overlapping of time-series wearable motion data in training dataset acquisition is proposed to accurately detect foot–ground contact phases, which are composed of 3 sub-phases as well as the swing phase, at a frequency of 100 Hz with a convolutional neural network (CNN) architecture.
Abstract: Classification of foot–ground contact phases, as well as the swing phase is essential in biomechanics domains where lower-limb motion analysis is required; this analysis is used for lower-limb rehabilitation, walking gait analysis and improvement, and exoskeleton motion capture. In this study, sliding-window label overlapping of time-series wearable motion data in training dataset acquisition is proposed to accurately detect foot–ground contact phases, which are composed of 3 sub-phases as well as the swing phase, at a frequency of 100 Hz with a convolutional neural network (CNN) architecture. We not only succeeded in developing a real-time CNN model for learning and obtaining a test accuracy of 99.8% or higher, but also confirmed that its validation accuracy was close to 85%.
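One way to read "sliding-window label overlapping" is that consecutive training windows overlap in time and each window carries one phase label. The sketch below assumes exactly that, with the label taken from the last sample of the window so that it could be used causally at 100 Hz. The labelling rule, the window length, and the stride are assumptions, not details given in the abstract.

```python
def sliding_windows(samples, labels, window=5, stride=1):
    """Build overlapping (window, label) training pairs.

    samples: per-time-step sensor readings.
    labels:  per-time-step phase ids, e.g. 0-2 for the three contact
             sub-phases and 3 for swing (hypothetical encoding).
    Assumption: each window is labelled with the phase of its last sample.
    """
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have equal length")
    pairs = []
    for start in range(0, len(samples) - window + 1, stride):
        end = start + window
        pairs.append((samples[start:end], labels[end - 1]))
    return pairs
```

With a stride smaller than the window length, each time step appears in several windows, which increases the number of training examples drawn from the same recording.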
TL;DR: A novel and effective system based on multiple cameras to extract the events for soccer matches using the local-based deep neural network for the ball and player detection from the input images and a novel unsupervised U-encoder for the player labeling.
Abstract: In this article, we propose a novel and effective system based on multiple cameras to extract the events for soccer matches. A precise ontological definition of the soccer events is still an open point. According to our definition, the events include the free kick, corner kick, penalty kick and the goal, because they are the representative shots for the audience to watch. The events are very important for highlights selection and sport data analysis. At present, the events, including the ball and player information, are selected and labeled manually from the images, which is a heavy workload for staff. To address this problem, our system extracts the events automatically. For soccer videos, our system first uses the local-based deep neural network for ball and player detection from the input images. Then, we handle the ball and player bounding boxes separately. For players, each player is labeled as one of three types, either of the two teams or the referee, and a novel unsupervised U-encoder is designed for the player labeling. For the soccer ball, the use of multiple cameras allows us to refine the ball detection results. We obtain the world coordinates of the ball from the camera parameters and then rebuild the ball trajectory and the court in a top view. Based on the reconstructed map, we get the soccer events by motion analysis of the ball trajectory and then apply the ball location and player classification results to display the events for each camera. The test results on real European soccer league videos show the good detection and labeling performance of our system. We find all the events in the test videos. Our proposed system can deal with many complex cases such as occlusion and pose variation that happen frequently in real applications.
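The abstract says only that the ball's world coordinates come from the camera parameters. As an illustration of how two calibrated views can locate a point, here is a sketch of the generic midpoint triangulation method; it is a stand-in technique, not the paper's stated procedure. It assumes each camera's center and the viewing ray through the detected ball are already known in world coordinates, and the function name `triangulate_midpoint` is hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(c1: Vec3, d1: Vec3, c2: Vec3, d2: Vec3) -> Vec3:
    """Midpoint of the shortest segment between two viewing rays.

    c1, c2: camera centers in world coordinates (assumed known).
    d1, d2: ray directions toward the detected ball (assumed known).
    """
    w = tuple(p - q for p, q in zip(c1, c2))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the point cannot be triangulated")
    s = (b * e - c * d) / denom  # parameter along ray 1
    t = (a * e - b * d) / denom  # parameter along ray 2
    p1 = tuple(p + s * q for p, q in zip(c1, d1))
    p2 = tuple(p + t * q for p, q in zip(c2, d2))
    return tuple((u + v) / 2.0 for u, v in zip(p1, p2))


# Two rays constructed to meet at (1, 1, 1); values are illustrative only.
ball = triangulate_midpoint((0, 0, 0), (1, 1, 1), (2, 0, 0), (-1, 1, 1))
```

With noisy detections the two rays rarely intersect exactly, which is why the midpoint of their closest approach is used rather than an exact intersection.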
TL;DR: A multisensor data fusion algorithm, which combines the complementary properties of gyroscopes, accelerometers, and magnetometers in order to estimate the 3D orientation of two body segments separately and with respect to another body segment considering the spatial relationship between them, and a method for performing 3D motion tracking of two body segments, based on the estimation of their orientation, including motion compensation.
TL;DR: In this article, the authors define task embodiment as the amount of task information encoded in an agent's motions, and use task-specific information embedded in motion to create detailed performance assessments.
Abstract: Motions carry information about the underlying task being executed. Previous work in human motion analysis suggests that complex motions may result from the composition of fundamental submovements called movemes. The existence of finite structure in motion motivates information-theoretic approaches to motion analysis and robotic assistance. We define task embodiment as the amount of task information encoded in an agent's motions. By decoding task-specific information embedded in motion, we can use task embodiment to create detailed performance assessments. We extract an alphabet of behaviors comprising a motion without a priori knowledge using a novel algorithm, which we call dynamical system segmentation. For a given task, we specify an optimal agent, and compute an alphabet of behaviors representative of the task. We identify these behaviors in data from agent executions, and compare their relative frequencies against those of the optimal agent using the Kullback-Leibler divergence. We validate this approach using a dataset of human subjects (n=53) performing a dynamic task, and under this measure find that individuals receiving assistance better embody the task. Moreover, we find that task embodiment is a better predictor of assistance than integrated mean-squared-error.
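The comparison of behavior frequencies via the Kullback-Leibler divergence can be sketched as below. This is a minimal illustration, not the authors' implementation: the behavior symbols, the additive smoothing constant `eps`, and the direction of the divergence (agent relative to optimal agent) are assumptions, since the abstract does not specify them.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Sequence


def relative_frequencies(
    symbols: Sequence[str], alphabet: Iterable[str], eps: float = 1e-9
) -> Dict[str, float]:
    """Relative frequency of each behavior symbol, lightly smoothed.

    The smoothing term eps (an assumption) keeps every probability
    positive so the divergence below stays finite.
    """
    alphabet = list(alphabet)
    counts = Counter(symbols)
    total = len(symbols) + eps * len(alphabet)
    return {a: (counts[a] + eps) / total for a in alphabet}


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """D_KL(p || q) in nats over a shared alphabet."""
    return sum(p[a] * math.log(p[a] / q[a]) for a in p if p[a] > 0)


# Hypothetical behavior sequences over a two-symbol alphabet.
agent = relative_frequencies("aaab", "ab")
optimal = relative_frequencies("aabb", "ab")
divergence = kl_divergence(agent, optimal)
```

A divergence of zero means the agent uses the behaviors in the same proportions as the reference; larger values indicate a greater mismatch.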
TL;DR: In this article, several popular and state-of-the-art methods were reviewed, with the focus on the most important attributes. These methods were classified according to the main approach taken, namely Image Difference, Optical Flow, Wavelet, Statistical, Layers, Manifold Clustering, Template Matching, and Deep Learning. The investigated methods are compared and major research challenges are highlighted.
Abstract: Motion segmentation has applications in, amongst others, robotics, traffic monitoring, sports analysis, inspection, video surveillance, compression, and video indexing. However, the performance of most methods is limited compared to human capabilities. Based on extensive literature, the following challenges remain: occlusions, temporary stopping, missing data, and segmenting multiple objects. In this paper, several popular and state-of-the-art methods were reviewed, with the focus on the most important attributes. These methods were classified according to the main approach taken, namely Image Difference, Optical Flow, Wavelet, Statistical, Layers, Manifold Clustering, Template Matching, and Deep Learning. The investigated methods are compared and major research challenges are highlighted. Based on the review, improvements are identified as a basis for future research.
TL;DR: The survey is aimed not only at researchers with a technical background but also at sports scientists, and emphasises the use and advantages of vision-based approaches for climbing motion analysis.