TL;DR: A new system applies an example-based learning method to learn facial motion patterns from a video sequence of individual facial behavior and uses them to create vivid three-dimensional (3-D) face animation according to the MPEG-4 definition of face animation parameters.
Abstract: We present a new system that applies an example-based learning method to learn facial motion patterns from a video sequence of individual facial behavior, such as lip motion and facial expressions, and uses them to create vivid three-dimensional (3-D) face animation according to the MPEG-4 definition of face animation parameters. The system consists of three key modules: face tracking, pattern learning, and face animation. In face tracking, to reduce the complexity of the tracking process, a novel coarse-to-fine strategy combined with a Kalman filter is proposed for localizing key facial landmarks in each image of the video. The landmark sequence is normalized into a visual feature matrix and passed to the next stage. Pattern learning has two stages. In the pretraining stage, the parameters of the camera that recorded the video are required along with the training video, so that the system can, with the aid of computer vision methods, estimate the basic mapping from a normalized two-dimensional (2-D) visual feature matrix to its representation in the 3-D MPEG-4 face animation parameter space. In the practice stage, because camera parameters are usually not provided with video data, the system uses machine learning to supply the missing 3-D information for the mapping, information that is needed to represent face orientation. The example-based learning in this system integrates several methods, including clustering, hidden Markov models (HMM), and artificial neural networks (ANN), to improve both the 2-D to 3-D conversion and the estimation of the incomplete 3-D information; the resulting mapping is then used to drive the face animation. In face animation, the system can synthesize face animation that follows any type of face motion in the video. Experiments show that our system produces more vivid face motion animation than earlier systems.
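The abstract above does not give the tracker's equations, so the following is only a minimal sketch of how a Kalman filter could support coarse-to-fine landmark localization: the filter predicts where one landmark coordinate will be in the next frame, and that prediction could serve as the center of a narrowed search window. The constant-velocity motion model, the class name `ConstantVelocityKalman1D`, and the noise values `q`, `r`, and `p0` are all assumptions made for illustration, not details taken from the paper.

```python
class ConstantVelocityKalman1D:
    """Tracks one coordinate of one landmark; state is [position, velocity]."""

    def __init__(self, q=0.01, r=1.0, p0=100.0):
        self.pos, self.vel = 0.0, 0.0
        # 2x2 state covariance stored as four scalars.
        self.p00, self.p01, self.p10, self.p11 = p0, 0.0, 0.0, p0
        self.q, self.r = q, r  # assumed process / measurement noise

    def predict(self, dt=1.0):
        """Project the state one frame ahead: P <- F P F^T + Q."""
        self.pos += self.vel * dt
        p00 = self.p00 + dt * (self.p10 + self.p01) + dt * dt * self.p11 + self.q
        p01 = self.p01 + dt * self.p11
        p10 = self.p10 + dt * self.p11
        p11 = self.p11 + self.q
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11
        return self.pos  # candidate center for the fine search window

    def update(self, z):
        """Correct the prediction with a measured landmark position z."""
        s = self.p00 + self.r                  # innovation variance
        k0, k1 = self.p00 / s, self.p10 / s    # Kalman gain
        y = z - self.pos                       # innovation
        self.pos += k0 * y
        self.vel += k1 * y
        p00 = (1.0 - k0) * self.p00
        p01 = (1.0 - k0) * self.p01
        p10 = self.p10 - k1 * self.p00
        p11 = self.p11 - k1 * self.p01
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11
        return self.pos


def track(measurements):
    """Run predict/update over a sequence of measured positions."""
    kf = ConstantVelocityKalman1D()
    centers = []
    for z in measurements:
        centers.append(kf.predict())  # where to look before measuring
        kf.update(z)
    return kf, centers


# Synthetic landmark moving 2 pixels per frame (hypothetical data).
kf, centers = track([2.0 * t for t in range(1, 31)])
```

In a full tracker, one such filter per coordinate (or a joint filter over all landmarks) would run alongside the detector, and the size of the search window could be tied to the predicted variance `p00`.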
TL;DR: A novel two-dimensional (2D) facial animation communication system is presented that enables embedded sites to gain awareness of PC users' facial expressions via facial animation.
Abstract: Video-based channels that transmit people's facial expression information are designed to support real-time interaction between collaborators. Yet exchanging facial video in ubiquitous environments is constrained by the low bandwidth and limited computational resources of mobile embedded devices. In this paper, we present a novel two-dimensional (2D) facial animation communication system that enables embedded sites to gain awareness of PC users' facial expressions via facial animation. The system was evaluated through experiments and a survey. Results show that 2D facial animation conveys awareness of participants' emotions and is a novel visual communication pattern.
TL;DR: A new method combining the CANDIDE model with the Lucas-Kanade algorithm is proposed to track face features and extract face animation parameters; experimental results show that both the parameter extraction and the animation driven by the parameters are accurate.
Abstract: A new method combining the CANDIDE model with the Lucas-Kanade algorithm is proposed to track face features and extract face animation parameters. A face model is built by adapting a generic mesh model to the scan data of a specific face. Finally, the face animation is created by combining the parameter model with the muscle model and driving the result with the face animation parameters. The experimental results show that both the extraction of face animation parameters and the animation they drive are accurate.
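To make the tracking step more concrete, here is a minimal sketch of a single Lucas-Kanade least-squares step for one feature window. It covers only the optical-flow part; how the abstract's method couples it to the CANDIDE mesh is not described there and is not modeled here. The function names, the window size, and the synthetic quadratic test image are assumptions chosen for illustration.

```python
def lucas_kanade_step(img1, img2, cx, cy, half=2):
    """Estimate the displacement (u, v) of the window centered at (cx, cy).

    Images are lists of rows (img[y][x]). Solves the 2x2 normal equations
    of the brightness-constancy constraint Ix*u + Iy*v = -It.
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            ix = (img1[y][x + 1] - img1[y][x - 1]) / 2.0  # spatial gradient in x
            iy = (img1[y + 1][x] - img1[y - 1][x]) / 2.0  # spatial gradient in y
            it = img2[y][x] - img1[y][x]                  # temporal difference
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("window has too little texture to estimate motion")
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


def quadratic_image(size, shift_x=0.0, shift_y=0.0):
    """Hypothetical smooth test image whose content is shifted by (shift_x, shift_y)."""
    return [[0.5 * ((x - shift_x) ** 2 + (y - shift_y) ** 2)
             for x in range(size)] for y in range(size)]


frame1 = quadratic_image(9)
frame2 = quadratic_image(9, shift_x=0.2, shift_y=-0.1)
# A single step only approximately recovers the true shift (0.2, -0.1).
du, dv = lucas_kanade_step(frame1, frame2, cx=4, cy=4)
```

A single step relies on a first-order approximation, so practical trackers usually iterate it and run it over an image pyramid to handle larger motions.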
TL;DR: A collaborative filtering-based real-time voice-driven human face and lip synchronous animation system is presented, in which a human head model produces lip animation synchronized with the input voice.
Abstract: The invention discloses a collaborative filtering-based real-time voice-driven human face and lip synchronous animation system. As voice is input in real time, a human head model produces lip animation synchronized with the input voice. The system comprises an audio/video coding module, a collaborative filtering module, and an animation module. The audio/video coding module applies Mel frequency cepstrum parameter coding to the acquired voice and face animation parameter coding in the Moving Picture Experts Group (MPEG-4) standard to the acquired three-dimensional facial feature point motion information, yielding a multimodal synchronous library of Mel frequency cepstrum parameters and face animation parameters; the collaborative filtering module solves for the face animation parameters synchronous with the voice by combining the Mel frequency cepstrum parameter coding of the newly input voice with the multimodal synchronous library through collaborative filtering; and the animation module produces the animation by driving the human face model with the face animation parameters. The system offers greater realism, real-time operation, and a wider range of application environments.
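The abstract does not specify how the collaborative filtering step is computed. As a rough illustration of the lookup it describes, the sketch below swaps in a simple neighbourhood-based stand-in: it finds the library entries whose Mel frequency cepstrum vectors are closest to the new input and returns a similarity-weighted average of their face animation parameters. The function name `estimate_fap`, the inverse-distance weighting, the choice of `k`, and the toy library values are all assumptions and are not taken from the patent.

```python
import math


def estimate_fap(query_mfcc, library, k=3, eps=1e-9):
    """Estimate face animation parameters for a new MFCC vector.

    library: list of (mfcc_vector, fap_vector) pairs recorded in sync.
    Returns the inverse-distance-weighted mean FAP of the k nearest entries.
    """
    if not library:
        raise ValueError("synchronous library is empty")
    nearest = sorted(library, key=lambda pair: math.dist(query_mfcc, pair[0]))[:k]
    weights = [1.0 / (math.dist(query_mfcc, mfcc) + eps) for mfcc, _ in nearest]
    total = sum(weights)
    n_params = len(nearest[0][1])
    return [
        sum(w * fap[i] for w, (_, fap) in zip(weights, nearest)) / total
        for i in range(n_params)
    ]


# Toy synchronous library with 2-D MFCC and 2-D FAP vectors (hypothetical values).
library = [
    ([0.0, 0.0], [0.0, 0.0]),
    ([1.0, 0.0], [10.0, 0.0]),
    ([0.0, 1.0], [0.0, 10.0]),
]

exact = estimate_fap([1.0, 0.0], library, k=1)     # query matches a stored entry
blended = estimate_fap([0.5, 0.0], library, k=2)   # query halfway between two entries
```

In a real-time setting the MFCC vectors would be computed per audio frame and the returned parameters passed to the animation module; both steps are outside the scope of this sketch.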