TL;DR: This review examines the role of animation in multimedia learning (including multimedia instructional messages and microworld games), presents a cognitive theory of multimedia learning, and summarizes a research program that has yielded seven principles for the use of animation in multimedia instruction.
Abstract: How can animation be used to promote learner understanding of scientific and mathematical explanations? In this review, we examine the role of animation in multimedia learning (including multimedia instructional messages and microworld games), present a cognitive theory of multimedia learning, and summarize our program of research, which has yielded seven principles for the use of animation in multimedia instruction. These include the multimedia principle (present animation and narration rather than narration alone), spatial contiguity principle (present on-screen text near rather than far from corresponding animation), temporal contiguity principle (present corresponding animation and narration simultaneously rather than successively), coherence principle (exclude extraneous words, sounds, and video), modality principle (present animation and narration rather than animation and onscreen text), redundancy principle (present animation and narration rather than animation, narration, and on-screen text), and personalization principle (present words in conversational rather than formal style). Animation can promote learner understanding when used in ways that are consistent with the cognitive theory of multimedia learning.
TL;DR: The issues and available tools in three key areas of virtual human research: face-to-face conversation, emotions and personality, and human figure animation are overviewed.
Abstract: Discusses some of the key issues that must be addressed in creating virtual humans, or androids. As a first step, we overview the issues and available tools in three key areas of virtual human research: face-to-face conversation, emotions and personality, and human figure animation. Assembling a virtual human is still a daunting task, but the building blocks are getting bigger and better every day.
TL;DR: Experimental results show that the synthetic expressive talking face of the iFACE system is comparable with a real face in terms of the effectiveness of their influences on bimodal human emotion perception.
Abstract: A real-time speech-driven synthetic talking face provides an effective multimodal communication interface in distributed collaboration environments. Nonverbal gestures such as facial expressions are important to human communication and should be considered by speech-driven face animation systems. In this paper, we present a framework that systematically addresses facial deformation modeling, automatic facial motion analysis, and real-time speech-driven face animation with expression using neural networks. Based on this framework, we learn a quantitative visual representation of the facial deformations, called the motion units (MUs). A facial deformation can be approximated by a linear combination of the MUs weighted by MU parameters (MUPs). We develop an MU-based facial motion tracking algorithm which is used to collect an audio-visual training database. Then, we construct a real-time audio-to-MUP mapping by training a set of neural networks using the collected audio-visual training database. The quantitative evaluation of the mapping shows the effectiveness of the proposed approach. Using the proposed method, we develop the functionality of real-time speech-driven face animation with expressions for the iFACE system. Experimental results show that the synthetic expressive talking face of the iFACE system is comparable with a real face in terms of the effectiveness of their influences on bimodal human emotion perception.
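The linear MU model described in the abstract above can be illustrated with a minimal sketch. The neutral shape, the two motion units, and the MUP values are made-up toy numbers, and adding the deformation to a neutral shape is an assumption for illustration; none of this is data or code from the iFACE system.

```python
def deform(neutral, motion_units, mups):
    """Approximate a face shape as neutral + sum_i mup_i * MU_i.

    neutral      -- flat list of vertex coordinates (hypothetical rest shape)
    motion_units -- list of displacement vectors, same length as neutral
    mups         -- one weight (MU parameter) per motion unit
    """
    shape = list(neutral)
    for mu, weight in zip(motion_units, mups):
        for k, delta in enumerate(mu):
            shape[k] += weight * delta
    return shape


# Toy data: four coordinates, two motion units.
neutral = [0.0, 0.0, 1.0, 0.0]
motion_units = [[0.0, 1.0, 0.0, -1.0], [0.5, 0.0, 0.5, 0.0]]
mups = [0.2, 2.0]
shape = deform(neutral, motion_units, mups)
```

In this picture, an audio-to-MUP mapping only has to predict the short `mups` vector per frame; the full facial deformation then follows from the fixed motion units.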
TL;DR: The motivation for this solution is the belief that a more anatomically appropriate control skeleton allows for more natural looking movement of a human or animal-like figure.
Abstract: The animation of an articulated figure is typically accomplished through the use of a corresponding control skeleton. Although the control skeleton is an effective tool, the manual construction of the skeleton can be a laborious process often requiring several hours of work and a fair degree of proficiency with the animation software used.
The focus of the research described here is the automatic generation of such control skeletons. To this end, two solutions to the problem are presented, one general and one specific. In both cases, the input is required to be a set of polygonal data that defines the figure, and the output is a description of a control skeleton to be used in animating that figure.
The general solution is widely applicable; it makes very few assumptions about the figure given as input or about the type of control skeleton that should be generated. A system is described that divides the problem into a series of steps, each of which is performed automatically. The basic process involves discretizing the figure, approximating its medial surface, and using that surface to construct a control skeleton. The system can produce a reasonably good control skeleton for any of a variety of figures in as little as one or two minutes on a low-end PC.
The specific solution builds upon the general one but is geared toward producing more desirable skeletons for the very common case involving human-like and animal-like figures. Certain assumptions are made about the figure and about the type of control skeleton desired. In addition, heuristics based upon human and animal anatomy are invoked to adjust the control skeleton so that it is more anatomically appropriate. The motivation for this solution is the belief that a more anatomically appropriate control skeleton allows for more natural looking movement of a human or animal-like figure.
Partly to support that claim, the system can produce geometry for individual bones that might function as the anatomical skeleton of the figure. This skeletal geometry can form the foundation for additional anatomical modeling that might add more realism to the animation of the figure.
TL;DR: This work developed an image-based rendering approach for displaying multiple avatars that takes advantage of the properties of the urban environment and the way a viewer and the avatars move within it to produce fast rendering, based on positional and directional discretization.
Abstract: Populated virtual urban environments are important in many applications, from urban planning to entertainment. At the current stage of technology, users can interactively navigate through complex, polygon-based scenes rendered with sophisticated lighting effects and high-quality antialiasing techniques. As a result, animated characters (or agents) that users can interact with are also becoming increasingly common. However, rendering crowded scenes with thousands of different animated virtual people in real time is still challenging. To address this, we developed an image-based rendering approach for displaying multiple avatars. We take advantage of the properties of the urban environment and the way a viewer and the avatars move within it to produce fast rendering, based on positional and directional discretization. To display many different individual people at interactive frame rates, we combined texture compression with multipass rendering.
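One way to picture the directional discretization mentioned above is to quantize the viewing direction around each avatar into a fixed number of bins and look up a pre-rendered image per bin. The sketch below assumes a 2D ground plane and 16 view bins; both choices, and the function name `view_bin`, are hypothetical and not taken from the paper.

```python
import math


def view_bin(avatar_pos, avatar_heading, camera_pos, n_views=16):
    """Return the index of the pre-rendered view closest to the camera direction.

    Positions are (x, y) on the ground plane; avatar_heading is in radians.
    n_views is an assumed number of discrete viewing directions.
    """
    dx = camera_pos[0] - avatar_pos[0]
    dy = camera_pos[1] - avatar_pos[1]
    # Angle of the camera relative to the direction the avatar faces.
    angle = (math.atan2(dy, dx) - avatar_heading) % (2.0 * math.pi)
    step = 2.0 * math.pi / n_views
    return int(round(angle / step)) % n_views


front = view_bin((0.0, 0.0), 0.0, (5.0, 0.0))  # camera straight ahead
side = view_bin((0.0, 0.0), 0.0, (0.0, 5.0))   # camera 90 degrees to the left
```

Because many avatars share the same small set of bins, the per-frame cost reduces to an index computation and a texture lookup rather than rendering full geometry.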
TL;DR: A Parameterized Action Representation (PAR) is described that allows an agent to act, plan, and reason about its actions or actions of others, and is designed for building future behaviors into autonomous agents and controlling the animation parameters that portray personality, mood, and affect in an embodied agent.
Abstract: The last few years have seen great maturation in understanding how to use computer graphics technology to portray 3D embodied characters or virtual humans. Unlike the off-line, animator-intensive methods used in the special effects industry, real-time embodied agents are expected to exist and interact with us "live." They can represent other people or function as autonomous helpers, teammates, or tutors enabling novel interactive educational and training applications. We should be able to interact and communicate with them through modalities we already use, such as language, facial expressions, and gesture. Various aspects and issues in real-time virtual humans will be discussed, including consistent parameterizations for gesture and facial actions using movement observation principles, and the representational basis for character believability, personality, and affect. We also describe a Parameterized Action Representation (PAR) that allows an agent to act, plan, and reason about its actions or actions of others. Besides embodying the semantics of human action, the PAR is designed for building future behaviors into autonomous agents and controlling the animation parameters that portray personality, mood, and affect in an embodied agent.
TL;DR: A welcome message for the first ACM SIGGRAPH Symposium on Computer Animation: a program committee of over 50 experts reviewed 53 submissions, of which 22 papers were selected for presentation and publication.
Abstract: Welcome to the first ACM SIGGRAPH Symposium on Computer Animation! The creation of this symposium was motivated by the need for a forum where researchers and practitioners in computer animation could interact on a smaller and more intimate scale than a conference such as SIGGRAPH. It is our hope that the symposium will play an important role in fostering an improved sense of community among its participants. The active participation of many people is of course the key to any community, and the computer animation community is no different. With this in mind, we assembled a program committee of over 50 experts in all aspects of computer animation to help shape this year's program. A total of 53 submissions were received. Each paper received at least 3 reviews by members of the program committee, who then made a recommendation to the program chairs. The quality of the submissions was remarkable, considering that this is the first and founding year for the symposium. In the end, 22 papers were selected for presentation and publication in the proceedings.
TL;DR: A novel input device and interface for interactively controlling the animation of a graphical human character from a desktop environment, together with a layered kinematic motion recording strategy that accesses subsets of the total degrees of freedom of the character.
Abstract: We present a novel input device and interface for interactively controlling the animation of a graphical human character from a desktop environment. The trackers are embedded in a new physical design, which is simple yet provides significant benefits, and establishes a tangible interface with coordinate frames inherent to the character. A layered kinematic motion recording strategy accesses subsets of the total degrees of freedom of the character. We present the experiences of three novice users with the system, and that of a long-term user who has prior experience with other complex continuous interfaces.
TL;DR: The Avatar Markup Language (AML), based on XML, encapsulates Text to Speech, Facial Animation and Body Animation in a unified manner with appropriate synchronization, and can be used effectively by intelligent software agents to control their 3D graphical representations in virtual environments.
Abstract: Synchronization of speech, facial expressions and body gestures is one of the most critical problems in realistic avatar animation in virtual environments. In this paper, we address this problem by proposing a new high-level animation language to describe avatar animation. The Avatar Markup Language (AML), based on XML, encapsulates the Text to Speech, Facial Animation and Body Animation in a unified manner with appropriate synchronization. We use low-level animation parameters, defined by the MPEG-4 standard, to demonstrate the use of the AML. However, the AML itself is independent of any low-level parameters as such. AML can be effectively used by intelligent software agents to control their 3D graphical representations in the virtual environments. With the help of the associated tools, AML also makes it quick and easy to create and share 3D avatar animations. We also discuss how the language has been developed and used within the SoNG project framework. The tools developed to use AML in a real-time animation system incorporating intelligent agents and 3D avatars are also discussed.
TL;DR: In this article, a method of allowing a user to efficiently direct the generation of frames in a computer animation is presented, where each object within a frame has an initial representation, e.g., position, orientation, scale, intensity, etc.
Abstract: The present invention provides a method of allowing a user to efficiently direct the generation of frames in a computer animation. An object within a frame has an initial representation, e.g., position, orientation, scale, intensity, etc. A vector response characteristic can be associated with the object, where the vector response characteristic specifies how the representation of the object changes in response to applied vectors. For example, a ball might accelerate proportional to the directed magnitude of an applied vector, while a light source might change in intensity and color according to the direction and magnitude of an applied vector. Each object can have its own vector response characteristic, multiple vector response characteristics (e.g., applicable in different parts of the animation), and constraints on its vector response characteristics (e.g., must stay connected to another object). Objects can also generate their own vectors to apply to other objects (e.g., a wall can generate a vector to discourage objects from penetrating the wall).
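The idea of per-object vector response characteristics can be sketched as objects that each expose their own `respond` method. This is only an illustrative reading of the abstract: the class names, the `gain` and `sensitivity` constants, and the 2D setting are hypothetical and do not come from the patent.

```python
import math
from dataclasses import dataclass


@dataclass
class Ball:
    position: tuple
    velocity: tuple
    gain: float = 1.0  # hypothetical proportionality constant

    def respond(self, vector, dt):
        # The ball accelerates in proportion to the applied vector.
        ax, ay = self.gain * vector[0], self.gain * vector[1]
        vx, vy = self.velocity[0] + ax * dt, self.velocity[1] + ay * dt
        self.velocity = (vx, vy)
        self.position = (self.position[0] + vx * dt, self.position[1] + vy * dt)


@dataclass
class Light:
    intensity: float
    sensitivity: float = 0.1  # hypothetical constant

    def respond(self, vector, dt):
        # The light changes intensity with the magnitude of the applied vector.
        magnitude = math.hypot(vector[0], vector[1])
        self.intensity = min(1.0, self.intensity + self.sensitivity * magnitude * dt)


ball = Ball(position=(0.0, 0.0), velocity=(0.0, 0.0), gain=2.0)
light = Light(intensity=0.5)
for obj in (ball, light):
    obj.respond((3.0, 4.0), dt=0.5)  # the same vector drives both objects
```

The same applied vector produces motion in one object and a brightness change in the other, which is the sense in which the response characteristic belongs to the object rather than to the vector.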
TL;DR: An efficient kinematic approach to creating gesture animations from shape specifications is presented, which provides fine adaptation to temporal constraints that are imposed by cross-modal synchrony.
Abstract: Virtual conversational agents are supposed to combine speech with non-verbal modalities for intelligible and believable utterances. However, the automatic synthesis of co-verbal gestures is still struggling with several problems like naturalness in procedurally generated animations, flexibility in pre-defined movements, and synchronization with speech. In this paper we focus on generating complex multimodal utterances including gesture and speech from XML-based descriptions of their overt form. We describe a coordination model that reproduces coarticulation and transition effects in both modalities. In particular, an efficient kinematic approach to creating gesture animations from shape specifications is presented, which provides fine adaptation to temporal constraints that are imposed by cross-modal synchrony.
TL;DR: The authors explored how drawing movements entail a decoding of live-action cinema, which is intensified in the techniques of moving drawings that are prevalent in anime, and explored the ways in which different movements have an impact on narrative, genre and spectatorship.
Abstract: This essay deals with two kinds of movement common in cel animation: 'drawing movements' and 'moving drawings'. Drawing movements is common in traditional cel animation that strives for full animation. The latter, moving drawings, becomes pronounced in techniques of limited animation, common in anime. The goal is not, however, to identify and consolidate differences between animation and anime. On the contrary, this paper explores how drawing movements entails a decoding of live-action cinema, which is intensified in the techniques of moving drawings that are prevalent in anime. Thus, anime is seen as part of a movement away from one kind of cinematic experience, towards something like new media and information. The goal of the essay is to think across media, to explore the ways in which different movements have an impact on narrative, genre and spectatorship. Miyazaki Hayao's Tenkuno shiro Raputa (Castle in the sky) (Studio Ghibli, 1986) provides a site for analysis of the ways in which anime te...
TL;DR: The aim is to assist the user in understanding the structural and visual changes that have occurred in the layout and to apply the Gestalt principles of organisation to this animation phase.
Abstract: Graphs are a commonly used data structure for representing relational information. Drawings of these structures, as node and link diagrams, can provide a useful visualization of the underlying abstract data. This makes drawings of graphs a useful tool in information visualization. Indeed graph drawing has been applied in many application areas including software engineering, knowledge management and for depicting communication networks. The spatial layout can help the user build up a cognitive model or 'mental map' of the information structure. Many automatic algorithms for producing drawings of a graph have been implemented. In many domains it is also common for the underlying information to be dynamic and this means the graph drawing must be updated. Unfortunately, even small changes to the underlying data can result in dramatic changes to the final drawing and this means the user may totally lose their previous 'mental map'. Animation between the two versions of the layout is one approach that can assist the user to make the transition between the two drawings. We have been examining how to apply the Gestalt principles of organisation to this animation phase. The aim is to assist the user in understanding the structural and visual changes that have occurred in the layout. Results of that work are described with relevant examples.
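As a baseline for the animation phase described above, the simplest transition between two layouts moves every node along a straight line from its old position to its new one. The sketch below shows only that baseline; it does not implement the Gestalt-based approach of the paper, and the function name and toy layouts are hypothetical.

```python
def interpolate_layout(old, new, t):
    """Linearly blend node positions between two layouts.

    old, new -- dicts mapping node id to (x, y)
    t        -- animation parameter in [0, 1]; 0 gives old, 1 gives new
    Nodes present in only one layout are skipped in this sketch.
    """
    frame = {}
    for node, (x0, y0) in old.items():
        if node in new:
            x1, y1 = new[node]
            frame[node] = ((1.0 - t) * x0 + t * x1, (1.0 - t) * y0 + t * y1)
    return frame


old_layout = {"a": (0.0, 0.0), "b": (4.0, 0.0)}
new_layout = {"a": (2.0, 4.0), "b": (4.0, 2.0)}
halfway = interpolate_layout(old_layout, new_layout, 0.5)
```

Rendering `interpolate_layout` for a sequence of `t` values gives the in-between frames that help a viewer keep track of where each node went.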
TL;DR: A bilayered approach for efficient cloth animation with a large number of mass-points using two mass-spring meshes that can be successfully used for real-time animation of plausible cloth models.
Abstract: We present an efficient method for rapid animation of a mass-spring model with a large number of mass-points. Realistic cloth simulation generally requires a large amount of time. This is not only because the calculation for one step requires much time but also because cloth simulation easily becomes unstable. Although the implicit method can make the simulation stable, it is still impossible to generate interactive animation when the number of mass-points is large enough to represent realistic wrinkles. An efficient animation method proposed by M. Desbrun (1999) also involves an O(n^2)-sized matrix, so it cannot be applied to models with a large number of mass-points. A stable approximate method without matrix operations has been introduced by M. Oshita and A. Makinouchi (2001), but its physical correctness is significantly impaired as the stiffness increases or the timestep becomes large. In this paper we propose a bilayered approach for efficient cloth animation with a large number of mass-points. The proposed method uses two mass-spring meshes. One of them is a rough mesh for representing global motion, and the other is a fine mesh for realistic wrinkles. The experimental results show that the method can be successfully used for real-time animation of plausible cloth models.
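For readers unfamiliar with mass-spring models, the basic building block is a Hooke spring between two mass-points, advanced with a small time step. The sketch below shows one such spring and a simple explicit (semi-implicit Euler) update; it is a generic illustration, not the bilayered method, and all constants are made-up values.

```python
import math


def spring_force(p, q, rest_length, stiffness):
    """Hooke force acting on point p from a spring that connects p and q."""
    delta = [b - a for a, b in zip(p, q)]
    length = math.sqrt(sum(d * d for d in delta))
    if length == 0.0:
        return [0.0] * len(p)  # direction undefined; apply no force
    scale = stiffness * (length - rest_length) / length
    return [scale * d for d in delta]


def euler_step(pos, vel, force, mass, dt):
    """Update velocity from the force, then position from the new velocity."""
    new_vel = [v + dt * f / mass for v, f in zip(vel, force)]
    new_pos = [x + dt * v for x, v in zip(pos, new_vel)]
    return new_pos, new_vel


# A spring stretched to twice its rest length pulls p towards q.
force = spring_force([0.0, 0.0, 0.0], [2.0, 0.0, 0.0], rest_length=1.0, stiffness=10.0)
pos, vel = euler_step([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], force, mass=1.0, dt=0.01)
```

With explicit updates like this, large `stiffness` or `dt` values make the simulation unstable, which is the difficulty that implicit and approximate methods, and the bilayered approach, try to work around.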
TL;DR: This work develops a technique that creates cartoon style deformations automatically while preserving desirable qualities of the object's appearance and motion through a set of simple parameters that drive specific features of the motion.
Abstract: Traditional hand animation is in many cases superior to simulated motion for conveying information about character and events. Much of this superiority comes from an animator's ability to abstract motion and play to human perceptual effects. However, experienced animators are difficult to come by and the resulting motion is typically not interactive. On the other hand, procedural models for generating motion, such as physical simulation, can create motion on the fly but are poor at stylizing movement. We start to bridge this gap with a technique that creates cartoon style deformations automatically while preserving desirable qualities of the object's appearance and motion. Our method is focused on squash-and-stretch deformations based on the velocity and collision parameters of the object, making it suitable for procedural animation systems. The user has direct control of the object's motion through a set of simple parameters that drive specific features of the motion, such as the degree of squash and stretch. We demonstrate our approach with examples from our prototype system.
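A common way to realize velocity-driven squash and stretch is to scale the object along its direction of motion and shrink it in the two perpendicular directions so that volume is preserved. The sketch below assumes a simple linear mapping from speed to stretch with a cap; that mapping, and the `amount` and `max_stretch` parameters, are illustrative assumptions rather than the formulation used in the paper.

```python
import math


def squash_stretch_scales(speed, amount=0.1, max_stretch=2.0):
    """Return (along, side, side) scale factors whose product is 1.

    speed       -- magnitude of the object's velocity
    amount      -- hypothetical user parameter: stretch gained per unit speed
    max_stretch -- hypothetical cap on the stretch factor
    """
    along = min(1.0 + amount * speed, max_stretch)
    side = 1.0 / math.sqrt(along)  # keeps along * side * side == 1
    return along, side, side


at_rest = squash_stretch_scales(0.0)
fast = squash_stretch_scales(30.0)
```

Exposing `amount` as a single knob mirrors the idea of giving the user a small set of parameters that drive the degree of squash and stretch.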
TL;DR: The advantage of the presented method over laser scanning and coded light range digitizers is the acquisition of the source data in a fraction of a second, allowing the measurement of human faces with higher accuracy and the possibility to measure dynamic events like the speech of a person.
TL;DR: In this article, a quasi-Newton nonlinear programming technique (superlinear convergence) is implemented to solve minimum torque-based human motion-planning problems and the explicit analytical gradients needed in the dynamics are derived using a matrix exponential formulation and Lie algebra.
Abstract: This paper presents an efficient optimal control and recursive dynamics-based computer animation system for simulating and controlling the motion of articulated figures. A quasi-Newton nonlinear programming technique (super-linear convergence) is implemented to solve minimum torque-based human motion-planning problems. The explicit analytical gradients needed in the dynamics are derived using a matrix exponential formulation and Lie algebra. Cubic spline functions are used to make the search space for an optimal solution finite. Based on our formulations, our method is well conditioned and robust, in addition to being computationally efficient. To better illustrate the efficiency of our method, we present results of natural looking and physically correct human motions for a variety of human motion tasks involving open and closed loop kinematic chains.
TL;DR: This paper describes LabanEditor, an interactive graphical editor for writing and editing Labanotation scores, with which a user can input and edit human body movement of dance and display animation of a human body model in 3D graphics.
Abstract: Today, intangible cultural properties like ballet and dance have been a target of digital archiving. Our laboratory is developing a comprehensive data processing system to input, describe, record, search, and display human body movement. This system can record and reenact human body movement using a data format based on Labanotation, which has been used for recording human movement of dance with several types of graphical symbols. This paper describes LabanEditor, which is an interactive graphical editor for writing and editing Labanotation scores. By using LabanEditor, a user can input and edit human body movement of dance and also display animation of a human body model in 3D graphics.
TL;DR: The most successful and computationally cheapest scheme obtains an accuracy of 82% on the task of picking the "consistent" speaker from a set including three confusers, and a final experiment demonstrates the potential utility of the scheme for speaker localization in video.
Abstract: This paper considers schemes for determining which of a set of faces on screen, if any, is producing speech in a video soundtrack. Whilst motivated by the TREC 2002 (Video Retrieval Track) monologue detection task, the schemes are also applicable to voice and face-based biometrics systems, for assessing lip synchronization quality in movie editing and computer animation, and for speaker localization in video. Several approaches are discussed: two implementations of a generic mutual-information-based measure of the degree of synchrony between signals, which can be used with or without prior speech and face detection, and a stronger model-based scheme which follows speech and face detection with an assessment of face and lip movement plausibility. Schemes are compared on a corpus of 1016 test cases containing multiple faces and multiple speakers, a test set 200 times larger than the nearest comparable test set of which we are aware. The most successful and computationally cheapest scheme obtains an accuracy of 82% on the task of picking the "consistent" speaker from a set including three confusers. A final experiment demonstrates the potential utility of the scheme for speaker localization in video.
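To make the mutual-information idea concrete, the sketch below scores audio-visual synchrony between one audio feature and one visual feature per face, under the simplifying assumption that the two signals are jointly Gaussian, in which case the mutual information is -0.5 * ln(1 - r^2) for Pearson correlation r. The feature choice (audio energy versus mouth openness), the toy numbers, and the Gaussian assumption are all illustrative; the paper's actual measure may differ.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gaussian_mi(x, y):
    """Mutual information in nats, assuming x and y are jointly Gaussian."""
    r2 = min(pearson(x, y) ** 2, 1.0 - 1e-12)  # avoid log(0) for perfect correlation
    return -0.5 * math.log(1.0 - r2)


# Toy per-frame features: audio energy and mouth openness for two faces.
audio = [0.1, 0.9, 0.2, 0.8, 0.1, 0.7]
faces = {
    "face_a": [0.2, 0.8, 0.3, 0.7, 0.2, 0.6],  # moves with the audio
    "face_b": [0.5, 0.4, 0.6, 0.5, 0.4, 0.5],  # largely unrelated motion
}
scores = {name: gaussian_mi(audio, track) for name, track in faces.items()}
speaker = max(scores, key=scores.get)
```

Picking the face with the highest score corresponds to choosing the "consistent" speaker among confusers.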
TL;DR: The Facial Animation Framework consists of a lightweight, portable, MPEG-4 compatible Facial Animation Player, a system for fast production of ready-to-animate, MPEG-4 compatible face models, and a set of MPEG-4 compatible tools for facial animation content production.
Abstract: Talking virtual characters are graphical simulations of real or imaginary persons capable of human-like behavior, most importantly talking and gesturing. They may find applications on the Internet and mobile platforms as newscasters, customer service representatives, sales representatives, guides etc. After briefly discussing the possible applications and the technical requirements for bringing such applications to life, we describe our approach to enable these applications: the Facial Animation Framework. This framework consists of (1) a lightweight, portable, MPEG-4 compatible Facial Animation Player, (2) a system for fast production of ready-to-animate, MPEG-4 compatible face models and (3) a plethora of MPEG-4 compatible tools for Facial Animation content production. We believe that this kind of approach offers enough flexibility to rapidly adapt to a broad range of applications involving facial animation on various platforms.
TL;DR: In this article, an animation data generation apparatus that supplies a state where a character string can be read, in a part of 3D character animation generated on the basis of functions is presented.
Abstract: The present invention relates to an animation data generation apparatus that supplies a state where a character string can be read, in a part of 3D character animation generated on the basis of functions. This animation data generation apparatus includes: an interface unit for setting characters which are used for animation, time allocation between 3D character animations in moving state and in standstill, and the type of the animation in moving state; a number-of-frame calculation unit for calculating the number of frames corresponding to animation on the basis of the time allocation; and an animation data generation unit for generating data of the animation in standstill so that 3D characters corresponding to the set characters can be read, and generating data of the 3D character animation in moving state so as to link to the data of the animation in standstill, using the number of frames calculated by the number-of-frame calculation unit and a function corresponding to the animation type set by the interface unit.
TL;DR: The Interactive Dance Club, created for the 25th annual ACM SIGGRAPH Conference on Computer Graphics and Interactive Techniques, is a venue combining music and imagery in which participants become players in a large, interconnected, interactive musical and visual environment.
Abstract: We were recently asked by representatives of the 25th annual ACM SIGGRAPH Conference on Computer Graphics and Interactive Techniques if we had any ideas for a potential program incorporating music and imagery in a unique way. We had an opportunity to create a "never-done-before" venue for a major conference, based upon our ability to "sell" the conference on the concept, and then find enough corporate sponsors and like-minded volunteers to bring it to fruition. We accepted the offer and brought our respective experience in music production, sound design, and control system engineering together to spearhead the development of the Interactive Dance Club. We wanted to create a type of venue where people could have the opportunity to become players in a large, interconnected, interactive musical and visual environment. Within the framework of a dance club, the results of their interactions had to sound musical (to the untrained ear), with special attention paid to keeping the overall sound from becoming cacophonous. The environment had to make the participants feel comfortable and safe to express themselves. Because this was a production assignment with a very tight schedule and not a research project per se, we decided to use as much "off-the-shelf" technology as possible, developing software and hardware only where necessary. We used Max extensively, which allowed us to rapidly prototype and deploy software for distribution to our collaborators, and we also used Opcode’s Vision MIDI sequencing software. Infusion Systems loaned us some of their digitizers, and Interactive Light provided a handful of Dimension Beam IR sensor devices. Side Effects Software’s 3D animation package, Houdini, was used as our computer graphics platform for both authoring and runtime; we drew on its strengths in real-time control of nearly every aspect of the computer graphics environment.
Additional drum pad interfaces came from Roland, and custom sensor interfaces were created by Interactive Technologies. Both Apple Computer and Silicon Graphics generously supplied us with computers during both the development process and for the SIGGRAPH show.
TL;DR: SnakeToonz is an interactive system that allows children and others untrained in cel animation to create two-dimensional cartoons from video streams and images by combining constraints of the cartooning medium with simple user input and analysis of that input.
Abstract: SnakeToonz is an interactive system that allows children and others untrained in cel animation to create two-dimensional cartoons from video streams and images. The ability to create cartoons has traditionally been limited to professional animation houses and trained artists. SnakeToonz aims to give anyone with a video camera and a computer the ability to create compelling cel animation. This is done by combining constraints of the cartooning medium with simple user input and analysis of that input. A cartoon is created in a dialogue with the system. After recording video material the user sketches contours directly onto the first frame of video. These sketches initialize a set of spline-based active contours which are relaxed to best fit the image and other aesthetic constraints. Small gaps are closed, and the user can choose colors for the cartoon. The system then uses motion estimation techniques to track these contours through the image sequence. The user remains in the process to edit the cartoon as it progresses.
TL;DR: An old algorithm for visual simulation of climbing plants is extended here, using the phenomenon of traumatic reiteration for critical cases and an associated voxel space for collisions and space occupancy detection as well as for evaluating the illumination of the plant organs.
Abstract: An old algorithm for visual simulation of climbing plants is extended here. Plants are modeled as systems of oriented particles that are able to sense their environment. Particles move to the best locations using directed random walk. We use the phenomenon of traumatic reiteration for critical cases. If there is no location for further growth possible the particle dies, but before that it sends a signal that is propagated down in the plant structure. This signal activates the closest possible sleeping particle that takes its job. We use an associated voxel space for collisions and space occupancy detection as well as for evaluating the illumination of the plant organs. The algorithm is fast, easy to implement, and runs interactively even for quite large scenes on a medium-class computer. We believe that this approach can be used as an interactive technique in architecture, computer games, computer animation, etc.
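The growth process described above, a particle performing a directed random walk through a voxel space that records occupancy, can be sketched roughly as follows. The six-neighbour grid, the weighting scheme, and all names are assumptions for illustration; the signalling of sleeping particles (traumatic reiteration) and the illumination evaluation are not modelled here.

```python
import random

# Six axis-aligned neighbour offsets in the voxel grid.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def grow(start, preferred, steps, blocked, rng):
    """Grow one shoot as a directed random walk over free voxels.

    start     -- starting voxel (x, y, z)
    preferred -- preferred growth direction, e.g. (0, 0, 1) for upwards
    blocked   -- set of voxels occupied by obstacles
    rng       -- random.Random instance, passed in for reproducibility
    Returns the list of voxels visited; it ends early if the particle dies.
    """
    occupied = set(blocked)
    occupied.add(start)
    path = [start]
    pos = start
    for _ in range(steps):
        candidates, weights = [], []
        for d in NEIGHBOURS:
            cell = (pos[0] + d[0], pos[1] + d[1], pos[2] + d[2])
            if cell in occupied:
                continue  # collision or already part of the plant
            alignment = sum(a * b for a, b in zip(d, preferred))
            candidates.append(cell)
            weights.append(1.0 + 4.0 * max(alignment, 0.0))  # hypothetical bias
        if not candidates:
            break  # no free location left: the particle dies
        pos = rng.choices(candidates, weights=weights, k=1)[0]
        occupied.add(pos)
        path.append(pos)
    return path


path = grow((0, 0, 0), (0, 0, 1), steps=20, blocked={(0, 0, 1)}, rng=random.Random(0))
```

The voxel set serves double duty here, as in the abstract: it answers collision queries against obstacles and prevents the plant from growing into space it already occupies.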
TL;DR: Two methods for creating realistic Chinese painting are presented: the first creates 3D Chinese painting animation using existing software packages, and the second is an expressive paint tool that allows an artist to interactively create 2D Chinese painting.
Abstract: We present two methods to create realistic Chinese painting. The first method is to create 3D Chinese painting animation using existing software packages. The second method is an expressive paint tool which allows an artist to interactively create 2D Chinese painting.
TL;DR: This work presents a planning system that transforms a description of animator intentions and character actions into a series of camera shots portraying those intentions, yielding an animation intended to leave the viewer with an impression that supports the animator's description of the mood and theme of the narrative.
Abstract: Standard techniques, such as soundtrack recording, storyboarding and key-framing, are used to create animation adaptations of narratives. Many aspects of the narrative, such as moods, themes, character motivations and plot, must be captured in the audio-visual medium. Our work focusses on achieving the communication of moods and themes solely through the application of well-known cinematography techniques. We present a planning system that transforms a description of animator intentions and character actions into a series of camera shots which portray these intentions. The planner accomplishes this portrayal by utilizing lighting, framing, camera motion, colour choice and shot pacing. The final output is an animation that is intended to produce a viewer impression to support the animator's description of the mood and theme of the narrative.
TL;DR: A modified version of the coarticulation model proposed by Cohen and Massaro (1993) is described and was applied with success to GRETA, an Italian talking head, and examples are illustrated to show the naturalness of the resulting animation technique.
Abstract: A modified version of the coarticulation model proposed by Cohen and Massaro (1993) is described. A semi-automatic minimization technique, working on real kinematic data acquired by the ELITE opto-electronic system, was used to train the dynamic characteristics of the model. Finally, the model was applied with success to GRETA, an Italian talking head, and examples are illustrated to show the naturalness of the resulting animation technique.
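For orientation, the dominance-function blending usually associated with the original Cohen and Massaro model can be sketched as follows. This is a sketch of the baseline idea only, not the modified version described in the abstract, whose changes are not specified here. The dictionary keys and all parameter values are illustrative assumptions.

```python
import math


def dominance(t, center, alpha, theta_before, theta_after, c=1.0):
    """Negative-exponential dominance of one speech segment at time t.

    alpha scales the peak; theta_before and theta_after control how fast
    the influence decays before and after the segment centre.
    """
    tau = center - t
    theta = theta_before if tau >= 0 else theta_after
    return alpha * math.exp(-theta * abs(tau) ** c)


def blend(t, segments):
    """Articulatory parameter at time t: the dominance-weighted average
    of the segment targets."""
    num = 0.0
    den = 0.0
    for seg in segments:
        d = dominance(
            t,
            seg["center"],
            seg["alpha"],
            seg["theta_before"],
            seg["theta_after"],
            seg.get("c", 1.0),
        )
        num += d * seg["target"]
        den += d
    return num / den


# Two hypothetical segments for a single parameter such as lip opening.
segments = [
    {"center": 0.0, "target": 0.0, "alpha": 1.0, "theta_before": 2.0, "theta_after": 2.0},
    {"center": 1.0, "target": 1.0, "alpha": 1.0, "theta_before": 2.0, "theta_after": 2.0},
]
trajectory = [blend(k / 10, segments) for k in range(11)]
```

Training in the sense of the abstract would mean fitting values such as `alpha` and the `theta` rates to recorded articulatory trajectories; the sketch only evaluates the model for given values.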
TL;DR: This paper presents a method for animating human characters in walk planning problems, in which a steering method dedicated to human walking, built on a character motion controller, is integrated into a randomized motion planning scheme.
Abstract: This paper presents a method for animating human characters, especially dedicated to walk planning problems. The method is integrated in a randomized motion planning scheme, including a steering method dedicated to human walk. This steering method integrates a character motion controller assuming realistic animations. The navigation of the character through a virtual environment is modeled as a composition of Bezier curves. The controller is based on motion capture data editing techniques. This approach satisfies some essential computer graphics criteria: a realistic result, a low response time, a collision-free motion in possibly constrained 3D environments. The approach has been implemented and successfully demonstrated on several examples.
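The idea of modeling a navigation path as a composition of Bezier curves can be illustrated with a short sketch. It assumes cubic segments whose end points coincide at the joints; the abstract does not state the degree or the continuity constraints used, and the function names are hypothetical.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at t in [0, 1] (Bernstein form)."""
    u = 1.0 - t
    return tuple(
        u ** 3 * a + 3 * u * u * t * b + 3 * u * t * t * c + t ** 3 * d
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


def sample_path(segments, samples_per_segment=10):
    """Sample a path made of consecutive cubic Bezier segments.

    Each segment is a tuple of four control points. The first sample of
    every segment after the first is skipped, because it duplicates the
    joint shared with the previous segment.
    """
    points = []
    for i, seg in enumerate(segments):
        start = 0 if i == 0 else 1
        for k in range(start, samples_per_segment + 1):
            points.append(cubic_bezier(*seg, k / samples_per_segment))
    return points


# Two hypothetical 2D segments joined at (3, 0).
path = sample_path([
    ((0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)),
    ((3.0, 0.0), (4.0, 0.0), (5.0, 1.0), (5.0, 2.0)),
])
```

A steering method in a randomized planner would generate such segments between sampled configurations; collision checking and the motion-capture-based controller are outside the scope of this sketch.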
TL;DR: RUTH adopts an open, layered architecture in which fine-grained features of the animation can be derived by rule from inferred linguistic structure, allowing it to investigate the meaningful high-level elements of conversational facial movement for American English speakers.
Abstract: People highlight the intended interpretation of their utterances within a larger discourse by a diverse set of nonverbal signals. These signals represent a key challenge for animated conversational agents because they are pervasive, variable, and need to be coordinated judiciously in an effective contribution to conversation. In this paper we describe a freely-available cross-platform real-time facial animation system, RUTH, that animates such high-level signals in synchrony with speech and lip movements. RUTH adopts an open, layered architecture in which fine-grained features of the animation can be derived by rule from inferred linguistic structure, allowing us to use RUTH, in conjunction with annotation of observed discourse, to investigate the meaningful high-level elements of conversational facial movement for American English speakers.
TL;DR: An accurate and inexpensive procedure that estimates 3D facial motion parameters from mirror-reflected multiview video clips and a novel closed-form linear algorithm to reconstruct 3D positions from real versus mirrored point correspondences in an uncalibrated environment is proposed.
Abstract: We propose an accurate and inexpensive procedure that estimates 3D facial motion parameters from mirror-reflected multiview video clips. We place two planar mirrors near a subject's cheeks and use a single camera to simultaneously capture a marker's front and side view images. We also propose a novel closed-form linear algorithm to reconstruct 3D positions from real versus mirrored point correspondences in an uncalibrated environment. Our computer simulations reveal that exploiting mirrors' various reflective properties yields a more robust, accurate, and simpler 3D position estimation approach than general-purpose stereo vision methods that use a linear approach or maximum-likelihood optimization. Our experiments show a root mean square (RMS) error of less than 2 mm in 3D space with only 20-point correspondences. For semiautomatic 3D motion tracking, we use an adaptive Kalman predictor and filter to improve stability and infer the occluded markers' position. Our approach tracks more than 50 markers on a subject's face and lips from 30-frame-per-second video clips. We've applied the facial motion parameters estimated from the proposed method to our facial animation system.
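To show how a Kalman predictor can carry a marker through occluded frames, here is a minimal sketch for a single coordinate. It uses a plain constant-velocity filter, not the adaptive predictor of the abstract, and the noise values `q` and `r` are arbitrary assumptions. Occluded frames are represented by `None`, and the first frame is assumed to be visible.

```python
def track(measurements, dt=1.0 / 30.0, q=1e-3, r=1e-2):
    """Constant-velocity Kalman filter for one marker coordinate.

    measurements: list of floats, with None for occluded frames.
    dt: frame interval (30 frames per second by default).
    q, r: assumed process and measurement noise variances.
    Returns one position estimate per frame.
    """
    pos, vel = measurements[0], 0.0
    # Covariance matrix entries [[p00, p01], [p10, p11]].
    p00, p01, p10, p11 = 1.0, 0.0, 0.0, 1.0
    estimates = [pos]
    for z in measurements[1:]:
        # Predict: x = F x and P = F P F^T + Q, with F = [[1, dt], [0, 1]].
        pos = pos + dt * vel
        n00 = p00 + dt * (p10 + p01) + dt * dt * p11 + q
        n01 = p01 + dt * p11
        n10 = p10 + dt * p11
        n11 = p11 + q
        p00, p01, p10, p11 = n00, n01, n10, n11
        if z is not None:
            # Update with the observed position (H = [1, 0]).
            s = p00 + r
            k0, k1 = p00 / s, p10 / s
            y = z - pos
            pos += k0 * y
            vel += k1 * y
            p00, p01, p10, p11 = (
                (1.0 - k0) * p00,
                (1.0 - k0) * p01,
                p10 - k1 * p00,
                p11 - k1 * p01,
            )
        # When z is None the prediction stands in for the hidden marker.
        estimates.append(pos)
    return estimates
```

For a marker moving steadily before it is hidden, the estimates for the `None` frames continue along the learned velocity, which is the behaviour relied on to infer occluded marker positions.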