Scispace (Formerly Typeset)
Showing papers on "Computer animation" published in 2020
Proceedings Article•10.1109/CVPR42600.2020.00316•
ARCH: Animatable Reconstruction of Clothed Humans


Zeng Huang1, Yuanlu Xu2, Christoph Lassner2, Hao Li1, Tony Tung2 •
University of Southern California1, Facebook2
14 Jun 2020
TL;DR: This paper proposes ARCH (Animatable Reconstruction of Clothed Humans), a novel end-to-end framework for accurate reconstruction of animation-ready 3D clothed humans from a monocular image and shows numerous qualitative examples of animated, high-quality reconstructed avatars unseen in the literature so far.
Abstract: In this paper, we propose ARCH (Animatable Reconstruction of Clothed Humans), a novel end-to-end framework for accurate reconstruction of animation-ready 3D clothed humans from a monocular image. Existing approaches to digitize 3D humans struggle to handle pose variations and recover details. Also, they do not produce models that are animation ready. In contrast, ARCH is a learned pose-aware model that produces detailed 3D rigged full-body human avatars from a single unconstrained RGB image. A Semantic Space and a Semantic Deformation Field are created using a parametric 3D body estimator. They allow the transformation of 2D/3D clothed humans into a canonical space, reducing ambiguities in geometry caused by pose variations and occlusions in training data. Detailed surface geometry and appearance are learned using an implicit function representation with spatial local features. Furthermore, we propose additional per-pixel supervision on the 3D reconstruction using opacity-aware differentiable rendering. Our experiments indicate that ARCH increases the fidelity of the reconstructed humans. We obtain more than 50% lower reconstruction errors for standard metrics compared to state-of-the-art methods on public datasets. We also show numerous qualitative examples of animated, high-quality reconstructed avatars unseen in the literature so far.

469 citations

Journal Article•10.1145/3386569.3392422•
Character controllers using motion VAEs


Hung Yu Ling1, Fabio Zinno2, George Cheng2, Michiel van de Panne1•
University of British Columbia1, Electronic Arts2
08 Jul 2020-ACM Transactions on Graphics
TL;DR: This work uses deep reinforcement learning to learn controllers that achieve goal-directed movements in data-driven generative models of human movement using autoregressive conditional variational autoencoders, or Motion VAEs.
Abstract: A fundamental problem in computer animation is that of realizing purposeful and realistic human movement given a sufficiently-rich set of motion capture clips. We learn data-driven generative models of human movement using autoregressive conditional variational autoencoders, or Motion VAEs. The latent variables of the learned autoencoder define the action space for the movement and thereby govern its evolution over time. Planning or control algorithms can then use this action space to generate desired motions. In particular, we use deep reinforcement learning to learn controllers that achieve goal-directed movements. We demonstrate the effectiveness of the approach on multiple tasks. We further evaluate system-design choices and describe the current limitations of Motion VAEs.

309 citations
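
The core idea in the Motion VAE abstract above, that the decoder's latent variable serves as the action chosen at each step, can be sketched as a rollout loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the decoder is a fixed random linear map standing in for a trained network, and the names `decode`, `rollout`, `random_policy` and the toy dimensions are hypothetical.

```python
import numpy as np

POSE_DIM = 6     # toy pose size (assumption; real character poses are far larger)
LATENT_DIM = 2   # toy latent/action size (assumption)

rng = np.random.default_rng(0)
# Stand-in for a trained decoder: fixed linear maps, not a learned network.
W_pose = 0.9 * np.eye(POSE_DIM)
W_latent = 0.1 * rng.standard_normal((POSE_DIM, LATENT_DIM))

def decode(prev_pose, z):
    """Autoregressive conditional decoder: next pose from previous pose and latent z."""
    return W_pose @ prev_pose + W_latent @ z

def rollout(initial_pose, policy, steps):
    """Generate a motion by letting a policy choose the latent at every step."""
    poses = [np.asarray(initial_pose, dtype=float)]
    for _ in range(steps):
        z = policy(poses[-1])              # the latent plays the role of the action
        poses.append(decode(poses[-1], z))
    return np.stack(poses)

def random_policy(pose):
    """Sampling z from a standard normal prior gives uncontrolled motion."""
    return rng.standard_normal(LATENT_DIM)

motion = rollout(np.zeros(POSE_DIM), random_policy, steps=10)
print(motion.shape)  # (11, 6)
```

In the setting the abstract describes, `random_policy` would be replaced by a controller trained with deep reinforcement learning so that the generated poses achieve a goal.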

Journal Article•10.1145/3414685.3417803•
PIE: portrait image embedding for semantic control


Ayush Tewari1, Mohamed Elgharib1, Mallikarjun B R1, Florian Bernard2, Hans-Peter Seidel1, Patrick Pérez3, Michael Zollhöfer4, Christian Theobalt1 •
Max Planck Society1, Technische Universität München2, Valeo3, Stanford University4
26 Nov 2020
TL;DR: The authors embed real portrait images in the latent space of StyleGAN, which allows for intuitive editing of the head pose, facial expression, and scene illumination in the image, while maintaining facial integrity.
Abstract: Editing of portrait images is a very popular and important research topic with a large variety of applications. For ease of use, control should be provided via a semantically meaningful parameterization that is akin to computer animation controls. The vast majority of existing techniques do not provide such intuitive and fine-grained control, or only enable coarse editing of a single isolated control parameter. Very recently, high-quality semantically controlled editing has been demonstrated, however only on synthetically created StyleGAN images. We present the first approach for embedding real portrait images in the latent space of StyleGAN, which allows for intuitive editing of the head pose, facial expression, and scene illumination in the image. Semantic editing in parameter space is achieved based on StyleRig, a pretrained neural network that maps the control space of a 3D morphable face model to the latent space of the GAN. We design a novel hierarchical non-linear optimization problem to obtain the embedding. An identity preservation energy term allows spatially coherent edits while maintaining facial integrity. Our approach runs at interactive frame rates and thus allows the user to explore the space of possible edits. We evaluate our approach on a wide set of portrait photos, compare it to the current state of the art, and validate the effectiveness of its components in an ablation study.

210 citations

Book Chapter•10.1007/978-3-030-58545-7_3•
Talking-head Generation with Rhythmic Head Motion


Lele Chen1, Guofeng Cui1, Celong Liu, Zhong Li, Ziyi Kou1, Yi Xu, Chenliang Xu1 •
University of Rochester1
23 Aug 2020
TL;DR: This work proposes a 3D-aware generative network along with a hybrid embedding module and a non-linear composition module that achieves controllable, photo-realistic, and temporally coherent talking-head videos with natural head movements.
Abstract: When people deliver a speech, they naturally move their heads, and this rhythmic head motion conveys prosodic information. However, generating a lip-synced video in which the head moves naturally is challenging. While remarkably successful, existing works either generate still talking-face videos or rely on landmark/video frames as sparse/dense mapping guidance to generate head movements, which leads to unrealistic or uncontrollable video synthesis. To overcome the limitations, we propose a 3D-aware generative network along with a hybrid embedding module and a non-linear composition module. Through modeling the head motion and facial expressions explicitly (in our setting, facial expression means facial movement, e.g., blinks and lip and chin movements), manipulating 3D animation carefully, and embedding reference images dynamically, our approach achieves controllable, photo-realistic, and temporally coherent talking-head videos with natural head movements. Thoughtful experiments on several standard benchmarks demonstrate that our method achieves significantly better results than the state-of-the-art methods in both quantitative and qualitative comparisons. The code is available at https://github.com/lelechen63/Talking-head-Generation-with-Rhythmic-Head-Motion.

207 citations

Proceedings Article•10.1109/CVPR42600.2020.00347•
Learning Formation of Physically-Based Face Attributes


Ruilong Li1, Karl Bladin1, Yajie Zhao1, Chinmay Chinara1, Owen Ingraham1, Pengda Xiang1, Xinglei Ren1, Pratusha Prasad1, Bipin Kishore1, Jun Xing1, Hao Li1 •
Institute for Creative Technologies1
14 Jun 2020
TL;DR: In this article, a non-linear morphable face model is proposed to generate multifarious face geometry of pore-level resolution, coupled with material attributes for use in physically-based rendering.
Abstract: Based on a combined data set of 4000 high resolution facial scans, we introduce a non-linear morphable face model, capable of producing multifarious face geometry of pore-level resolution, coupled with material attributes for use in physically-based rendering. We aim to maximize the variety of the participant’s face identities, while increasing the robustness of correspondence between unique components, including middle-frequency geometry, albedo maps, specular intensity maps and high-frequency displacement details. Our deep learning based generative model learns to correlate albedo and geometry, which ensures the anatomical correctness of the generated assets. We demonstrate potential use of our generative model for novel identity generation, model fitting, interpolation, animation, high fidelity data visualization, and low-to-high resolution data domain transferring. We hope the release of this generative model will encourage further cooperation between all graphics, vision, and data focused professionals, while demonstrating the cumulative value of every individual’s complete biometric profile.

82 citations

Posted Content•
DeePSD: Automatic Deep Skinning And Pose Space Deformation For 3D Garment Animation


Hugo Bertiche, Meysam Madadi, Sergio Escalera
06 Sep 2020-arXiv: Computer Vision and Pattern Recognition
TL;DR: This work proposes a simple model that can animate any outfit, independently of its topology, vertex order or connectivity, and proposes a methodology to complement supervised learning with an unsupervised physically based learning that implicitly solves collisions and enhances cloth quality.
Abstract: We present a novel solution to the garment animation problem through deep learning. Our contribution allows animating any template outfit with arbitrary topology and geometric complexity. Recent works develop models for garment editing, resizing and animation at the same time by leveraging the support body model (encoding garments as body homotopies). This leads to complex engineering solutions that suffer from scalability, applicability and compatibility problems. By limiting our scope to garment animation only, we are able to propose a simple model that can animate any outfit, independently of its topology, vertex order or connectivity. Our proposed architecture maps outfits to animated 3D models in the standard format for 3D animation (blend weights and blend shapes matrices), automatically providing compatibility with any graphics engine. We also propose a methodology to complement supervised learning with an unsupervised physically based learning that implicitly solves collisions and enhances cloth quality.

62 citations
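
The "blend weights and blend shapes" format mentioned in the DeePSD abstract above is what a graphics engine consumes for skinning. The sketch below shows generic linear blend skinning with additive corrective offsets in NumPy. It assumes the weights and offsets are already given (in the paper they are predicted by a network), and the function name `skin_vertices` and its argument layout are hypothetical choices for illustration.

```python
import numpy as np

def skin_vertices(rest_vertices, offsets, blend_weights, rotations, translations):
    """Linear blend skinning with additive corrective offsets.

    rest_vertices: (V, 3) garment vertices in the rest pose
    offsets:       (V, 3) pose-dependent corrective displacements
    blend_weights: (V, J) per-vertex joint weights, each row summing to 1
    rotations:     (J, 3, 3) joint rotation matrices
    translations:  (J, 3) joint translations
    """
    corrected = rest_vertices + offsets
    # Transform every vertex by every joint: result has shape (J, V, 3).
    per_joint = np.einsum("jab,vb->jva", rotations, corrected) + translations[:, None, :]
    # Blend the per-joint results with the skinning weights: shape (V, 3).
    return np.einsum("vj,jva->va", blend_weights, per_joint)

# One vertex at the origin, influenced equally by a fixed joint and a joint
# that has moved two units along the y axis.
vertices = np.zeros((1, 3))
weights = np.array([[0.5, 0.5]])
rotations = np.stack([np.eye(3), np.eye(3)])
translations = np.array([[0.0, 0.0, 0.0], [0.0, 2.0, 0.0]])

skinned = skin_vertices(vertices, np.zeros((1, 3)), weights, rotations, translations)
print(skinned)  # the vertex moves halfway, to y = 1
```

Because the output depends only on per-vertex weights and offsets, nothing in this step requires a fixed topology or vertex order, which is the compatibility property the abstract emphasizes.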

Journal Article•10.1145/3355089.3356545•
ScalarFlow: A Large-Scale Volumetric Data Set of Real-world Scalar Transport Flows for Computer Animation and Machine Learning.


Marie-Lena Eckert1, Kiwon Um1, Nils Thuerey1•
Technische Universität München1
20 Nov 2020-arXiv: Graphics
TL;DR: ScalarFlow is presented, a first large-scale data set of reconstructions of real-world smoke plumes and a framework for accurate physics-based reconstructions from a small number of video streams, using a novel estimation of unseen inflow regions and an efficient regularization scheme.
Abstract: In this paper, we present ScalarFlow, a first large-scale data set of reconstructions of real-world smoke plumes. We additionally propose a framework for accurate physics-based reconstructions from a small number of video streams. Central components of our algorithm are a novel estimation of unseen inflow regions and an efficient regularization scheme. Our data set includes a large number of complex and natural buoyancy-driven flows. The flows transition to turbulent flows and contain observable scalar transport processes. As such, the ScalarFlow data set is tailored towards computer graphics, vision, and learning applications. The published data set will contain volumetric reconstructions of velocity and density, input image sequences, together with calibration data, code, and instructions on how to recreate the commodity hardware capture setup. We further demonstrate one of the many potential application areas: a first perceptual evaluation study, which reveals that the complexity of the captured flows requires a huge simulation resolution for regular solvers in order to recreate at least parts of the natural complexity contained in the captured data.

46 citations

Journal Article•10.1016/J.CAG.2020.09.003•
A survey on the animation of signing avatars: From sign representation to utterance synthesis


Lucie Naert1, Caroline Larboulette1, Sylvie Gibet1•
University of Southern Brittany1
01 Nov 2020-Computers & Graphics
TL;DR: This article distinguishes the synthesis of isolated signs deprived of any contextual inflections from the generation of full sign language utterances, and constitutes a survey of the challenges specific to sign language avatars.

37 citations

Proceedings Article•10.1145/3379337.3415819•
Video-Annotated Augmented Reality Assembly Tutorials


Masahiro Yamaguchi1, Shohei Mori1, Peter Mohr1, Markus Tatzgern2, Ana Stanescu1, Hideo Saito3, Denis Kalkofen1 •
Graz University of Technology1, University of Salzburg2, Keio University3
20 Oct 2020
TL;DR: A system for generating and visualizing interactive 3D Augmented Reality tutorials based on 2D video input, which allows viewpoint control at runtime and develops a presentation system that uses commonly available hardware to make the results accessible for home use.
Abstract: We present a system for generating and visualizing interactive 3D Augmented Reality tutorials based on 2D video input, which allows viewpoint control at runtime. Inspired by assembly planning, we analyze the input video using a 3D CAD model of the object to determine an assembly graph that encodes blocking relationships between parts. Using an assembly graph enables us to detect assembly steps that are otherwise difficult to extract from the video, and generally improves object detection and tracking by providing prior knowledge about movable parts. To avoid information loss, we combine the 3D animation with relevant parts of the 2D video so that we can show detailed manipulations and tool usage that cannot be easily extracted from the video. To further support user orientation, we visually align the 3D animation with the real-world object by using texture information from the input video. We developed a presentation system that uses commonly available hardware to make our results accessible for home use and demonstrate the effectiveness of our approach by comparing it to traditional video tutorials.

34 citations
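
The assembly graph described in the abstract above encodes which parts block which others. One simple use of such a graph is deriving an order in which parts can be added. The sketch below assumes the blocking relationships have already been reduced to "must be in place first" constraints and uses Python's standard `graphlib` module (Python 3.9+). The part names and the `assembly_order` helper are hypothetical, and the paper's own graph construction from CAD models and video is not reproduced here.

```python
from graphlib import TopologicalSorter

# Hypothetical parts; each key lists the parts that must be in place first.
must_precede = {
    "base": set(),
    "axle": {"base"},
    "wheel": {"axle"},
    "cover": {"axle", "wheel"},
}

def assembly_order(graph):
    """Return one assembly sequence that respects every precedence constraint."""
    return list(TopologicalSorter(graph).static_order())

order = assembly_order(must_precede)
print(order)  # ['base', 'axle', 'wheel', 'cover']
```

If the constraints were contradictory (two parts each requiring the other first), `static_order()` would raise `graphlib.CycleError`, which is a useful sanity check on an extracted graph.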

Proceedings Article•10.1145/3378184.3378228•
An RNN-Ensemble approach for Real Time Human Pose Estimation from Sparse IMUs


Deepak Nagaraj, Erik Schake, Patrick Leiner, Dirk Werth
7 Jan 2020
TL;DR: This article focuses on the development of a novel machine learning based proof of concept for real-time human pose estimation using data collected from a sparse inertial measurement unit (IMU) system that is cost-effective and minimally intrusive, within the scope of the skilled crafts domain.
Abstract: With recent advances in various hardware technologies, human motion capturing (MoCap) has gained importance in fields such as computer vision, computer animation, gesture recognition in gaming, and most importantly in bio-mechanical analysis. In this direction, human motion is being captured using various kinds of sensors. Correspondingly, many model-based and data-based techniques have been developed in order to decode sensor readings into information understandable by a person. Current technologies still lack applicability in real-world scenarios in terms of cost and ease of information gathering, which leaves substantial room for improvement. This article focuses on the development of a novel machine learning based proof of concept for real-time human pose estimation using data collected from a sparse inertial measurement unit (IMU) system that is cost-effective and minimally intrusive, within the scope of the skilled crafts domain. We discuss training diverse bi-directional recurrent neural networks (bi-RNN) with variable window sizes and building an ensemble of these models to estimate human pose, in terms of human joint angles, more accurately and robustly.

22 citations

Journal Article•10.1109/TVCG.2020.3025175•
On the Plausibility of Virtual Body Animation Features in Virtual Reality.


Henrique Galvan Debarba1, Sylvain Chagué, Caecilia Charbonnier•
IT University of Copenhagen1
18 Sep 2020-IEEE Transactions on Visualization and Computer Graphics
TL;DR: It was found that foot tracking, followed by mouth animation and finger tracking, were the features that added the most to the sense of control of a self-representing avatar.
Abstract: We present two experiments to assess the relative impact of different levels of body animation fidelity on plausibility illusion (Psi). The first experiment presents a virtual character that is not controlled by the user (n = 13) while the second experiment presents a user-controlled virtual avatar (n = 24, all male). Psi concerns how realistic and coherent the events in a virtual environment look and feel and is part of Slater's proposition of two orthogonal components of presence in virtual reality (VR). In the experiments, the face, hands, upper body and lower body of the character or self-avatar were manipulated to present different degrees of animation fidelity, such as no animation, procedural animation, and motion captured animation. Participants started the experiment experiencing the best animation configuration. Then, animation features were reduced to limit the amount of captured information made available to the system. Participants had to move from this basic animation configuration towards a more complete one, and declare when the avatar animation realism felt equivalent to the initial and most complete configuration, which could happen before all animation features were maxed out. Participants in the self-avatar experiment were also asked to rate how each animation feature affected their sense of control of the virtual body. We found that a virtual body with upper and lower body animated using eight tracked rigid bodies and inverse kinematics (IK) was often perceived as equivalent to a professional capture pipeline relying on 53 markers. Compared to what standard VR kits in the market are offering, i.e. a tracked headset and two hand controllers, we found that foot tracking, followed by mouth animation and finger tracking, were the features that added the most to the sense of control of a self-representing avatar. In addition, these features were often among the first to be improved in both experiments.
Journal Article•10.1088/1757-899X/917/1/012021•
A Review Survey on the Use Computer Animation in Education


Noor Rohana Mansor1, Rosdi Zakaria1, Roswati Abd. Rashid1, Raihan Mohd Arifin2, Baidruel Hairiel Abd Rahim, Ruslimi Zakaria2, Mohd Tajuddin Abd. Razak1 •
Universiti Malaysia Terengganu1, Universiti Sultan Zainal Abidin2
1 Sep 2020
TL;DR: In this article, the authors focused on illustrating potential applications of animations in language learning and education, identifying evidence-based principles for their design and use, and proposing possible research works.
Abstract: The use of computer animation in today's world is beneficial in many sectors. Animation development has changed over time and with usage context, and animations can illustrate phenomena and concepts that are difficult to understand. Animations may not always be useful, however, and teachers using animations need to understand their importance. This paper focuses on illustrating potential applications of animations in language learning and education, on identifying evidence-based principles for their design and use, and on proposing possible research works. Animation involves the use of compelling graphics, including images, audio and video in the form of technology. The use of animations or computers is not limited to the education sector alone; it extends to the economic, business and medical sectors. However, there is a great deal of debate about the effectiveness of computer animation in various fields. The general finding is that the role of computer animation is constructive, mostly in language learning.
Proceedings Article•10.1109/CVPRW50498.2020.00393•
Robust One Shot Audio to Video Generation


Neeraj Kumar, Srishti Goel, Ankur Narang, Mujtaba Hasan
14 Jun 2020
TL;DR: Qualitative evaluation and Online Turing tests demonstrate the efficacy of OneShotA2V, a novel approach to synthesize a talking person video of arbitrary length using as input: an audio signal and a single unseen image of a person.
Abstract: Audio to Video generation is an interesting problem that has numerous applications across industry verticals including film making, multi-media, marketing, education and others. High-quality video generation with expressive facial movements is a challenging problem that involves complex learning steps for generative adversarial networks. Further, enabling one-shot learning for an unseen single image increases the complexity of the problem while simultaneously making it more applicable to practical scenarios. In the paper, we propose a novel approach, OneShotA2V, to synthesize a talking person video of arbitrary length using as input an audio signal and a single unseen image of a person. OneShotA2V leverages curriculum learning to learn movements of expressive facial components and hence generates a high-quality talking head video of the given person. Further, it feeds the features generated from the audio input directly into a generative adversarial network, and it adapts to any given unseen selfie by applying few-shot learning with only a few output update epochs. OneShotA2V uses an architecture based on a multi-level generator with spatially adaptive normalization and multiple multi-level discriminators. The input audio clip is not restricted to any specific language, which gives the method multilingual applicability. Experimental evaluation demonstrates superior performance of OneShotA2V as compared to Realistic Speech-Driven Facial Animation with GANs (RSDGAN) [43], Speech2Vid [8], and other approaches, on multiple quantitative metrics including SSIM (structural similarity index), PSNR (peak signal to noise ratio) and CPBD (image sharpness). Further, qualitative evaluation and Online Turing tests demonstrate the efficacy of our approach.
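Of the metrics named in the abstract above, PSNR is the simplest to state: it is the log-scaled ratio of the squared peak pixel value to the mean squared error between a reference frame and a generated frame. The following sketch uses the standard definition for 8-bit images. The function name `psnr` and the `max_value` default are illustrative assumptions, not code from the paper.

```python
import numpy as np

def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in decibels between two same-sized frames."""
    ref = np.asarray(reference, dtype=np.float64)
    gen = np.asarray(generated, dtype=np.float64)
    mse = np.mean((ref - gen) ** 2)
    if mse == 0:
        return float("inf")  # identical frames: no noise at all
    return 10.0 * np.log10(max_value ** 2 / mse)

# Toy 4x4 grayscale frames that differ by one intensity level everywhere.
reference_frame = np.zeros((4, 4), dtype=np.uint8)
generated_frame = np.ones((4, 4), dtype=np.uint8)
print(round(psnr(reference_frame, generated_frame), 2))  # 48.13
```

Higher values mean the generated frame is closer to the reference. Converting to floating point before subtracting avoids unsigned-integer wraparound on 8-bit inputs.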
Journal Article•10.1016/J.JVCIR.2020.102812•
Generating video animation from single still image in social media based on intelligent computing


Tao Hu1, Tao Hu2, Chao Liang2, Geyong Min3, Keqin Li4, Chunxia Xiao2 •
Minzu University of China1, Wuhan University2, University of Exeter3, State University of New York System4
01 Aug 2020-Journal of Visual Communication and Image Representation
TL;DR: This paper presents an image-animating method that enhances a single still image in social media with realistic, animated motions without prior information, producing visually natural results while guaranteeing motion harmony between active objects and passive objects.
Journal Article•10.1016/J.VRIH.2020.07.002•
Virtual simulation experiment of the design and manufacture of a beer bottle-defect detection system.


Yuxiang Zhao1, Xiaowei An1, Nongliang Sun1•
Shandong University of Science and Technology1
1 Aug 2020
TL;DR: By mainly focusing on bottle-mouth defect detection, the detection system dedicates more attention to the user and the task, and will eventually yield much better results as a training tool for image-processing education.
Abstract: Background: Machine learning-based beer bottle-defect detection is a complex technology that runs automatically; however, it consumes considerable memory, is expensive, and poses a certain danger when training novice operators. Moreover, some topics are difficult to learn from experimental lectures, such as digital image processing and computer vision. However, virtual simulation experiments have been widely used to good effect within education. A virtual simulation of the design and manufacture of a beer bottle-defect detection system will not only help the students to increase their image-processing knowledge, but also improve their ability to solve complex engineering problems and design complex systems. Methods: The hardware models for the experiment (camera, light source, conveyor belt, power supply, manipulator, and computer) were built using the 3DS MAX modeling and animation software. The Unreal Engine 4 (UE4) game engine was utilized to build a virtual design room, design the interactive operations, and simulate the system operation. Results: The results showed that the virtual-simulation system received much better experimental feedback, which facilitated the design and manufacture of a beer bottle-defect detection system. The specialized functions of the functional modules in the detection system, including a basic experimental operation menu, power switch, image shooting, image processing, and manipulator grasping, allowed students (or virtual designers) to easily build a detection system by retrieving basic models from the model library, and creating the beer-bottle transportation, image shooting, image processing, defect detection, and defective-product removal. The virtual simulation experiment was completed with image processing as the main body. Conclusions: By mainly focusing on bottle-mouth defect detection, the detection system dedicates more attention to the user and the task. With more detailed tasks available, the virtual system will eventually yield much better results as a training tool for image-processing education. In addition, a novel visual perception-thinking pedagogical framework enables better comprehension than the traditional lecture-tutorial style.
Posted Content•
VideoForensicsHQ: Detecting High-quality Manipulated Face Videos


Gereon Fox1, Wentao Liu1, Hyeongwoo Kim1, Hans-Peter Seidel1, Mohamed Elgharib1, Christian Theobalt1 •
Max Planck Society1
20 May 2020-arXiv: Computer Vision and Pattern Recognition
TL;DR: This paper examines how the performance of forgery detectors depends on the presence of artefacts that the human eye can see and introduces a new family of detectors that examine combinations of spatial and temporal features and outperform existing approaches both in terms of detection accuracy and generalization.
Abstract: New approaches to synthesize and manipulate face videos at very high quality have paved the way for new applications in computer animation, virtual and augmented reality, or face video analysis. However, there are concerns that they may be used in a malicious way, e.g. to manipulate videos of public figures, politicians or reporters, to spread false information. The research community therefore developed techniques for automated detection of modified imagery, and assembled benchmark datasets showing manipulations by state-of-the-art techniques. In this paper, we contribute to this initiative in two ways: First, we present a new audio-visual benchmark dataset. It shows some of the highest quality visual manipulations available today. Human observers find them significantly harder to identify as forged than videos from other benchmarks. Furthermore, we propose a new family of deep-learning-based fake detectors, demonstrating that existing detectors are not well-suited for detecting fakes of a quality as high as presented in our dataset. Our detectors examine spatial and temporal features. This allows them to outperform existing approaches both in terms of high detection accuracy and generalization to unseen fake generation methods and unseen identities.
Journal Article•10.1016/J.TIBS.2020.04.009•
Using 3D Animation to Visualize Hypotheses


Shraddha Nayak1, Hui Liu1, Grace I-Hsuan Hsu1, Janet Iwasa1•
University of Utah1
15 May 2020-Trends in Biochemical Sciences
Journal Article•10.1111/CGF.14110•
Higher-order time integration for deformable solids


Fabian Löschner1, Andreas Longva1, Stefan Rhys Jeske1, Tassilo Kugelstadt1, Jan Bender1 •
RWTH Aachen University1
6 Oct 2020
TL;DR: An implicit linearized contact model is derived based on a predictor-corrector approach that leads to consistent behavior with higher-order integrators as predictors and is well suited for the simulation of stiff, nonlinear materials with the integration methods presented in this paper.
Abstract: Visually appealing and vivid simulations of deformable solids represent an important aspect of physically based computer animation. For the temporal discretization, it is customary in computer animation to use first-order accurate integration methods, such as Backward Euler, due to their simplicity and robustness. Although there is notable research on second-order methods, their use is not widespread. Many of these well-known methods have significant drawbacks such as severe numerical damping or scene-dependent time step restrictions to ensure stability. In this paper, we discuss the most relevant requirements on such methods in computer animation and motivate the interest beyond first-order accuracy. Keeping these requirements in mind, we investigate several promising methods from the families of diagonally implicit Runge-Kutta (DIRK) and Rosenbrock methods which currently do not appear to have considerable popularity in this field. We show that the usage of such methods improves the visual quality of physical animations. In addition, we demonstrate that they allow distinctly more control over damping at lower computational cost than classical methods. As part of our theoretical contribution, we review aspects of simulations that are often considered more intricate with higher-order methods, such as contact handling. To this end, we derive an implicit linearized contact model based on a predictor-corrector approach that leads to consistent behavior with higher-order integrators as predictors. Our contact model is well suited for the simulation of stiff, nonlinear materials with the integration methods presented in this paper and more common methods such as Backward Euler alike.
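The numerical damping that the abstract above attributes to first-order methods is easy to observe on an undamped spring. The sketch below compares Backward Euler with the implicit midpoint rule, a simple second-order implicit scheme chosen here purely for illustration; it is not claimed to be one of the DIRK or Rosenbrock methods the paper evaluates. The spring constants, step size and helper names are assumptions.

```python
import numpy as np

def simulate(step_matrix, x0, steps):
    """Advance a linear system by repeatedly applying a one-step update matrix."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = step_matrix @ x
    return x

def energy(x, k=1.0, m=1.0):
    """Total energy of a mass-spring state (position, velocity)."""
    position, velocity = x
    return 0.5 * k * position**2 + 0.5 * m * velocity**2

k, m, h = 1.0, 1.0, 0.1
A = np.array([[0.0, 1.0], [-k / m, 0.0]])  # undamped spring: x' = v, v' = -(k/m) x
I = np.eye(2)

# One-step update matrices for the linear system x' = A x.
backward_euler = np.linalg.inv(I - h * A)
implicit_midpoint = np.linalg.inv(I - 0.5 * h * A) @ (I + 0.5 * h * A)

x0 = [1.0, 0.0]
e0 = energy(x0)
e_be = energy(simulate(backward_euler, x0, 200))
e_mid = energy(simulate(implicit_midpoint, x0, 200))
print(f"initial {e0:.3f}, backward Euler {e_be:.3f}, implicit midpoint {e_mid:.3f}")
```

Although the physical system has no damping, Backward Euler loses most of its energy over these 200 steps, while the implicit midpoint rule preserves it for this linear problem. This is the kind of artificial dissipation that motivates looking beyond first-order accuracy.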
Proceedings Article•10.1109/QOMEX48832.2020.9123135•
Synthetic Thermal Image Generation for Human-Machine Interaction in Vehicles


R. Blythman, Amr Elrasad, Eoin O'Connell, Paul Kielty1, Michael O'Byrne, Mohamed Moustafa1, Cian Ryan, Joseph Lemley •
National University of Ireland, Galway1
26 May 2020
TL;DR: A pipeline for creating a synthetic thermal image dataset is developed and the effectiveness of the approach is evaluated using a number of deep learning algorithms that may enable human-machine interaction such as head pose estimation and face detection.
Abstract: Thermal infrared imaging holds promise for human-machine interaction in vehicles owing to superior performance in low-light and low-visibility conditions, and the potential for monitoring human psycho-physiological state. However, the shortage of large-scale 2D thermal image datasets and public benchmarks has hindered progress of deep-learning-based solutions. To tackle this problem, we develop a pipeline for creating a synthetic thermal image dataset. Firstly, 3D models of human heads are generated from uncalibrated TIR images (without additional visible or depth images) using photogrammetry techniques. A synthetic dataset of 100k images of 640×480 resolution is then generated by rendering each of the five 3D models for a range of head poses, camera positions and backgrounds using commercial animation software. The effectiveness of the approach is evaluated using a number of deep learning algorithms that may enable human-machine interaction such as head pose estimation and face detection. The neural networks are trained on the new synthetic thermal dataset, before fine tuning on real world data where possible.
Journal Article•10.1049/IET-CVI.2019.0786•
Going beyond Free Viewpoint: Creating Animatable Volumetric Video of Human Performances


Anna Hilsmann, Philipp Fechteler, Wieland Morgenstern, Wolfgang Paier, Ingo Feldmann, Oliver Schreer, Peter Eisert1 •
Fraunhofer Society1
02 Sep 2020-arXiv: Computer Vision and Pattern Recognition
TL;DR: In this article, a hybrid geometry- and video-based animation approach is proposed to combine the flexibility of classical CG animation with the realism of real captured data, where coarse movements and poses are modeled in the geometry only, while very fine and subtle details in the face, often lacking in purely geometric methods, are captured in video-based textures.
Abstract: In this paper, we present an end-to-end pipeline for the creation of high-quality animatable volumetric video content of human performances. Going beyond the application of free-viewpoint volumetric video, we allow re-animation and alteration of an actor's performance through (i) the enrichment of the captured data with semantics and animation properties and (ii) applying hybrid geometry- and video-based animation methods that allow a direct animation of the high-quality data itself instead of creating an animatable model that resembles the captured data. Semantic enrichment and geometric animation ability are achieved by establishing temporal consistency in the 3D data, followed by an automatic rigging of each frame using a parametric shape-adaptive full human body model. Our hybrid geometry- and video-based animation approaches combine the flexibility of classical CG animation with the realism of real captured data. For pose editing, we exploit the captured data as much as possible and kinematically deform the captured frames to fit a desired pose. Further, we treat the face differently from the body in a hybrid geometry- and video-based animation approach where coarse movements and poses are modeled in the geometry only, while very fine and subtle details in the face, often lacking in purely geometric methods, are captured in video-based textures. These are processed to be interactively combined to form new facial expressions. On top of that, we learn the appearance of regions that are challenging to synthesize, such as the teeth or the eyes, and fill in missing regions realistically in an autoencoder-based approach. This paper covers the full pipeline, from capturing and producing high-quality video content, through the enrichment with semantics and deformation properties for re-animation, to the processing of the data for the final hybrid animation.
Journal Article•10.1109/TCST.2019.2935063•
Visual Surveillance of Human Activities via Gradient-Based Coverage Control on Matrix Manifolds

[...]

Takeshi Hatanaka1, Riku Funada2, Masayuki Fujita2•
Osaka University1, Tokyo Institute of Technology2
01 Nov 2020-IEEE Transactions on Control Systems and Technology
TL;DR: This article develops and demonstrates the complete algorithm including the gradient descent algorithm, the density estimation algorithm, image acquisition/processing and physical motion on a simulator built on a 3-D animation software and an experimental testbed.
Abstract: In this article, we address visual surveillance of human activities for a network of cameras with controllable orientations based on gradient-based coverage control techniques. We first formulate the problem as an optimization problem on the matrix manifold $SO(3)$ and then derive the gradient for the cost function using a density function defined on the image plane of each camera. We then develop a real-time density estimation algorithm using computer vision techniques including a real-time pedestrian detection algorithm and examine its real-time feasibility through simulation. We finally demonstrate the complete algorithm including the gradient descent algorithm, the density estimation algorithm, image acquisition/processing and physical motion on a simulator built on a 3-D animation software and an experimental testbed.
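To make the idea of gradient descent on SO(3) in the abstract above concrete, here is a minimal sketch under strong assumptions. The paper derives its gradient from a density function on each camera's image plane; this sketch swaps in a toy cost, f(R) = -d · (R e3), which only rewards pointing a camera's optical axis R e3 at a fixed unit direction d. The names `align_axis` and `exp_so3`, the step size, and the iteration count are hypothetical.

```python
import math

E3 = [0.0, 0.0, 1.0]  # optical axis in the camera frame (assumption)

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

def exp_so3(w):
    """Rodrigues' formula: rotation matrix exp([w]x) for a rotation vector w."""
    theta = math.sqrt(sum(c * c for c in w))
    K = [[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]]
    K2 = matmul(K, K)
    if theta < 1e-12:
        a, b = 1.0, 0.5  # small-angle limits of sin(t)/t and (1-cos(t))/t^2
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / (theta * theta)
    return [[(1.0 if i == j else 0.0) + a * K[i][j] + b * K2[i][j]
             for j in range(3)] for i in range(3)]

def align_axis(d, step=0.5, iters=200):
    """Descend f(R) = -d . (R e3) on SO(3), starting from the identity."""
    R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for _ in range(iters):
        b = matvec(transpose(R), d)            # target direction in camera frame
        w = [step * c for c in cross(E3, b)]   # negative gradient, body-frame coordinates
        R = matmul(R, exp_so3(w))              # retract so R stays a rotation
    return R

if __name__ == "__main__":
    R = align_axis([1.0, 0.0, 0.0])
    print(matvec(R, E3))  # optical axis after descent
```

Each update multiplies by an exact rotation, so the iterate never leaves SO(3) and no re-orthogonalization step is needed. The toy cost has a stationary point when the axis points exactly opposite to d, so a practical version would need to handle that starting configuration separately.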
Journal Article•10.37200/IJPR/V24I4/PR201075•
Scientific and methodological bases of development of creative activity of students in drawing on the basis of computer animation models

[...]

Mamurova Dilfuza Islomovna, Shukurov Avaz Ruziboevich
28 Feb 2020
Journal Article•10.1155/2020/8857748•
Capture of 3D Human Motion Pose in Virtual Reality Based on Video Recognition

[...]

Qiang Fu, Xingui Zhang1, Jinxiu Xu2, Haimin Zhang•
Tsinghua University1, Xinxiang Medical University2
20 Nov 2020-Complexity
TL;DR: This paper proposes an attitude estimation algorithm adapted for embedded use, with adaptive adjustment based on fuzzy logic, and designs a two-class classification model and experiments on daily human behaviors for human motion recognition.
Abstract: Motion pose capture technology can effectively solve the problem of difficulty in defining character motion in the process of 3D animation production and greatly reduce the workload of character motion control, thereby improving the efficiency of animation development and the fidelity of character motion. Motion gesture capture technology is widely used in virtual reality systems, virtual training grounds, and real-time tracking of the motion trajectories of general objects. This paper proposes an attitude estimation algorithm adapted for embedded use. The previous centralized Kalman filter is split into a two-step Kalman filter. The sensors are processed separately according to their different characteristics, which isolates the cross-influence between sensors. An adaptive adjustment method based on fuzzy logic is proposed. The acceleration, angular velocity, and geomagnetic field strength of the environment are used as the input of fuzzy logic to judge the motion state of the carrier and then adjust the covariance matrix of the filter. The adaptive adjustment of the sensor is converted to the recognition of the motion state. For the study of human motion posture capture, this paper designs a verification experiment based on the existing robotic arm in the laboratory. The experiment shows that the studied motion posture capture method has better performance. A human body motion gesture capture experiment is also designed, and the capture results show that the obtained pose angle information can better restore the human body motion. A visual model of human motion posture capture was established, and after comparing and analyzing with the real situation, it was found that the simulation approach reproduced the motion process of human motion well. For the research of human motion recognition, this paper designs a two-class classification model and experiments based on daily human behaviors. Experiments show that two-class human motion gesture capture and recognition achieves good accuracy. SVC performs excellently on the two-class recognition task. In the case of using all optimization algorithms, the accuracy rate is higher than 90%, and the final recognition accuracy rate is also higher than 90%. In terms of recognition time, the time required for human motion gesture capture and recognition is less than 2 s.
Journal Article•10.1088/1757-899X/879/1/012147•
Use of 3D Animation Software in Visualizing Architectural Works

[...]

W. Hadiyatna, Andi Harapan S
1 Jul 2020
Journal Article•10.4018/IJTHI.2020100103•
A Natural User Interface for 3D Animation Using Kinect

[...]

Naveed Ahmed1, Hind Kharoub1, Selma Medjden1, Areej Alsaafin1•
University of Sharjah1
01 Oct 2020-International Journal of Technology and Human Interaction
TL;DR: A gesture-based natural user interface is a preferred method to control a 3D animation compared to a cursor-based interface and not only proved to be more efficient but resulted in a more engaging and enjoyable user experience.
Abstract: This article presents a new natural user interface to control and manipulate a 3D animation using the Kinect. The researchers design a number of gestures that allow the user to play, pause, forward, rewind, scale, and rotate the 3D animation. They also implement a traditional cursor-based interface and compare it with the natural user interface. Both interfaces are extensively evaluated via a user study in terms of both usability and user experience. Through both quantitative and qualitative evaluation, they show that a gesture-based natural user interface is a preferred method to control a 3D animation compared to a cursor-based interface. The natural user interface not only proved to be more efficient but resulted in a more engaging and enjoyable user experience.
Journal Article•10.3390/MTI4030037•
Examining Computer–Supported 3D Event Recreation for Enhancing Cognitive Load, Memorability, and Engagement

[...]

Ruochen Cao, James A. Walsh, Andrew Cunningham, Mark Kohler, Ross T. Smith, Bruce H. Thomas 
6 Jul 2020
TL;DR: This paper explores the use of 3D model and animation techniques, combined with narrative techniques, for recreating event-based information to aid understanding and indicates that both forms of 3D graphical techniques positively supported users in terms of cognitive load, recall, and engagement over reading text.
Abstract: Three-dimensional rendering technologies have long been utilized for explanatory purposes in scientific visualization and related areas. Their applications to wider fields, however, have often been limited. In this paper, we explore the use of 3D model and animation techniques, combined with narrative techniques, for recreating event-based information to aid understanding. An empirical experiment was conducted which examined the effectiveness of 3D model images and 3D animation videos compared to reading narratives in textual form. The results indicated that both forms of 3D graphical techniques positively supported users in terms of cognitive load, recall, and engagement over reading text.
Journal Article•10.1080/15551393.2020.1732218•
Analysis of Facial Feature Design for 3D Animation Characters

[...]

Kuan Lin Chen1, I. Ping Chen, Chi Min Hsieh•
National Chiao Tung University1
02 Apr 2020-Visual Communication Quarterly
TL;DR: Using multidimensional scaling, this study analyzed the differences in the facial features of 332 three-dimensional characters from the top-100 grossing animation films at the international box office and indicated that face aspect ratio, nose length, and distance from mouth corner to face edge are three crucial features in character design.
Abstract: The facial features of animation characters convey information about the characters to audiences and make the characters more believable. Using multidimensional scaling, this study analyzed the dif...
Journal Article•10.1137/19M1241581•
Automatically Controlled Morphing of 2D Shapes with Textures

[...]

Alexander Tereshin, Valery Adzhiev, Oleg Fryazinov, Felix Marrington-Reeve, Alexander Pasko 
04 Feb 2020-Siam Journal on Imaging Sciences
TL;DR: This paper deals with 2D image transformations from a perspective of a 3D heterogeneous shape modeling and computer animation.
Abstract: This paper deals with 2D image transformations from a perspective of a 3D heterogeneous shape modeling and computer animation. Shape and image morphing techniques have attracted a lot of attention ...
Journal Article•10.13052/JWE1540-9589.195612•
A Method of Stereoscopic Display for Dynamic 3D Graphics on Android Platform

[...]

Shihong Chen1, Zi Jiu2•
Beijing Union University1, Communication University of China2
14 Dec 2020-Journal of Web Engineering
TL;DR: A method to rapidly convert 3D dynamic graphics produced by 3D animation software into stereoscopic display suitable for the Android platform is presented, with details of an algorithm to generate double-viewpoint image sequences from single-viewpoint 3D dynamic graphics, and a method for compositing stereoscopic display from double-viewpoint image sequences.
Abstract: With the widespread use of smart terminals, the convenient use of stereoscopic video display on mobile platforms is urgently needed by more and more people. This study presents a method to rapidly convert 3D dynamic graphics produced by 3D animation software into stereoscopic display suitable for the Android platform, with details of an algorithm to generate double-viewpoint image sequences from single-viewpoint 3D dynamic graphics, and a method for compositing stereoscopic display from double-viewpoint image sequences. It develops a program on the basis of popular animation software to implement this method for realizing automatic generation of dynamic 3D graphics and for outputting composite images that conform to the binocular characteristics of stereoscopic displays. As shown by experiments, the methods presented by this study produce better results at a faster speed and provide stronger support for the production of high-quality stereo videos.
Journal Article•10.1109/MCG.2020.2993345•
Do Virtual Humans Dream of Digital Sheep

[...]

Marcello Carrozzino1, Riccardo Galdieri1, Octavian Mihai Machidon2, Massimo Bergamasco1, Mike Potel •
Sant'Anna School of Advanced Studies1, Transilvania University of Brașov2
01 Jul 2020-IEEE Computer Graphics and Applications
TL;DR: A survey on the role of digital human-like characters in virtual worlds, both as counterparts of real human users and as embodied agents driven by artificial intelligence is presented.
Abstract: As human beings, we are so used to interacting with each other that any world without humans would feel alien to us, including digital ones. In this article, we present a survey on the role of digital human-like characters in virtual worlds, both as counterparts of real human users and as embodied agents driven by artificial intelligence. The main issues related to 3-D graphics, physics, animation, and behavioral modeling are introduced, suggesting wherever available different alternatives and related development pipelines. A sizeable list of examples illustrating the use of virtual humans in different application sectors is then presented, focusing in particular on four domains: environmental design, training, cultural heritage, and healthcare.
...
