TL;DR: In this paper, a convolutional network is trained on renderings of synthetic 3D models of cars and chairs to predict an RGB image and a depth map of the object as seen from an arbitrary view.
Abstract: We present a convolutional network capable of inferring a 3D representation of a previously unseen object given a single image of this object. Concretely, the network can predict an RGB image and a depth map of the object as seen from an arbitrary view. Several of these depth maps fused together give a full point cloud of the object. The point cloud can in turn be transformed into a surface mesh. The network is trained on renderings of synthetic 3D models of cars and chairs. It successfully deals with objects on cluttered background and generates reasonable predictions for real images of cars.
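The fusion step mentioned in this abstract (several predicted depth maps combined into one point cloud) can be illustrated with a minimal sketch. It assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy` and a known camera-to-world rotation and translation per view; these names and the pinhole model are assumptions for illustration, and the paper's actual fusion procedure may differ.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def backproject(depth: Sequence[Sequence[float]],
                fx: float, fy: float, cx: float, cy: float) -> List[Point]:
    """Turn a depth map (rows of z values) into camera-space 3D points.

    Pixels with non-positive depth are treated as background and skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            x = (u - cx) * z / fx  # pinhole model: u = fx * x / z + cx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def to_world(points: Sequence[Point],
             rotation: Sequence[Sequence[float]],
             translation: Sequence[float]) -> List[Point]:
    """Apply a rigid camera-to-world transform: p_world = R * p_cam + t."""
    out: List[Point] = []
    for p in points:
        out.append(tuple(
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        ))
    return out


def fuse(views) -> List[Point]:
    """Concatenate the world-space points of several (depth, intrinsics, R, t) views."""
    cloud: List[Point] = []
    for depth, intrinsics, rotation, translation in views:
        cloud.extend(to_world(backproject(depth, *intrinsics), rotation, translation))
    return cloud
```

A surface mesh would then be extracted from the fused cloud by a separate reconstruction step, which this sketch does not cover.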
TL;DR: A practical foveated rendering system that reduces the number of shades by up to 70% and allows coarsened shading up to 30° closer to the fovea than Guenter et al.
Abstract: Foveated rendering synthesizes images with progressively less detail outside the eye fixation region, potentially unlocking significant speedups for wide field-of-view displays, such as head mounted displays, where target framerate and resolution are increasing faster than the performance of traditional real-time renderers. To study and improve potential gains, we designed a foveated rendering user study to evaluate the perceptual abilities of human peripheral vision when viewing today's displays. We determined that filtering peripheral regions reduces contrast, inducing a sense of tunnel vision. When applying a postprocess contrast enhancement, subjects tolerated up to 2× larger blur radius before detecting differences from a non-foveated ground truth. After verifying these insights on both desktop and head mounted displays augmented with high-speed gaze-tracking, we designed a perceptual target image to strive for when engineering a production foveated renderer. Given our perceptual target, we designed a practical foveated rendering system that reduces the number of shades by up to 70% and allows coarsened shading up to 30° closer to the fovea than Guenter et al. [2012] without introducing perceivable aliasing or blur. We filter both pre- and post-shading to address aliasing from undersampling in the periphery, introduce a novel multiresolution- and saccade-aware temporal antialiasing algorithm, and use contrast enhancement to help recover peripheral details that are resolvable by our eye but degraded by filtering. We validate our system by performing another user study. Frequency analysis shows our system closely matches our perceptual target. Measurements of temporal stability show we obtain quality similar to temporally filtered non-foveated renderings.
TL;DR: In this paper, a collision avoidance policy is represented by a deep convolutional neural network that directly processes raw monocular images and outputs velocity commands, with a Monte Carlo policy evaluation algorithm that directly optimizes the network's ability to produce collision free flight.
Abstract: Deep reinforcement learning has emerged as a promising and powerful technique for automatically acquiring control policies that can process raw sensory inputs, such as images, and perform complex behaviors. However, extending deep RL to real-world robotic tasks has proven challenging, particularly in safety-critical domains such as autonomous flight, where a trial-and-error learning process is often impractical. In this paper, we explore the following question: can we train vision-based navigation policies entirely in simulation, and then transfer them into the real world to achieve real-world flight without a single real training image? We propose a learning method that we call CAD$^2$RL, which can be used to perform collision-free indoor flight in the real world while being trained entirely on 3D CAD models. Our method uses single RGB images from a monocular camera, without needing to explicitly reconstruct the 3D geometry of the environment or perform explicit motion planning. Our learned collision avoidance policy is represented by a deep convolutional neural network that directly processes raw monocular images and outputs velocity commands. This policy is trained entirely on simulated images, with a Monte Carlo policy evaluation algorithm that directly optimizes the network's ability to produce collision-free flight. By highly randomizing the rendering settings for our simulated training set, we show that we can train a policy that generalizes to the real world, without requiring the simulator to be particularly realistic or high-fidelity. We evaluate our method by flying a real quadrotor through indoor environments, and further evaluate the design choices in our simulator through a series of ablation studies on depth prediction. For supplementary video see: this https URL
TL;DR: The UnityEyes synthesis framework combines a novel generative 3D model of the human eye region with a real-time rendering framework and shows that these synthesized images can be used to estimate gaze in difficult in-the-wild scenarios, even for extreme gaze angles.
Abstract: Learning-based methods for appearance-based gaze estimation achieve state-of-the-art performance in challenging real-world settings but require large amounts of labelled training data. Learning-by-synthesis was proposed as a promising solution to this problem but current methods are limited with respect to speed, appearance variability, and the head pose and gaze angle distribution they can synthesize. We present UnityEyes, a novel method to rapidly synthesize large amounts of variable eye region images as training data. Our method combines a novel generative 3D model of the human eye region with a real-time rendering framework. The model is based on high-resolution 3D face scans and uses real-time approximations for complex eyeball materials and structures as well as anatomically inspired procedural geometry methods for eyelid animation. We show that these synthesized images can be used to estimate gaze in difficult in-the-wild scenarios, even for extreme gaze angles or in cases in which the pupil is fully occluded. We also demonstrate competitive gaze estimation results on a benchmark in-the-wild dataset, despite only using a light-weight nearest-neighbor algorithm. We are making our UnityEyes synthesis framework available online for the benefit of the research community.
TL;DR: In this paper, a hierarchical sub-band transform is proposed for intra-frame color encoding of point clouds for real-time 3D video. Results show that the proposed solution performs comparably with the current state-of-the-art, on many occasions outperforming it, while being much more computationally efficient.
Abstract: In free-viewpoint video, there is a recent trend to represent scene objects as solids rather than using multiple depth maps. Point clouds have been used in computer graphics for a long time, and with the recent possibility of real-time capturing and rendering, point clouds have been favored over meshes in order to save computation. Each point in the cloud is associated with its 3D position and its color. We devise a method to compress the colors in point clouds, which is based on a hierarchical transform and arithmetic coding. The transform is a hierarchical sub-band transform that resembles an adaptive variation of a Haar wavelet. The arithmetic encoding of the coefficients assumes Laplace distributions, one per sub-band. The Laplace parameter for each distribution is transmitted to the decoder using a custom method. The geometry of the point cloud is encoded using the well-established octree scanning. Results show that the proposed solution performs comparably with the current state-of-the-art, on many occasions outperforming it, while being much more computationally efficient. We believe this paper represents the state of the art in intra-frame compression of point clouds for real-time 3D video.
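To make the idea of a hierarchical sub-band transform with one Laplace model per sub-band concrete, here is a minimal sketch. It uses the plain, non-adaptive orthonormal Haar transform on a 1D list of color values as a stand-in; the paper's transform is an adaptive variation that this sketch does not reproduce. The function names are hypothetical, and the Laplace scale is estimated with the standard zero-mean maximum-likelihood estimate (mean absolute value), not the paper's custom parameter-transmission method.

```python
import math
from typing import List, Sequence, Tuple


def haar_step(values: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One orthonormal Haar level: pairwise scaled sums (low) and differences (high)."""
    if len(values) % 2:
        raise ValueError("length must be even")
    s = 1.0 / math.sqrt(2.0)
    pairs = list(zip(values[0::2], values[1::2]))
    low = [(a + b) * s for a, b in pairs]
    high = [(a - b) * s for a, b in pairs]
    return low, high


def haar_decompose(values: Sequence[float]) -> Tuple[float, List[List[float]]]:
    """Recursively split the low band; returns (DC coefficient, sub-bands, finest first)."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bands: List[List[float]] = []
    low = list(values)
    while len(low) > 1:
        low, high = haar_step(low)
        bands.append(high)
    return low[0], bands


def laplace_scale(coeffs: Sequence[float]) -> float:
    """Maximum-likelihood scale of a zero-mean Laplace distribution: mean of |c|."""
    return sum(abs(c) for c in coeffs) / len(coeffs)


dc, bands = haar_decompose([4.0, 4.0, 2.0, 0.0])
scales = [laplace_scale(band) for band in bands]  # one scale per sub-band
```

An arithmetic coder would then use each sub-band's scale to model the probabilities of that sub-band's quantized coefficients.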
TL;DR: It is found that haptic feedback significantly increases the accuracy of VR interaction, most effectively by rendering high-fidelity shape output as in the case of mechanically-actuated hand-held controllers.
Abstract: We present an investigation of mechanically-actuated hand-held controllers that render the shape of virtual objects through physical shape displacement, enabling users to feel 3D surfaces, textures, and forces that match the visual rendering. We demonstrate two such controllers, NormalTouch and TextureTouch, which are tracked in 3D and produce spatially-registered haptic feedback to a user's finger. NormalTouch haptically renders object surfaces and provides force feedback using a tiltable and extrudable platform. TextureTouch renders the shape of virtual objects including detailed surface structure through a 4×4 matrix of actuated pins. By moving our controllers around while keeping their finger on the actuated platform, users obtain the impression of a much larger 3D shape by cognitively integrating output sensations over time. Our evaluation compares the effectiveness of our controllers with the two de-facto standards in Virtual Reality controllers: device vibration and visual feedback only. We find that haptic feedback significantly increases the accuracy of VR interaction, most effectively by rendering high-fidelity shape output as in the case of our controllers.
TL;DR: Photorealistic rendering of real world environments is important in a range of different areas; including Visual Special effects, Interior/Exterior Modelling, Architectural Modelling, Cultural Heritage, Computer Games and Automotive Design.
Abstract: Photorealistic rendering of real world environments is important in a range of different areas; including Visual Special effects, Interior/Exterior Modelling, Architectural Modelling, Cultural Heritage, Computer Games and Automotive Design.
Currently, rendering systems are able to produce photorealistic simulations of the appearance of many real-world materials. In the real world, viewer perception of objects depends on the lighting and object/material/surface characteristics, the way a surface interacts with the light and on how the light is reflected, scattered, absorbed by the surface and the impact these characteristics have on material appearance. In order to reproduce this, it is necessary to understand how materials interact with light. Thus the representation and acquisition of material models has become an active research area.
This survey of the state-of-the-art of BRDF Representation and Acquisition presents an overview of BRDF (Bidirectional Reflectance Distribution Function) models used to represent surface/material reflection characteristics, and describes current acquisition methods for the capture and rendering of photorealistic materials.
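For readers new to the term, the BRDF is usually defined radiometrically as the ratio of differential outgoing radiance to differential incoming irradiance. The block below states that standard textbook definition and the resulting reflection integral; the symbols ($f_r$, $\omega_i$, $\omega_o$, $\theta_i$, $\Omega$) are conventional notation assumed here, not taken from the survey itself.

```latex
f_r(\omega_i, \omega_o)
  = \frac{\mathrm{d}L_o(\omega_o)}{\mathrm{d}E_i(\omega_i)}
  = \frac{\mathrm{d}L_o(\omega_o)}{L_i(\omega_i)\,\cos\theta_i\,\mathrm{d}\omega_i},
\qquad
L_o(\omega_o) = \int_{\Omega} f_r(\omega_i, \omega_o)\, L_i(\omega_i)\, \cos\theta_i \,\mathrm{d}\omega_i
```

Here $\omega_i$ and $\omega_o$ are the incoming and outgoing directions, $\theta_i$ is the angle between $\omega_i$ and the surface normal, and $\Omega$ is the hemisphere above the surface.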
TL;DR: The proposed human pose representation model is able to generalize to real depth images of unseen poses without the need for re-training or fine-tuning and dramatically outperforms existing state-of-the-art in action recognition.
Abstract: We propose a human pose representation model that transfers human poses acquired from different unknown views to a view-invariant high-level space. The model is a deep convolutional neural network and requires a large corpus of multiview training data which is very expensive to acquire. Therefore, we propose a method to generate this data by fitting synthetic 3D human models to real motion capture data and rendering the human poses from numerous viewpoints. While learning the CNN model, we do not use action labels but only the pose labels after clustering all training poses into k clusters. The proposed model is able to generalize to real depth images of unseen poses without the need for re-training or fine-tuning. Real depth videos are passed through the model frame-wise to extract view-invariant features. For spatio-temporal representation, we propose group sparse Fourier Temporal Pyramid which robustly encodes the action specific most discriminative output features of the proposed human pose model. Experiments on two multiview and three single-view benchmark datasets show that the proposed method dramatically outperforms existing state-of-the-art in action recognition.
TL;DR: In this paper, the layout, rendering, and interaction methods for visualizing graphs in an immersive environment are presented, and a user study is conducted to evaluate their techniques compared to traditional 2D graph visualization.
Abstract: Information visualization has traditionally limited itself to 2D representations, primarily due to the prevalence of 2D displays and report formats. However, there has been a recent surge in popularity of consumer grade 3D displays and immersive head-mounted displays (HMDs). The ubiquity of such displays enables the possibility of immersive, stereoscopic visualization environments. While techniques that utilize such immersive environments have been explored extensively for spatial and scientific visualizations, contrastingly very little has been explored for information visualization. In this paper, we present our considerations of layout, rendering, and interaction methods for visualizing graphs in an immersive environment. We conducted a user study to evaluate our techniques compared to traditional 2D graph visualization. The results show that participants answered significantly faster with fewer interactions using our techniques, especially for more difficult tasks. While the overall correctness rates are not significantly different, we found that participants gave significantly more correct answers using our techniques for larger graphs.
TL;DR: This work uses a machine learning approach to solve the inverse problem of finding the procedural model that best explains a user sketch, and integrates its algorithm in a coarse-to-fine urban modeling system that allows users to create rich buildings by successively sketching the building mass, roof, facades, windows, and ornaments.
Abstract: 3D modeling remains a notoriously difficult task for novices despite significant research effort to provide intuitive and automated systems. We tackle this problem by combining the strengths of two popular domains: sketch-based modeling and procedural modeling. On the one hand, sketch-based modeling exploits our ability to draw but requires detailed, unambiguous drawings to achieve complex models. On the other hand, procedural modeling automates the creation of precise and detailed geometry but requires the tedious definition and parameterization of procedural models. Our system uses a collection of simple procedural grammars, called snippets, as building blocks to turn sketches into realistic 3D models. We use a machine learning approach to solve the inverse problem of finding the procedural model that best explains a user sketch. We use non-photorealistic rendering to generate artificial data for training convolutional neural networks capable of quickly recognizing the procedural rule intended by a sketch and estimating its parameters. We integrate our algorithm in a coarse-to-fine urban modeling system that allows users to create rich buildings by successively sketching the building mass, roof, facades, windows, and ornaments. A user study shows that by using our approach non-expert users can generate complex buildings in just a few minutes.
TL;DR: By using passive metamaterials as subwavelength pixels, holographic rendering can be achieved without cumbersome circuitry and with only a single transducer, thus significantly reducing system complexity.
Abstract: Acoustic holographic rendering, in complete analogy with optical holography, is useful for various applications, ranging from multi-focal lensing and multiplexed sensing to synthesizing three-dimensional complex sound fields. Conventional approaches rely on a large number of active transducers and phase shifting circuits. In this paper we show that by using passive metamaterials as subwavelength pixels, holographic rendering can be achieved without cumbersome circuitry and with only a single transducer, thus significantly reducing system complexity. Such metamaterial-based holograms can serve as versatile platforms for various advanced acoustic wave manipulation and signal modulation, leading to new possibilities in acoustic sensing, energy deposition and medical diagnostic imaging.
TL;DR: The purpose of this state‐of‐the‐art report (STAR) is to provide an overview of research into the various aspects of TFs, which lead to interpretation of the underlying data through the use of meaningful visual representations.
Abstract: A central topic in scientific visualization is the transfer function (TF) for volume rendering. The TF serves a fundamental role in translating scalar and multivariate data into color and opacity to express and reveal the relevant features present in the data studied. Beyond this core functionality, TFs also serve as a tool for encoding and utilizing domain knowledge and as an expression for visual design of material appearances. TFs also enable interactive volumetric exploration of complex data. The purpose of this state-of-the-art report (STAR) is to provide an overview of research into the various aspects of TFs, which lead to interpretation of the underlying data through the use of meaningful visual representations. The STAR classifies TF research into the following aspects: dimensionality, derived attributes, aggregated attributes, rendering aspects, automation, and user interfaces. The STAR concludes with some interesting research challenges that form the basis of an agenda for the development of next generation TF tools and methodologies.
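The core role of a transfer function described in this abstract, translating scalar data into color and opacity, can be sketched as a one-dimensional piecewise-linear lookup. This is only an illustrative example of the simplest 1D case: the function name `make_transfer_function` and the control-point format (scalar value paired with an RGBA tuple) are assumptions, and the multidimensional and derived-attribute TFs the report surveys are not covered.

```python
from bisect import bisect_right
from typing import Callable, Sequence, Tuple

RGBA = Tuple[float, float, float, float]


def make_transfer_function(
    control_points: Sequence[Tuple[float, RGBA]]
) -> Callable[[float], RGBA]:
    """Build a 1D transfer function that linearly interpolates RGBA between control points.

    Values outside the control-point range are clamped to the nearest end point.
    """
    pts = sorted(control_points)
    xs = [x for x, _ in pts]

    def tf(value: float) -> RGBA:
        if value <= xs[0]:
            return pts[0][1]
        if value >= xs[-1]:
            return pts[-1][1]
        i = bisect_right(xs, value)  # first control point strictly above value
        x0, c0 = pts[i - 1]
        x1, c1 = pts[i]
        t = (value - x0) / (x1 - x0)
        return tuple(a + t * (b - a) for a, b in zip(c0, c1))

    return tf


# Hypothetical example: low densities transparent black, high densities opaque red.
tf = make_transfer_function([(0.0, (0.0, 0.0, 0.0, 0.0)),
                             (1.0, (1.0, 0.0, 0.0, 1.0))])
rgba = tf(0.5)
```

A volume renderer would evaluate such a function at each sample along a ray and composite the resulting colors and opacities.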
TL;DR: This work introduces SceneNet RGB-D, expanding the previous work of SceneNet to enable large scale photorealistic rendering of indoor scene trajectories and provides pixel-perfect ground truth for scene understanding problems such as semantic segmentation, instance segmentations, and object detection.
Abstract: We introduce SceneNet RGB-D, expanding the previous work of SceneNet to enable large scale photorealistic rendering of indoor scene trajectories. It provides pixel-perfect ground truth for scene understanding problems such as semantic segmentation, instance segmentation, and object detection, and also for geometric computer vision problems such as optical flow, depth estimation, camera pose estimation, and 3D reconstruction. Random sampling permits virtually unlimited scene configurations, and here we provide a set of 5M rendered RGB-D images from over 15K trajectories in synthetic layouts with random but physically simulated object poses. Each layout also has random lighting, camera trajectories, and textures. The scale of this dataset is well suited for pre-training data-driven computer vision techniques from scratch with RGB-D inputs, which previously has been limited by relatively small labelled datasets in NYUv2 and SUN RGB-D. It also provides a basis for investigating 3D scene labelling tasks by providing perfect camera poses and depth data as proxy for a SLAM system. We host the dataset at this http URL
TL;DR: The resulting normal estimation method outperforms most of the time the state of the art regarding robustness to outliers, to noise and to point density variation, in the presence of sharp edges, while remaining fast, scaling up to millions of points.
Abstract: Normal estimation in point clouds is a crucial first step for numerous algorithms, from surface reconstruction and scene understanding to rendering. A recurrent issue when estimating normals is to make appropriate decisions close to sharp features, not to smooth edges, or when the sampling density is not uniform, to prevent bias. Rather than resorting to manually-designed geometric priors, we propose to learn how to make these decisions, using ground-truth data made from synthetic scenes. For this, we project a discretized Hough space representing normal directions onto a structure amenable to deep learning. The resulting normal estimation method outperforms most of the time the state of the art regarding robustness to outliers, to noise and to point density variation, in the presence of sharp edges, while remaining fast, scaling up to millions of points.
TL;DR: This work presents a set of perceptually-based methods for improving foveated rendering running on a prototype virtual reality headset with an integrated eye tracker and shows how such techniques can fulfill large field-of-view and frame rate requirements with potentially large reductions in rendering cost.
Abstract: Humans have two distinct vision systems: foveal and peripheral vision. Foveal vision is sharp and detailed, while peripheral vision lacks fidelity. The difference in characteristics of the two systems enables recently popular foveated rendering systems, which seek to increase rendering performance by lowering image quality in the periphery. We present a set of perceptually-based methods for improving foveated rendering running on a prototype virtual reality headset with an integrated eye tracker. Foveated rendering has previously been demonstrated in conventional displays, but has recently become an especially attractive prospect in virtual reality (VR) and augmented reality (AR) display settings with a large field-of-view (FOV) and high frame rate requirements. Investigating prior work on foveated rendering, we find that some previous quality-reduction techniques can create objectionable artifacts like temporal instability and contrast loss. Our emerging technologies installation demonstrates these techniques running live in a head-mounted display and we will compare them against our new perceptually-based foveated techniques. Our new foveation techniques enable significant reduction in rendering cost but have no discernible difference in visual quality. We show how such techniques can fulfill these requirements with potentially large reductions in rendering cost.
TL;DR: The fast interactive segmentation operations and the accurate rendering make this tool particularly suitable for efficient analysis of multimodal image data sets which arise in large amounts in preclinical imaging studies.
Abstract: A software tool is presented for interactive segmentation of volumetric medical data sets. To allow interactive processing of large data sets, segmentation operations and rendering are GPU-accelerated. Special adjustments are provided to overcome GPU-imposed constraints such as limited memory and host-device bandwidth. A general and efficient undo/redo mechanism is implemented using GPU-accelerated compression of the multiclass segmentation state. A broadly applicable set of interactive segmentation operations is provided which can be combined to solve the quantification task of many types of imaging studies. A fully GPU-accelerated ray casting method for multiclass segmentation rendering is implemented which is well-balanced with respect to delay, frame rate, worst-case memory consumption, scalability, and image quality. Performance of segmentation operations and rendering is measured using high-resolution example data sets, showing that GPU-acceleration greatly improves the performance. Compared to a reference marching cubes implementation, the rendering was found to be superior with respect to rendering delay and worst-case memory consumption while providing sufficiently high frame rates for interactive visualization and comparable image quality. The fast interactive segmentation operations and the accurate rendering make our tool particularly suitable for efficient analysis of multimodal image data sets which arise in large amounts in preclinical imaging studies.
TL;DR: A perceptual experiment executed in a teleoperated environment with kinesthetic feedback showed that the addition of tactile feedback, provided through the Haptic Thimble, significantly improved performance of an exploratory task.
Abstract: This work presents the Haptic Thimble, a novel wearable haptic device for surface exploration. The Haptic Thimble combines rendering of surface orientation with fast transient and wide frequency bandwidth tactile cues. Such features allow surface exploration with rich tactile feedback, including reactive contact/no-contact transitions, rendering of collisions, surface asperities and textures. The above capabilities were obtained through a novel serial kinematics wrapped around the finger, actuated by a compact servo motor for orienting the last link, and by a custom voice coil for actuating the plate in contact with the fingerpad. Performance of the voice coil was measured at the bench in static and dynamic conditions, assessing the capability of reproducing generic, wide-bandwidth (0–300 Hz) tactile cues. Overall usability of the Haptic Thimble was explored within a virtual environment involving exploration of virtual surfaces. Finally, a perceptual experiment executed in a teleoperated environment with kinesthetic feedback showed that the addition of tactile feedback, provided through the Haptic Thimble, significantly improved performance of an exploratory task.
TL;DR: In this paper, femtosecond lasers are used to render aerial and volumetric graphics in air, either by producing holograms using spatial light modulation technology or by scanning a laser beam with a galvano mirror.
Abstract: We present a method of rendering aerial and volumetric graphics using femtosecond lasers. A high-intensity laser excites physical matter to emit light at an arbitrary three-dimensional position. Popular applications can thus be explored, especially because plasma induced by a femtosecond laser is less harmful than that generated by a nanosecond laser. There are two methods of rendering graphics with a femtosecond laser in air: producing holograms using spatial light modulation technology and scanning of a laser beam by a galvano mirror. The holograms and workspace of the system proposed here occupy a volume of up to 1 cm³; however, this size is scalable depending on the optical devices and their setup. This article provides details of the principles, system setup, and experimental evaluation, and discusses the scalability, design space, and applications of this system. We tested two laser sources: an adjustable (30-100 fs) laser that projects up to 1,000 pulses/s at an energy of up to 7 mJ/pulse and a 269 fs laser that projects up to 200,000 pulses/s at an energy of up to 50 μJ/pulse. We confirmed that the spatiotemporal resolution of volumetric displays implemented using these laser sources is 4,000 and 200,000 dots/s, respectively. Although we focus on laser-induced plasma in air, the discussion presented here is also applicable to other rendering principles such as fluorescence and microbubbles in solid or liquid materials.
TL;DR: The key to make simulation programmers more productive at developing portable and performant code is to introduce new linguistic abstractions, as in rendering and image processing.
Abstract: Writing highly performant simulations requires a lot of human effort to optimize for an increasingly diverse set of hardware platforms, such as multi-core CPUs, GPUs, and distributed machines. Since these optimizations cut across both the design of geometric data structures and numerical linear algebra, code reusability and portability is frequently sacrificed for performance. We believe the key to make simulation programmers more productive at developing portable and performant code is to introduce new linguistic abstractions, as in rendering and image processing. In this perspective, we distill the core ideas from our two languages, Ebb and Simit, that are published in this journal.
TL;DR: Data seems cheap to get, and in many ways it is, but the process of creating a high quality labeled dataset from a mass of data is time-consuming and expensive.
Abstract: Data seems cheap to get, and in many ways it is, but the process of creating a high quality labeled dataset from a mass of data is time-consuming and expensive.
TL;DR: This paper proposes a lossy coding scheme to efficiently represent plenoptic images that inherits a scalable structure with three layers and shows that plenoptic images are compressed efficiently with over 60 percent bit rate reduction compared with High Efficiency Video Coding intra coding, and with over 20 percent compared with a High Efficiency Video Coding block copying mode.
Abstract: One of the light field capturing techniques is the focused plenoptic capturing. By placing a microlens array in front of the photosensor, the focused plenoptic cameras capture both spatial and angular information of a scene in each microlens image and across microlens images. The capturing results in a significant amount of redundant information, and the captured image is usually of a large resolution. A coding scheme that removes the redundancy before coding can be of advantage for efficient compression, transmission, and rendering. In this paper, we propose a lossy coding scheme to efficiently represent plenoptic images. The format contains a sparse image set and its associated disparities. The reconstruction is performed by disparity-based interpolation and inpainting, and the reconstructed image is later employed as a prediction reference for the coding of the full plenoptic image. As an outcome of the representation, the proposed scheme inherits a scalable structure with three layers. The results show that plenoptic images are compressed efficiently with over 60 percent bit rate reduction compared with High Efficiency Video Coding intra coding, and with over 20 percent compared with a High Efficiency Video Coding block copying mode.
TL;DR: This paper presents an approach that significantly improves image generation performance of ray tracing by combining foveated rendering based on eye tracking with reprojection rendering using previous frames in order to drastically reduce the number of new image samples per frame.
Abstract: Head-mounted displays with dense pixel arrays used for virtual reality applications require high frame rates and low latency rendering. This forms a challenging use case for any rendering approach. In addition to its ability to generate realistic images, ray tracing offers a number of distinct advantages, but has been held back mainly by its performance. In this paper, we present an approach that significantly improves image generation performance of ray tracing. This is done by combining foveated rendering based on eye tracking with reprojection rendering using previous frames in order to drastically reduce the number of new image samples per frame. To reproject samples, a coarse geometry is reconstructed from a G-Buffer. Possible errors introduced by this reprojection as well as parts that are critical to the perception are scheduled for resampling. Additionally, a coarse color buffer is used to provide an initial image, refined smoothly by more samples where needed. Evaluations and user tests show that our method achieves real-time frame rates, while visual differences compared to fully rendered images are hardly perceivable. As a result, we can ray trace non-trivial static scenes for the Oculus DK2 HMD at 1182 × 1464 per eye within the VSync limits without perceived visual differences.
TL;DR: This work proposes an algorithm that only shades visible features of the image while cost‐effectively interpolating the remaining features without affecting perceived quality, and introduces a sampling scheme that incorporates multiple aspects of the human visual system: acuity, eye motion, contrast, and brightness adaptation.
Abstract: With ever-increasing display resolution for wide field-of-view displays, such as head-mounted displays or 8k projectors, shading has become the major computational cost in rasterization. To reduce computational effort, we propose an algorithm that only shades visible features of the image while cost-effectively interpolating the remaining features without affecting perceived quality. In contrast to previous approaches we do not only simulate acuity falloff but also introduce a sampling scheme that incorporates multiple aspects of the human visual system: acuity, eye motion, contrast (stemming from geometry, material or lighting properties), and brightness adaptation. Our sampling scheme is incorporated into a deferred shading pipeline to shade the image's perceptually relevant fragments while a pull-push algorithm interpolates the radiance for the rest of the image. Our approach does not impose any restrictions on the performed shading. We conduct a number of psycho-visual experiments to validate scene- and task-independence of our approach. The number of fragments that need to be shaded is reduced by 50% to 80%. Our algorithm scales favorably with increasing resolution and field-of-view, rendering it well-suited for head-mounted displays and wide-field-of-view projection.
TL;DR: A new metric for perceptual foveated rendering quality building on HDR-VDP2 is contributed that considers the loss of fidelity in peripheral vision by lowering the contrast sensitivity of the model with visual eccentricity based on the Cortical Magnification Factor (CMF).
Abstract: Perceptually lossless foveated rendering methods exploit human perception by selectively rendering at different quality levels based on eye gaze (at a lower computational cost) while still maintaining the user's perception of a full quality render. We consider three foveated rendering methods and propose practical rules of thumb for each method to achieve significant performance gains in real-time rendering frameworks. Additionally, we contribute a new metric for perceptual foveated rendering quality building on HDR-VDP2 that, unlike traditional metrics, considers the loss of fidelity in peripheral vision by lowering the contrast sensitivity of the model with visual eccentricity based on the Cortical Magnification Factor (CMF). The new metric is parameterized on user-test data generated in this study. Finally, we run our metric on a novel foveated rendering method for real-time immersive 360° content with motion parallax.
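The idea of lowering contrast sensitivity with eccentricity via the Cortical Magnification Factor can be sketched as below. The inverse-linear form M(e) = M0 / (1 + e / e2) is one commonly used model of cortical magnification; the parameter values and the function names are assumptions for illustration, not the values fitted from the user-test data in this paper.

```python
def cortical_magnification(ecc_deg, m0=1.0, e2=2.0):
    """Inverse-linear CMF model: M(e) = m0 / (1 + e / e2).

    m0 is the foveal magnification and e2 the eccentricity (in degrees) at
    which magnification halves. Both defaults are placeholder values.
    """
    return m0 / (1.0 + ecc_deg / e2)


def sensitivity_scale(ecc_deg, e2=2.0):
    """Factor in (0, 1] by which foveal contrast sensitivity is scaled at ecc_deg."""
    return cortical_magnification(ecc_deg, 1.0, e2) / cortical_magnification(0.0, 1.0, e2)
```

Under these assumptions the scale is 1 at the fovea and falls monotonically toward the periphery, so a given contrast error counts for less the further it lies from the gaze point.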
TL;DR: This paper proposes a digital watermarking method for depth-image-based rendered 3D video that is robust to geometric distortions, such as upscaling, rotation and cropping, downscaling to an arbitrary resolution, and the most common video distortions, including lossy compression and additive noise.
Abstract: The popularity of 3D video is increasing daily due to the availability of low-cost 3D televisions and high-speed Internet access. However, currently the contents of 3D video can be distributed illegally without any protection. For views generated using a depth-image-based rendering technique, not only the left and right views can be distributed as 3D content, but also the center, left, or right views individually as 2D content. As digital video watermarking is a possible way of protecting these views from unauthorized distribution, in this paper, we propose a digital watermarking method for depth-image-based rendered 3D video. In this method, the watermark is embedded in both of the chrominance channels of a YUV representation of the center view using the dual-tree complex wavelet transform. Then, the left and right views are generated from the watermarked center view and depth map using a depth-image based rendering technique. Finally, the watermark can be extracted from the center, left, and right views in a blind fashion without using the original unwatermarked center, left, or right views. This watermark is robust to geometric distortions, such as upscaling, rotation and cropping, downscaling to an arbitrary resolution, and the most common video distortions, including lossy compression and additive noise. Due to the approximate shift invariance characteristic of the dual-tree complex wavelet transform, the technique is robust against distortions in the left and right views generated using depth-image based rendering. The proposed method can also survive baseline distance adjustment and both 2D and 3D camcording.
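The view-generation step in this abstract relies on depth-image-based rendering, which can be sketched for a single image row. The sketch assumes parallel cameras, so that the horizontal shift in pixels is focal length times baseline divided by depth; the function name `warp_row`, the sign convention for `direction`, and the use of `None` for holes are illustrative assumptions, and hole filling is omitted.

```python
def warp_row(colors, depths, focal_px, baseline, direction):
    """Forward-warp one image row from the center view to a side view.

    direction is +1 or -1 and selects which way pixels shift (which sign
    corresponds to the left or right view is an assumption here).
    Disoccluded positions are left as None.
    """
    w = len(colors)
    out = [None] * w
    out_z = [float("inf")] * w
    for x in range(w):
        # Disparity in pixels for a parallel camera setup.
        shift = int(round(focal_px * baseline / depths[x]))
        tx = x + direction * shift
        # Z-test: when two pixels land on the same target, the nearer one wins.
        if 0 <= tx < w and depths[x] < out_z[tx]:
            out[tx] = colors[x]
            out_z[tx] = depths[x]
    return out
```

Changing `baseline` changes every shift proportionally, which corresponds to the baseline distance adjustment the watermark is reported to survive.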
TL;DR: Using a single emitter of Cu-Ga-S/ZnS quantum dots, an all-solution-processed white electroluminescent lighting device is demonstrated that not only exhibits record values of 1007 cd m⁻² in luminance and 1.9% in external quantum efficiency but also possesses satisfactorily high color rendering indices of 83-88.
Abstract: Using a single emitter of Cu-Ga-S/ZnS quantum dots, an all-solution-processed white electroluminescent lighting device is demonstrated that not only exhibits record values of 1007 cd m⁻² in luminance and 1.9% in external quantum efficiency but also possesses satisfactorily high color rendering indices of 83-88.
TL;DR: In this paper, the authors show that using passive metamaterials as subwavelength pixels, holographic rendering can be achieved without cumbersome circuitry and with only a single transducer, thus significantly reducing system complexity.
Abstract: Acoustic holographic rendering, in complete analogy with optical holography, is useful for various applications, ranging from multi-focal lensing and multiplexed sensing to synthesizing three-dimensional complex sound fields. Conventional approaches rely on a large number of active transducers and phase-shifting circuits. In this paper we show that by using passive metamaterials as subwavelength pixels, holographic rendering can be achieved without cumbersome circuitry and with only a single transducer, thus significantly reducing system complexity. Such metamaterial-based holograms can serve as versatile platforms for advanced acoustic wave manipulation and signal modulation, leading to new possibilities in acoustic sensing, energy deposition and medical diagnostic imaging.
TL;DR: A technique for automatically adding realism without the expense of manually animating the requisite detail is described, enabling real-time production and display.
Abstract: Generating sentences from a library of signs implemented through a sparse set of key frames derived from the segmental structure of a phonetic model of ASL has the advantage of flexibility and efficiency, but lacks the lifelike detail of motion capture. These difficulties are compounded when faced with real-time generation and display. This paper describes a technique for automatically adding realism without the expense of manually animating the requisite detail. The new technique layers transparently over and modifies the primary motions dictated by the segmental model and does so with very little computational cost, enabling real-time production and display. The paper also discusses avatar optimizations that can lower the rendering overhead in real-time displays.
TL;DR: This paper fits CT data to procedural models to automatically recover a full range of parameters, and augment the models with a measurement-based model of flyaway fibers, to create high-quality procedural yarn models of fabrics with fiber-level details.
Abstract: Fabrics play a significant role in many applications in design, prototyping, and entertainment. Recent fiber-based models capture the rich visual appearance of fabrics, but are too onerous to design and edit. Yarn-based procedural models are powerful and convenient, but too regular and not realistic enough in appearance. In this paper, we introduce an automatic fitting approach to create high-quality procedural yarn models of fabrics with fiber-level details. We fit CT data to procedural models to automatically recover a full range of parameters, and augment the models with a measurement-based model of flyaway fibers. We validate our fabric models against CT measurements and photographs, and demonstrate the utility of this approach for fabric modeling and editing.
TL;DR: This paper proposes a new adaptive rendering method that outperforms state-of-the-art methods by controlling the tradeoff between reconstruction bias and variance through locally defining the authors' polynomial order, even without need for filtering bandwidth optimization.
Abstract: In this paper, we propose a new adaptive rendering method to improve the performance of Monte Carlo ray tracing, by reducing noise contained in rendered images while preserving high-frequency edges. Our method locally approximates an image with polynomial functions and the optimal order of each polynomial function is estimated so that our reconstruction error can be minimized. To robustly estimate the optimal order, we propose a multi-stage error estimation process that iteratively estimates our reconstruction error. In addition, we present an energy-preserving outlier removal technique to remove spike noise without causing noticeable energy loss in our reconstruction result. Also, we adaptively allocate additional ray samples to high error regions guided by our error estimation. We demonstrate that our approach outperforms state-of-the-art methods by controlling the tradeoff between reconstruction bias and variance through locally defining our polynomial order, even without need for filtering bandwidth optimization, the common approach of other recent methods.
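The idea of choosing a polynomial order per local window so that reconstruction error is minimized can be sketched in one dimension. Note the substitution: the paper uses its own multi-stage error estimation, whereas this sketch swaps in plain leave-one-out cross-validation as the error estimate. All function names are hypothetical.

```python
def polyfit(xs, ys, order):
    """Least-squares polynomial coefficients, lowest degree first."""
    n = order + 1
    # Normal equations A c = b with A[i][j] = sum x^(i+j), b[i] = sum y * x^i.
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            k = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= k * A[col][c]
            b[r] -= k * b[col]
    # Back substitution.
    coef = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (b[i] - s) / A[i][i]
    return coef


def polyval(coef, x):
    """Evaluate a polynomial given coefficients, lowest degree first."""
    return sum(c * x ** i for i, c in enumerate(coef))


def loo_error(xs, ys, order):
    """Mean squared leave-one-out prediction error for a given order."""
    err = 0.0
    for k in range(len(xs)):
        coef = polyfit(xs[:k] + xs[k + 1:], ys[:k] + ys[k + 1:], order)
        err += (polyval(coef, xs[k]) - ys[k]) ** 2
    return err / len(xs)


def best_order(xs, ys, max_order=3):
    """Pick the order with the smallest estimated reconstruction error."""
    return min(range(max_order + 1), key=lambda d: loo_error(xs, ys, d))
```

A low order smooths noise but biases the result near edges, while a high order follows edges but keeps more variance; selecting the order per window is one way to trade these off without tuning a filter bandwidth.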