TL;DR: OSPRay is presented, a turn-key CPU ray tracing framework oriented towards production-use scientific visualization which can utilize varying SIMD widths and multiple device backends found across diverse HPC resources.
Abstract: Scientific data is continually increasing in complexity, variety and size, making efficient visualization and specifically rendering an ongoing challenge. Traditional rasterization-based visualization approaches encounter performance and quality limitations, particularly in HPC environments without dedicated rendering hardware. In this paper, we present OSPRay, a turn-key CPU ray tracing framework oriented towards production-use scientific visualization which can utilize varying SIMD widths and multiple device backends found across diverse HPC resources. This framework provides a high-quality, efficient CPU-based solution for typical visualization workloads, which has already been integrated into several prevalent visualization packages. We show that this system delivers the performance, high-level API simplicity, and modular device support needed to provide a compelling new rendering framework for implementing efficient scientific visualization workflows.
TL;DR: Two architectural designs are proposed to enable Processing-In-Memory based GPU for efficient 3D rendering and provide considerable memory traffic and energy reduction without sacrificing rendering quality.
Abstract: The performance of 3D rendering on the Graphics Processing Unit, which converts a 3D vector stream into a 2D frame with 3D image effects, significantly impacts users' gaming experience on modern computer systems. Due to its high texture throughput requirement, main memory bandwidth becomes a critical obstacle for improving the overall rendering performance. 3D-stacked memory systems such as Hybrid Memory Cube (HMC) provide opportunities to significantly overcome the memory wall by directly connecting logic controllers to DRAM dies. Although recent works have shown promising improvement in performance by utilizing HMC to accelerate special-purpose applications, a critical challenge of how to effectively leverage its high internal bandwidth and computing capability in GPU for 3D rendering remains unresolved. Based on the observation that texel fetches greatly impact off-chip memory traffic, we propose two architectural designs to enable Processing-In-Memory based GPU for efficient 3D rendering. Additionally, we employ camera angles of pixels to control the performance-quality tradeoff of 3D rendering. Extensive evaluation across several real-world games demonstrates that our design can significantly improve the performance of texture filtering and 3D rendering by an average of 3.97X (up to 6.4X) and 43% (up to 65%) respectively, over the baseline GPU. Meanwhile, our design provides considerable memory traffic and energy reduction without sacrificing rendering quality.
TL;DR: This work discusses the challenges and implementation choices that follow from the primary design decisions, demonstrating that such a rendering system can be made a practical, scalable, and efficient real-world application that is in use by many industry professionals today.
Abstract: While ray tracing has become increasingly common and path tracing is well understood by now, a major challenge consists of crafting an easy-to-use and efficient system implementing these technologies. Following a purely physically-based paradigm while still allowing for artistic workflows, the Iray light transport simulation and rendering system allows for rendering complex scenes by the push of a button and thus makes accurate light transport simulation widely available. We discuss the challenges and implementation choices that follow from our primary design decisions, demonstrating that such a rendering system can be made a practical, scalable, and efficient real-world application that is in use by many industry professionals today.
TL;DR: In this paper, the authors present a method to reduce the computational demands for user perspective rendering by applying lightweight optical flow tracking and an estimation of the user's motion before head tracking is started.
Abstract: Handheld Augmented Reality commonly implements some variant of magic lens rendering, which turns only a fraction of the user's real environment into AR while the rest of the environment remains unaffected. Since handheld AR devices are commonly equipped with video see-through capabilities, AR magic lens applications often suffer from spatial distortions, because the AR environment is presented from the perspective of the camera of the mobile device. Recent approaches counteract this distortion based on estimations of the user's head position, rendering the scene from the user's perspective. To this end, approaches usually apply face-tracking algorithms on the front camera of the mobile device. However, this demands high computational resources and therefore commonly affects the performance of the application beyond the already high computational load of AR applications. In this paper, we present a method to reduce the computational demands for user perspective rendering by applying lightweight optical flow tracking and an estimation of the user's motion before head tracking is started. We demonstrate the suitability of our approach for computationally limited mobile devices and we compare it to device perspective rendering, to head tracked user perspective rendering, as well as to fixed point of view user perspective rendering.
TL;DR: A recently introduced three-dimensional post-processing technique named Cinematic Rendering now makes it possible to use the output of routine CT and MR examinations as the basis for highly photo-realistic 3-D depictions of human anatomy.
Abstract: Modern computer techniques have been in use for several years to generate three-dimensional visualizations of human anatomy. Very good 3-D computer models of the human body are now available and used routinely in anatomy instruction. These techniques are subsumed under the heading “virtual anatomy” to distinguish them from the conventional study of anatomy entailing cadavers and anatomy textbooks. Moreover, other imaging procedures (X-ray, angiography, CT and MR) are also used in virtual anatomy instruction. A recently introduced three-dimensional post-processing technique named Cinematic Rendering now makes it possible to use the output of routine CT and MR examinations as the basis for highly photo-realistic 3-D depictions of human anatomy. We have installed Cinematic Rendering (enabled for stereoscopy) in a high-definition 8K 3-D projection space that accommodates an audience of 150. The space’s projection surface measures 16 × 9 meters; images can be projected on both the front wall and the floor. A game controller can be used to operate Cinematic Rendering software so that it can generate interactive real-time depictions of human anatomy on the basis of CT and MR data sets. This prototype installation was implemented without technical problems; in day-to-day, real-world use over a period of 22 months, there were no impairments of service due to software crashes or other technical problems. We are already employing this installation routinely for educational offerings open to the public, courses for students in the health professions, and (continuing) professional education units for medical interns, residents and specialists—in, so to speak, the dissecting theater of the future.
TL;DR: This work demonstrates a flexible parallel rendering framework built upon a task-based dynamic runtime environment enabling adaptable performance-oriented deployment on various platform configurations, and represents an effective and easy-to-control trade-off between sort-first and sort-last image compositing.
Abstract: An increasingly heterogeneous system landscape in modern high performance computing requires the efficient and portable adaption of performant algorithms to diverse architectures. However, classic hybrid shared-memory/distributed systems are designed and tuned towards specific platforms, thus impeding development, usage and optimization of these approaches with respect to portability. We demonstrate a flexible parallel rendering framework built upon a task-based dynamic runtime environment enabling adaptable performance-oriented deployment on various platform configurations. Our task definition represents an effective and easy-to-control trade-off between sort-first and sort-last image compositing, enabling good scalability in combination with inherent dynamic load balancing. We conduct comprehensive benchmarks to verify the characteristics and potential of our novel task-based system design for high-performance visualization.
TL;DR: A case study in traffic scenario is taken to empirically analyze the performance degradation when CV systems trained with virtual data are transferred to real data and a generative model coupled with 3D CAD shapes for scene instance synthesis and system performance tradeoffs due to the choice of rendering engine are explored.
Abstract: There is a growing interest to utilize Computer Graphics (CG) renderings to generate large scale annotated data in order to train machine learning systems, such as Deep convolutional neural networks, for Computer Vision (CV). However, there has been a long debate on the usefulness of CG generated data for tuning CV systems (even from the 1980's). Especially, the impact of modeling errors and computational rendering approximations, due to choices in the rendering pipeline, on trained CV systems generalization performance is still not clear. In this paper, we take a case study in traffic scenario to empirically analyze the performance degradation when CV systems trained with virtual data are transferred to real data. We: a) discuss a generative model coupled with 3D CAD shapes for scene instance synthesis and, b) explore system performance tradeoffs due to the choice of rendering engine (e.g. Lambertian shader (LS), ray-tracing (RT), and Monte-carlo path tracing (MCPT)) and their respective parameters. DeepLab, that performs semantic segmentation, is chosen as the CV system being evaluated. In our case study, involving traffic scenes, when the CV system is trained with CG data samples (that use MCPT or RT) and augmented with only 10% of real-world training data from CityScapes dataset, the performance levels achieved are comparable to that of training DeepLab with the complete CityScapes dataset. Use of samples from LS degraded the performance of DeepLab by 20%. Physics-based MCPT rendering improved the performance by 6% but at the cost of more than 3 times the rendering time.
TL;DR: A real-time volume rendering component for the Web, which provides a set of illustrative and non-photorealistic styles to several volume data types, offering a suitable tool for declarative volume rendering on the Web.
Abstract: We present a real-time volume rendering component for the Web, which provides a set of illustrative and non-photorealistic styles. Volume data is used in many scientific disciplines, requiring the visualization of the inner data, features for enhancing extracted characteristics or even coloring the volume. The Medical Working Group of X3D published a volume rendering specification. The next step is to build a component that realizes the functionalities defined by the specification. We have designed and built a volume rendering component integrated in the X3DOM framework. This component allows content developers to use the X3D specification. It combines and applies multiple rendering styles to several volume data types, offering a suitable tool for declarative volume rendering on the Web. As we show in the result section, the proposed component can be used in many fields that require the visualization of multi-dimensional data, such as in medical and scientific fields. Our approach is based on WebGL and X3DOM, providing content developers with an easy and flexible declarative way of sharing and visualizing volumetric content over the Web.
TL;DR: It is shown that the implementation details can significantly affect the obtained performance with discrepancies up to 3 orders of magnitude and the effectiveness of the proposal on two embedded platforms is demonstrated, obtaining more than 16× speedup over benchmarks designed following OpenGL ES 2 best practices.
Abstract: Previous works in the literature have shown the feasibility of general purpose computations for non-visual applications on low-end mobile graphics processors using graphics APIs. These works focused only on the functional aspects of the software, ignoring the implementation details and therefore their performance implications due to their particular micro-architecture. Since various steps in such applications can be implemented in multiple ways, we identify optimisation opportunities, explore the different options and evaluate them. We show that the implementation details can significantly affect the obtained performance with discrepancies up to 3 orders of magnitude and we demonstrate the effectiveness of our proposal on two embedded platforms, obtaining more than 16× speedup over benchmarks designed following OpenGL ES 2 best practices.
TL;DR: This paper investigates the effect spatialized directional sound has on the visual attention of a user towards rendered images via the use of multi‐modal maps and finds them to perform significantly better than only using image saliency maps that are naively applied to multi-modal VEs.
Abstract: A major challenge in generating high-fidelity virtual environments (VEs) is to be able to provide realism at interactive rates. The high-fidelity simulation of light and sound is still unachievable in real time as such physical accuracy is very computationally demanding. Only recently has visual perception been used in high-fidelity rendering to improve performance by a series of novel exploitations; to render parts of the scene that are not currently being attended to by the viewer at a much lower quality without the difference being perceived. This paper investigates the effect spatialized directional sound has on the visual attention of a user towards rendered images. These perceptual artefacts are utilized in selective rendering pipelines via the use of multi-modal maps. The multi-modal maps are tested through psychophysical experiments to examine their applicability to selective rendering algorithms, with a series of fixed cost rendering functions, and are found to perform significantly better than only using image saliency maps that are naively applied to multi-modal VEs.
TL;DR: In this article, the authors proposed a method to obtain a video stream of a direct broadcasting room from a server, determining a decoding condition provided by a terminal, selecting a matched decoding interface to decode the video stream, and preferentially calling a hardware decoding interface for decoding when the decoding condition includes the hardware decoding interfaces and a software decoding interface, otherwise calling the software decoding interfaces to decode.
Abstract: The invention relates to the field of multimedia technologies, in particular to a video playing control method and device as well as terminal equipment. The method comprises the following steps of obtaining video stream of a direct broadcasting room from a server; determining a decoding condition provided by a terminal, selecting a matched decoding interface to decode the video stream, and preferentially calling a hardware decoding interface to decode when the decoding condition includes the hardware decoding interface and a software decoding interface, otherwise calling the software decoding interface to decode; and determining a rendering condition provided by the terminal, selecting a matched graphic rendering interface to output the decoded video stream to a user interface to display after drawing, and preferentially calling a hardware rendering interface to render when the rendering condition includes the hardware rendering interface and a software rendering interface, otherwise calling the software rendering interface to render. According to the method, the device and the terminal equipment, the terminal can realize smooth watching of the live video stream at high resolution, high bit rate and high frame rate.
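The selection rule described in this abstract (prefer the hardware interface when the terminal offers both, otherwise fall back to software) can be sketched as follows. This is a minimal illustration only: the function names `select_interface` and `plan_playback` and the string labels `"hardware"` and `"software"` are hypothetical and are not taken from the patent.

```python
from typing import Dict, Iterable


def select_interface(available: Iterable[str]) -> str:
    """Return "hardware" if the terminal offers it, else "software".

    `available` stands in for the decoding or rendering condition
    reported by the terminal (hypothetical representation).
    """
    options = set(available)
    if "hardware" in options:
        return "hardware"
    if "software" in options:
        return "software"
    raise ValueError("terminal offers no usable interface")


def plan_playback(decoders: Iterable[str], renderers: Iterable[str]) -> Dict[str, str]:
    """Apply the same hardware-first rule to decoding and to rendering."""
    return {
        "decoder": select_interface(decoders),
        "renderer": select_interface(renderers),
    }
```

For example, `plan_playback(["software", "hardware"], ["software"])` picks the hardware decoder and the software renderer, since the two stages are chosen independently.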
TL;DR: A method based on the Graphics Processing Unit (GPU) for voxelization and visualization, suitable for both interactive and offline rendering, is introduced.
Abstract: Most popular methods in cloth rendering rely on volumetric data in order to model complex optical phenomena such as sub-surface scattering. These approaches are able to produce very realistic illumination results, but their volumetric representations are costly to compute and render, forfeiting any interactive feedback. In this paper, we introduce a method based on the Graphics Processing Unit (GPU) for voxelization and visualization, suitable for both interactive and offline rendering. Recent features in the OpenGL model, like the ability to dynamically address arbitrary buffers and allocate bindless textures, are combined into our pipeline to interactively voxelize millions of polygons into a set of large three-dimensional (3D) textures (>10^9 elements), generating a volume with sub-voxel accuracy, which is suitable even for high-density woven cloth such as linen.
TL;DR: ButterFly, a novel system which collaboratively utilizes mobile GPUs to process high-quality rendering details for on-the-go mobile users, achieves two technical contributions for the collaborative design: a mobile device can migrate GPU workloads in its buffer queue to peers, and the collaborative rendering mechanism gives users high-quality details while saving significant power.
Abstract: The ever-increasing display resolution of mobile devices raises high demand for GPU rendering detail. However, poor hardware support combined with fine-grained rendering details often leaves users unsatisfied, especially in scenarios that call for a high frame rate, e.g., games. To resolve this issue, we propose ButterFly, a novel system which collaboratively utilizes mobile GPUs to process high-quality rendering details for on-the-go mobile users. In particular, ButterFly achieves two technical contributions for the collaborative design: (1) a mobile device can migrate GPU workloads in its buffer queue to peers, and (2) the collaborative rendering mechanism gives users high-quality details while saving significant power. Both techniques are compatible with the OpenGL ES standards. Furthermore, a 40-person survey indicates that ButterFly can provide an excellent user experience in both rendering details and frame rate over a Wi-Fi network. In addition, our comprehensive trace-driven experiments on an Android prototype show that ButterFly outperforms state-of-the-art systems, achieving more than 28.3% power saving.
TL;DR: This paper employs a light-weight mechanism to dynamically adjust the GPU memory access rate so that the GPU is able to just meet the required QoS level, which frees up memory system resources which can be shifted to the co-running CPU applications.
Abstract: Heterogeneous chip-multiprocessors with integrated CPU and GPU cores on the same die allow sharing of critical memory system resources among the applications executing on the two types of cores. In this paper, we explore memory system management driven by the quality of service (QoS) requirement of the GPU applications executing simultaneously with CPU applications in such heterogeneous platforms. Our proposal dynamically estimates the level of QoS (e.g., frame rate in 3D scene rendering) of the GPU application. Unlike the prior proposals, our algorithm does not require any profile information and does not assume tile-based deferred rendering. If the estimated quality of service meets the minimum acceptable QoS level, our proposal employs a light-weight mechanism to dynamically adjust the GPU memory access rate so that the GPU is able to just meet the required QoS level. This frees up memory system resources which can be shifted to the co-running CPU applications. Detailed simulations done on a heterogeneous chip-multiprocessor with one GPU and four CPU cores running heterogeneous mixes of DirectX, OpenGL, and CPU applications show that our proposal improves the CPU performance by 18% on average.
TL;DR: This work analyzes real-world rendering workloads, derive requirements for effective patterns, and presents ten different pattern design strategies based on these requirements, and compares the performance of select patterns in a parallel sort-middle software rendering pipeline on an extensive set of triangle data captured from eight recent video games.
Abstract: To effectively utilize an ever increasing number of processors during parallel rendering, hardware and software designers rely on sophisticated load balancing strategies. While dynamic load balancing is a powerful solution, it requires complex work distribution and synchronization mechanisms. Graphics hardware manufacturers have opted to employ static load balancing strategies instead. Specifically, triangle data is distributed to processors based on its overlap with screenspace tiles arranged in a fixed pattern. While the current strategy of using simple patterns for a small number of fast rasterizers achieves formidable performance, it is questionable how this approach will scale as the number of processors increases further. To address this issue, we analyze real-world rendering workloads, derive requirements for effective patterns, and present ten different pattern design strategies based on these requirements. In addition to a theoretical evaluation of these design strategies, we compare the performance of select patterns in a parallel sort-middle software rendering pipeline on an extensive set of triangle data captured from eight recent video games. As a result, we are able to identify a set of patterns that scale well and exhibit significantly improved performance over naive approaches.
TL;DR: The Dynamic Ray Shuffling (DRS) architecture for GPUs is proposed, which significantly improves the SIMD efficiency for the tested benchmarks from 41.06% to 81.04% on average.
Abstract: Computer graphics is generally divided into two branches: real-time rendering and physically-based rendering. Conventional graphics processing units (GPUs) were designed to accelerate the former which is based on the standard Z-buffer algorithm. However, many applications in entertainment, science, and industry require high quality visual effects such as soft-shadows, reflections, and diffuse lighting interactions which are difficult to achieve with the Z-buffer algorithm, but are straightforward to implement using physically-based rendering methods. Physically-based rendering can already be implemented on present programmable GPUs. However, for physically-based rendering on GPUs, a large portion of the processing power is wasted due to low utilization of SIMD units. This is because the core algorithm of physically-based rendering, ray tracing, suffers from Single Instruction, Multiple Thread (SIMT) control flow divergences. In this paper, we propose the Dynamic Ray Shuffling (DRS) architecture for GPUs to address this problem. Our key insight is that the primary control flow divergences are caused by inconsistent ray traversal states of a warp, and can be eliminated by dynamically shuffling rays. Experimental results show that, for an estimated 0.11% area cost, DRS significantly improves the SIMD efficiency for the tested benchmarks from 41.06% to 81.04% on average. With this, the performance of a physically-based rendering method such as path tracing can be improved by $1.67 \times - 1.92 \times$, and $1.79 \times$ on average.
TL;DR: Experimental results show the algorithm is able to improve rendering efficiency and frame rate stability in terrain navigation and 'cracks' caused by different resolution between adjacent levels are eliminated by modifying outer tessellation level factor of shared edges between levels.
Abstract: To address the heavy rendering load and unstable frame rate that arise when rendering large terrain, this paper proposes an algorithm based on geometry clipmaps. Triangle meshes are generated from a few tessellation control points in the GPU tessellation shader. ‘Cracks’ caused by differing resolution between adjacent levels are eliminated by modifying the outer tessellation level factor of the edges shared between levels. Experimental results show the algorithm is able to improve rendering efficiency and frame rate stability in terrain navigation. Keywords: terrain rendering, GPU, tessellation shader, geometry clipmaps
TL;DR: A novel approach to the problem of quality-interactivity trade-off through a progressive feedback-driven rendering algorithm that uses reprojections of past views to accelerate the reconstruction of the current view and can be used to extend existing point cloud viewing algorithms.
Abstract: Growing evidence indicates that transitioning patients are often unprepared for the self-management role they must assume when they return home. Over the past twenty five years, LiDAR scanning has emerged as a fascinating technology that allows for the rapid acquisition of three dimensional data of real world environments while new virtual reality (VR) technology allows users to experience simulated environments. However, combining these two technologies can be difficult as previous approaches to interactively rendering large point clouds have generally created a trade-off between interactivity and quality. For instance, many techniques used in commercially available software have utilized methods to sub-sample data during interaction, only showing a high-quality render when the viewpoint is kept static. Unfortunately, for displays in which viewpoints are rarely static, such as virtual reality systems, these methods are not useful. This paper presents a novel approach to the problem of quality-interactivity trade-off through a progressive feedback-driven rendering algorithm. This technique uses reprojections of past views to accelerate the reconstruction of the current view and can be used to extend existing point cloud viewing algorithms. The presented method is tested against previous methods, demonstrating marked improvements in both rendering quality and interactivity. This algorithm and rendering application could serve as a tool to enable virtual rehabilitation within 3D models of one's own home from a remote location.
TL;DR: This work proposes a client-end GPU-accelerated scene warping technique to approximate the rendered frames between key frames, meanwhile hiding the interaction delay and improving the user experience when the link between the server and client becomes wireless.
Abstract: In a remote rendering system, the display device and the main graphics processing unit (GPU) are located in different places, which is a kind of client-server architecture and is widely used in cloud gaming and virtual reality (VR). To reduce the interaction delay and improve the user experience especially when the link between the server and client becomes wireless, we propose a client-end GPU-accelerated scene warping technique to approximate the rendered frames between key frames, meanwhile hiding the interaction delay. The distributed rendering technique can warp and interpolate images on the client end with the server-rendered reference background images and the corresponding depth maps. A mobile GPU is also employed to accelerate the image warping operation. In addition, the foreground layer is rendered with GPU and then blended with the approximated background layer generated from background warping. A prototype of the proposed system is implemented with a commercialized smart phone. Compared with the video streaming-based cloud gaming system, the evaluation results show that the Game Mean Opinion Score (GMOS) increases from 1 to 4.1 for the interactive games even in a high network delay condition.
TL;DR: gVirtualXRay is an open-source library that implements the attenuation law (also called Beer-Lambert) on GPU to simulate realistic X-ray images in realtime.
Abstract: We present an Open-source library called gVirtualXRay to simulate realistic X-ray images in realtime. It implements the attenuation law (also called Beer-Lambert) on GPU. It takes into account the polychromatism of the beam spectra as well as the finite size of X-ray tubes. The library is written in C++ using modern OpenGL. It is fully portable and works on most common desktop/laptop computers. It has been tested on MS Windows, Linux, and Mac OS X. It supports a wide range of windowing solutions, such as FLTK, GLUT, GLFW3, Qt4, and Qt5. The library also offers realistic visual rendering of anatomical structures, including bones, liver, diaphragm and lungs. The accuracy of the X-ray images produced by gVirtualXRay's implementation has been validated using Geant4, a well established state-of-the-art Monte Carlo simulation toolkit developed by CERN. gVirtualXRay can be used in a wide range of applications where fast and accurate X-ray simulations from polygon meshes are needed, e.g. medical simulators for training purposes, simulation of tomography data acquisition with patient motion to include artefacts in reconstructed CT images, and deformable registration. Our application example package includes real-time respiration and X-ray simulation, CT acquisition and reconstruction, and iso-surfacing of implicit functions using Marching Cubes.
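As a rough illustration of the attenuation law named above, the sketch below evaluates Beer-Lambert for a ray crossing several materials and sums the result over a discretised, polychromatic beam spectrum. It is a CPU-side Python sketch under stated assumptions and is not gVirtualXRay's API: the function names, the spectrum layout `(photon_count, energy, coefficients)`, and the numbers in the usage note are hypothetical placeholders rather than physical data.

```python
import math
from typing import Sequence, Tuple


def attenuate(incident: float,
              mu_per_cm: Sequence[float],
              thickness_cm: Sequence[float]) -> float:
    """Beer-Lambert law: I = I0 * exp(-sum_i mu_i * d_i).

    mu_per_cm[i] is the linear attenuation coefficient of material i and
    thickness_cm[i] is the path length of the ray through that material.
    """
    exponent = sum(mu * d for mu, d in zip(mu_per_cm, thickness_cm))
    return incident * math.exp(-exponent)


# One spectrum bin: (photon_count, photon_energy, per-material coefficients
# at that energy). This layout is an assumption made for this example.
SpectrumBin = Tuple[float, float, Sequence[float]]


def polychromatic_energy(spectrum: Sequence[SpectrumBin],
                         thickness_cm: Sequence[float]) -> float:
    """Total transmitted energy: Beer-Lambert applied per energy bin, then summed."""
    return sum(
        attenuate(count * energy, mus, thickness_cm)
        for count, energy, mus in spectrum
    )
```

With placeholder values, `polychromatic_energy([(10.0, 2.0, [0.0]), (5.0, 4.0, [1.0])], [1.0])` leaves the first bin untouched (coefficient 0) and scales the second by `exp(-1)`.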
TL;DR: This paper shows how progressive rendering by means of multi-frame sampling and frame accumulation can introduce high-quality visual effects using robust and straightforward implementations of WebGL.
Abstract: Information cartography services provided via web-based clients using real-time rendering do not always necessitate a continuous stream of updates in the visual display. This paper shows how progressive rendering by means of multi-frame sampling and frame accumulation can introduce high-quality visual effects using robust and straightforward implementations. To this end, (1) a suitable rendering loop is described, (2) WebGL limitations are discussed, and (3) an adaption of THREE.js featuring progressive anti-aliasing, screen-space ambient occlusion, and depth of field is detailed. Furthermore, sampling strategies are discussed and rendering performance is evaluated, emphasizing the low per-frame costs of this approach.
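Frame accumulation of the kind described here is commonly implemented as a running mean over successive sample frames, so that each new frame refines the displayed image without storing the earlier ones. The sketch below shows that idea on flat lists of pixel values. It is written in Python for brevity rather than WebGL, and the `accumulate` function and its arguments are hypothetical, not the paper's implementation.

```python
from typing import List, Sequence


def accumulate(accum: Sequence[float],
               frame: Sequence[float],
               frames_so_far: int) -> List[float]:
    """Fold one newly rendered frame into the running per-pixel mean.

    accum holds the mean of the first `frames_so_far` frames; the result
    is the mean of `frames_so_far + 1` frames.
    """
    weight = 1.0 / (frames_so_far + 1)
    return [a + (f - a) * weight for a, f in zip(accum, frame)]


def render_progressively(frames: Sequence[Sequence[float]]) -> List[float]:
    """Accumulate frames one at a time, as a render loop would per tick."""
    accum = [0.0] * len(frames[0])
    for index, frame in enumerate(frames):
        accum = accumulate(accum, frame, index)
    return accum
```

In a real render loop each frame would be drawn with a different sub-pixel jitter or sample pattern, and the accumulation would restart whenever the camera or scene changes.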
TL;DR: This work presents a new method for progressive volume rendering by accumulating object-space samples over successively rendered frames, and demonstrates that it is particularly useful for rendering volumetric data with costly sampling functions.
Abstract: We present a new method for progressive volume rendering by accumulating object-space samples over successively rendered frames. Existing methods for progressive refinement either use image space methods or average pixels over frames, which can blur features or integrate incorrectly with respect to depth. Our approach stores samples along each ray, accumulates new samples each frame into a buffer, and progressively interleaves and integrates these samples. Though this process requires additional memory, it ensures interactivity and is well suited for CPU architectures with large memory and cache. This approach also extends well to distributed rendering in cluster environments. We implement this technique in Intel’s open source OSPRay CPU ray tracing framework and demonstrate that it is particularly useful for rendering volumetric data with costly sampling functions.
TL;DR: In this article, a three-dimensional virtual real-time display method of a physical product based on a physically based rendering (PBR) technology is presented, which includes the following steps that A, a 3D model of the physical product is created; B, the model is imported into development software for scene design, and a scene resource file is stored.
Abstract: The invention provides a three-dimensional virtual real-time display method of a physical product based on a physically based rendering (PBR) technology. The method includes the following steps: A, a 3D model of the physical product is created; B, the 3D model is imported into development software for scene design, and a scene resource file is stored; C, 3D material rendering software that supports the PBR technology process is selected for making the material of the physical product, and PBR material files containing five rendering maps in the PBR technology process are generated; D, each PBR material file is imported into the development software and packaged into an independent material resource file, and the material resource files are stored; E, by means of the development software, a user interaction interface and script codes are designed and compiled into programs which can be executed by various operating systems. By means of the method, the three-dimensional virtual display effect of the physical product is more vivid, and a better experience is provided for the consumer; the production requirements of the PBR material files of the physical product are lower, so the time of material file production can be saved and the overall efficiency is improved.
TL;DR: A distributed data visualization framework for HPC environments based on the PBVR (Particle Based Volume Rendering) method, targeted to work also on systems without any hardware graphics acceleration capability, which are commonly found on modern HPC operational environments is presented.
Abstract: In this paper, we present a distributed data visualization framework for HPC environments based on the PBVR (Particle Based Volume Rendering) method. The PBVR method is a kind of point-based rendering approach where the volumetric data to be visualized is represented as a set of small and opaque particles. This method has object-space and image-space variants, defined by the place (object space or image space) where the particle data sets are generated. We focused on the object-space approach, which has the advantage when handling large-scale simulation data sets such as those generated by modern HPC systems. In the object-space approach, the particle generation and the subsequent rendering processes can be easily decoupled. In this work, we took advantage of this separability to implement the proposed distributed rendering framework. The particle generation process utilizes the functionalities provided by the KVS (Kyoto Visualization System), and the particle rendering process utilizes the functionalities provided by the HIVE (Heterogeneously Integrated Visual-analytics Environment). The proposed distributed visualization framework is targeted to work also on systems without any hardware graphics acceleration capability, which are commonly found on modern HPC operational environments. We evaluated this PBVR-based distributed visualization infrastructure on the K computer operational environment by utilizing a CPU-only processing server for the particle data generation and rendering. In this preliminary evaluation, using some CFD (Computational Fluid Dynamics) simulation data sets, we obtained encouraging results for pushing further the development in order to make this system available as an effective visualization alternative for the HPC users.
TL;DR: A new approach to rendering high quality visualisations of molecular trajectories that constructs a regular grid every time the molecular structure deforms, allowing per-pixel lighting effects and ambient occlusion to be rendered every frame at interactive refresh rates.
Abstract: Producing high quality depictions of molecular structures has been an area of academic interest for years, with visualisation tools such as UCSF Chimera, Yasara and PyMol providing a huge number of different rendering modes and lighting effects. However, no visualisation program supports per-pixel lighting effects with shadows whilst rendering a molecular trajectory in space filling mode. In this paper, a new approach to rendering high quality visualisations of molecular trajectories is presented. To enhance depth, ambient occlusion is included within the render. Shadows are also included to help the user perceive relative motions of parts of the protein as they move based on their trajectories. Our approach requires a regular grid to be constructed every time the molecular structure deforms, allowing per-pixel lighting effects and ambient occlusion to be rendered every frame, at interactive refresh rates. Two different regular grids are investigated, a fixed grid and a memory efficient compact grid. The algorithms used allow trajectories of proteins comprising up to 300,000 atoms to be rendered at ninety frames per second on a desktop computer using the GPU for general purpose computations. Regular grid construction was found to take up only a small proportion of the total time to render a frame. It was found that despite being slower to construct, the memory efficient compact grid outperformed the theoretically faster fixed grid when the protein being rendered is large, owing to its more efficient memory access patterns. The techniques described could be implemented in other molecular rendering software.
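A compact regular grid of the kind mentioned in this abstract is commonly built with a counting sort: one offsets array plus one contiguous index array, rather than a fixed-size slot list per cell. The sketch below assumes that layout and bins atoms by their centre only; the abstract does not specify the paper's data structure or its GPU implementation, so all names here are hypothetical.

```python
def build_compact_grid(positions, cell_size, dims):
    """Bin atom centres into a regular grid stored as two flat arrays.

    Returns (starts, items): the atoms in cell c are
    items[starts[c]:starts[c + 1]]. Assumed layout, not the paper's code.
    """
    nx, ny, nz = dims

    def cell_of(p):
        # Clamp to the grid so slightly out-of-bounds atoms land in edge cells.
        ix = min(max(int(p[0] // cell_size), 0), nx - 1)
        iy = min(max(int(p[1] // cell_size), 0), ny - 1)
        iz = min(max(int(p[2] // cell_size), 0), nz - 1)
        return (iz * ny + iy) * nx + ix

    cells = [cell_of(p) for p in positions]

    # Pass 1: count atoms per cell, then prefix-sum the counts into offsets.
    starts = [0] * (nx * ny * nz + 1)
    for c in cells:
        starts[c + 1] += 1
    for i in range(1, len(starts)):
        starts[i] += starts[i - 1]

    # Pass 2: scatter atom indices into their cell's contiguous range.
    cursor = starts[:-1]          # slicing copies, so starts is left intact
    items = [0] * len(positions)
    for atom, c in enumerate(cells):
        items[cursor[c]] = atom
        cursor[c] += 1
    return starts, items


def atoms_in_cell(starts, items, cell):
    """Return the atom indices stored in one grid cell."""
    return items[starts[cell]:starts[cell + 1]]
```

Memory use is proportional to the number of cells plus the number of atoms, and each cell's atoms sit next to each other in `items`, which is one plausible reading of the "more efficient memory access patterns" the abstract reports for large proteins.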
TL;DR: This paper proposes ARINC 661 rendering based on OpenVG, a 2D vector graphics API standard defined by the Khronos Group, which is cost-effective for rendering and for software quality certification because it reduces the complexity of application source code.
Abstract: ARINC 661 is a standard for the CDS (Cockpit Display Systems) defined by ARINC (Aeronautical Radio, Inc.) and used by applications to communicate and display sensing data and information. ARINC 661 contains various two-dimensional GUI (Graphical User Interface) widget definitions. The widget set covers a variety of graphics elements including circles, arcs, crowns and others. Additionally, the symbol widgets provide rendering of user-defined polygonal shapes, which are typically described with triangle fans and triangle strips. ARINC 661 has various rendering features including text, image output, transformation of objects and the halo effect, which renders the outlines of graphics objects for highlighting. It is possible to use 3D graphics libraries like OpenGL and DirectX to implement these graphics features. However, those 3D graphics libraries are too heavy and over-powered for showing the 2D graphics primitives defined in ARINC 661. In this paper, we propose ARINC 661 rendering based on OpenVG. OpenVG is a standard 2D vector graphics API defined by the Khronos Group. OpenVG is designed for embedded system GUI rendering and its features are appropriate for implementing ARINC 661. Compared with OpenGL, OpenVG is defined with a fixed pipeline architecture without programmable shaders. Therefore, OpenVG is cost-effective for rendering and for software quality certification, because it reduces the complexity of application source code. We also propose ARINC 661 use cases for wireless applications that communicate and display data from various sensor networks and information gathered from unmanned aerial and land vehicles.
TL;DR: This paper proposes a real-time resolution-independent vector-embedded shading method for 3D animated objects that enables high-quality real-time Graphics Processing Unit (GPU)-based coloring for real-time 3D animation rendering through the authors' efficient SVG-embedded rendering pipeline while using a small amount of texture memory and transmission bandwidth.
Abstract: High-resolution textures are determinant of not only high rendering quality in gaming and movie industries, but also of burdens in memory usage, data transmission bandwidth, and rendering efficiency. Therefore, it is desirable to shade 3D objects with vector images such as scalable vector graphics (SVG) for compactness and resolution independence. However, complicated geometry and high rendering cost limit the rendering effectiveness and efficiency of vector texturing techniques. In order to overcome these limitations, this paper proposes a real-time resolution-independent vector-embedded shading method for 3D animated objects. Our system first decomposes a vector image consisting of layered close coloring regions into unifying-coloring units for mesh retriangulation and 1D coloring texture construction, where coloring denotes color determination for a point based on an intermediate medium such as a raster/vector image, unifying denotes the usage of the same set of operations, and unifying coloring denotes coloring with the same-color computation operations. We then embed the coloring information and distances to enclosed unit boundaries in retriangulated vertices to minimize embedded information, localize vertex-embedded shading data, remove overdrawing inefficiency, and ensure fixed-length shading instructions for data compactness and avoidance of indirect memory accessing and complex programming structures when using other shading and texturing schemes. Furthermore, stroking is the process of laying down a fixed-width pen-centered element along connected curves, and our system also decomposes these curves into segments using their curve-mesh intersections and embeds their control vertices as well as their widths in the intersected triangles to avoid expensive distance computation. Overall, our algorithm enables high-quality real-time Graphics Processing Unit (GPU)-based coloring for real-time 3D animation rendering through our efficient SVG-embedded rendering pipeline while using a small amount of texture memory and transmission bandwidth.
TL;DR: This work presents a hybrid ray tracing system, where the work is divided between the CPU cores and the GPU in an integrated chip, and communication occurs via shared memory, and introduces a method to support light paths with arbitrary recursion, such as multiple recursive Whitted‐style ray tracing and adaptive sampling.
Abstract: We present a hybrid ray tracing system, where the work is divided between the CPU cores and the GPU in an integrated chip, and communication occurs via shared memory. Rays are organized in large packets that can be distributed between the two units as needed. Testing visibility between rays and the scene is mostly performed using an optimized kernel on the GPU, but the CPU can help as necessary. The CPU cores typically handle most or all shading, which makes it easy to support complex appearances. For efficiency, the CPU cores shade whole batches of rays by sorting them by material and shading each material using a vectorized kernel. In addition, we introduce a method to support light paths with arbitrary recursion, such as multiple recursive Whitted-style ray tracing and adaptive sampling where the result of a ray is examined before sending the next, while still batching up rays for the benefit of GPU-accelerated traversal and vectorized shading. This allows our system to achieve high rendering performance while maintaining the flexibility to accommodate different rendering algorithms.
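The batched shading step in this abstract (sort rays by material, then shade each material's batch with one kernel call) can be sketched as follows. Plain Python lists stand in for the vectorized kernels, and the hit-record layout `(material_id, payload)` and the `shaders` mapping are assumptions made for illustration only.

```python
from itertools import groupby


def shade_batched(hits, shaders):
    """Shade ray hits one material batch at a time.

    hits:    list of (material_id, payload) tuples, in ray order.
    shaders: dict mapping material_id to a function that takes a list of
             payloads and returns a list of colours (a batch kernel).
    Returns the colours in the original ray order.
    """
    # Sort ray indices by material so each material forms one contiguous run.
    order = sorted(range(len(hits)), key=lambda i: hits[i][0])
    colors = [None] * len(hits)
    for material, group in groupby(order, key=lambda i: hits[i][0]):
        idxs = list(group)
        batch = [hits[i][1] for i in idxs]
        # One kernel call per material, then scatter back to ray order.
        for i, color in zip(idxs, shaders[material](batch)):
            colors[i] = color
    return colors
```

Sorting indices rather than the hit records themselves keeps the mapping back to the original rays, which is needed when the result of a ray decides what the next ray in the light path will be.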
TL;DR: The authors have analyzed the potential of various freeware or shareware software, well suited to the typical subjects of Engineering Design, in the areas of CAD-3D design and rendering.
Abstract: Engineering schools are increasingly concerned with offering their students the latest 3D CAD modeling software that responds to advanced design needs. The latest software is increasingly comprehensive and complex. However, the time available for teaching such software is scarce, and students rarely get to know more than 20% of the software's potential in a particular area. Meanwhile, other 3D CAD solutions are on the market that are low cost, have friendly, easy-to-learn interfaces, have much potential, offer the features of more advanced software, and can fully meet the teaching requirements in this area. Since the introduction of the EHEA (European Higher Education Area), the teaching goal of an engineering school in the field of 3D CAD modeling should be to develop in students, in the time available, the maximum capacities and skills in three-dimensional geometric design. This, when well planned, can be achieved with the proper use of low-cost software. The authors have analyzed the potential of various freeware or shareware software, well suited to the typical subjects of Engineering Design, in the areas of CAD-3D design and rendering.
TL;DR: This work proposes an approach utilizing advanced memory management and bridging the Open Computing Language (OpenCL) and Open Graphics Library (OpenGL) drivers to optimize the final rendering frame rate, and illustrates the concept of the memory mapping technique and the hybrid OpenCL and OpenGL combination through a real molecular dynamics simulation example.
Abstract: Achieving real-time molecular dynamics rendering is a challenge, especially when the rendering requires intensive computation involving a large simulation data-set. The task becomes even more challenging when the size of the data is too large to fit into random access memory (RAM) and the final imagery depends on the input and output (I/O) performance. The large data size and the complex computation processing per frame pose a number of challenges, i.e., the I/O performance bottleneck, the computational processing performance costs, and the fast rendering challenge. Handling these challenges separately consumes a significant portion of the total processing time, which may result in low frame rates. We address these challenges by proposing an approach utilizing advanced memory management and bridging the Open Computing Language (OpenCL) and Open Graphics Library (OpenGL) drivers to optimize the final rendering frame rate. We illustrate the concept of the memory mapping technique and the hybrid OpenCL and OpenGL combination through a real molecular dynamics simulation example. The simulation data-set specifies the evolution of 336,260 particles over 1981 time steps occupying 8 gigabytes of memory. The dynamics of the system including the lipid-protein interactions can be rendered at up to 40 FPS.
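The memory mapping idea in this abstract can be illustrated with a file that is mapped rather than read, so that only the pages of the requested time step are touched. The file layout below (three little-endian 32-bit floats per particle, time steps stored back to back) is an assumption and not taken from the paper, although 336,260 particles × 1981 steps × 12 bytes is about 7.99 × 10^9 bytes, which is consistent with the reported 8 gigabytes. The OpenCL/OpenGL bridging is not shown.

```python
import mmap
import struct

BYTES_PER_PARTICLE = 3 * 4  # assumed layout: x, y, z as 32-bit floats


def write_trajectory(path, frames):
    """Write frames (lists of (x, y, z) tuples) back to back in binary form."""
    with open(path, "wb") as f:
        for frame in frames:
            for xyz in frame:
                f.write(struct.pack("<3f", *xyz))


def read_frame(path, step, n_particles):
    """Return the positions of one time step without reading the whole file."""
    frame_bytes = n_particles * BYTES_PER_PARTICLE
    with open(path, "rb") as f:
        # Length 0 maps the entire file; the OS pages data in on demand.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            flat = struct.unpack_from(
                "<%df" % (3 * n_particles), mm, step * frame_bytes
            )
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]
```

Under this assumed layout, seeking to time step `t` is a single offset computation (`t * n_particles * 12`), so playback order does not require scanning earlier steps.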