TL;DR: In this paper, the authors describe a system that enables a user to execute, operate and interact with a software application such as a video game, on a client wherein the software application is executing on a remote server.
Abstract: Features are described herein that may be used to implement a system that enables a user to execute, operate and interact with a software application, such as a video game, on a client wherein the software application is executing on a remote server. The features enable the system to be implemented in an optimized fashion. For example, one feature entails intercepting graphics commands generated by the software application that are directed to a graphics application programming interface (API), manipulating the intercepted graphics commands to produce manipulated graphics commands that are reduced in size as compared to the intercepted graphics commands, and transferring the manipulated graphics commands from the server to the client for rendering thereon.
TL;DR: Interactive Computer Graphics: A Top-Down Approach with Shader-Based OpenGL, 6e, is the only introduction to computer graphics text for undergraduates that fully integrates OpenGL 3.1 and emphasizes application-based programming.
Abstract: This book is suitable for undergraduate students in computer science and engineering, for students in other disciplines who have good programming skills, and for professionals. Computer animation and graphics, once rare, complicated, and comparatively expensive, are now prevalent in everyday life from the computer screen to the movie screen. Interactive Computer Graphics: A Top-Down Approach with Shader-Based OpenGL, 6e, is the only introduction to computer graphics text for undergraduates that fully integrates OpenGL 3.1 and emphasizes application-based programming. Using C and C++, the top-down, programming-oriented approach allows for coverage of engaging 3D material early in the text so readers immediately begin to create their own 3D graphics. Low-level algorithms (for topics such as line drawing and filling polygons) are presented after readers learn to create graphics.
TL;DR: In this paper, a method and system for providing hardware accelerated graphics for network enabled applications is presented, which includes providing a network enabled application on a host, the application requiring hardware accelerated graphics not provided on the host, and providing a 3D library wrapper at the host for connection to a broker of 3D graphics rendering resources.
Abstract: A method and system are provided for providing hardware accelerated graphics for network enabled applications. The method includes providing a network enabled application on a host, the application requiring hardware accelerated graphics not provided on the host; providing a 3D library wrapper at the host for connection to a broker of 3D graphics rendering resources. The broker receives a request for 3D graphics rendering resources, and evaluates available rendering resources and allocates a selected 3D graphics rendering resource to the 3D library wrapper, in order to return final 2D rendered images to a client. The network enabled application may execute on a virtual machine on the host or on a terminal services session on the host and is accessed by a remote client.
TL;DR: In this article, the authors present an approach for enabling a pleasing lightweight transition between two more complete renderings of content associated with a location-based service, where a device is caused to present the first rendering of a graphical user interface based on location information of a 3D model or models, panoramic image data, etc. corresponding to the starting location information.
Abstract: An approach is provided for enabling a pleasing lightweight transition between two more complete renderings of content associated with a location based service. A device is caused to present the first rendering of a graphical user interface based on location information of a three-dimensional model or models, panoramic image data, etc. corresponding to the starting location information. A change in rendering location is caused, leading to a series of transition renderings based in part on models and possibly image data associated with the intermediate locations, before finally the device presents the destination rendering similar to the starting rendering. The transition renderings provide a pleasing transition, which also allows the device time to fetch and process the heavier data associated with the final rendering.
TL;DR: Razor is presented, a new software rendering architecture for distribution ray tracing designed to produce high-quality images with high performance on future single-chip many-core hardware.
Abstract: Recent work demonstrates that interactive ray tracing is possible on desktop systems, but there is still much debate as to how to most efficiently support advanced visual effects such as soft shadows, smooth freeform surfaces, complex shading, and animated scenes. With these challenges in mind, we reconsider the options for designing a rendering system and present Razor, a new software rendering architecture for distribution ray tracing designed to produce high-quality images with high performance on future single-chip many-core hardware. Razor includes two noteworthy capabilities: a set of techniques for quickly building the kd-tree acceleration structure on demand every frame from a scene graph and a system design that allows for crack-free multiresolution geometry with each ray independently choosing its geometry resolution. Razor's per-frame kd-tree build is designed to robustly handle arbitrary scene animation, while its per-ray multiresolution geometry provides continuous level of detail driven by ray and path differentials. Razor also decouples shading from visibility computations using a two-phase shading scheme inspired by the REYES system, and caches tessellated representations of freeform surfaces at multiple levels of detail. We present experimental results gathered from a prototype system implemented on eight CPU cores, and discuss which aspects of the system are most successful and which would benefit from further investigation.
TL;DR: jBricks is a Java toolkit that integrates a high-quality 2D graphics rendering engine and a versatile input configuration module into a coherent framework, enabling the exploratory prototyping of interaction techniques and rapid development of post-WIMP applications running on cluster-driven interactive visualization platforms.
Abstract: Research on cluster-driven wall displays has mostly focused on techniques for parallel rendering of complex 3D models. There has been comparatively little research effort dedicated to other types of graphics and to the software engineering issues that arise when prototyping novel interaction techniques or developing full-featured applications for such displays. We present jBricks, a Java toolkit that integrates a high-quality 2D graphics rendering engine and a versatile input configuration module into a coherent framework, enabling the exploratory prototyping of interaction techniques and rapid development of post-WIMP applications running on cluster-driven interactive visualization platforms.
TL;DR: Cross-environment rendering and user interaction support provide a seamless computing experience in a multi-operating system computing environment, displaying applications running in a mobile operating system within the environment of a desktop operating system in real time.
Abstract: Cross-environment rendering and user interaction support provide a seamless computing experience in a multi-operating system computing environment. Cross-environment rendering provides real-time display of applications running in a mobile operating system to be displayed within an environment of a desktop operating system. The mobile operating system and the desktop operating system may be running concurrently and independently on a shared kernel of a mobile computing device. A graphics server of the mobile operating system tears down and rebuilds the rendering context for each application as it composites the surface information. The rendering context may be established to match the resolution of the associated display, so that graphics will be appropriately rendered for that resolution. The mobile computing device may be a smartphone running the Android mobile operating system and a full desktop Linux distribution on a modified Android kernel.
TL;DR: A system for producing images is disclosed, including techniques for reducing the memory and processing power required for such operations, along with techniques for programmatically representing a graphics problem with consideration of system resources such as the availability of a compatible GPU.
Abstract: Disclosed is a system for producing images including techniques for reducing the memory and processing power required for such operations. The system provides techniques for programmatically representing a graphics problem. The system further provides techniques for reducing and optimizing graphics problems for rendering with consideration of the system resources, such as the availability of a compatible GPU.
TL;DR: An optimized pruning algorithm is presented that allows for considerable geometry reduction in large botanical scenes while maintaining high and coherent rendering quality and the use of Precision and Recall (PR) as a measure of quality to rendering is introduced and how PR scores can be used to predict better scaling values.
Abstract: We present an optimized pruning algorithm that allows for considerable geometry reduction in large botanical scenes while maintaining high and coherent rendering quality. We improve upon previous techniques by applying model-specific geometry reduction functions and optimized scaling functions. For this we introduce the use of Precision and Recall (PR) as a measure of rendering quality and show how PR scores can be used to predict better scaling values. We conducted a user study letting subjects adjust the scaling value, which shows that the predicted scaling values match the preferred ones. Finally, we extend the originally purely stochastic geometry prioritization for pruning to account for view-optimized geometry selection, which allows global scene information, such as occlusion, to be taken into consideration. We demonstrate our method for the rendering of scenes with thousands of complex tree models in real-time.
TL;DR: A pure web browser based medical imaging system that requires no installation of application software or any browser plug-in and functions in the same way as traditional full-blown medical imaging PACS (Picture Archiving and Communication Systems) viewer is presented in this paper.
Abstract: A pure web browser based medical imaging system that requires no installation of application software or any browser plug-in and functions in the same way as traditional full blown medical imaging PACS (Picture Archiving and Communication Systems) viewer fat clients. In addition, the system intelligently distributes the computing tasks of image rendering between browser and servers from complete server-side rendering to complete client-side rendering and anything between. It comprises a JavaScript medical image rendering library that can process original DICOM (Digital Imaging and Communications in Medicine) data sets and all standard web images at pixel level, a medical imaging server and a rendering load balancing component that can dynamically split the rendering computing from server to client according to their capabilities.
TL;DR: Practical Rendering and Computation with Direct3D 11 helps you understand the best way to accomplish a given task and thereby fully leverage the potential capabilities of Direct3D 11.
Abstract: Direct3D 11 offers such a wealth of capabilities that users can sometimes get lost in the details of specific APIs and their implementation. While there is a great deal of low-level information available about how each API function should be used, there is little documentation that shows how best to leverage these capabilities. Written by active members of the Direct3D community, Practical Rendering and Computation with Direct3D 11 provides a deep understanding of both the high and low level concepts related to using Direct3D 11. The first part of the book presents a conceptual introduction to Direct3D 11, including an overview of the Direct3D 11 rendering and computation pipelines and how they map to the underlying hardware. It also provides a detailed look at all of the major components of the library, covering resources, pipeline details, and multithreaded rendering. Building upon this material, the second part of the text includes detailed examples of how to use Direct3D 11 in common rendering scenarios. The authors describe sample algorithms in-depth and discuss how the features of Direct3D 11 can be used to your advantage. All of the source code from the book is accessible on an actively maintained open source rendering framework. The sample applications and the framework itself can be downloaded from http://hieroglyph3.codeplex.com. By analyzing when to use various tools and the tradeoffs between different implementations, this book helps you understand the best way to accomplish a given task and thereby fully leverage the potential capabilities of Direct3D 11.
TL;DR: A geometric measurement, Effective Sampling Density of the scene, referred to as effective sampling for brevity, is presented for objective comparison and evaluation of LFR algorithms.
Abstract: Light field rendering (LFR) is an active research area in computer vision and computer graphics. LFR plays a crucial role in free viewpoint video systems (FVV). Several rendering algorithms have been suggested for LFR. However, comparative evaluation of these methods is often limited to subjective assessment of the output. To overcome this problem, this paper presents a geometric measurement, Effective Sampling Density of the scene, referred to as effective sampling for brevity, for objective comparison and evaluation of LFR algorithms. We have derived the effective sampling for the well-known LFR methods. Both theoretical study and numerical simulation have shown that the proposed effective sampling is an effective indicator of the performance for LFR methods.
TL;DR: A method to display a large number of superquadric glyphs is presented and its use for visualization of measured second-order tensor data in diffusion tensor imaging (DTI) and of stress and strain tensors from computational fluid dynamics and material simulations is demonstrated.
Abstract: Graphics hardware is advancing very fast and offers new possibilities to programmers. The new features can be used in scientific visualization to move calculations from the CPU to the graphics processing unit (GPU). This is especially useful when mixing CPU-intensive calculations with on-the-fly visualization of intermediate results. We present a method to display a large number of superquadric glyphs and demonstrate its use for visualization of measured second-order tensor data in diffusion tensor imaging (DTI) and of stress and strain tensors from computational fluid dynamics and material simulations.
TL;DR: This paper introduces a new technique, called peer-assisted rendering, that aims to enable interactive navigation in a 3D networked virtual environment using a resource-constrained device, by speeding up the rendering.
Abstract: This paper introduces a new technique, called peer-assisted rendering, that aims to enable interactive navigation in a 3D networked virtual environment using a resource-constrained device, by speeding up the rendering. A resource-constrained client requests part of the rendered scenes from other peers with similar viewpoints within the virtual environment, and merges the rendered parts into its own view. This approach is more scalable than previous solutions based on server-based pre-rendering. The goal of this paper is to make a strong case for the feasibility of peer-assisted rendering through the following two messages. First, by analyzing a large number of user traces from a popular virtual world called Second Life, we show that surprisingly many users have similar viewpoints and share a large number of common objects in their viewing areas, indicating that a client can potentially find multiple other peers that can assist in rendering. Second, by combining three different rendering methods, each contributing to rendering of different classes of objects in the scene, we show that it is possible for a client to render the scene efficiently with few visual artifacts.
TL;DR: This work proposes a dynamic shader pipeline based on the SuperShader concept and illustrates the design decisions, and demonstrates the usage and the usefulness of the framework with implementations of dynamic rendering effects for medical applications.
Abstract: In this paper, we present a rapid prototyping framework for GPU-based volume rendering. Therefore, we propose a dynamic shader pipeline based on the SuperShader concept and illustrate the design decisions. Also, important requirements for the development of our system are presented. In our approach, we break down the rendering shader into areas containing code for different computations, which are defined as freely combinable, modularized shader blocks. Hence, high-level changes of the rendering configuration result in the implicit modification of the underlying shader pipeline. Furthermore, the prototyping system allows inserting custom shader code between shader blocks of the pipeline at run-time. A suitable user interface is available within the prototyping environment to allow intuitive modification of the shader pipeline. Thus, appropriate solutions for visualization problems can be interactively developed. We demonstrate the usage and the usefulness of our framework with implementations of dynamic rendering effects for medical applications.
TL;DR: Cross-segment load balancing is introduced, which efficiently assigns all available shared graphics resources to all display output segments with dynamic task partitioning to improve performance in parallel rendering.
Abstract: With faster graphics hardware comes the possibility to realize even more complicated applications that require more detailed data and provide better presentation. The processors keep being challenged with larger amounts of data and higher-resolution outputs, requiring more research in the parallel/distributed rendering domain. Optimizing resource usage to improve throughput is one important topic, which we address in this article for multi-display applications, using the Equalizer parallel rendering framework. This paper introduces and analyzes cross-segment load balancing, which efficiently assigns all available shared graphics resources to all display output segments with dynamic task partitioning to improve performance in parallel rendering.
TL;DR: A GPU-based approach for lightfield processing and rendering is described, with which it is able to achieve interactive performance for focused plenoptic rendering tasks such as refocusing and novel-view generation and is validated with experimental results on commercially available GPU hardware.
Abstract: Processing and rendering of plenoptic camera data requires significant computational power and memory bandwidth. At the same time, real-time rendering performance is highly desirable so that users can interactively explore the infinite variety of images that can be rendered from a single plenoptic image. In this paper we describe a GPU-based approach for lightfield processing and rendering, with which we are able to achieve interactive performance for focused plenoptic rendering tasks such as refocusing and novel-view generation. We present a progression of rendering approaches for focused plenoptic camera data and analyze their performance on popular GPU-based systems. Our analyses are validated with experimental results on commercially available GPU hardware. Even for complicated rendering algorithms, we are able to render 39Mpixel plenoptic data to 2Mpixel images with frame rates in excess of 500 frames per second.
TL;DR: In this paper, a method for graphics rendering adaptation by a server that includes a graphics rendering engine that generates graphic video data and provides the video data via a communication resource to a client is presented.
Abstract: A method for graphics rendering adaptation by a server that includes a graphics rendering engine that generates graphic video data and provides the video data via a communication resource to a client. The method includes monitoring one or both of communication and computation constraint conditions associated with the graphics rendering engine and the communication resource. At least one rendering parameter used by the graphics rendering engine is set based upon a level of communication constraint or computation constraint. Monitoring and setting are repeated to adapt rendering based upon changes in one or both of communication and computation constraints. In preferred embodiments, encoding adaptation also responds to bit rate constraints and rendering is optimized based upon a given bit rate. Rendering parameters and their effect on communication and computation costs have been determined and optimized. A preferred application is for a gaming processor running on a cloud-based or data center server that services mobile clients over a wireless network for graphics-intensive applications, such as massively multi-player online role playing games, or augmented reality.
TL;DR: In this article, a rendering control apparatus reads the data indicated by the detailed information of each rendering object according to the rendering order and transfers the read data to a GPU; for consecutive rendering objects, only the data not in common with the data already transferred to the GPU is read and transferred.
Abstract: PROBLEM TO BE SOLVED: To perform an efficient rendering process having high responsiveness, in a rendering system that provides a game screen for one or more client devices. SOLUTION: For each of a plurality of rendering objects to be used to generate a screen to be provided for a client device, identification information and detailed information indicating data necessary for rendering are acquired. By referring to the detailed information of each of the plurality of rendering objects, the rendering order of all the rendering objects is determined so as to allocate consecutive ordinal numbers to rendering objects having at least partial data indicated by the detailed information in common. A rendering control apparatus reads the data indicated by the detailed information of the rendering object according to the rendering order, and transfers the read data to a GPU. In this process, among the data indicated by the detailed information of the rendering objects that are consecutive in the rendering order, only the data not in common with the data already transferred to the GPU is read and transferred.
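The ordering and delta-transfer steps described in this abstract can be pictured with the minimal Python sketch below. It is an illustration under stated assumptions, not the patented method: the greedy "most shared data with the previous object" ordering, the rule that only the previous object's data counts as resident on the GPU, and the names `order_by_shared_data`, `plan_transfers`, and the asset names are all hypothetical.

```python
def order_by_shared_data(objects):
    """Greedy ordering (an assumption): place next the object that shares
    the most data items with the object placed just before it.

    `objects` maps an object id to the set of data items it needs.
    """
    remaining = dict(objects)
    current = min(remaining)          # deterministic starting point
    order = [current]
    needed = remaining.pop(current)
    while remaining:
        # sorted() makes ties resolve to the smallest id
        nxt = max(sorted(remaining), key=lambda k: len(remaining[k] & needed))
        order.append(nxt)
        needed = remaining.pop(nxt)
    return order


def plan_transfers(objects, order):
    """For each object in order, list only the data items that the
    previous object did not already bring to the GPU."""
    transfers = []
    previous = set()
    for obj_id in order:
        data = objects[obj_id]
        transfers.append((obj_id, sorted(data - previous)))
        previous = data
    return transfers


# Hypothetical scene: two objects share a wood texture.
scene = {
    "barrel": {"wood_tex", "barrel_mesh"},
    "crate": {"wood_tex", "box_mesh"},
    "tree": {"bark_tex", "tree_mesh"},
}
order = order_by_shared_data(scene)
transfers = plan_transfers(scene, order)
```

In this hypothetical scene the crate is drawn right after the barrel, so the shared texture is transferred once and only the crate's mesh is sent for the second object.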
TL;DR: This paper presents an innovative real-time rendering approach of terrains relying on material stacks, based on a LoD hierarchy for material stacks, which achieves real-time frame rates at high resolutions.
Abstract: Usually, terrain rendering relies on a 2D regular grid of height values, the so called height field. Height fields describe 2.5D surfaces and are not able to present complex 3D terrain features. In contrast, a 3D data representation quickly exceeds the available memory resources. To overcome this problem we apply material stacks. Material stacks combine the simplicity of 2D height fields and the extended modeling capabilities of 3D volumetric data. However, this approach requires expensive rendering and is difficult to realize in real-time. In this paper we present an innovative real-time rendering approach of terrains relying on material stacks. Our approach is based on two major steps: First, a LoD hierarchy for material-stacks is generated. Second, during rendering a multi-staged quadrangulation pipeline extracts terrain surface from the material stacks. As a result, we achieve real-time frame rates at high resolutions.
TL;DR: A novel leading digit law based method to identify computer graphics is proposed, where statistics of the most significant digits are extracted from image’s Discrete Cosine Transform coefficients and magnitudes of image's gradient, and then the Support Vector Machine (SVM) based classifiers are built.
Abstract: With the advent and growing popularity of image rendering software, photorealistic computer graphics are becoming more and more perceptually indistinguishable from photographic images. If faked images are abused, this may lead to social, legal or private consequences. It is therefore necessary, and also challenging, to find effective methods to differentiate between them. In this paper, a novel method to identify computer graphics, based on the leading digit law (also called Benford’s law), is proposed. More specifically, statistics of the most significant digits are extracted from the image’s Discrete Cosine Transform (DCT) coefficients and the magnitudes of the image’s gradient, and then Support Vector Machine (SVM) based classifiers are built. Results of experiments on the image datasets indicate that the proposed method is comparable to prior works. Besides, it possesses low dimensional features and low computational complexity.
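As a rough illustration of the leading-digit statistics this abstract refers to, the Python sketch below computes the Benford's law reference distribution, P(d) = log10(1 + 1/d) for d = 1..9, and the empirical first-digit histogram of a list of numbers. The function names are hypothetical, and the inputs here are arbitrary values; extracting DCT coefficients or gradient magnitudes from an image and training the SVM are outside this sketch.

```python
import math


def benford_probabilities():
    """Benford's law: probability that the leading digit is d, for d = 1..9."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]


def leading_digit(x):
    """Most significant decimal digit of a nonzero finite number, else None."""
    x = abs(x)
    if x == 0 or not math.isfinite(x):
        return None
    # Scientific notation with 17 fractional digits avoids rounding 9.99... up to 1.0
    return int(f"{x:.17e}"[0])


def leading_digit_histogram(values):
    """Relative frequency of each leading digit 1..9 (zeros are skipped)."""
    counts = [0] * 9
    for v in values:
        d = leading_digit(v)
        if d is not None:
            counts[d - 1] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * 9
```

A nine-element histogram like this, or its deviation from `benford_probabilities()`, is the kind of low-dimensional feature vector that could be fed to a classifier.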
TL;DR: A low computational method to perform ROI (Region Of Interest) based video encoding and adaptive streaming for remote rendering applications to minimize the latency in the interactive loop even when facing poor transmission conditions.
Abstract: This paper proposes a low computational method to perform ROI (Region Of Interest) based video encoding and adaptive streaming for remote rendering applications. The main objective of the proposed solution is to minimize the latency in the interactive loop even when facing poor transmission conditions. In order to do that, the knowledge of the depth map information provided by the rendering engine is exploited by the real-time video encoder to adapt the bitrate of the transmitted stream. Especially, thanks to an efficient coupling between the rendering and the video encoding stages, the macroblocks of each video frame are encoded with different quantization steps that follow an ROI partitioning. The details of this partitioning algorithm are provided as well with some implementation considerations. The simulation results demonstrate the benefit of our adaptive approach from the user experience point of view.
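To make the idea of depth-driven, per-macroblock quantization concrete, here is a minimal Python sketch. It is not the paper's partitioning algorithm, whose details the abstract does not give: the assumption that nearer content is the region of interest, the linear mapping from mean depth to quantization parameter, the 16-pixel block size, and the QP range 22 to 38 are all hypothetical choices for illustration.

```python
def macroblock_qp(depth_map, block=16, qp_near=22, qp_far=38):
    """Assign one quantization parameter (QP) per macroblock.

    `depth_map` is a list of rows with values in [0, 1], where 0 is nearest.
    Assumption: near blocks are the region of interest and get a lower QP
    (finer quantization); far blocks get a higher QP (coarser quantization).
    """
    rows, cols = len(depth_map), len(depth_map[0])
    qps = []
    for by in range(0, rows, block):
        row_qps = []
        for bx in range(0, cols, block):
            vals = [
                depth_map[y][x]
                for y in range(by, min(by + block, rows))
                for x in range(bx, min(bx + block, cols))
            ]
            mean_depth = sum(vals) / len(vals)
            row_qps.append(round(qp_near + mean_depth * (qp_far - qp_near)))
        qps.append(row_qps)
    return qps


# Hypothetical 16x32 depth map: left macroblock near, right macroblock far.
demo_depth = [[0.0] * 16 + [1.0] * 16 for _ in range(16)]
demo_qps = macroblock_qp(demo_depth)
```

Because the depth map comes from the rendering engine for free, a mapping like this needs no image analysis in the encoder, which is consistent with the low computational cost the abstract emphasizes.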
TL;DR: The proposed metric is novel in combining both latency and rendering quality into one score for measurement, and experiments validate that in many scenarios it can effectively distinguish the performance difference between systems while interaction latency cannot.
Abstract: A new metric, distortion over latency (DOL), is proposed in this paper to overcome the deficiency of the traditional metric, interaction latency, in measuring the interactive performance of modern remote rendering systems, which are enhanced with different latency reduction techniques. The proposed metric is novel in combining both latency and rendering quality into one score for measurement. Our experiments validate that in many scenarios, our new metric can effectively distinguish the performance difference between systems while interaction latency cannot. The paper also introduces how DOL can be efficiently calculated at runtime.
TL;DR: In this paper, a rendering format for a set of video frames is established and a graphics component, which is coupled to a graphics device and associated with an unsupported file type, is identified.
Abstract: The subject disclosure is directed towards providing a web application with access to hardware accelerated graphics. A rendering format for a set of video frames is established. A graphics component, which is coupled to a graphics device and associated with an unsupported file type, is identified. The graphics component generates image data comprising the hardware accelerated graphics. When the web application requests a set of video frames, the image data is transformed into the set of video frames in accordance with the format. Then, the set of frames is communicated to a display device.
TL;DR: In this paper, a system and method for a pre-print, 3D virtual rendering of a print piece is described, where a plurality of modular/pipelined architectural layers are managed, operated, and organized by a controller.
Abstract: A system and method for a pre-print, three-dimensional virtual rendering of a print piece is disclosed. A plurality of modular/pipelined architectural layers are managed, operated, and organized by a controller. A product definition is provided to a job ticket adaptation layer where it is transformed into a physical model. The physical model is then transformed into a display model via the product model layer. The display model is transformed into a scene that can be displayed on a graphical user interface as a three-dimensional virtual rendering by a rendering layer, where the rendering includes one or more binding elements to satisfy the product definition. The modularity further enables different product description formats to be supported by altering only the job ticket adaptation layer, and different graphics rendering engines to be supported by altering only the rendering layer.
TL;DR: Using a combination of the highly parallel programming architecture CUDA and a graphics API, this work has achieved a real-time performance operating on 1080p HD multi-view video with a rendering quality that is comparable to the software implementation.
Abstract: Multi-view 3D may succeed stereo 3DTV in multimedia and TV applications. To this end, the MPEG committee has installed a special task force to establish a standard for multi-view 3D coding. One focal point of our research work is on an efficient implementation of the rendering part of such a multi-view 3D system, because it is a computationally expensive task and it determines the final reconstructing quality. Our free-viewpoint DIBR algorithm is implemented with an off-the-shelf GPU that can be integrated in advanced 3DTV systems. We present the principal steps of a representative free-viewpoint DIBR and show the key differences between the reference software and our GPU implementation. One of those differences is the joint execution of signal processing blocks to share memory usage. Using a combination of the highly parallel programming architecture CUDA and a graphics API, we have achieved a real-time performance operating on 1080p HD multi-view video with a rendering quality that is comparable to the software implementation.
TL;DR: This project is a framework designed to merge advanced 3D graphics with Virtual Reality interfaces in order to create an appropriate environment to study and learn relativity as well as to develop some intuition of the relativistic effects and the quadri-dimensional reality of space-time.
Abstract: Relativity, as introduced by Einstein, is regarded as one of the most important revolutions in the history of physics. Nevertheless, the observation of direct outcomes of this theory on mundane objects is impossible because they can only be witnessed when relative velocities close the speed of light are involved. These effects are so counterintuitive and contradicting with our daily understanding of space and time that physics students find it hard to learn Special Relativity beyond mathematical equations and to understand the deep implications of the theory.
Although we cannot travel at the speed of light for real, Virtual Reality makes it possible to experiment the effects of relativity in a 3D immersive environment. Our project is a framework designed to merge advanced 3D graphics with Virtual Reality interfaces in order to create an appropriate environment to study and learn relativity as well as to develop some intuition of the relativistic effects and the quadri-dimensional reality of space-time.
In this paper, we focus on designing and implementing an easy-to-use game-like application: a carom billiards game. Our implementation includes relativistic effects in an innovative graphical rendering engine and a non-Newtonian physics engine to treat the collisions.
The innovation of our approach lies in the ability i) to render several relativistic objects in real time, each moving with a different velocity vector (contrary to what was achieved in previous works), ii) to allow for interactions between objects, and iii) to enable the user to interact with the objects and modify the scene.
To achieve this, we implement the 4D nature of space-time directly at the heart of the rendering engine, and develop an algorithm that accesses the non-simultaneous past events visible to the observers at their specific locations and at a given instant of their proper time. We explain how to retrieve the collision event between the pucks and the cushions of the billiard game and we show several counterintuitive results for very fast pucks. The effectiveness of the approach is demonstrated with snapshots of videos where several independent objects travel at velocities close to the speed of light, c.
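The idea of accessing past events visible to an observer can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a single object in uniform motion, x(t) = x0 + v t with |v| < c, and solves the standard light-cone condition |x(t_e) - r_obs| = c (t_obs - t_e) for the emission time t_e. The function name `retarded_event` and its parameters are hypothetical.

```python
import math

def retarded_event(observer_pos, t_obs, x0, v, c=1.0):
    """Return (t_emit, position) of the event on the worldline x(t) = x0 + v*t
    whose light reaches observer_pos exactly at time t_obs.
    Assumes uniform motion with |v| < c (illustrative assumption)."""
    # d: object position at t_obs, relative to the observer
    d = [x0[i] + v[i] * t_obs - observer_pos[i] for i in range(3)]
    dv = sum(d[i] * v[i] for i in range(3))
    dd = sum(d[i] * d[i] for i in range(3))
    a = c * c - sum(v[i] * v[i] for i in range(3))
    if a <= 0.0:
        raise ValueError("object speed must be below c")
    # tau = t_obs - t_emit is the non-negative root of
    # a*tau^2 + 2*(d.v)*tau - d.d = 0
    tau = (-dv + math.sqrt(dv * dv + a * dd)) / a
    t_emit = t_obs - tau
    position = tuple(x0[i] + v[i] * t_emit for i in range(3))
    return t_emit, position

# An object receding at 0.5c that is 3 units away "now" is seen
# where it was 2 time units earlier, at distance 2.
t_emit, pos = retarded_event((0.0, 0.0, 0.0), 0.0, (3.0, 0.0, 0.0), (0.5, 0.0, 0.0))
```

For piecewise-uniform motion, such as a puck that bounces off a cushion, the same solve would have to be applied per trajectory segment, keeping only the root whose emission time falls inside that segment.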
TL;DR: This paper examines a simple orthogonal compression approach that is mostly neglected: adapting the level-of-precision (LOP) of vertex data, and finds that it compresses vertex positions by about 70 % on average without loss in rendering performance or image quality.
Abstract: Video memory is a valuable resource that has grown much more slowly than the rendering power of GPUs in recent years. Today, video memory is often the limiting factor in interactive high-quality rendering applications. The most commonly used solution to reduce memory consumption is to apply level-of-detail (LOD) methods: only a simplified version of the mesh with fewer vertices and triangles is kept in memory. In this paper we examine a simple orthogonal compression approach that is mostly neglected: adapting the level-of-precision (LOP) of vertex data. The main idea is to quantize vertex positions according to the current view distance, and adapt precision by adding or removing single bit planes. We provide an analysis of the resulting image error, and show that visual artifacts can be avoided by simply constraining the quantization for critical vertices. Our approach allows both random access to vertex data and quick switching between LOPs. In experiments we found that our approach compresses vertex positions by about 70 % on average without loss in rendering performance or image quality.
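The bit-plane idea can be sketched for a single coordinate. This is an illustration under stated assumptions, not the paper's data layout: it uses a uniform floor quantizer over a bounding interval [lo, hi], for which removing the k lowest bit planes yields exactly the code that a direct quantization at k fewer bits would produce. All function names are hypothetical, and choosing the bit count from the view distance is not shown.

```python
def quantize(value, lo, hi, bits):
    """Map value in [lo, hi] to an integer code with `bits` bits (floor quantizer)."""
    cells = 1 << bits
    q = int((value - lo) / (hi - lo) * cells)
    return min(max(q, 0), cells - 1)  # clamp so value == hi stays in range

def drop_bit_planes(code, k):
    """Lower the precision by removing the k least significant bit planes."""
    return code >> k

def dequantize(code, lo, hi, bits):
    """Reconstruct the center of the quantization cell for `code`."""
    cells = 1 << bits
    return lo + (code + 0.5) / cells * (hi - lo)

# A 16-bit coordinate reduced to 8 bits matches a direct 8-bit quantization.
full = quantize(0.3, 0.0, 1.0, 16)
coarse = drop_bit_planes(full, 8)
approx = dequantize(coarse, 0.0, 1.0, 8)
```

Because the coarse code is a prefix of the fine code, each vertex can be read at any precision independently, which is consistent with the random access and quick switching the abstract describes.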
TL;DR: This paper describes a decoupled parallel rendering approach that enables the rendering and composition stages to execute in parallel, and shows that the method performs much better than the coupled parallel rendering method.
Abstract: As the performance-price ratio of GPUs rises, many systems can accommodate more than one GPU per node, and each GPU provides substantial rendering power. It is therefore important to organize the parallel rendering pipeline effectively in order to fully exploit the compute units of the system. However, many parallel rendering systems couple the hardware rendering stage with the composition stage in the display thread, which frequently leads to GPU stalls. In this paper, we describe a decoupled parallel rendering approach that enables the two stages to execute in parallel. With the frame buffer in main memory, the full image rendering time is determined entirely by the GPU rendering ability when the rendering task is large enough. Theoretical analysis and experimental results both show that the performance of our method is much better than that of the coupled parallel rendering method. We also test the scalability of the approach and obtain a linear speedup with the number of GPUs when the rendering task is large enough. The approach is easy to implement, and any parallel rendering application can benefit from it.
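The decoupling of rendering from composition can be sketched as a producer-consumer pipeline. This is a CPU-only illustration, not the authors' implementation: `render_tile` is a hypothetical stand-in for GPU rendering, and a bounded `queue.Queue` plays the role of frame buffers held in main memory. Render workers keep producing tiles while a separate compositor thread assembles finished frames, so neither stage waits inside a shared display thread.

```python
import queue
import threading

def render_tile(gpu_id, frame):
    """Hypothetical stand-in for one GPU rendering its part of a frame."""
    return (frame, gpu_id, f"tile{gpu_id}@frame{frame}")

def run_pipeline(num_gpus=2, num_frames=3):
    tiles = queue.Queue(maxsize=num_gpus * 2)  # buffers in main memory
    composed = {}

    def render_worker(gpu_id):
        for frame in range(num_frames):
            tiles.put(render_tile(gpu_id, frame))
        tiles.put(None)  # sentinel: this worker has finished

    def compositor():
        finished, pending = 0, {}
        while finished < num_gpus:
            item = tiles.get()
            if item is None:
                finished += 1
                continue
            frame, gpu_id, tile = item
            pending.setdefault(frame, {})[gpu_id] = tile
            if len(pending[frame]) == num_gpus:  # all parts have arrived
                parts = pending.pop(frame)
                composed[frame] = [parts[g] for g in range(num_gpus)]

    workers = [threading.Thread(target=render_worker, args=(g,))
               for g in range(num_gpus)]
    comp = threading.Thread(target=compositor)
    comp.start()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    comp.join()
    return composed

frames = run_pipeline(num_gpus=2, num_frames=3)
```

The bounded queue limits how far rendering can run ahead of composition; in a real system its size would be chosen to match the available main memory.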