Scispace (Formerly Typeset)
Showing papers on "Rendering (computer graphics)" published in 2024
Journal Article•10.38124/ijisrt/ijisrt24jul195•
Using Machine Learning to Identify Diseases and Perform Sorting in Apple Fruit

A Patidar, Abir Chakravorty
25 Jul 2024-International journal of innovative science and research technology
TL;DR: An innovative convolutional neural network architecture aimed at addressing challenges of detection and classification of apple fruit diseases is proposed and experimentally validated, achieving a remarkable classification accuracy of 95.37%.
Abstract: Fruit diseases play a major role in global agriculture, leading to substantial crop losses and influencing food production and economic stability. In this age of Industry 4.0, fruit sorting is an important part of food processing, wherein this work plays a vital role. In this study, a solution for the detection and classification of apple fruit diseases is proposed and experimentally validated. Deep learning models offer promise for automating disease identification using fruit images, but encounter obstacles such as the requirement for extensive training data, computational complexity, and the risk of overfitting. This study introduces an innovative convolutional neural network (CNN) architecture aimed at addressing these challenges by incorporating a reduced number of layers, thus alleviating computational burdens while maintaining performance. Additionally, augmentation techniques such as shift, shear, scaling, zoom, and flipping are employed to diversify the training set without additional image acquisition. Our CNN model is specifically trained to identify common apple crop diseases like Scab, Rot, and Blotch. Rigorous experimental evaluation demonstrates the effectiveness of our model, achieving a remarkable classification accuracy of 95.37%. Significantly, our model demonstrates reduced storage requirements and faster execution times compared to existing deep CNN architectures, enabling deployment on handheld devices and resource-limited environments. While other CNN models may offer similar accuracy levels, our approach emphasizes efficiency and resource optimization, rendering it practical for real-world applications in agriculture. Furthermore, our CNN model exhibits resilience to environmental variations and imaging parameters, enhancing its applicability across diverse agricultural settings.
By leveraging advanced machine learning techniques, the approach developed in this experimental work contributes to modernizing fruit and vegetable sorting operations in food processing and crop management practices, thus promoting agricultural sustainability. The scalability and portability of our model make it suitable for deployment in both small-scale farms and large-scale agricultural operations.

871 citations

Journal Article•10.1109/cvpr52733.2024.01920•
4D Gaussian Splatting for Real-Time Dynamic Scene Rendering

Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, Xinggang Wang 
16 Jun 2024

119 citations

Journal Article•10.1109/cvpr52733.2024.00512•
SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering

Antoine Guédon, Vincent Lepetit
16 Jun 2024

55 citations

Journal Article•10.1109/cvpr52733.2024.01952•
Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering

Tao Lü, Mulin Yu, Linning Xu, Yuanbo Xiangli, Limin Wang, Dahua Lin, Bo Dai 
16 Jun 2024

50 citations

Journal Article•10.1145/3658160•
A Hierarchical 3D Gaussian Representation for Real-Time Rendering of Very Large Datasets

Bernhard Kerbl, Andréas Meuleman, Georgios Kopanas, Michael Wimmer, Alexandre Lanvin, George Drettakis 
19 Jul 2024-ACM Transactions on Graphics
Abstract: Novel view synthesis has seen major advances in recent years, with 3D Gaussian splatting offering an excellent level of visual quality, fast training and real-time rendering. However, the resources needed for training and rendering inevitably limit the size of the captured scenes that can be represented with good visual quality. We introduce a hierarchy of 3D Gaussians that preserves visual quality for very large scenes, while offering an efficient Level-of-Detail (LOD) solution for efficient rendering of distant content with effective level selection and smooth transitions between levels. We introduce a divide-and-conquer approach that allows us to train very large scenes in independent chunks. We consolidate the chunks into a hierarchy that can be optimized to further improve visual quality of Gaussians merged into intermediate nodes. Very large captures typically have sparse coverage of the scene, presenting many challenges to the original 3D Gaussian splatting training method; we adapt and regularize training to account for these issues. We present a complete solution that enables real-time rendering of very large scenes and can adapt to available resources thanks to our LOD method. We show results for captured scenes with up to tens of thousands of images with a simple and affordable rig, covering trajectories of up to several kilometers and lasting up to one hour. Project Page: https://repo-sam.inria.fr/fungraph/hierarchical-3d-gaussians/

18 citations

Journal Article•10.1016/j.automatica.2023.111305•
Robustifying event-triggered control to measurement noise

Koen J. A. Scheres, Romain Postoyan, W.P.M.H. Heemels
01 Jan 2024-Automatica
TL;DR: Noise-robust event-triggered control design based on set stabilization techniques. The approach guarantees Zeno-free static and dynamic triggering rules and a positive lower bound on the inter-event times.
Abstract: While many event-triggered control strategies are available in the literature, most of them are designed ignoring the presence of measurement noise. As measurement noise is omnipresent in practice and can have detrimental effects, for instance, by inducing Zeno behavior in the closed-loop system and with that the lack of a positive lower bound on the inter-event times, rendering the event-triggered control design practically useless, it is of great importance to address this gap in the literature. To do so, we present a general approach for set stabilization of (distributed) event-triggered control systems affected by additive measurement noise. It is shown that, under general conditions, Zeno-free static as well as dynamic triggering rules can be designed such that the closed-loop system satisfies an input-to-state practical set stability property. We ensure Zeno-freeness by proving the existence of a uniform strictly positive lower-bound on the minimum inter-event time. The general approach is applied to point stabilization and consensus problems as particular cases, where we show that, under similar assumptions as the original work, existing schemes can be redesigned to robustify them to measurement noise. Consequently, using this approach, noise-robust triggering conditions can be designed both from the ground up and by simple redesign of several important existing schemes. Simulation results are provided that illustrate the strengths of this novel approach.

14 citations

Journal Article•10.1109/cvpr52733.2024.00117•
ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering

Haokai Pang, Heming Zhu, Adam Kortylewski, Christian Theobalt, Marc Habermann 
16 Jun 2024

13 citations

Journal Article•10.1093/nar/gkae969•
MobiDB in 2025: integrating ensemble properties and function annotations for intrinsically disordered proteins

Damiano Piovesan, Alessio Del Conte, Mahta Mehdiabadi, Maria Cristina Aspromonte, Matthias Blum, Giulio Tesei, Sören von Bülow, Kresten Lindorff‐Larsen, Silvio C. E. Tosatto 
29 Oct 2024-Nucleic Acids Research
TL;DR: MobiDB 2025 integrates ensemble properties and function annotations for intrinsically disordered proteins, enhancing structural and functional information with improved pipeline modules, faster predictions, and standardized annotation provenance, while facilitating bulk downloads and offering a more intuitive interface.
Abstract: The MobiDB database (URL: https://mobidb.org/) aims to provide structural and functional information about intrinsic protein disorder, aggregating annotations from the literature, experimental data, and predictions for all known protein sequences. Here, we describe the improvements made to our resource to capture more information, simplify access to the aggregated data, and increase documentation of all MobiDB features. Compared to the previous release, all underlying pipeline modules were updated. The prediction module is ten times faster and can detect if a predicted disordered region is structurally extended or compact. The PDB component is now able to process large cryo-EM structures extending the number of processed entries. The entry page has been restyled to highlight functional aspects of disorder and all graphical modules have been completely reimplemented for better flexibility and faster rendering. The server has been improved to optimise bulk downloads. Annotation provenance has been standardised by adopting ECO terms. Finally, we propagated disorder function (IDPO and GO terms) from the DisProt database exploiting sequence similarity and protein embeddings. These improvements, along with the addition of comprehensive training material, offer a more intuitive interface and novel functional knowledge about intrinsic disorder.

13 citations

Journal Article•10.1109/icra57147.2024.10611537•
RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision

Mingjie Pan, Jiaming Liu, Renrui Zhang, Peixiang Huang, Xiaoqi Li, Hongwei Xie, Bing Wang, Li Liu, Li Du 
13 May 2024

12 citations

Journal Article•10.1109/tvcg.2023.3312127•
V4D: Voxel for 4D Novel View Synthesis

Wanshui Gan, Hongbin Xu, Yi Huang, Shifeng Chen, Naoto Yokoya 
01 Feb 2024-IEEE Transactions on Visualization and Computer Graphics
TL;DR: V4D: Voxel for 4D Novel View Synthesis achieves state-of-the-art performance with low computational cost by utilizing 3D voxel to model the 4D neural radiance field.
Abstract: Neural radiance fields have made a remarkable breakthrough in the novel view synthesis task at the 3D static scene. However, for the 4D circumstance (e.g., dynamic scene), the performance of the existing method is still limited by the capacity of the neural network, typically in a multilayer perceptron network (MLP). In this article, we utilize 3D Voxel to model the 4D neural radiance field, short as V4D, where the 3D voxel has two formats. The first one is to regularly model the 3D space and then use the sampled local 3D feature with the time index to model the density field and the texture field by a tiny MLP. The second one is in look-up tables (LUTs) format that is for the pixel-level refinement, where the pseudo-surface produced by the volume rendering is utilized as the guidance information to learn a 2D pixel-level refinement mapping. The proposed LUTs-based refinement module achieves the performance gain with little computational cost and could serve as the plug-and-play module in the novel view synthesis task. Moreover, we propose a more effective conditional positional encoding toward the 4D data that achieves performance gain with negligible computational burdens. Extensive experiments demonstrate that the proposed method achieves state-of-the-art performance at a low computational cost.

11 citations

Journal Article•10.1145/3613904.3642699•
An AI-Resilient Text Rendering Technique for Reading and Skimming Documents

Ziwei Gu, Ian Arawjo, Kenneth Li, Jonathan K. Kummerfeld, Elena L. Glassman 
11 May 2024
TL;DR: An AI-resilient text rendering technique for reading and skimming documents improves reading comprehension by making important information more salient.
Abstract: Readers find text difficult to consume for many reasons. Summarization can address some of these difficulties, but introduces others, such as omitting, misrepresenting, or hallucinating information, which can be hard for a reader to notice. One approach to addressing this problem is to instead modify how the original text is rendered to make important information more salient. We introduce Grammar-Preserving Text Saliency Modulation (GP-TSM), a text rendering method with a novel means of identifying what to de-emphasize. Specifically, GP-TSM uses a recursive sentence compression method to identify successive levels of detail beyond the core meaning of a passage, which are de-emphasized by rendering words in successively lighter but still legible gray text. In a lab study (n=18), participants preferred GP-TSM over pre-existing word-level text rendering methods and were able to answer GRE reading comprehension questions more efficiently.
Journal Article•10.1145/3659577•
Real-Time Neural Appearance Models

Tizian Zeltner, Fabrice Rousselle, Andrea Weidlich, Petrik Clarberg, Jan Novák, Benedikt Bitterli, Alex Evans, Tomáš Davidovič, Simon Kallweit, Aaron Lefohn 
20 Apr 2024-ACM Transactions on Graphics
TL;DR: Real-time neural appearance models enable film-quality visuals in real-time applications by leveraging learned hierarchical textures and neural decoders.
Abstract: We present a complete system for real-time rendering of scenes with complex appearance previously reserved for offline use. This is achieved with a combination of algorithmic and system level innovations. Our appearance model utilizes learned hierarchical textures that are interpreted using neural decoders, which produce reflectance values and importance-sampled directions. To best utilize the modeling capacity of the decoders, we equip the decoders with two graphics priors. The first prior—transformation of directions into learned shading frames—facilitates accurate reconstruction of mesoscale effects. The second prior—a microfacet sampling distribution—allows the neural decoder to perform importance sampling efficiently. The resulting appearance model supports anisotropic sampling and level-of-detail rendering, and allows baking deeply layered material graphs into a compact unified neural representation. By exposing hardware accelerated tensor operations to ray tracing shaders, we show that it is possible to inline and execute the neural decoders efficiently inside a real-time path tracer. We analyze scalability with increasing number of neural materials and propose to improve performance using code optimized for coherent and divergent execution. Our neural material shaders can be over an order of magnitude faster than non-neural layered materials. This opens up the door for using film-quality visuals in real-time applications such as games and live previews.
Journal Article•10.1109/cvpr52733.2024.00081•
Human Gaussian Splatting: Real-Time Rendering of Animatable Avatars

Arthur Moreau, Jifei Song, Helisa Dhamo, Richard A. Shaw, Yiren Zhou, Eduardo Pérez-Pellitero 
16 Jun 2024
Journal Article•10.1109/cvpr52733.2024.02045•
GS-IR: 3D Gaussian Splatting for Inverse Rendering

Zhihao Liang, Qi Zhang, Ying Feng, Ying Shan, Kui Jia 
16 Jun 2024
Journal Article•10.1109/cvpr52733.2024.01977•
Multi-Scale 3D Gaussian Splatting for Anti-Aliased Rendering

Zhiwen Yan, Weng Fei Low, Yu Chen, Gim Hee Lee
16 Jun 2024
Journal Article•10.1109/tvcg.2024.3378692•
UPST-NeRF: Universal Photorealistic Style Transfer of Neural Radiance Fields for 3D Scene

Yaosen Chen, Qiangqiang Yuan, Zhiqiang Li, Yuegen Liu, Wei Wang, Chunyan Xie, Xuming Wen, Qien Yu 
01 Jan 2024-IEEE Transactions on Visualization and Computer Graphics
TL;DR: UPST-NeRF is a novel framework for photorealistic 3D scene stylization transfer that can generate photorealistic images from arbitrary novel views according to a given style image.
Abstract: Photorealistic stylization of 3D scenes aims to generate photorealistic images from arbitrary novel views according to a given style image, while ensuring consistency when rendering video from different viewpoints. Some existing stylization methods using neural radiance fields can effectively predict stylized scenes by combining the features of the style image with multi-view images to train 3D scenes. However, these methods generate novel view images that contain undesirable artifacts. In addition, they cannot achieve universal photorealistic stylization for a 3D scene. Therefore, a stylization image needs to retrain a 3D scene representation network based on a neural radiation field. We propose a novel photorealistic 3D scene stylization transfer framework to address these issues. It can realize photorealistic 3D scene style transfer with a 2D style image for novel view video rendering. We first pre-trained a 2D photorealistic style transfer network, which can satisfy the photorealistic style transfer between any content image and style image. Then, we use voxel features to optimize a 3D scene and obtain the geometric representation of the scene. Finally, we jointly optimize a hypernetwork to realize the photorealistic style transfer of arbitrary style images. In the transfer stage, we use a pre-trained 2D photorealistic network to constrain the photorealistic style of different views and different style images in the 3D scene. The experimental results show that our method not only realizes the 3D photorealistic style transfer of arbitrary style images, but also outperforms the existing methods in terms of visual quality and consistency. Project page: https://semchan.github.io/UPST_NeRF/.
Journal Article•10.48550/arxiv.2408.07967•
FlashGS: Efficient 3D Gaussian Splatting for Large-scale and High-resolution Rendering

Guofeng Feng, Siyan Chen, Rong Fu, Zimu Liao, Yi Wang, Tao Liu, Z Pei, Hengjie Li, Xingcheng Zhang, Bo Dai 
15 Aug 2024
TL;DR: FlashGS, an open-source CUDA Python library, achieves 4x acceleration on mobile GPUs with reduced memory consumption through algorithmic and kernel-level optimizations, enhancing the efficiency of differentiable 3D Gaussian Splatting for large-scale and high-resolution rendering.
Abstract: This work introduces FlashGS, an open-source CUDA Python library, designed to facilitate the efficient differentiable rasterization of 3D Gaussian Splatting through algorithmic and kernel-level optimizations. FlashGS is developed based on the observations from a comprehensive analysis of the rendering process to enhance computational efficiency and bring the technique to wide adoption. The paper includes a suite of optimization strategies, encompassing redundancy elimination, efficient pipelining, refined control and scheduling mechanisms, and memory access optimizations, all of which are meticulously integrated to amplify the performance of the rasterization process. An extensive evaluation of FlashGS' performance has been conducted across a diverse spectrum of synthetic and real-world large-scale scenes, encompassing a variety of image resolutions. The empirical findings demonstrate that FlashGS consistently achieves an average 4x acceleration over mobile consumer GPUs, coupled with reduced memory consumption. These results underscore the superior performance and resource optimization capabilities of FlashGS, positioning it as a formidable tool in the domain of 3D rendering.
Journal Article•10.1093/bioinformatics/btae493•
ModDotPlot—rapid and interactive visualization of tandem repeats

Alexander P. Sweeten, Michael C. Schatz, Adam M. Phillippy
01 Aug 2024-Bioinformatics
TL;DR: Researchers introduce ModDotPlot, an interactive and alignment-free dot plot viewer, which rapidly visualizes genomic repeats via a k-mer-based containment index, outperforming StainedGlass in speed and accuracy, with a GUI for real-time navigation of entire chromosomes.
Abstract: Motivation: A common method for analyzing genomic repeats is to produce a sequence similarity matrix visualized via a dot plot. Innovative approaches such as StainedGlass have improved upon this classic visualization by rendering dot plots as a heatmap of sequence identity, enabling researchers to better visualize multi-megabase tandem repeat arrays within centromeres and other heterochromatic regions of the genome. However, computing the similarity estimates for heatmaps requires high computational overhead and can suffer from decreasing accuracy. Results: In this work, we introduce ModDotPlot, an interactive and alignment-free dot plot viewer. By approximating average nucleotide identity via a k-mer-based containment index, ModDotPlot produces accurate plots orders of magnitude faster than StainedGlass. We accomplish this through the use of a hierarchical modimizer scheme that can visualize the full 128 Mb genome of Arabidopsis thaliana in under 5 min on a laptop. ModDotPlot is bundled with a graphical user interface supporting real-time interactive navigation of entire chromosomes. Availability and implementation: ModDotPlot is available at https://github.com/marbl/ModDotPlot.
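The containment-based identity estimate described in the ModDotPlot abstract can be illustrated roughly as follows. This is a minimal sketch, not ModDotPlot's actual code: the function names, the BLAKE2b hash, the divisor `s`, and the conversion `c ** (1/k)` (a common Mash-style approximation under a simple mutation model) are all assumptions made for illustration.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def modimizers(kmer_set, s):
    """Keep only k-mers whose hash is divisible by s (hypothetical sketching rule)."""
    def h(kmer):
        digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big")
    return {km for km in kmer_set if h(km) % s == 0}


def containment(a, b):
    """Fraction of the k-mers in a that also occur in b."""
    return len(a & b) / len(a) if a else 0.0


def identity_estimate(c, k):
    """Assumed conversion from containment c to sequence identity: c ** (1/k)."""
    return c ** (1.0 / k)


a = kmers("ACGTACGTAC", 4)
b = kmers("ACGTTTTT", 4)
c = containment(a, b)
print(c, identity_estimate(c, 4))
```

With `s = 1` every k-mer is kept; larger values of `s` keep roughly one k-mer in `s`, which shrinks the sets being compared at the cost of a noisier estimate.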
Preprint•10.48550/arxiv.2406.13007•
NTIRE 2024 Challenge on Night Photography Rendering

Egor Ershov, Artyom Panshin, Oleg Karasev, S.A. Korchagin, Sergey Lev, Aleksandr V Startsev, Daniil Vladimirov, Ekaterina Zaychenkova, Nikola Banić, Dmitrii Iarchuk, M. A. Efimova, Radu Timofte, Arseniy Terekhin, Stephanie Yue, Yuyang Liu, Wei Mao, Lu Xu, Chao Zhang, Yasi Wang, Furkan Kınlı, Doğa Yılmaz, Barış Özcan, Furkan Kıraç, Shuai Liu, Jingyuan Xiao, Chaoyu Feng, Hao Wang, Guangqi Shao, Yuqian Zhang, Yibin Huang, Luo Wei, Liming Wang, Xiaotao Wang, Lei Lei, Simone Zini, Claudio Rota, Marco Buzzelli, Simone Bianco, Raimondo Schettini, Jun Guo, Tianli Liu, Min Wu, Ben Shao, Qirui Yang, Xianghui Li, Qi Cheng, F. Zhang, Zhiqiang Xu, Jingyu Yang, Huanjing Yue 
18 Jun 2024
TL;DR: Night photography rendering challenge results reviewed. Top solutions showcase state-of-the-art in nighttime photography rendering.
Abstract: This paper presents a review of the NTIRE 2024 challenge on night photography rendering. The goal of the challenge was to find solutions that process raw camera images taken in nighttime conditions, and thereby produce photo-quality output images in the standard RGB (sRGB) space. Unlike the previous year's competition, the challenge images were collected with a mobile phone and the speed of algorithms was also measured alongside the quality of their output. To evaluate the results, a sufficient number of viewers were asked to assess the visual quality of the proposed solutions, considering the subjective nature of the task. There were 2 nominations: quality and efficiency. Top 5 solutions in terms of output quality were sorted by evaluation time (see Fig. 1). The top ranking participants' solutions effectively represent the state-of-the-art in nighttime photography rendering. More results can be found at https://nightimaging.org.
Journal Article•10.1109/tpami.2024.3387307•
PERF: Panoramic Neural Radiance Field from a Single Panorama

Guangcong Wang, Peng Wang, Zhaoxi Chen, Wenping Wang, Chen Change Loy, Ziwei Liu 
01 Jan 2024-IEEE Transactions on Pattern Analysis and Machine Intelligence
TL;DR: PERF is a novel view synthesis framework that trains a panoramic neural radiance field from a single panorama. It allows 3D roaming in a complex scene without expensive and tedious image collection.
Abstract: Neural Radiance Field (NeRF) has achieved substantial progress in novel view synthesis given multi-view images. Recently, some works have attempted to train a NeRF from a single image with 3D priors. They mainly focus on a limited field of view with a few occlusions, which greatly limits their scalability to real-world 360-degree panoramic scenarios with large-size occlusions. In this paper, we present PERF, a 360-degree novel view synthesis framework that trains a panoramic neural radiance field from a single panorama. Notably, PERF allows 3D roaming in a complex scene without expensive and tedious image collection. To achieve this goal, we propose a novel collaborative RGBD inpainting method and a progressive inpainting-and-erasing method to lift up a 360-degree 2D scene to a 3D scene. Specifically, we first predict a panoramic depth map as initialization given a single panorama and reconstruct visible 3D regions with volume rendering. Then we introduce a collaborative RGBD inpainting approach into a NeRF for completing RGB images and depth maps from random views, which is derived from an RGB Stable Diffusion model and a monocular depth estimator. Finally, we introduce an inpainting-and-erasing strategy to avoid inconsistent geometry between a newly-sampled view and reference views. The two components are integrated into the learning of NeRFs in a unified optimization framework and achieve promising results. Extensive experiments on Replica and a new dataset PERF-in-the-wild demonstrate the superiority of our PERF over state-of-the-art methods. Our PERF can be widely used for real-world applications, such as panorama-to-3D, text-to-3D, and 3D scene stylization applications. Project page and code are available at https://github.com/perf-project/PeRF.
Journal Article•10.1016/j.cag.2023.11.005•
Efficient ray sampling for radiance fields reconstruction

Shilei Sun, Ming Liu, Zhongyi Fan, Qingliang Jiao, Yuxue Liu, Liquan Dong, Lingqin Kong 
01 Feb 2024-Computers & graphics
TL;DR: Efficient ray sampling for radiance fields reconstruction accelerates training by reducing the number of rays while maintaining photorealistic rendering quality.
Abstract: Accelerating the training process of neural radiance field holds substantial practical value. The ray sampling strategy profoundly influences the convergence of this neural network. Therefore, more efficient ray sampling can directly augment the training efficiency of existing NeRF models. We propose a novel ray sampling approach for neural radiance field that improves training efficiency while retaining photorealistic rendering results. First, we analyze the relationship between the pixel loss distribution of sampled rays and rendering quality. This reveals redundancy in the original NeRF's uniform ray sampling. Guided by this finding, we develop a sampling method leveraging pixel regions and depth boundaries. Our main idea is to sample fewer rays in training views, yet with each ray more informative for scene fitting. Sampling probability increases in pixel areas exhibiting significant color and depth variation, greatly reducing wasteful rays from other regions without sacrificing precision. Through this method, not only can the convergence of the network be accelerated, but the spatial geometry of a scene can also be perceived more accurately. Rendering outputs are enhanced, especially for texture-complex regions. Experiments demonstrate that our method significantly outperforms state-of-the-art techniques on public benchmark datasets.
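The idea of sampling more rays where pixel values vary strongly, as outlined in the abstract above, might be sketched like this. It is a simplified illustration under stated assumptions, not the authors' method: it uses only grayscale intensity differences (the paper also uses depth boundaries and pixel regions), and the names `variation_weights`, `sample_rays`, and the `floor` constant are hypothetical.

```python
import random


def variation_weights(gray, floor=0.1):
    """Weight each pixel by its local intensity variation.

    gray: list of rows of floats (H x W).
    floor: hypothetical constant so flat regions keep a nonzero chance.
    """
    h, w = len(gray), len(gray[0])
    weights = []
    for r in range(h):
        for c in range(w):
            # Forward differences, clamped at the image border.
            dx = gray[r][min(c + 1, w - 1)] - gray[r][c]
            dy = gray[min(r + 1, h - 1)][c] - gray[r][c]
            weights.append((dx * dx + dy * dy) ** 0.5 + floor)
    return weights


def sample_rays(gray, n_rays, rng):
    """Draw n_rays pixel coordinates (row, col), with replacement."""
    w = len(gray[0])
    weights = variation_weights(gray)
    picks = rng.choices(range(len(weights)), weights=weights, k=n_rays)
    return [divmod(i, w) for i in picks]


# A 4 x 4 image with a vertical edge between columns 1 and 2.
image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
rays = sample_rays(image, 1000, random.Random(0))
edge_share = sum(1 for _, col in rays if col == 1) / len(rays)
print(edge_share)
```

In this toy image most samples land in the column next to the edge, while the `floor` term still lets uniform regions receive occasional rays.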
Proceedings Article•10.1609/aaai.v38i3.27946•
Fine-Grained Multi-View Hand Reconstruction Using Inverse Rendering

Qijun Gan, Wentong Li, Jinwei Ren, Jianke Zhu
24 Mar 2024-Proceedings of the ... AAAI Conference on Artificial Intelligence
TL;DR: Fine-grained multi-view hand reconstruction using inverse rendering achieves accurate hand mesh and texture reconstruction with improved rendering quality and robustness.
Abstract: Reconstructing high-fidelity hand models with intricate textures plays a crucial role in enhancing human-object interaction and advancing real-world applications. Despite the state-of-the-art methods excelling in texture generation and image rendering, they often face challenges in accurately capturing geometric details. Learning-based approaches usually offer better robustness and faster inference, which tend to produce smoother results and require substantial amounts of training data. To address these issues, we present a novel fine-grained multi-view hand mesh reconstruction method that leverages inverse rendering to restore hand poses and intricate details. Firstly, our approach predicts a parametric hand mesh model through a Graph Convolutional Network (GCN) based method from multi-view images. We further introduce a novel Hand Albedo and Mesh (HAM) optimization module to refine both the hand mesh and textures, which is capable of preserving the mesh topology. In addition, we suggest an effective mesh-based neural rendering scheme to simultaneously generate photo-realistic images and optimize mesh geometry by fusing the pre-trained rendering network with vertex features. We conduct comprehensive experiments on InterHand2.6M, DeepHandMesh, and a dataset we collected ourselves, whose promising results show that our proposed approach outperforms the state-of-the-art methods on both reconstruction accuracy and rendering quality. Code and dataset are publicly available at https://github.com/agnJason/FMHR.
Journal Article•10.1109/cvpr52733.2024.01866•
HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting

Yuheng Jiang, Zhehao Shen, Penghao Wang, Zhuo Su, Yu Hong, Yingliang Zhang, Jingyi Yu, Lan Xu 
16 Jun 2024
Journal Article•10.1109/cvpr52733.2024.01411•
NeuRAD: Neural Rendering for Autonomous Driving

Adam Tonderski, Carl Lindström, Georg Hess, William Ljungbergh, Lennart Svensson, Christoffer Petersson 
16 Jun 2024
Journal Article•10.3390/app14156543•
DExter: Learning and Controlling Performance Expression with Diffusion Models

Huan Zhang, Shreyan Chowdhury, Carlos Cancino-Chacón, Jinhua Liang, Simon Dixon, Gerhard Widmer 
26 Jul 2024-Applied Sciences
TL;DR: DExter is a new approach leveraging diffusion probabilistic models to render Western classical piano performances that enables the generation of interpretations guided by perceptually meaningful features by being jointly conditioned on score and perceptual-feature representations.
Abstract: In the pursuit of developing expressive music performance models using artificial intelligence, this paper introduces DExter, a new approach leveraging diffusion probabilistic models to render Western classical piano performances. The main challenge faced in performance rendering tasks is the continuous and sequential modeling of expressive timing and dynamics over time, which is critical for capturing the evolving nuances that characterize live musical performances. In this approach, performance parameters are represented in a continuous expression space, and a diffusion model is trained to predict these continuous parameters while being conditioned on a musical score. Furthermore, DExter also enables the generation of interpretations (expressive variations of a performance) guided by perceptually meaningful features by being jointly conditioned on score and perceptual-feature representations. Consequently, we find that our model is useful for learning expressive performance, generating perceptually steered performances, and transferring performance styles. We assess the model through quantitative and qualitative analyses, focusing on specific performance metrics regarding dimensions like asynchrony and articulation, as well as through listening tests that compare generated performances with different human interpretations. The results show that DExter is able to capture the time-varying correlation of the expressive parameters, and it compares well to existing rendering models in subjectively evaluated ratings. The perceptual-feature-conditioned generation and transferring capabilities of DExter are verified via a proxy model predicting perceptual characteristics of differently steered performances.
Journal Article•10.1111/cgf.15012•
TRIPS: Trilinear Point Splatting for Real‐Time Radiance Field Rendering

Linus Franke, Darius Rückert, Laura Fink, Marc Stamminger
30 Apr 2024-Computer Graphics Forum
TL;DR: TRIPS is a novel point-based radiance field rendering technique that achieves high-quality image synthesis while maintaining real-time performance. It combines ideas from Gaussian Splatting and ADOP, leveraging a screen-space image pyramid and a lightweight neural network to generate crisp images with minimal artifacts.
Abstract: Point‐based radiance field rendering has demonstrated impressive results for novel view synthesis, offering a compelling blend of rendering quality and computational efficiency. However, even the latest approaches in this domain are not without their shortcomings. 3D Gaussian Splatting [KKLD23] struggles when tasked with rendering highly detailed scenes, due to blurring and cloudy artifacts. On the other hand, ADOP [RFS22] can accommodate crisper images, but the neural reconstruction network decreases performance, it grapples with temporal instability, and it is unable to effectively address large gaps in the point cloud. In this paper, we present TRIPS (Trilinear Point Splatting), an approach that combines ideas from both Gaussian Splatting and ADOP. The fundamental concept behind our novel technique involves rasterizing points into a screen‐space image pyramid, with the selection of the pyramid layer determined by the projected point size. This approach allows rendering arbitrarily large points using a single trilinear write. A lightweight neural network is then used to reconstruct a hole‐free image including detail beyond splat resolution. Importantly, our render pipeline is entirely differentiable, allowing for automatic optimization of both point sizes and positions. Our evaluation demonstrates that TRIPS surpasses existing state‐of‐the‐art methods in terms of rendering quality while maintaining a real‐time frame rate of 60 frames per second on readily available hardware. This performance extends to challenging scenarios, such as scenes featuring intricate geometry, expansive landscapes, and auto‐exposed footage. The project page is located at: https://lfranke.github.io/trips
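The pyramid-layer selection and trilinear write described in the TRIPS abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function name `trilinear_splat_weights`, the log2 mapping from projected size to pyramid level, and the simplified coordinate scaling (no half-pixel offsets) are all assumptions made for illustration.

```python
import math

def trilinear_splat_weights(x, y, size_px, num_levels):
    """Hypothetical helper: distribute one point over a screen-space pyramid.

    x, y       : continuous pixel coordinates at full resolution (level 0)
    size_px    : projected point size in level-0 pixels
    num_levels : number of pyramid levels (level L halves resolution L times)

    Returns a list of (level, ix, iy, weight) tuples whose weights sum to 1.
    """
    # Assumption: a point covering about 2**L pixels belongs on level L.
    level = math.log2(max(size_px, 1.0))
    level = min(max(level, 0.0), num_levels - 1)

    lo = int(math.floor(level))
    hi = min(lo + 1, num_levels - 1)
    t = level - lo  # linear blend factor between the two adjacent levels

    writes = []
    for lvl, w_lvl in ((lo, 1.0 - t), (hi, t)):
        if w_lvl == 0.0:
            continue  # nothing to write to this level
        # Scale the position into this level's coarser pixel grid.
        sx, sy = x / 2 ** lvl, y / 2 ** lvl
        x0, y0 = math.floor(sx), math.floor(sy)
        fx, fy = sx - x0, sy - y0
        # Bilinear weights over the 2x2 pixel neighbourhood in this level.
        for dx, wx in ((0, 1.0 - fx), (1, fx)):
            for dy, wy in ((0, 1.0 - fy), (1, fy)):
                writes.append((lvl, x0 + dx, y0 + dy, w_lvl * wx * wy))
    return writes
```

Under these assumptions a point touches at most eight pixels (2x2 on each of two adjacent levels) regardless of its projected size, which is one way to read the abstract's claim that arbitrarily large points can be rendered with a single trilinear write.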
Proceedings Article•10.1609/aaai.v38i17.29833•
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling

[...]

Rui Liu, Yifan Hu, Yi Ren, Xiang Yin, Haizhou Li 
24 Mar 2024-Proceedings of the ... AAAI Conference on Artificial Intelligence
TL;DR: The proposed ECSS model effectively enhances emotion rendering in conversational speech synthesis by leveraging heterogeneous graph-based context modeling and contrastive learning-based emotion renderer.
Abstract: Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting. While recognising the significance of the CSS task, prior studies have not thoroughly investigated the emotional expressiveness problems due to the scarcity of emotional conversational datasets and the difficulty of stateful emotion modeling. In this paper, we propose a novel emotional CSS model, termed ECSS, that includes two main components: 1) to enhance emotion understanding, we introduce a heterogeneous graph-based emotional context modeling mechanism, which takes the multi-source dialogue history as input to model the dialogue context and learn the emotion cues from the context; 2) to achieve emotion rendering, we employ a contrastive learning-based emotion renderer module to infer the accurate emotion style for the target utterance. To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity, and annotate additional emotional information on the existing conversational dataset (DailyTalk). Both objective and subjective evaluations suggest that our model outperforms the baseline models in understanding and rendering emotions. These evaluations also underscore the importance of comprehensive emotional annotations. Code and audio samples can be found at: https://github.com/walker-hyf/ECSS.
Journal Article•10.48550/arxiv.2407.00435•
RTGS: Enabling Real-Time Gaussian Splatting on Mobile Devices Using Efficiency-Guided Pruning and Foveated Rendering

[...]

Weikai Lin, Yu Feng, Yuhao Zhu
29 Jun 2024
TL;DR: This paper proposes RTGS, a real-time Point-Based Neural Rendering system for mobile devices, combining efficiency-guided pruning and foveated rendering to achieve above 100 FPS while maintaining human visual quality.
Abstract: Point-Based Neural Rendering (PBNR), i.e., the 3D Gaussian Splatting-family algorithms, emerges as a promising class of rendering techniques, which are permeating all aspects of society, driven by a growing demand for real-time, photorealistic rendering in AR/VR and digital twins. Achieving real-time PBNR on mobile devices is challenging. This paper proposes RTGS, a PBNR system that for the first time delivers real-time neural rendering on mobile devices while maintaining human visual quality. RTGS combines two techniques. First, we present an efficiency-aware pruning technique to optimize rendering speed. Second, we introduce a Foveated Rendering (FR) method for PBNR, leveraging humans' low visual acuity in peripheral regions to relax rendering quality and improve rendering speed. Our system executes in real time (above 100 FPS) on an Nvidia Jetson Xavier board without sacrificing subjective visual quality, as confirmed by a user study. The code is open-sourced at [https://github.com/horizon-research/Fov-3DGS].
Journal Article•10.1145/3651290•
ProteusNeRF: Fast Lightweight NeRF Editing using 3D-Aware Image Context

[...]

Binglun Wang, Niladri Shekhar Dutt, Niloy J. Mitra
11 May 2024-Proceedings of the ACM on computer graphics and interactive techniques
TL;DR: ProteusNeRF is a novel framework for fast and lightweight NeRF editing using 3D-aware image context. It enables interactive editing of NeRFs through image-based edits and facilitates view-consistent image editing.
Abstract: Neural Radiance Fields (NeRFs) have recently emerged as a popular option for photo-realistic object capture due to their ability to faithfully capture high-fidelity volumetric content even from handheld video input. Although much research has been devoted to efficient optimization leading to real-time training and rendering, options for interactively editing NeRFs remain limited. We present a very simple but effective neural network architecture that is fast and efficient while maintaining a low memory footprint. This architecture can be incrementally guided through user-friendly image-based edits. Our representation allows straightforward object selection via semantic feature distillation at the training stage. More importantly, we propose a local 3D-aware image context to facilitate view-consistent image editing that can then be distilled into fine-tuned NeRFs via geometric and appearance adjustments. We evaluate our setup on a variety of examples to demonstrate appearance and geometric edits and report a 10-30x speedup over concurrent work focusing on text-guided NeRF editing. Video results and code can be found on our project webpage at https://proteusnerf.github.io.
Journal Article•10.1016/j.mechatronics.2023.103112•
A novel robotic system enabling multiple bilateral upper limb rehabilitation training via an admittance controller and force field

[...]

Ran Jiao, Wenjie Liu, Ramy Rashad, Jianfeng Li, Mingjie Dong, Stefano Stramigioli 
01 Feb 2024-Mechatronics
TL;DR: A novel robotic system enables multiple bilateral upper limb rehabilitation training via an admittance controller and force field, offering a feasible method for simulating various bimanual coordinated rehabilitation training tasks.
Abstract: Patients with hemiplegia are usually restricted in performing general bilateral activities of daily life (gbADLs). Bilateral training has been verified to contribute to the rehabilitation of physical functions. Although robotic systems are gradually being employed in the field of rehabilitation, few studies have performed simulations with regard to gbADLs for training. Therefore, a novel end-effector bilateral rehabilitation robotic system (EBReRS) for the upper limb is developed in this article for the task rendering of gbADLs, in which the gbADL-corresponding workspace is obtained via modularly designed bilateral parallelogram mechanisms. In addition, the interaction rendering of multiple bimanual modes (uncoupled, trans-soft-coupled, trans-semi-coupled, and rotation-coupled) is achieved by implementing the admittance model, the inner force field between the robotic end-effectors, and the outer force field distributed around them. Experiments with the four proposed rehabilitation training modes were carried out on a healthy subject, with the results showing that the EBReRS is a feasible method for simulating multiple bimanual coordinated rehabilitation training tasks. In the future, the constructed EBReRS is expected to be exploited for home rehabilitation as a coordinated training device.