Faster and clearer! NVIDIA achieves real-time SDF rendering for the first time, speeding rendering up by 2-3 orders of magnitude
"Real-time rendering" is best known from gaming: it converts scene data into highly realistic 3D images on the fly and is one of the key factors determining the gaming experience.
The biggest challenge in real-time rendering is speed. Generally speaking, each frame of a game scene must be rendered within 1/24 of a second (about 42 ms) at minimum, or the game starts to feel like "flipping through a PowerPoint slide deck".
Recently, NVIDIA published new research that increases real-time rendering speed by 2-3 orders of magnitude.
In terms of rendering quality, the method also handles geometry of complex styles and scales more gracefully, and can even render shadows cast by ambient lighting in real time.
DeepSDF, introduced by Facebook and MIT researchers in 2019, has been one of the strongest 3D reconstruction models in this line of work.
In comparison, NVIDIA's latest research is even better in terms of rendering speed and quality.
(In the comparison figure, orange shows the DeepSDF rendering result.)
This latest research is a paper titled "Neural Geometric Level of Detail: Real-Time Rendering of Implicit 3D Shapes", jointly authored by researchers from NVIDIA, the University of Toronto, and McGill University, and posted to the arXiv preprint server.
In the paper, the researchers say that by introducing an efficient neural representation, they achieve the first SDF-based high-fidelity real-time 3D rendering while reaching state-of-the-art geometric reconstruction quality. More importantly, compared with prior work, the method improves rendering speed by 2-3 orders of magnitude.
1
SVO encoding: orders-of-magnitude faster rendering
The SDF, or Signed Distance Function, is a widely used shape representation in computer graphics: for any point in space, it returns the distance to the nearest surface, with the sign indicating whether the point is inside or outside the shape.
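As a concrete illustration (not taken from the paper), here is a minimal SDF for a sphere: negative inside, zero on the surface, positive outside.

```python
import numpy as np

def sphere_sdf(p, center=np.zeros(3), radius=1.0):
    """Signed distance from point p to a sphere:
    negative inside, zero on the surface, positive outside."""
    return np.linalg.norm(p - center) - radius

# Points inside, on, and outside the unit sphere:
print(sphere_sdf(np.array([0.0, 0.0, 0.0])))  # -1.0 (center)
print(sphere_sdf(np.array([1.0, 0.0, 0.0])))  #  0.0 (surface)
print(sphere_sdf(np.array([2.0, 0.0, 0.0])))  #  1.0 (outside)
```

Complex shapes have no such closed-form expression, which is why neural networks are used to approximate their SDFs.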
In existing research, a large, fixed-size multi-layer perceptron (MLP) is usually used to encode the SDF and approximate complex shapes as implicit surfaces. However, using a large network for real-time rendering incurs a heavy computational cost, because every pixel requires a full forward pass through the network.
To address this, the research team proposed encoding geometry with a sparse voxel octree (SVO), which adaptively scales across discrete levels of detail (LODs) and reconstructs highly detailed geometric structure.
As shown in the figure, this method smoothly interpolates between geometries of different sizes and takes up reasonable memory for real-time rendering.
The researchers note that, like prior work, they use a small MLP and render via sphere tracing. Inspired by classic surface-extraction methods, they finely discretize Euclidean space with spatial data structures that store distance values, so that simple linear basis functions can reconstruct the geometry.
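Sphere tracing, the rendering technique mentioned above, marches a ray toward the surface by stepping, at each point, exactly the distance the SDF reports as safe. A hedged sketch of the classic algorithm (the paper's actual renderer uses an adaptive SVO traversal, not this naive loop):

```python
import numpy as np

def sphere_sdf(p):
    # Unit sphere at the origin, as a stand-in for a learned SDF.
    return np.linalg.norm(p) - 1.0

def sphere_trace(origin, direction, sdf, max_steps=64, eps=1e-4, max_dist=100.0):
    """March along the ray, stepping by the signed distance each time.
    Returns the surface hit point, or None if the ray misses."""
    t = 0.0
    for _ in range(max_steps):
        p = origin + t * direction
        d = sdf(p)
        if d < eps:      # close enough to the surface: report a hit
            return p
        t += d           # the SDF guarantees this step is collision-free
        if t > max_dist:
            break
    return None

# A ray from z = -3 toward +z hits the near side of the unit sphere:
hit = sphere_trace(np.array([0.0, 0.0, -3.0]),
                   np.array([0.0, 0.0, 1.0]), sphere_sdf)
# hit is approximately (0, 0, -1)
```

Because every step calls the SDF, the cost of one network evaluation multiplies across steps and pixels, which is why shrinking the network matters so much for frame rate.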
In those classic methods, the resolution or tree depth determines the LOD (and different LODs can be blended by interpolating SDF values). Here, the researchers instead use a sparse voxel octree (SVO) to discretize space and store learned feature vectors rather than raw signed distance values.
The benefit is that a shallow MLP can decode each feature vector into a scalar distance, which shortens the required tree depth while inheriting the advantages of the classic methods (such as LODs).
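The idea of storing learned features at voxel corners and decoding them with a shallow MLP can be sketched as follows. Everything here is illustrative: the grid is a dense stand-in for one octree level, and the weights are untrained random values, not the paper's network.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT_DIM = 8

# Hypothetical dense stand-in for one SVO level: a learned feature
# vector at each corner of a 4^3 voxel grid (5 corners per axis).
grid = rng.normal(size=(5, 5, 5, FEAT_DIM))

# Shallow MLP (one hidden layer) decoding a feature vector to a distance.
W1 = rng.normal(size=(FEAT_DIM, 16)); b1 = np.zeros(16)
W2 = rng.normal(size=(16, 1));        b2 = np.zeros(1)

def decode(x):
    """Trilinearly interpolate corner features at x in [0,1]^3,
    then decode the blended feature into a scalar distance."""
    g = x * 4                          # position in grid units
    i = np.minimum(g.astype(int), 3)   # voxel index, clamped to the grid
    f = g - i                          # fractional offset inside the voxel
    feat = np.zeros(FEAT_DIM)
    for dx in (0, 1):                  # blend the 8 corner features
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((f[0] if dx else 1 - f[0]) *
                     (f[1] if dy else 1 - f[1]) *
                     (f[2] if dz else 1 - f[2]))
                feat += w * grid[i[0] + dx, i[1] + dy, i[2] + dz]
    h = np.maximum(W1.T @ feat + b1, 0.0)   # ReLU hidden layer
    return (W2.T @ h + b2).item()           # predicted signed distance

d = decode(np.array([0.3, 0.5, 0.7]))  # a scalar distance prediction
```

The key point is that the expressive capacity lives in the spatially varying features, so the decoder itself can stay tiny, making each sphere-tracing step cheap.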
On this basis, the researchers also developed a ray traversal algorithm tailored to this architecture, achieving rendering roughly 100 times faster than DeepSDF. In addition, although neural volume rendering methods are not directly comparable, in a similar experimental setting the frame rate is about 500 times that of NeRF and 50 times that of NSVF.
2
Experiments: finer rendering quality
The researchers compared their method against four algorithms that represent the strongest existing results for overfitting 3D geometry: DeepSDF, FFN, SIREN, and Neural Implicits (NI).
The following is a comparison of the 3D reconstruction results of different algorithms on the ShapeNet, Thingi10K and TurboSquid datasets.
It can be seen that the method performs better starting from LOD3. At that level it not only uses the least storage, but its inference parameter count is fixed at 4,737 floating-point values across all resolutions, a 99% reduction compared with FFN and a 37% reduction compared with Neural Implicits.
More importantly, this method shows better reconstruction quality under low storage and inference parameters. As shown below:
Compared with NI and FFN, this method can render image details more accurately and is 50 times faster than FFN.
In addition, the researchers tested rendering quality on two special cases from Shadertoy: Oldcar, which contains a highly non-metric signed distance field, and Mandelbulb, a recursive fractal structure that can only be represented as an implicit surface.
Both SDFs are defined by mathematical expressions, from which the researchers extracted and sampled distance values. The test results are as follows:
In contrast, only this architecture accurately captures the high-frequency details of these complex examples. The results from FFN and SIREN are clearly worse, likely because those networks can only fit smooth distance fields and cannot handle discontinuities or recursive structure, making it difficult to bring out geometric detail when rendering.
In short, by introducing a LOD-aware representation of implicit 3D shapes, this method achieves state-of-the-art geometric reconstruction quality while enabling real-time rendering with modest memory usage. The researchers acknowledge, however, that the method does not yet handle large scenes or very thin, volumeless objects, which they leave as future work.
Still, as it stands, this method represents a major advance in neural implicit geometry: it is the first SDF-based representation to achieve real-time rendering. In the future, it could be applied to real-world scenarios such as scene reconstruction, robot path planning, and interactive content creation.
3
About the authors
The first author of the paper is Towaki Takikawa, a computer science PhD student at the University of Toronto who previously worked in NVIDIA's Hyperscale Graphics Research group.
His main research interests are computer vision and computer graphics, with a particular focus on machine-learning-driven 3D geometry processing; he also has software and hardware experience from robotics projects.
Eight other scholars also participated in this study: Joey Litalien, Kangxue Yin, Karsten Kreis, Charles Loop, Derek Nowrouzezahrai, Alec Jacobson, Morgan McGuire, and Sanja Fidler.
Among them, Kangxue Yin is a Chinese scholar who worked at the Shenzhen Institutes of Advanced Technology (SIAT) of the Chinese Academy of Sciences for three years before earning his PhD at Simon Fraser University.
He is now a research scientist at NVIDIA, working on computer graphics and computer vision.
Reference links:
- https://nv-tlabs.github.io/nglod/
- https://nv-tlabs.github.io/nglod/assets/nglod.pdf
- https://arxiv.org/abs/2101.10994
- https://github.com/nv-tlabs/nglod