What is Neural Rendering?

As our world becomes increasingly digitized, the methods we use to render virtual worlds are changing rapidly. By leveraging generative machine learning techniques, neural rendering has huge potential to improve many aspects of the rendering pipeline. So what is neural rendering? In this article we'll introduce the concept, compare it to classical computer graphics, and discuss what it means for the future.

Classic Rendering

Creating 3D virtual worlds today is a complicated and involved process. Each item, or asset, in a virtual scene is represented by a polygon mesh (Slide 1). This mesh is either modeled by an artist or scanned into existence, and both processes are manual and time-consuming. The more detail we want an asset to have, the more polygons its mesh needs.
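To make the mesh idea concrete, here is a minimal Python sketch, assuming the common indexed-triangle layout: a list of vertex positions plus triangles that refer to those vertices by index. The `subdivide` helper is hypothetical and uses plain midpoint subdivision only to show how adding detail multiplies the polygon count; real assets get their detail from artists or scans, not from a function like this.

```python
Vertex = tuple[float, float, float]
Triangle = tuple[int, int, int]  # three indices into the vertex list


def subdivide(vertices: list[Vertex], triangles: list[Triangle]):
    """Split every triangle into four by inserting a vertex at each edge midpoint."""
    vertices = list(vertices)  # copy so the input mesh is left untouched
    midpoint_index: dict[tuple[int, int], int] = {}

    def midpoint(i: int, j: int) -> int:
        edge = (min(i, j), max(i, j))  # neighbouring triangles share this new vertex
        if edge not in midpoint_index:
            a, b = vertices[i], vertices[j]
            vertices.append(tuple((p + q) / 2 for p, q in zip(a, b)))
            midpoint_index[edge] = len(vertices) - 1
        return midpoint_index[edge]

    new_triangles: list[Triangle] = []
    for a, b, c in triangles:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_triangles += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return vertices, new_triangles


# A unit square in the z = 0 plane, built from two triangles.
square_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
square_triangles = [(0, 1, 2), (0, 2, 3)]

v1, t1 = subdivide(square_vertices, square_triangles)
v2, t2 = subdivide(v1, t1)
print(len(t1), len(t2))  # 8 32
```

Each pass quadruples the triangle count, which is why highly detailed assets become heavy to store and render so quickly.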

The polygon mesh is only the beginning. Each surface in this 3D world also has a corresponding material, which determines the appearance of the mesh. At runtime, the mesh and material of each object are fed into shader programs, which calculate the object's appearance under the given lighting conditions and from a specific camera angle (Slide 2). Many different shader programs have been developed over the years, but the fundamental principle is the same: use the laws of physics to calculate the appearance of an object. This is most evident in ray tracing, which follows rays of light as they bounce from surface to surface between the light sources and the camera (in practice, rays are usually traced backward, starting from the camera).
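As a rough illustration of what a shader computes, the sketch below implements Lambertian (diffuse) shading, one of the simplest physically motivated lighting models: reflected light is proportional to the cosine of the angle between the surface normal and the direction to the light. Real shaders run on the GPU in languages such as GLSL or HLSL; this Python version and its function names are illustrative assumptions. The pure Lambertian term ignores the camera angle; view-dependent models (specular highlights, physically based BRDFs) add the camera direction as another input.

```python
import math


def normalize(v):
    """Scale a 3D vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def lambert_shade(albedo, normal, light_dir, light_rgb):
    """Diffuse (Lambertian) shading: color = albedo * light * max(0, n . l).

    albedo    -- material base color (RGB, each channel in [0, 1])
    normal    -- surface normal at the shaded point, taken from the mesh
    light_dir -- direction from the surface point toward the light
    light_rgb -- RGB intensity of the light
    """
    unit_normal = normalize(normal)
    to_light = normalize(light_dir)
    # Cosine of the angle between the normal and the light; faces turned away get no light.
    cos_theta = max(0.0, sum(a * b for a, b in zip(unit_normal, to_light)))
    return tuple(a * c * cos_theta for a, c in zip(albedo, light_rgb))


material = (0.8, 0.2, 0.2)  # a reddish material
white_light = (1.0, 1.0, 1.0)
print(lambert_shade(material, (0, 0, 1), (0, 0, 1), white_light))  # (0.8, 0.2, 0.2)
print(lambert_shade(material, (0, 0, 1), (0, math.sqrt(3) / 2, 0.5), white_light))
```

Tilting the light 60 degrees away from the normal halves the reflected color, exactly as the cosine law predicts.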

This rendering pipeline can create amazing results: every CGI effect in every movie you have seen, and every game you have ever played, uses some form of this "classical computer graphics" pipeline. Its main pain points are the huge amount of work required to explicitly define every object and every material, and the large amount of computation required to render a realistic or complex scene. This leads us to a question: what if we didn't have to define every object and calculate every light bounce?

Enter Neural Rendering

So, what is neural rendering? Though still a very young field, it has grown to encompass a large number of techniques: GAN-based image synthesis, for example, is one form of neural rendering. The key property shared by neural rendering approaches is that they are differentiable. A differentiable function is one whose derivative exists at every point in its domain. This matters because machine learning is basically the chain rule with extra steps: a differentiable rendering function can be learned from data, one gradient descent step at a time. Learning a rendering function statistically from data is fundamentally different from the classical rendering methods described above, which calculate and extrapolate from the known laws of physics.
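Here is a deliberately tiny example of learning a rendering function from data, assuming a one-parameter "renderer": a diffuse surface whose brightness is `albedo * cos_theta`. Given a few observed brightness values, gradient descent recovers the unknown albedo. The gradient is derived by hand with the chain rule; frameworks such as PyTorch or JAX compute it automatically for networks with millions of parameters. All names and numbers here are hypothetical.

```python
import math


def render(albedo, cos_theta):
    """A one-parameter differentiable 'renderer': brightness of a diffuse surface.

    cos_theta is assumed to be already clamped to [0, 1] (surface facing the light).
    """
    return albedo * cos_theta


def fit_albedo(observations, lr=0.5, steps=200):
    """Recover albedo from (cos_theta, observed_brightness) pairs by gradient descent."""
    albedo = 0.0  # initial guess
    for _ in range(steps):
        grad = 0.0
        for cos_theta, observed in observations:
            residual = render(albedo, cos_theta) - observed
            # Chain rule: d(residual**2)/d(albedo) = 2 * residual * d(render)/d(albedo),
            # and d(render)/d(albedo) = cos_theta.
            grad += 2.0 * residual * cos_theta
        grad /= len(observations)  # gradient of the mean squared error
        albedo -= lr * grad        # one gradient descent step
    return albedo


# Synthetic "photos": a surface with true albedo 0.7 seen under three light angles.
true_albedo = 0.7
angles = [0.0, 30.0, 60.0]
observations = [(math.cos(math.radians(a)), true_albedo * math.cos(math.radians(a)))
                for a in angles]
print(round(fit_albedo(observations), 6))  # 0.7
```

Because `render` is differentiable in `albedo`, every observation says which way to nudge the parameter; neural rendering applies the same idea with a neural network in place of this one-line function.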

One of the coolest flavors of neural rendering is novel view synthesis, in which a neural network learns to render a scene from an arbitrary viewpoint. Slides 3 and 4 are figures from two great papers on this topic: one from Google Research [1] and the other from Facebook Reality Labs [2]. Both works use a volume rendering technique known as ray marching. In ray marching, you shoot a ray from the observer (camera) through a 3D volume and, at sample points along the ray, ask a function: what is the color and opacity at this particular point in space? The samples are then composited front to back into a single pixel color. Neural rendering takes the next step by using a neural network to approximate this function.
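The ray marching loop can be sketched as follows, assuming NeRF-style alpha compositing [1]: the function returns a color and a volume density σ at each sample, the density over a segment of length δ becomes an opacity α = 1 − exp(−σδ), and each sample's color is weighted by α times the transmittance (the fraction of light not yet absorbed by earlier samples). The `sphere_field` scene is a hypothetical hand-written stand-in; in neural rendering it would be replaced by a trained network (NeRF's network also takes the viewing direction, which this sketch omits).

```python
import math


def sphere_field(point):
    """Hypothetical stand-in for the learned function: (rgb, density) at a 3D point.

    The scene is a solid orange ball of radius 1 centered at the origin.
    In NeRF-style neural rendering, a trained network would replace this function.
    """
    x, y, z = point
    inside = x * x + y * y + z * z < 1.0
    density = 10.0 if inside else 0.0
    return (1.0, 0.5, 0.0), density


def march_ray(field, origin, direction, near=0.0, far=6.0, num_samples=128):
    """March along a ray, query the field at evenly spaced samples, composite front to back.

    Returns the pixel color and the transmittance left after the last sample.
    """
    step = (far - near) / num_samples
    transmittance = 1.0  # fraction of light not yet absorbed by earlier samples
    color = [0.0, 0.0, 0.0]
    for i in range(num_samples):
        t = near + (i + 0.5) * step  # midpoint of the i-th segment
        point = tuple(o + t * d for o, d in zip(origin, direction))
        rgb, density = field(point)
        alpha = 1.0 - math.exp(-density * step)  # opacity of this segment
        weight = transmittance * alpha
        for c in range(3):
            color[c] += weight * rgb[c]
        transmittance *= 1.0 - alpha
    return tuple(color), transmittance


hit_color, hit_t = march_ray(sphere_field, (0.0, 0.0, -3.0), (0.0, 0.0, 1.0))
miss_color, miss_t = march_ray(sphere_field, (0.0, 2.0, -3.0), (0.0, 0.0, 1.0))
print(hit_color, hit_t)
print(miss_color, miss_t)  # (0.0, 0.0, 0.0) 1.0
```

A ray through the center of the ball returns almost exactly its orange color with transmittance near zero, while a ray that misses it returns black with all of its transmittance intact.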

The Future of Rendering

We have really only scratched the surface of neural rendering. If you want to learn more, we recommend this very extensive survey paper [3]. But before we go, what could neural rendering mean for the future?

With neural rendering, we no longer need to explicitly model the scene and simulate light transport, because this knowledge is stored implicitly in the weights of a neural network. This means it becomes possible to render your face while you are wearing a VR headset (Slide 5), without ever having to store or deform a 3D polygon mesh of your face!

With neural rendering, the compute required to render an image is also no longer tied to the complexity of the scene (the number of objects, lights, and materials), but rather to the size of the neural network (the time required to perform a forward pass). This could open the door to really high-quality rendering at blazingly fast frame rates.
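A back-of-the-envelope calculation makes this concrete. The sketch below counts multiply-adds for an MLP-based renderer that queries the network once per sample along every pixel's ray; the layer widths, resolution, and sample count are hypothetical choices, not taken from any specific paper. Nothing about the scene (objects, lights, materials) appears as an input: the cost is set by the network size, the image resolution, and the number of samples per ray.

```python
def mlp_multiply_adds(layer_widths: list[int]) -> int:
    """Multiply-adds for one forward pass of a fully connected network (biases ignored)."""
    return sum(n_in * n_out for n_in, n_out in zip(layer_widths, layer_widths[1:]))


def frame_multiply_adds(layer_widths: list[int], width_px: int, height_px: int,
                        samples_per_ray: int) -> int:
    """Cost of one frame when the network is queried at every sample of every pixel's ray."""
    queries = width_px * height_px * samples_per_ray
    return queries * mlp_multiply_adds(layer_widths)


# Hypothetical network: 3D position in, four hidden layers of 256 units, RGB + density out.
widths = [3, 256, 256, 256, 256, 4]
print(mlp_multiply_adds(widths))                  # 198400
print(frame_multiply_adds(widths, 800, 800, 64))  # 8126464000000
```

With these illustrative numbers, one 800×800 frame needs about 8.1 × 10^12 multiply-adds, which also shows why fast neural rendering depends on keeping the network small or querying it less often.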
If you're interested in the intersection of machine learning and 3D, please check out our open source synthetic data toolkit zpy [4]. Your feedback, commits, and feature requests will be invaluable as we continue to build a more robust set of tools for generating synthetic data. Who knows? Perhaps the next great neural rendering model will be trained on data generated with zpy.

References

[1] NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis (arxiv.org/pdf/2003.08934.pdf)
[2] Neural Volumes: Learning Dynamic Renderable Volumes from Images (arxiv.org/pdf/1906.07751.pdf)
[3] State of the Art on Neural Rendering (arxiv.org/pdf/2004.03805.pdf)
[4] zpy: an open source synthetic data toolkit
