This was originally posted on Dangling Pointers. My goal is to help busy people stay current with recent academic developments. Head there to subscribe for regular summaries of computer science research.
Filtering After Shading With Stochastic Texture Filtering
Matt Pharr, Bartlomiej Wronski, Marco Salvi, and Marcos Fajardo
I3D'24
Incompatible Axioms
Every graphics programmer worth their salt has long been taught the following tenets:
1. Filtering or blending non-linear values is morally wrong. Absolve yourself of sin with tricks like the `*_UNORM_SRGB` format suffix.
2. Hardware bilinear filtering is free. Why read one texel when you can read four at the same throughput?
It is easy for these two commandments to come into conflict. For example, say you run a pixel shader which samples a normal from a normal map and uses that result to compute the value of a specular highlight. Problems ensue when enabling hardware filtering of the normal map. The first one that smacks you in the face is that the filtered normals are no longer normalized (unit length). Easy enough: just run a few pixel shader instructions to normalize them. A deeper problem remains: the lighting function is not linear, so this whole setup is a violation of axiom #1.
The morally correct course of action is to read N texels from the normal map, run the lighting calculation for each one, and then compute a weighted average of the lit results. The cost of this moral high ground is lower FPS, not so different from supersampling everything.
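To make that reference concrete, here is a minimal HLSL sketch of "filter after shading" for a 2x2 bilinear footprint. This is not the paper's code; the `SpecularBRDF` helper (a toy Blinn-Phong term), the function names, and the 0..1 normal-map encoding are assumptions for illustration only.

```hlsl
// A deliberately simple specular term (Blinn-Phong), just to have something
// non-linear to shade with; any real BRDF would do.
float3 SpecularBRDF(float3 n, float3 lightDir, float3 viewDir)
{
    float3 h = normalize(lightDir + viewDir);
    float  s = pow(saturate(dot(n, h)), 64.0);
    return float3(s, s, s);
}

// Hedged sketch: shade each texel in the bilinear footprint, then blend the
// shaded results (filtering after shading), rather than blending normals and
// shading once.
float3 FilterAfterShading(Texture2D<float3> normalMap, float2 uv, float2 texSize,
                          float3 lightDir, float3 viewDir)
{
    float2 t = uv * texSize - 0.5;   // texel-space position
    float2 w = frac(t);              // bilinear weights
    int2   base = int2(floor(t));

    float3 result = 0;
    [unroll]
    for (int j = 0; j < 2; ++j)
    {
        [unroll]
        for (int i = 0; i < 2; ++i)
        {
            // Decode and renormalize one texel's normal...
            float3 n = normalize(normalMap.Load(int3(base + int2(i, j), 0)) * 2 - 1);
            float  weight = (i ? w.x : 1 - w.x) * (j ? w.y : 1 - w.y);
            // ...then apply the non-linear lighting and blend the results.
            result += weight * SpecularBRDF(n, lightDir, viewDir);
        }
    }
    return result;
}
```

Four texture loads and four BRDF evaluations per lookup is exactly the cost the paper wants to avoid paying.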
DLSS to the Rescue
The observation in this paper is that DLSS exists and works pretty well as a technique to achieve visual results similar to supersampling at a lower computational cost. The idea is to unashamedly render a frame with aliasing and then run a post-processing step which passes the current frame (and previous frames as well) to an AI model which generates a nice-looking image.
Given that DLSS works, this paper advocates for removing axiom #2. Bilinear filtering may be “free”, but it is also sinful. Stick with the morally superior approach of passing unfiltered texel values directly to pixel shaders. But which texel value should be selected? Monte Carlo estimation is used: randomly select a single texel, assigning higher probabilities to texels that lie closer to the texture coordinates computed by the pixel shader.
The paper describes two approaches for Monte Carlo estimation: FRS (discrete) and FIS (continuous). Fig. 8 illustrates these two approaches:
Source: https://doi.org/10.1145/3651293
For more details, the HLSL code in S.4. is pretty easy to comprehend.
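As a rough illustration of the discrete flavor (not the paper's listing), here is a sketch of stochastic bilinear selection: on each axis, jump to the neighboring texel with probability equal to that axis's bilinear weight, then read exactly one texel with point sampling. The function names are illustrative, and `rand2` is assumed to be two independent uniform random numbers in [0, 1) per pixel (e.g. from a blue-noise texture).

```hlsl
// Hedged sketch of discrete stochastic bilinear selection: one texel read,
// no blending of values before shading.
float4 StochasticBilinearSample(Texture2D tex, SamplerState pointSampler,
                                float2 uv, float2 texSize, float2 rand2)
{
    float2 t = uv * texSize - 0.5;   // texel-space position
    float2 w = frac(t);              // bilinear weights in [0, 1)
    float2 base = floor(t);

    // On each axis, pick the +1 neighbor with probability equal to its weight.
    float2 pick = step(rand2, w);    // 1.0 where rand2 <= w

    // Exactly one texel is read, unfiltered. The expected value of the
    // estimator matches the bilinear result, but the shader always sees a
    // real texel value rather than a blend.
    float2 uvSnapped = (base + pick + 0.5) / texSize;
    return tex.SampleLevel(pointSampler, uvSnapped, 0);
}
```

The noise this introduces is exactly what the DLSS/TAA pass is expected to average away across pixels and frames, replacing the explicit weighted average from the reference version.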
Wider Filters
Another benefit of doing away with axiom #2 is the ability to use more sophisticated filters beyond {bi,tri}linear at only a small performance cost. For example, a bicubic filter can be implemented with this scheme, and the number of texels read per lookup stays constant (one) no matter how wide the filter is. This is much faster than typical bicubic filtering implementations, which require the pixel shader to make many calls to the texture sampling hardware.
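As a sketch of how that might look (again, not the paper's code): for a cubic B-spline filter, whose four per-axis weights are non-negative and sum to one, you can walk the per-axis CDF and pick a single tap with probability proportional to its weight. The weight formulas below are the standard cubic B-spline ones; everything else is illustrative.

```hlsl
// Hedged sketch: choose one of the four cubic B-spline taps on one axis,
// with probability proportional to its filter weight. f is the fractional
// texel position on that axis, rand is uniform in [0, 1).
int PickCubicTap(float f, float rand)
{
    // Standard cubic B-spline weights; non-negative and summing to 1.
    float w[4] = {
        (1 - f) * (1 - f) * (1 - f) / 6,
        (3 * f * f * f - 6 * f * f + 4) / 6,
        (-3 * f * f * f + 3 * f * f + 3 * f + 1) / 6,
        f * f * f / 6
    };

    // Walk the CDF and return the chosen tap as an offset in -1..+2.
    float cdf = 0;
    [unroll]
    for (int i = 0; i < 3; ++i)
    {
        cdf += w[i];
        if (rand < cdf)
            return i - 1;
    }
    return 2;
}
```

Doing this independently on x and y yields a single texel to load, so the bicubic lookup costs one texture read instead of the usual four bilinear fetches or sixteen point fetches.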
Results
Fig. 9 illustrates the benefits of not violating axiom #1. Compare (a) to {(e) and (g)}.
Source: https://doi.org/10.1145/3651293
Fig. 1 illustrates the performance gain from implementing tricubic filters with this approach:
Source: https://doi.org/10.1145/3651293
Dangling Pointers
Monte Carlo sampling seems ripe for hardware implementation. It seems like a minor addition to filtering HW to generate a random number and use that to randomly select a single texel rather than compute a weighted average. Woe to the engineer who gets to write the conformance test given the randomness involved.
It would be nice to see a comparison of this approach to other possibilities which allow filtering to take place after a non-linear function has been applied. Shooting from the hip:
- The `sample` interpolation modifier in HLSL allows a shader to indicate that a particular input has a different value for each (render target) sample. A smart driver can see that some inputs have the `sample` modifier while others do not, and it can rearrange the computation so that code which only depends on per-pixel inputs runs once per pixel. You can think of this like a `fork` model: you start with a thread per pixel which runs part of the shader, and then that thread `fork`s into one thread per sample, which runs the rest of the shader. I wonder if `join` could be added to this scheme to create a `fork-join` model. A `join` intrinsic could indicate that all per-sample values should be converted back into a per-pixel value with an averaging operation.
- Another approach is object space lighting, where the non-linear function runs in a separate pass and stores outputs in an intermediate texture, which can then be filtered by texture sampling from the pixel shader.