Introduction
The Name
In Slavic mythology, Leshy is a guardian of forests, which makes it a perfect name for a vegetation-related system.
Application
Leshy is a system for rendering vegetation and streaming vegetation placement data.
VSP
Conquest used VegetationStudioPro (VSP) as its vegetation solution; because the world is procedural, vegetation placement had to happen at runtime.
Fall of Avalon has a static, offline pre-prepared world, which means more optimizations can be introduced.
VSP has storage for offline baking, but VSP is no longer developed nor maintained. That means issues won't be fixed by a package update; we would need to fix them manually, and the VSP code is complicated.
Issues
- Shaders - VSP uses the "old" instancing model, which makes shaders much harder to maintain. Every time we upgraded Unity, all vegetation glitched and required long days of fixing.
- Variation limitation - we needed the same prefab spawned with different rules; for VSP every such combination was a separate item, resulting in a lot of small jobs and memory duplication.
- Slow open-world streaming - streaming was slow and required a lot of small jobs to be scheduled.
- High memory footprint - rotation and scale weren't compressed.
What was good
- Editor setup - level designers and artists were already familiar with VSP setup
- Procedural placement - the procedural placement algorithm is the strongest part of VSP
- Baking - VSP can output generated vegetation placement into a file
Solution
VSP
From the lists above you can see that the editor-mode part of VSP was not bad, so we kept it. That means vegetation setup, placement, and baking are still done by VSP. Our system then picks up the baked data and transforms it (at bake time), and at runtime it streams cells, renders vegetation, and places colliders near the player.
Entities
As you know from the previous post, we had experience with Entities. Entities are promoted as fast to spawn, fast to render, with low memory overhead. Sounds like a perfect solution.
With the transformed data in place, I did a few tests. Every test failed: having hundreds of thousands of vegetation entities spawned and removed is too much for Entities, the operations were too slow.
Additionally, the memory footprint was a few times the number from VSP, as some additional components were required and the transform is a float4x4.
Custom solution
After the Entities attempts failed, only fully custom solutions remained:
- Rewrite the VSP jobs and how VSP deals with streaming and variants. The shader problem would still remain.
- A custom GPU-driven pipeline - the most complicated option, with the most unknowns.
- BRG-based rendering - shiny new tech that powers Entities, with little documentation (still more than the other approaches), but Entities itself serves as a good usage example.
With all that, we chose BRG. A GPU-driven pipeline might yield better performance and memory, but the shaders were an unknown, and there was no guarantee we would actually achieve those gains. After the time spent on the Entities attempts, we wanted a more guaranteed (and faster to implement) path.
BRG
BatchRendererGroup (BRG) is the heart of modern Unity rendering. I mentioned that Entities Graphics uses it, but the new GPU Resident Drawer also uses it.
BRG has a few parts:
- Resources registration - register meshes and materials to obtain burstable handles to them
- Batches - a batch is a graphics buffer plus a description of the data it contains
- Culling callback - a callback with a struct to fill with drawing data; in most cases you will also perform frustum culling here, and possibly other culling if you use it (custom LOD solution, custom occlusion culling). You decide which rendering view types you support by passing them to SetEnabledViewTypes.
Resources registration
You need to register meshes and materials by calling RegisterMesh and RegisterMaterial respectively, which return a BatchMeshID and a BatchMaterialID. Once you are done with an asset, you must call UnregisterMesh or UnregisterMaterial with the appropriate ID.
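A minimal registration sketch, assuming a renderer class that owns the BatchRendererGroup (the class name, fields, and the empty culling callback are illustrative, not Leshy's actual code):

using System;
using Unity.Jobs;
using UnityEngine;
using UnityEngine.Rendering;

public class LeshyLikeRenderer : IDisposable {
    BatchRendererGroup _brg;
    BatchMeshID _meshID;
    BatchMaterialID _materialID;

    public void Init(Mesh mesh, Material material) {
        // OnPerformCulling is the culling callback described later
        _brg = new BatchRendererGroup(OnPerformCulling, IntPtr.Zero);
        _meshID = _brg.RegisterMesh(mesh);
        _materialID = _brg.RegisterMaterial(material);
    }

    public void Dispose() {
        _brg.UnregisterMesh(_meshID);
        _brg.UnregisterMaterial(_materialID);
        _brg.Dispose();
    }

    JobHandle OnPerformCulling(BatchRendererGroup rendererGroup, BatchCullingContext cullingContext, BatchCullingOutput cullingOutput, IntPtr userContext) {
        // Fill cullingOutput here (see the Culling callback section)
        return default;
    }
}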
Batches
A batch consists of metadata and a buffer. You register a batch with AddBatch, which gives you a BatchID.
Metadata is an array of MetadataValue. A MetadataValue defines which shader property it describes, how to interpret the data, and where in the buffer it starts.
static MetadataValue CreateMetadataValue(int nameID, int gpuOffset, bool isPerInstance) {
    const uint IsPerInstanceBit = 0x80000000;
    return new MetadataValue {
        NameID = nameID,
        Value = (uint)gpuOffset | (isPerInstance ? IsPerInstanceBit : 0),
    };
}
The most common are:
- unity_ObjectToWorld - the local-to-world transform, stored as a packed matrix of 12 floats (the constant last row of the full 4x4 matrix is dropped)
- unity_WorldToObject - the inverse of the above
- unity_MatrixPreviousM - the previous frame's local-to-world transform, required to generate motion vectors
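The example below uses a PackedMatrix struct that is not part of the Unity API; a minimal sketch of such a 12-float packed matrix could look like this. Note that sizeof on a custom struct requires an unsafe context; UnsafeUtility.SizeOf<PackedMatrix>() is a safe alternative.

using System.Runtime.InteropServices;
using Unity.Mathematics;

// Hypothetical packed matrix: the upper 3x4 part of a 4x4 TRS matrix,
// stored as four float3 columns (48 bytes instead of 64).
[StructLayout(LayoutKind.Sequential)]
struct PackedMatrix {
    public float3 c0, c1, c2, c3;

    public static PackedMatrix From(float4x4 m) {
        return new PackedMatrix {
            c0 = m.c0.xyz,
            c1 = m.c1.xyz,
            c2 = m.c2.xyz,
            c3 = m.c3.xyz,
        };
    }
}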
Example
Let's say we want a batch with at most 3 instances (to keep the visualization manageable), support for motion vectors, a per-instance _TintColor, and a per-batch _DebugColor.
Create batch buffer:
var maxInstances = 3;
var uintMatrixSize = sizeof(PackedMatrix) / sizeof(uint);
var uintColorSize = sizeof(Color) / sizeof(uint);
var transformsSize = uintMatrixSize * 3 * maxInstances; // (unity_ObjectToWorld, unity_WorldToObject, unity_MatrixPreviousM)
var colorsSize = uintColorSize * 1 * maxInstances; // _TintColor
var debugColorSize = uintColorSize * 1 * 1; // Per-batch single value of _DebugColor
var fullSize = transformsSize + colorsSize + debugColorSize;
var graphicsBuffer = new GraphicsBuffer(GraphicsBuffer.Target.Raw, fullSize, sizeof(uint));
Now create metadata descriptor:
var metadata = new NativeArray<MetadataValue>(5, Allocator.Temp, NativeArrayOptions.UninitializedMemory);
var offset = 0;
metadata[0] = CreateMetadataValue(Shader.PropertyToID("_DebugColor"), offset, false); // I like to keep per batch data first
offset += sizeof(Color); // Per-batch means it is just a single value
metadata[1] = CreateMetadataValue(Shader.PropertyToID("unity_ObjectToWorld"), offset, true);
offset += sizeof(PackedMatrix) * maxInstances;
metadata[2] = CreateMetadataValue(Shader.PropertyToID("unity_WorldToObject"), offset, true);
offset += sizeof(PackedMatrix) * maxInstances;
metadata[3] = CreateMetadataValue(Shader.PropertyToID("unity_MatrixPreviousM"), offset, true);
offset += sizeof(PackedMatrix) * maxInstances;
metadata[4] = CreateMetadataValue(Shader.PropertyToID("_TintColor"), offset, true);
offset += sizeof(Color) * maxInstances;
// If you have more data, keep adding it in the same way
Register the batch and dispose of the memory that is no longer needed:
var batchID = _brg.AddBatch(metadata, graphicsBuffer.bufferHandle);
metadata.Dispose();
The resulting buffer layout looks like the image below.
Since that semi-continuous block is hard to read, the second image breaks it down into labelled regions:
Culling callback
This callback must fill the draw data inside BatchCullingOutput. It returns a JobHandle, so the code can be Burst-compiled. The callback also receives a BatchCullingContext, which contains all the information needed to perform the right culling and produce the draw data.
You should perform a few operations here:
- Frustum culling
- Layer culling
- Split calculation
If you implement your own occlusion culling you can apply it here as well; Leshy doesn't have occlusion culling.
You also need to declare which view types your callback supports. The following views are available:
- Camera - the main rendering (GBuffer pass), called once per camera
- Light - the shadow map passes, called for every shadow-casting light
- Picking - when you click in the Scene view, this callback is called to determine what you clicked on
- SelectionOutline - as the name indicates, used for outline rendering in the Scene view
- Filtering - I think it is used to gray out non-matching objects when searching in the Hierarchy window
For the main functionality you want to handle Camera and Light.
For the Camera view the splitMask is not important, but you must assign it correctly for shadows.
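A skeleton of the culling callback, continuing the class from the earlier sketch and assuming we only enable the Camera and Light views (the job scheduling is left as a comment because it is specific to each system):

// Tell BRG which view types our callback handles, e.g. in Init()
_brg.SetEnabledViewTypes(new[] { BatchCullingViewType.Camera, BatchCullingViewType.Light });

JobHandle OnPerformCulling(BatchRendererGroup rendererGroup, BatchCullingContext cullingContext, BatchCullingOutput cullingOutput, IntPtr userContext) {
    if (cullingContext.viewType != BatchCullingViewType.Camera &&
        cullingContext.viewType != BatchCullingViewType.Light) {
        return default; // We declared only Camera and Light, so nothing to do here
    }

    // cullingContext.cullingPlanes and cullingContext.cullingSplits describe the frustum(s);
    // for Light views there can be several splits, and the split visibility mask must be set per draw.
    // Here you would schedule Burst jobs that perform frustum culling,
    // fill cullingOutput.drawCommands[0], and return the combined JobHandle.
    return default;
}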
Data
To prepare for the streaming chapter, you need to know how Leshy represents its data.
A vegetation item has rendering data (meshes, materials, LOD distances, bounds, and an optional collider prefab).
Each vegetation item has its own 2D grid, and each instance is assigned to a grid cell. An instance is represented by a float3 position, a half4 quaternion and a half3 scale. The rotation could be a single uint, but I wasn't aware of that compression method at the time. I said 2D grid cell, but after the instances are assigned, a cell is no longer represented as a square; instead it is represented as a circle.
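A sketch of what such an instance record might look like (the struct name is hypothetical); at 12 + 8 + 6 = 26 bytes it is well under half the 64 bytes a full float4x4 matrix would take:

using Unity.Mathematics;

// Hypothetical CPU-side instance record: 26 bytes instead of a 64-byte float4x4
struct VegetationInstance {
    public float3 position;  // 12 bytes
    public half4 rotation;   // 8 bytes, quaternion compressed to half precision
    public half3 scale;      // 6 bytes
}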
The last piece is the GPU data; its format was described in the BRG section. Leshy has a fixed cost here, as it allocates all of the GPU memory at initialization.
Streaming
With rendering handled, the next part was streaming. It is very important, as it minimizes the required memory; it also reduces the work done inside the culling callback, because cells that are not loaded don't need to be processed.
Streaming has the following steps:
- Calculate visible cells - a cell has a position and a radius; cell visibility is determined by two factors which are ORed together: cell frustum visibility and cell distance to the camera. The cell is checked as a circle, since sphere-vs-frustum is faster than box-vs-frustum (see the sketch after this list).
- Cell content loading - cell data is stored in a file; to start loading we need a free loading slot (so we don't load too much at once), then loading from the file into a CPU buffer starts, which takes several frames.
- Filter data - some vegetation types support lowering density (so you can reduce vegetation density to boost performance); to do that we randomly remove X% of the loaded instances.
- Cell transfer to GPU - the CPU buffer is then transferred to the BRG buffer on the GPU, which requires a few actions:
  - Find free GPU block(s) - fitting a single vegetation cell into a single GPU memory block is preferred, but sometimes it must be split into multiple blocks.
  - Copy minimal CPU data to the transfer buffer - the CPU data needs padding to fit the GPU layout, so MemCpyStride is used to copy the CPU buffer into a CPU transfer buffer which is then transferred to GPU memory.
  - Expand GPU data - the padded CPU data is still not what BRG requires, so we transfer the minimal data to the GPU and a compute shader then copies and expands it from the small transfer buffer into the big BRG buffer.
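A minimal sketch of the cell visibility test from the first step, assuming the frustum planes are given as float4 plane equations (xyz = normal, w = distance) and all names are hypothetical:

using Unity.Collections;
using Unity.Mathematics;

static bool IsCellVisible(float3 cellCenter, float cellRadius,
                          float3 cameraPosition, float spawnDistance,
                          NativeArray<float4> frustumPlanes) {
    // Distance factor: the cell is wanted if its circle overlaps the spawn radius
    var maxDistance = spawnDistance + cellRadius;
    var withinDistance = math.distancesq(cellCenter, cameraPosition) < maxDistance * maxDistance;

    // Frustum factor: sphere vs frustum, cheaper than box vs frustum
    var inFrustum = true;
    for (var i = 0; i < frustumPlanes.Length; i++) {
        var plane = frustumPlanes[i];
        var signedDistance = math.dot(plane.xyz, cellCenter) + plane.w;
        if (signedDistance < -cellRadius) {
            inFrustum = false;
            break;
        }
    }

    // The two factors are ORed, as described above
    return withinDistance || inFrustum;
}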
Streaming out is simple: just clear the CPU buffers and mark the CPU representation of the GPU memory block(s) as free.
Collisions
Large vegetation (trees, boulders) must have collision, otherwise it would feel very odd. Having colliders on all spawned vegetation would kill Unity, so the idea is to place colliders only around the player. To do that, we follow these steps (a sketch follows the list):
- Dispose of the bitmasks of removed cells
- For each cell, check if it is within the radius
- If it is, check each instance and generate a bitmask indicating which instances should have a collider
- Move pooled colliders to the places where we need them
- Hide colliders which are no longer used
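A rough sketch of the per-cell part of that pass, assuming the instances are kept on the CPU and a simple collider pool handles the actual GameObjects (VegetationCell, ColliderPool and their methods are hypothetical):

using Unity.Collections;
using Unity.Mathematics;

// Per cell: build a bitmask of instances that need a collider,
// then let the pool move or hide real colliders to match it.
void UpdateColliders(float3 playerPosition, float colliderRadius, VegetationCell cell, ColliderPool pool) {
    // Skip cells whose circle cannot overlap the collider radius
    if (math.distance(cell.center, playerPosition) > colliderRadius + cell.radius) {
        pool.ReleaseCell(cell);
        return;
    }

    var radiusSq = colliderRadius * colliderRadius;
    var mask = new NativeBitArray(cell.instances.Length, Allocator.Temp);
    for (var i = 0; i < cell.instances.Length; i++) {
        var needsCollider = math.distancesq(cell.instances[i].position, playerPosition) <= radiusSq;
        mask.Set(i, needsCollider);
    }

    // The pool compares the new mask with the previous one, moves pooled colliders
    // to newly flagged instances and hides colliders for instances that dropped out.
    pool.ApplyMask(cell, mask);
    mask.Dispose();
}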
Quality
Vegetation has its own setting in the graphics settings. The artists must set up what each quality level does for each vegetation type, for the following properties (a sketch of such a config follows the list):
- Density - artists generate vegetation at Ultra density (1.0), and we can specify a reduction for lower quality levels; density cannot change for vegetation with collisions, as that would affect gameplay
- SpawnDistance - defines the distance at which a cell is marked to be spawned
- Shadows - whether the vegetation should cast shadows
- BillboardDistance - for vegetation with billboards, defines where the billboard starts
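A sketch of what the per-type, per-quality-level configuration could look like (the type and field names are assumptions, not Leshy's actual API):

using System;

// Hypothetical per-quality-level settings for one vegetation type
[Serializable]
struct VegetationQualitySettings {
    [UnityEngine.Range(0f, 1f)]
    public float density;           // 1.0 = Ultra; ignored for vegetation with collisions
    public float spawnDistance;     // distance at which cells are marked to spawn
    public bool castShadows;
    public float billboardDistance; // only used by vegetation with billboards
}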
Handplacing
The last part of the system, introduced after a few months, is the possibility to place vegetation by hand; Leshy then picks it up during baking. For the Leshy runtime there is no difference between procedural and hand-placed vegetation, it all ends up in the same data stream.
Issue
Leshy entered the main branch with no major issues or bugs. Thanks to the knowledge from Drake development, the rollout was smooth. For a few months only minor workflow improvements were needed, along with optimizations of the baking process.
The biggest issue was discovered when we started to play on Xbox: the slower the game ran, the more weirdly the vegetation behaved. We tracked the issue down to all the new graphics APIs (DX12 on PC, DX12 on Xbox, and the PS5 API); deeper investigation revealed that two system changes were required.
First: group the data binding and the compute shader dispatch within the command queue. That eliminated part of the misbehavior, but not all of it.
Second: don't use LockBufferForWrite. It sounds like a great feature for transferring CPU data to the GPU transfer buffer, which is then expanded by a compute shader, but even with a fence and waiting, the data was sometimes corrupted. It may be that we did something wrong, but the bug appeared mostly on Xbox, so debugging became very time consuming. With that in mind, we decided to pay an additional 0.01 ms and go with the ordinary SetData.
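A sketch of the path we ended up with, assuming a transfer buffer plus an expansion compute shader (buffer, shader, kernel and parameter names are hypothetical): SetData uploads the packed CPU data, and the buffer bindings are grouped with the dispatch in a single command buffer.

using Unity.Collections;
using UnityEngine;
using UnityEngine.Rendering;

void UploadCell(NativeArray<uint> packedCellData, int instanceCount, int gpuBlockOffset,
                GraphicsBuffer transferBuffer, GraphicsBuffer brgBuffer,
                ComputeShader expandShader, int expandKernel) {
    // Plain SetData instead of LockBufferForWrite
    transferBuffer.SetData(packedCellData, 0, 0, packedCellData.Length);

    // Keep buffer binding and dispatch together in one command buffer
    var cmd = new CommandBuffer { name = "Leshy cell upload" };
    cmd.SetComputeBufferParam(expandShader, expandKernel, "_TransferBuffer", transferBuffer);
    cmd.SetComputeBufferParam(expandShader, expandKernel, "_BrgBuffer", brgBuffer);
    cmd.SetComputeIntParam(expandShader, "_InstanceCount", instanceCount);
    cmd.SetComputeIntParam(expandShader, "_GpuBlockOffset", gpuBlockOffset);
    cmd.DispatchCompute(expandShader, expandKernel, (instanceCount + 63) / 64, 1, 1);
    Graphics.ExecuteCommandBuffer(cmd);
    cmd.Release();
}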
Closing
You can see how important experience is. If we had not done that exploration during Drake's development, the idea to just use BRG in a custom way would never have come to mind. We would probably have gone with a custom GPU pipeline and complex shader tooling, and encountered more roadblocks. BRG is nicely integrated into the engine (especially picking and shader generation), provides a burstable API, and eliminates a lot of the manual wiring and boilerplate a similar instancing system written from scratch would need, so a huge amount of time was saved.
With more time I would add vegetation harvesting (so a particular instance can be removed from rendering) and LOD cross-fade. Probably in reverse order, because the lack of LOD cross-fade causes visible, annoying pop-in.
BRG
When Leshy was developed, BRG was a new thing that felt like the future of Unity rendering, with an underlying promise of expansion, new features and so on. That turned out to be partly true: the new GPU Resident Drawer is based on BRG, and it also has GPU occlusion culling. Unfortunately, that occlusion culling is not built in a way that custom BRG-based systems can use, which is a shame. That puts a big question mark over the future of BRG. Will it be frozen in its current state for custom projects and only be expanded and developed for Unity's internal systems?
Other missing pieces are:
- Batch lighting - right now you need to deal with each shadow-casting light separately; more about that issue here.
- Keyword override - the possibility to enable a keyword for every registered material would be great; right now, if you want such behaviour, you need to create a material copy, manually enable the keyword, and somehow track changes to the original material and reapply them to the copy.
- Depth prepass view - right now the camera view is used for the depth prepass as well as for the GBuffer passes. With separate culling I could skip the depth prepass for most vegetation and keep it only for big, close renderers. That way, about 20% of the draws would give 80% of the final depth, which is how most engines do it.