beefed.ai

Posted on Apr 6 • Originally published at beefed.ai

BVH Refit vs Rebuild Strategies for Dynamic Scenes


You pushed animated characters into the scene and the renderer either hiccups (you hit a per-frame rebuild) or slowly loses traversal efficiency (you only refit and the tree quality degrades). Those are the two visible failure modes: hard stalls from rebuild spikes, or a steady drop in rays/sec and increased shader work because node overlap ballooned. You need a principled way to decide which update strategy to use and how to schedule work so the pipeline never blinks.

Contents

Quantifying the trade-off: when refit beats rebuild
How to refit well: algorithms, error bounds, and practical tricks
Multi-level and hybrid hierarchies: BLAS/TLAS, partial rebuilds, and scheduling
Measuring the impact: build time, rays/sec, and frame stability
Practical protocol: checklist and per-frame decision tree

Quantifying the trade-off: when refit beats rebuild

Start with the cost model and the concrete knobs the GPU APIs give you. A full, SAH-optimized BVH rebuild (top‑down SAH or spatial-splitting builders) typically produces the best trace performance but costs the most CPU/GPU time; fast parallel builders such as HLBVH/treelets let you push rebuilds toward real-time rates, but they still cost notably more than a simple refit on the same input set. A BVH refit, by contrast, merely recomputes leaf AABBs and propagates them up the existing topology: it is much cheaper, but it can increase traversal cost over time by introducing overlap and elongated nodes. These trade-offs are documented in both vendor guidance and the academic studies listed under Sources.

Key, practical rules extracted from the API and industry guidance:

The DXR/Vulkan acceleration-structure model separates BLAS and TLAS and exposes an allow-update build flag (D3D12_RAYTRACING_ACCELERATION_STRUCTURE_BUILD_FLAG_ALLOW_UPDATE in DXR, VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_UPDATE_BIT_KHR in Vulkan) that lets you update an AS in place instead of rebuilding it; updates are faster but constrained (no topology or primitive-count changes). Use these flags where topology is stable.
Refit is often about an order of magnitude cheaper in real engines and libraries: measurement and experience suggest a refit can be roughly 5–20× faster than a full SAH rebuild, depending on builder choice and hardware. The quality loss compounds at runtime, though, unless you take corrective measures.

Decision formula (practicalized)

When only instance transforms changed (rigid transforms): update TLAS / instance transforms — almost free.
When geometry vertices moved modestly (small deformation): perform refit on the BLAS and measure a quality metric (see next sections).
When topology or primitive count changed, or when a measured quality metric exceeds your threshold: schedule a rebuild of that BLAS.
When many BLASes degrade simultaneously, amortize rebuilds across frames and prefer fast-build modes where available.

A simple quantitative heuristic to start with

Compute SAH_delta = (SAH_after_refit - SAH_before) / SAH_before, where SAH_before is the cost recorded right after the last full build, so that degradation accumulates across refits instead of being measured one frame at a time.
If SAH_delta > 0.10 (10%) and the BLAS is on the hot path (large screen-space contribution), prefer rebuild; otherwise keep refit and mark for periodic rebuild. The 10% threshold is a rule of thumb, not a measured constant: tune it against the ray-throughput regressions you observe on your own content and hardware.
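The heuristic above can be sketched in a few functions. This is a minimal illustration, assuming a flat node array and unit traversal/intersection costs; `SahNode`, `sahCost`, `sahDelta`, `preferRebuild`, `kTraverse` and `kIntersect` are hypothetical names, not part of any API.

```cpp
#include <cassert>
#include <vector>

struct AABB {
    float min[3];
    float max[3];
};

// Surface area of a box; SAH uses it as a stand-in for hit probability.
inline float surfaceArea(const AABB& b) {
    const float dx = b.max[0] - b.min[0];
    const float dy = b.max[1] - b.min[1];
    const float dz = b.max[2] - b.min[2];
    return 2.0f * (dx * dy + dy * dz + dz * dx);
}

struct SahNode {
    AABB bounds;
    int primCount;  // > 0 for a leaf, 0 for an interior node
};

// Area-weighted SAH cost, left unnormalized: dividing by a fixed reference
// area would cancel out in the ratio computed by sahDelta().
// kTraverse and kIntersect are assumed relative costs, not measured values.
float sahCost(const std::vector<SahNode>& nodes,
              float kTraverse = 1.0f, float kIntersect = 1.0f) {
    float cost = 0.0f;
    for (const SahNode& n : nodes) {
        const float area = surfaceArea(n.bounds);
        cost += (n.primCount > 0) ? area * kIntersect * n.primCount
                                  : area * kTraverse;
    }
    return cost;
}

// Relative SAH growth since the reference build.
float sahDelta(float sahBefore, float sahAfterRefit) {
    assert(sahBefore > 0.0f);
    return (sahAfterRefit - sahBefore) / sahBefore;
}

// The 10% default mirrors the rule of thumb in the text.
bool preferRebuild(float delta, bool onHotPath, float threshold = 0.10f) {
    return onHotPath && delta > threshold;
}
```

On a real driver-built BLAS you cannot read node bounds directly, so in practice this kind of proxy is computed on a CPU-side mirror of the tree or replaced by a measured traversal counter.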

How to refit well: algorithms, error bounds, and practical tricks

Refit basics — what to do and why

The canonical refit() operation: recompute leaf AABBs from current vertex positions, then perform a bottom-up pass that recomputes ancestor bounds from children. This is O(n_nodes) and is trivially parallelizable per subtree. Most libraries provide a refit() primitive or an option in their builder.

Pseudocode (iterative bottom-up refit)

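// C++-style pseudocode (single-threaded form for clarity)
// postorder_nodes(), computeLeafBounds() and unionBounds() are placeholder helpers
void refitBVH(Node *root) {
    // assumes vertex positions are already updated for this frame
    // post-order visits children before their parent, so child bounds are fresh
    for (Node *n : postorder_nodes(root)) {
        if (n->isLeaf()) {
            n->bounds = computeLeafBounds(n);  // tight box around the leaf's primitives
        } else {
            // named unionBounds because `union` is a reserved word in C++
            n->bounds = unionBounds(n->left->bounds, n->right->bounds);
        }
    }
}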
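For a version that compiles as written, one option is a flat node array swept in reverse. This sketch assumes children are always stored at higher indices than their parent (common for depth-first layouts); `BvhNode`, `Triangle`, `Vec3` and the helper names are illustrative, not taken from a specific library.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

struct AABB { float min[3]; float max[3]; };
struct Vec3 { float c[3]; };
struct Triangle { int v[3]; };  // indices into the vertex array

struct BvhNode {
    AABB bounds;
    int left;       // child indices, used when primCount == 0
    int right;
    int firstPrim;  // leaf primitive range, used when primCount > 0
    int primCount;
    bool isLeaf() const { return primCount > 0; }
};

inline AABB unionBounds(const AABB& a, const AABB& b) {
    AABB r;
    for (int i = 0; i < 3; ++i) {
        r.min[i] = std::min(a.min[i], b.min[i]);
        r.max[i] = std::max(a.max[i], b.max[i]);
    }
    return r;
}

// Tight box around every vertex referenced by the leaf's triangles.
AABB computeLeafBounds(const BvhNode& leaf, const std::vector<Triangle>& tris,
                       const std::vector<Vec3>& verts) {
    const float inf = std::numeric_limits<float>::infinity();
    AABB b = {{inf, inf, inf}, {-inf, -inf, -inf}};
    for (int t = leaf.firstPrim; t < leaf.firstPrim + leaf.primCount; ++t) {
        for (int k = 0; k < 3; ++k) {
            const Vec3& p = verts[tris[t].v[k]];
            for (int a = 0; a < 3; ++a) {
                b.min[a] = std::min(b.min[a], p.c[a]);
                b.max[a] = std::max(b.max[a], p.c[a]);
            }
        }
    }
    return b;
}

// Assumes children are stored at higher indices than their parent, so a
// reverse sweep visits every child before the node that owns it.
void refitBVH(std::vector<BvhNode>& nodes, const std::vector<Triangle>& tris,
              const std::vector<Vec3>& verts) {
    for (int i = static_cast<int>(nodes.size()) - 1; i >= 0; --i) {
        BvhNode& n = nodes[i];
        if (n.isLeaf()) {
            n.bounds = computeLeafBounds(n, tris, verts);
        } else {
            assert(n.left > i && n.right > i);  // layout assumption
            n.bounds = unionBounds(nodes[n.left].bounds, nodes[n.right].bounds);
        }
    }
}
```

The reverse sweep avoids an explicit stack and touches memory linearly, which is usually what you want on the CPU; a GPU refit would more likely use per-node atomic counters to propagate upward from the leaves.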

Selective / incremental refit

Avoid touching the whole tree every frame. Collect a set of modified leaves (bulk updates) and walk ancestors until the propagated bounds no longer change. Many systems (three-mesh-bvh, Warp, Embree-like implementations) implement a refit(nodeSet) that limits work to affected nodes. This reduces memory traffic and avoids redundant work.
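A selective refit of this kind might look like the sketch below. It assumes each node stores a parent index and that the caller has already written new bounds into the dirty leaves; `TreeNode` and `refitDirtyLeaves` are hypothetical names and do not reproduce the API of any of the libraries mentioned.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct AABB { float min[3]; float max[3]; };

struct TreeNode {
    AABB bounds;
    int parent;  // -1 for the root
    int left;    // -1 for leaves
    int right;
};

inline AABB unionBounds(const AABB& a, const AABB& b) {
    AABB r;
    for (int i = 0; i < 3; ++i) {
        r.min[i] = std::min(a.min[i], b.min[i]);
        r.max[i] = std::max(a.max[i], b.max[i]);
    }
    return r;
}

inline bool sameBounds(const AABB& a, const AABB& b) {
    for (int i = 0; i < 3; ++i) {
        if (a.min[i] != b.min[i] || a.max[i] != b.max[i]) return false;
    }
    return true;
}

// Precondition: the caller has already written new bounds into each dirty leaf.
// Returns how many interior nodes were recomputed, a rough measure of work done.
int refitDirtyLeaves(std::vector<TreeNode>& nodes,
                     const std::vector<int>& dirtyLeaves) {
    int recomputed = 0;
    for (int leaf : dirtyLeaves) {
        for (int i = nodes[leaf].parent; i >= 0; i = nodes[i].parent) {
            TreeNode& n = nodes[i];
            const AABB merged =
                unionBounds(nodes[n.left].bounds, nodes[n.right].bounds);
            ++recomputed;
            // If this node did not change, none of its ancestors can change.
            if (sameBounds(merged, n.bounds)) break;
            n.bounds = merged;
        }
    }
    return recomputed;
}
```

The early exit is what bounds the cost: a leaf that moves inside its parent's existing box touches one interior node instead of a full root path. When many leaves share ancestors, a level-by-level pass over a deduplicated dirty set avoids recomputing the same node repeatedly.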

Error bounds and motion envelopes

Compute a conservative bound on vertex motion since the last rebuild: max_displacement = max(|v_new - v_old|), taken over the vertices of a mesh or of a single primitive. Expanding each primitive's AABB by that displacement on every side keeps the bounds valid without an immediate rebuild, because no vertex can leave a box that was grown by at least the distance it moved. For animated skinned meshes, compute per-frame bounds in object space and transform them into world space. Use those envelopes to decide whether a refit will produce overly large parent AABBs. The max_displacement approach is a simple way to get a provable, if loose, bound on refit error.
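As a small illustration of the motion-envelope idea, the sketch below measures the largest vertex displacement between two poses and grows a rest-pose box by that amount. The type and function names (`maxDisplacement`, `expand`, `contains`) are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
struct AABB { Vec3 min; Vec3 max; };

// Largest Euclidean distance any vertex moved between the two poses.
float maxDisplacement(const std::vector<Vec3>& oldPos,
                      const std::vector<Vec3>& newPos) {
    assert(oldPos.size() == newPos.size());
    float maxSq = 0.0f;
    for (std::size_t i = 0; i < oldPos.size(); ++i) {
        const float dx = newPos[i].x - oldPos[i].x;
        const float dy = newPos[i].y - oldPos[i].y;
        const float dz = newPos[i].z - oldPos[i].z;
        maxSq = std::max(maxSq, dx * dx + dy * dy + dz * dz);
    }
    return std::sqrt(maxSq);
}

// Grow a box by `margin` on every side. A vertex that moved at most `margin`
// in distance moved at most `margin` along each axis, so it stays inside.
AABB expand(const AABB& b, float margin) {
    return {{b.min.x - margin, b.min.y - margin, b.min.z - margin},
            {b.max.x + margin, b.max.y + margin, b.max.z + margin}};
}

bool contains(const AABB& b, const Vec3& p) {
    return p.x >= b.min.x && p.x <= b.max.x &&
           p.y >= b.min.y && p.y <= b.max.y &&
           p.z >= b.min.z && p.z <= b.max.z;
}
```

The envelope is deliberately loose: it trades some extra overlap for the guarantee that bounds stay valid between refits, so it suits lazy update schemes better than per-frame exact refits.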

Repairing topology: tree rotations, reinsertion, and local rebuilds

Refit preserves topology; when objects drift, topology becomes suboptimal. Use local restructuring: tree rotations, reinsertion of leaves, or small rebuilds of affected treelets to restore SAH quality without a global rebuild. Kopta et al. present a fast incremental update using rotations that trades a little build work per frame to avoid full rebuilds; Yoon et al. describe selective restructuring metrics for choosing nodes to modify. Those techniques get you most of the tracing quality back for a fraction of the rebuild cost.
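To make the rotation idea concrete, here is a simplified single-node rotation in the spirit of those techniques, not a reproduction of either paper's algorithm. It assumes unit SAH costs, so the gain is just the reduction in the sibling's surface area; `RotNode` and `rotateIfBeneficial` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

struct AABB { float min[3]; float max[3]; };

inline float surfaceArea(const AABB& b) {
    const float dx = b.max[0] - b.min[0];
    const float dy = b.max[1] - b.min[1];
    const float dz = b.max[2] - b.min[2];
    return 2.0f * (dx * dy + dy * dz + dz * dx);
}

inline AABB unionBounds(const AABB& a, const AABB& b) {
    AABB r;
    for (int i = 0; i < 3; ++i) {
        r.min[i] = std::min(a.min[i], b.min[i]);
        r.max[i] = std::max(a.max[i], b.max[i]);
    }
    return r;
}

struct RotNode {
    AABB bounds;
    int left;   // -1 for leaves
    int right;
    bool isLeaf() const { return left < 0; }
};

// Tries the rotations that swap one child of `n` with a grandchild under the
// other child, and applies the one that shrinks the sibling's box the most.
// The bounds of `n` itself do not change: it still covers the same leaves.
bool rotateIfBeneficial(std::vector<RotNode>& nodes, int n) {
    RotNode& node = nodes[n];
    if (node.isLeaf()) return false;
    int* childSlots[2] = {&node.left, &node.right};
    float bestGain = 0.0f;
    int* bestChild = nullptr;
    int* bestGrand = nullptr;
    int bestSibling = -1;
    for (int s = 0; s < 2; ++s) {
        const int sibling = *childSlots[1 - s];
        RotNode& sib = nodes[sibling];
        if (sib.isLeaf()) continue;
        int* grandSlots[2] = {&sib.left, &sib.right};
        for (int g = 0; g < 2; ++g) {
            const int staying = *grandSlots[1 - g];
            const AABB newSib = unionBounds(nodes[*childSlots[s]].bounds,
                                            nodes[staying].bounds);
            const float gain = surfaceArea(sib.bounds) - surfaceArea(newSib);
            if (gain > bestGain) {
                bestGain = gain;
                bestChild = childSlots[s];
                bestGrand = grandSlots[g];
                bestSibling = sibling;
            }
        }
    }
    if (bestChild == nullptr) return false;
    std::swap(*bestChild, *bestGrand);  // child moves down, grandchild moves up
    RotNode& sib = nodes[bestSibling];
    sib.bounds = unionBounds(nodes[sib.left].bounds, nodes[sib.right].bounds);
    return true;
}
```

Running a pass like this over the nodes touched by a refit costs a handful of box unions per node, which is why rotations fit inside a per-frame maintenance budget where a rebuild would not.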

Practical tricks that matter in production

Use conservative expansion (motion bounds) to avoid flicker when you do lazy refits. Expand tight bounds slightly to avoid oscillation between refit and rebuild decisions.
Keep vertex buffer layouts stable; many update APIs forbid changes to vertex formats or primitive counts when using updates — changing them forces a rebuild. Enforce topology-stability early in the asset pipeline.
Run refit on the GPU when you can: GPU-side refit implementations or LBVH-style fast rebuilds can hide latency of many updates, and asynchronous compute queues help hide the cost. Use worker threads to generate build commands and async compute for BLAS work.

Important: Refit is a cheap corrective. Treat local restructuring and periodic rebuilds as part of a continuous maintenance budget for your acceleration structures.

Multi-level and hybrid hierarchies: BLAS/TLAS, partial rebuilds, and scheduling

Why multi-level BVH is the practical default

The explicit TLAS/BLAS split (DXR/Vulkan) lets you avoid rebuilding geometry that does not deform: static geometry stays in compacted BLASes (fast trace), while dynamic objects go into separately managed BLASes that are updated, refit, or rebuilt on their own cadence. This separation is the single most practical lever for dynamic scenes.

Pattern: static BLAS + dynamic BLAS + frequent TLAS updates

Build static BLASes with PREFER_FAST_TRACE and compact them once. Build dynamic BLASes with ALLOW_UPDATE and either PREFER_FAST_BUILD or PREFER_FAST_TRACE depending on whether you plan to rebuild often. Update TLAS every frame with instance transforms only. This is the pattern recommended in vendor best practices.
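The flag policy in this pattern can be written down as a small lookup. The sketch below uses stand-in flag constants so it compiles without graphics headers; the names echo the DXR/Vulkan build flags, but the enum values, `UpdatePattern` and `chooseBuildFlags` are assumptions for illustration. In a real renderer you would map the result onto the API's own flag bits.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in bit flags for illustration; the values are arbitrary and are not
// taken from the D3D12 or Vulkan headers.
enum BuildFlag : std::uint32_t {
    kAllowUpdate     = 1u << 0,
    kAllowCompaction = 1u << 1,
    kPreferFastTrace = 1u << 2,
    kPreferFastBuild = 1u << 3,
};

// How a BLAS is expected to change over its lifetime (hypothetical categories).
enum class UpdatePattern { Static, RefitMostly, RebuildOften };

std::uint32_t chooseBuildFlags(UpdatePattern p) {
    switch (p) {
        case UpdatePattern::Static:
            // Built once, compacted, never updated.
            return kPreferFastTrace | kAllowCompaction;
        case UpdatePattern::RefitMostly:
            // Updated in place most frames, rebuilt rarely: favor trace quality.
            return kAllowUpdate | kPreferFastTrace;
        case UpdatePattern::RebuildOften:
            // Rebuilt frequently: favor build speed.
            return kAllowUpdate | kPreferFastBuild;
    }
    return 0u;
}

// Both APIs treat the fast-trace and fast-build preferences as mutually exclusive.
bool flagsAreConsistent(std::uint32_t flags) {
    return !((flags & kPreferFastTrace) && (flags & kPreferFastBuild));
}
```

Keeping the policy in one function makes it easy to audit which assets pay the memory and trace cost of being updatable.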

Partial rebuilds and selective restructuring (how to limit scope)

Two proven approaches:
1. Selective restructuring / reinsertion: evaluate benefit metrics at node-level, restructure only nodes with the largest culling-looseness (Yoon et al.).
2. Treelet rebuilds / local rebuilds: rebuild small subtrees (treelets) where SAH degradation exceeds threshold. This is cheaper than a full rebuild and preserves global structure elsewhere. Kopta et al. and followups show strong results for animated scenes where motion is local.

Scheduling and amortization

Avoid scheduling many heavy rebuilds in the same frame; distribute them across frames (round-robin, or a rebuild budget per frame). The NVIDIA best-practices guide explicitly recommends distributing rebuilds and periodically rebuilding updated BLASes to prevent long-term quality erosion. Use a per-frame rebuild budget (ms or bytes of work) and an LRU list or priority queue keyed by SAH_delta × screen_importance.

Practical hybrid recipe (example)

Group geometry by expected update frequency: static, mostly-static (occasional rebuild), animated small-deformation (refit + rotations), fully-dynamic/topology-changing (always rebuild).
For many small moving objects (e.g., crowds), put each object into its own BLAS and update transforms in TLAS; rebuild BLASes in the background every N frames or when SAH_delta crosses the threshold.

Measuring the impact: build time, rays/sec, and frame stability

Metrics you must measure (not guess)

Build time (ms): wall-clock time for BLAS/TLAS builds or updates; measure with GPU timestamp queries for GPU builds or host timers for CPU builds.
Rays/sec (throughput): measure rays_per_frame * frames_per_second or extract hardware counters where available; ideally measure both primary and secondary ray throughput (different costs).
Frame stability (jitter): collect min/avg/max frame time; annotate spikes with the type of work performed that frame (rebuild, refit, or restructuring).
Traversal quality proxy: node traversals per ray or SAH-like metric; many builders expose postbuild info (triangle counts, compacted size) you can record.
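The throughput and stability metrics above reduce to a few lines of arithmetic. This sketch assumes you already have per-frame times in milliseconds and a ray count per frame from your own instrumentation; `FrameStats`, `summarizeFrameTimes`, `raysPerSecond` and `throughputLoss` are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

struct FrameStats {
    double minMs;
    double avgMs;
    double maxMs;
};

// Min/avg/max over a window of frame times, for spotting spikes.
FrameStats summarizeFrameTimes(const std::vector<double>& frameMs) {
    assert(!frameMs.empty());
    const auto mm = std::minmax_element(frameMs.begin(), frameMs.end());
    const double sum = std::accumulate(frameMs.begin(), frameMs.end(), 0.0);
    return {*mm.first, sum / static_cast<double>(frameMs.size()), *mm.second};
}

// Throughput for a single frame: rays traced divided by frame time in seconds.
double raysPerSecond(std::uint64_t raysInFrame, double frameMs) {
    assert(frameMs > 0.0);
    return static_cast<double>(raysInFrame) / (frameMs / 1000.0);
}

// Fraction of throughput lost relative to a freshly built baseline.
double throughputLoss(double baselineRps, double currentRps) {
    assert(baselineRps > 0.0);
    return (baselineRps - currentRps) / baselineRps;
}
```

Logging `throughputLoss` next to the SAH proxy for each BLAS is one way to check whether your chosen proxy actually predicts the regressions you care about.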

Rule-of-thumb comparative table

Strategy	Typical cost (relative to full rebuild)	Trace quality	Best for
refit	0.05–0.2× (heuristic)	Starts at build quality; drops over time without topology fixes	Small deformations, many objects, tight frame budgets
local treelet rebuild / rotations	0.2–0.6×	Restores much of the lost quality	Localized deformation or drifting clusters
full SAH rebuild	1.0× (baseline)	Best	Large deformations, topology changes, offline or background work
TLAS-only update	~0× (near-free)	Depends on BLAS quality	Rigid instance transforms

Notes: these numbers are workload- and hardware-dependent; vendor guidance and forum experience report refits being an order of magnitude cheaper than rebuilds in many cases and fast GPU builders (HLBVH/treelets) make rebuilds viable at scale when amortized or parallelized.

How to attribute performance regressions

Correlate spikes in GPU/CPU frame time with build calls (timestamps), then correlate rays/sec drops with a rising SAH proxy or increased node traversals per ray. Use Nsight (NVIDIA) or PIX (Windows DXR) to capture a frame, inspect acceleration-structure build times, and see which BLASes increased traversal cost. Tools and tutorials provided by vendors walk through this process.

A basic experiment to quantify the break-even

Capture baseline trace performance with the BLAS freshly built.
Apply N frames of your target animation using only refit and measure the decline in rays/sec.
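Rebuild and measure both the improvement and the time the rebuild took. The rebuild breaks even once the frame time it reclaims, summed over the frames until the next rebuild, exceeds its own cost: rebuild_cost < saved_ms_per_frame × frames_until_next_rebuild.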
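One way to turn the experiment's measurements into a decision is to compute how many frames a rebuild needs to repay its cost. The sketch below assumes the per-frame saving stays constant after the rebuild, which is a simplification because quality degrades again gradually; `breakEvenFrames` and `rebuildPaysOff` are hypothetical helpers.

```cpp
#include <cassert>
#include <limits>

// Frames needed for a rebuild to repay its own cost, given the frame time
// measured with a degraded (refit-only) tree and with a freshly built one.
double breakEvenFrames(double rebuildMs, double degradedFrameMs,
                       double freshFrameMs) {
    const double savedPerFrame = degradedFrameMs - freshFrameMs;
    if (savedPerFrame <= 0.0) {
        // No measurable gain: the rebuild never pays for itself.
        return std::numeric_limits<double>::infinity();
    }
    return rebuildMs / savedPerFrame;
}

// True when the rebuild repays its cost before the next rebuild is expected.
bool rebuildPaysOff(double rebuildMs, double degradedFrameMs,
                    double freshFrameMs, int framesUntilNextRebuild) {
    return breakEvenFrames(rebuildMs, degradedFrameMs, freshFrameMs) <
           static_cast<double>(framesUntilNextRebuild);
}
```

For example, a 4 ms rebuild that recovers 0.5 ms per frame needs 8 frames to break even, so it is worthwhile if the tree stays usable for longer than that.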

Practical protocol: checklist and per-frame decision tree

Checklist (implement immediately)

Segregate geometry: mark static vs dynamic vs topology-varying assets at asset import.
Expose build flags: ensure you can build BLAS with ALLOW_UPDATE, PREFER_FAST_BUILD, or PREFER_FAST_TRACE per geometry.
Implement metrics: compute SAH (or node-traversal proxy), screen_importance (screen-space bbox), and build_time_estimate per BLAS.
Maintain a rebuild priority queue keyed by priority = SAH_delta × screen_importance / build_time_estimate.
Provide a rebuild budget: rebuild_ms_per_frame = fraction of frame budget you allow for AS maintenance (sample: 0.5–2.0 ms at 60 FPS).

Per-frame decision tree (pseudocode)

// high-level per-frame loop
collectChangedObjects(changedList);

for (auto& obj : changedList) {
    if (obj.onlyTransformChanged) {
        updateTLASInstanceTransform(obj.instanceId); // cheap
        continue;
    }
    if (obj.topologyChanged) {
        scheduleImmediateRebuild(obj.BLAS);
        continue;
    }
    // vertex deformation, no topology change
    refitBLAS(obj.BLAS); // cheap update
    float sahDelta = estimateSAHDelta(obj.BLAS);
    if (sahDelta > SAH_REBUILD_THRESHOLD && obj.isVisibleOnScreen()) {
        enqueueForRebuild(obj.BLAS, priorityFor(obj));
    }
}

// amortize rebuilds according to rebuild_ms_per_frame budget
float budget = rebuild_ms_per_frame;
while (budget > 0 && !rebuildQueue.empty()) {
    BLASInfo info = popHighestPriority(rebuildQueue);
    float estimatedTime = estimateBuildTime(info);
    if (estimatedTime <= budget) {
        doRebuild(info);
        budget -= estimatedTime;
    } else {
        // partially rebuild (treelet) or defer
        float localTime = estimateLocalRepairTime(info); // placeholder estimator
        if (canDoLocalRepair(info) && localTime <= budget) {
            doLocalRepair(info);
            budget -= localTime;
        } else {
            defer(info);
            break;
        }
    }
}
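The rebuild queue and budget loop can be made concrete with `std::priority_queue`. This is a reduced sketch that assumes build-time estimates are available and omits the local-repair branch; `RebuildCandidate`, `LowerPriority` and `drainWithinBudget` are hypothetical names, and the priority formula is the one from the checklist.

```cpp
#include <cassert>
#include <queue>
#include <vector>

struct RebuildCandidate {
    int blasId;
    float sahDelta;          // relative SAH growth since the last build
    float screenImportance;  // e.g. normalized screen-space coverage
    float buildTimeMs;       // estimated rebuild cost, must be > 0
    // priority = SAH_delta x screen_importance / build_time_estimate
    float priority() const { return sahDelta * screenImportance / buildTimeMs; }
};

// Comparator for std::priority_queue: the highest priority ends up on top.
struct LowerPriority {
    bool operator()(const RebuildCandidate& a, const RebuildCandidate& b) const {
        return a.priority() < b.priority();
    }
};

using RebuildQueue = std::priority_queue<RebuildCandidate,
                                         std::vector<RebuildCandidate>,
                                         LowerPriority>;

// Pops candidates in priority order while they fit in the frame's budget.
// Stops at the first one that does not fit; it stays queued for a later frame.
// Returns the ids that should be rebuilt this frame.
std::vector<int> drainWithinBudget(RebuildQueue& queue, float budgetMs) {
    std::vector<int> rebuilt;
    while (!queue.empty() && queue.top().buildTimeMs <= budgetMs) {
        budgetMs -= queue.top().buildTimeMs;
        rebuilt.push_back(queue.top().blasId);
        queue.pop();
    }
    return rebuilt;
}
```

Stopping at the first candidate that does not fit keeps ordering strict; a variant that skips ahead to cheaper candidates would use the budget more fully at the risk of starving an expensive but important BLAS.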

Tuning knobs and starting values

SAH_REBUILD_THRESHOLD: start at 10–15% (0.10–0.15) and tune by measuring rays/sec.
rebuild_ms_per_frame: start with 0.5–2.0 ms for 60 FPS targets; increase for VFX/film offline budgets.
Screen importance: use pixel area × LOD weight. High screen-space contribution justifies earlier rebuilds.

Implementation pitfalls to avoid

Do not mark BLAS with ALLOW_UPDATE if you expect topology changes — the API forbids certain changes during updates and will require a full rebuild anyway.
Avoid many scattered small rebuilds in a single frame — they cause CPU/GPU stalls. Batch and distribute them.
Beware driver/library quirks: older OptiX/driver combos historically had host→device copy bottlenecks when doing many transform updates; organize transforms to be contiguous and prefer single-block uploads when possible. Check vendor notes for your stack.

Closing

Treat BVH refit as the low‑latency, high-frequency tool and BVH rebuild as the quality recovery operation you schedule and amortize. Use motion envelopes and selective restructuring to extend the life of a refit, separate static and dynamic content into BLAS/TLAS so you only touch what moves, and instrument SAH or node-traversal proxies to drive rebuild decisions rather than guessing. Do the math on build time vs. reclaimed trace cost and schedule rebuilds into a strict per-frame budget so your renderer preserves rays/sec without ever stalling the frame.

Sources:
Best Practices for Using NVIDIA RTX Ray Tracing (Updated) - NVIDIA developer blog; practical guidance on BLAS/TLAS organization, when to update vs rebuild, and scheduling recommendations.

DirectX Raytracing (DXR) Functional Spec - Microsoft DXR spec; details on ALLOW_UPDATE, TLAS/BLAS semantics, and update constraints.

Vulkan Acceleration Structures (VK_KHR_acceleration_structure) — Build flags and updates - Vulkan documentation; ALLOW_UPDATE semantics and update constraints.

Fast, Effective BVH Updates for Animated Scenes (Kopta et al., I3D 2012) - Introduces tree rotations and lightweight incremental updates for animated scenes.

Ray Tracing Dynamic Scenes using Selective Restructuring (Yoon, Curtis, Manocha, EGSR 2007) - Selective restructuring metrics and partial-rebuild strategies for dynamic BVHs.

Maximizing Parallelism in the Construction of BVHs, Octrees, and k-d Trees (Tero Karras, HPG 2012) - Fully parallel LBVH-style construction, one of the fast GPU build techniques that make frequent rebuilds feasible.

Fast BVH Construction on GPUs (Lauterbach et al., 2009) - Early GPU BVH builders and hybrid approaches for fast construction.

RT-DEFORM: Interactive ray tracing of dynamic scenes using BVHs (Lauterbach et al., RT 2006) - Detecting BVH quality degradation and strategies for deformable geometry.

Cycles BVH — Blender Developer Documentation - Practical implementation notes: two-level BVH, refit usage, and when refit degrades tree quality.

Warp runtime docs — refit() and rebuild() semantics (NVIDIA Warp) - Example library semantics for refit vs rebuild and notes on constructors for different platforms.

OptiX Host API — refit property and builder options - OptiX builder properties supporting refit and trade-off discussion.

Real-Time Rendering — Ray Tracing Resources and Ray Tracing Gems references - Curated resources and practical references for BVH construction, dynamic scenes, and real-time ray tracing techniques.