Zero-shot point-cloud registration actually transfers: BUFFER-X inside splatreg

#computervision #cuda #python #opensource

splatreg registers 3D Gaussian splats: two 3DGS scans in, one SE(3)/Sim(3) transform out, optionally one fused splat. Its coarse-init stage seeds a Levenberg–Marquardt refine, and until recently the practical default for real scans was a classical FPFH+RANSAC seed. This post is about what happened when I swapped in BUFFER-X (ICCV 2025), a zero-shot learned registration model — and, since it's probably the more useful part, the exact recipe for building its 2023-era CUDA extensions on a 2026 stack.

Why zero-shot matters for a splat registrar

Per-dataset-trained backbones like PSReg and DiffusionPCR top the 3DMatch leaderboard at 95%+ registration recall. But a splat registrar should not require training a per-scene or per-sensor model to align two captures someone made with a phone and a drone. So splatreg deliberately keeps a generalist seed: BUFFER-X is a single pretrained model that claims to register across sensors and scales with no per-dataset tuning. The question was whether the claim survives contact with the official benchmarks when wired in as a real seed.

The numbers

I ran the complete official gt.log pair sets (not a curated subset), counting a pair as recalled when RRE (relative rotation error) < 15° and RTE (relative translation error) < 0.3 m:

3DMatch (8/8 scenes, n=1619): BUFFER-X seed 0.962 recall, median RRE 1.46°, vs 0.630 / 2.12° for the classical FPFH seed.
3DLoMatch (the hard 10–30% overlap split, n=1781): 0.777 / 2.77° vs 0.122 / 103.4°.

That 3DLoMatch line is the story: 6.4× the recall, and a median RRE of 103° means the classical seed isn't merely "less accurate" there; it's landing in random basins. BUFFER-X won every scene on both splits.
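The recall criterion can be made concrete. The sketch below is my own illustration, not the RESULTS.md harness. It assumes each pair's estimate and ground truth are 4×4 rigid transforms, measures RRE as the geodesic angle of R_estᵀR_gt and RTE as the Euclidean distance between translations, and takes the median RRE over all pairs, failures included (a 103° median for a failing seed only makes sense that way). The function names are hypothetical.

```python
import numpy as np

def rre_deg(R_est: np.ndarray, R_gt: np.ndarray) -> float:
    """Relative rotation error: geodesic angle of R_est^T R_gt, in degrees."""
    cos = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    # Clip guards against |cos| drifting slightly past 1 from float error.
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def rte_m(t_est: np.ndarray, t_gt: np.ndarray) -> float:
    """Relative translation error: Euclidean distance between translations."""
    return float(np.linalg.norm(t_est - t_gt))

def recall_and_median_rre(T_est, T_gt, rre_max=15.0, rte_max=0.3):
    """T_est, T_gt: sequences of 4x4 rigid transforms, one entry per pair.

    Returns (recall, median RRE over every pair, successes and failures alike).
    """
    rres, hits = [], 0
    for Te, Tg in zip(T_est, T_gt):
        rre = rre_deg(Te[:3, :3], Tg[:3, :3])
        rte = rte_m(Te[:3, 3], Tg[:3, 3])
        rres.append(rre)
        if rre < rre_max and rte < rte_max:
            hits += 1
    return hits / len(rres), float(np.median(rres))
```

Taking the median over every pair is what lets a seed that lands in random basins show up as a three-digit median instead of hiding behind its few successes.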

The caveat that keeps these numbers honest: both seeds were pushed through the identical, lighter feature_align refine, so the comparison isolates the seed. These are not full-pipeline absolute numbers to lay next to leaderboard entries; they answer "which seed should splatreg trust on real scans," nothing more.

Here is one real low-overlap pair (7-scenes-redkitchen 35→46, ground-truth overlap 0.10) watched end to end. The classical seed slews the fragment into the wrong basin at 151.5° error; BUFFER-X + refine then locks on at 2.0°. Both transforms are actual library outputs, and the animation only interpolates between real estimates; nothing is hand-posed.
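The post doesn't say how the animation frames are generated. One standard way to move between two real estimates without hand-posing anything is to interpolate along the rotation geodesic and linearly in translation. The numpy sketch below does that with a hand-rolled SO(3) log/exp (Rodrigues' formula). It is an assumption about technique, not the examples/ generator, and the log map as written is only valid for relative rotations below 180°.

```python
import numpy as np

def so3_log(R: np.ndarray) -> np.ndarray:
    """Axis-angle vector of rotation R (valid for angles strictly below pi)."""
    cos = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos)
    if theta < 1e-9:
        return np.zeros(3)
    # vee(R - R^T) = 2 sin(theta) * axis
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return theta * w / (2.0 * np.sin(theta))

def so3_exp(v: np.ndarray) -> np.ndarray:
    """Rodrigues' formula: rotation matrix from an axis-angle vector."""
    theta = np.linalg.norm(v)
    if theta < 1e-9:
        return np.eye(3)
    k = v / theta
    K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def interpolate_pose(T0: np.ndarray, T1: np.ndarray, s: float) -> np.ndarray:
    """Blend two 4x4 rigid transforms: geodesic in rotation, linear in translation."""
    R0, R1 = T0[:3, :3], T1[:3, :3]
    R = R0 @ so3_exp(s * so3_log(R0.T @ R1))
    t = (1.0 - s) * T0[:3, 3] + s * T1[:3, 3]
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T
```

Frames would then come from interpolate_pose(T_a, T_b, s) for s in np.linspace(0, 1, 30), where T_a and T_b are hypothetical names for the two estimates. Every intermediate frame stays a valid rigid transform, which a naive elementwise blend of rotation matrices would not.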

The build recipe (the part you actually came for)

BUFFER-X ships native extensions written for an older stack. Getting them to build on CUDA 12.8 / RTX 5090 (sm_120) / torch 2.11 / numpy 2.4, without sudo, took a day of archaeology. The full recipe is in docs/BUFFERX_BUILD_MODERN_CUDA.md; these are the walls I hit:

1. pointnet2_ops hardcodes dead GPU architectures. Its setup.py sets TORCH_CUDA_ARCH_LIST = "3.7+PTX;5.0;...", and nvcc 12.8 flat-out rejects compute_37. Patch that line to your real arch ("12.0" for sm_120), then run pip install --no-build-isolation . from the package directory, so that setup.py can import the torch already in your environment.

2. The KPConv C++ wrappers use deprecated numpy 1.x C-API names that numpy 2.x removed. The port is mechanical once you know it: NPY_IN_ARRAY → NPY_ARRAY_IN_ARRAY, and cast the PyObject* handles to PyArrayObject* everywhere PyArray_NDIM/DIM/DATA is called.

3. You don't need apt install libtbb-dev. pip install tbb tbb-devel drops tbb/tbb.h under <venv>/include; point CPLUS_INCLUDE_PATH there (plus a --depth 1 clone of header-only Eigen) and the wrappers compile sudo-free.

4. Two CUDA deps don't deserve a build at all. BUFFER-X only uses knn_cuda.KNN with k=1 — that's torch.cdist + topk. And torch_batch_svd is just torch.linalg.svd, which batches natively now. Tiny pure-torch shim modules on the path replace both; they ship in docs/bufferx_shims/.

5. The silent killer: the pretrained checkpoints are full-model state dicts, so every key carries a Desc. or Pose. prefix. Load one into a submodule with strict=False and nothing matches: you get randomly initialized weights that produce garbage seeds with no error anywhere. If your zero-shot model performs like a random-pose generator, check this first.
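Wall 1's patch can be scripted so it survives a re-clone. This is a minimal standard-library sketch; it assumes the arch list is assigned as a double-quoted string literal on one line, as in the setup.py line quoted in wall 1, and patch_arch_list is a hypothetical helper name.

```python
import re
from pathlib import Path

# Matches either  TORCH_CUDA_ARCH_LIST = "..."  or
# os.environ["TORCH_CUDA_ARCH_LIST"] = "..."  (double-quoted value assumed).
_ARCH_RE = re.compile(r'(TORCH_CUDA_ARCH_LIST["\']?\]?\s*=\s*)"[^"]*"')

def patch_arch_list(setup_py: Path, arch: str = "12.0") -> bool:
    """Rewrite the arch-list literal in setup.py; return True if anything changed."""
    src = setup_py.read_text()
    # A function replacement avoids backreference surprises when arch starts with digits.
    new, n = _ARCH_RE.subn(lambda m: f'{m.group(1)}"{arch}"', src)
    if n:
        setup_py.write_text(new)
    return n > 0
```

A False return means the assignment didn't match the assumed shape, which is worth stopping on rather than building with the old list.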
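Wall 2 splits into a rename, which a script can do, and casts, which need a look at each call site. The sketch below applies the NPY_IN_ARRAY rename and flags lines where PyArray_NDIM/DIM/DATA is called without an explicit (PyArrayObject*) cast. The rename table holds only the one mapping named in wall 2, and the cast check is a regex heuristic (it will also flag C++-style casts), so treat its output as a review list, not a proof; port_numpy_capi is a hypothetical name.

```python
import re

# Legacy flag -> numpy 2.x name; only the mapping named in the text.
RENAMES = {"NPY_IN_ARRAY": "NPY_ARRAY_IN_ARRAY"}

# Accessor call whose first argument does not start with a C-style cast.
_UNCAST = re.compile(
    r"\bPyArray_(NDIM|DIM|DATA)\s*\((?!\s*\(\s*PyArrayObject\s*\*\s*\))"
)

def port_numpy_capi(src: str):
    """Apply the renames; return (new_source, 1-based line numbers to review)."""
    for old, new in RENAMES.items():
        # Word boundaries keep the rename idempotent: NPY_ARRAY_IN_ARRAY won't re-match.
        src = re.sub(rf"\b{re.escape(old)}\b", new, src)
    flagged = [i for i, line in enumerate(src.splitlines(), 1) if _UNCAST.search(line)]
    return src, flagged
```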
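Wall 4's shims are small enough to sketch. This is my own reconstruction, not the modules in docs/bufferx_shims/. It assumes knn_cuda.KNN is built as KNN(k, transpose_mode=True) and called as knn(ref, query) on (B, N, D) and (B, M, D) tensors, returning (dist, idx) of shape (B, M, k) with plain Euclidean distances; any other layout raises rather than silently misbehaving. It also assumes torch_batch_svd.svd follows the old torch.svd convention of returning V rather than Vh, which is why the wrapper transposes what torch.linalg.svd gives back. Saved as knn_cuda.py and torch_batch_svd.py on sys.path, they would shadow the CUDA builds.

```python
import torch

class KNN(torch.nn.Module):
    """Stand-in for knn_cuda.KNN using torch.cdist + topk (transpose_mode=True only)."""

    def __init__(self, k: int = 1, transpose_mode: bool = True):
        super().__init__()
        if not transpose_mode:
            raise NotImplementedError("this sketch only covers transpose_mode=True")
        self.k = k

    @torch.no_grad()
    def forward(self, ref: torch.Tensor, query: torch.Tensor):
        # (B, M, D) x (B, N, D) -> (B, M, N) pairwise Euclidean distances
        d = torch.cdist(query, ref)
        # k smallest distances per query point, sorted ascending
        dist, idx = d.topk(self.k, dim=-1, largest=False)
        return dist, idx

def svd(A: torch.Tensor):
    """Stand-in for torch_batch_svd.svd: (U, S, V) with A = U diag(S) V^T."""
    U, S, Vh = torch.linalg.svd(A, full_matrices=False)  # batches over leading dims
    return U, S, Vh.transpose(-2, -1)
```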
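For wall 5, the fix is either to load the checkpoint into the full model, or to strip the prefix before loading into a submodule and then load with strict=True so a mismatch raises instead of passing silently. A minimal sketch of the prefix-stripping step; sub_state_dict is a hypothetical name, and only the Desc./Pose. prefixes come from the text. If you must keep strict=False, at least check that the missing_keys list load_state_dict returns is empty.

```python
from collections import OrderedDict
from typing import Mapping

def sub_state_dict(full_state: Mapping, prefix: str) -> OrderedDict:
    """Keep keys under `prefix` (e.g. "Desc.") and strip it so a submodule can load them.

    Raises KeyError when nothing matches, instead of handing an empty dict
    to a strict=False load that would accept it without complaint.
    """
    out = OrderedDict(
        (key[len(prefix):], value)
        for key, value in full_state.items()
        if key.startswith(prefix)
    )
    if not out:
        raise KeyError(f"no checkpoint keys start with {prefix!r}")
    return out

# Hypothetical usage (the path and the attribute name model.Desc are assumptions):
#   ckpt = torch.load("checkpoint.pth", map_location="cpu")
#   model.Desc.load_state_dict(sub_state_dict(ckpt, "Desc."), strict=True)
# strict=True raises on any missing or unexpected key rather than leaving
# randomly initialized weights in place.
```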

Using it

```
pip install splatreg
```

```python
from splatreg.api import register
# target, source: the two 3DGS scans to align
result = register(target, source, init="bufferx")   # zero-shot seed + LM refine
```

If the BUFFER-X weights or extensions are absent, init="bufferx" logs a note and falls back to the classical robust seed — it never fails silently. Everything downstream (Sim(3) scale recovery, spherical-harmonic rotation via real-basis Wigner-D, pose covariance for pose graphs, merge + dedupe) is identical regardless of which seed you chose.
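The fallback described above is a common pattern: probe the optional dependencies and weights, log why you're degrading, and return the safe path. The sketch below illustrates that pattern only; it is not splatreg's code, and choose_seed, weights_path, and the pointnet2_ops probe are hypothetical choices. It actually imports each module rather than just locating it, since a compiled extension that exists but was built against the wrong torch typically fails only at import time.

```python
import importlib
import logging
import os

log = logging.getLogger("seed_select")  # hypothetical logger name

def choose_seed(requested: str, weights_path: str,
                required_modules=("pointnet2_ops",)) -> str:
    """Return 'bufferx' only if its weights and compiled deps are usable; else log and fall back."""
    if requested != "bufferx":
        return requested
    if not os.path.isfile(weights_path):
        log.warning("BUFFER-X weights not found at %s; using classical seed", weights_path)
        return "classical"
    for name in required_modules:
        try:
            importlib.import_module(name)  # a real import also surfaces ABI/link failures
        except (ImportError, OSError) as exc:
            log.warning("BUFFER-X dependency %s unusable (%s); using classical seed", name, exc)
            return "classical"
    return "bufferx"
```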

What it doesn't do

Zero-shot does not mean magic. Below roughly 40% retained overlap the rotation-disambiguating geometry can be physically absent, and no seed fixes that; splatreg flags those cases as ambiguous rather than silently wrong-posing them, and scale is unobservable under thin overlap no matter what. The 0.962/0.777 figures are seed-isolation numbers under one shared refine, not leaderboard entries; per-dataset-trained models still hold the absolute 3DMatch record, and I say so in the README. The BUFFER-X path needs a real CUDA build (the recipe above); CPU-only installs get the classical fallback. And splatreg itself registers splats; if all you have are raw point clouds, BUFFER-X upstream serves you directly without any of my wrapping.

Every number here has a reproduction path in RESULTS.md, and the figure/GIF generators live in examples/.