<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Arkadiusz</title>
    <description>The latest articles on DEV Community by Arkadiusz (@arkadiuss).</description>
    <link>https://dev.to/arkadiuss</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3845046%2F559de795-fc6c-4d26-8914-5ede4577cdf0.png</url>
      <title>DEV Community: Arkadiusz</title>
      <link>https://dev.to/arkadiuss</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/arkadiuss"/>
    <language>en</language>
    <item>
      <title>A 3D Body from Eight Questions — No Photo, No GPU</title>
      <dc:creator>Arkadiusz</dc:creator>
      <pubDate>Wed, 22 Apr 2026 12:34:49 +0000</pubDate>
      <link>https://dev.to/arkadiuss/a-3d-body-from-eight-questions-no-photo-no-gpu-4277</link>
      <guid>https://dev.to/arkadiuss/a-3d-body-from-eight-questions-no-photo-no-gpu-4277</guid>
      <description>&lt;p&gt;8 questions in, 58 Anny body params out. A small MLP trained with a physics-aware loss, runs in milliseconds on CPU. Height accuracy 0.3 cm, mass 0.3 kg, BWH 3-4 cm — better than our photo pipeline on circumferences, without needing a photo. That's the questionnaire path I &lt;a href="https://clad.you/blog/posts/body-pipeline/" rel="noopener noreferrer"&gt;promised in the previous post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The whole story begins with one observation: height and weight alone can estimate body measurements quite accurately (&lt;a href="https://www.mdpi.com/1424-8220/22/5/1885" rel="noopener noreferrer"&gt;Bartol's regression&lt;/a&gt;). The original regression isn't as accurate on our data as the paper suggests, but after a bit of tuning the results are quite promising.&lt;/p&gt;

&lt;p&gt;The questionnaire addresses privacy, speed and cost concerns. Plus we skip the phase where the user spends 5 minutes scrolling for perfect-light, tight-clothes photos. Additionally, it helped us find and address a mass calculation inconsistency in the Anny model, and model the "muscle weighs more" problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Backstory
&lt;/h2&gt;

&lt;p&gt;When we want to create a digital twin, we naturally think of HMR photo reconstruction. This route has a lot of &lt;a href="https://clad.you/blog/posts/body-pipeline/" rel="noopener noreferrer"&gt;ups and downs&lt;/a&gt;. During one "down", the research agent brought up this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The most striking finding is from Bartol et al. (2022): a simple linear regression from just height + weight (no photo!) predicts 15 body measurements at 1.2-1.6 cm MAE. Many deep learning methods with photos don't even beat this.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;At first I quickly calculated the number of combinations and the number of people, and thought it didn't make sense. But then, after comparing a few friends of similar height and weight, I thought there might be something to it.&lt;/p&gt;

&lt;h2&gt;
  
  
  It's not just height and weight
&lt;/h2&gt;

&lt;p&gt;Intuitively we all know that a man at 178 cm and 80 kg can carry it as a belly or as gym muscle. So it wasn't a surprise that we came up with these two bodies:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuaopk98xm3pifjy0k7k3.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuaopk98xm3pifjy0k7k3.jpg" alt="Extreme bodies for the same height and weight" width="800" height="662"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;They are a bit cartoonish and pushed to extremes, but clearly show the problem.&lt;/p&gt;

&lt;p&gt;Next obvious thing to do: the weights from the original regression are public, so we downloaded them and ran them on our validation set. Raw BWH MAE landed around 9-11 cm, up to ~25 cm at the worst. Some of that is measurement-convention mismatch — Bartol slices SMPL meshes at fixed landmark vertex indices (e.g. a "lower belly point" vertex for waist), while we follow ISO 8559-1 anatomical rules (waist at the narrowest point, bust at the breast prominence). Same plane-sweep math, different slice location — bust alone is off by ~10 cm systematically. After correcting for that bias, BWH MAE drops to ~7 cm. Still above Bartol's own ~3 cm BWH MAE on their data (paper Table 6 on BODY-fit+W: chest 3.0, waist 4.0, hips 2.2 cm), but that's not really the story — we're evaluating on a different population. Anny has explicit body-shape variation at fixed h/w that an h+w regression fundamentally can't see, regardless of how well it was trained. And I'm not saying this to undermine the original research, but rather the opposite — it's a good spark for this project.&lt;/p&gt;

&lt;h2&gt;
  
  
  What else carries signal
&lt;/h2&gt;

&lt;p&gt;As the example above showed, the same height and weight can produce very different bodies, but we can tell them apart with more params. Some obvious ones:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Build/belly — muscular athletic or soft with a belly. Common knowledge is that muscle weighs more than fat, so a fat-heavy body will have more volume (and thus different measurements) than an athletic one.&lt;/li&gt;
&lt;li&gt;Shape — some people have wider hips, others a bigger bust. Body shape captures how the weight is distributed. The problem, which I'll come back to later, is that people don't know their own shape.&lt;/li&gt;
&lt;li&gt;Cup size — relevant for women, quite an obvious feature.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are the features we naturally think of. To make sure they carry enough signal and aren't too noisy, we ran the numbers against the dataset. The method is simple — bucket people by height (±1 cm), weight (±1 kg), and shape, then measure how much waist variation is left as each additional feature is locked in.&lt;/p&gt;
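
&lt;p&gt;For the curious, here's roughly what that bucket analysis looks like in pandas. A sketch only: the column names are illustrative, not the real dataset schema.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import pandas as pd

def residual_waist_std(df, locked):
    # Quantize height/weight so "same bucket" means roughly +/-1 cm and +/-1 kg.
    df = df.assign(h_bin=df["height_cm"].round(), w_bin=df["weight_kg"].round())
    grouped = df.groupby(["h_bin", "w_bin"] + locked)["waist_cm"]
    stats = grouped.agg(["std", "size"]).dropna()
    # Population-weighted residual std: the waist variation the locked
    # features still leave unexplained.
    return float((stats["std"] * stats["size"]).sum() / stats["size"].sum())

# residual_waist_std(bodies, ["shape", "build"])           # row 1 of the table below
# residual_waist_std(bodies, ["shape", "build", "belly"])  # row 2
&lt;/code&gt;&lt;/pre&gt;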

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Features locked&lt;/th&gt;
&lt;th&gt;Waist std inside bucket&lt;/th&gt;
&lt;th&gt;Theoretical best MAE&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;h, w, shape, build&lt;/td&gt;
&lt;td&gt;2.25 cm&lt;/td&gt;
&lt;td&gt;~1.8 cm&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+ belly&lt;/td&gt;
&lt;td&gt;2.08 cm&lt;/td&gt;
&lt;td&gt;~1.7 cm&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+ cup, gender&lt;/td&gt;
&lt;td&gt;1.30 cm&lt;/td&gt;
&lt;td&gt;~1.0 cm&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Smaller std inside a bucket means the features explain more of what's going on. Build does most of the work — on its own, it moves the waist by about 1.8 cm at fixed h/w/shape. Belly adds another ~0.2 cm. Cup plus gender knocks 0.8 cm more off. Each feature earns its place.&lt;/p&gt;

&lt;p&gt;Side-finding: build signal is strongest on inverted-triangle shapes — 8 of the top 10 high-signal buckets are inverted triangle. The narrow waist amplifies relative fat changes; shapes with wider baseline waists (apple, rectangle) show smaller absolute shifts.&lt;/p&gt;

&lt;p&gt;At the extremes: same height, same weight, different body shape — bust can differ by 25 cm, hips by 30. Six clothing sizes at identical h/w. A height+weight regression simply can't see this — the signal isn't there in the input.&lt;/p&gt;

&lt;p&gt;And there's a floor. Even with every questionnaire input locked, about 1.3 cm of waist variation stays, coming from ~50 continuous blendshape params that don't map to any multiple-choice question. So the theoretical best a form can ever do is ~1 cm waist MAE.&lt;/p&gt;

&lt;h2&gt;
  
  
  Model &amp;amp; dataset
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://clad.you/blog/posts/body-pipeline/" rel="noopener noreferrer"&gt;The previous article&lt;/a&gt; describes the available body models. After the initial phase we operate solely on the Anny model, heavily leveraging its explainable features. Thanks to it, tasks like generating a huge dataset of people are easy.&lt;/p&gt;

&lt;p&gt;The dataset we generate and use for distribution analysis, training and validation contains a few tens of thousands of synthetic bodies, validated against a broad population distribution. For each body in the dataset we derive the features described above from its body measurements.&lt;/p&gt;

&lt;p&gt;Anny is full of blendshapes, but not all of them matter for virtual try-on. We carefully selected the 58 that do. The 8 questionnaire questions one-hot encode into 20 features, so the mapping is 20 inputs to 58 output params. We actually train two such models — one per gender. Male and female bodies differ enough that a shared network wastes capacity reconciling them.&lt;/p&gt;
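
&lt;p&gt;To make the encoding concrete, here is a sketch of how answers could turn into the input vector. The categories are illustrative (the real question set differs), and how height and weight are folded in is omitted.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;QUESTIONS = {
    "build": ["slim", "average", "athletic", "soft"],
    "belly": ["flat", "average", "rounded"],
    "shape": ["rectangle", "triangle", "inverted_triangle", "hourglass", "apple"],
    # ...remaining questions omitted
}

def encode(answers):
    # Each multiple-choice answer becomes a one-hot block; concatenated,
    # the blocks form the MLP input (20 features in the real question set).
    features = []
    for name, options in QUESTIONS.items():
        features.extend(1.0 if answers[name] == option else 0.0 for option in options)
    return features
&lt;/code&gt;&lt;/pre&gt;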

&lt;h2&gt;
  
  
  Training a small MLP
&lt;/h2&gt;

&lt;p&gt;The original paper used simple regression to predict the params, so that was the obvious starting point. On our synthetic dataset it gets around 2.5 cm BWH MAE — decent. The problem was mass: Ridge predicts each of the 58 params independently, but mass depends on many of them working together (torso width × depth × height, hip volume, limb fat...). L2 regularization shrinks them all toward zero, and the small errors compound. Result: 3.9 kg mean mass error, 9.7 kg at p95, up to 16 kg for heavy bodies — even after output standardization and tuned regularization (the best Ridge we could build on this dataset).&lt;/p&gt;
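
&lt;p&gt;A minimal reproduction of that baseline in scikit-learn, assuming &lt;code&gt;X&lt;/code&gt; is the (n, 20) feature matrix and &lt;code&gt;Y&lt;/code&gt; the (n, 58) param matrix; the alpha stands in for whatever the tuned regularization actually was:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Output standardization + Ridge; every one of the 58 output columns is
# still its own independent linear model, which is the root of the
# mass problem described above.
baseline = TransformedTargetRegressor(
    regressor=Ridge(alpha=1.0),
    transformer=StandardScaler(),
)
baseline.fit(X_train, Y_train)
params_pred = baseline.predict(X_val)   # (n_val, 58)
&lt;/code&gt;&lt;/pre&gt;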

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft1uwzgpfzljgbm6ogbe7.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft1uwzgpfzljgbm6ogbe7.jpg" alt="Histogram of absolute mass error for Ridge vs MLP on 100 test bodies. Ridge: mean 3.9 kg, p95 9.7 kg. MLP: mean 0.3 kg, p95 0.8 kg." width="800" height="430"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So we moved to an MLP. Two hidden layers, 256 units each, ReLU, a bit of dropout. Tiny — about 85 KB of weights, trains on a laptop in ~60 minutes per gender. Nothing fancy architecturally.&lt;/p&gt;
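
&lt;p&gt;In code the whole network is a few lines (the dropout rate here is a guess):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import torch.nn as nn

class QuestionnaireMLP(nn.Module):
    # Two hidden layers of 256 units, ReLU, a bit of dropout.
    def __init__(self, n_in=20, n_out=58, hidden=256, p_drop=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_in, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, n_out),
        )

    def forward(self, x):
        return self.net(x)
&lt;/code&gt;&lt;/pre&gt;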

&lt;p&gt;The loss is the interesting part. The user already gives us their exact height and weight — those need to match precisely in the generated body, not just be close on average. Standard MSE on the 58 params doesn't care about that and treats every param equally. And mass isn't a param at all; it's a consequence of volume, which comes out of the body model's forward pass.&lt;/p&gt;

&lt;p&gt;So we include the forward pass in the loss. The MLP's 58 outputs go through Anny — blendshapes, vertices, volume — and we compare the resulting mass and height against the user-provided targets. Gradients from a mass error flow back through all the volume-related params together. Ridge couldn't do that because each output was solved independently; the MLP can, because the hidden layers couple them. This is what closes the mass gap.&lt;/p&gt;
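
&lt;p&gt;Schematically the loss looks like this, where &lt;code&gt;anny_forward&lt;/code&gt; stands in for the real differentiable Anny forward pass and the term weights are illustrative:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import torch.nn.functional as F

def physics_aware_loss(pred_params, true_params, target_height, target_mass, target_waist):
    # Plain parameter regression...
    l_params = F.mse_loss(pred_params, true_params)

    # ...plus terms computed through the body model, so a mass error sends
    # gradients back into all the volume-related params at once.
    height, mass, waist = anny_forward(pred_params)
    l_height = F.l1_loss(height, target_height)
    l_mass = F.l1_loss(mass, target_mass)
    l_waist = F.l1_loss(waist, target_waist)

    return l_params + l_height + l_mass + 0.5 * l_waist
&lt;/code&gt;&lt;/pre&gt;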

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4db1lktfvgnrbxyjfbhg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4db1lktfvgnrbxyjfbhg.png" alt="Loss diagram" width="664" height="197"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The dotted arrow is the whole trick. Anny's forward is surprisingly autograd-friendly — blendshapes are linear, volume is a sum of signed tetrahedra. No custom backward, standard PyTorch ops end to end. Measurements like waist are differentiable too, but that's a whole story for the measurements tuning post.&lt;/p&gt;
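
&lt;p&gt;The volume piece really is just the textbook signed-tetrahedra sum over the closed mesh. Assuming &lt;code&gt;vertices&lt;/code&gt; is an (N, 3) tensor and &lt;code&gt;faces&lt;/code&gt; an (M, 3) index tensor, something like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import torch

def mesh_volume(vertices, faces):
    # Sum of signed tetrahedra anchored at the origin:
    # V = sum over faces of dot(v0, cross(v1, v2)) / 6.
    v0 = vertices[faces[:, 0]]
    v1 = vertices[faces[:, 1]]
    v2 = vertices[faces[:, 2]]
    signed = (v0 * torch.cross(v1, v2, dim=1)).sum(dim=1) / 6.0
    return signed.sum()
&lt;/code&gt;&lt;/pre&gt;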

&lt;p&gt;On top of params, mass, and height, we added a waist term. That's it — bust and hip looked tempting, but in practice they introduced more noise than signal, and waist carries the most body-shape signal anyway.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest results
&lt;/h2&gt;

&lt;p&gt;Height is essentially solved — 0.3 cm mean error on both genders. Mass lands right there too, around 0.3 kg mean (p95 under 1 kg). Circumferences are harder; BWH sits at 3-4 cm, with waist the weakest.&lt;/p&gt;

&lt;p&gt;Averages lie about the tails, and a person who gets a 15 cm bust error doesn't care that the mean is 4 cm. So we tracked p95 (5% of predictions worse than this) and max alongside the mean, and actively optimized for them — barrier terms in the loss that specifically penalize outliers on height and mass.&lt;/p&gt;
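
&lt;p&gt;A barrier term can be as simple as a hinge-squared penalty outside a tolerance band; the exact form and tolerances below are guesses, not the production loss:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import torch

def barrier(error, tolerance, weight=10.0):
    # Zero inside the tolerance band, steep penalty beyond it. The mean
    # terms handle average accuracy; this one goes after the tail.
    excess = torch.clamp(error.abs() - tolerance, min=0.0)
    return weight * (excess ** 2).mean()

# e.g. barrier(height - target_height, tolerance=0.01)   # meters
#      barrier(mass - target_mass, tolerance=1.0)        # kilograms
&lt;/code&gt;&lt;/pre&gt;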

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Male&lt;/th&gt;
&lt;th&gt;Female&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Height — mean / p95 / max&lt;/td&gt;
&lt;td&gt;0.3 / 0.8 / 3.9 cm&lt;/td&gt;
&lt;td&gt;0.3 / 0.8 / 4.6 cm&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mass — mean / p95 / max&lt;/td&gt;
&lt;td&gt;0.5 / 1.2 / 3.3 kg&lt;/td&gt;
&lt;td&gt;0.4 / 1.0 / 2.1 kg&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bust — mean / p95 / max&lt;/td&gt;
&lt;td&gt;4.9 / 11.9 / 18.4 cm&lt;/td&gt;
&lt;td&gt;2.7 / 6.6 / 11.0 cm&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Waist — mean / p95 / max&lt;/td&gt;
&lt;td&gt;4.3 / 10.0 / 20.7 cm&lt;/td&gt;
&lt;td&gt;4.0 / 9.0 / 13.0 cm&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hips — mean / p95 / max&lt;/td&gt;
&lt;td&gt;3.3 / 8.4 / 14.8 cm&lt;/td&gt;
&lt;td&gt;3.3 / 8.0 / 13.3 cm&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For comparison: on the same validation set, Bartol's h+w regression sits at ~7 cm BWH MAE (bias-corrected, as above). Our photo-based pipeline from the &lt;a href="https://clad.you/blog/posts/body-pipeline/" rel="noopener noreferrer"&gt;previous post&lt;/a&gt; gets 5-8 cm BWH MAE on real people. The questionnaire beats both — without needing a photo.&lt;/p&gt;

&lt;p&gt;The numbers above are from synthetic Anny bodies — same model we train against. We also validated on a small group of real people measured by hand with tape. First results there were ugly — mass off by several kg even when circumferences were close. That pushed us to fix how mass is calculated in the first place (next section). After those fixes landed, real-people numbers line up with the synthetic ones on the measurements we tested.&lt;/p&gt;

&lt;p&gt;Worth remembering: it's a statistical model, so what you get is the population-average body for your inputs, not your exact body. Everyone is different — but it's a very good base for measurements tuning, which then gets &amp;lt;1 cm error. I'm planning the next article on that.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons learned
&lt;/h2&gt;

&lt;p&gt;The most striking was the real-world inconsistency in Anny's anthropometry module. To calculate the mass, the approach is simple: calculate the volume of the body and multiply by body density. Primary school math. But Anny used 980 kg/m³ density, which is indeed the value you get after typing "average person density" into a web search. However, it's more subtle than it initially seems.&lt;/p&gt;

&lt;p&gt;The first thing is that the value is different for men and women. The second is that "body density" isn't one number — it depends on the convention. Whole-body density (lungs included, ~985 kg/m³) is what you'd measure by submerging someone in a tank — just below water, which is why humans barely float. Tissue-only density (~1030–1080 kg/m³) is what hydrostatic weighing reports after subtracting residual lung air, and it's what fat-vs-muscle composition actually gives you. The 980 kg/m³ figure sits between these two conventions — close to whole-body but not quite. The third is that "muscle weighs more". The per-gender tissue-only medians we ended up using (male ~1059, female ~1031 kg/m³) live in &lt;a href="https://github.com/datar-psa/clad-body" rel="noopener noreferrer"&gt;clad-body&lt;/a&gt;, derived from body-fat percentage via the Siri two-component model. Empirically the correction works — lean bodies gain mass, soft bodies lose it — though the absolute scale still rests on the 980 calibration being roughly right for the "average" subject.&lt;/p&gt;

&lt;p&gt;Density also isn't uniform across people: muscle is denser than fat. Not by much, but enough to shift the mass by 2–3 kg. To account for that, &lt;a href="https://github.com/datar-psa/clad-body" rel="noopener noreferrer"&gt;clad-body&lt;/a&gt; estimates body fat using the Navy formula and adjusts density accordingly.&lt;/p&gt;
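
&lt;p&gt;The Siri relationship itself is one line: invert body-fat percentage into tissue density, then multiply by volume. A sketch, not the exact clad-body code:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def tissue_density_from_body_fat(body_fat_pct):
    # Siri two-component equation: %BF = 495 / density - 450,
    # with density in g/cm^3. Returned here in kg/m^3.
    return 1000.0 * 495.0 / (body_fat_pct + 450.0)

def mass_from_volume(volume_m3, body_fat_pct):
    # Note: tissue-only density excludes lung air; how that's reconciled
    # with the raw mesh volume is glossed over in this sketch.
    return volume_m3 * tissue_density_from_body_fat(body_fat_pct)

# tissue_density_from_body_fat(15.0)  # ~1064 kg/m^3, lean
# tissue_density_from_body_fat(30.0)  # ~1031 kg/m^3, softer
&lt;/code&gt;&lt;/pre&gt;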

&lt;p&gt;The second finding (which will be described more in the measurements tuning post) is that each cm matters. A 2 cm shift across all torso circumferences (bust, waist, hips) moves the computed mass by ~2 kg!&lt;/p&gt;

&lt;p&gt;All of the above added up to a sizeable error in predicted mass. Once we adjusted density via body-fat estimation, athletic bodies gained up to 1 kg and soft bodies lost up to 2. Small in absolute terms, but it's the difference between matching the scale and being systematically off for anyone not shaped like the average.&lt;/p&gt;

&lt;p&gt;Another thing that had a big impact on mass accuracy: the ancestry feature. For a while, mass MAE refused to drop below 3 kg no matter what we tried. The error distribution looked bimodal, which was suspicious.&lt;/p&gt;

&lt;p&gt;Turns out Anny has three race blendshapes (african, asian, caucasian) that subtly affect body proportions. In training we sampled them randomly, but at inference we hardcoded them to a uniform mix — the user hadn't told us anything about ancestry. So the MLP was trained on one distribution and predicted under another: a 3 kg noise floor we'd built ourselves. The fix is simple: add ancestry to the questionnaire, four categories mapped to fixed blendshape values. Training and inference now use the same numbers for the same label. Mass MAE dropped from ~3 kg to under 0.5. Some odd height errors on a few bodies disappeared too.&lt;/p&gt;

&lt;p&gt;The broader lesson (like last time!) is that spending time on the dataset and the evaluation harness paid off more than spending time on the model. A bigger network wouldn't have caught the bugs. Running the pipeline on real people is what exposed the mass calculation issue, not synthetic eval. The MLP itself with 2 layers and 256 units is boring. The work that mattered was upstream and downstream of it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Is it the final form?
&lt;/h2&gt;

&lt;p&gt;Definitely not. As I mentioned, people struggle more than anticipated when choosing their body shape. The current form is also missing long/short arms and legs, which affect how mass is distributed. &lt;a href="https://github.com/muelea/shapy#attributes-to-shape-a2s" rel="noopener noreferrer"&gt;SHAPY's Attributes-to-Shape (A2S)&lt;/a&gt; goes further in the same direction — a body described with a whole set of attributes like "muscular", "pear-shaped", "long torso", "broad shoulders". Plenty of ideas to borrow from there.&lt;/p&gt;

&lt;p&gt;A better idea we keep circling back to is to make it more interactive and directly feature-based. Instead of asking "what's your body shape?", show a body the user can adjust directly — bust vs hips, arms vs legs, shoulder width — and let them tune what they see. Probably where we're heading next.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdkbx02x9jhrkjthcxtqx.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdkbx02x9jhrkjthcxtqx.jpg" alt="SizeMe questionnaire page — form inputs on the left, generated 3D body with ISO measurement contours in the middle, measurements panel on the right" width="800" height="686"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Live in the PWA at &lt;a href="https://clad.you/size-aware/size-me" rel="noopener noreferrer"&gt;clad.you/size-aware/size-me&lt;/a&gt; — eight questions, 3D body in under a second. Also exposed as a REST API at &lt;a href="https://api.clad.you" rel="noopener noreferrer"&gt;api.clad.you&lt;/a&gt; — questionnaire or photo in, body params + measurements out. Free for now while we work out whether anyone actually wants this; key at &lt;a href="https://clad.you/developers" rel="noopener noreferrer"&gt;clad.you/developers&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://clad.you/blog/posts/questionnaire-mlp/" rel="noopener noreferrer"&gt;clad.you&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>bodyreconstruction</category>
      <category>computervision</category>
    </item>
    <item>
      <title>A 3D Body Scan for Nine Cents — Without SMPL</title>
      <dc:creator>Arkadiusz</dc:creator>
      <pubDate>Fri, 27 Mar 2026 16:51:21 +0000</pubDate>
      <link>https://dev.to/arkadiuss/a-3d-body-scan-for-nine-cents-without-smpl-4me1</link>
      <guid>https://dev.to/arkadiuss/a-3d-body-scan-for-nine-cents-without-smpl-4me1</guid>
      <description>&lt;p&gt;Everyone in computer vision uses &lt;a href="https://smpl.is.tue.mpg.de/" rel="noopener noreferrer"&gt;SMPL&lt;/a&gt; for human body reconstruction. There is one problem: SMPL has a &lt;a href="https://smpl.is.tue.mpg.de/modellicense.html" rel="noopener noreferrer"&gt;non-commercial license&lt;/a&gt; that blocks production use, unless you pay a lot. For us — a tiny, two-person startup — it was out of reach.&lt;/p&gt;

&lt;p&gt;Instead we built a fully commercial pipeline using Naver's Anny and Meta's MHR — both with permissive licenses. Both appeared in November 2025. The whole pipeline is cheap enough to become consumer-grade — and by consumer-grade I mean not just cost, but also time and UX. A normal person shouldn't need instructions, a special photo, or more than a minute. Did we solve the holy grail of fashion? Here is how it works, what it costs, and where it breaks down.&lt;/p&gt;

&lt;h2&gt;
  
  
  The SMPL licensing trap
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://smpl.is.tue.mpg.de/" rel="noopener noreferrer"&gt;SMPL&lt;/a&gt; became the standard for body models. Almost all the research about Human Mesh Recovery (HMR) you read uses it under the hood. However, the &lt;a href="https://smpl.is.tue.mpg.de/modellicense.html" rel="noopener noreferrer"&gt;license&lt;/a&gt; is very clear — while its source is available, it cannot be used for commercial purposes unless you get a commercial sub-license from &lt;a href="https://meshcapade.com/infopages/licensing.html" rel="noopener noreferrer"&gt;Meshcapade&lt;/a&gt; — the exclusive sub-licensor appointed by Max Planck. Pricing has never been public; you email sales and negotiate. Recently &lt;a href="https://www.mpg.de/26082348/max-planck-spin-off-meshcapade-draws-epic-games-to-tuebingen" rel="noopener noreferrer"&gt;Epic Games acquired Meshcapade&lt;/a&gt; (February 2026, closing April), so the future of SMPL commercial licensing is unclear. Either way — not an option for a two-person startup.&lt;/p&gt;

&lt;p&gt;The SMPL license also prohibits training neural networks for commercial use. Given how many commercial products rely on HMR models trained with SMPL supervision, the licensing gap in this space is wider than most people realize.&lt;/p&gt;

&lt;p&gt;You need to pay the moment you create a 3D mesh from SMPL params. And here is the biggest trap: almost all work on body models ends up with SMPL params at some point and then performs that non-commercial conversion into a 3D mesh. HMR is a good example: most of the networks don't predict a mesh directly but SMPL params, and only then create a mesh from them.&lt;/p&gt;

&lt;p&gt;What's more, it's very often not clear that SMPL is used there. Even things you would never suspect, like Naver's &lt;a href="https://github.com/naver/multi-hmr" rel="noopener noreferrer"&gt;MultiHMR&lt;/a&gt; — which does have Anny-based checkpoints that skip SMPL entirely, but the model itself is non-commercial. So close, yet so far. The list goes on: &lt;a href="https://github.com/shubham-goel/4D-Humans" rel="noopener noreferrer"&gt;HMR 2.0&lt;/a&gt;, &lt;a href="https://github.com/saidwivedi/TokenHMR" rel="noopener noreferrer"&gt;TokenHMR&lt;/a&gt;, &lt;a href="https://github.com/MotrixLab/SMPLer-X" rel="noopener noreferrer"&gt;SMPLer-X&lt;/a&gt;, &lt;a href="https://github.com/Arthur151/ROMP" rel="noopener noreferrer"&gt;ROMP&lt;/a&gt;, &lt;a href="https://github.com/jeffffffli/HybrIK" rel="noopener noreferrer"&gt;HybrIK&lt;/a&gt;, &lt;a href="https://github.com/yohanshin/WHAM" rel="noopener noreferrer"&gt;WHAM&lt;/a&gt; — their code may be MIT or Apache 2.0, but they all output SMPL params. To get a mesh from those params you need the SMPL model files, and those are non-commercial.&lt;/p&gt;

&lt;h2&gt;
  
  
  The landscape (March 2026)
&lt;/h2&gt;

&lt;p&gt;Two body models changed everything in late 2025. Both commercially permissive.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/naver/anny" rel="noopener noreferrer"&gt;Anny&lt;/a&gt; from Naver Labs Europe. The big deal: its 11 shape params actually mean something — gender, age, weight, height, muscle, cup size. When a customer asks "will this fit my hips?" you need params like these, not abstract PCA coefficients. On top of that it has 256 local change blendshapes for things like waist circumference or breast volume. 14K vertices, 163 bones, fully differentiable. Great work from the HUMANS team.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/facebookresearch/MHR" rel="noopener noreferrer"&gt;MHR&lt;/a&gt; from Meta. Different philosophy: 45 abstract shape coefficients. Shape param #37 means nothing to a human. But the mesh quality is excellent — 18K vertices, 127 joints, 7 LOD levels. And critically: Meta's &lt;a href="https://github.com/facebookresearch/sam-3d-body" rel="noopener noreferrer"&gt;SAM 3D Body&lt;/a&gt; outputs MHR params directly — under Meta's &lt;a href="https://github.com/facebookresearch/sam-3d-body/blob/main/LICENSE" rel="noopener noreferrer"&gt;SAM License&lt;/a&gt;, which is commercially permissive but not Apache 2.0. Right now it's the best single-image HMR you can use commercially.&lt;/p&gt;

&lt;p&gt;We use both. MHR because it's the only body model with a commercial HMR path (SAM 3D Body). Anny for semantic understanding and fit guidelines. The bridge between them is our own MHR→Anny conversion — more on that below.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;a href="https://smpl.is.tue.mpg.de/" rel="noopener noreferrer"&gt;SMPL/SMPL-X&lt;/a&gt;&lt;/th&gt;
&lt;th&gt;&lt;a href="https://github.com/facebookresearch/MHR" rel="noopener noreferrer"&gt;MHR&lt;/a&gt;&lt;/th&gt;
&lt;th&gt;&lt;a href="https://github.com/naver/anny" rel="noopener noreferrer"&gt;Anny&lt;/a&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;License&lt;/td&gt;
&lt;td&gt;Non-commercial (Meshcapade for commercial)&lt;/td&gt;
&lt;td&gt;Apache 2.0&lt;/td&gt;
&lt;td&gt;Apache 2.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vertices&lt;/td&gt;
&lt;td&gt;6,890 / 10,475&lt;/td&gt;
&lt;td&gt;18,439&lt;/td&gt;
&lt;td&gt;13,718&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shape params&lt;/td&gt;
&lt;td&gt;10 / 10 (abstract PCA)&lt;/td&gt;
&lt;td&gt;45 (abstract)&lt;/td&gt;
&lt;td&gt;11 semantic + 256 local&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Param meaning&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Human-readable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HMR support&lt;/td&gt;
&lt;td&gt;Almost everything&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/facebookresearch/sam-3d-body" rel="noopener noreferrer"&gt;SAM 3D Body&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/naver/multi-hmr" rel="noopener noreferrer"&gt;MultiHMR&lt;/a&gt; (non-commercial)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Production-ready&lt;/td&gt;
&lt;td&gt;Behind paywall&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;p&gt;We decided to build the whole pipeline around Anny as the target body model. The semantic params are the reason — they don't just describe the body, they let you manipulate it meaningfully. Want to tune a specific region like waist or bust? Change one param. Want to predict how a body looks two kilos lighter, or two months into pregnancy? You can do that too. No other body model gives you this.&lt;/p&gt;

&lt;p&gt;The pipeline has two input paths that converge into the same downstream flow: body mesh → measurements → measurement tuning → physics draping.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5rqlhjjtp6yczfnhhfrb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5rqlhjjtp6yczfnhhfrb.png" alt="Pipeline diagram" width="574" height="744"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The photo path starts with SAM 3D Body. Given the SMPL trap, we had a hard time finding a commercially usable HMR model. We were hopeful about Naver's MultiHMR with Anny-only params, but it's non-commercial too. Fortunately, in late 2025, Meta published SAM 3D Body — single photo in, MHR body params out, 12GB VRAM. A sharp eye will spot one thing though: SAM 3D outputs MHR params, while Anny is a different model entirely. So the photo path needs a bridge — we built our own MHR→Anny converter since Anny's built-in regressor gave quite poor results.&lt;/p&gt;
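
&lt;p&gt;Conceptually the bridge is just a fitting loop: optimize Anny params until the Anny surface sits on the MHR surface. A stripped-down sketch, where &lt;code&gt;anny_vertices&lt;/code&gt; stands in for Anny's differentiable forward pass and the param count and initialization are illustrative:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import torch

def fit_anny_to_mhr(mhr_vertices, n_params=58, n_steps=500):
    params = torch.zeros(n_params, requires_grad=True)
    opt = torch.optim.Adam([params], lr=0.01)
    for _ in range(n_steps):
        opt.zero_grad()
        verts = anny_vertices(params)                       # (N, 3)
        # One-sided nearest-neighbor distance from Anny to the MHR surface.
        loss = torch.cdist(verts, mhr_vertices).min(dim=1).values.mean()
        loss.backward()
        opt.step()
    return params.detach()
&lt;/code&gt;&lt;/pre&gt;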

&lt;p&gt;The questionnaire path is different. We trained a model that predicts Anny body params directly from 8 inputs — height, weight, gender, body shape, and a few more. No photo, no GPU, &amp;lt;1s inference. More on the questionnaire in the next post.&lt;/p&gt;

&lt;p&gt;Both paths converge at measurement tuning. This is the key insight of the whole pipeline. No single method — photo or questionnaire — nails body measurements on the first attempt. The goal isn't to get 1 cm MAE from reconstruction alone. It's to get a close enough body model that you can then tune to less than 1 cm. Given a reconstructed body and a few known measurements, tuning optimizes the body params to close the gap.&lt;/p&gt;
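
&lt;p&gt;Measurement tuning follows the same pattern with a different target: given a handful of known measurements, nudge the body params until the mesh's measured circumferences agree. A sketch, with &lt;code&gt;measure_circumferences&lt;/code&gt; standing in for a differentiable measurement function:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import torch

def tune_measurements(params_init, targets, n_steps=200):
    # targets: e.g. {"waist": 78.0, "hips": 96.0} in cm
    params = params_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([params], lr=0.01)
    for _ in range(n_steps):
        opt.zero_grad()
        measured = measure_circumferences(params)   # dict of tensors, in cm
        loss = sum(torch.abs(measured[k] - v) for k, v in targets.items())
        loss.backward()
        opt.step()
    return params.detach()
&lt;/code&gt;&lt;/pre&gt;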

&lt;p&gt;Once you have a tuned body mesh, the garment gets draped on it with physics. But that's a different post.&lt;/p&gt;

&lt;p&gt;Here's what the output looks like — an Anny body with circumference measurements:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe8nyupv7h4cslwvngive.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe8nyupv7h4cslwvngive.png" alt="Anny body mesh with circumference measurements — female curvy body type" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What it costs to run
&lt;/h2&gt;

&lt;p&gt;Our goal was simple: do it at almost no cost, as we don't have much funding. The most expensive thing is obviously the GPU. We avoided it wherever possible, but in some cases we had to use it. Here are real numbers from our GCP billing — not estimates, not projections.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvmd9ph1aacidiaopefjo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvmd9ph1aacidiaopefjo.png" alt="GCP Cloud Run billing for clad-photo-body-recon" width="800" height="68"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The photo body reconstruction step (SAM 3D Body + MHR→Anny conversion) runs as a Cloud Run job with an L4 GPU at ~$0.67/hour. We ran it 30 times during development — about 61 minutes of total GPU time, ~$2.60 total (GPU + CPU + memory). That's about $0.09 per run. And honestly — it's not optimized. Each successful job takes about 5 minutes, but the actual compute (SAM 3D inference + fitting) is around 80 seconds. The rest is loading models — SAM 3D weights (~66s) and Anny blendshapes (~2 min) get re-downloaded on every cold start. With proper caching that should bring the total under 2 minutes and the cost to ~$0.03 per run.&lt;/p&gt;

&lt;p&gt;And keep in mind — body reconstruction is a one-time thing per user. You build their body model once, then every try-on reuses it.&lt;/p&gt;

&lt;p&gt;The questionnaire path is &amp;lt;1s, no GPU at all. Physics draping takes about 1 minute on an L4 — so another ~$0.01 per try-on. But unlike body reconstruction, draping runs on every garment the user tries.&lt;/p&gt;

&lt;p&gt;Important caveat on the tables below: our numbers are raw infrastructure cost — no margin, no API layer, no support. The third-party prices are commercial APIs that include all of that. Still, the difference is large enough to matter.&lt;/p&gt;

&lt;p&gt;Body scan and measurement APIs:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Per-scan cost&lt;/th&gt;
&lt;th&gt;What you get&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://3dlook.ai/pricing/" rel="noopener noreferrer"&gt;3DLOOK&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;$2-5&lt;/td&gt;
&lt;td&gt;Measurements from 2 photos&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;a href="https://me.meshcapade.com/subscriptions" rel="noopener noreferrer"&gt;Meshcapade&lt;/a&gt; (SMPL commercial)&lt;/td&gt;
&lt;td&gt;~€1-5&lt;/td&gt;
&lt;td&gt;Body model (credit-based: 100-500 credits per avatar, €5/500 credits)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://boldmetrics.com/" rel="noopener noreferrer"&gt;Bold Metrics&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Contact sales&lt;/td&gt;
&lt;td&gt;AI body data from questionnaire&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Our pipeline, photo path (current, unoptimized)&lt;/td&gt;
&lt;td&gt;~$0.09&lt;/td&gt;
&lt;td&gt;3D body + measurements&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Our pipeline, photo path (optimized)&lt;/td&gt;
&lt;td&gt;~$0.03-0.04&lt;/td&gt;
&lt;td&gt;Same&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Our pipeline, questionnaire path (no GPU)&lt;/td&gt;
&lt;td&gt;&amp;lt;$0.01&lt;/td&gt;
&lt;td&gt;Same&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Virtual try-on (diffusion, 2D only — no measurements, no fit info):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Per-image cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://fashn.ai/pricing" rel="noopener noreferrer"&gt;FASHN&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;$0.049-0.075&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://replicate.com/cuuupid/idm-vton" rel="noopener noreferrer"&gt;IDM-VTON on Replicate&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;~$0.025&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;a href="https://docs.cloud.google.com/vertex-ai/generative-ai/docs/models/imagen/virtual-try-on-001" rel="noopener noreferrer"&gt;Google Vertex AI VTO&lt;/a&gt; (Imagen)&lt;/td&gt;
&lt;td&gt;$0.06&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Our pipeline (body recon + physics draping)&lt;/td&gt;
&lt;td&gt;~$0.10-0.15&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Honest results
&lt;/h2&gt;

&lt;p&gt;The standard way to evaluate HMR models is MPJPE (mean per-joint position error) and PVE (per-vertex error) — both in millimeters. SAM 3D Body scores well on these: ~55mm MPJPE on 3DPW, ~61mm PVE on RICH. But for size-aware VTO none of that matters. What matters is circumference MAE in centimeters — waist, bust, hips. That's what tells you if a garment will fit.&lt;/p&gt;

&lt;p&gt;Recovering a body from a photo has one fundamental flaw — it's impossible to determine the person's absolute measurements. Even with one known measurement, pose, lighting, or camera lens can throw off the rest. A &lt;a href="https://arxiv.org/abs/2601.06035" rel="noopener noreferrer"&gt;recent study&lt;/a&gt; on SAM 3D Body's anthropometric fidelity confirms what we see in practice — the model suffers from regression to the mean. The encoder discards fine geometric cues and the low-dimensional shape space can't capture variation like muscle atrophy or pregnancy. Every body gets smoothed toward a generic average. This is architectural, not tunable.&lt;/p&gt;

&lt;p&gt;The MHR→Anny conversion itself works well: ~10mm mean nearest-neighbor surface error across the body. The accuracy gap comes from the upstream photo reconstruction, not from the conversion step.&lt;/p&gt;

&lt;p&gt;We're still early on evaluation. We've tested the full photo pipeline on a couple of real people measured by hand with tape — not a large dataset, not final numbers. From these first results the BWH (bust-waist-hips) MAE is roughly 5-8 cm. Some measurements are much better than others — waist tends to be very close, bust is the weakest point.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fslyrtdhc0z3oh5sgw6y7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fslyrtdhc0z3oh5sgw6y7.png" alt="Anny body mesh with circumference measurements — male average body type" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For context: single-image body reconstruction from in-the-wild photos typically achieves 3-12 cm MAE on circumferences (&lt;a href="https://www.mdpi.com/2313-433X/11/6/205" rel="noopener noreferrer"&gt;survey&lt;/a&gt;). Controlled environments with synthetic data get 1.6-3 cm. Professional 3D body scanners get ±0.5-1.6 cm. We're somewhere in the middle — and still evaluating.&lt;/p&gt;

&lt;p&gt;The key insight: no single method nails all measurements on the first attempt. The goal isn't 1 cm MAE from reconstruction alone. It's to get a close enough body that measurement tuning can close the gap — and that's what we do.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we'd do differently
&lt;/h2&gt;

&lt;p&gt;If starting over, we'd invest in reliable body measurement infrastructure from day one. Sounds obvious, but neither MHR nor Anny ship with a measurement library. You get a mesh with 14-18K vertices and no standard way to extract waist circumference from it. We had to build our own — ISO 8559-1 plane-sweep circumferences, landmark detection, contour separation. That measurement layer ended up being foundational for everything downstream: accuracy evaluation, measurement tuning, fit guidelines. Without it you're guessing. We're planning to open-source this as &lt;code&gt;clad-body&lt;/code&gt; — because anyone working with MHR or Anny will hit the same wall.&lt;/p&gt;
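
&lt;p&gt;To give a flavor of what a plane-sweep circumference means, here is the simplest possible version: slice the mesh at a height and take the convex-hull perimeter of the cross-section, roughly a tape pulled tight. The real layer adds landmark detection, contour separation and the ISO slice rules this sketch ignores.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import numpy as np
from scipy.spatial import ConvexHull

def circumference_at_height(vertices, z, thickness=0.005):
    # Vertices in a thin horizontal slab around height z, projected to XY.
    band = vertices[np.abs(vertices[:, 2] - z) &amp;lt; thickness][:, :2]
    hull = ConvexHull(band)
    return hull.area   # for a 2D hull, .area is the perimeter
&lt;/code&gt;&lt;/pre&gt;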

&lt;p&gt;The other surprise was UX. When you tell someone to upload a photo for body reconstruction (tight clothes, one person, good lighting), scrolling through their gallery to find a suitable picture rarely takes less than 3 minutes — and by then they've lost focus. The questionnaire has its own problem: people don't always know their body shape or belly type. Both paths have friction we didn't expect. We're testing both on real people to see which one actually works for consumers.&lt;/p&gt;

&lt;p&gt;Validation is harder than we thought. There aren't good datasets for this use case. Model agency measurements can't be fully trusted. The only reliable method is what we're doing now: measuring real people by hand and comparing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;p&gt;A commercial 3D body pipeline without SMPL is possible today — for nine cents per body. The first results show that cm-perfect measurements are within reach once you combine reconstruction with tuning. We're actively evaluating and developing this — the pipeline will likely look quite different soon as we improve accuracy and speed. You can &lt;a href="https://clad.you/size-aware/size-me" rel="noopener noreferrer"&gt;try it yourself&lt;/a&gt;. More on that in the next posts.&lt;/p&gt;

</description>
      <category>computervision</category>
      <category>opensource</category>
      <category>bodyreconstruction</category>
    </item>
  </channel>
</rss>
