When we started shipping Živá Fotka, I assumed the hard part was the AI that animates a still photo into a short clip. It wasn't. The hard part was the five seconds between a user dropping a JPEG and seeing something move on their screen. That's the window where most of our funnel dies, and the window where most of our engineering budget ended up going.
This is a write-up of what actually moved the needle and what turned out to be fashionable nonsense. Numbers are from production traffic across five domains (CZ, SK, PL, EN, DE) of the same product.
The funnel we had to defend
The pipeline is short on paper: user uploads a photo, we preprocess it in the browser, we hand it off to a rendering worker, we return a preview clip, and then we pitch them on a paid export. In reality that's three transitions and each one bleeds users.
Our numbers, averaged over about 60 days:
- Upload started → upload finished: ~92%
- Upload finished → preview visible: ~74%
- Preview visible → paid export: ~6.8%
The middle step is where the money was. A 26% drop between "my photo is uploaded" and "I can see the animation" is catastrophic when the rest of the product is already working. Our instinct was to optimize the model. Our data said optimize the wait.
Client-side prep beat server-side prep, by a lot
The first rewrite moved image preparation (resize, EXIF normalization, format conversion) from the server to the browser. On paper this sounds worse: you're giving the client work to do on possibly weak hardware. In practice it was the single biggest preview-visibility win we shipped.
Why it worked:
- Upload size dropped from ~4.2 MB average to ~480 KB after client-side resize to 1280px longest edge.
- Average upload time on 4G dropped from 6.1s to 0.9s.
- Server-side "prepare" step went from ~1.8s p50 to effectively zero.
The kicker is that the perceived-latency win is bigger than the raw numbers suggest. You can show progress locally before you even hit the network, so the user sees motion immediately.
A minimal version of the resize step that survived three rewrites:
async function prepare(file) {
  // Decode with EXIF orientation applied so iPhone portraits stay upright.
  const bmp = await createImageBitmap(file, { imageOrientation: "from-image" });
  const max = 1280;
  const scale = Math.min(1, max / Math.max(bmp.width, bmp.height));
  const w = Math.round(bmp.width * scale);
  const h = Math.round(bmp.height * scale);
  const canvas = new OffscreenCanvas(w, h);
  canvas.getContext("2d").drawImage(bmp, 0, 0, w, h);
  return await canvas.convertToBlob({ type: "image/webp", quality: 0.85 });
}
Two things that caught us:
- `imageOrientation: "from-image"` matters. Without it, iPhone portraits come in sideways and every user thinks your product is broken.
- `OffscreenCanvas` is not universally available on older Android WebViews. Fall back to a regular canvas and a `toBlob` call; don't assume.
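The fallback path can be sketched like this, with the dimension math pulled into a pure helper. Function names here are illustrative, not our production code:

```javascript
// Pure helper: scaled dimensions with the longest edge capped at `max`.
function targetSize(width, height, max = 1280) {
  const scale = Math.min(1, max / Math.max(width, height));
  return { w: Math.round(width * scale), h: Math.round(height * scale) };
}

// Same resize as prepare(), but routed through a DOM canvas and the
// callback-style toBlob when OffscreenCanvas is unavailable.
async function prepareWithFallback(file) {
  const bmp = await createImageBitmap(file, { imageOrientation: "from-image" });
  const { w, h } = targetSize(bmp.width, bmp.height);

  if (typeof OffscreenCanvas !== "undefined") {
    const canvas = new OffscreenCanvas(w, h);
    canvas.getContext("2d").drawImage(bmp, 0, 0, w, h);
    return canvas.convertToBlob({ type: "image/webp", quality: 0.85 });
  }

  // Older Android WebViews: regular canvas, wrap toBlob in a Promise.
  const canvas = document.createElement("canvas");
  canvas.width = w;
  canvas.height = h;
  canvas.getContext("2d").drawImage(bmp, 0, 0, w, h);
  return new Promise((resolve, reject) =>
    canvas.toBlob(
      (blob) => (blob ? resolve(blob) : reject(new Error("toBlob failed"))),
      "image/webp",
      0.85
    )
  );
}
```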
WebP for stills, WebM for clips, nothing fancy
We spent too long evaluating AVIF for stills and HEVC for clips. Both are technically better. Neither survived the Android fragmentation test we run before shipping.
Final decision:
- Stills: WebP at quality 0.85. Meaningfully smaller than JPEG, decodes fast on every device we care about, supported everywhere except IE which we don't care about.
- Clips: WebM (VP9) at ~1.2 Mbps for previews, ~3.5 Mbps for paid exports.
- MP4 (H.264) as fallback, served only to UAs we know are broken for WebM.
AVIF saved us another 18–22% on still size but added ~140ms median decode time on mid-range Androids. For a product where the first frame has to appear within a second, that trade-off went the wrong way. We'll revisit in a year.
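We pick the fallback server-side by UA, but if you'd rather decide client-side, a capability check is one hedged way to sketch it (the `urls` shape and function name are hypothetical, not our API):

```javascript
// Pick the preview source by what the browser says it can play.
// canPlayType returns "", "maybe", or "probably"; treat any non-empty
// answer as good enough for VP9 WebM, otherwise fall back to H.264 MP4.
function pickPreviewSource(urls, video = document.createElement("video")) {
  if (video.canPlayType('video/webm; codecs="vp9"') !== "") {
    return urls.webm;
  }
  return urls.mp4;
}
```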
Caching that actually helped
There's a canonical advice stack for this kind of product: CDN, HTTP cache, service worker. We tried all three and only two paid off.
What worked:
- Aggressive `Cache-Control: public, max-age=31536000, immutable` on the rendered clip URLs, because they're content-addressed by hash.
- A small `<link rel="preload">` for the preview-player JS bundle, injected on upload, so by the time the preview is ready the player is already parsed.
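The preload injection is small; here's a sketch of it (the bundle path and function name are illustrative, and the `doc` parameter exists only to make it testable outside a browser):

```javascript
// Inject a preload hint for the player bundle as soon as the upload
// starts, so script parse/compile overlaps with render time.
// Idempotent: a second call while the hint exists is a no-op.
function preloadPlayerBundle(doc = document, href = "/assets/preview-player.js") {
  if (doc.querySelector(`link[rel="preload"][href="${href}"]`)) return null;
  const link = doc.createElement("link");
  link.rel = "preload";
  link.as = "script";
  link.href = href;
  doc.head.appendChild(link);
  return link;
}
```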
What didn't:
- Service worker caching for the preview clips. Our users don't re-watch. The cache hit rate was under 4%, and the SW added complexity we regretted every time it misbehaved on iOS.
- "Prefetch the next variant" heuristics. Users don't swipe through variants the way we assumed.
Takeaway: cache what you know will be hit, not what might be hit. We killed the SW and nothing got worse.
Mobile latency is not a myth, it's the product
Two findings that I keep repeating to anyone who will listen.
First, desktop performance numbers are a lie. Our p50 time-to-preview was 2.1s on desktop and 5.9s on mobile. The mobile number is the one that predicts conversion. When we optimized desktop we saw no movement in the funnel. When we optimized mobile, the whole thing shifted.
Second, the variance on mobile is where the damage is. Our p95 on mobile was 14.2s. That long tail is where people leave. Chasing the median gave us small wins. Chasing the p95 (mostly by moving work client-side and by pre-warming render workers) gave us a 4.1-point lift in preview-visibility rate, which is enormous for us.
Practical things that helped p95:
- Render worker pre-warm. Keep a small pool always-ready. Cold starts on our GPU queue were the single worst outlier.
- Skeleton preview. Show the user's own uploaded photo with a subtle animation overlay while the real clip renders. It's not honest but it works because it reassures the user something is happening, which buys you an extra 3–4 seconds of patience.
- Kill request retries. We used to retry failed uploads twice. On flaky mobile connections that meant a user waited 30+ seconds for a failure they could have reacted to at 10s. Now we fail fast and invite a retry.
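The fail-fast upload can be sketched with an `AbortController` deadline and zero retries. The endpoint, timeout, and injectable `fetchImpl` parameter are illustrative, not our production code:

```javascript
// Single-attempt upload with a hard deadline. No retries: surface the
// failure quickly and let the user decide, instead of stacking retry
// timeouts on a flaky mobile connection.
async function uploadOnce(blob, { url = "/api/upload", timeoutMs = 10000, fetchImpl = fetch } = {}) {
  const ctrl = new AbortController();
  const timer = setTimeout(() => ctrl.abort(), timeoutMs);
  try {
    const res = await fetchImpl(url, { method: "POST", body: blob, signal: ctrl.signal });
    if (!res.ok) throw new Error(`upload failed: ${res.status}`);
    return await res.json();
  } finally {
    clearTimeout(timer); // always clear the deadline, success or failure
  }
}
```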
The multi-domain setup that wasn't worth half of what I thought
We run the same product on five domains: zivafotka.cz, zivafotka.sk, zywafotka.pl, alivephoto.online, and lebendigfoto.de. The idea was localized SEO. The cost was a month of config work and a permanent tax on ops.
What we learned:
- One codebase, per-domain i18n, per-domain sitemap, hreflang links between them. This is the only configuration that stayed sane.
- A single CDN origin with host-aware rewrites was enough. We experimented with per-domain origins and regretted it within a week.
- hreflang is non-negotiable if you want Google to treat the sites as regional variants and not duplicates. We got this wrong for three weeks. Organic traffic dropped on every domain simultaneously. Fixing hreflang brought it back.
- Per-domain GSC properties are annoying but not optional. One aggregated GA4 property across all five is fine and gives a cleaner cross-market funnel view.
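The reciprocal hreflang links are mechanical to generate once you have the domain map. A sketch, using the five domains above; the per-page path mapping and `x-default` choice are assumptions, not necessarily what we ship:

```javascript
// Every domain must emit an alternate link for every variant, including
// itself, plus an x-default. Missing reciprocity is what gets the sites
// treated as duplicates.
const VARIANTS = {
  cs: "https://zivafotka.cz",
  sk: "https://zivafotka.sk",
  pl: "https://zywafotka.pl",
  en: "https://alivephoto.online",
  de: "https://lebendigfoto.de",
};

function hreflangLinks(path, variants = VARIANTS, xDefault = "en") {
  const links = Object.entries(variants).map(
    ([lang, origin]) => `<link rel="alternate" hreflang="${lang}" href="${origin}${path}">`
  );
  links.push(
    `<link rel="alternate" hreflang="x-default" href="${variants[xDefault]}${path}">`
  );
  return links;
}
```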
Honest evaluation: the CZ domain still does the heavy lifting. SK and DE are justifying the overhead. PL and the .online English domain are marginal. If I did this again I would start with two domains, not five, and add the rest only when one of the first two saturated.
What I'd do differently
If I were to start this product over tomorrow with what I know now:
- Client-side prep first. Always. Never put resize on the server.
- Ship WebP + WebM + MP4 fallback from day one. Don't pretend AVIF is ready.
- Instrument mobile p95 specifically, not an aggregated p50. Dashboards default to the wrong metric.
- Pre-warm your workers. Cold-start tax is invisible in dev and brutal in prod.
- Do not write a service worker until you have measured that users come back.
- Pick two domains, not five. Hreflang is not free.
The frustrating thing about a photo animation SPA is that most of the learnings don't show up in the AI layer. They show up in the five seconds around it. That's where we spent the next sprint after launch, and it's where we got the conversion numbers that made the product stop being a demo and start being a business.
Jakub, builder @ Inithouse