I run PixelDrive, an API + editor that turns templates
into branded images. You design once, mark layers as variables, then POST data
and get a PNG. The most common request was localization: teams were making the
same graphic in 10 languages by hand.
Here's how I built translation into the render pipeline, and why I self-hosted
it instead of calling a cloud translation API.
The shape of the problem
Translation here is not "translate documents." It's short, repetitive design
copy (headlines, CTAs, labels) rendered onto images, on a server that's
already CPU-bound doing the actual rendering. Two facts drove every decision:
- The text is tiny and cacheable. Translate "Spring Sale" -> es once, store it, and every future use is a sub-millisecond cache hit.
- The box has no spare CPU and no GPU. Anything I run competes with the image renderer.
Why self-host
With caching, the per-word cost of a paid API basically disappears, so cost was
not the deciding factor. The deciding factors were control and not wanting a
network hop in the render path. The trade-off is you have to pick a model that
is small, fast on CPU, and good on short copy.
The sweet spot:
- Model: facebook/nllb-200-distilled-1.3B (200 languages, purpose-built for translation).
- Runtime: CTranslate2 with int8 quantization. This is the key piece. It shrinks the model to ~1.3 GB and runs CPU inference fast. Do not run raw transformers on CPU for this.
A multi-stage Docker build does the conversion once, so the runtime image
carries no torch:
# stage 1: convert + quantize (needs torch, thrown away)
FROM python:3.11-slim AS converter
RUN pip install ctranslate2 "transformers[sentencepiece]" torch \
    --extra-index-url https://download.pytorch.org/whl/cpu
RUN ct2-transformers-converter \
    --model facebook/nllb-200-distilled-1.3B \
    --quantization int8 --output_dir /models/nllb-ct2
# stage 2: runtime (ctranslate2 + tokenizer only)
FROM python:3.11-slim
RUN pip install ctranslate2 "transformers[sentencepiece]" fastapi "uvicorn[standard]"
COPY --from=converter /models /models
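For orientation, here is a minimal sketch of how the converted model might be loaded and called at runtime. It follows the usage pattern from the CTranslate2 documentation for NLLB; the function names, the thread count, and the assumption that source text is always English are mine, not details of the actual service.

```python
from typing import Any, List, Sequence


def load_model(model_dir: str = "/models/nllb-ct2",
               hf_name: str = "facebook/nllb-200-distilled-1.3B",
               threads: int = 2):
    """Load the int8 model and its tokenizer (heavy imports deferred to call time)."""
    import ctranslate2
    import transformers

    translator = ctranslate2.Translator(
        model_dir, device="cpu", compute_type="int8", intra_threads=threads
    )
    # Assumption: all source copy is English.
    tokenizer = transformers.AutoTokenizer.from_pretrained(hf_name, src_lang="eng_Latn")
    return translator, tokenizer


def translate_texts(translator: Any, tokenizer: Any,
                    texts: Sequence[str], tgt_lang: str) -> List[str]:
    """Translate short strings to a FLORES-200 target such as 'spa_Latn'."""
    if not texts:
        return []
    # CTranslate2 consumes token strings, not token ids.
    sources = [tokenizer.convert_ids_to_tokens(tokenizer.encode(t)) for t in texts]
    # NLLB is steered by forcing the target language code as the first output token.
    results = translator.translate_batch(
        sources, target_prefix=[[tgt_lang]] * len(sources)
    )
    outputs = []
    for result in results:
        tokens = result.hypotheses[0][1:]  # drop the forced language-code token
        outputs.append(tokenizer.decode(tokenizer.convert_tokens_to_ids(tokens)))
    return outputs
```

Passing the translator and tokenizer in as arguments keeps the function easy to exercise with stand-ins, and lets the process load the model once at startup.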
The service itself is a tiny FastAPI app with /translate and
/translate_batch. NLLB uses FLORES-200 codes (spa_Latn), so I map ISO codes
(es) to them and normalize typographic punctuation (em dashes, curly quotes)
that the tokenizer would otherwise turn into <unk>.
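The two preprocessing steps could look roughly like this. The mapping table is a small illustrative subset and the helper names are hypothetical; only the FLORES-200 codes themselves are standard.

```python
# Illustrative subset; a real table would cover every language the picker offers.
ISO_TO_FLORES = {
    "en": "eng_Latn",
    "es": "spa_Latn",
    "fr": "fra_Latn",
    "de": "deu_Latn",
    "pt": "por_Latn",
    "ja": "jpn_Jpan",
}

# Typographic characters replaced with plain ASCII before tokenizing.
PUNCT_TABLE = str.maketrans({
    "\u2014": "-",    # em dash
    "\u2013": "-",    # en dash
    "\u2018": "'",    # left single curly quote
    "\u2019": "'",    # right single curly quote / apostrophe
    "\u201c": '"',    # left double curly quote
    "\u201d": '"',    # right double curly quote
    "\u2026": "...",  # ellipsis
})


def to_flores(code: str) -> str:
    """Map an ISO 639-1 code like 'es' to a FLORES-200 code like 'spa_Latn'."""
    code = code.strip()
    if "_" in code:  # already looks like a FLORES-200 code, pass it through
        return code
    try:
        return ISO_TO_FLORES[code.lower()]
    except KeyError:
        raise ValueError(f"unsupported language code: {code!r}") from None


def normalize_punctuation(text: str) -> str:
    """Replace typographic punctuation with ASCII equivalents."""
    return text.translate(PUNCT_TABLE)
```

Rejecting unknown codes up front, rather than passing them to the model, keeps a typo in a lang field from silently producing output in the wrong language.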
Wiring it into rendering
The render service is the only place that needs translation, and it already has
a Redis/Kvrocks cache. The flow:
- A text field can carry a language: { "headline": { "text": "Spring Sale", "lang": "es" } }.
- The render cache key already includes the full payload (so es and fr are different cache entries). On a cache miss only, translate.
- translatePayload() collects every text value with a lang, checks the cache (tr:<lang>:<sha(text)>), batches the misses to the translation service, writes results back to the cache, and swaps the text in.
- The harness draws the translated text. The same cache key is used by the editor, so a translation done in the editor is reused at render time and vice versa.
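The flow above can be sketched in a few lines. This is a Python approximation rather than the real translatePayload(): a plain dict stands in for Redis/Kvrocks, SHA-256 is assumed for the hash, the payload is assumed to be flat, and translate_batch is a hypothetical callable wrapping the translation service.

```python
import hashlib
from typing import Callable, Dict, List, MutableMapping


def cache_key(lang: str, text: str) -> str:
    """Build a tr:<lang>:<sha(text)> key (SHA-256 is an assumption)."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"tr:{lang}:{digest}"


def translate_payload(
    payload: dict,
    cache: MutableMapping[str, str],
    translate_batch: Callable[[List[str], str], List[str]],
) -> dict:
    """Return a copy of payload with every {"text", "lang"} field translated."""
    out = {k: (dict(v) if isinstance(v, dict) else v) for k, v in payload.items()}
    fields = [
        f for f in out.values()
        if isinstance(f, dict) and f.get("lang") and isinstance(f.get("text"), str)
    ]

    # Collect cache misses, grouped by language and de-duplicated.
    misses: Dict[str, List[str]] = {}
    for f in fields:
        if cache_key(f["lang"], f["text"]) not in cache:
            texts = misses.setdefault(f["lang"], [])
            if f["text"] not in texts:
                texts.append(f["text"])

    # One batched call per language; write the results back to the cache.
    for lang, texts in misses.items():
        for src, dst in zip(texts, translate_batch(texts, lang)):
            cache[cache_key(lang, src)] = dst

    # Swap the translated text in.
    for f in fields:
        f["text"] = cache[cache_key(f["lang"], f["text"])]
    return out
```

Keying on the text alone, not the template or field name, is what lets the editor and the renderer share entries: the same string in the same language always resolves to the same key.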
Two operational details that mattered:
- The translation container is CPU-capped (a couple of cores). Even a burst of cache misses can't starve the renderer.
- It's best-effort: if the service is unavailable, it falls back to the original text. Translation never breaks a render.
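The best-effort behaviour amounts to a small wrapper around the call to the translation service. A minimal sketch, assuming a hypothetical translate_batch callable that raises when the service is down or times out:

```python
import logging
from typing import Callable, List

log = logging.getLogger("translate")


def translate_best_effort(
    texts: List[str],
    lang: str,
    translate_batch: Callable[[List[str], str], List[str]],
) -> List[str]:
    """Translate texts, falling back to the originals on any failure."""
    try:
        result = translate_batch(texts, lang)
    except Exception:
        # Service down, timeout, bad response: render the original copy instead.
        log.warning("translation unavailable, using original text", exc_info=True)
        return list(texts)
    if len(result) != len(texts):
        # A short or long reply cannot be matched back to fields safely.
        log.warning("translation returned %d items for %d inputs", len(result), len(texts))
        return list(texts)
    return result
```

One caveat with this shape: originals returned by the fallback probably should not be written to the translation cache, or an outage would pin untranslated text until the entry is cleared.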
The result
POST /v1/render
{
  "templateId": "...",
  "payload": {
    "headline": { "text": "Welcome to our spring sale", "lang": "es" }
  }
}
renders "Bienvenido a nuestra venta de primavera" onto the image. In the
editor there's a one-click "translate the whole template" button with a
searchable 70-language picker, and the bulk-CSV flow supports a per-row lang
column so a single upload can render a batch across markets.
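To make the per-row lang column concrete, here is one way CSV rows could be turned into render payloads. The column layout (one column per template variable plus lang) and the function name are assumptions for illustration, not the actual upload format.

```python
import csv
import io
from typing import Dict, Iterator


def rows_to_payloads(csv_text: str) -> Iterator[Dict[str, dict]]:
    """Yield one render payload per CSV row, tagging text with the row's lang."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        lang = (row.pop("lang", "") or "").strip()
        payload = {}
        for name, value in row.items():
            if name is None:  # extra cells beyond the header row, ignore them
                continue
            field = {"text": value or ""}
            if lang:  # rows without a lang render the text untranslated
                field["lang"] = lang
            payload[name] = field
        yield payload


# Example: the same headline rendered for two markets, plus the original.
data = "headline,lang\nSpring Sale,es\nSpring Sale,fr\nSpring Sale,\n"
payloads = list(rows_to_payloads(data))
```

Each resulting payload has the same shape as the payload in the request above, so a batch upload reduces to one render call per row.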
Because every (text, language) pair is cached forever, the model only runs on
genuinely new strings, which for marketing copy is rare after warmup. A small
CPU model plus aggressive caching turned out to be a better fit than a cloud API
for this particular shape of problem.
If you want to see it in action: pixeldrive.pro.
Happy to answer questions about the setup in the comments.