I built a production ML inference API with FastAPI, Celery and Docker — here's the full architecture

sada — Sun, 21 Jun 2026 03:51:35 +0000

Para 1 — The problem
"Most ML tutorials end at model.fit().
Getting a model into production is a completely
different skill. Here's how I built a real async
inference microservice."

Para 2 — Architecture diagram
Paste the ASCII diagram from your ARCHITECTURE.md

Para 3 — The three components
FastAPI handles HTTP requests (why async matters)
Celery runs inference in background workers (why not just threads)
Redis serves as both the task queue and the result store (why one service)

Para 4 — Key code snippet (predict_async endpoint)
Show 15 lines of code — the async endpoint that
dispatches to Celery and returns task_id immediately

Para 5 — Testing strategy
"I used in-memory Celery eager mode so tests
run without Redis. Here's the conftest pattern."
Show 10 lines of conftest.py

Para 6 — The result
Screenshot of the UI dashboard
Screenshot of 47 tests passing

Closing line:
"If you want the full source code with Docker,
CI pipeline, Postman collection and deployment
guide, I packaged it here: [Gumroad link]"

DEV Community: sada
