We spun up the first Mirage container. It looked cocky in `docker ps`, even served a clean 200. We thought: hell yeah, prod vibes.
Then someone hit the DB. Brick wall. Container “running,” but the service was a wax statue. Lesson #1: process alive ≠ system alive.
Day 0 — Naive boot
What we thought:
- Simple ASGI app, SQLite, Alembic migrations.
- Compose to glue it, nothing fancy.
What happened:
- DB init blocked startup. Import-time schema creation froze everything.
- Logs filled with `OperationalError`.
- Container alive, endpoints dead.
Patch:
- Lazy DB init, async startup, short timeouts.
- Service now boots instantly, even if DB ghosts us.
Caveat: we learned fast—DB fallback isn’t free. Our in-memory SQLite behaves nothing like Postgres. Attackers probing concurrency, types, or isolation will sniff the toy. For now, it’s fine for dev loops, but TODO: move to WAL-backed SQLite with jitter, or a lightweight Postgres sidecar, so Mirage doesn’t leak “cheap decoy.”
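Rough shape of the lazy init, in case it helps. Everything here (the URLs, the two-second budget, the `get_engine()` helper) is illustrative, not Mirage's actual module:

```python
import asyncio
from sqlalchemy import text
from sqlalchemy.exc import SQLAlchemyError
from sqlalchemy.ext.asyncio import AsyncEngine, create_async_engine

PRIMARY_URL = "sqlite+aiosqlite:///./mirage.db"   # assumed dev default
FALLBACK_URL = "sqlite+aiosqlite:///:memory:"     # the "toy" fallback

_engine: AsyncEngine | None = None

async def _ping(engine: AsyncEngine) -> None:
    async with engine.connect() as conn:
        await conn.execute(text("SELECT 1"))

async def get_engine() -> AsyncEngine:
    """Create the engine on first use, never at import time."""
    global _engine
    if _engine is not None:
        return _engine
    engine = create_async_engine(PRIMARY_URL)
    try:
        # Short budget so a ghosting DB can't wedge startup.
        await asyncio.wait_for(_ping(engine), timeout=2.0)
        _engine = engine
    except (asyncio.TimeoutError, SQLAlchemyError, OSError):
        await engine.dispose()
        _engine = create_async_engine(FALLBACK_URL)
    return _engine
```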
Persistence discipline
What we tried:
- SQLite file for dev, Postgres later.
Pain:
- File collisions on container restart.
- Migrations stomping each other, state corruption.
Patch:
- Async SQLAlchemy for proper async flow.
- Alembic migrations gated behind `APATE_USE_ALEMBIC`.
- Rudimentary file lock to avoid racy migrations.
Caveat: file lock is a band-aid. Real fix: advisory locks in Postgres (later), or one-shot init jobs in Compose/K8s. Note for future self: never trust humans to “just remember the flag.”
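For the curious, the gated migration step looks roughly like this. The flag name comes from the post; the lock path, the accepted flag value, and the third-party `filelock` package are stand-ins for whatever the band-aid actually is:

```python
import os
from alembic import command
from alembic.config import Config
from filelock import FileLock, Timeout

def run_migrations() -> None:
    # Assumed convention: only run Alembic when the flag is explicitly "1".
    if os.getenv("APATE_USE_ALEMBIC", "0") != "1":
        return
    lock = FileLock("/tmp/apate_alembic.lock")
    try:
        # Crude guard against two containers racing the same SQLite file.
        with lock.acquire(timeout=10):
            command.upgrade(Config("alembic.ini"), "head")
    except Timeout:
        # Someone else is migrating; don't stomp their work.
        pass
```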
Surface tightening
- Favicon probes spammed 404 logs → return 204.
- Added env-driven CORS.
- Sensitive routes gated.
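A sketch of those first two, assuming FastAPI (the post only promises "simple ASGI app"); the `APATE_CORS_ORIGINS` name is made up:

```python
import os
from fastapi import FastAPI, Response
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Env-driven CORS: comma-separated origins, empty means "no cross-origin".
origins = [o for o in os.getenv("APATE_CORS_ORIGINS", "").split(",") if o]
app.add_middleware(CORSMiddleware, allow_origins=origins, allow_methods=["*"])

@app.get("/favicon.ico")
async def favicon() -> Response:
    # 204 instead of a noisy 404 on every browser probe.
    return Response(status_code=204)
```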
TODOs we spotted mid-flight:
- Handle HEAD/OPTIONS consistently.
- Add robots.txt + a boring 1x1 favicon instead of “nothing.”
- Rate limiting + small randomized latency on select routes, so we don’t look “too perfect.”
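That last one isn't built yet, but the idea is cheap. A throwaway Starlette middleware sketch, with arbitrary delay bounds and hypothetical routes:

```python
import asyncio
import random
from starlette.middleware.base import BaseHTTPMiddleware

JITTERED_PATHS = {"/login", "/admin"}   # hypothetical decoy routes

class JitterMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        if request.url.path in JITTERED_PATHS:
            # Real backends have uneven latency; a honeypot shouldn't answer in 2ms.
            await asyncio.sleep(random.uniform(0.05, 0.25))
        return await call_next(request)

# app.add_middleware(JitterMiddleware)
```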
Observability, but noisy on purpose
Before: dead silent when things broke.
Now:
- `/metrics` with Prometheus counters (requests, latencies, DB fallback count).
- Webhook alerts with dedup/backoff.
- Metrics for alert failures themselves.
Self-own: right now, degraded mode leaks via the public health probe (it literally says “fallback engaged”). That’s a fingerprint. Fix queued: external health always boringly “OK,” internal metrics carry the truth.
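The queued fix, sketched: externally boring health, internally honest counters. Metric names and the `/healthz` route are illustrative, not what Mirage exposes today:

```python
from fastapi import FastAPI, Response
from prometheus_client import CONTENT_TYPE_LATEST, Counter, generate_latest

app = FastAPI()

REQUESTS = Counter("apate_requests_total", "Requests served")
DB_FALLBACK = Counter("apate_db_fallback_total", "Times the DB fallback engaged")

@app.get("/healthz")
async def healthz() -> dict:
    # External probe: always boringly OK, never "fallback engaged".
    return {"status": "ok"}

@app.get("/metrics")
async def metrics() -> Response:
    # Internal scrape carries the truth (fallback count, request totals, ...).
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)
```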
Testing loop
What we had: slow, flaky tests with external deps.
What we have: 53 async tests run in ~0.5s in-process with ASGI.
What we still need: property-based tests for degraded paths (DB flaps mid-request, migration lock races).
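The in-process loop is just httpx's `ASGITransport` pointed at the app, roughly like this (pytest-asyncio assumed; the import path and test body are placeholders for the real suite):

```python
import pytest
from httpx import ASGITransport, AsyncClient

from mirage.main import app   # hypothetical import path

@pytest.mark.asyncio
async def test_root_is_boring() -> None:
    transport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as client:
        resp = await client.get("/")
        assert resp.status_code == 200
```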
Early wins
- Deterministic startup.
- Sub-second tests.
- Real metrics instead of vibes.
TODOs and risks we logged for later
- DB realism: WAL-backed SQLite + jitter, or Postgres-lite sidecar.
- Health signals: split readiness/liveness; external = opaque, internal = honest.
- Migrations: kill the file lock, replace with advisory locks or init jobs.
- Surface polish: HEAD/OPTIONS, robots.txt, rate limiting, fake latency.
- Observability polish: structured logs with request IDs, Prometheus histograms + exemplars, maybe traces.
- CI gates: ruff, mypy, bandit, pip-audit, safety; pinned deps; multi-arch builds; distroless images.
- Config: doc all env vars (DB URL, fallback toggles, alembic flag, origins, alerts).
Closing thought
Honeypots don’t have to be perfect, but they do have to be believable. Right now Mirage boots, degrades, and talks enough to fool a casual scan. But any serious fingerprint will still smell the shortcuts.
The plan: brick it less, make failure louder, make the illusion tighter.
Prod or bait? If they have to stop and wonder, Mirage did its job.
You can check the progress of my work on GitHub at Apate, or find me at Rizzy1857.
Any and every comment or contribution to make it better will be appreciated.