Z4J

Posted on • Originally published at z4j.com

z4j: a self-hosted control plane for Python task queues

z4j is an open-source dashboard for Python background-job systems. It connects to the task queue (or queues) running in production and gives operators a single place to observe, retry, schedule, and audit the jobs flowing through them.

The product page is at z4j.com, source lives at github.com/z4jdev/z4j, and the umbrella package on PyPI is z4j. Documentation is at z4j.dev, and a live demo runs at demo.z4j.dev.

The problem it addresses

Most Python applications that run background work for long enough end up with more than one queue technology under the hood. Consolidation gets discussed every other quarter and rarely happens: there is usually a library or service boundary that pre-dates the effort, and rewriting the tasks across that boundary is expensive enough that the second queue stays.

The operational cost of that fragmentation is real: separate dashboards (or no dashboard at all for some engines), separate retry mechanisms, separate audit trails, separate scheduling stories. Incident response begins with figuring out which system to inspect before any actual investigation can start.

z4j treats that fragmentation as a given. Jobs from every supported engine appear in one list. The same retry action works against Celery and Dramatiq. Schedules from every backend are editable in the same form. The audit log records every action across every engine in a single HMAC-chained sequence.
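To make the "HMAC-chained sequence" concrete, here is a minimal sketch of the general technique (illustrative only; z4j's actual record format, key management, and storage are not shown). Each entry's MAC covers both the entry and the previous entry's MAC, so altering, deleting, or reordering any record invalidates everything after it:

```python
import hashlib
import hmac
import json

SECRET = b"audit-signing-key"  # hypothetical; a real deployment manages its own key material

def append_entry(chain, action):
    """Append an audit record whose MAC also covers the previous record's MAC."""
    prev_mac = chain[-1]["mac"] if chain else "genesis"
    payload = json.dumps(action, sort_keys=True)
    mac = hmac.new(SECRET, (prev_mac + payload).encode(), hashlib.sha256).hexdigest()
    chain.append({"action": action, "mac": mac})

def verify(chain):
    """Recompute every MAC in order; any tampered entry breaks the chain."""
    prev_mac = "genesis"
    for entry in chain:
        payload = json.dumps(entry["action"], sort_keys=True)
        expected = hmac.new(SECRET, (prev_mac + payload).encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, entry["mac"]):
            return False
        prev_mac = entry["mac"]
    return True

log = []
append_entry(log, {"user": "ops", "verb": "retry", "job": "celery:42"})
append_entry(log, {"user": "ops", "verb": "cancel", "job": "rq:7"})
assert verify(log)
log[0]["action"]["verb"] = "cancel"  # tamper with history
assert not verify(log)
```

The property this buys during a postmortem is that the log can be checked for integrity after the fact without trusting the database it sits in.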

Six engine adapters are supported today: Celery, RQ, Dramatiq, Huey, arq, and taskiq. Seven scheduler adapters cover APScheduler, Celery Beat, Huey periodic, RQ Scheduler, arq cron, taskiq scheduler, and z4j's own scheduler. Framework integrations exist for Django, Flask, and FastAPI. The complete public PyPI surface is nineteen packages, each one its own thin install.

Architecture

The system has two halves.

The first half is the central control service: a FastAPI backend paired with a React dashboard (TanStack Start v1, React 19.2, TypeScript). Operators point a browser at it. State lives in Postgres for production, with SQLite supported for local development. The control service speaks an authenticated WebSocket protocol to every application it observes.

The second half is a small pip library installed into each application. It captures task lifecycle events, discovers the registered task graph at startup, and executes commands sent back over the WebSocket. Adapters are shipped as separate optional installs, so the surface is pay-as-you-go:

pip install z4j[django,celery]
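Behind that extras syntax, a plausible shape for the umbrella package's metadata is each extra pulling in one thin adapter package (an illustrative sketch, not the project's actual pyproject.toml; only the z4j-django and z4j-celery names are confirmed by the licensing section below):

```toml
[project.optional-dependencies]
django = ["z4j-django"]
celery = ["z4j-celery"]
```

This is the standard way a single pip install line fans out into only the adapters an application actually needs.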

The library always connects outward to the control service. Observed applications do not need to be reachable from the control service's network. That suits the common case where workers run in private subnets or behind NAT, and it removes a class of inbound-firewall configuration entirely.
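An outward-only model means the adapter library owns connection lifecycle, including reconnecting after network blips. The sketch below shows one common policy, capped exponential backoff with jitter, plus the shape of the loop as comments (all names here are hypothetical; z4j's real WebSocket protocol and reconnect policy are not documented in this post):

```python
import itertools
import random

def reconnect_delays(base=1.0, cap=60.0):
    """Capped exponential backoff with jitter for an outward long-lived
    connection. Illustrative policy, not z4j's documented behavior."""
    for attempt in itertools.count():
        delay = min(cap, base * (2 ** attempt))
        yield delay * random.uniform(0.5, 1.0)

# Skeleton of the outward loop (pseudocode; function names are invented):
#
#   for delay in reconnect_delays():
#       try:
#           ws = connect_outward("wss://control.example.com/agent", token=TOKEN)
#           send(ws, discovered_task_graph())        # task graph at startup
#           stream_lifecycle_events_and_commands(ws) # events out, commands in
#       except ConnectionError:
#           sleep(delay)

delays = [d for _, d in zip(range(8), reconnect_delays())]
assert all(d <= 60.0 for d in delays)
```

Because the worker dials out, the only network requirement is that workers can reach the control service, never the reverse.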

A note on licensing, because the question comes up early. The control service (the z4j distribution) is licensed under AGPL v3. Every adapter library (z4j-django, z4j-celery, and the rest) is Apache 2.0. The split is intentional: dashboard forks distributed for commercial resale stay open under the AGPL, while application code that imports a z4j adapter is not subject to AGPL terms. Application teams can deploy z4j without it touching their own license posture.

What works well today

  • A single job list across every wired engine, with unified retry, cancel, and bulk-action controls.
  • Persistent history. Failures from last week are still inspectable; the dashboard is not bound to whatever the broker has in memory right now.
  • Schedule CRUD against any of the seven scheduler backends, with one shared form.
  • HMAC-chained audit log. Every action is verifiable as untampered after the fact, which closes a class of "did someone retry that on purpose" questions during postmortems.
  • Redaction by default on common credential patterns (token, password, secret, authorization, and their conventional spellings).
  • Single auth surface. There is no per-engine login.
  • Outward-only connection model from observed applications. No reverse network path required from the control service to workers.
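The redaction bullet above can be sketched as a recursive scrub over payload keys. This is an illustration of the idea, assuming key-name matching; z4j's actual pattern list and matching rules may differ, and the extra spellings here (passwd, auth_key) are my additions:

```python
import re

# Key names treated as sensitive, per the bullet above plus assumed variants.
SENSITIVE = re.compile(r"token|password|passwd|secret|authorization|auth[_-]?key", re.I)

def redact(value):
    """Recursively replace values under sensitive-looking keys with a marker."""
    if isinstance(value, dict):
        return {
            k: "[REDACTED]" if SENSITIVE.search(k) else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value

job_args = {"user_id": 42, "api_token": "sk-live-abc", "nested": {"Password": "hunter2"}}
assert redact(job_args) == {
    "user_id": 42,
    "api_token": "[REDACTED]",
    "nested": {"Password": "[REDACTED]"},
}
```

Doing this before task arguments ever reach the dashboard is what makes redaction a default rather than an operator habit.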

What is not solid yet

The product is honest about its rough edges and the docs flag them as well.

  • The schedule-edit form parses cron expressions correctly, but the ergonomics around timezone display and human-readable preview need another pass.
  • Test coverage is thinner against unusual Celery configurations (custom result backends, signed messages, multi-vhost RabbitMQ topologies) than against the mainstream Redis and RabbitMQ paths.
  • Dashboard mobile layout works but a couple of views (release notes, deep schedule detail) are not visually tight on narrow screens.
  • Production hardening documentation (TLS termination, secret rotation, Kubernetes deployment with a managed Postgres) exists but is less polished than the local install path.
  • Bulk operations have UI affordances for cancel and retry but not yet for re-prioritization across mixed engines.

Where it fits, and where it does not

z4j is most useful where there is more than one queue technology in production, or where a single auditable action and history surface across the queue infrastructure is a hard requirement. Compliance-driven environments fit naturally because of the chained audit log and the default redaction posture.

For a single-engine Celery deployment that is already happy with Flower, the migration argument is weaker. Flower is purpose-built for the single-Celery case and remains a reasonable choice there. The clearest differentiation shows up either when a second engine has joined the stack, or when retention and audit requirements push beyond what an engine's own admin UI provides.

Trying it

The fastest read is the demo at demo.z4j.dev. It is a static replay of a real install, seeded with jobs, schedules, alerts, and notification history. Every action button is wired to an in-memory mock, so the full flow can be clicked through without an install.

For a local install with Postgres:

git clone https://github.com/z4jdev/z4j.git
cd z4j
docker compose -f docker-compose.postgres.yml up -d
docker compose -f docker-compose.postgres.yml logs -f z4j

The control service prints a one-time admin setup URL on first boot. Opening it sets the admin password and lands in the dashboard. The container is named z4j, and the image published on Docker Hub is z4jdev/z4j:latest.

Feedback and issues

Bug reports, feature gaps, and integration questions go to github.com/z4jdev/z4j/issues. The response target on issues is 24 hours, and that target is currently being met.
