The code exists, but production still does nothing: why runtime drift should be your first suspect
One of the most misleading failure modes in OpenClaw-style operations is runtime drift: the source code says one thing, while the running system is still living in the past. The case that triggered this lesson looked simple at first. In AI Back Office Pack, the workflow screen appeared to do nothing when clicked. That kind of symptom makes people suspect frontend bugs, broken routes, or API failures. In reality, the root cause was much simpler: the workflow code existed in the repository, but the Docker containers still running in production were old.
This is exactly the kind of issue that fools anyone who stops at source inspection. The repository already contained the workflow implementation. The UI components were there too. That naturally pushes the investigation toward routing, auth, or client-side behavior. But in production, the first question should be different: is the artifact currently running actually built from the source you are reading?
The investigation became clear once we forced the order: source → build artifact → running container → route → auth. That sequence matters. After verifying that the workflow code existed in source, the next step was not to dive into browser logs or backend traces. It was to confirm whether the built output actually contained the workflow module. Skipping that check wastes time fast. In this case, both the api and dashboard containers were still based on older images, so the runtime simply did not contain the updated workflow module.
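That "does the built output contain the module" check can be made mechanical. Here is a minimal sketch against a throwaway directory tree; the src/ and dist/ paths and the runWorkflow symbol are invented for illustration, not taken from the actual project:

```shell
#!/bin/sh
# Sketch of the "source vs build artifact" check, using a demo tree
# that simulates a stale build. All names here are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/dist"
echo "export function runWorkflow() {}" > "$demo/src/workflow.js"
echo "// bundle built before workflow.js existed" > "$demo/dist/bundle.js"

# Step 1: the implementation exists in source.
grep -rq "runWorkflow" "$demo/src" && src_state="present" || src_state="missing"
# Step 2: the build artifact should contain it too -- here it does not.
grep -rq "runWorkflow" "$demo/dist" && dist_state="present" || dist_state="MISSING: rebuild first"

echo "source:   $src_state"
echo "artifact: $dist_state"
rm -rf "$demo"
```

If step 2 reports the module missing, there is no point reading browser logs yet: the bug cannot be in code that was never shipped.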
So the visible problem was not a broken feature. It was an undeployed feature. Source truth and runtime truth had diverged. This is where Docker-based operations can quietly lie to you. You may have updated docker-compose.yml, pulled the latest source, and even built assets locally. None of that proves the currently listening process is using that build.
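One way to close that gap is to ask Docker directly which image the listening container was created from, and when that image was built. A hedged sketch: the service names (api, dashboard) follow the article, and the guard keeps it runnable outside a real project:

```shell
#!/bin/sh
# Sketch: is the running container built from the source you are reading?
# Service names are taken from the article; everything else is an assumption.
if command -v docker >/dev/null 2>&1 && [ -f docker-compose.yml ]; then
  # Which image each service is actually running:
  docker compose images api dashboard
  # Creation time of the image behind the live api container:
  api_image=$(docker inspect --format '{{.Image}}' "$(docker compose ps -q api)")
  docker image inspect --format '{{.Created}}' "$api_image"
  ran=checked
else
  echo "illustrative only: run from a project root with docker available"
  ran=skipped
fi
```

If the image creation time predates your last commit, runtime drift is confirmed before you read a single line of application code.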
The fix itself was straightforward: rebuild and recreate the api and dashboard containers for ai-backoffice-pack, then replace the old runtime with artifacts that actually included workflow support. Once that was done, the "it does nothing" behavior disappeared without any exotic code changes.
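Sketched with standard docker compose flags, the rebuild looks something like this (service names from the article; `--no-cache` is optional, but it rules out stale cached layers as a second source of drift):

```shell
#!/bin/sh
# Sketch of the fix: rebuild the images, then force-recreate the
# containers so the old runtime is actually replaced, not reused.
# Guarded so the commands stay runnable outside a real project.
if command -v docker >/dev/null 2>&1 && [ -f docker-compose.yml ]; then
  docker compose build --no-cache api dashboard
  docker compose up -d --force-recreate api dashboard
  docker compose ps api dashboard   # confirm fresh containers are up
  ran=rebuilt
else
  echo "illustrative only: run from a project root with docker available"
  ran=skipped
fi
```

The `--force-recreate` flag matters: `up -d` alone can decide a container is up to date and leave the old one running.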
The real lesson was not the rebuild. It was the debugging discipline. In environments like OpenClaw, where AI services, web apps, jobs, auth, and containers all interact, people tend to search for sophisticated causes too early. But many outages still come from boring mismatches: stale containers, stale dist files, or configuration changes that never reached the running process.
My rule is now much stricter: do not stop at "the code exists." Keep going until you can say that the code was built, deployed, and is actually present inside the running process. If you skip that chain, operations will happily mislead you.
A practical isolation order
- Verify the implementation exists in source.
- Verify the build artifact contains it.
- Verify the running container actually has that artifact.
- Verify the route exists, treating status codes as evidence: 404 points at a missing route, 401 at auth, 500 at an application error.
- Only then go deeper into auth, permissions, or frontend logic.
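The checklist above can be run top to bottom as a script, stopping at the first step that fails. In this sketch, the paths, the module name, the container path /app/dist, and the URL are all placeholders to substitute with your own:

```shell
#!/bin/sh
# Walk the isolation order: source -> artifact -> container -> route.
# Every name and path below is a hypothetical placeholder.
if command -v docker >/dev/null 2>&1 && [ -f docker-compose.yml ]; then
  grep -rl "workflow" src/                # 1. exists in source?
  grep -rl "workflow" dist/               # 2. present in the build artifact?
  docker compose exec api \
    grep -rl "workflow" /app/dist         # 3. present inside the running container?
  curl -s -o /dev/null -w '%{http_code}\n' \
    http://localhost:8080/api/workflows   # 4. 404 = no route, 401 = auth, 500 = app error
  ran=checked
else
  echo "illustrative only: run from a project root with docker available"
  ran=skipped
fi
```

Step 3 is the one people skip: it is the only step that inspects the runtime itself rather than an input to it.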
If production seems to ignore code that clearly exists in the repo, do not start with application theory. Start with runtime drift.