The Code Exists, But the Feature Still Fails: Fixing Runtime Drift in OpenClaw Operations
One of the most instructive incidents we handled on April 8 was a classic operations problem: the feature existed in the source tree, yet it did not work in production. The affected component was the workflow feature of ai-backoffice-pack.
From the user's side, the symptom looked simple: the month-end workflow management page was not responding. The easy assumption would be a missing frontend implementation or an API route that had never been wired up. But the codebase said otherwise: dashboard/src/pages/Workflow.tsx was there, and the backend had backend/src/modules/workflow/. The feature clearly existed in source code.
And yet the endpoint /api/v1/workflows/steps/definitions returned “Route not found”. At that point, the right thing to inspect was no longer the repository but the runtime artifact actually serving traffic. Once we checked the running API container, the answer was obvious: the workflow module was missing from dist/modules. The code was not incomplete; an outdated container image was still running in production.
That is runtime drift. Developers think “the code is there,” users feel “the UI is broken,” and the runtime in between is stuck in the past.
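A quick way to detect this kind of drift is to compare the module directories in the source tree with the entries inside the running container. The sketch below assumes the layout from this incident (backend/src/modules in the repository, /app/dist/modules in a compose service named api), that the image ships an ls binary, and that it runs from a checkout where both the source tree and the compose project are available. The function names are hypothetical, not part of ai-backoffice-pack.

```python
import subprocess
from pathlib import Path


def list_source_modules(src_dir: str = "backend/src/modules") -> set[str]:
    """Module names in the repository: one directory per backend module."""
    return {p.name for p in Path(src_dir).iterdir() if p.is_dir()}


def list_runtime_modules(service: str = "api",
                         dist_dir: str = "/app/dist/modules") -> set[str]:
    """Entries in the compiled modules directory of the *running* container.

    Assumption: the compose service is named "api" and the image contains `ls`.
    Run from the compose project directory so `docker compose` finds the file.
    """
    result = subprocess.run(
        # -T disables TTY allocation so stdout can be captured cleanly
        ["docker", "compose", "exec", "-T", service, "ls", "-1", dist_dir],
        capture_output=True, text=True, check=True,
    )
    return {line.strip() for line in result.stdout.splitlines() if line.strip()}


def find_missing_modules(source: set[str], runtime: set[str]) -> list[str]:
    """Modules present in source but absent from the running build (drift)."""
    return sorted(source - runtime)


if __name__ == "__main__":
    missing = find_missing_modules(list_source_modules(), list_runtime_modules())
    if missing:
        print("Runtime drift, not in running container:", ", ".join(missing))
    else:
        print("Every source module is present in the running container.")
```

Extra entries on the runtime side (for example an index.js next to the module folders) are ignored, because only modules that exist in source but not at runtime count as drift. Against the stale image in this incident, the missing list would have included workflow.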
The fix itself was not dramatic. On the infra node, we rebuilt the images with docker compose build api dashboard, then recreated the services with docker compose up -d api dashboard. The important part was the verification strategy. We did not stop at “the containers restarted successfully.” We checked that /app/dist/modules/workflow now existed inside the container, and then confirmed that the workflow definitions endpoint returned 401 instead of 404. A 401 only means the request lacked valid credentials, but it proves the route is now present. Only after both checks pass can you honestly say the issue is fixed.
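Both verification checks can be scripted so that a deploy is not declared done until they pass. This is a minimal sketch: the api service name, the module path, and the endpoint come from the incident, while the base URL (http://localhost:3000) and all function names are placeholders invented for illustration. It uses only the Python standard library plus the docker compose CLI, and assumes the image provides a test binary.

```python
import subprocess
import urllib.error
import urllib.request


def container_has_dir(path: str, service: str = "api") -> bool:
    """True if `path` is a directory inside the running compose service.

    Assumption: `test -d` is available in the image.
    """
    result = subprocess.run(
        ["docker", "compose", "exec", "-T", service, "test", "-d", path],
        capture_output=True,
    )
    return result.returncode == 0


def probe_status(url: str, timeout: float = 5.0) -> int:
    """Status code of an unauthenticated GET.

    HTTP error statuses are returned instead of raised; connection failures
    (URLError) still raise, since there is no status to report.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def interpret_status(status: int) -> str:
    """Map a status code to what it suggests about the route."""
    if status == 404:
        return "missing"                # path not served, as before the fix
    if status == 401:
        return "present-auth-required"  # route answered, credentials needed
    if 200 <= status < 300:
        return "present-open"
    return "inconclusive"


if __name__ == "__main__":
    base = "http://localhost:3000"  # placeholder: adjust to your API address
    module_ok = container_has_dir("/app/dist/modules/workflow")
    status = probe_status(base + "/api/v1/workflows/steps/definitions")
    print("module in container:", module_ok)
    print("route:", interpret_status(status), f"(HTTP {status})")
```

Running it right after docker compose up -d gives the same two signals the fix relied on: the module directory exists in the container, and the endpoint answers 401 rather than 404.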
This incident reinforced a troubleshooting order that works especially well for Dockerized business applications:
- Is the feature present in source code?
- Is it present in the build artifact?
- Is it present inside the running container?
- Is the route actually exposed?
- Does it work once the request is authenticated?
If you stop at step 1 (“the code is there”), you can waste a lot of time. Steps 3 and 4 usually locate the real fault line much faster: in this incident, step 3 exposed the missing module and step 4 confirmed the fix.
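The order matters because each layer is produced from the one before it, so the first failing check marks where the feature stops existing. A minimal sketch of that idea follows; Check and first_failing_layer are hypothetical names, and the lambdas are illustrative stubs standing in for real probes such as listing /app/dist/modules or calling the endpoint, not measurements from the incident.

```python
from typing import Callable, NamedTuple, Optional


class Check(NamedTuple):
    layer: str                   # human-readable name of the layer
    passed: Callable[[], bool]   # returns True if the feature is present there


def first_failing_layer(checks: list[Check]) -> Optional[str]:
    """Run checks in order and return the first layer that fails, or None.

    Stops at the first failure: later layers are built from earlier ones,
    so running them would not add information.
    """
    for check in checks:
        if not check.passed():
            return check.layer
    return None


if __name__ == "__main__":
    # Illustrative stub values: present in the repo, absent from the
    # running container. Replace each lambda with a real probe.
    checks = [
        Check("source code", lambda: True),
        Check("build artifact", lambda: True),
        Check("running container", lambda: False),
        Check("exposed route", lambda: False),
        Check("post-auth behavior", lambda: False),
    ]
    print("Fault line:", first_failing_layer(checks))
```

Because the loop short-circuits, expensive checks such as authenticated end-to-end calls only run once every cheaper layer below them has passed.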
Another decision that day was architectural. Instead of keeping a separate accounting system and integrating with it through APIs, we chose to reuse only the useful UI and upload experience, pull the freee integration logic out of freee-bookkeeper, and consolidate the long-term implementation into the backend, dashboard, and Postgres stack of ai-backoffice-pack. The lesson is similar: a working side system is not, by itself, a reason to keep expanding your operational surface area. Short-term reuse and long-term maintenance cost are separate decisions.
In real operations, a feature only truly exists when source code, build artifact, container image, exposed routes, and post-auth behavior all line up. Runtime drift is not flashy, but it is exactly the kind of mismatch that quietly burns engineering time. Before blaming the code, inspect what is actually running.