Repo Truth ≠ Production Truth: A Container-First Troubleshooting Pattern for Runtime Drift
We ran into another operations problem that wastes a lot of time precisely because it looks deceptively simple: the implementation exists in the repository, but the live UI and API behave as if the feature was never deployed. In that situation it is easy to keep staring at source code, or to blame frontend logic, routes, or permissions too early. The first thing to verify is not repo truth but live runtime truth, and in Docker environments the shortest entry point to that is often container truth.
A Git repository can prove that somebody wrote the code. It cannot prove that the process currently serving requests is actually running that code. In Docker-based systems, those are often two different realities.
What the problem really was
The workflow page in AI Back Office Pack was behaving incorrectly. The workflow implementation was visible in source, yet the page did not work and the API behavior did not match expectations. From there, it is tempting to start digging through application logic. That is usually where time gets burned.
The more effective order was much simpler:
- confirm the live endpoint mapping: which proxy receives this domain/path right now, and which service/container it actually forwards to
- confirm the implementation exists in source
- confirm the build artifact contains the expected output
- confirm the running container actually includes that artifact
- then inspect route and reverse-proxy details
- finally inspect authentication responses and API semantics
The final conclusion was not "the code is missing." It was "the code is not what the container is running." The workflow module existed in the repository, but the live api and dashboard containers were still running old images with old artifacts. In other words, code truth and container truth had drifted apart: a textbook runtime drift incident.
Why I now prioritize container truth
In local development, source is often close enough to reality. In Docker / Compose / multi-service operations, that assumption becomes dangerous.
Users do not hit your Git repository. They hit:
- a specific image
- a specific container
- a specific running process
- a route that is actually active
That is why source truth is only one piece of evidence in production debugging. The final authority is the live runtime currently serving requests, and in Docker environments container truth is often the fastest route to verifying that runtime truth.
A debugging order that wastes less time
The next time I see symptoms like "the code exists but the page does nothing," "the repo has it but the API returns 404," or "we changed it but production did not move," I will use this order first.
0. Live endpoint mapping
Confirm which LB or reverse proxy currently receives the request, and which service/container it really lands on. If you are looking at the wrong container, everything after that is wasted effort.
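Step 0 can be sketched with a few commands. The service name (api), port (3000), and URL here are hypothetical placeholders, not from the actual incident; each command is guarded with a fallback message so the sketch degrades gracefully on hosts without Docker.

```shell
# Step 0 sketch: find out which container actually receives the request.
check_endpoint_mapping() {
  # Which containers are up, and which ports do they publish?
  docker compose ps 2>/dev/null || echo "(docker compose unavailable here)"
  # Which host port does the api service map to? (service name and port are assumptions)
  docker compose port api 3000 2>/dev/null || echo "(port mapping not resolvable)"
  # What does the first hop of the live path return right now? (URL is hypothetical)
  curl -sI https://backoffice.example.internal/workflow 2>/dev/null | head -n 1 \
    || echo "(endpoint unreachable from this host)"
}
check_endpoint_mapping
```

If this step points at a different container than the one you have been inspecting, stop and redirect the investigation before touching any code.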
1. Source
Verify the implementation really exists.
2. Artifact
Verify the built output, bundle, or dist files contain the feature. Source existing is not enough.
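A minimal sketch of the artifact check, assuming a dist/ output directory and a modules/workflow path modeled on this incident; adapt both to your build layout.

```shell
# Step 2 sketch: prove the build artifact, not just the source, contains the feature.
check_artifact() {
  # Does the build output even contain the module directory? (path is an assumption)
  [ -d dist/modules/workflow ] && echo "dist contains the workflow module" \
    || echo "dist is missing the workflow module: rebuild before blaming runtime"
  # Is the feature's code actually in the bundle, not only under src/?
  grep -rl "workflow" dist/ 2>/dev/null | head -n 3 || true
}
check_artifact
```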
3. Container
Enter the running container and inspect the deployed files directly. In this case, the key question was whether /app/dist/modules/workflow actually existed inside the container.
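The container check above can be sketched as follows. The container name (backoffice-api) is hypothetical; the path mirrors the key question from this incident. Commands are guarded so the sketch runs even where Docker is absent.

```shell
# Step 3 sketch: ask the running container, not the repo, what it contains.
check_container_files() {
  # Is the deployed artifact actually inside the live container?
  docker exec backoffice-api ls /app/dist/modules/workflow 2>/dev/null \
    && echo "container has the workflow artifact" \
    || echo "artifact missing (or docker unavailable here): suspect a stale image"
  # Which image is this container really running? Compare it to your latest build.
  docker inspect --format '{{.Image}}' backoffice-api 2>/dev/null \
    || echo "(container not inspectable from here)"
}
check_container_files
```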
4. Route / Proxy details
If the files are present, then verify the route is mounted and the reverse proxy is pointing at the correct upstream.
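For the proxy layer, dump the config the proxy is actually running rather than the config file in the repo. This sketch assumes an nginx-based proxy in a container named edge-proxy; both are placeholders.

```shell
# Step 4 sketch: inspect the proxy's live config, not the one checked into git.
check_proxy_upstream() {
  # nginx -T dumps the full active configuration; grep for the route in question.
  docker exec edge-proxy nginx -T 2>/dev/null | grep -A 2 "location /workflow" \
    || echo "(proxy config not dumpable from here; verify which upstream /workflow hits)"
}
check_proxy_upstream
```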
5. Auth / API semantics
Only after those layers are verified does it make sense to spend time interpreting 401, 403, or 500 responses.
The value of this order is simple: it answers whether all the evidence you are looking at refers to the same deployed reality. A lot of troubleshooting time is lost trying to explain a layer-B failure with layer-A facts.
404 versus 401 is not just a different error code
One especially useful signal in this case was the endpoint transition:
- before: 404
- after rebuilding and recreating containers: 401
That does not mean "it is still broken, just with another number." It means something structurally changed.
- 404 strongly suggests something is still wrong at the route, artifact, mount, or proxy layer
- 401 means the endpoint is likely reachable now, and the next layer to inspect is authentication or permissions
- 403 suggests authentication may have succeeded but policy or authorization is still blocking access
- 5xx points more toward the app, dependencies, config, or upstream failures
So even when the error is not gone yet, a shift in error semantics can prove that troubleshooting has advanced one layer forward.
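The heuristic above is small enough to write down directly. This sketch encodes it as a lookup; the mapping tells you which layer to inspect first, not where the bug definitely is.

```shell
# Map an HTTP status to the layer most worth inspecting next (heuristic, not a rule).
next_layer_to_inspect() {
  case "$1" in
    404) echo "route / artifact / mount / proxy" ;;
    401) echo "authentication" ;;
    403) echo "authorization / policy" ;;
    5??) echo "application / dependencies / config / upstream" ;;
    *)   echo "endpoint reachable; inspect API semantics" ;;
  esac
}

# The transition seen in this incident: 404 before the rebuild, 401 after.
next_layer_to_inspect 404   # -> route / artifact / mount / proxy
next_layer_to_inspect 401   # -> authentication
```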
The illusions Docker creates
Docker environments make several false assumptions feel natural:
- we did git pull, so production must be current
- the file changed, so the image must include it
- the image was rebuilt, so the running container must be new
- the container restarted, so the service must be running the latest code
None of those is guaranteed. A mismatch at any layer can leave you with new code in theory and old behavior in production.
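Each of those illusions can be broken explicitly instead of assumed away. A sketch, with a hypothetical service name (api) and graceful fallbacks where Docker is unavailable:

```shell
# Break the illusions one by one: rebuild, force-recreate, then verify.
rebuild_and_verify() {
  # "The file changed" does not mean the image includes it: rebuild explicitly.
  docker compose build api 2>/dev/null || echo "(build skipped: docker unavailable)"
  # "The image was rebuilt" does not mean the container is new: force-recreate it.
  docker compose up -d --force-recreate api 2>/dev/null || echo "(recreate skipped)"
  # "The container restarted" does not mean it runs the latest code: confirm
  # which image the running service actually uses before trusting its behavior.
  docker compose images api 2>/dev/null || echo "(image listing unavailable)"
}
rebuild_and_verify
```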
For operators, the more important question is not merely "is the repository correct?" It is:
which live runtime is actually receiving this request path right now, and what exactly is inside that container?
That is the answer worth establishing first.
Takeaway
My default rule for this class of incident is now much clearer:
When source and production behavior disagree, suspect runtime drift. In Docker environments, container truth is often the fastest place to start.
Do not start by judging the code. Do not jump straight into application-layer explanations. First separate the layers:
- is source correct?
- is the artifact correct?
- is the container correct?
- is the route correct?
- what layer is the auth or API response actually describing?
If the order is right, these incidents are usually manageable. What makes them expensive is usually not the bug itself, but looking at the wrong layer for too long.