<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community:  kumar</title>
    <description>The latest articles on DEV Community by  kumar (@kccab5b1).</description>
    <link>https://dev.to/kccab5b1</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3875232%2F0066f43e-90e8-4904-ac02-5f63003222bc.png</url>
      <title>DEV Community:  kumar</title>
      <link>https://dev.to/kccab5b1</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kccab5b1"/>
    <language>en</language>
    <item>
      <title>Building a Self-Healing Backend with AI + Docker</title>
      <dc:creator> kumar</dc:creator>
      <pubDate>Sun, 12 Apr 2026 17:37:45 +0000</pubDate>
      <link>https://dev.to/kccab5b1/building-a-self-healing-backend-with-ai-docker-4pm4</link>
      <guid>https://dev.to/kccab5b1/building-a-self-healing-backend-with-ai-docker-4pm4</guid>
      <description>&lt;p&gt;I had this idea that kept bugging me: what if your backend could fix itself?&lt;/p&gt;

&lt;p&gt;Not in some hand-wavy "AI will handle it" way. I mean a backend that actually tails its own logs, spots a real error, figures out what's wrong in the code, patches it, rebuilds the container, and moves on. While you sleep.&lt;/p&gt;

&lt;p&gt;So I built it. Five Docker containers. One of them watches the others, and when something breaks, it calls an LLM to generate a code fix, applies it, restarts the broken service, and verifies the fix worked. No human in the loop.&lt;/p&gt;

&lt;p&gt;This post is the full breakdown of how it works, what surprised me, and where it falls apart.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;The demo is a small data pipeline. Two FastAPI services and a MongoDB instance, all running in Docker.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Service A&lt;/strong&gt; holds raw records. It doesn't validate them — just stores whatever it gets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Service B&lt;/strong&gt; is the strict one. It receives records from Service A, validates every field against business rules, and writes the good ones to a separate database. Bad records get rejected.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The AI container&lt;/strong&gt; sits alongside them. It has access to the Docker socket, can SSH into the other containers, and tails their logs in real time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There's also an init container that seeds the database on startup: 1000 records, most of them clean, but a handful intentionally malformed. Different kinds of malformed: numbers wrapped in weird JSON objects, mixed-case field names, nested data serialized as strings. The kind of stuff that happens when real systems talk to each other.&lt;/p&gt;
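&lt;p&gt;If you want to reproduce the seeding step, it's a few lines of Python. A sketch (field names and counts here are illustrative, matching the payload examples later in this post, not the project's exact schema):&lt;/p&gt;

```python
import json
import random

def make_seed_records(total=1000, seed=7):
    """Mostly-clean records plus a handful of deliberately malformed ones."""
    records = [
        {"long_value": i, "object_values": {"category": "ALPHA", "quality": "HIGH"}}
        for i in range(total - 4)
    ]
    # The same kinds of damage described above, applied on purpose:
    records.append({"long_value": {"$numberLong": "900000000000000001"},
                    "object_values": {"category": "ALPHA"}})                    # wrapped number
    records.append({"long_value": 1,
                    "object_values": {"Category": "ALPHA"}})                    # mixed-case key
    records.append({"long_value": 2,
                    "object_values": json.dumps({"category": "BETA"})})         # dict as a string
    records.append({"long_value": 3,
                    "object_values": [{"key": "category", "value": "GAMMA"}]})  # key/value list
    random.Random(seed).shuffle(records)
    return records
```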

&lt;p&gt;When Service A tries to transfer everything to Service B, those malformed records get rejected. Service A logs the rejections as errors. And that's when the AI container wakes up.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                +-----------+        +-----------+
                | Service A |-------&amp;gt;| Service B |
                | (no       |  HTTP  | (strict   |
                |  validation)       |  validation)
                +-----+-----+        +-----+-----+
                      |                     |
                      | logs errors         | rejects bad data
                      v                     v
              +----------------------------------+
              |        AI Orchestrator           |
              |                                  |
              |  1. tail logs from all services  |
              |  2. regex match on error pattern |
              |  3. build prompt with context    |
              |  4. call LLM for a code fix      |
              |  5. apply patch, rebuild, verify |
              +----------------------------------+
                      |
                      v
                +-----------+
                |  MongoDB  |
                +-----------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Log Watcher
&lt;/h2&gt;

&lt;p&gt;The core of the system is a shell script that runs inside the AI container. It streams logs from the other containers using &lt;code&gt;docker compose logs -f&lt;/code&gt; and matches every line against a configurable regex pattern.&lt;/p&gt;

&lt;p&gt;The idea is simple. Most log lines are noise — request timings, debug output, health checks. But when a line matches the error pattern (in my case, something like &lt;code&gt;transfer_remote_rejections&lt;/code&gt;), the system wakes up.&lt;/p&gt;

&lt;p&gt;Here's the stripped-down logic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Stream logs from all monitored services&lt;/span&gt;
docker compose logs &lt;span class="nt"&gt;--tail&lt;/span&gt; 0 &lt;span class="nt"&gt;-f&lt;/span&gt; service_a service_b mongodb | &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="nb"&gt;read&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; line&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;

    &lt;span class="c"&gt;# Append to a rolling buffer (keeps last N lines for context)&lt;/span&gt;
    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$line&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$BUFFER_FILE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

    &lt;span class="c"&gt;# Check if this line matches our error pattern&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$line&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-Eq&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$ERROR_REGEX&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt;

        &lt;span class="c"&gt;# Hash the line to avoid retriggering on the same error&lt;/span&gt;
        &lt;span class="nv"&gt;signature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$line&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;sha256sum&lt;/span&gt; | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'{print $1}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if &lt;/span&gt;should_trigger &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$signature&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
            &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Detected error. Triggering AI fix."&lt;/span&gt;
            run_ai_fix &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$line&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$signature&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;fi
    fi
done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things I want to highlight because they matter:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rolling buffer, not just the matched line.&lt;/strong&gt; When the AI needs to fix something, it doesn't just get the error — it gets the last 30 lines of logs for context. A rejection error alone doesn't tell you much. But 30 lines of context? Now you can see the actual payload that failed, the validation error message, the traceback.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Signature-based deduplication.&lt;/strong&gt; Without this, the same error triggers the fix loop over and over. Each matched line gets hashed, and if we've already triggered on that hash within a cooldown window (say, 3 minutes), we skip it.&lt;/p&gt;
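&lt;p&gt;In Python terms, the buffer and the dedup check together look something like this (the cooldown value and function names are mine, not the project's):&lt;/p&gt;

```python
import hashlib
import time
from collections import deque

BUFFER = deque(maxlen=30)   # rolling context window, like the shell buffer file
COOLDOWN = 180              # seconds before the same signature may retrigger
_last_fired = {}            # signature -> timestamp of the last trigger

def observe(line, pattern, now=None):
    """Buffer every line; fire only on fresh matches of the error pattern."""
    now = time.time() if now is None else now
    BUFFER.append(line)
    if pattern not in line:
        return False
    signature = hashlib.sha256(line.encode()).hexdigest()
    last = _last_fired.get(signature)
    if last is not None and now - last < COOLDOWN:
        return False        # same error, still inside the cooldown window
    _last_fired[signature] = now
    return True
```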

&lt;p&gt;&lt;strong&gt;Reconnection.&lt;/strong&gt; Log streams can drop. The outer &lt;code&gt;while true&lt;/code&gt; loop reconnects automatically with a short delay.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Fix Prompt
&lt;/h2&gt;

&lt;p&gt;When the watcher triggers, it builds a prompt and sends it to an LLM. This is the part that took the most iteration to get right.&lt;/p&gt;

&lt;p&gt;The naive version — "here's an error, fix it" — doesn't work. The model needs structure. It needs to know what the system is, what the error means, and exactly what files to look at.&lt;/p&gt;

&lt;p&gt;Here's roughly what the prompt looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are debugging a running backend service inside Docker.

Detected error pattern: transfer_remote_rejections
Matched log line: [truncated to ~1400 chars]

Recent log context (last 30 lines):
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;[...actual log output...]&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Task:
- The receiving service rejects records with unexpected payload shapes.
- Fix the validation/normalization code to handle these variants:
  - Numbers wrapped as {"$numberInt": "42"} or {"$numberLong": "999"}
  - Object keys with inconsistent casing (e.g., "Category" vs "category")
  - Nested objects serialized as JSON strings instead of dicts
  - Nested objects sent as a list of {key, value} pairs
- Only modify the receiving service's code. Preserve the API contract.
- Rebuild the container, run the transfer again, verify counts.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key decisions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tell the model exactly which file to modify.&lt;/strong&gt; Don't let it go exploring the whole repo. In my case, the fix always lives in the receiving service's main application file.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;List the variant shapes explicitly.&lt;/strong&gt; The model can't guess what "malformed" means in your context. Be specific about what the data looks like and what it should be normalized into.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Include the verification step.&lt;/strong&gt; The prompt doesn't just say "fix the code" — it says "fix the code, rebuild, re-run the transfer, check the counts." The AI needs to know when it's done.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
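&lt;p&gt;Assembling that prompt is plain string work. A sketch of the shape (the wording mirrors the template above; the 1400-character cap is the truncation mentioned earlier):&lt;/p&gt;

```python
def build_fix_prompt(error_line, context_lines, variants, max_line_chars=1400):
    """Build the structured fix prompt from the match and its log context."""
    parts = [
        "You are debugging a running backend service inside Docker.",
        "",
        "Matched log line: " + error_line[:max_line_chars],
        "",
        "Recent log context (last %d lines):" % len(context_lines),
        *context_lines,
        "",
        "Task:",
        "- Fix the validation/normalization code to handle these variants:",
    ]
    parts.extend("  - " + v for v in variants)
    parts.append("- Only modify the receiving service's code. Preserve the API contract.")
    parts.append("- Rebuild the container, run the transfer again, verify counts.")
    return "\n".join(parts)
```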




&lt;h2&gt;
  
  
  Retries and Cooldowns
&lt;/h2&gt;

&lt;p&gt;The fix doesn't always work on the first try. Sometimes the model gets it 80% right — handles three out of four variants, misses one. That's fine, because the system is built for retries.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;MAX_RETRIES&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3
&lt;span class="nv"&gt;attempt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1

&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$attempt&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-le&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$MAX_RETRIES&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Fix attempt &lt;/span&gt;&lt;span class="nv"&gt;$attempt&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="nv"&gt;$MAX_RETRIES&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

    &lt;span class="c"&gt;# Run the LLM with a timeout&lt;/span&gt;
    &lt;span class="nb"&gt;timeout &lt;/span&gt;900 run_llm_fix &amp;lt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PROMPT_FILE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="nv"&gt;exit_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$?&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$exit_code&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-eq&lt;/span&gt; 0 &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
        &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Fix succeeded on attempt &lt;/span&gt;&lt;span class="nv"&gt;$attempt&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
        &lt;span class="nb"&gt;break
    &lt;/span&gt;&lt;span class="k"&gt;fi

    &lt;/span&gt;&lt;span class="nv"&gt;attempt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;$((&lt;/span&gt;attempt &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="k"&gt;))&lt;/span&gt;
    &lt;span class="nb"&gt;sleep &lt;/span&gt;2
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The 900-second timeout (15 minutes) is generous on purpose. The model doesn't just edit a file — it also rebuilds the container, waits for it to become healthy, triggers the transfer, and checks the results. That whole cycle takes time.&lt;/p&gt;

&lt;p&gt;And the cooldown between error signatures prevents the system from going into a spin loop when something truly can't be fixed. Three strikes and it stops, leaving the error for a human to look at.&lt;/p&gt;
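&lt;p&gt;The three-strikes bookkeeping is tiny. Sketched in Python (the real project keeps this state inside the shell script; these names are mine):&lt;/p&gt;

```python
MAX_STRIKES = 3
_strikes = {}     # signature -> number of failed fix attempts
_parked = set()   # signatures handed off to a human

def record_failure(signature):
    """Count a failed fix; after three strikes, park the error for a human."""
    _strikes[signature] = _strikes.get(signature, 0) + 1
    if _strikes[signature] >= MAX_STRIKES:
        _parked.add(signature)

def eligible(signature):
    """Parked errors never retrigger the fix loop automatically."""
    return signature not in _parked
```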




&lt;h2&gt;
  
  
  The Part Nobody Talks About: SSH Into Running Containers
&lt;/h2&gt;

&lt;p&gt;Here's something I didn't see coming when I started this project. The AI container needs to actually &lt;em&gt;do things&lt;/em&gt; inside the other containers — read files, apply patches, restart processes. You can't just &lt;code&gt;docker exec&lt;/code&gt; for everything.&lt;/p&gt;

&lt;p&gt;The solution I landed on: the AI container generates an SSH keypair on startup, shares the public key through a Docker volume, and all service containers configure their SSH daemons to accept it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# docker-compose.yml (simplified)&lt;/span&gt;
&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ai_orchestrator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ssh_keys:/shared-keys&lt;/span&gt;          &lt;span class="c1"&gt;# writes the keypair here&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/var/run/docker.sock:/var/run/docker.sock&lt;/span&gt;

  &lt;span class="na"&gt;service_a&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ssh_keys:/shared-keys:ro&lt;/span&gt;       &lt;span class="c1"&gt;# reads the public key&lt;/span&gt;

  &lt;span class="na"&gt;service_b&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ssh_keys:/shared-keys:ro&lt;/span&gt;

&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ssh_keys&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each service container runs a small entrypoint script that waits for the public key to appear, copies it into &lt;code&gt;authorized_keys&lt;/code&gt;, starts the SSH daemon, and then launches the actual application.&lt;/p&gt;
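&lt;p&gt;The "wait for the public key" step is just a poll loop. The real entrypoint does it in shell; the equivalent idea in Python:&lt;/p&gt;

```python
import os
import time

def wait_for_file(path, timeout=60.0, interval=0.5):
    """Block until a file appears, or give up after the timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(interval)
    return False
```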

&lt;p&gt;This means the AI container can do things like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh service_a &lt;span class="s2"&gt;"tail -n 50 /var/log/app/service.log"&lt;/span&gt;
ssh service_b &lt;span class="s2"&gt;"cat /app/main.py"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It felt overengineered at first, but it turned out to be the cleanest way to give the AI full access without mounting every source directory as a shared volume.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Entrypoint Trick: Logs That Go Two Places
&lt;/h2&gt;

&lt;p&gt;One challenge with Docker is that you want logs to go to stdout (so &lt;code&gt;docker logs&lt;/code&gt; works) but you also want them in a file (so the AI can read them via SSH or tail them).&lt;/p&gt;

&lt;p&gt;The solution is a thin entrypoint wrapper:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/sh&lt;/span&gt;
&lt;span class="nv"&gt;LOG_FILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/var/log/app/service.log"&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;dirname&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$LOG_FILE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Run the actual command, redirect all output to the log file&lt;/span&gt;
&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$@&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$LOG_FILE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; 2&amp;gt;&amp;amp;1 &amp;amp;
&lt;span class="nv"&gt;MAIN_PID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$!&lt;/span&gt;

&lt;span class="c"&gt;# Tail the log file to stdout (so docker logs still works)&lt;/span&gt;
&lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; +1 &lt;span class="nt"&gt;-F&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$LOG_FILE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &amp;amp;
&lt;span class="nv"&gt;TAIL_PID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$!&lt;/span&gt;

&lt;span class="nb"&gt;wait&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$MAIN_PID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;kill&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$TAIL_PID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; 2&amp;gt;/dev/null
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every container uses this as its entrypoint. The actual service command gets passed as arguments. Output goes to a file &lt;em&gt;and&lt;/em&gt; to stdout. Everybody's happy.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the Fix Actually Looks Like
&lt;/h2&gt;

&lt;p&gt;For the curious — what does the AI actually change?&lt;/p&gt;

&lt;p&gt;In my demo, the receiving service has strict Pydantic validation. It expects fields like &lt;code&gt;long_value&lt;/code&gt; to be integers, &lt;code&gt;object_values&lt;/code&gt; to be a dict with specific keys, etc. But the malformed records come in with stuff like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"long_value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"$numberLong"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"900000000000000001"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"object_values"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;category&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;ALPHA&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;quality&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;HIGH&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;multiplier&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: 2}"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The AI adds a normalization layer before validation — something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;normalize_payload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Unwrap MongoDB extended JSON and normalize shapes.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;# Handle {"$numberLong": "..."} and {"$numberInt": "..."} wrappers
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;long_value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;short_value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;integer_value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;val&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;field&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;field&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$numberLong&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$numberInt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# Handle object_values as a JSON string
&lt;/span&gt;    &lt;span class="n"&gt;obj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object_values&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;obj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object_values&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt;

    &lt;span class="c1"&gt;# Handle object_values as [{key, value}, ...] list
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object_values&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;obj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object_values&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Normalize mixed-case keys
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;normalized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;
        &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object_values&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;normalized&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. A normalization function that handles four different "dirty" shapes. The AI writes this, plugs it into the ingest endpoint, rebuilds the container, and re-runs the transfer. All 1000 records pass.&lt;/p&gt;
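&lt;p&gt;The important part of the patch is the ordering: normalize first, validate second. A dependency-free sketch of that wiring (the checks here stand in for the real Pydantic model):&lt;/p&gt;

```python
import json

def normalize_payload(raw):
    """Reduced version of the normalization above, covering two fields."""
    val = raw.get("long_value")
    if isinstance(val, dict):                       # {"$numberLong": "..."} wrapper
        raw["long_value"] = int(val.get("$numberLong") or val.get("$numberInt", 0))
    obj = raw.get("object_values")
    if isinstance(obj, str):                        # dict serialized as a JSON string
        obj = json.loads(obj)
    if isinstance(obj, list):                       # [{key, value}, ...] pairs
        obj = {item["key"]: item["value"] for item in obj}
    if isinstance(obj, dict):                       # normalize mixed-case keys
        raw["object_values"] = {k.lower(): v for k, v in obj.items()}
    return raw

def ingest(record):
    """What the patched endpoint does: normalize, then validate strictly."""
    record = normalize_payload(dict(record))
    if not isinstance(record.get("long_value"), int):
        raise ValueError("long_value must be an integer")
    if not isinstance(record.get("object_values"), dict):
        raise ValueError("object_values must be a dict")
    return record
```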




&lt;h2&gt;
  
  
  The Results
&lt;/h2&gt;

&lt;p&gt;Before the AI fix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;source_total&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;           &lt;span class="m"&gt;1000&lt;/span&gt;
&lt;span class="na"&gt;transferred&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;             &lt;span class="m"&gt;996&lt;/span&gt;
&lt;span class="na"&gt;rejected&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;                  &lt;span class="m"&gt;4&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After the AI fix runs (automatically, no human):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;source_total&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;           &lt;span class="m"&gt;1000&lt;/span&gt;
&lt;span class="na"&gt;transferred&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;            &lt;span class="m"&gt;1000&lt;/span&gt;
&lt;span class="na"&gt;rejected&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;                  &lt;span class="m"&gt;0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The whole cycle — error detection, prompt construction, LLM call, code patch, container rebuild, verification — takes about 2-3 minutes depending on the model speed.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'd Do Differently
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The prompt is the product.&lt;/strong&gt; I spent way more time tuning the prompt than writing the orchestration logic. If your prompt is vague, the model will make creative decisions you don't want. Be specific about what to change, what not to touch, and how to verify success.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shell scripts are actually fine for this.&lt;/strong&gt; I started rewriting the orchestrator in Python, then stopped. The core logic is "tail logs, grep for patterns, run a command." Shell does this natively. Don't overcomplicate the glue code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SSH access was the right call.&lt;/strong&gt; I tried volume mounts first (share source code between containers). It works but gets messy fast with permissions and file locking. SSH gives you a clean interface — "read this file, write this file, run this command" — without coupling container filesystems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The circuit breaker matters more than the AI.&lt;/strong&gt; The cooldown, the retry limit, the signature dedup — that's what prevents the system from doing something stupid in a loop. The AI fix is the flashy part, but the guardrails are what make it safe to actually run.&lt;/p&gt;




&lt;h2&gt;
  
  
  When Would You Use This For Real?
&lt;/h2&gt;

&lt;p&gt;Honestly? Not in production. Not yet. But here's where it makes sense today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Staging environments&lt;/strong&gt; where you want fast iteration on integration bugs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Demo environments&lt;/strong&gt; that need to self-recover when data gets messy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data pipelines&lt;/strong&gt; where upstream systems send unpredictable payloads and you need the receiving end to adapt&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Internal tools&lt;/strong&gt; where the cost of an hour of downtime is higher than the risk of an automated fix&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The interesting thing is that the &lt;em&gt;pattern&lt;/em&gt; — tail logs, detect errors, call an LLM, apply a fix, verify — doesn't require Docker at all. You could do the same thing with systemd services, Kubernetes pods, or Lambda functions. Docker just makes it easy to prototype.&lt;/p&gt;
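&lt;p&gt;Stripped of Docker, the whole pattern fits in a dozen lines. A sketch with hypothetical hooks (read_lines, try_fix, and verify are placeholders for whatever your platform provides):&lt;/p&gt;

```python
import re

def self_heal_loop(read_lines, try_fix, verify, error_pattern, max_attempts=3):
    """Detect an error, attempt fixes, verify; give up after max_attempts."""
    pattern = re.compile(error_pattern)
    for line in read_lines():
        if not pattern.search(line):
            continue                   # most lines are noise
        for _ in range(max_attempts):
            try_fix(line)
            if verify():
                break                  # healed; go back to watching
        else:
            return line                # unfixable; hand it to a human
    return None
```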




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;The whole thing is five containers and a &lt;code&gt;docker-compose.yml&lt;/code&gt;. The stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;2x FastAPI services (Python, one stores data, one validates it)&lt;/li&gt;
&lt;li&gt;1x MongoDB&lt;/li&gt;
&lt;li&gt;1x Init container (seeds test data with intentional malformed records)&lt;/li&gt;
&lt;li&gt;1x AI orchestrator (tails logs, calls LLM, applies fixes)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You need an LLM API key and Docker. That's it.&lt;/p&gt;

&lt;p&gt;The orchestrator shell script is under 300 lines. The FastAPI services are under 400 lines each. There's no framework, no agent library, no orchestration platform. Just containers, logs, regex, a prompt, and an API call.&lt;/p&gt;




&lt;p&gt;If you've built something similar — or think this is a terrible idea — I'd genuinely like to hear about it. Drop a comment or ping me. The best feedback I've gotten on this project has been from people who tried to poke holes in it.&lt;/p&gt;

</description>
      <category>docker</category>
      <category>ai</category>
      <category>python</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
