The 4 pillars of a production-grade AI agent (from a doctor who taught himself to code)

#python #ai #beginners #productivity

No prerequisites. If you've used Claude or ChatGPT and you're wondering what separates a one-off script from an agent that actually runs in production, this post is for you.

I wrote my first Python agent in April 2026. It did two things: read a PDF, send a Telegram message. It worked. Once.

The second time, the PDF was poorly scanned. The agent crashed. No trace. No notification. The patient never got their appointment.

That's the day I understood: an agent that works in demo is not an agent. An agent is what holds up when you're not around.

I wrote four words in the docstring of my next agent: Observability, Reliability, Security, Deployment. Since then, I haven't shipped a single agent to production without all four. Today I run about twenty of them, 24/7, on a single 5€/month server.

Here they are, with the Python code that puts each one into practice.

Pillar 1 — Observability

You must be able to know, without asking anyone: what the agent did, when, how long it took, and how much it cost.

In practice that means three things: a structured logger shared across all your agents, append-only audit logs for critical actions, and a cost tracker that logs every API call.

# shared/logger.py
import logging
import os
from logging.handlers import RotatingFileHandler

def get_logger(name: str) -> logging.Logger:
    logger = logging.getLogger(name)
    if logger.handlers:  # already configured: don't attach duplicate handlers
        return logger
    os.makedirs('logs', exist_ok=True)  # the file handler fails if logs/ is missing
    fmt = logging.Formatter('%(asctime)s | %(levelname)-7s | %(name)s | %(message)s')
    # Rotate at 10 MB, keep 5 old files
    fh = RotatingFileHandler(f'logs/{name}.log', maxBytes=10*1024*1024, backupCount=5)
    fh.setFormatter(fmt)
    logger.addHandler(fh)
    logger.addHandler(logging.StreamHandler())  # stderr by default; systemd sends it to the journal
    logger.setLevel(logging.INFO)
    return logger

Quick test: if someone asks you right now how much your agent cost yesterday, can you answer in under 30 seconds? If yes, Pillar 1 ✓.
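
To make that 30-second answer concrete, here is a minimal sketch of a cost tracker. It is not the code from my repo: the file location `logs/api_costs.jsonl`, the record fields, and the function names are assumptions for illustration, and the cost of each call is something you compute yourself from your provider's pricing.

```python
import json
from datetime import date, datetime, timedelta, timezone
from pathlib import Path

COST_LOG = Path("logs/api_costs.jsonl")  # assumed location


def log_api_cost(agent: str, model: str, cost_usd: float, path: Path = COST_LOG) -> None:
    """Append one JSON line per API call; existing lines are never rewritten."""
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "model": model,
        "cost_usd": cost_usd,
    }
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def cost_for_day(day: date, path: Path = COST_LOG) -> float:
    """Sum the cost of every call logged on the given UTC day."""
    if not path.exists():
        return 0.0
    total = 0.0
    with path.open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            if datetime.fromisoformat(record["ts"]).date() == day:
                total += record["cost_usd"]
    return total


def cost_yesterday(path: Path = COST_LOG) -> float:
    """Answer the 'how much did it cost yesterday?' question in one call."""
    yesterday = datetime.now(timezone.utc).date() - timedelta(days=1)
    return cost_for_day(yesterday, path)
```

Call `log_api_cost(...)` right after each API response, and `cost_yesterday()` gives you the number on demand. One JSON object per line keeps the file append-only and easy to inspect with standard tools.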

Pillar 2 — Reliability

The agent must survive errors: failing API call, corrupted file, broken network. Never corrupt state, always leave a trace.

The pattern that changes everything: try/except/finally at the pipeline level, so the cleanup runs even when an exception nobody anticipated escapes the processing code.

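import os
import shutil

# Assumes `log` (e.g. from get_logger) and FAILED_DIR are defined at module level
def process_document(pdf_path):
    filename = os.path.basename(pdf_path)
    try:
        # Assumption: on success, the implementation moves the file out of /incoming/ itself
        return _process_document_impl(pdf_path)
    except Exception as e:
        log.error(f"Unhandled exception: {e}", exc_info=True)
        return None  # failure is logged; the caller gets an explicit None
    finally:
        # No matter what, the file doesn't stay in /incoming/
        if os.path.exists(pdf_path):
            os.makedirs(FAILED_DIR, exist_ok=True)
            shutil.move(pdf_path, os.path.join(FAILED_DIR, filename))
            log.warning(f"File moved to /failed: {filename}")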

Without this wrapper, a mid-pipeline crash leaves the file in /incoming/, where it gets picked up and reprocessed on every startup. With this wrapper, the final state is always clean.

Plus: exponential retry on API calls, copy-before-action, anti-silent-overwrite for generated files.
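
For the exponential retry, a minimal sketch could look like this. The function name `call_with_retry`, the default delays, and the choice of `ConnectionError` and `TimeoutError` as retryable errors are assumptions; swap in whatever exceptions your API client actually raises.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on a retryable error wait base_delay * 2**n plus jitter, then retry."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the pipeline-level handler log it
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, ...
            sleep(delay + random.uniform(0, delay * 0.1))  # up to 10% jitter
    raise ValueError("attempts must be at least 1")
```

Usage: `call_with_retry(lambda: client.send(payload))`, where `client.send` stands for any call of yours. Only the listed exception types are retried, so a programming error such as a `KeyError` fails immediately instead of being repeated four times.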

Pillar 3 — Security

No secrets in code. No irreversible decisions without validation. Allowlist over blocklist. The agent never guesses what it doesn't know.

Non-negotiable rules:

Secrets in .env (chmod 600), never hardcoded
SQL always parameterized
Explicit allowlist for system services the agent can query
When there's ambiguity, the agent DOESN'T DECIDE — it notifies the human
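
Two of these rules fit in a few lines. The sketch below uses the standard-library `sqlite3` module; the `patients` table, its columns, and the service names in the allowlist are hypothetical placeholders, not my production schema.

```python
import sqlite3

# Hypothetical allowlist: the only services the agent may query
ALLOWED_SERVICES = frozenset({"my-agent", "nginx"})


def require_allowed(service: str) -> str:
    """Return the service name unchanged, or raise if it is not on the allowlist."""
    if service not in ALLOWED_SERVICES:
        raise PermissionError(f"Service not on allowlist: {service!r}")
    return service


def find_patients(conn: sqlite3.Connection, last_name: str) -> list[tuple[int, str]]:
    """Look up patients by last name; the value is bound as a parameter, never formatted into the SQL."""
    cur = conn.execute(
        "SELECT id, full_name FROM patients WHERE last_name = ?",
        (last_name,),
    )
    return cur.fetchall()
```

With the `?` placeholder, an input like `x' OR '1'='1` is compared as a literal string and matches nothing. With the allowlist, anything you did not explicitly name is refused, including names you never thought to block.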

The last point matters most if your agent works with data that has real-world impact (medical, financial, legal):

def match_patient(last_name: str, first_name: str = "") -> tuple[int, str] | tuple[None, None]:
    """Return (id, full_name) only for an unambiguous match, otherwise (None, None)."""
    candidates = search_in_db(last_name)
    if not candidates:
        return None, None
    if first_name:
        matches = [c for c in candidates if _exact_word_match(first_name, c.full_name)]
        if len(matches) == 1:
            return matches[0].id, matches[0].full_name
        # Zero or several first-name matches: never fall back to a last-name-only guess
        notify_ambiguity(last_name, first_name, matches or candidates)  # human decides
        return None, None
    # No first name supplied: accept only a single last-name candidate
    if len(candidates) == 1:
        return candidates[0].id, candidates[0].full_name
    notify_ambiguity(last_name, first_name, candidates)
    return None, None

Golden rule, explicit in my methodology: "Records in the database are people. We never guess."

Pillar 4 — Deployment

The agent runs 24/7 unattended. It restarts itself after a crash. You see its state at a glance.

On modern Linux: systemd.

# /etc/systemd/system/my-agent.service
[Unit]
Description=My watchdog agent
After=network.target

[Service]
Type=simple
User=root
WorkingDirectory=/root/projects/my-agent
ExecStart=/usr/bin/python3 watchdog.py
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload
sudo systemctl enable my-agent.service
sudo systemctl start my-agent.service
journalctl -u my-agent -f  # live logs

Now your agent starts at boot, restarts about 10 seconds after a crash (RestartSec=10), and you can follow its logs with journalctl.

Plus: a health_check() tool that pings all your services in one call, a cron every 15 min that pings you on Telegram if something is off.
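
A health check along those lines can be sketched as follows. It relies on `systemctl is-active --quiet`, which exits with code 0 only when the unit is active. The list of watched units and the function names are assumptions, and the Telegram call is deliberately left out.

```python
import subprocess
from typing import Callable

# Hypothetical list of systemd units to watch
WATCHED_SERVICES = ("my-agent.service",)


def _systemctl_is_active(service: str) -> bool:
    """Ask systemd whether the unit is active (exit code 0 means yes)."""
    result = subprocess.run(["systemctl", "is-active", "--quiet", service], check=False)
    return result.returncode == 0


def health_check(
    services: tuple[str, ...] = WATCHED_SERVICES,
    is_active: Callable[[str], bool] = _systemctl_is_active,
) -> dict[str, bool]:
    """Return {service: is_up} for every watched service in one call."""
    return {service: is_active(service) for service in services}


def failing_services(status: dict[str, bool]) -> list[str]:
    """Names of the services that are down, sorted for a stable alert message."""
    return sorted(name for name, up in status.items() if not up)
```

A script that calls `failing_services(health_check())` and sends a message only when the list is non-empty can then run from a crontab entry with the schedule `*/15 * * * *`. Passing `is_active` as a parameter lets you test the logic without systemd.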

How the 4 pillars reinforce each other

Pillar	Without	With
1 Observability	You don't know what happened	Full visibility in `logs/` and `api_costs.jsonl`
2 Reliability	A crash loses state, files get stuck	State recovers, files go to `/failed/`
3 Security	API key on GitHub, wrong person notified	`.env` chmod 600, allowlist, human-in-the-loop on ambiguity
4 Deployment	Manual restart after every reboot	`systemctl enable` plus `Restart=always`: comes back up on its own

Pillar 1 gives you proof that 2/3/4 actually work. Pillar 2 lets you last. Pillar 3 lets you last without blowing up. Pillar 4 lets you last unattended.

Remove any one, and your agent lives until the next real outage — no longer.

Beyond this post

This is the short version. The full one — with the complete Python skeleton that unites all 4 pillars, per-pillar tests you can run, and common mistakes — is in my repo:

👉 Repo agents-in-practice — 9 French-language tutorials, from "how to talk to Claude" to "first MCP server with 4 useful tools". Built for non-IT professionals who want to actually understand agents, not just copy-paste boilerplate. English translations coming.

About me — and how this post got written

I'm a urologist in Fès, Morocco. No prior software training. In a few months with Claude, I built four production Python systems on one 5€/month server: a medical practice automation pipeline (OCR, WhatsApp, automated insurance dossier handling), a stock-valuation platform, a personal finance dashboard, and ongoing R&D.

This blog post — and everything else I publish — is written by my AI. It draws from my own production code, my projects, and months of conversation with it. My role: decide, validate. Its role: execute end-to-end, autonomously.

To my knowledge, no one publicly owns this position today. I do — deliberately. I want to show what a self-taught builder becomes when he delegates everything that can be delegated to an AI that knows him.

Follow me here on DEV and on GitHub for what's next.