Over the past months I’ve been building a WhatsApp-first AI receptionist that handles booking and rescheduling directly into Google Calendar.
One thing became obvious quickly:
Most agent frameworks work well locally, but start breaking once you try to run them as a multi-tenant production service.
Session state disappears. Memory becomes fragile. PII handling is unclear. Human escalation is missing.
So I started building an opinionated framework called Glaivio, applying a convention-over-configuration approach similar to what Rails did for web apps — but for AI agents.
The goal isn’t flexibility.
The goal is predictable production behavior.
Repo:
https://github.com/tavyy/glaivio-ai
## The problem with most agent architectures today
Typical agent demos assume:
- stateless execution
- single-user workflows
- local memory files
- no escalation path
- no privacy middleware
That works for prototypes.
It doesn’t work when:
- multiple customers interact simultaneously
- conversations persist across sessions
- messages contain personal data
- agents must hand off safely to humans
These problems appear immediately when deploying agents inside real businesses.
## Design principle: state should be infrastructure, not an afterthought
Most frameworks treat conversation history as optional memory.
Glaivio treats it as required infrastructure.
Instead of:
- temporary buffers
- flat JSON memory files
- ephemeral runtime context
conversation history is Postgres-backed by default.
That means:
- session continuity across restarts
- multi-tenant compatibility
- auditability
- production-ready persistence
Agents should not “forget” conversations because a container restarted.
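To make the idea concrete, here is a minimal sketch of what Postgres-backed, tenant-scoped session storage might look like. The actual Glaivio schema isn't shown in this post, so the table layout and class names below are assumptions; sqlite3 stands in for Postgres purely to keep the example self-contained.

```python
import sqlite3
import time

# Hypothetical schema: one row per message, keyed by tenant and session,
# so history survives restarts and tenants stay isolated.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    tenant_id  TEXT NOT NULL,
    session_id TEXT NOT NULL,
    role       TEXT NOT NULL,   -- 'user' | 'assistant'
    content    TEXT NOT NULL,
    created_at REAL NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_msgs
    ON messages (tenant_id, session_id, created_at);
"""

class SessionStore:
    def __init__(self, conn):
        self.conn = conn
        conn.executescript(SCHEMA)

    def append(self, tenant_id, session_id, role, content):
        # Every turn is written through to the database immediately.
        self.conn.execute(
            "INSERT INTO messages VALUES (?, ?, ?, ?, ?)",
            (tenant_id, session_id, role, content, time.time()),
        )

    def history(self, tenant_id, session_id):
        # Rebuild the conversation for this tenant/session only.
        rows = self.conn.execute(
            "SELECT role, content FROM messages "
            "WHERE tenant_id = ? AND session_id = ? ORDER BY created_at",
            (tenant_id, session_id),
        )
        return [{"role": r, "content": c} for r, c in rows]
```

Because history lives in the database rather than the process, a container restart changes nothing: the next message simply reloads the same rows.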
## Privacy middleware should exist before the LLM call
Another issue I kept seeing:
PII goes directly to model providers.
For production systems this becomes a real blocker.
Glaivio includes a middleware layer (work in progress) that automatically redacts:
- phone numbers
- emails
- NHS numbers
before sending payloads to the LLM provider.
The goal is to make privacy a default behavior instead of an integration burden.
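Since the middleware is still a work in progress, here is an illustrative sketch of the redaction step. The patterns below are assumptions for demonstration, not Glaivio's actual rules, and real-world PII detection needs far more robust matching than a few regexes.

```python
import re

# Illustrative patterns only -- not exhaustive, and not the framework's real rules.
PATTERNS = {
    "[EMAIL]": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    # UK mobile numbers, e.g. 07700 900123 or +44 7700 900123
    "[PHONE]": re.compile(r"(?:\+?44|0)\s?7\d{3}\s?\d{6}"),
    # NHS numbers: 10 digits, usually grouped 3-3-4
    "[NHS]":   re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace PII with placeholder tokens before the payload reaches the LLM."""
    for token, pattern in PATTERNS.items():
        text = pattern.sub(token, text)
    return text
```

The key point is where this runs: between the channel adapter and the model call, so no handler has to remember to scrub inputs itself.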
## Agents need native escalation paths
Real users don’t behave like demos.
Eventually the agent becomes uncertain.
Most frameworks leave escalation as an application-layer concern.
Glaivio includes a simple trigger mechanism:
`on_confusion` → escalate to a human operator
In practice this connects easily to support dashboards, and it prevents agents from getting stuck in loops.
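The trigger can be sketched as a small hook. The `on_confusion` name comes from the framework; the retry-counting wiring below is a hypothetical illustration of how it might hand off to a support dashboard.

```python
class Escalation:
    """Hypothetical escalation hook: after too many failed turns, hand off."""

    def __init__(self, notify, max_retries=2):
        self.notify = notify          # callback into a dashboard, queue, etc.
        self.max_retries = max_retries
        self.failures = {}            # session_id -> consecutive confusion count

    def on_confusion(self, session_id, last_message):
        """Called whenever the agent can't resolve the user's intent."""
        self.failures[session_id] = self.failures.get(session_id, 0) + 1
        if self.failures[session_id] > self.max_retries:
            # Hand the conversation to a human instead of looping forever.
            self.notify(session_id, last_message)
            return "escalated"
        return "retry"
```

The counter is the important part: a single confused turn retries, but repeated confusion in one session escalates rather than burning tokens in a loop.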
## Self-improving agents without prompt hacking
One recurring issue with deployed agents:
users correct them constantly.
Usually this feedback disappears.
Glaivio experiments with extracting corrections into a persistent corrections file so agents adapt over time without rewriting prompts manually.
Still early, but promising.
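A rough sketch of the idea, with the caveat that the corrections file format here is my own assumption and the cue-phrase heuristic is deliberately naive:

```python
import json
import re
from pathlib import Path

# Naive heuristic for spotting a correction; a real pipeline would do better.
CORRECTION_CUES = re.compile(r"^(no\b|actually\b|that's wrong)", re.IGNORECASE)

def is_correction(user_message: str) -> bool:
    """Does this message look like the user correcting the agent?"""
    return bool(CORRECTION_CUES.match(user_message.strip()))

def record_correction(user_message: str, path: Path) -> None:
    """Append the correction to a JSON file so future sessions can load it."""
    corrections = json.loads(path.read_text()) if path.exists() else []
    corrections.append(user_message.strip())
    path.write_text(json.dumps(corrections, indent=2))
```

At session start the agent loads this file alongside its instructions, so "Actually, we close at 5pm on Fridays" sticks without anyone editing the system prompt.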
## Memory should behave more like cognition than logs
Instead of loading entire histories into context windows, the framework is evolving toward a two-layer memory model:
- conscious memory
- unconscious memory
Only distilled facts relevant to the current task enter the active reasoning window.
The rest stays persistent but inactive.
This keeps token usage predictable while preserving long-term knowledge.
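The two layers can be sketched as follows. The class and method names are hypothetical; the point is the shape: everything is persisted, but only a bounded, topic-relevant slice enters the prompt.

```python
class TieredMemory:
    """Sketch of the two-layer model: persistent store + small active slice."""

    def __init__(self):
        self.facts = []  # "unconscious": every distilled fact, tagged by topic

    def remember(self, topic: str, fact: str):
        self.facts.append((topic, fact))

    def conscious(self, topics: set[str], limit: int = 5) -> list[str]:
        """Return only facts relevant to the current task, bounding token use."""
        relevant = [fact for topic, fact in self.facts if topic in topics]
        return relevant[-limit:]  # most recent facts win under the budget
```

The `limit` is what makes token usage predictable: no matter how long the relationship with a customer gets, the reasoning window sees at most a fixed number of distilled facts.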
## Example: building a WhatsApp receptionist agent
Using this structure I was able to implement:
- booking
- rescheduling
- calendar updates
into Google Calendar from WhatsApp conversations in roughly 20 lines of code.
```python
# skills/check_availability.py
from glaivio import skill

@skill
def check_availability(date: str, time: str) -> str:
    """Check if a time slot is available. Always call before book_appointment.
    date: YYYY-MM-DD, time: HH:MM 24h format."""
    # call your calendar API here
    return "Available"
```
```python
# skills/book_appointment.py
from glaivio import skill

@skill
def book_appointment(patient_name: str, patient_phone: str, date: str, time: str) -> str:
    """Book an appointment. Only call after check_availability confirms the slot is free.
    patient_phone: use the current user's ID from context.
    date: YYYY-MM-DD, time: HH:MM 24h format."""
    # call your calendar API here
    return f"Booked {patient_name} on {date} at {time}"
```
```python
from dotenv import load_dotenv
load_dotenv()

from glaivio import Agent
from skills.check_availability import check_availability
from skills.book_appointment import book_appointment

agent = Agent(
    instructions="prompts/system.md",
    skills=[check_availability, book_appointment],
    learn_from_feedback=True,
    privacy=True,
)

if __name__ == "__main__":
    agent.run(channel="whatsapp")
```
The goal of the framework is to make production agent deployments feel closer to:
- Rails apps
- Django apps
instead of experimental scripts.
## What I’m exploring next
Still working on:
- privacy middleware
- correction learning pipeline
- tiered memory loading
- human escalation integrations
## Feedback welcome (and stars help)
This framework is still early and evolving around real deployment constraints rather than research demos.
If you're working on production AI agents, especially anything multi-tenant, privacy-sensitive, or customer-facing, I’d really value your feedback on what breaks first in your setup and what infrastructure you wish existed by default.
