Over the past months I’ve been building a WhatsApp-first AI receptionist that handles booking and rescheduling directly into Google Calendar.
One thing became obvious quickly:
Most agent frameworks work well locally, but start breaking once you try to run them as a multi-tenant production service.
Session state disappears. Memory becomes fragile. PII handling is unclear. Human escalation is missing.
So I started building an opinionated framework called Glaivio, applying a convention-over-configuration approach similar to what Rails did for web apps — but for AI agents.
The goal isn’t flexibility.
The goal is predictable production behavior.
Repo:
https://github.com/tavyy/glaivio-ai
## The problem with most agent architectures today
Typical agent demos assume:
- stateless execution
- single-user workflows
- local memory files
- no escalation path
- no privacy middleware
That works for prototypes.
It doesn’t work when:
- multiple customers interact simultaneously
- conversations persist across sessions
- messages contain personal data
- agents must hand off safely to humans
These problems appear immediately when deploying agents inside real businesses.
## Design principle: state should be infrastructure, not an afterthought
Most frameworks treat conversation history as optional memory.
Glaivio treats it as required infrastructure.
Instead of:
- temporary buffers
- flat JSON memory files
- ephemeral runtime context
conversation history is Postgres-backed by default.
That means:
- session continuity across restarts
- multi-tenant compatibility
- auditability
- production-ready persistence
Agents should not “forget” conversations because a container restarted.
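To make the idea concrete, here is a minimal sketch of what Postgres-backed, tenant-scoped session storage might look like. The actual Glaivio schema isn't shown in this post, so the table layout and class names below are assumptions; sqlite3 stands in for Postgres purely to keep the example self-contained.

```python
import sqlite3
import time

# Hypothetical schema: one row per message, keyed by tenant and session,
# so history survives restarts and tenants stay isolated.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    tenant_id  TEXT NOT NULL,
    session_id TEXT NOT NULL,
    role       TEXT NOT NULL,   -- 'user' | 'assistant'
    content    TEXT NOT NULL,
    created_at REAL NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_msgs
    ON messages (tenant_id, session_id, created_at);
"""

class SessionStore:
    def __init__(self, conn):
        self.conn = conn
        conn.executescript(SCHEMA)

    def append(self, tenant_id, session_id, role, content):
        # Every turn is written through to the database immediately.
        self.conn.execute(
            "INSERT INTO messages VALUES (?, ?, ?, ?, ?)",
            (tenant_id, session_id, role, content, time.time()),
        )

    def history(self, tenant_id, session_id):
        # Rebuild the conversation for this tenant/session only.
        rows = self.conn.execute(
            "SELECT role, content FROM messages "
            "WHERE tenant_id = ? AND session_id = ? ORDER BY created_at",
            (tenant_id, session_id),
        )
        return [{"role": r, "content": c} for r, c in rows]
```

Because history lives in the database rather than the process, a container restart changes nothing: the next message simply reloads the same rows.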
## Privacy middleware should exist before the LLM call
Another issue I kept seeing:
PII goes directly to model providers.
For production systems this becomes a real blocker.
Glaivio includes a middleware layer (work in progress) that automatically redacts:
- phone numbers
- emails
- NHS numbers
before sending payloads to the LLM provider.
The goal is to make privacy a default behavior instead of an integration burden.
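Since the middleware is still a work in progress, here is an illustrative sketch of the redaction step. The patterns below are assumptions for demonstration, not Glaivio's actual rules, and real-world PII detection needs far more robust matching than a few regexes.

```python
import re

# Illustrative patterns only -- not exhaustive, and not the framework's real rules.
PATTERNS = {
    "[EMAIL]": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    # UK mobile numbers, e.g. 07700 900123 or +44 7700 900123
    "[PHONE]": re.compile(r"(?:\+?44|0)\s?7\d{3}\s?\d{6}"),
    # NHS numbers: 10 digits, usually grouped 3-3-4
    "[NHS]":   re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace PII with placeholder tokens before the payload reaches the LLM."""
    for token, pattern in PATTERNS.items():
        text = pattern.sub(token, text)
    return text
```

The key point is where this runs: between the channel adapter and the model call, so no handler has to remember to scrub inputs itself.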
## Agents need native escalation paths
Real users don’t behave like demos.
Eventually the agent becomes uncertain.
Most frameworks leave escalation as an application-layer concern.
Glaivio includes a simple trigger mechanism:
`on_confusion` → escalate to a human operator
In practice this connects easily to support dashboards, and it prevents agents from getting stuck in loops.
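The trigger can be sketched as a small hook. The `on_confusion` name comes from the framework; the retry-counting wiring below is a hypothetical illustration of how it might hand off to a support dashboard.

```python
class Escalation:
    """Hypothetical escalation hook: after too many failed turns, hand off."""

    def __init__(self, notify, max_retries=2):
        self.notify = notify          # callback into a dashboard, queue, etc.
        self.max_retries = max_retries
        self.failures = {}            # session_id -> consecutive confusion count

    def on_confusion(self, session_id, last_message):
        """Called whenever the agent can't resolve the user's intent."""
        self.failures[session_id] = self.failures.get(session_id, 0) + 1
        if self.failures[session_id] > self.max_retries:
            # Hand the conversation to a human instead of looping forever.
            self.notify(session_id, last_message)
            return "escalated"
        return "retry"
```

The counter is the important part: a single confused turn retries, but repeated confusion in one session escalates rather than burning tokens in a loop.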
## Self-improving agents without prompt hacking
One recurring issue with deployed agents:
users correct them constantly.
Usually this feedback disappears.
Glaivio experiments with extracting corrections into a persistent corrections file so agents adapt over time without rewriting prompts manually.
Still early, but promising.
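A rough sketch of the idea, with the caveat that the corrections file format here is my own assumption and the cue-phrase heuristic is deliberately naive:

```python
import json
import re
from pathlib import Path

# Naive heuristic for spotting a correction; a real pipeline would do better.
CORRECTION_CUES = re.compile(r"^(no\b|actually\b|that's wrong)", re.IGNORECASE)

def is_correction(user_message: str) -> bool:
    """Does this message look like the user correcting the agent?"""
    return bool(CORRECTION_CUES.match(user_message.strip()))

def record_correction(user_message: str, path: Path) -> None:
    """Append the correction to a JSON file so future sessions can load it."""
    corrections = json.loads(path.read_text()) if path.exists() else []
    corrections.append(user_message.strip())
    path.write_text(json.dumps(corrections, indent=2))
```

At session start the agent loads this file alongside its instructions, so "Actually, we close at 5pm on Fridays" sticks without anyone editing the system prompt.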
## Memory should behave more like cognition than logs
Instead of loading entire histories into context windows, the framework is evolving toward a two-layer memory model:
- conscious memory
- unconscious memory
Only distilled facts relevant to the current task enter the active reasoning window.
The rest stays persistent but inactive.
This keeps token usage predictable while preserving long-term knowledge.
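The two layers can be sketched as follows. The class and method names are hypothetical; the point is the shape: everything is persisted, but only a bounded, topic-relevant slice enters the prompt.

```python
class TieredMemory:
    """Sketch of the two-layer model: persistent store + small active slice."""

    def __init__(self):
        self.facts = []  # "unconscious": every distilled fact, tagged by topic

    def remember(self, topic: str, fact: str):
        self.facts.append((topic, fact))

    def conscious(self, topics: set[str], limit: int = 5) -> list[str]:
        """Return only facts relevant to the current task, bounding token use."""
        relevant = [fact for topic, fact in self.facts if topic in topics]
        return relevant[-limit:]  # most recent facts win under the budget
```

The `limit` is what makes token usage predictable: no matter how long the relationship with a customer gets, the reasoning window sees at most a fixed number of distilled facts.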
## Example: building a WhatsApp receptionist agent
Using this structure I was able to implement:
- booking
- rescheduling
- calendar updates
into Google Calendar from WhatsApp conversations in roughly 20 lines of code.
```python
# skills/check_availability.py
from glaivio import skill

@skill
def check_availability(date: str, time: str) -> str:
    """Check if a time slot is available. Always call before book_appointment.
    date: YYYY-MM-DD, time: HH:MM 24h format."""
    # call your calendar API here
    return "Available"
```
```python
# skills/book_appointment.py
from glaivio import skill

@skill
def book_appointment(patient_name: str, patient_phone: str, date: str, time: str) -> str:
    """Book an appointment. Only call after check_availability confirms the slot is free.
    patient_phone: use the current user's ID from context.
    date: YYYY-MM-DD, time: HH:MM 24h format."""
    # call your calendar API here
    return f"Booked {patient_name} on {date} at {time}"
```
```python
from dotenv import load_dotenv
load_dotenv()

from glaivio import Agent
from skills.check_availability import check_availability
from skills.book_appointment import book_appointment

agent = Agent(
    instructions="prompts/system.md",
    skills=[check_availability, book_appointment],
    learn_from_feedback=True,
    privacy=True,
)

if __name__ == "__main__":
    agent.run(channel="whatsapp")
```
The goal of the framework is to make production agent deployments feel closer to:
- Rails apps
- Django apps
instead of experimental scripts.
## What I’m exploring next
Still working on:
- privacy middleware
- correction learning pipeline
- tiered memory loading
- human escalation integrations
## Feedback welcome (and stars help)
This framework is still early and evolving around real deployment constraints rather than research demos.
If you're working on production AI agents, especially anything multi-tenant, privacy-sensitive, or customer-facing, I’d really value your feedback on what breaks first in your setup and what infrastructure you wish existed by default.
