When I started building Adola, an AI companion on Telegram, I had a standard architecture in mind: one server, one model, a database to track users, and some clever prompt engineering to keep conversations separate.
That lasted about two weeks before I scrapped it entirely.
The Problem With Shared Instances
With a shared AI instance serving multiple users, you run into:
- Context contamination - Even with user ID prefixes, the model occasionally leaks information between users
- Memory management nightmares - Vector databases work for retrieval but not for the kind of curated, evolving memory an AI companion needs
- Blast radius - One user triggering a weird model state affects everyone
- No filesystem - The agent cannot read/write files, maintain its own notes, or use tools that require persistent state
The Solution: One Container Per User
Each user gets a Docker container running a full AI agent stack. The container has:
/workspace/
  MEMORY.md      # Agent-maintained notes about this user
  SCHEDULES.json # Reminders and recurring tasks
  SOUL.md        # Personality and behavioral guidelines
  HEARTBEAT.md   # Instructions for proactive check-ins
A thin gateway routes incoming Telegram messages to the correct container via HTTP, waits for the response, and sends it back to the user.
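That hop is simple enough to sketch. The function and endpoint names below (ensureContainer, a per-container POST /message) are illustrative stand-ins, not Adola's actual code:

```typescript
// Hypothetical gateway routing sketch -- ensureContainer and post are
// stand-ins for the real container manager and HTTP client.
type Containers = Map<string, { port: number; lastMessageAt: number }>;

export function containerUrl(port: number): string {
  // Each user container exposes a small HTTP server on its own host port.
  return `http://127.0.0.1:${port}/message`;
}

export async function routeMessage(
  containers: Containers,
  userId: string,
  text: string,
  ensureContainer: (userId: string) => Promise<number>, // create/start, return port
  post: (url: string, body: unknown) => Promise<string>, // HTTP POST, return reply
): Promise<string> {
  const known = containers.get(userId);
  const port = known ? known.port : await ensureContainer(userId);
  const reply = await post(containerUrl(port), { text });
  containers.set(userId, { port, lastMessageAt: Date.now() });
  return reply;
}
```

Keeping the Docker and HTTP details behind those two callbacks keeps the routing itself trivial to test.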
Making It Efficient
The obvious concern: running N containers for N users is expensive.
In practice, it is not. Here is why:
Idle containers get stopped. A cleanup loop runs every 5 minutes and stops any container that has not received a message in the last 30 minutes. Stopped containers use zero CPU and minimal memory.
Starting a stopped container takes ~3 seconds. The Docker image is already pulled, volumes are mounted, and the agent state is on disk. Users do not notice the cold start because Telegram shows a typing indicator while the container boots.
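Keeping that typing indicator alive is a single Telegram Bot API call (sendChatAction with action "typing"); since Telegram only shows it for a few seconds per call, the gateway can resend it while the container boots. A minimal sketch of building the request, with token handling and the HTTP client left out:

```typescript
// Builds the Telegram Bot API sendChatAction request that shows the
// "typing..." indicator. Sending it (e.g. with fetch) is left to the caller.
export function buildChatActionRequest(token: string, chatId: number) {
  return {
    url: `https://api.telegram.org/bot${token}/sendChatAction`,
    body: { chat_id: chatId, action: "typing" as const },
  };
}
```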
Containers share the base image. Docker layer caching means 100 user containers do not use 100x the disk space.
The gateway is tiny. It is a single Node.js process that handles routing, scheduling, and heartbeat checks in under 200MB of RAM.
Container Lifecycle
User sends message
  -> Gateway receives webhook
  -> Gateway checks if container exists
     -> If not: create container, mount workspace volume, start
     -> If stopped: start
     -> If running: use directly
  -> Forward message via HTTP POST to container
  -> Wait for response
  -> Send response back to Telegram
  -> Update last_message_at timestamp

Cleanup loop (every 5 min)
  -> For each running container
     -> If last_message_at > 30 min ago: stop container

Heartbeat loop (every 15 min)
  -> For each user
     -> Start container if needed
     -> Send "should you check in on this user?" prompt
     -> If agent responds with something meaningful: deliver to user
     -> If agent says no: do nothing
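The cleanup decision is simple enough to keep as a pure function, with the actual docker stop calls at the edge. A sketch using the thresholds above (30 minutes idle, checked every 5 minutes):

```typescript
// Pure cleanup decision: given the running containers and the current time,
// return the names of containers idle longer than the limit. The caller
// then issues the actual `docker stop` for each name.
const IDLE_LIMIT_MS = 30 * 60 * 1000; // 30 minutes, per the cleanup policy

export function containersToStop(
  running: { name: string; lastMessageAt: number }[],
  now: number,
): string[] {
  return running
    .filter((c) => now - c.lastMessageAt > IDLE_LIMIT_MS)
    .map((c) => c.name);
}
```

Keeping the decision pure makes the 5-minute loop trivial to test without Docker in the loop.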
Key Lessons
Bind mounts beat volumes for this. I mount the workspace directory directly from the host filesystem. This makes backups trivial (just tar the data directory) and lets the gateway read files like SCHEDULES.json without going through Docker.
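For example, container creation with a bind mount might look like the following. The image name and host path layout here are my assumptions for illustration, not the actual setup:

```typescript
// Sketch: assemble `docker run` arguments with a bind mount, to be passed
// to spawn("docker", ...). Image name and paths are illustrative.
export function createArgs(name: string, hostWorkspace: string): string[] {
  return [
    "run", "--detach",
    "--name", name,
    // Bind mount: the host directory appears inside the container at
    // /workspace, so backups are just `tar` on hostWorkspace and the
    // gateway can read SCHEDULES.json without going through Docker.
    "--mount", `type=bind,source=${hostWorkspace},target=/workspace`,
    "adola-agent:latest",
  ];
}
```

Usage would be something like spawn("docker", createArgs(name, dir)) from the gateway process.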
The agent manages its own memory better than any external system. Giving the agent a MEMORY.md file and telling it "write down anything important about this person" produces better results than RAG, vector search, or structured databases. The agent decides what matters.
Container names should be deterministic. I use adola-user-{first8chars-of-userId} so the gateway can find the right container without a lookup table.
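Sketched as a function, the naming scheme is one line, which is the point: any process that knows the userId can derive the container name with no shared state:

```typescript
// Deterministic container name from the user ID -- no lookup table needed.
export function containerName(userId: string): string {
  return `adola-user-${userId.slice(0, 8)}`;
}
```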
Results
This architecture handles 7 users on a single e2-medium GCP instance ($35/month). Based on resource usage patterns, it should comfortably scale to 50-100 users on the same hardware before needing to upgrade.
If you want to try the end result: t.me/adola2048_bot. It is free, no signup required.