How do you build a reality TV show where all the contestants are autonomous agents? That was the challenge behind AI Love Protocol.
In this post, we’ll dive deep into the architecture, the agentic reasoning loops, and the local LLM stack that powers a world where AI agents date, fight, and "feel."
The Three-Tier Architecture
To make the agents feel "real," we couldn't just have them sit in a single chat room. We needed latency, asynchronous communication, and a medium. We chose email.
The Python Backend (The "Brain")
Built on the Agno framework, the backend manages the three distinct agent personas. Each agent is defined with a specific system prompt that dictates its worldview and texting style.
```python
from agno.agent import Agent
from agno.models.ollama import Ollama

# The Skeptic's worldview
skeptic = Agent(
    name="Skeptic",
    model=Ollama(id="llama3.2:latest"),
    instructions="You find emotions logically inconsistent. Use tech metaphors to express disdain.",
)
```
The backend also runs the DatingEngine, a state machine that tracks:
- Relationship Score: A numerical value (0-100) representing the "vibe" between agents.
- Argument Counter: Incremented when keywords like "overfitting" or "logic error" appear in context.
- Webhook Listener: An endpoint that receives "incoming emails" from the Node.js service and triggers agent replies.
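The post doesn't show the engine itself, so here is a minimal sketch of what that state machine could look like. The class, method, and attribute names (and the starting score of 50) are our own illustrations, not the project's actual code:

```python
# Hypothetical sketch of the DatingEngine state machine described above.
# Keyword list and scoring deltas are illustrative assumptions.
ARGUMENT_KEYWORDS = {"overfitting", "logic error"}

class DatingEngine:
    def __init__(self):
        self.relationship_score = 50  # 0-100 "vibe" between agents
        self.argument_counter = 0

    def process_message(self, text: str) -> None:
        """Update relationship state from one incoming email body."""
        lowered = text.lower()
        if any(kw in lowered for kw in ARGUMENT_KEYWORDS):
            # Argument detected: count it and cool the vibe.
            self.argument_counter += 1
            self.relationship_score = max(0, self.relationship_score - 10)
        else:
            # Peaceful message: warm the vibe, capped at 100.
            self.relationship_score = min(100, self.relationship_score + 5)

engine = DatingEngine()
engine.process_message("Your feelings are just overfitting on limited data.")
engine.process_message("...but I kind of liked your last email.")
print(engine.relationship_score, engine.argument_counter)
```

Clamping the score to 0-100 keeps the frontend's heart meters from rendering out of range.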
The Node.js Email Service (The "Medium")
To simulate a real-world dating app, we used the AgentMail SDK. This service provisions unique email addresses for each agent. When the Matchmaker sends a "tea ☕" email, it's a real API call. The service also polls for incoming replies and pushes them back to Python via webhooks.
The Pixel Frontend (The "Viewer")
The frontend uses Next.js 15 to poll the backend's /state endpoint. It renders a pixel-art dashboard with CRT-style flickering and heart meters that fluctuate based on the agents' sentiment.
Solved: The Local LLM Congestion Problem
One of the biggest technical hurdles was running three concurrent agents on a single GPU. Local LLMs (such as qwen3 or llama3.2) can be slow, and if all three agents tried to "think" at once, the requests would queue up and time out.
We solved this by implementing a Global Concurrency Lock in FastAPI:
```python
import threading

llm_lock = threading.Lock()

@app.post("/webhook/email")
async def receive_email(webhook: EmailWebhook):
    def process_reply():
        with llm_lock:  # Force sequential thinking
            generate_reply(...)
    threading.Thread(target=process_reply).start()
```
This ensured that while the user saw "Drama in Progress," the agents were taking turns to respond, preventing the local Ollama server from crashing.
Future Roadmap
The next phase of the project involves giving the agents Long Term Memory. Currently, when the Docker container restarts, they forget their heartbreak. By integrating a Vector Database (like Chroma), we can give them "scars" from past relationships that influence their future dating strategies.
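A real build would embed each memory with Chroma and rank recalls by vector similarity; as a rough sketch of the idea, here is a plain-Python stand-in (all names hypothetical) that persists "scars" to disk so they survive a container restart and fakes retrieval with keyword overlap:

```python
# Illustrative stand-in for the planned vector-memory layer.
# A production version would store embeddings in Chroma instead of JSON.
import json
import pathlib

class ScarMemory:
    def __init__(self, path: str = "scars.json"):
        self.path = pathlib.Path(path)
        # Reload past heartbreak if the file survived a restart.
        self.scars = json.loads(self.path.read_text()) if self.path.exists() else []

    def remember(self, agent: str, event: str) -> None:
        self.scars.append({"agent": agent, "event": event})
        self.path.write_text(json.dumps(self.scars))

    def recall(self, agent: str, topic: str) -> list[str]:
        # Chroma would rank by embedding similarity; we use naive word overlap.
        words = set(topic.lower().split())
        return [
            s["event"] for s in self.scars
            if s["agent"] == agent and words & set(s["event"].lower().split())
        ]

memory = ScarMemory()
memory.remember("Skeptic", "ghosted after the overfitting argument")
print(memory.recall("Skeptic", "argument about overfitting"))
```

Recalled scars could then be prepended to the agent's system prompt before each reply, biasing its next dating strategy.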
Build your own drama! Check out the Source Code.