Building a persistent AI business assistant with LangChain, FastAPI, and Redis

#architecture #rag #ai #agents

TL;DR: I built a personal AI assistant that actually knows my business — using a LangChain agent, dual-layer memory (Redis + pgvector), and a model router that switches between GPT-4o and Claude 3.5 by task type. Here's the full architecture.

The architecture

The system has three layers:

Frontend — Next.js 14, WebSocket streaming for real-time responses
Agent layer — FastAPI + LangChain AgentExecutor with four tools (email, CRM, tasks, calendar)
Memory layer — Redis for session state, Supabase pgvector for long-term RAG

The memory problem

Most LLM demos are stateless: each request hits the API cold, with no memory of earlier conversations. Jarvis (my name for the assistant) solves this with a hybrid retriever: BM25 keyword search for exact names and dates, plus semantic cosine-similarity search for concepts. A cross-encoder re-ranker then trims the results to the top 5 chunks before they are injected into the prompt.
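
To make the retrieval flow concrete, here is a minimal, dependency-free sketch of it. It is not the code from the project. The post does not say how the keyword and semantic results are merged, so reciprocal rank fusion is used here as an assumption. The `rerank` callable stands in for the cross-encoder, and every function name below is hypothetical.

```python
import math
from collections import Counter
from typing import Callable


def tokenize(text: str) -> list[str]:
    # Naive whitespace tokenizer; a real system would normalise punctuation too.
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every document against the query."""
    tokenized = [tokenize(d) for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(scores: list[float]) -> list[int]:
    """Document indices ordered best-first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def hybrid_search(
    query: str,
    query_vec: list[float],
    docs: list[str],
    doc_vecs: list[list[float]],
    rerank: Callable[[str, str], float],  # stand-in for a cross-encoder
    top_k: int = 5,
    rrf_k: int = 60,
) -> list[str]:
    if not docs:
        return []
    keyword_rank = rank(bm25_scores(query, docs))
    semantic_rank = rank([cosine(query_vec, v) for v in doc_vecs])

    # Assumption: merge the two rankings with reciprocal rank fusion.
    fused = [0.0] * len(docs)
    for ranking in (keyword_rank, semantic_rank):
        for position, doc_id in enumerate(ranking):
            fused[doc_id] += 1.0 / (rrf_k + position + 1)

    # Re-rank a slightly larger candidate pool, then keep the top_k chunks.
    candidates = rank(fused)[: top_k * 2]
    candidates.sort(key=lambda i: rerank(query, docs[i]), reverse=True)
    return [docs[i] for i in candidates[:top_k]]
```

In a real deployment the vectors would come from an embedding model and pgvector, and `rerank` would call a cross-encoder. The point of the sketch is the shape of the pipeline: two rankings, one fusion step, then a re-rank that caps what reaches the prompt.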

The model router

Not all tasks need the same model. I route tool-use tasks (CRM lookups, scheduling, email sends) to GPT-4o function calling, and writing/reasoning tasks to Claude 3.5 Sonnet. This cuts costs and improves output quality vs. using one model for everything.
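
As a rough illustration of the routing idea, the sketch below maps a task type to a model. How the project actually classifies a request is not described in this post, so the keyword check is purely an assumption, and the model strings are labels rather than verified API model IDs. All names here are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class TaskType(Enum):
    TOOL_USE = "tool_use"
    WRITING_OR_REASONING = "writing_or_reasoning"


@dataclass(frozen=True)
class ModelChoice:
    provider: str
    model: str  # label only; check the provider docs for the exact model ID


ROUTES = {
    TaskType.TOOL_USE: ModelChoice("openai", "gpt-4o"),
    TaskType.WRITING_OR_REASONING: ModelChoice("anthropic", "claude-3-5-sonnet"),
}

# Assumption: a crude keyword heuristic stands in for the real classifier.
TOOL_KEYWORDS = ("crm", "schedule", "calendar", "send email", "look up")


def classify(request: str) -> TaskType:
    text = request.lower()
    if any(keyword in text for keyword in TOOL_KEYWORDS):
        return TaskType.TOOL_USE
    return TaskType.WRITING_OR_REASONING


def route(request: str) -> ModelChoice:
    """Pick the model for a request based on its task type."""
    return ROUTES[classify(request)]
```

Keeping the routing table separate from the classifier means the heuristic can later be swapped for a small classification model without touching the call sites.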

Code snippet — tool registration in LangChain:

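from langchain.agents import AgentType, initialize_agent
from langchain_openai import ChatOpenAI

# The four tool classes are project-specific, not LangChain built-ins;
# supabase, sendgrid and redis are clients configured elsewhere.
# Assumed here: tool-use runs on GPT-4o, as described in the router section.
llm = ChatOpenAI(model="gpt-4o")
tools = [
    CRMQueryTool(db=supabase),
    EmailDraftTool(client=sendgrid),
    TaskManagerTool(db=redis),
    CalendarReaderTool(),
]
agent = initialize_agent(tools, llm, agent=AgentType.OPENAI_FUNCTIONS)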

Key learnings

Context injection strategy matters more than model choice
Redis TTL for session memory should match your average session length (I use 2h)
Always stream responses — users abandon non-streaming AI UIs within 3 seconds
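
The Redis TTL point can be sketched like this. It is a minimal example under assumptions, not the project's code: the key layout is hypothetical, and refreshing the TTL on every write (a sliding expiry) is one possible policy. The functions only rely on the `rpush`, `expire`, and `lrange` commands, which a redis-py client provides.

```python
import json
from typing import Any

SESSION_TTL_SECONDS = 2 * 60 * 60  # 2 hours, the value quoted above


def session_key(session_id: str) -> str:
    # Hypothetical key layout: one Redis list per session.
    return f"session:{session_id}:messages"


def append_message(client: Any, session_id: str, role: str, content: str) -> None:
    """Append a message and push the expiry out again (sliding TTL)."""
    key = session_key(session_id)
    client.rpush(key, json.dumps({"role": role, "content": content}))
    client.expire(key, SESSION_TTL_SECONDS)


def load_history(client: Any, session_id: str) -> list[dict]:
    """Return the session's messages, oldest first; empty once the key expires."""
    raw = client.lrange(session_key(session_id), 0, -1)
    return [json.loads(item) for item in raw]
```

With redis-py this would be called with something like `client = redis.Redis(decode_responses=True)`. Once a session has been idle for longer than the TTL, Redis drops the key and the next request starts from long-term memory only.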

Full repo coming soon. Follow for updates.
