TL;DR: I built a personal AI assistant that actually knows my business, using a LangChain agent, dual-layer memory (Redis + pgvector), and a model router that switches between GPT-4o and Claude 3.5 Sonnet by task type. Here's the full architecture.
The architecture
The system has three layers:
- Frontend — Next.js 14, WebSocket streaming for real-time responses
- Agent layer — FastAPI + LangChain AgentExecutor with four tools (email, CRM, tasks, calendar)
- Memory layer — Redis for session state, Supabase pgvector for long-term RAG
The memory problem
Most LLM demos are stateless: each request hits the API cold, with no memory of earlier conversations. Jarvis, the assistant, solves this with a hybrid retriever that combines BM25 keyword search (for exact names and dates) with cosine-similarity search over embeddings (for concepts). A cross-encoder re-ranker then trims the merged results to the top 5 chunks before they are injected into the prompt.
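Here's a minimal, dependency-free sketch of that retrieval flow, not the actual Jarvis code. Two things are assumptions: the post doesn't say how the keyword and semantic results are merged, so this uses reciprocal rank fusion, and the `rerank` callable is a hypothetical stand-in for a real cross-encoder. All function names are illustrative.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive whitespace tokenizer; a real system would do more."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    avg_len = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many docs does each term appear?
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rrf(rankings, k=60):
    """Reciprocal rank fusion (assumed merge step): combine ranked index lists."""
    fused = Counter()
    for ranking in rankings:
        for rank, idx in enumerate(ranking):
            fused[idx] += 1.0 / (k + rank + 1)
    return [idx for idx, _ in fused.most_common()]


def hybrid_retrieve(query, query_vec, docs, doc_vecs, rerank, top_k=5):
    """Keyword + semantic retrieval, fused, then re-ranked and trimmed to top_k."""
    kw = bm25_scores(query, docs)
    sem = [cosine(query_vec, v) for v in doc_vecs]
    by_kw = sorted(range(len(docs)), key=lambda i: -kw[i])
    by_sem = sorted(range(len(docs)), key=lambda i: -sem[i])
    candidates = rrf([by_kw, by_sem])
    # `rerank(query, doc) -> float` is a hypothetical cross-encoder stand-in.
    reranked = sorted(candidates, key=lambda i: -rerank(query, docs[i]))
    return [docs[i] for i in reranked[:top_k]]
```

In a real deployment the vectors would come from an embedding model and live in pgvector, and `rerank` would call an actual cross-encoder; the shape of the pipeline stays the same.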
The model router
Not all tasks need the same model. I route tool-use tasks (CRM lookups, scheduling, email sends) to GPT-4o function calling, and writing/reasoning tasks to Claude 3.5 Sonnet. This cuts costs and improves output quality vs. using one model for everything.
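A router like this can be very small. The sketch below assumes the task type has already been classified upstream (the post doesn't say how), and the task labels, the `Route` type, the fallback choice, and the model ID strings are all placeholders to swap for whatever your providers expect.

```python
from dataclasses import dataclass

# Task categories follow the split described above; the label strings are assumptions.
TOOL_TASKS = {"crm_lookup", "scheduling", "email_send"}
TEXT_TASKS = {"writing", "reasoning"}


@dataclass(frozen=True)
class Route:
    provider: str
    model: str  # placeholder ID; use the exact model name from your provider


ROUTES = {
    "tool": Route(provider="openai", model="gpt-4o"),
    "text": Route(provider="anthropic", model="claude-3-5-sonnet"),
}


def route(task_type: str) -> Route:
    """Pick a model route for an already-classified task type."""
    if task_type in TOOL_TASKS:
        return ROUTES["tool"]
    if task_type in TEXT_TASKS:
        return ROUTES["text"]
    # Unknown task types fall back to the tool-calling model (an assumption).
    return ROUTES["tool"]
```

Keeping the mapping in plain data makes it easy to change which model handles which task without touching the agent code.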
Code snippet — tool registration in LangChain:
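from langchain.agents import AgentType, initialize_agent

# The four tool classes are this project's custom tools; supabase, sendgrid,
# redis, and llm are clients/models created elsewhere in the app.
tools = [
    CRMQueryTool(db=supabase),
    EmailDraftTool(client=sendgrid),
    TaskManagerTool(db=redis),
    CalendarReaderTool(),
]
# OPENAI_FUNCTIONS lets the agent select tools via OpenAI function calling.
# Note: initialize_agent is deprecated in newer LangChain releases.
agent = initialize_agent(tools, llm, agent=AgentType.OPENAI_FUNCTIONS)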
Key learnings
- Context injection strategy matters more than model choice
- Redis TTL for session memory should match your average session length (I use 2h)
- Always stream responses — users abandon non-streaming AI UIs within 3 seconds
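To make the session TTL point concrete, here's an in-memory sketch of the expiry behavior. It is a stand-in for Redis, not the real storage code: `SessionStore` is a hypothetical helper, and refreshing the TTL on every read (sliding expiry) is an assumption the post doesn't confirm.

```python
import time

SESSION_TTL_SECONDS = 2 * 60 * 60  # the 2h value mentioned above


class SessionStore:
    """In-memory stand-in that mimics per-key expiry (hypothetical helper)."""

    def __init__(self, ttl=SESSION_TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._data = {}  # session_id -> (expires_at, value)

    def set(self, session_id, value):
        """Store a session and start its TTL countdown."""
        self._data[session_id] = (self._clock() + self._ttl, value)

    def get(self, session_id):
        """Return the session if still alive, else None."""
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[session_id]
            return None
        # Sliding expiry (assumption): each read pushes the deadline out again.
        self._data[session_id] = (self._clock() + self._ttl, value)
        return value
```

With redis-py the same idea would be roughly `r.setex(key, 7200, value)` on write and `r.expire(key, 7200)` on read, so active sessions stay warm and idle ones drop out on their own.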
Full repo coming soon. Follow for updates.