In this post, I'll dive into how I built Agentbase, a full-stack platform that lets anyone build, manage, and deploy knowledge-grounded AI agents. I wanted a solution that combined the sleekness of a modern web UI with the power of Python-based agent frameworks, all while keeping vector storage local and fast.
The Stack
I chose a "best of breed" approach for the tech stack:
- Frontend: Next.js 14 (App Router) for a responsive, server-rendered UI.
- Backend: FastAPI for high-performance Python APIs.
- Agent Framework: Agno (formerly Phidata) for orchestrating LLMs and tools.
- Vector DB: LanceDB for embedded, serverless vector search.
- Auth/DB: Supabase for user management and metadata storage.
- LLM: Ollama for running models locally (privacy-first!).
Key Challenges & Solutions
1. Robust RAG Ingestion
One of the hardest parts of RAG is reliable file ingestion. I initially faced issues where large documents were being chunked poorly, leading to bad search results.
Solution: I implemented a CustomTextReader in the backend to enforce strict chunk sizes. I also added a "lazy-loading" mechanism: instead of hoping a background ingestion task has finished, the agent runner checks at query time whether the knowledge base is loaded before answering.
# backend/app/services/agent_runner.py
if knowledge:
    # Lazy load: re-load only if not already populated, to ensure robustness
    knowledge.load(recreate=False)
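For context, the strict-chunking idea behind the CustomTextReader boils down to fixed-size splitting with a small overlap. Here's a stripped-down, standalone sketch of that logic; the function name, chunk size, and overlap are illustrative values, not the exact production settings:

# Simplified illustration of strict fixed-size chunking with overlap.
# chunk_size and overlap are example values, not the production settings.
def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size].strip()
        if chunk:
            chunks.append(chunk)
    return chunks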
2. Streaming Responses
Users expect AI to "type" out answers. Waiting for a full response feels slow.
Solution: I used Python's generators and FastAPI's StreamingResponse to push tokens to the frontend as they are generated. On the client side, we used TextDecoder and a ReadableStream reader to append text in real-time, giving that satisfying typewriter effect.
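On the backend, that pattern is just a generator handed to StreamingResponse. Here's a minimal sketch; the endpoint path and the stand-in token list are placeholders, and in the real code the generator iterates over the agent's streamed output:

# Minimal streaming sketch: yield chunks from a generator and let
# StreamingResponse flush them to the client as they are produced.
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

def token_stream(prompt: str):
    # Stand-in for iterating over the agent's streamed response.
    for token in ["Thinking", " about ", prompt, "..."]:
        yield token

@app.post("/chat/stream")
def chat_stream(prompt: str):
    return StreamingResponse(token_stream(prompt), media_type="text/plain")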
3. Handling Local Models
Hardcoding model names is bad practice; users want to pick from the models they actually have installed.
Solution: I built a dynamic endpoint (/api/v1/models/ollama) that queries the user's running Ollama instance for available tags.
# backend/app/api/endpoints/llm.py
@router.get("/ollama")
def list_ollama_models():
    ollama_url = f"{settings.OLLAMA_BASE_URL}/api/tags"
    response = requests.get(ollama_url)
    # ... returns list of models
This populates a dropdown in the UI, ensuring users never make typos when selecting llama3 or mistral.
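For reference, extracting the names on the backend is a one-liner once the response comes back. This sketch assumes Ollama's usual /api/tags shape (a JSON object with a "models" array whose entries carry a "name" field) and trims error handling:

# Sketch of pulling model names out of Ollama's /api/tags response.
# Assumes the usual {"models": [{"name": "llama3:latest", ...}, ...]} shape.
import requests

def get_ollama_model_names(base_url: str) -> list[str]:
    response = requests.get(f"{base_url}/api/tags", timeout=5)
    response.raise_for_status()
    data = response.json()
    return [model["name"] for model in data.get("models", [])]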
The Result
The result is a snappy, functional platform where you can:
- Create an Agent: "Marketing Assistant".
- Upload Knowledge: Upload your company's branding PDF.
- Chat: Ask "What are our brand colors?" and get an accurate answer cited from the PDF.
What's Next?
Some things I plan to add in the future:
- Multi-agent orchestration (teams of agents).
- Web browsing tools for live research.
- One-click deployment to Vercel/Render.
Check out the code on GitHub and start building your own agent army today!