Harish Kotra (he/him)

Building Agentbase: An Open Source RAG Platform with Next.js, FastAPI, and Agno

In this post, I'll dive into how I built Agentbase, a full-stack platform that lets anyone build, manage, and deploy knowledge-grounded AI agents. I wanted a solution that combined the sleekness of a modern web UI with the power of Python-based agent frameworks, all while keeping the vector storage local and fast.

The Stack

I chose a "best of breed" approach for the tech stack:

  • Frontend: Next.js 14 (App Router) for a responsive, server-rendered UI.
  • Backend: FastAPI for high-performance Python APIs.
  • Agent Framework: Agno (formerly Phidata) for orchestrating LLMs and tools.
  • Vector DB: LanceDB for embedded, serverless vector search.
  • Auth/DB: Supabase for user management and metadata storage.
  • LLM: Ollama for running models locally (privacy-first!).
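
To give a rough sense of how these pieces get wired together on the backend, here's a minimal settings sketch using pydantic-settings. Apart from OLLAMA_BASE_URL, which shows up in the endpoint code later in this post, the field names and defaults below are assumptions for illustration, not Agentbase's actual configuration.

# Minimal settings sketch (pydantic-settings). Only OLLAMA_BASE_URL is taken
# from the code shown later in the post; the other fields are assumptions.
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    OLLAMA_BASE_URL: str = "http://localhost:11434"  # default Ollama port
    SUPABASE_URL: str = ""
    SUPABASE_KEY: str = ""
    LANCEDB_PATH: str = "./data/lancedb"  # embedded, file-based vector store

settings = Settings()  # values can be overridden via environment variables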

Key Challenges & Solutions

1. Robust RAG Ingestion

One of the hardest parts of RAG is reliable file ingestion. I initially faced issues where large documents were being chunked poorly, leading to bad search results.

Solution: I implemented a CustomTextReader in the backend to enforce strict chunk sizes. Additionally, I added a "lazy-loading" mechanism: instead of hoping background tasks finish, the agent runner checks whether the knowledge base is loaded at runtime before answering queries.

# backend/app/services/agent_runner.py
if knowledge:
    # Lazy load: re-load only if not already populated, to ensure robustness
    knowledge.load(recreate=False)
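
The snippet above covers the runtime side; on the ingestion side, the idea behind the CustomTextReader is simply to split documents into predictable, fixed-size chunks. The sketch below is a simplified stand-in, not the actual reader from the repo, and the chunk size and overlap values are assumptions.

# Simplified stand-in for strict fixed-size chunking (not the repo's actual
# CustomTextReader; chunk_size and overlap values are assumptions).
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # overlap keeps sentences from being cut cold
    return chunks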

2. Streaming Responses

Users expect AI to "type" out answers. Waiting for a full response feels slow.

Solution: I used Python's generators and FastAPI's StreamingResponse to push tokens to the frontend as they are generated. On the client side, I used TextDecoder and a ReadableStream reader to append text in real time, giving that satisfying typewriter effect.
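
Here's a minimal server-side sketch of that pattern: a plain Python generator handed to FastAPI's StreamingResponse. The /chat path and the canned tokens are placeholders for illustration; in Agentbase the generator yields chunks coming from the Agno agent run.

# Minimal sketch of token streaming with FastAPI's StreamingResponse.
# The /chat path and the canned tokens are illustrative placeholders.
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

def token_stream(prompt: str):
    for token in ["Our ", "brand ", "colors ", "are ", "..."]:
        yield token  # in the real runner, tokens come from the agent

@app.get("/chat")
def chat(prompt: str):
    return StreamingResponse(token_stream(prompt), media_type="text/plain")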

3. Handling Local Models

Hardcoding models is a bad practice. Users want to use the models they have installed.

Solution: I built a dynamic endpoint (/api/v1/models/ollama) that queries the user's running Ollama instance for available tags.

# backend/app/api/endpoints/llm.py
@router.get("/ollama")
def list_ollama_models():
    ollama_url = f"{settings.OLLAMA_BASE_URL}/api/tags"
    response = requests.get(ollama_url)
    # ... returns list of models

This populates a dropdown in the UI, ensuring users never make typos when selecting llama3 or mistral.

The Result

The result is a snappy, functional platform where you can:

  1. Create an Agent: "Marketing Assistant".
  2. Upload Knowledge: Upload your company's branding PDF.
  3. Chat: Ask "What are our brand colors?" and get an accurate answer cited from the PDF.


What's Next?

Some things I plan to add in the future:

  • Multi-agent orchestration (teams of agents).
  • Web browsing tools for live research.
  • One-click deployment to Vercel/Render.

Check out the code on GitHub and start building your own agent army today!
