Harish Kotra (he/him)

Building Agentbase: An Open Source RAG Platform with Next.js, FastAPI, and Agno

In this post, I'll dive into how I built Agentbase, a full-stack platform that lets anyone build, manage, and deploy knowledge-grounded AI agents. I wanted a solution that combined the sleekness of a modern web UI with the power of Python-based agent frameworks, all while keeping the vector storage local and fast.

The Stack

I chose a "best of breed" approach for the tech stack:

  • Frontend: Next.js 14 (App Router) for a responsive, server-rendered UI.
  • Backend: FastAPI for high-performance Python APIs.
  • Agent Framework: Agno (formerly Phidata) for orchestrating LLMs and tools.
  • Vector DB: LanceDB for embedded, serverless vector search.
  • Auth/DB: Supabase for user management and metadata storage.
  • LLM: Ollama for running models locally (privacy-first!).
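
To give a rough sense of how these pieces get wired together on the backend, here's a minimal settings sketch using pydantic-settings. Apart from OLLAMA_BASE_URL, which shows up in the endpoint code later in this post, the field names and defaults below are assumptions for illustration, not Agentbase's actual configuration.

# Minimal settings sketch (pydantic-settings). Only OLLAMA_BASE_URL is taken
# from the code shown later in the post; the other fields are assumptions.
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    OLLAMA_BASE_URL: str = "http://localhost:11434"  # default Ollama port
    SUPABASE_URL: str = ""
    SUPABASE_KEY: str = ""
    LANCEDB_PATH: str = "./data/lancedb"  # embedded, file-based vector store

settings = Settings()  # values can be overridden via environment variables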

Key Challenges & Solutions

1. Robust RAG Ingestion

One of the hardest parts of RAG is reliable file ingestion. I initially faced issues where large documents were being chunked poorly, leading to bad search results.

Solution: I implemented a CustomTextReader in the backend to enforce strict chunk sizes. Additionally, I added a "lazy-loading" mechanism: instead of hoping background tasks finish, the agent runner checks whether the knowledge base is loaded at runtime before answering queries.

# backend/app/services/agent_runner.py
if knowledge:
    # Lazy load: re-load only if not already populated, to ensure robustness
    knowledge.load(recreate=False)
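
The snippet above covers the runtime side; on the ingestion side, the idea behind the CustomTextReader is simply to split documents into predictable, fixed-size chunks. The sketch below is a simplified stand-in, not the actual reader from the repo, and the chunk size and overlap values are assumptions.

# Simplified stand-in for strict fixed-size chunking (not the repo's actual
# CustomTextReader; chunk_size and overlap values are assumptions).
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # overlap keeps sentences from being cut cold
    return chunks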

2. Streaming Responses

Users expect AI to "type" out answers. Waiting for a full response feels slow.

Solution: I used Python's generators and FastAPI's StreamingResponse to push tokens to the frontend as they are generated. On the client side, I used TextDecoder and a ReadableStream reader to append text in real time, giving that satisfying typewriter effect.
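
Here's a minimal server-side sketch of that pattern: a plain Python generator handed to FastAPI's StreamingResponse. The /chat path and the canned tokens are placeholders for illustration; in Agentbase the generator yields chunks coming from the Agno agent run.

# Minimal sketch of token streaming with FastAPI's StreamingResponse.
# The /chat path and the canned tokens are illustrative placeholders.
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

def token_stream(prompt: str):
    for token in ["Our ", "brand ", "colors ", "are ", "..."]:
        yield token  # in the real runner, tokens come from the agent

@app.get("/chat")
def chat(prompt: str):
    return StreamingResponse(token_stream(prompt), media_type="text/plain")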

3. Handling Local Models

Hardcoding models is a bad practice. Users want to use the models they have installed.

Solution: I built a dynamic endpoint (/api/v1/models/ollama) that queries the user's running Ollama instance for available tags.

# backend/app/api/endpoints/llm.py
@router.get("/ollama")
def list_ollama_models():
    ollama_url = f"{settings.OLLAMA_BASE_URL}/api/tags"
    response = requests.get(ollama_url)
    # ... returns list of models

This populates a dropdown in the UI, ensuring users never make typos when selecting llama3 or mistral.

The Result

The result is a snappy, functional platform where you can:

  1. Create an Agent: "Marketing Assistant".
  2. Upload Knowledge: Upload your company's branding PDF.
  3. Chat: Ask "What are our brand colors?" and get an accurate answer cited from the PDF.


What's Next?

Some things I plan to add in the future:

  • Multi-agent orchestration (teams of agents).
  • Web browsing tools for live research.
  • One-click deployment to Vercel/Render.

Check out the code on GitHub and start building your own agent army today!
