RishiGPT: Building a Multi-Agent, Memory-Aware AI with Streamlit, LangChain & Pinecone

In this post, I’ll break down every single component of RishiGPT — an AI-native, developer-first chatbot and orchestration system that combines modern LLM stacks, Retrieval-Augmented Generation (RAG), graph-based state machines, and persistent vector memory to push beyond the usual “prompt in, text out” paradigm.

I’ll cover:

  • The core design goals.
  • How the Streamlit apps are structured.
  • How file-based RAG works (PDF, Text, Web, GitHub).
  • How persistent Pinecone indexes integrate with Cohere embeddings.
  • How LangGraph adds conditional flows and multi-agent orchestration.
  • How this grows into true multimodal and voice-enabled AI.
  • Exact repo layout, API key strategy, and gotchas.
  • Full code snippets showing how each piece plugs together.

Let’s unpack it, one layer at a time.


What Problem Does RishiGPT Solve?

Most chatbots are either:

  1. A plain LLM frontend with zero context memory.
  2. A simple PDF Q&A app that forgets everything once you close the tab.
  3. Or a brittle RAG pipeline stuck on one format.

RishiGPT’s mission is to unify live web search, file-based RAG, repository indexing, graph-based memory, and multi-step agent orchestration — all wrapped in a single developer-friendly interface that runs on Streamlit.

It’s not just a bot — it’s a framework for building knowledge workflows:

  • Embed once, query forever.
  • Search your own data, plus the web.
  • Use memory to keep context alive.
  • Orchestrate multi-agent tasks conditionally.
  • Expand into voice and image generation when funded.

Key Tech Choices

  • Streamlit: Fast prototyping for interactive apps.
  • LangChain: File loaders, chunking, embeddings, Conversational Retrieval Chain.
  • FAISS: Local vector DB for in-session RAG.
  • Pinecone: Persistent vector storage.
  • Cohere: Cheap, robust English embeddings.
  • Groq: High-performance LLaMA inference.
  • LangGraph (currently in development): Graph-based, conditional multi-step orchestration.

The Main Chatbot: RishiGPT

1. Web Search Agent

When you enable web search via a sidebar checkbox, the app:

  • Loads the serpapi LangChain tool.
  • Sets up a ZERO_SHOT_REACT_DESCRIPTION agent.
  • Passes user queries to SerpAPI, pipes results back through Groq’s LLaMA.
  • Uses a ConversationBufferMemory for minimal in-session memory.
from langchain.agents import load_tools, initialize_agent, AgentType

tools = load_tools(["serpapi"], llm=model)  # model: the Groq LLaMA chat model
agent = initialize_agent(
    llm=model,
    tools=tools,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=False,
    memory=None,  # chat history lives in a ConversationBufferMemory in session state, not in the agent itself
)
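Running a query through the agent is then a single call (user_query is a placeholder for the chat input):

response = agent.run(user_query)  # SerpAPI results, reasoned over by Groq's LLaMA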

2. File-Based RAG

Select a source:

  • PDF
  • Text
  • Website
  • GitHub repo

Each uses:

  • LangChain loaders: PyPDFLoader, TextLoader, WebBaseLoader, GitLoader.
  • RecursiveCharacterTextSplitter for chunking.
  • HuggingFaceEmbeddings for local embeddings.
  • FAISS for local vector store.

Example PDF chunking:

from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(loader.load())  # loader: PyPDFLoader, TextLoader, WebBaseLoader, or GitLoader
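The chunks are then embedded locally and indexed in FAISS before retrieval. A minimal sketch of that step (the sentence-transformer model name is an assumption, not necessarily what the repo ships with):

from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
db = FAISS.from_documents(chunks, embeddings)  # local, in-session vector store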

The ConversationalRetrievalChain is built on the fly:

from langchain.chains import ConversationalRetrievalChain

chain = ConversationalRetrievalChain.from_llm(
    llm=model,
    retriever=db.as_retriever(search_kwargs={"k": 5}),
    memory=session_state.rag_memory,  # ConversationBufferMemory stored in Streamlit session state
    return_source_documents=True,
    combine_docs_chain_kwargs={"prompt": prompt_template},
)
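Querying the chain then looks roughly like this (user_query is a placeholder; note that when a memory is attached alongside return_source_documents=True, the memory usually needs output_key="answer" so it knows which output to store):

result = chain({"question": user_query})
answer = result["answer"]
sources = result["source_documents"]  # available because return_source_documents=True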

3. GitHub Repo RAG

Clone a public repo on the fly:

from git import Repo
from langchain_community.document_loaders import GitLoader

repo = Repo.clone_from(url, to_path=tempdir)  # tempdir: a temporary clone location
loader = GitLoader(repo_path=tempdir, branch=repo.head.reference)
docs = loader.load()

Same split + embed + FAISS pipeline applies.


4. Multi-Mode Toggle

The sidebar logic ensures you can't run file RAG and web search at the same time, which avoids mixing conflicting contexts.

if use_rag:
    use_serp = False  # or vice versa

5. Persistent Memory Mode

The separate Embedding Station and Deposition Chat handle permanent storage:

  • CohereEmbeddings → Pinecone.
  • Create new index → chunk docs → embed → store (see the sketch after this list).
  • Later, use PineconeVectorStore retriever in another Streamlit app.
  • Keeps your knowledge forever.
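
A minimal sketch of the Embedding Station flow, assuming the pinecone and langchain-pinecone packages; the index name, cloud, and region are placeholders (embed-english-light-v3.0 produces 384-dimensional vectors):

import os
from pinecone import Pinecone, ServerlessSpec
from langchain_cohere import CohereEmbeddings
from langchain_pinecone import PineconeVectorStore

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index_name = "rishigpt-memory"  # placeholder name

if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=384,  # output size of embed-english-light-v3.0
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

embeddings = CohereEmbeddings(model="embed-english-light-v3.0")
# chunks: documents split with the same RecursiveCharacterTextSplitter as above
PineconeVectorStore.from_documents(chunks, embeddings, index_name=index_name)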

LangGraph Orchestration: RishiGPT_PLUS [CURRENTLY IN DEVELOPMENT]

This is where things get interesting.

A simple state graph:

  • Pick a model: Groq, OpenAI, Cohere, Gemini.
  • Pick a task: Chatbot, RAG, Web Search.
  • If RAG: pick flavor → In-session, Pinecone, Hybrid.
  • End or loop.

Each node is a function:

from typing import Annotated, TypedDict

from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages

class RishiGPTPLUS(TypedDict):
    messages: Annotated[list, add_messages]
    model: str

def choose_model(state: RishiGPTPLUS) -> RishiGPTPLUS:
    # Node that records which backend to route to (hard-coded to Groq for now)
    return {"model": "groq"}

def model_groq(state: RishiGPTPLUS) -> RishiGPTPLUS:
    # add_messages appends the new entry to the running message list
    return {"messages": ["Using GROQ"], "model": "groq"}

The graph:

build3 = StateGraph(RishiGPTPLUS)
build3.add_node("choose_model", choose_model)
build3.add_node("groq", model_groq)
build3.add_edge(START, "choose_model")
build3.add_edge("choose_model", "groq")
...
graph3 = build3.compile()

This is a blueprint for future expansions:

  • Branch flows (see the routing sketch below).
  • Conditional tasks.
  • Multi-agent orchestration.
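
To make the branching real, the fixed edge above can be swapped for a routing function plus add_conditional_edges. A hedged sketch; node names other than "groq" are assumptions about how the other model nodes would be registered:

def route_model(state: RishiGPTPLUS) -> str:
    # Return the name of the next node based on the model recorded in state
    return state["model"]

build3.add_conditional_edges(
    "choose_model",
    route_model,
    {"groq": "groq", "openai": "openai", "cohere": "cohere", "gemini": "gemini"},
)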

How Persistent Memory Works

  1. Create index via Embedding Station.
  2. Use CohereEmbeddings (embed-english-light-v3.0).
  3. Store the chunks in the Pinecone index.
  4. Later, Deposition Chat pulls it:
# Requires: from pinecone import Pinecone and from langchain_pinecone import PineconeVectorStore
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index(index_name)  # index_name: the index created in the Embedding Station
retriever = PineconeVectorStore(index=index, embedding=embeddings).as_retriever()

Future Roadmap

LangGraph orchestration: True multi-agent, dynamic workflows.

RAGAnything: PDFs, web, GitHub, but also:

  • Video transcripts.
  • Spreadsheets.
  • Notion wikis.
  • HTML dumps.
  • Raw JSON, XML.
  • Audio pipelines.
  • Slide decks, research datasets.

Memory-aware chat: Graph-based, chunked vector memory for full context retention.

Voice + Images: When funded, plug in DALL·E / Stable Diffusion + TTS/STT.
API usage isn’t cheap — contributors can share OpenAI keys to offset costs.

n8n Connectors: Run real APIs, webhooks, databases from chat.

Role Switching: Dev co-pilot, student tutor, research agent — switch modes instantly.


How to Run Locally

git clone https://github.com/Rishirajbal/RishiGPT.git
cd RishiGPT
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
streamlit run app_2.py

Create API_KEYS.env:

GROQ_API_KEY="..."
SERPAPI_API_KEY="..."

Or use .streamlit/secrets.toml for Streamlit Cloud.
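
A minimal sketch of how the app can pick up those keys (assuming python-dotenv locally; on Streamlit Cloud the same names live in st.secrets):

import os
from dotenv import load_dotenv

load_dotenv("API_KEYS.env")           # local development
groq_key = os.getenv("GROQ_API_KEY")  # on Streamlit Cloud, use st.secrets["GROQ_API_KEY"] instead
serp_key = os.getenv("SERPAPI_API_KEY")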


Final Thoughts

RishiGPT is not just a chatbot — it’s a foundation for your own self-hosted LLM lab:

  • One repo for ad-hoc file chat.
  • One repo for orchestration experiments.
  • One repo for persistent RAG memory.

It’s easy to fork, adapt, and extend:

  • Swap Groq for OpenAI or Cohere.
  • Use LangGraph for custom flows.
  • Plug in Pinecone or local FAISS.
  • Add new loaders — CSV, HTML, Notion.
  • Wire up image or audio models when budget permits.

Repos

  • RishiGPT: https://github.com/Rishirajbal/RishiGPT

Questions?
Open an issue, fork it, push a PR.
RishiGPT is open to your ideas.


Author

Rishiraj Bal — Building practical LLM systems, robust RAG pipelines, and next-generation AI orchestration frameworks.
