RishiGPT: Building a Multi-Agent, Memory-Aware AI with Streamlit, LangChain & Pinecone

In this post, I’ll break down every single component of RishiGPT — an AI-native, developer-first chatbot and orchestration system that combines modern LLM stacks, Retrieval-Augmented Generation (RAG), graph-based state machines, and persistent vector memory to push beyond the usual “prompt in, text out” paradigm.

I’ll cover:

  • The core design goals.
  • How the Streamlit apps are structured.
  • How file-based RAG works (PDF, Text, Web, GitHub).
  • How persistent Pinecone indexes integrate with Cohere embeddings.
  • How LangGraph adds conditional flows and multi-agent orchestration.
  • How this grows into true multimodal and voice-enabled AI.
  • Exact repo layout, API key strategy, and gotchas.
  • Full code snippets showing how each piece plugs together.

Let’s unpack it, one layer at a time.


What Problem Does RishiGPT Solve?

Most chatbots are either:

  1. A plain LLM frontend with zero context memory.
  2. A simple PDF Q&A app that forgets everything once you close the tab.
  3. Or a brittle RAG pipeline stuck on one format.

RishiGPT’s mission is to unify live web search, file-based RAG, repository indexing, graph-based memory, and multi-step agent orchestration — all wrapped in a single developer-friendly interface that runs on Streamlit.

It’s not just a bot — it’s a framework for building knowledge workflows:

  • Embed once, query forever.
  • Search your own data, plus the web.
  • Use memory to keep context alive.
  • Orchestrate multi-agent tasks conditionally.
  • Expand into voice and image generation when funded.

Key Tech Choices

  • Streamlit: Fast prototyping for interactive apps.
  • LangChain: File loaders, chunking, embeddings, Conversational Retrieval Chain.
  • FAISS: Local vector DB for in-session RAG.
  • Pinecone: Persistent vector storage.
  • Cohere: Cheap, robust English embeddings.
  • Groq: High-performance LLaMA inference.
  • LangGraph (currently in development): Graph-based, conditional multi-step orchestration.

The Main Chatbot: RishiGPT

1. Web Search Agent

When you enable web search via a sidebar checkbox, the app:

  • Loads the serpapi LangChain tool.
  • Sets up a ZERO_SHOT_REACT_DESCRIPTION agent.
  • Passes user queries to SerpAPI, pipes results back through Groq’s LLaMA.
  • Uses a ConversationBufferMemory for minimal in-session memory.
from langchain.agents import load_tools, initialize_agent, AgentType

tools = load_tools(["serpapi"], llm=model)  # model: the Groq LLaMA chat model
agent = initialize_agent(
    llm=model,
    tools=tools,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=False,
    memory=None,  # chat history lives in a ConversationBufferMemory in session state, not in the agent itself
)
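Running a query through the agent is then a single call (user_query is a placeholder for the chat input):

response = agent.run(user_query)  # SerpAPI results, reasoned over by Groq's LLaMA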

2. File-Based RAG

Select a source:

  • PDF
  • Text
  • Website
  • GitHub repo

Each uses:

  • LangChain loaders: PyPDFLoader, TextLoader, WebBaseLoader, GitLoader.
  • RecursiveCharacterTextSplitter for chunking.
  • HuggingFaceEmbeddings for local embeddings.
  • FAISS for local vector store.

Example PDF chunking:

from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(loader.load())  # loader: PyPDFLoader, TextLoader, WebBaseLoader, or GitLoader
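The chunks are then embedded locally and indexed in FAISS before retrieval. A minimal sketch of that step (the sentence-transformer model name is an assumption, not necessarily what the repo ships with):

from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
db = FAISS.from_documents(chunks, embeddings)  # local, in-session vector store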

The ConversationalRetrievalChain is built on the fly:

from langchain.chains import ConversationalRetrievalChain

chain = ConversationalRetrievalChain.from_llm(
    llm=model,
    retriever=db.as_retriever(search_kwargs={"k": 5}),
    memory=session_state.rag_memory,  # ConversationBufferMemory stored in Streamlit session state
    return_source_documents=True,
    combine_docs_chain_kwargs={"prompt": prompt_template},
)
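Querying the chain then looks roughly like this (user_query is a placeholder; note that when a memory is attached alongside return_source_documents=True, the memory usually needs output_key="answer" so it knows which output to store):

result = chain({"question": user_query})
answer = result["answer"]
sources = result["source_documents"]  # available because return_source_documents=True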

3. GitHub Repo RAG

Clone a public repo on the fly:

from git import Repo
from langchain_community.document_loaders import GitLoader

repo = Repo.clone_from(url, to_path=tempdir)  # tempdir: a temporary clone location
loader = GitLoader(repo_path=tempdir, branch=repo.head.reference)
docs = loader.load()

Same split + embed + FAISS pipeline applies.


4. Multi-Mode Toggle

The sidebar logic ensures you can't run file RAG and web search at the same time, which avoids mixing conflicting contexts.

if use_rag:
    use_serp = False  # or vice versa

5. Persistent Memory Mode

The separate Embedding Station and Deposition Chat handle permanent storage:

  • CohereEmbeddings → Pinecone.
  • Create new index → chunk docs → embed → store (see the sketch after this list).
  • Later, use PineconeVectorStore retriever in another Streamlit app.
  • Keeps your knowledge forever.
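
A minimal sketch of the Embedding Station flow, assuming the pinecone and langchain-pinecone packages; the index name, cloud, and region are placeholders (embed-english-light-v3.0 produces 384-dimensional vectors):

import os
from pinecone import Pinecone, ServerlessSpec
from langchain_cohere import CohereEmbeddings
from langchain_pinecone import PineconeVectorStore

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index_name = "rishigpt-memory"  # placeholder name

if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=384,  # output size of embed-english-light-v3.0
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

embeddings = CohereEmbeddings(model="embed-english-light-v3.0")
# chunks: documents split with the same RecursiveCharacterTextSplitter as above
PineconeVectorStore.from_documents(chunks, embeddings, index_name=index_name)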

LangGraph Orchestration: RishiGPT_PLUS [CURRENTLY IN DEVELOPMENT]

This is where things get interesting.

A simple state graph:

  • Pick a model: Groq, OpenAI, Cohere, Gemini.
  • Pick a task: Chatbot, RAG, Web Search.
  • If RAG: pick flavor → In-session, Pinecone, Hybrid.
  • End or loop.

Each node is a function:

from typing import Annotated, TypedDict

from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages

class RishiGPTPLUS(TypedDict):
    messages: Annotated[list, add_messages]
    model: str

def choose_model(state: RishiGPTPLUS) -> RishiGPTPLUS:
    # Node that records which backend to route to (hard-coded to Groq for now)
    return {"model": "groq"}

def model_groq(state: RishiGPTPLUS) -> RishiGPTPLUS:
    # add_messages appends the new entry to the running message list
    return {"messages": ["Using GROQ"], "model": "groq"}

The graph:

build3 = StateGraph(RishiGPTPLUS)
build3.add_node("choose_model", choose_model)
build3.add_node("groq", model_groq)
build3.add_edge(START, "choose_model")
build3.add_edge("choose_model", "groq")
...
graph3 = build3.compile()

This is a blueprint for future expansions:

  • Branch flows (see the routing sketch below).
  • Conditional tasks.
  • Multi-agent orchestration.
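
To make the branching real, the fixed edge above can be swapped for a routing function plus add_conditional_edges. A hedged sketch; node names other than "groq" are assumptions about how the other model nodes would be registered:

def route_model(state: RishiGPTPLUS) -> str:
    # Return the name of the next node based on the model recorded in state
    return state["model"]

build3.add_conditional_edges(
    "choose_model",
    route_model,
    {"groq": "groq", "openai": "openai", "cohere": "cohere", "gemini": "gemini"},
)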

How Persistent Memory Works

  1. Create index via Embedding Station.
  2. Use CohereEmbeddings (embed-english-light-v3.0).
  3. Store the chunks in the Pinecone index.
  4. Later, Deposition Chat pulls it:
# Requires: from pinecone import Pinecone and from langchain_pinecone import PineconeVectorStore
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index(index_name)  # index_name: the index created in the Embedding Station
retriever = PineconeVectorStore(index=index, embedding=embeddings).as_retriever()

Future Roadmap

LangGraph orchestration: True multi-agent, dynamic workflows.

RAGAnything: PDFs, web, GitHub, but also:

  • Video transcripts.
  • Spreadsheets.
  • Notion wikis.
  • HTML dumps.
  • Raw JSON, XML.
  • Audio pipelines.
  • Slide decks, research datasets.

Memory-aware chat: Graph-based, chunked vector memory for full context retention.

Voice + Images: When funded, plug in DALL·E / Stable Diffusion + TTS/STT.
API usage isn’t cheap — contributors can share OpenAI keys to offset costs.

n8n Connectors: Run real APIs, webhooks, databases from chat.

Role Switching: Dev co-pilot, student tutor, research agent — switch modes instantly.


How to Run Locally

git clone https://github.com/Rishirajbal/RishiGPT.git
cd RishiGPT
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
streamlit run app_2.py

Create API_KEYS.env:

GROQ_API_KEY="..."
SERPAPI_API_KEY="..."

Or use .streamlit/secrets.toml for Streamlit Cloud.
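
A minimal sketch of how the app can pick up those keys (assuming python-dotenv locally; on Streamlit Cloud the same names live in st.secrets):

import os
from dotenv import load_dotenv

load_dotenv("API_KEYS.env")           # local development
groq_key = os.getenv("GROQ_API_KEY")  # on Streamlit Cloud, use st.secrets["GROQ_API_KEY"] instead
serp_key = os.getenv("SERPAPI_API_KEY")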


Final Thoughts

RishiGPT is not just a chatbot — it’s a foundation for your own self-hosted LLM lab:

  • One repo for ad-hoc file chat.
  • One repo for orchestration experiments.
  • One repo for persistent RAG memory.

It’s easy to fork, adapt, and extend:

  • Swap Groq for OpenAI or Cohere.
  • Use LangGraph for custom flows.
  • Plug in Pinecone or local FAISS.
  • Add new loaders — CSV, HTML, Notion.
  • Wire up image or audio models when budget permits.

Repos

  • RishiGPT: https://github.com/Rishirajbal/RishiGPT

Questions?
Open an issue, fork it, push a PR.
RishiGPT is open to your ideas.


Author

Rishiraj Bal — Building practical LLM systems, robust RAG pipelines, and next-generation AI orchestration frameworks.
