In this post, I’ll break down every single component of RishiGPT — an AI-native, developer-first chatbot and orchestration system that combines modern LLM stacks, Retrieval-Augmented Generation (RAG), graph-based state machines, and persistent vector memory to push beyond the usual “prompt in, text out” paradigm.
I’ll cover:
- The core design goals.
- How the Streamlit apps are structured.
- How file-based RAG works (PDF, Text, Web, GitHub).
- How persistent Pinecone indexes integrate with Cohere embeddings.
- How LangGraph adds conditional flows and multi-agent orchestration.
- How this grows into true multimodal and voice-enabled AI.
- Exact repo layout, API key strategy, and gotchas.
- Full code snippets showing how each piece plugs together.
Let’s unpack it, one layer at a time.
What Problem Does RishiGPT Solve?
Most chatbots are one of three things:
- A plain LLM frontend with zero context memory.
- A simple PDF Q&A app that forgets everything once you close the tab.
- Or a brittle RAG pipeline stuck on one format.
RishiGPT’s mission is to unify live web search, file-based RAG, repository indexing, graph-based memory, and multi-step agent orchestration — all wrapped in a single developer-friendly interface that runs on Streamlit.
It’s not just a bot — it’s a framework for building knowledge workflows:
- Embed once, query forever.
- Search your own data, plus the web.
- Use memory to keep context alive.
- Orchestrate multi-agent tasks conditionally.
- Expand into voice and image generation when funded.
Key Tech Choices
- Streamlit: Fast prototyping for interactive apps.
- LangChain: File loaders, chunking, embeddings, Conversational Retrieval Chain.
- FAISS: Local vector DB for in-session RAG.
- Pinecone: Persistent vector storage.
- Cohere: Cheap, robust English embeddings.
- Groq: High-performance LLaMA inference.
- LangGraph (currently in development): Graph-based, conditional multi-step orchestration.
The Main Chatbot: RishiGPT
1. Web Search Agent
When you enable web search via a sidebar checkbox:
- Loads the `serpapi` LangChain tool.
- Sets up a `ZERO_SHOT_REACT_DESCRIPTION` agent.
- Passes user queries to SerpAPI and pipes the results back through Groq's LLaMA.
- Uses a `ConversationBufferMemory` for minimal in-session memory.
from langchain.agents import load_tools, initialize_agent, AgentType
from langchain.memory import ConversationBufferMemory

# SerpAPI search tool plus a ReAct agent with minimal buffer memory
tools = load_tools(["serpapi"], llm=model)
memory = ConversationBufferMemory(memory_key="chat_history")
agent = initialize_agent(
    tools=tools,
    llm=model,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=False,
    memory=memory
)
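Running it is then a single call (a minimal sketch; `agent` is the executor built above):
response = agent.run("What is the latest stable Python release?")  # sample live query
print(response)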
2. File-Based RAG
Select a source type:
- PDF
- Text
- Website
- GitHub repo
Each uses:
- LangChain loaders: `PyPDFLoader`, `TextLoader`, `WebBaseLoader`, `GitLoader`.
- `RecursiveCharacterTextSplitter` for chunking.
- `HuggingFaceEmbeddings` for local embeddings.
- `FAISS` for the local vector store.
Example PDF chunking:
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(loader.load())  # loader: e.g. a PyPDFLoader
The `ConversationalRetrievalChain` is built on the fly:
from langchain.chains import ConversationalRetrievalChain

chain = ConversationalRetrievalChain.from_llm(
    llm=model,
    retriever=db.as_retriever(search_kwargs={"k": 5}),
    memory=st.session_state.rag_memory,
    return_source_documents=True,
    combine_docs_chain_kwargs={"prompt": prompt_template}
)
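Querying it is one call. A sketch, assuming `st.session_state.rag_memory` was built with `output_key="answer"` (required once `return_source_documents=True`):
result = chain({"question": user_query})
answer = result["answer"]
sources = result["source_documents"]  # the k retrieved chunks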
3. GitHub Repo RAG
Clone a public repo on the fly:
from git import Repo
from langchain_community.document_loaders import GitLoader

repo = Repo.clone_from(url, to_path=tempdir)
loader = GitLoader(repo_path=tempdir, branch=repo.head.reference)  # active branch of the clone
docs = loader.load()
Same split + embed + FAISS pipeline applies.
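That shared pipeline looks roughly like this (a sketch assuming `chunks` from the splitter above; the embedding model name is an assumption, not pinned by the repo):
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Model name is an assumed default; any sentence-transformers model fits here
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
db = FAISS.from_documents(chunks, embeddings)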
4. Multi-Mode Toggle
The sidebar logic ensures you can't run file RAG and web search at the same time, which avoids mixing conflicting contexts.
if use_rag:
    use_serp = False  # or vice versa
5. Persistent Memory Mode
The separate Embedding Station and Deposition Chat handle permanent storage:
- `CohereEmbeddings` → Pinecone.
- Create a new index → chunk docs → embed → store (sketched below).
- Later, use a `PineconeVectorStore` retriever in another Streamlit app.
- Keeps your knowledge forever.
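A minimal sketch of the store step, assuming `chunks` from the splitter above; the index name is a placeholder, not the repo's:
from langchain_cohere import CohereEmbeddings
from langchain_pinecone import PineconeVectorStore

embeddings = CohereEmbeddings(model="embed-english-light-v3.0")
# "rishigpt-memory" is hypothetical; create the index in Pinecone first
PineconeVectorStore.from_documents(chunks, embedding=embeddings, index_name="rishigpt-memory")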
LangGraph Orchestration: RishiGPT_PLUS
[CURRENTLY IN DEVELOPMENT]
This is where things get interesting.
A simple state graph:
- Pick a model: Groq, OpenAI, Cohere, Gemini.
- Pick a task: Chatbot, RAG, Web Search.
- If RAG: pick flavor → In-session, Pinecone, Hybrid.
- End or loop.
Each node is a function:
from typing import Annotated, TypedDict

from langgraph.graph import StateGraph, add_messages, START, END

class RishiGPTPLUS(TypedDict):
    messages: Annotated[list, add_messages]
    model: str

def choose_model(state: RishiGPTPLUS) -> RishiGPTPLUS:
    # Nodes must return a state update (a dict), not a bare string; hard-wired for now
    return {"model": "groq"}

def model_groq(state: RishiGPTPLUS) -> RishiGPTPLUS:
    # add_messages appends, so return only the new message
    return {"messages": ["Using GROQ"], "model": "groq"}
The graph:
build3 = StateGraph(RishiGPTPLUS)
build3.add_node("choose_model", choose_model)
build3.add_node("groq", model_groq)
build3.add_edge(START, "choose_model")
build3.add_edge("choose_model", "groq")
...
graph3 = build3.compile()
This is a blueprint for future expansions:
- Branch flows (see the sketch after this list).
- Conditional tasks.
- Multi-agent orchestration.
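As a sketch of where that branching could go, the fixed `choose_model → groq` edge gives way to a router (the router function and path map here are illustrative, not the repo's code):
def route_task(state: RishiGPTPLUS) -> str:
    # Route on whichever model the chooser wrote into state
    return state["model"]

build3.add_conditional_edges("choose_model", route_task, {"groq": "groq", "openai": "openai"})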
How Persistent Memory Works
- Create an index via the Embedding Station.
- Use `CohereEmbeddings` (`embed-english-light-v3.0`).
- The Pinecone index stores the chunks.
- Later, Deposition Chat pulls it:
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("rishigpt-memory")  # the index created in the Embedding Station
retriever = PineconeVectorStore(index=index, embedding=embeddings).as_retriever()
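Retrieval is then one call (retrievers are LangChain Runnables; the query is a sample):
docs = retriever.invoke("What does the deposition say about timelines?")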
Future Roadmap
- LangGraph orchestration: true multi-agent, dynamic workflows.
- RAGAnything: PDFs, web, GitHub, but also:
  - Video transcripts.
  - Spreadsheets.
  - Notion wikis.
  - HTML dumps.
  - Raw JSON, XML.
  - Audio pipelines.
  - Slide decks, research datasets.
- Memory-aware chat: graph-based, chunked vector memory for full context retention.
- Voice + Images: when funded, plug in DALL·E / Stable Diffusion plus TTS/STT. API usage isn't cheap; contributors can share OpenAI keys to offset costs.
- n8n Connectors: run real APIs, webhooks, and databases from chat.
- Role Switching: dev co-pilot, student tutor, research agent; switch modes instantly.
How to Run Locally
git clone https://github.com/Rishirajbal/RishiGPT.git
cd RishiGPT
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
streamlit run app_2.py
Create `API_KEYS.env`:
GROQ_API_KEY="..."
SERPAPI_API_KEY="..."
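For the env file to take effect, the apps have to load it; a common pattern (assuming `python-dotenv` is installed):
from dotenv import load_dotenv

load_dotenv("API_KEYS.env")  # populates os.environ with the keys above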
Or use `.streamlit/secrets.toml` for Streamlit Cloud.
Final Thoughts
RishiGPT is not just a chatbot — it’s a foundation for your own self-hosted LLM lab:
- One repo for ad-hoc file chat.
- One repo for orchestration experiments.
- One repo for persistent RAG memory.
It’s easy to fork, adapt, and extend:
- Swap Groq for OpenAI or Cohere.
- Use LangGraph for custom flows.
- Plug in Pinecone or local FAISS.
- Add new loaders — CSV, HTML, Notion.
- Wire up image or audio models when budget permits.
Repos
- Main repo: https://github.com/Rishirajbal/RishiGPT
Questions?
Open an issue, fork it, push a PR.
RishiGPT is open to your ideas.
Author
Rishiraj Bal — Building practical LLM systems, robust RAG pipelines, and next-generation AI orchestration frameworks.