🤖📚 Build Your Own AI-Powered Book Chatbot using Python, Flask, LangChain, and Pinecone!

Hey Devs! 👋
Have you ever wanted to chat with your favorite books like you're texting a friend? 📖💬
Well, you're in the right place! In this blog post, I'll walk you through how I built BookChatBot, an AI-powered chatbot that can answer questions about a book, using:

  • 🧠 LangChain (for LLM logic)
  • 🌲 Pinecone (for vector search)
  • 🧾 PDF loading and splitting
  • ⚡ Google Gemini (for answering questions)
  • 🧪 Flask (as the web framework)

You can find the full code on GitHub:
👉 GitHub Repo


💡 What are we building?

We're building a chatbot web app that reads PDFs (like a book 📘), stores their content as embeddings in Pinecone's vector database, and lets users ask questions about the content!
The AI will retrieve the most relevant chunks and generate human-like answers using Google's Gemini model.

💸 Note: Pinecone's free tier only allows one index, so for now you can't dynamically upload new books. But once it's set up, it's super efficient for Q&A!


🗂️ Project Structure

bookchatbot-/
β”œβ”€β”€ app.py               # Flask app and RAG chain
β”œβ”€β”€ helper.py            # PDF loading, chunking, and embeddings
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ prompt.py        # System prompt for LLM
β”œβ”€β”€ data/                # Folder with your PDF files
β”œβ”€β”€ templates/
β”‚   └── chat.html        # Simple frontend
β”œβ”€β”€ .env                 # API keys (not shared!)

🧠 How does it work?

This is a RAG (Retrieval-Augmented Generation) pipeline:

  1. Load and split PDFs into chunks
  2. Convert chunks into vector embeddings
  3. Store in Pinecone (vector DB)
  4. Accept user question
  5. Find the top relevant chunks (via Pinecone)
  6. Use Gemini to answer based on retrieved content

🧾 helper.py – Preprocessing the PDFs

from langchain_community.document_loaders import DirectoryLoader, PyMuPDFLoader

def load_pdf(data):
    # Load every PDF in the given folder with PyMuPDF
    loader = DirectoryLoader(data, glob="*.pdf", loader_cls=PyMuPDFLoader)
    return loader.load()

📥 We load all PDFs from the data/ folder.

from langchain_text_splitters import RecursiveCharacterTextSplitter

def text_splitter(extracted_data):
    # Break documents into small overlapping chunks for better retrieval
    text_split = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
    return text_split.split_documents(extracted_data)

📚 We split documents into manageable 500-character chunks (with a 20-character overlap) to help with better retrieval.

import os
from langchain_google_genai import GoogleGenerativeAIEmbeddings

def load_gemini_embeddings():
    # Google's embedding model turns each text chunk into a dense vector
    embeddings = GoogleGenerativeAIEmbeddings(
        model="models/embedding-001",
        google_api_key=os.getenv("GOOGLE_API_KEY")
    )
    return embeddings

🔁 We use Google's embedding model to turn text chunks into vectors!
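
The app below reads from an existing "bookchat" index, but the post doesn't show the one-time step that fills it. Here's a minimal sketch of what that indexing script could look like; the file name, the serverless cloud/region, and the 768-dimension assumption for models/embedding-001 are mine, not from the repo:

# store_index.py (hypothetical one-time indexing script, not shown in the repo)
import os
from pinecone import Pinecone, ServerlessSpec
from langchain_pinecone import PineconeVectorStore
from helper import load_pdf, text_splitter, load_gemini_embeddings

pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))

# Create the single free-tier index once (assumption: embedding-001 vectors are 768-dimensional)
if "bookchat" not in pc.list_indexes().names():
    pc.create_index(name="bookchat", dimension=768, metric="cosine",
                    spec=ServerlessSpec(cloud="aws", region="us-east-1"))

# Load -> split -> embed -> upsert into Pinecone
chunks = text_splitter(load_pdf("data/"))
embeddings = load_gemini_embeddings()
PineconeVectorStore.from_documents(documents=chunks, embedding=embeddings, index_name="bookchat")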


🚀 app.py – The Flask App + AI Brain

We start by setting up Pinecone:

from pinecone import Pinecone
from langchain_pinecone import PineconeVectorStore

pc = Pinecone(api_key=PINECONE_API_KEY)
docsearch = PineconeVectorStore.from_existing_index(index_name="bookchat", embedding=embeddings)
retriever = docsearch.as_retriever(search_type="similarity", search_kwargs={"k": 3})

🧠 This allows us to retrieve 3 most similar chunks from our stored book.

Then we build a prompt + Gemini LLM:

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import ChatPromptTemplate
from src.prompt import system_prompt

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "{input}"),
])

💬 system_prompt defines how the AI should behave (e.g., polite, detailed).
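
The post doesn't show src/prompt.py, but for a stuffed-documents chain the prompt needs a {context} placeholder, which create_stuff_documents_chain fills with the retrieved chunks. A plausible version (the wording here is illustrative, not the repo's actual prompt):

# src/prompt.py (illustrative wording; the repo's actual prompt may differ)
system_prompt = (
    "You are a helpful assistant answering questions about a book. "
    "Use only the retrieved context below; if the answer isn't there, say you don't know.\n\n"
    "{context}"  # filled in with the retrieved chunks by create_stuff_documents_chain
)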

Create the RAG chain:

from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

💡 This is the brain of the chatbot: retrieval + generation.
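
You can try the chain straight from a Python shell before touching Flask. create_retrieval_chain returns a dict containing your input, the retrieved documents, and the generated answer (the question is just an example):

result = rag_chain.invoke({"input": "Summarize chapter one"})  # example question
print(result["answer"])        # the generated reply
print(len(result["context"]))  # the 3 retrieved chunks that backed it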

Finally, the Flask endpoints:

@app.route("/")
def index():
    return render_template('chat.html')

@app.route("/get", methods=["GET", "POST"])
def chat():
    msg = request.form["msg"]
    response = rag_chain.invoke({"input": msg})
    return str(response["answer"])

📡 The front end sends a message → gets a smart reply from the AI!
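
The surrounding app.py boilerplate isn't shown above; it presumably looks roughly like this (a sketch, details may differ in the repo):

from flask import Flask, render_template, request
from dotenv import load_dotenv
import os

load_dotenv()                                    # reads PINECONE_API_KEY / GOOGLE_API_KEY from .env
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")

app = Flask(__name__)

# ... Pinecone setup, prompt, and rag_chain from the snippets above ...

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080, debug=True)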


🧪 Testing it Out

Just run:

python app.py

Then open http://localhost:8080 and start chatting with your book! 🗨️📕
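
If you'd rather test from a script than the browser, you can hit the /get endpoint the same way the chat UI does, with a form-encoded msg field (the question is just an example):

import requests

resp = requests.post("http://localhost:8080/get", data={"msg": "What is this book about?"})
print(resp.text)  # the chatbot's answer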


⚠️ Limitations

  • Pinecone's free tier = only one index. So you can't upload new books at runtime unless you upgrade or manage your own embedding storage.
  • Static loading: you must re-run the app if you want to embed a different book.
  • Basic HTML frontend – could be upgraded with React, Tailwind, or Chat UI kits.

🛠️ Ideas for Improvements

  • Add file upload (if using a paid Pinecone plan or a local vector store like FAISS; see the sketch after this list)
  • Use streaming responses for a more chat-like feel
  • Add authentication and user-specific history
  • Display source chunk(s) below each answer for transparency
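
For that first idea, swapping Pinecone for a local FAISS index removes the single-index limit entirely. A rough sketch, reusing the helpers from helper.py (assumes the faiss-cpu package is installed):

from langchain_community.vectorstores import FAISS
from helper import load_pdf, text_splitter, load_gemini_embeddings

# Build a local index: no Pinecone account or index limit involved
chunks = text_splitter(load_pdf("data/"))
embeddings = load_gemini_embeddings()
docsearch = FAISS.from_documents(chunks, embeddings)
docsearch.save_local("faiss_index")  # persist to disk for reuse

# At app startup: reload the index and plug it into the same RAG chain
docsearch = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
retriever = docsearch.as_retriever(search_type="similarity", search_kwargs={"k": 3})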

🌐 Conclusion

Building an AI chatbot like this is easier than ever thanks to:

  • 🧠 LangChain for chaining LLM workflows
  • 🌲 Pinecone for fast vector search
  • ⚑ Google Gemini for intelligent responses
  • πŸ§ͺ Flask for quick APIs

If you liked this post, don't forget to ⭐ the GitHub repo and follow me here on Dev.to!

Got questions or ideas? Drop them below! 💬👇

