I just built a self-contained AI chatbot: no cloud dependencies, no API keys, just pure local power with LangGraph, Ollama, FAISS, and LangSmith.
The goal: build a fully local AI chatbot that answers questions from uploaded documents, with zero cloud dependencies.
Tech Stack:
- LangChain: Orchestrate the chatbot logic using modular, state-based workflows (e.g., retrieve → generate → feedback).
- Ollama + FAISS + TF-IDF: Run a local llama3 model for response generation, and use TF-IDF + FAISS for fast, document-based context retrieval (see the sketch below).
- Streamlit: Provide an interactive web interface where users can upload files and chat with the bot in real time.
- LangSmith: Enable full observability by tracing queries, inspecting prompts, monitoring latency, and analyzing errors or retrieval issues end-to-end.
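Here is a minimal sketch of how that retrieval-plus-generation path can be wired together. The chunking, function names, and prompt are illustrative, not the exact code from the repo:

```python
# Minimal local RAG sketch (illustrative): TF-IDF vectors in a FAISS index
# feeding a local llama3 model through Ollama.
import faiss
from sklearn.feature_extraction.text import TfidfVectorizer
from langchain_community.llms import Ollama


def build_index(chunks: list[str]):
    """Vectorize document chunks with TF-IDF and store them in a FAISS index."""
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(chunks).toarray().astype("float32")
    faiss.normalize_L2(matrix)                 # cosine similarity via inner product
    index = faiss.IndexFlatIP(matrix.shape[1])
    index.add(matrix)
    return vectorizer, index


def answer(question: str, chunks: list[str], vectorizer, index, k: int = 3) -> str:
    """Retrieve the top-k chunks for the question and ask the local model."""
    query = vectorizer.transform([question]).toarray().astype("float32")
    faiss.normalize_L2(query)
    _, ids = index.search(query, min(k, len(chunks)))
    context = "\n\n".join(chunks[i] for i in ids[0])
    llm = Ollama(model="llama3")               # requires `ollama pull llama3` locally
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm.invoke(prompt)
```

In a Streamlit front end, an index like this would typically be built once per uploaded file, with the answer step running on each chat turn.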
At first, it answered questions well enough.
But the real game-changer?
I could trace every step of its reasoning.
LangSmith gave me the transparency I never knew I needed, revealing the exact document chunks retrieved, the prompts fed to the model, execution times, and even where things went off track.
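For anyone wondering how that tracing gets turned on: with LangChain it is mostly a matter of environment variables set before the chain runs. A sketch, with a placeholder API key and an example project name:

```python
import os

# Route LangChain's built-in callbacks to LangSmith.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"
os.environ["LANGCHAIN_PROJECT"] = "local-rag-bot"  # example name; groups runs in the UI
```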
The Problem: "It Works" Isn't Enough
At first, my chatbot seemed to be doing well: it returned reasonable answers to most questions. But then weird things started to happen.
- Was the wrong chunk retrieved?
- Was the prompt malformed?
- Did the model hallucinate?
Without insight into what was happening step by step, debugging was pure guesswork.
How LangSmith Helped Me Debug and Improve
Tracing
- View each query, chunk retrieval, and LLM response in real time
- Confirm the right part of the document was being used
- Inspect the exact prompt given to llama3 via Ollama (see the example below)
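LangChain components are traced automatically once tracing is enabled, but a hand-rolled retrieval step (plain Python doing the TF-IDF + FAISS lookup) is not. LangSmith's @traceable decorator can surface it as its own run; a rough sketch with a placeholder body:

```python
from langsmith import traceable


@traceable(run_type="retriever")
def retrieve_chunks(question: str, k: int = 3) -> list[str]:
    """Appears as a child run in LangSmith, with inputs and returned chunks logged."""
    # ...the TF-IDF + FAISS lookup from the earlier sketch would go here...
    return ["placeholder chunk 1", "placeholder chunk 2"]
```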
Error Analysis
- Trace misfires back to irrelevant or empty document chunks
- Compare expected vs. actual outputs
- Catch malformed inputs or slow model responses
Performance Metrics
- Track latency for each step (retriever, LLM)
- Identify slowdowns during Ollama inference
- Start tagging "slow" or "retrieval_miss" runs for dashboards (see the snippet below)
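Tagging is just a config argument at invoke time on the LangChain side. The chain, tag names, and metadata below are illustrative:

```python
from langchain_community.llms import Ollama
from langchain_core.prompts import PromptTemplate

# Toy chain; in the real app the prompt would include the retrieved context.
chain = PromptTemplate.from_template("Answer briefly: {question}") | Ollama(model="llama3")

# Tags and metadata attached here are filterable in LangSmith, which is what
# makes "slow" / "retrieval_miss" style dashboards possible.
result = chain.invoke(
    {"question": "What does section 3 cover?"},
    config={"tags": ["retrieval_miss"], "metadata": {"source": "streamlit-upload"}},
)
```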
Scaling Visibility with LangSmith Dashboards
LangSmith doesn't just log traces; it helps you monitor trends over time.
Using its dashboard tools, I now track (quick SDK sketch after the list):
- Number of LLM calls
- Average latency per query
- Retrieval failures
- Token usage (if using APIs like OpenAI or Anthropic)
- Error rates: identify failed runs, exceptions, or empty prompts
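The same numbers can also be pulled programmatically with the LangSmith SDK, for example to sanity-check latency and error rates outside the built-in dashboards. The project name matches the earlier example and is an assumption:

```python
from langsmith import Client

client = Client()
runs = list(client.list_runs(project_name="local-rag-bot", run_type="llm"))

errors = [r for r in runs if r.error]
latencies = [
    (r.end_time - r.start_time).total_seconds()
    for r in runs
    if r.start_time and r.end_time
]

print(f"LLM calls:   {len(runs)}")
print(f"error rate:  {len(errors) / max(len(runs), 1):.1%}")
print(f"avg latency: {sum(latencies) / max(len(latencies), 1):.2f}s")
```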
I've published the full working project on GitHub, complete with TF-IDF + FAISS retrieval, Ollama model integration, LangSmith observability, and a Streamlit interface.
https://lnkd.in/etpCMPiS