As a CS student obsessed with LLMs, I wanted to build something that actually solved a real problem, not just another chatbot that hallucinated answers. Law felt like the perfect domain. The stakes are high, the documents are dense, and people genuinely need accurate answers from specific sources. So I built a RAG-based Law Assistant that answers legal questions directly from PDF documents.
What is RAG and Why Does It Matter Here?
RAG (Retrieval-Augmented Generation) is a pattern where you don't just ask an LLM a question cold. Instead, you first retrieve relevant chunks from your own documents, then pass those chunks as context to the model. The model answers based on what's actually in your documents, not what it vaguely remembers from training.
For legal use cases, this is critical. You don't want the model guessing. You want it citing the right clause from the right document.
The Stack:
LangChain — orchestration and chaining
FAISS — vector store for fast similarity search
OpenAI / HuggingFace Embeddings — to convert text into vectors
PyPDF2 / pdfplumber — to extract text from PDFs
Python — to glue everything together
How It Works:
Load and Split the PDFs
The first step is getting the text out of the PDF documents and splitting it into manageable chunks.
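In the project this is handled by PyPDF2 or pdfplumber plus a LangChain text splitter; as a self-contained sketch of the chunking step itself (function name and parameters are illustrative, not from the original code), fixed-size chunks with overlap look like this:

```python
def split_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks.

    The overlap means a clause cut at a chunk boundary still appears
    with surrounding context in the next chunk.
    """
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # last chunk reached the end of the text
        start += chunk_size - overlap
    return chunks
```

LangChain's `RecursiveCharacterTextSplitter` does the same job but tries to break on paragraph and sentence boundaries first, which matters a lot for dense legal prose.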
Embed and Store in FAISS
Next, each chunk gets converted into a vector embedding and stored in a FAISS index for fast retrieval later.
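To make the idea concrete without needing an API key, here is a toy stand-in for what the embed-and-search step does (the bag-of-words "embedding" and class name are illustrative; the real project uses OpenAI/HuggingFace embeddings and a FAISS index):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": term frequencies of lowercase words.
    # Real embeddings are dense vectors from a trained model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ToyVectorStore:
    """Stores one vector per chunk; search ranks chunks by similarity."""

    def __init__(self, chunks: list[str]):
        self.chunks = chunks
        self.vectors = [embed(c) for c in chunks]

    def search(self, query: str, k: int = 4) -> list[str]:
        qv = embed(query)
        scored = sorted(zip(self.vectors, self.chunks),
                        key=lambda pair: cosine(qv, pair[0]),
                        reverse=True)
        return [chunk for _, chunk in scored[:k]]
```

FAISS does exactly this ranking, but over dense vectors and with index structures that stay fast at millions of chunks.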
FAISS is fast, lightweight, and runs entirely locally — no external database needed. For a student project, that's a big win.
Build the Retrieval Chain
This is where LangChain shines. You wire up the retriever and the LLM into a chain that handles everything automatically.
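Under the hood, a "stuff"-style QA chain does roughly the following; this sketch uses stand-in `retriever` and `llm` callables (in the project they would be the FAISS retriever and a chat model with temperature=0), and the prompt wording is my own illustration:

```python
PROMPT = (
    "Answer using ONLY the context below. If the answer is not in the "
    "context, say you don't know.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}"
)

def answer(question, retriever, llm, k=4):
    chunks = retriever(question, k)    # top-k relevant chunks from the index
    context = "\n\n".join(chunks)      # "stuff" them into one context block
    return llm(PROMPT.format(context=context, question=question))
```

A handy trick while debugging: pass an echo function as `llm` so you can inspect exactly what prompt the model would receive.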
Setting temperature=0 keeps the model grounded; for legal Q&A, you want deterministic, factual answers, not creative ones.
Ask Questions
The retriever pulls the 4 most relevant chunks from the FAISS index, passes them to the LLM as context, and the model answers based solely on those chunks.
What I Learned:
Chunking strategy is everything. The quality of your answers is directly tied to how well you split your documents. I spent more time tuning chunk size and overlap than on anything else.
Use temperature = 0 for factual domains. Any creativity from the model is a liability when you're answering legal questions.
FAISS is surprisingly powerful for local projects; No cloud setup, no API calls, just fast vector search on your machine.
RAG isn't magic; it's garbage in, garbage out. If your PDF extraction is messy (and legal PDFs often are), your retrieval will be messy too. Invest in clean extraction before anything else.
What's Next: I want to extend this with:
- A proper UI using Streamlit or React
- Support for multiple documents simultaneously
- Source citation — showing exactly which page and clause the answer came from
If you're a student trying to build something real with LLMs, RAG is one of the best patterns to learn first. It's practical, it's in demand, and it forces you to think about the full pipeline and not just prompt engineering.
Feel free to connect or drop questions in the comments.