THIYAGARAJAN varadharajan

Building a Semantic Search Engine for Any Website Using React, Django & Milvus Lite

I recently built an end-to-end semantic search application that takes any website URL and a user query, then returns the 10 most relevant HTML content chunks, using embeddings and a vector database.

🔹 Tech Stack

Frontend: React 18 + Vite

Backend: Django 5 + Django REST Framework

NLP: BERT tokenizer + Sentence-Transformers

Vector DB: Milvus Lite (with cosine similarity fallback)

🔹 Processing Pipeline

Fetch + clean HTML (BeautifulSoup)

Extract DOM blocks (h1–h6, p, li, code, etc.)

Chunk to ≤500 tokens (BERT accepts at most 512 tokens per input, including special tokens)

Embed blocks

Store + search in vector DB

Rank + return top-10 results
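
The extraction and chunking steps above can be sketched roughly as follows. This is not the project's code: it uses Python's built-in `html.parser` instead of BeautifulSoup so it runs without dependencies, it assumes reasonably well-formed markup, and it counts whitespace-separated words as a stand-in for BERT tokens. The names `BlockExtractor`, `chunk_text`, and `html_to_chunks` are hypothetical.

```python
from html.parser import HTMLParser

# Tags treated as content blocks; the set mirrors the post (h1-h6, p, li, code).
BLOCK_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "li", "code"}
SKIP_TAGS = {"script", "style"}  # content that should never be indexed


class BlockExtractor(HTMLParser):
    """Collects (tag, text) pairs for each outermost block element."""

    def __init__(self):
        super().__init__()
        self.blocks = []      # finished (tag, text) pairs
        self._stack = []      # currently open block tags
        self._parts = []      # text pieces of the outermost open block
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS and self._stack and self._stack[-1] == tag:
            self._stack.pop()
            if self._stack:
                self._parts.append(" ")  # keep nested blocks separated
            else:
                # Outermost block closed: normalize whitespace and store it.
                text = " ".join("".join(self._parts).split())
                if text:
                    self.blocks.append((tag, text))
                self._parts = []

    def handle_data(self, data):
        if self._stack and not self._skip_depth:
            self._parts.append(data)


def chunk_text(text, max_tokens=500):
    """Split text into pieces of at most max_tokens words.

    Assumption: words approximate tokens. A real pipeline would count
    with the model's tokenizer instead.
    """
    words = text.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]


def html_to_chunks(html, max_tokens=500):
    """Return a list of {"tag", "text"} dicts, one per chunk."""
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    chunks = []
    for tag, text in parser.blocks:
        for piece in chunk_text(text, max_tokens):
            chunks.append({"tag": tag, "text": piece})
    return chunks
```

Word counts usually undercount wordpiece tokens, so a production version would measure each block with the actual tokenizer before deciding where to split.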

🔹 Frontend Highlights

Card-based UI

Snippet + full HTML tabs

Show more/less

Copy markup button

Optional highlight

🔹 Challenges

Preserving readable HTML while staying under 500 tokens

Milvus Lite issues on Windows → fallback to cosine similarity search

First-run embedding model download delays
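
A cosine fallback of the kind mentioned above could be as simple as a brute-force scan over all chunk vectors. The sketch below is an assumption about how such a fallback might work, not the project's implementation; `cosine_similarity` and `top_k` are hypothetical names, and plain Python lists stand in for the embedding arrays.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=10):
    """Return up to k (index, score) pairs, best match first."""
    scored = [(cosine_similarity(query_vec, vec), i)
              for i, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(i, score) for score, i in scored[:k]]


# Toy 2-D "embeddings" just to show the call shape.
chunk_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
results = top_k([1.0, 0.0], chunk_vecs, k=2)
```

A linear scan like this is fine for the few hundred chunks a single page produces; a vector database starts to pay off once many pages are indexed.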

🔹 Lessons Learned

DOM-block chunking improves readability

Normalizing embeddings to unit length gives consistent similarity scores (the dot product then equals cosine similarity, bounded in [-1, 1])

Toggle-based UI improves UX
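
To illustrate the point about normalized embeddings, here is a small sketch with made-up 2-D vectors. The helper names `l2_normalize` and `dot` are hypothetical; real embeddings have hundreds of dimensions, but the arithmetic is the same.

```python
import math


def l2_normalize(vec):
    """Scale vec to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


short = [3.0, 4.0]
long_ = [30.0, 40.0]      # same direction as short, 10x the magnitude
orthogonal = [4.0, -3.0]  # perpendicular to short

# Raw dot products depend on magnitude, so scores are hard to compare.
raw_same = dot(short, long_)  # 250.0
raw_self = dot(short, short)  # 25.0

# After normalization the dot product is the cosine similarity:
# same-direction pairs score about 1.0, perpendicular pairs about 0.0.
unit_same = dot(l2_normalize(short), l2_normalize(long_))
unit_orth = dot(l2_normalize(short), l2_normalize(orthogonal))
```

Sentence-Transformers can do this step for you via `normalize_embeddings=True` on `encode`; whether the project relies on that flag or normalizes manually is not stated in the post.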

🔹 What's Next?

✅ Multi-page crawling
✅ Better DOM coverage (tables, figures, captions)
✅ Server caching of embeddings
✅ Plug-in vector DB support (Pinecone, Weaviate, etc.)

🎥 I also recorded a full demo video. Happy to share or open source it soon!

LinkedIn: https://www.linkedin.com/in/thiyagu26v/
Project repository: https://github.com/thiyagu26v/website-content-django

Other links:
Portfolio: https://thiyagu26v.github.io/myreactportfolio/

Linktree: https://linktr.ee/thiyagu26v

GitHub: https://github.com/thiyagu26v

Forem: https://forem.com/thiyagu26v

Medium: https://medium.com/@thiyagu26v

X: https://x.com/thiyagu26v

Instagram: https://www.instagram.com/thiyagu26v

DEV: https://dev.to/thiyagu26v

Stack Overflow: https://stackoverflow.com/users/31647359/thiyagarajan-varadharajan

Facebook: https://www.facebook.com/thiyagu26v
