So I had this painful idea: “Let’s build a search engine from scratch. How hard could it be?”
Turns out… it’s very hard.
I’m still deep in the process, but I thought I’d share the roadblocks, mistakes, and ongoing headaches of trying to piece together a crawler, parser, indexer, and frontend into something that kind of works like Google, but much worse.
The Architecture I Thought Made Sense
I wanted the project to be modular so each part could work independently and maybe even scale on its own. Here’s the rough breakdown:
Crawler (Python): scrapes web pages.
Parser (Go): extracts useful HTML/text.
Indexer (Go): builds the search index.
Search API (Python): handles queries.
Frontend (Vue.js): where users type queries.
Database (PostgreSQL): keeps it all together.
Sounds clean, right? Reality check incoming:
The Crawler (Python)
Expectation: “I’ll just use requests and BeautifulSoup. Easy.”
Reality:
Handling robots.txt is more complicated than I thought.
I ran into infinite loops (crawling the same site repeatedly).
Some pages block crawlers unless you spoof headers.
There’s also rate limiting; I once brought down my own Wi-Fi.
I realized that crawling is a deep pit of edge cases. Suddenly, I understood why companies spend millions just to maintain crawlers.
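Two of those edge cases can at least be tamed with the standard library alone. Here's a rough sketch (the robots.txt content, site, and bot name are all made up) of the checks that would have saved me from the infinite loops: respecting robots rules, plus URL normalization with a visited set:

```python
from urllib.parse import urldefrag, urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# hypothetical robots.txt for the site being crawled
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

def normalize(url):
    # drop the fragment and lowercase scheme/host so the same page
    # isn't queued twice under cosmetically different URLs
    url, _ = urldefrag(url)
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", parts.query, ""))

seen = set()

def should_crawl(url, agent="mybot"):
    key = normalize(url)
    if key in seen or not rp.can_fetch(agent, key):
        return False
    seen.add(key)
    return True
```

It doesn't cover spoofed headers or politeness delays, but a normalize-then-dedupe step before every fetch is what stops a crawler from visiting the same page forever.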
Parsing in Go
Why Go? Honestly, I wanted better performance and concurrency.
Struggles:
Parsing HTML in Go is not as simple as in Python.
The libraries feel basic, which is great for speed but not for ease of use.
Dealing with broken HTML is a nightmare because the web is messy.
Also, constantly switching between Python and Go is tiring.
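For contrast, this is the kind of leniency Python gives you for free: even the stdlib html.parser happily walks markup with no closing tags at all (a toy sketch, not my actual parser):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects text nodes, ignoring tags entirely."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

extractor = TextExtractor()
# deliberately broken HTML: nothing is ever closed
extractor.feed("<p>Unclosed paragraph <b>bold text <i>and no closing tags")
extractor.close()
text = " ".join(extractor.chunks)
```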
Indexing in Go
I thought this part would be fun. Instead, I faced:
Tokenization across different languages. Do I really want to support Japanese search right now? Nope.
Efficiently storing inverted indexes.
Designing queries that don’t lead to full-table scans in PostgreSQL.
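The core data structure is actually simpler than the headaches around it. A toy inverted index (naive ASCII-only tokenizer, invented document IDs) looks roughly like this:

```python
import re
from collections import defaultdict

def tokenize(text):
    # naive ASCII tokenizer; real engines need per-language rules (see: Japanese)
    return re.findall(r"[a-z0-9]+", text.lower())

index = defaultdict(dict)  # token -> {doc_id: term frequency}

def add_document(doc_id, text):
    for token in tokenize(text):
        index[token][doc_id] = index[token].get(doc_id, 0) + 1

def search(term):
    # the postings list: every doc_id containing the term
    return sorted(index.get(term.lower(), {}))

add_document(1, "Python search engines")
add_document(2, "Cooking with Python")
add_document(3, "Go concurrency patterns")
```

The hard parts are everything this skips: compressing the postings lists, persisting them, and merging updates without rebuilding from scratch.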
It turns out that writing a search engine means reinventing many wheels that Elasticsearch solved years ago. But hey, it’s a learning experience, right?
Search Query Handling
This is where I’m currently stuck.
How do I balance relevance scoring?
Should I use TF-IDF, BM25, or just return whatever matches?
Pagination seems straightforward until you realize that queries need to stay consistent across refreshes.
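For reference, BM25 itself is only a few lines. Here's a sketch over pre-tokenized documents (k1 and b are the commonly used defaults; the documents are invented), which at least beats "return whatever matches":

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Okapi BM25 over pre-tokenized docs; returns one score per doc."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue  # term appears in no document; contributes nothing
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Sorting by (score, doc_id) instead of score alone also gives a stable order, which is half of the pagination-consistency battle.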
On top of that, I chose to use FastAPI for the search endpoint because it’s fast and modern. The reality is:
Writing routes is fine, but managing async and sync code with my Go indexer is confusing.
Every small change means I have to restart everything just to check if my API still communicates with the index correctly.
The documentation makes it look simple, so when I encounter real-world problems, like CORS issues, JSON serialization quirks, or blocking DB calls, I just sit there staring at the screen.
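The pattern that eventually made the sync/async mix click for me is simply "push blocking calls onto a thread." Here it is sketched with plain asyncio and a fake indexer call (the function names are made up; Starlette, which FastAPI builds on, ships a similar run_in_threadpool helper):

```python
import asyncio
import time

def query_indexer(term):
    # stand-in for a blocking round-trip to the Go indexer or Postgres
    time.sleep(0.05)
    return [f"doc-for-{term}"]

async def search(term):
    # hand the blocking call to a worker thread so the event loop keeps serving
    return await asyncio.to_thread(query_indexer, term)

async def main():
    # two searches overlap instead of queuing behind each other
    return await asyncio.gather(search("python"), search("go"))

results = asyncio.run(main())
```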
Right now, search results are random at best. If you search for “Python,” you might find blog posts about cooking. I’m calling it creative search.
Frontend with Vue.js
Honestly, this has been the easiest part. Vue makes building the search bar and results page feel good. The struggle is all in the backend right now.
The frontend also exposes how rough the backend still is. Nothing is more disheartening than typing a query and watching your own project fail.
PostgreSQL Adventures
I love Postgres, but:
Deciding what should go in the database versus what should stay in-memory indexes has been tricky.
I’ve had to reset and rebuild my schema about ten times.
Full-text search in Postgres is decent, but combining it with my custom Go indexer is confusing.
What I’ve Learned So Far
Search is hard. Really hard.
Premature optimization kills momentum. I probably didn’t need Go for parsing and indexing yet.
The web is chaotic. Crawlers reveal just how broken many websites are.
It’s okay to stand on the shoulders of giants. Elasticsearch, Solr, and Meilisearch exist for a reason.
Still Struggling With…
Making indexing fast enough without consuming too much memory.
Relevance scoring that doesn’t feel random.
Keeping modules communicating with each other smoothly (Python to Go to Postgres to Vue).
Motivation, honestly. Sometimes I question if I should’ve just built a to-do app.
Closing Thoughts
Even though it’s frustrating, I’ve learned more about how search engines actually work in the past weeks than I ever did while just using them.
If you’re considering building one yourself, do it for the learning experience, not just for the final product. Or at least, don’t expect to outdo Google with just a weekend hackathon effort.