
Anishka

Wunner

Wunner is a simple terminal-based web search engine I built to understand the basics of how a search engine works.

Link to Code

GitHub: Anishka0107 / Wunner

A toy search engine

Wunner

A toy search engine that searches the web inside your terminal :p

Features

  • Implemented in C++14.
  • Crawls webpages progressively starting from seed URL(s).
  • Parses the documents and the query, trying to generate more appropriate results.
  • Builds an index (hash map) for the parsed documents.
  • The crawled documents and index are refreshed periodically.
  • Autocompletes queries using a trie, based on the most recently asked queries.
  • Maintains two threads, so the index can be refreshed while queries are being answered.
  • Returns the most relevant results, ranked by the harmonic mean of the PageRank rank (importance of the webpage) and the Okapi BM25 rank (relevance to the query).
  • Provides query suggestions (only when the input query does not generate any results), based on common incorrect and correct words. Ranks them using an n-gram algorithm and an edit-distance DP that compares two strings.
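To make the edit-distance part of the last feature concrete, here is a minimal sketch of the classic Levenshtein DP in C++14. It is not taken from the Wunner source: the function names `edit_distance` and `closest_word` are hypothetical, and the n-gram step mentioned above is left out, so treat it as an illustration of the general technique rather than the project's actual code.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Levenshtein edit distance between two strings, computed with dynamic
// programming. Only two rows of the DP table are kept at any time.
// Hypothetical helper for illustration; not the Wunner implementation.
std::size_t edit_distance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    // Turning an empty prefix of `a` into b[0..j) takes j insertions.
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;

    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;  // turning a[0..i) into an empty string takes i deletions
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t substitution = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            std::size_t deletion = prev[j] + 1;
            std::size_t insertion = curr[j - 1] + 1;
            curr[j] = std::min({substitution, deletion, insertion});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}

// Picks the dictionary word closest to the query (the first one wins on ties).
// Returns an empty string when the dictionary is empty.
std::string closest_word(const std::string& query,
                         const std::vector<std::string>& dictionary) {
    std::string best;
    std::size_t best_distance = 0;
    for (const std::string& word : dictionary) {
        std::size_t d = edit_distance(query, word);
        if (best.empty() || d < best_distance) {
            best = word;
            best_distance = d;
        }
    }
    return best;
}
```

For example, `edit_distance("kitten", "sitting")` is 3, and `closest_word("serach", {"search", "engine", "crawler"})` picks `"search"`. A linear scan like this is fine for a small word list; a larger dictionary would probably want the n-gram step first to narrow down the candidates.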

Steps to Run

Command to run : wunner_search (make sure your…

How I built it

It is written entirely in C++14. It crawls webpages progressively, starting from one or more seed URLs, then pre-processes the documents and builds an index for the parsed documents. The crawled documents and the index are refreshed periodically.

Queries are autocompleted using a trie, based on the most recently asked queries. Results are ranked by the harmonic mean of the PageRank rank (to capture the importance of a webpage) and the Okapi BM25 rank (to capture relevance to the query). When the input query does not generate any results, it also provides query suggestions based on common incorrect and correct words, ranked using an n-gram algorithm and edit distance.
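As a rough illustration of the ranking step, the sketch below combines two per-page scores with a harmonic mean and sorts by the result. It rests on assumptions that are not from the Wunner source: both values are treated as scores normalized to [0, 1] where higher is better, and the names `ScoredPage`, `harmonic_mean` and `rank_pages` are hypothetical. The PageRank and BM25 computations themselves are not shown.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One candidate result. Both scores are ASSUMED to be precomputed and
// normalized to [0, 1], higher meaning better (an assumption for this sketch).
struct ScoredPage {
    std::string url;
    double pagerank;  // query-independent importance of the page
    double bm25;      // query-dependent relevance of the page
};

// Harmonic mean of two non-negative values. It stays low unless both inputs
// are reasonably high, so a page must be both important and relevant.
double harmonic_mean(double x, double y) {
    if (x <= 0.0 || y <= 0.0) return 0.0;  // also avoids dividing by zero
    return 2.0 * x * y / (x + y);
}

// Returns the pages sorted by combined score, best first.
// stable_sort keeps the input order for pages with equal scores.
std::vector<ScoredPage> rank_pages(std::vector<ScoredPage> pages) {
    std::stable_sort(pages.begin(), pages.end(),
                     [](const ScoredPage& lhs, const ScoredPage& rhs) {
                         return harmonic_mean(lhs.pagerank, lhs.bm25) >
                                harmonic_mean(rhs.pagerank, rhs.bm25);
                     });
    return pages;
}
```

With pages scored (0.9, 0.1), (0.5, 0.5) and (0.2, 0.3), the combined scores are 0.18, 0.5 and 0.24, so the balanced page comes first even though another page has a much higher PageRank. That is the usual reason to prefer a harmonic mean over an arithmetic one here: it penalizes a page that does well on only one of the two measures.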

Additional Thoughts / Feelings / Stories

It was fun developing this project as it was built from scratch and allowed me to learn about different data structures and use them practically.

Final Note: Stay healthy, stay safe!
