DEV Community

Anishka

Wunner

Wunner is a simple terminal-based web search engine I built to understand the basics of how a search engine works.

Link to Code

Anishka0107 / Wunner

A toy search engine that searches the web inside your terminal :p

Features

  • Implemented in C++14.
  • Crawls webpages progressively starting from seed URL(s).
  • Parses the documents and the query, trying to generate more appropriate results.
  • Builds an index (hash map) for the parsed documents.
  • The crawled documents and index are refreshed periodically.
  • Autocompletes queries using a trie, based on the most recently asked queries.
  • Maintains two threads, so that the index can be refreshed while queries are being answered.
  • Ranks results by the harmonic mean of their PageRank (importance of the webpage) and Okapi BM25 (relevance to the query) ranks, most relevant first.
  • Provides query suggestions (only when the input query does not generate any results), based on common incorrect and correct words. Ranks them using an n-gram algorithm and an edit-distance DP that compares two strings.
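
The edit-distance DP mentioned in the last feature could look roughly like the sketch below. It is not taken from the Wunner source: it assumes the classic Levenshtein distance (insertions, deletions, and substitutions each cost 1), the names `edit_distance` and `rank_suggestions` are hypothetical, and the n-gram step is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Levenshtein distance: the minimum number of single-character insertions,
// deletions, and substitutions needed to turn `a` into `b`.
// Assumption: unit cost for every operation.
std::size_t edit_distance(const std::string& a, const std::string& b) {
    // Only two rows of the DP table are kept at a time.
    // prev[j] = distance between the first i-1 chars of a and the first j chars of b.
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;

    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;  // deleting the first i chars of a yields the empty string
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            curr[j] = std::min({prev[j] + 1,          // delete a[i-1]
                                curr[j - 1] + 1,      // insert b[j-1]
                                prev[j - 1] + cost}); // substitute or match
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}

// Hypothetical helper: orders candidate words by their distance to the query,
// closest first. Ties keep their original relative order.
std::vector<std::string> rank_suggestions(const std::string& query,
                                          std::vector<std::string> candidates) {
    std::stable_sort(candidates.begin(), candidates.end(),
                     [&query](const std::string& x, const std::string& y) {
                         return edit_distance(query, x) < edit_distance(query, y);
                     });
    return candidates;
}
```

With this sketch, `edit_distance("kitten", "sitting")` evaluates to 3, and `rank_suggestions("serch", {"research", "search", "perch"})` places "search" first and "research" last.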

Steps to Run

Command to run : wunner_search (make sure your…

How I built it

Wunner is written purely in C++14. It crawls webpages progressively, starting from one or more seed URLs, then pre-processes the documents and builds an index over the parsed documents. The crawled documents and the index are refreshed periodically. It autocompletes queries using a trie, based on the most recently asked queries. Results are ranked by the harmonic mean of their PageRank (which captures the importance of a webpage) and Okapi BM25 (which captures relevance to the query) ranks, most relevant first. When a query does not generate any results, it also suggests alternative queries based on common incorrect and correct words, ranked using an n-gram algorithm and edit distance.
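
To make the trie-based autocomplete concrete, here is a minimal sketch of how it might work. It is not Wunner's actual code: the class name `QueryTrie`, its `insert` and `complete` methods, and the use of an insertion counter as a stand-in for "most recently asked" are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of a query trie; not taken from the Wunner repository.
class QueryTrie {
    struct Node {
        std::map<char, std::unique_ptr<Node>> children;
        unsigned long last_seen = 0;  // 0 means no stored query ends at this node
    };
    using Hit = std::pair<unsigned long, std::string>;

    Node root_;
    unsigned long clock_ = 0;  // bumped on every insert; stands in for a timestamp

    // Depth-first walk that records every stored query at or below `node`.
    static void collect(const Node* node, std::string& text, std::vector<Hit>& out) {
        if (node->last_seen != 0) out.emplace_back(node->last_seen, text);
        for (const auto& edge : node->children) {
            text.push_back(edge.first);
            collect(edge.second.get(), text, out);
            text.pop_back();
        }
    }

public:
    // Stores a query, or refreshes its recency if it was already stored.
    void insert(const std::string& query) {
        Node* node = &root_;
        for (char c : query) {
            std::unique_ptr<Node>& child = node->children[c];
            if (!child) child = std::make_unique<Node>();
            node = child.get();
        }
        node->last_seen = ++clock_;
    }

    // Returns up to `limit` stored queries starting with `prefix`,
    // most recently inserted first.
    std::vector<std::string> complete(const std::string& prefix, std::size_t limit) const {
        const Node* node = &root_;
        for (char c : prefix) {
            auto it = node->children.find(c);
            if (it == node->children.end()) return {};
            node = it->second.get();
        }
        std::vector<Hit> hits;
        std::string text = prefix;
        collect(node, text, hits);
        std::sort(hits.begin(), hits.end(),
                  [](const Hit& a, const Hit& b) { return a.first > b.first; });
        std::vector<std::string> result;
        for (std::size_t i = 0; i < hits.size() && i < limit; ++i)
            result.push_back(hits[i].second);
        return result;
    }
};
```

For example, after inserting "search engine", "sea turtle", "seed url", and then "search engine" again, `complete("sea", 5)` returns "search engine" followed by "sea turtle". Using `std::map` for the children keeps the sketch short; a fixed-size array per node would trade memory for faster lookups.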

Additional Thoughts / Feelings / Stories

It was fun developing this project as it was built from scratch and allowed me to learn about different data structures and use them practically.

Final Note: Stay healthy, stay safe!
