DEV Community

Anishka


Wunner

Wunner is a simple terminal-based web search engine I built to understand the basics of how a search engine works.

Link to Code

Anishka0107 / Wunner

A toy search engine

Wunner

A toy search engine that searches the web inside your terminal :p

Features

  • Implemented in C++14.
  • Crawls webpages progressively starting from seed URL(s).
  • Parses the documents and the query, trying to generate more appropriate results.
  • Builds an index (hash map) for the parsed documents.
  • The crawled documents and index are refreshed periodically.
  • Autocompletes queries using a trie, based on the most recently asked queries.
  • Maintains two threads, so the index can be refreshed while queries are being served.
  • Returns the most relevant results, ordered by the harmonic mean of the ranks given by PageRank (the importance of a webpage) and Okapi BM25 (its relevance to the query).
  • Provides query suggestions (only when the input query does not generate any results), based on common incorrect and correct words. Ranks them using an n-gram algorithm and a dynamic-programming edit distance that compares two strings.
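
To make the edit-distance part of the last feature concrete, here is a minimal sketch of the classic Levenshtein DP in C++14. It is an illustration and not Wunner's actual code: the function name `edit_distance` and the choice of unit cost for every insertion, deletion, and substitution are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not taken from the Wunner sources).
// Levenshtein distance: the minimum number of single-character insertions,
// deletions, and substitutions needed to turn `a` into `b`.
std::size_t edit_distance(const std::string& a, const std::string& b) {
    // Only two rows of the DP table are kept at a time.
    // prev[j] = distance between the first i-1 chars of a and the first j chars of b.
    std::vector<std::size_t> prev(b.size() + 1);
    std::vector<std::size_t> curr(b.size() + 1);

    // Turning an empty prefix of a into the first j chars of b takes j insertions.
    for (std::size_t j = 0; j <= b.size(); ++j) {
        prev[j] = j;
    }

    for (std::size_t i = 1; i <= a.size(); ++i) {
        // Turning the first i chars of a into an empty string takes i deletions.
        curr[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            curr[j] = std::min({prev[j] + 1,          // delete a[i-1]
                                curr[j - 1] + 1,      // insert b[j-1]
                                prev[j - 1] + cost}); // match or substitute
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}
```

A suggestion engine could call `edit_distance(query_word, candidate)` for each candidate word and prefer the candidates with the smallest distance. How Wunner combines this with its n-gram score is not described here.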

Steps to Run

Command to run : wunner_search (make sure your…

How I built it

Wunner is written entirely in C++14. It crawls webpages progressively, starting from one or more seed URLs, then pre-processes the documents and builds an index over the parsed documents. The crawled documents and the index are refreshed periodically. It autocompletes queries using a trie, based on the most recently asked queries. Results are ordered by the harmonic mean of the ranks given by PageRank (the importance of a webpage) and Okapi BM25 (its relevance to the query). When the input query does not generate any results, it also provides query suggestions based on common incorrect and correct words, ranked using an n-gram algorithm and edit distance.
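
As a rough sketch of the ranking step, the snippet below orders pages by the harmonic mean of two values. It rests on assumptions: the struct `ScoredPage`, the functions `harmonic_mean` and `rank_pages`, and the idea that both inputs are positive scores where higher is better are all hypothetical. Whether Wunner combines raw scores or rank positions is not stated in this post.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record; the field names are illustrative, not from Wunner.
struct ScoredPage {
    std::string url;
    double pagerank; // query-independent importance, assumed > 0
    double bm25;     // query-dependent relevance, assumed > 0
};

// Harmonic mean of two values. It stays low unless both inputs are high,
// so a page needs to be both important and relevant to score well.
double harmonic_mean(double x, double y) {
    if (x <= 0.0 || y <= 0.0) {
        return 0.0; // assumption: non-positive scores count as "no signal"
    }
    return 2.0 * x * y / (x + y);
}

// Returns the pages sorted from highest to lowest combined score.
// stable_sort keeps the input order for pages with equal scores.
std::vector<ScoredPage> rank_pages(std::vector<ScoredPage> pages) {
    std::stable_sort(pages.begin(), pages.end(),
                     [](const ScoredPage& lhs, const ScoredPage& rhs) {
                         return harmonic_mean(lhs.pagerank, lhs.bm25) >
                                harmonic_mean(rhs.pagerank, rhs.bm25);
                     });
    return pages;
}
```

The harmonic mean penalizes imbalance more than an arithmetic mean would: a page with scores 0.9 and 0.1 ends up below a page with 0.5 and 0.5, even though both pairs sum to the same value.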

Additional Thoughts / Feelings / Stories

It was fun developing this project as it was built from scratch and allowed me to learn about different data structures and use them practically.

Final Note: Stay healthy, stay safe!
