Wunner
Wunner is a simple terminal-based web search engine I built to understand how a search engine works.
Link to Code
Anishka0107/Wunner
A toy search engine
Wunner
A toy search engine that searches the web inside your terminal :p
Features
- Implemented in C++14.
- Crawls webpages progressively starting from seed URL(s).
- Parses the documents and the query, trying to generate more appropriate results.
- Builds an index (hash map) for the parsed documents.
- The crawled documents and index are refreshed periodically.
- Autocompletes queries using a trie built from the most recently asked queries.
- Maintains two threads so that the index can be refreshed while queries are being served.
- Ranks results by the harmonic mean of their PageRank rank (the importance of the webpage) and their Okapi BM25 rank (relevance to the query).
- Provides query suggestions (only when the input query does not generate any results) based on lists of commonly misspelled and correct words. Suggestions are ranked using an n-gram algorithm and a dynamic-programming edit distance that compares two strings.
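To give a feel for the edit-distance DP mentioned in the last feature, here is a minimal C++14 sketch of the classic Levenshtein distance. It is not taken from Wunner's source: the function name `edit_distance` is hypothetical, and it assumes unit cost for insertions, deletions and substitutions.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Levenshtein edit distance between two words, computed with the standard
// dynamic-programming recurrence. Only two rows of the DP table are kept.
// `edit_distance` is an illustrative name, not Wunner's actual API.
std::size_t edit_distance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);

    // Turning the empty prefix of `a` into b[0..j) takes j insertions.
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;

    for (std::size_t i = 1; i <= a.size(); ++i) {
        // Turning a[0..i) into the empty string takes i deletions.
        curr[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t sub_cost = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            std::size_t del_cost = prev[j] + 1;
            std::size_t ins_cost = curr[j - 1] + 1;
            curr[j] = std::min({sub_cost, del_cost, ins_cost});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}
```

A suggestion step could then compute this distance between the mistyped query word and each candidate word, and prefer the candidates with the smallest distance.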
Steps to Run
Command to run: wunner_search
(make sure your…
How I built it
It is written entirely in C++14. It crawls webpages progressively, starting from one or more seed URLs, then pre-processes the documents and builds an index over them. The crawled documents and the index are refreshed periodically. Queries are autocompleted using a trie built from the most recently asked queries. Results are ranked by the harmonic mean of their PageRank rank (the importance of a webpage) and their Okapi BM25 rank (relevance to the query). When a query does not generate any results, it also provides query suggestions based on lists of commonly misspelled and correct words, ranked using an n-gram algorithm and edit distance.
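The ranking step can be illustrated with a short sketch. This is not Wunner's actual code: the `ScoredPage` struct, its field names and the `rank_pages` function are hypothetical, and it assumes both scores are non-negative and already on comparable scales.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical per-document scores; the field names are illustrative only.
struct ScoredPage {
    std::string url;
    double pagerank;  // query-independent importance, assumed >= 0
    double bm25;      // query-dependent relevance, assumed >= 0
};

// Harmonic mean of two non-negative scores. It is 0 when either score is 0,
// so a page must do reasonably well on both measures to rank highly.
double harmonic_mean(double x, double y) {
    if (x <= 0.0 || y <= 0.0) return 0.0;
    return 2.0 * x * y / (x + y);
}

// Sorts pages in place so that the highest combined score comes first.
void rank_pages(std::vector<ScoredPage>& pages) {
    std::stable_sort(pages.begin(), pages.end(),
                     [](const ScoredPage& a, const ScoredPage& b) {
                         return harmonic_mean(a.pagerank, a.bm25) >
                                harmonic_mean(b.pagerank, b.bm25);
                     });
}
```

The harmonic mean is dominated by the smaller of the two values, so a page that is important but barely matches the query (or the other way round) is pushed below a page that scores moderately on both.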
Additional Thoughts / Feelings / Stories
It was fun developing this project as it was built from scratch and allowed me to learn about different data structures and use them practically.
Final Note: Stay healthy, stay safe!