Pacharapol Withayasakpunt

Originally published at polv.cc

 

Building a search engine that supports advanced search syntax

  • It should support AND, OR, NOT, and perhaps brackets ().
  • Another part of it, though, is about optimization and fuzzy search:
    • Fast, even for a large body of text.
    • Handles pluralization.
    • Forgiving of minor typos.
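To make the last two requirements concrete, here is a minimal TypeScript sketch, an assumption rather than code from any library mentioned in this post. A deliberately crude, hypothetical `normalize` helper folds a few English plural endings, and the classic Levenshtein edit distance tolerates typos. Real engines use proper stemmers and avoid comparing the query against every word in the vocabulary, which is what keeps them fast on a large body of text.

```typescript
// Crude plural folding (hypothetical helper, NOT a real stemmer).
function normalize(word: string): string {
  const w = word.toLowerCase();
  if (w.length > 4 && w.endsWith("ies")) return w.slice(0, -3) + "y"; // queries -> query
  if (w.length > 3 && /(s|x|z|ch|sh)es$/.test(w)) return w.slice(0, -2); // indexes -> index
  if (w.length > 3 && w.endsWith("s") && !w.endsWith("ss")) return w.slice(0, -1); // cats -> cat
  return w;
}

// Classic Levenshtein edit distance: the minimum number of single-character
// insertions, deletions and substitutions that turn `a` into `b`.
function levenshtein(a: string, b: string): number {
  // At the start of iteration i, prev[j] is the distance between
  // the first i-1 chars of `a` and the first j chars of `b`.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // delete a[i-1]
        curr[j - 1] + 1, // insert b[j-1]
        prev[j - 1] + cost, // substitute (or keep) a[i-1]
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// A word matches if, after plural folding, it is within `maxTypos` edits of the query.
function fuzzyMatch(query: string, word: string, maxTypos = 1): boolean {
  return levenshtein(normalize(query), normalize(word)) <= maxTypos;
}
```

With the default of one typo, `fuzzyMatch("seach", "search")` and `fuzzyMatch("Indexes", "index")` both return true. Plain Levenshtein counts a swapped pair such as "serach" as two edits; the Damerau-Levenshtein variant counts it as one.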

Advanced search syntaxes

I have thought about this a lot in the past.

The easiest way is to start from lunr.js's query syntax and extend it:

  • Default connector is AND.
  • To make an OR, use ?expression.
  • Search is normally case-insensitive, i.e. a and A mean the same thing.
  • +expression means an exact match, which is also case-sensitive.
  • -expression means negation.
  • Not only :, but also > and < are used for comparisons. For example, +foo:bar, count>1.
  • Date comparison is enabled.
    • Special keyword: NOW.
    • +1h means 1 hour from now; -1h means 1 hour ago.
    • Available units are y (year), M (month), w (week), d (day), h (hour), m (minute).
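As a rough sketch of how the syntax above could be tokenized, here is a small TypeScript parser. The names `Term`, `parseQuery` and `resolveDate` are hypothetical and are not qsearch's actual API. It assumes whitespace-separated terms, double quotes for phrases, field names that start with a letter, and no brackets.

```typescript
// Hypothetical types for the lunr-inspired syntax (not qsearch's real API).
type Prefix = "and" | "or" | "not" | "exact";
type Op = ":" | ">" | "<";

interface Term {
  prefix: Prefix; // "and" is the default connector
  field?: string; // e.g. "count" in count>1
  op?: Op;
  value: string;
}

const PREFIXES: Record<string, Prefix> = { "?": "or", "-": "not", "+": "exact" };

function parseQuery(query: string): Term[] {
  // Split on whitespace, but keep "double-quoted phrases" inside one token.
  const tokens = query.match(/(?:[^\s"]+|"[^"]*")+/g) ?? [];
  const unquote = (s: string): string => s.replace(/^"(.*)"$/, "$1");
  const terms: Term[] = [];
  for (const token of tokens) {
    const prefix: Prefix = PREFIXES[token[0]] ?? "and";
    const rest = prefix === "and" ? token : token.slice(1);
    if (rest === "") continue; // a lone "+", "-" or "?"
    // field, then one of : > <, then the value
    const m = /^([A-Za-z_][\w.]*)([:<>])(.+)$/.exec(rest);
    if (m) {
      terms.push({ prefix, field: m[1], op: m[2] as Op, value: unquote(m[3]) });
    } else {
      terms.push({ prefix, value: unquote(rest) });
    }
  }
  return terms;
}

// Fixed-length units in milliseconds; y and M are handled by the calendar.
const UNIT_MS: Record<string, number> = {
  w: 7 * 24 * 3600 * 1000,
  d: 24 * 3600 * 1000,
  h: 3600 * 1000,
  m: 60 * 1000,
};

// Resolve "NOW", "+1h", "-2d", ... relative to `now`; null if not a relative date.
function resolveDate(value: string, now: Date = new Date()): Date | null {
  if (value === "NOW") return new Date(now.getTime());
  const m = /^([+-])(\d+)([yMwdhm])$/.exec(value);
  if (!m) return null;
  const n = (m[1] === "-" ? -1 : 1) * Number(m[2]);
  const d = new Date(now.getTime());
  if (m[3] === "y") d.setFullYear(d.getFullYear() + n);
  else if (m[3] === "M") d.setMonth(d.getMonth() + n); // months vary in length
  else d.setTime(d.getTime() + n * UNIT_MS[m[3]]);
  return d;
}
```

For example, `date>-1h` parses to field `date`, operator `>` and value `-1h`, which `resolveDate` turns into a timestamp one hour before now. Years and months go through `setFullYear`/`setMonth` because they vary in length; the other units are fixed durations.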

You can see my experiment and playground here.

patarapolw / qsearch (GitHub)

Search a database with a string. Designed for end-users.

Full-text search and fuzzy search

I made a list here.

  • Algolia
  • Elasticsearch, Lucene, Solr
  • Google custom search

How does it compare to search engines with web crawlers?

  • Yahoo
  • Bing
  • DuckDuckGo
  • Yandex
  • Baidu

What about pure JavaScript implementations?

  • js-search
  • lunr, elasticlunr

What about the built-in features of RDBMS and NoSQL databases?

  • SQLite FTS4, FTS5
  • PostgreSQL plugin
  • MongoDB

Or other implementations, like Python's Whoosh?

Implementing both together

It is easier if you use the built-in features of an RDBMS or a NoSQL database. PostgreSQL, MySQL and MongoDB (but not SQLite, where full-text search requires a separate FTS virtual table) allow you to create a full-text index on a text column.
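Conceptually, a full-text index is an inverted index: a map from each term to the set of documents that contain it, so a query only touches the postings for its own terms instead of scanning every row. The minimal in-memory TypeScript sketch below illustrates the idea. The `InvertedIndex` class and its crude tokenizer are hypothetical; the database implementations add stemming, stop words, ranking and on-disk storage on top of this.

```typescript
// Hypothetical in-memory inverted index: term -> ids of documents containing it.
class InvertedIndex {
  private postings = new Map<string, Set<number>>();

  // Crude tokenizer: lowercase ASCII words and digits only.
  private static tokenize(text: string): string[] {
    return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  }

  add(id: number, text: string): void {
    for (const term of InvertedIndex.tokenize(text)) {
      let ids = this.postings.get(term);
      if (!ids) {
        ids = new Set<number>();
        this.postings.set(term, ids);
      }
      ids.add(id);
    }
  }

  // AND semantics: sorted ids of documents that contain every query term.
  search(query: string): number[] {
    let result: Set<number> | null = null;
    for (const term of InvertedIndex.tokenize(query)) {
      const ids = this.postings.get(term);
      if (!ids) return []; // a term nobody contains means no document matches
      // Intersect the running result with this term's postings.
      result = result === null ? new Set(ids) : new Set(Array.from(result).filter((id) => ids.has(id)));
    }
    return result ? Array.from(result).sort((a, b) => a - b) : [];
  }
}
```

Intersecting postings is what keeps lookups fast: the cost depends on how many documents contain the query terms, not on the total size of the text.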

PostgreSQL also has PGroonga, an extension that not only supports more languages than the native tsvector, but can also index other types, including JSONB.

Next comes the algorithm for the search syntax. I implemented it for PostgreSQL in another project.
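The PostgreSQL implementation from that project is not reproduced here. As a purely illustrative sketch, the hypothetical `toWhere` function below turns already-parsed terms into a parameterized `WHERE` clause. Assumptions: a single table with the made-up whitelisted columns `title`, `body`, `count` and `created_at`; `body` as the default field; `ILIKE` for case-insensitive matching and `=` for exact matching; and all `?` terms forming one OR group that is AND-ed with the rest. Only whitelisted identifiers are spliced into the SQL; every user-supplied value is bound as a parameter.

```typescript
type Prefix = "and" | "or" | "not" | "exact";

interface Term {
  prefix: Prefix;
  field?: string;
  op?: ":" | ">" | "<";
  value: string;
}

// Hypothetical schema: only these column names are ever spliced into SQL.
const COLUMNS = new Set(["title", "body", "count", "created_at"]);
const DEFAULT_COLUMN = "body";

// Escape LIKE wildcards; backslash is PostgreSQL's default LIKE escape character.
const escapeLike = (s: string): string => s.replace(/[\\%_]/g, "\\$&");

function toWhere(terms: Term[]): { sql: string; params: string[] } {
  const params: string[] = [];
  // Store a value and return its positional placeholder ($1, $2, ...).
  const bind = (v: string): string => {
    params.push(v);
    return `$${params.length}`;
  };

  const condition = (t: Term): string => {
    const col = t.field ?? DEFAULT_COLUMN;
    if (!COLUMNS.has(col)) throw new Error(`Unknown field: ${col}`);
    if (t.op === ">" || t.op === "<") return `${col} ${t.op} ${bind(t.value)}`;
    if (t.prefix === "exact") return `${col} = ${bind(t.value)}`; // case-sensitive
    return `${col} ILIKE ${bind(`%${escapeLike(t.value)}%`)}`; // case-insensitive substring
  };

  const required: string[] = [];
  const anyOf: string[] = [];
  for (const t of terms) {
    const c = condition(t);
    if (t.prefix === "not") required.push(`NOT (${c})`);
    else if (t.prefix === "or") anyOf.push(c);
    else required.push(c); // "and" (the default) and "exact"
  }
  // Assumption: all ?-terms form one OR group, AND-ed with the other conditions.
  if (anyOf.length > 0) required.push(`(${anyOf.join(" OR ")})`);
  return { sql: required.length > 0 ? required.join(" AND ") : "TRUE", params };
}
```

With node-postgres, `sql` could be appended to a statement such as `SELECT * FROM notes WHERE ...` and `params` passed as the values argument of `pool.query`; `notes` is again a made-up table name. Relative dates such as `-1h` would have to be resolved to timestamps before binding, and a production version would more likely use tsvector or PGroonga operators than `ILIKE`, so that the full-text index can be used.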
