Julian.io

How I implemented a search engine for my static blog generator

Lately, I've been working on my static site generator, and one of the must-haves was a built-in search system. It should be pre-configured and ready to use in every newly initialized project.
It should also be based on plain JavaScript for simplicity.

Two fundamental problems here:

  • Source data
  • Solid and performant indexing solution

As for the source data, I had two choices: read the blog posts from the already generated HTML files, or prepare JSON data with all blog posts at build time. I chose the second one because collecting the data from generated HTML seemed clunky and prone to edge cases. The build already generates HTML files from Markdown source files, so I had this step anyway.
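The article does not show how Harold writes this JSON, so the following is only a minimal sketch of the idea. The function name `buildPostsData` and the shape of the input posts (`html`, optional `excerpt` and `tags`) are assumptions; the output field names match the ones indexed later in this post.

```javascript
// Hypothetical build step: collect already rendered posts into the array
// that gets serialized to postsData.json. Not Harold's actual code.
function buildPostsData(posts) {
  return posts.map((post) => ({
    fileName: post.fileName,
    title: post.title,
    excerpt: post.excerpt || '',
    tags: Array.isArray(post.tags) ? post.tags : [],
    body: post.html, // rendered HTML; tags are stripped later, before indexing
  }));
}

const postsJson = JSON.stringify(
  buildPostsData([
    {
      fileName: 'hello-world',
      title: 'Hello world',
      tags: ['intro'],
      html: '<p>First post</p>',
    },
  ])
);
// A real build would now write postsJson to disk, e.g. with fs.writeFileSync.
```

Doing this during the build means the search data comes from the same parsed posts as the HTML pages, so the two cannot drift apart.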

As for the indexing solution, I chose Lunr. They describe themselves as "A bit like Solr, but much smaller and not as bright." I love that headline.
Lunr is a JavaScript library with a clean and simple API. It builds an index from the source data you provide and then lets you search that index with a search phrase.

I won't focus on Lunr much here because they have excellent documentation, and there are many articles about it on the Internet. Instead, I want to show the particular implementation used in HaroldJS.

After initializing a new project with my static site generator, you'll get the postsData.json file and ready-made JavaScript logic for the search engine.

Lunr initialization looks like this:

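fetchPostsJsonData().then((data) => {
  searchIndex = lunr(function () {
    // Each search result refers back to its post by file name
    this.ref('fileName');
    // Fields covered by the search
    this.field('title');
    this.field('body');
    this.field('excerpt');
    this.field('tags');
    data.forEach((doc) => {
      this.add(
        // Copy the post so the fetched data itself is not mutated
        Object.assign({}, doc, {
          tags: doc.tags.join(' '), // tags array -> one searchable string
          body: doc.body.replace(/(<([^>]+)>)/gi, ''), // strip HTML tags
        })
      );
    });
  });
});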

We fetch the data from the already generated postsData.json file and then create the Lunr search index. We declare every field from the JSON file that the index should cover, and then iterate through the data, adding each post to the index. I've also added two small improvements to get better search results: the tags array is joined into a single string, and HTML tags are stripped from the body.
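To see what these two cleanup steps do to a single post, here is a small sketch that applies the same join and the same regex outside of Lunr. The helper name `normalizeForIndex` and the sample post are made up for illustration.

```javascript
// Same tag join and HTML stripping as in the index setup above,
// extracted into a hypothetical helper so it can be tried in isolation.
function normalizeForIndex(doc) {
  return Object.assign({}, doc, {
    tags: doc.tags.join(' '),
    body: doc.body.replace(/(<([^>]+)>)/gi, ''),
  });
}

const sample = {
  fileName: 'lunr-search',
  title: 'Search with Lunr',
  tags: ['javascript', 'search'],
  body: '<h1>Search</h1><p>Lunr builds an <em>index</em>.</p>',
};

const normalized = normalizeForIndex(sample);
console.log(normalized.tags); // javascript search
console.log(normalized.body); // SearchLunr builds an index.
```

Note that the regex removes tags without leaving a space, so text from adjacent elements can run together ("SearchLunr" above). If that hurts results for your content, replacing tags with a single space instead of an empty string is one possible tweak.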

Then we have a ready-to-use search function, which takes the search phrase as an argument and runs a search on the index. It looks like this:

const searchResults = searchIndex.search(phrase);
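Lunr does not return the posts themselves. Each result carries a `ref` (the value of the field set with `this.ref`, here `fileName`) and a `score`, so the posts have to be looked up again before rendering. Below is a sketch of that lookup; `resolveResults` and the sample data are hypothetical, and the results array is written by hand so the snippet runs without Lunr.

```javascript
// Map Lunr results back to the posts loaded from postsData.json.
// Hypothetical helper, for illustration only.
function resolveResults(searchResults, posts) {
  const byFileName = new Map(posts.map((post) => [post.fileName, post]));
  return searchResults
    .map((result) => byFileName.get(result.ref))
    .filter(Boolean); // skip refs that no longer match a post
}

const posts = [
  { fileName: 'lunr-search', title: 'Search with Lunr' },
  { fileName: 'hello-world', title: 'Hello world' },
];

// Hand-written stand-in for what searchIndex.search(phrase) returns.
const fakeResults = [
  { ref: 'lunr-search', score: 1.2 },
  { ref: 'deleted-post', score: 0.3 },
];

const found = resolveResults(fakeResults, posts);
console.log(found.map((post) => post.title));
```

Lunr already sorts results by score, and the lookup keeps that order.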

Thanks to Lunr and my custom logic for generating the postsData.json file, I have a search engine in every new project I create. It can be a blog, portfolio website, or documentation site, all packed with a clean, responsive design and full-screen search interactions.

Of course, it needs some improvements at this stage, but the main idea was to have something which works and won't take a lot of time to implement.

It also keeps working when I add or remove blog posts. The index gets rebuilt, so it always reflects the current state of the static blog.

I encourage you to play with Harold. Start with:

npm init harold-app my-blog

Also, please check out the docs: www.haroldjs.com
And GitHub: create-harold-app
Quick walkthrough video: youtu.be/DG0T1Fg0mq0
Read more: https://www.julian.io/articles/blog-for-free.html
