Peter Hallander

Posted on May 13

Building related-post recommendations for a Shopify blog — the algorithm, not the app

#shopify #ecommerce #javascript #liquid

Shopify gives you product recommendations out of the box. There's a whole /recommendations/products.json endpoint, a recommendations.products Liquid drop, and an entire ML pipeline behind it. For blog posts? Nothing. Zero. There is no /recommendations/articles, no recommendations.articles, no signal. If you run an editorial blog on Shopify — and a lot of stores do now, because content is how you get organic traffic without paying Meta — you're on your own.

What most themes do is one of three things:

A "Latest 3 posts" widget at the bottom of every article.
A hand-curated metafield where the merchant picks related posts manually.
An app that you install and treat as a black box.

I've shipped all three for clients. The first one is useless. The second one only works for the first month, then nobody updates the metafields. The third one works fine but most merchants want to understand what's actually happening. So this post is about how to roll your own — the algorithm, the rendering pattern, and the things I got wrong the first two times.

I'll be upfront: I made an app for this. I'll mention it once near the end. The article isn't about the app — it's about the engineering. If you'd rather build it yourself after reading this, fantastic; that's the point.

Why "latest posts" is the wrong default

A reader who lands on your article about "summer wedding shoes for outdoor ceremonies" is interested in summer wedding shoes for outdoor ceremonies. If your latest three posts are about pet grooming, holiday gift guides, and a sustainability report, you've just shown them three exit ramps. Bounce goes up. Pages-per-session goes down. And the internal linking signal you could have given Google — "this article about wedding shoes is topically related to these other wedding shoe articles" — is wasted on noise.

The fix isn't fancy. You need a similarity function that takes the current article and returns a ranked list of other articles, where "ranked" means topically relevant, not chronologically recent. Then you render the top three or four. That's it. The hard part is doing it without slowing the page down or breaking SEO.

The baseline: tag overlap with Jaccard similarity

Every Shopify article has a tags array. Most editorial Shopify stores tag posts with things like wedding, summer, outdoor, shoes, style-guide. If two articles share a high proportion of tags, they're probably related. That's the entire premise of Jaccard similarity:

J(A, B) = |A ∩ B| / |A ∪ B|

Intersection over union. Score between 0 and 1. Higher means more overlap. It's the simplest non-trivial similarity function you can compute, and for blog posts with reasonable tagging, it's surprisingly strong as a baseline.
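
To make the formula concrete, here's a small worked example in JavaScript. The two tag lists are made up for illustration, and tagJaccard is just a name I picked for the sketch, not anything Shopify provides.

```javascript
// Jaccard similarity between two tag lists: |A ∩ B| / |A ∪ B|.
function tagJaccard(tagsA, tagsB) {
  const a = new Set(tagsA);
  const b = new Set(tagsB);
  let shared = 0;
  for (const tag of a) {
    if (b.has(tag)) shared += 1;
  }
  // |A ∪ B| = |A| + |B| - |A ∩ B|
  const unionSize = a.size + b.size - shared;
  return unionSize === 0 ? 0 : shared / unionSize;
}

// Hypothetical tag lists for two articles.
const current = ['wedding', 'summer', 'outdoor', 'shoes'];
const candidate = ['wedding', 'shoes', 'style-guide'];

// Shared tags: wedding, shoes (2). Distinct tags overall: 5. Score: 2 / 5 = 0.4.
const exampleScore = tagJaccard(current, candidate);
```

Two articles with identical tag sets score 1, and two with nothing in common score 0, which is the range the ranking below relies on.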

You can do this in pure Liquid. No JS, no fetch, no script tag — the markup renders server-side and you get the related posts inline. Here's the loop:

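{%- liquid
  assign current_tags = article.tags
  assign current_handle = article.handle
  assign scored = ''
-%}

{%- for post in blog.articles -%}
  {%- if post.handle == current_handle -%}{%- continue -%}{%- endif -%}

  {%- assign intersection = 0 -%}
  {%- assign union_set = current_tags | concat: post.tags | uniq -%}

  {%- for tag in current_tags -%}
    {%- if post.tags contains tag -%}
      {%- assign intersection = intersection | plus: 1 -%}
    {%- endif -%}
  {%- endfor -%}

  {%- assign union_size = union_set | size -%}
  {%- if union_size == 0 -%}{%- continue -%}{%- endif -%}

  {%- comment -%} Integer score from 0 to 1000. {%- endcomment -%}
  {%- assign score = intersection | times: 1000 | divided_by: union_size -%}

  {%- comment -%}
    sort compares strings, so add 10000 to give every score five digits.
  {%- endcomment -%}
  {%- assign entry = score | plus: 10000 | append: '|' | append: post.handle -%}
  {%- assign scored = scored | append: entry | append: ',' -%}
{%- endfor -%}

{%- assign scored_list = scored | split: ',' | sort | reverse -%}

<ul class="related-posts">
  {%- for entry in scored_list limit: 3 -%}
    {%- assign parts = entry | split: '|' -%}
    {%- comment -%} The global articles object is keyed by blog-handle/article-handle. {%- endcomment -%}
    {%- assign key = blog.handle | append: '/' | append: parts[1] -%}
    {%- assign related = articles[key] -%}
    {%- if related -%}
      <li>
        <a href="{{ related.url }}">
          {%- if related.image -%}
            <img src="{{ related.image | image_url: width: 400 }}"
                 alt="{{ related.title | escape }}"
                 loading="lazy">
          {%- endif -%}
          <h3>{{ related.title | escape }}</h3>
        </a>
      </li>
    {%- endif -%}
  {%- endfor -%}
</ul>

A few notes on this. Liquid's divided_by does integer division when both operands are integers, so I multiply by 1000 before dividing to keep three digits of precision for the sort. The string concatenation trick (score|handle,score|handle,... then split and sort) is how you rank by a computed score in pure Liquid, since sort can only order by an existing property, not a dynamic key. The catch is that sort compares these entries as strings, which is why 10000 is added to each score: every entry then starts with the same number of digits, so text order matches numeric order. Posts that share no tags still get an entry with a score of 0, so they fill the list when fewer than three posts overlap. The lookup at render time goes through the global articles object, keyed by blog-handle/article-handle, and the image is only output when the article has one. It's ugly. It works.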

The other thing: this loops over blog.articles, which is fine on a small blog. On a blog with several hundred articles you'll pay for it in render time, and a for loop over blog.articles only sees the first 50 articles unless you wrap it in a paginate tag. Either paginate or deliberately limit the candidate set to the most recent N. In practice, scoring against the most recent 50 articles has been enough on the stores I've worked with; older posts are usually less relevant anyway.

Adding author as a secondary signal

Tags are noisy. Some writers under-tag, some over-tag. A second signal that's almost free: same author. If two articles share an author, that's often a strong topical signal because writers tend to specialize. Your shoe writer writes about shoes. Your sustainability writer writes about sustainability. You don't even need to look at the tags to know the next article from the same person is probably in the same orbit.

I weight this lower than tags — author is a coarse signal, tags are fine-grained — but adding +0.2 when authors match and the article is in the same blog has been a clean lift in the data I've seen. It's also the savior when tags are sparse, which on a newer blog they will be.

In Liquid:

{%- assign author_bonus = 0 -%}
{%- if post.author == article.author -%}
  {%- assign author_bonus = 200 -%}
{%- endif -%}
{%- assign final_score = score | plus: author_bonus -%}

(Again scaled by 1000 for integer math, so +0.2 becomes 200. Use final_score in place of score when you build the sort entry.)

Keyword overlap from titles and handles

The third signal is the most useful but takes the most care: content keyword overlap. Tags are author-controlled and inconsistent. The article title and the URL handle are usually clean, descriptive, and stable. "summer-wedding-shoes-outdoor" tells you a lot more than the tag summer does.

You can tokenize the title plus handle once per article, drop stopwords, and compute overlap against the current article's tokens. The bigger version of this is TF-IDF — you compute term frequency for each article and inverse document frequency across the whole corpus, then take a dot product on the resulting vectors. That's textbook IR and it works, but you don't need it for a blog with 50–500 articles. Plain keyword overlap with a stopword list gets you 80% of the win for 10% of the complexity.

I do this in JS rather than Liquid because tokenization in Liquid is painful and you can cache the token sets in localStorage or just compute them once per page load. Liquid handles the initial render with the tag-only score, then JS upgrades the list on hydration if you want a fancier score. Or — and this is what I do now — you push the whole scoring step server-side via the Section Rendering API.
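
A minimal sketch of the caching idea, assuming each article object has a title and a handle. The storage argument is anything with getItem and setItem (localStorage in the browser). The key prefix, the tiny tokenizer, and the function names are my own choices for the example.

```javascript
const CACHE_PREFIX = 'related-tokens:'; // hypothetical key prefix

// Deliberately tiny tokenizer for the sketch: lowercase, split on
// non-alphanumerics, drop very short tokens. No stopword list here.
function simpleTokens(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter(t => t.length > 2);
}

// Return the unique tokens for an article, reading from storage when present.
function getCachedTokens(article, storage) {
  const key = CACHE_PREFIX + article.handle;
  try {
    const cached = storage.getItem(key);
    if (cached !== null) return JSON.parse(cached);
  } catch (err) {
    // Corrupt entry or storage unavailable: fall through and recompute.
  }

  const tokens = [
    ...new Set(simpleTokens(`${article.title} ${article.handle}`))
  ];

  try {
    storage.setItem(key, JSON.stringify(tokens));
  } catch (err) {
    // Quota exceeded or private mode: caching is best-effort.
  }
  return tokens;
}
```

In the browser you'd call getCachedTokens(article, window.localStorage). The cache is keyed by handle only, so a retitled article keeps its old tokens until the entry is cleared; whether that matters depends on how often titles change on your blog.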

Section Rendering API: why it matters

Here's the thing about doing recommendations purely in client-side JS: you lose two things you actually need.

First, SEO. Googlebot does render JS these days, but rendering happens in a deferred pass and isn't guaranteed on every crawl, so content and internal links that only exist after JS runs can be picked up late or missed. If the related-posts block is in the initial HTML, the links are crawlable from day one. If it's injected on hydration, you're relying on the renderer.

Second, CLS. Cumulative Layout Shift is a Core Web Vitals metric, and Core Web Vitals feed into Google's page experience signals. A block that materializes 600ms after first paint pushes the footer down and you take the CLS hit. Reserving space with a skeleton is one fix, but the cleaner answer is to render the section server-side and not have it pop in at all.

Shopify's Section Rendering API lets you request a single section's rendered HTML from any URL. The pattern is ?section_id=related-posts appended to any request. You get back HTML, and you swap it into the DOM.

Why is this useful if Liquid already rendered the section server-side on the initial page load? Because you can do a two-pass approach:

First paint: render with the cheap tag-only scoring directly in Liquid. The block exists in HTML, the links are crawlable, no CLS.
Hydration: fetch the same section with a richer query (more candidate articles, optional author/keyword weighting) and replace the HTML. The swap is identical layout, no shift.

Or, simpler: render once server-side with the full algorithm and never touch the DOM after. That's what I do for most installs. The Section Rendering API still matters because it's how you re-render the block in a wishlisted or AJAX-y context without reloading the page.

Here's the fetch:

async function loadRelatedPosts(articleHandle, container) {
  // Assumes the blog handle is "news" (Shopify's default); adjust for your blog.
  const url = `/blogs/news/${articleHandle}?section_id=related-posts`;

  try {
    const res = await fetch(url, {
      headers: { 'Accept': 'text/html' }
    });
    if (!res.ok) throw new Error(`HTTP ${res.status}`);

    const html = await res.text();
    // The response is the section's wrapper element:
    // <div id="shopify-section-related-posts" class="shopify-section">...</div>
    const parser = new DOMParser();
    const doc = parser.parseFromString(html, 'text/html');
    const fresh = doc.querySelector(`#shopify-section-related-posts`);

    if (fresh && container) {
      container.replaceWith(fresh);
    }
  } catch (err) {
    console.warn('Related posts refresh failed', err);
    // Leave the server-rendered block in place. Graceful degradation.
  }
}

Two things worth flagging. The URL has to be a valid Shopify URL — you can hit the article URL itself, or /, or any path that includes the section in its template. Different themes scope sections differently; you'll want to test which path includes your section. And the response is HTML, not JSON. There's no section.id.json endpoint for blog sections the way there is for product sections in some setups. Parse the HTML, find your section's outer wrapper by ID, and swap.
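
The Section Rendering API does have a JSON flavor as well, through a separate sections query parameter: you pass a comma-separated list of section IDs and get back a JSON object keyed by section ID, where each value is that section's rendered HTML string (or null if it couldn't be rendered). A hedged sketch follows. The function names and the injectable fetchImpl parameter are mine, and related-posts is assumed to be your section's ID.

```javascript
// Build a Section Rendering API URL using the ?sections= parameter.
function buildSectionsUrl(path, sectionIds) {
  const separator = path.includes('?') ? '&' : '?';
  return `${path}${separator}sections=${sectionIds.join(',')}`;
}

// Fetch one or more sections as JSON.
// fetchImpl is injectable so the function can be exercised without a network.
async function fetchSectionsHtml(path, sectionIds, fetchImpl = fetch) {
  const res = await fetchImpl(buildSectionsUrl(path, sectionIds));
  if (!res.ok) throw new Error(`HTTP ${res.status}`);

  // Expected shape: { "related-posts": "<div id=\"shopify-section-related-posts\" ...", ... }
  return res.json();
}
```

On a storefront you'd call fetchSectionsHtml(window.location.pathname, ['related-posts']) and read the HTML string from the related-posts key. It's mainly worth it when you refresh several sections in one request; for a single block the section_id version above is simpler.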

Combined scoring

Putting it together, here's the scoring function I land on for most stores. Tag overlap is the spine. Author is a tiebreaker. Keywords are the fine-grain.

const STOPWORDS = new Set([
  'the','a','an','and','or','of','to','in','on','for','with',
  'is','are','was','were','be','been','it','this','that','your','you'
]);

function tokenize(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s-]/g, ' ')
    .split(/[\s-]+/)
    .filter(t => t.length > 2 && !STOPWORDS.has(t));
}

function jaccard(setA, setB) {
  const a = new Set(setA);
  const b = new Set(setB);
  const intersection = [...a].filter(x => b.has(x)).length;
  const union = new Set([...a, ...b]).size;
  return union === 0 ? 0 : intersection / union;
}

function scoreArticle(current, candidate) {
  const tagScore = jaccard(current.tags, candidate.tags);

  // Guard against two articles that both lack an author counting as a match.
  const authorScore =
    current.author && current.author === candidate.author ? 1 : 0;

  const currentTokens = tokenize(
    `${current.title} ${current.handle}`
  );
  const candidateTokens = tokenize(
    `${candidate.title} ${candidate.handle}`
  );
  const keywordScore = jaccard(currentTokens, candidateTokens);

  return (
    tagScore * 0.5 +
    authorScore * 0.2 +
    keywordScore * 0.3
  );
}

function rankRelated(current, candidates, limit = 3) {
  return candidates
    .filter(c => c.handle !== current.handle)
    .map(c => ({ ...c, _score: scoreArticle(current, c) }))
    .filter(c => c._score > 0)
    .sort((a, b) => b._score - a._score)
    .slice(0, limit);
}

Weights are tunable. I've ended up at 0.5 / 0.2 / 0.3 because tags are the highest-signal feature on a well-tagged blog, but keywords pull more weight when tags are sparse. If you have a content audit step, you can A/B these — track click-through on the related-posts block as your success metric and adjust. The fallback when every score is 0 (an article with unique tags, unique author, no shared keywords) is to surface the latest few posts. That's the only legitimate use case for the "latest 3" widget — as a fallback, not as a default.
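
A sketch of that fallback, under a couple of assumptions: each article carries a publishedAt ISO date string (a hypothetical field name; use whatever your data has), and ranked is the scored list for the current article. It goes slightly further than the all-zero case by also topping up the list when scoring finds only one or two matches.

```javascript
// Fill the related list with the most recent posts when scoring
// produces fewer than `limit` results.
// Assumes articles look like { handle, publishedAt } where publishedAt
// is an ISO date string (hypothetical field name).
function withLatestFallback(ranked, current, candidates, limit = 3) {
  if (ranked.length >= limit) return ranked.slice(0, limit);

  // Never repeat the current article or anything already ranked.
  const taken = new Set([current.handle, ...ranked.map(a => a.handle)]);

  const latest = candidates
    .filter(c => !taken.has(c.handle))
    .sort((a, b) => new Date(b.publishedAt) - new Date(a.publishedAt));

  // Scored matches first, newest posts after them.
  return [...ranked, ...latest].slice(0, limit);
}
```

Scored matches always come first, so the latest posts only ever occupy the slots the algorithm couldn't fill.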

What I'd avoid

A few patterns I've seen on stores I've looked at that don't work:

Random shuffle. Some themes ship with "related posts" that are actually just blog.articles | shuffle | limit: 3. It's worse than latest-3 because at least latest-3 surfaces fresh content. Random surfaces archive posts that the merchant has probably stopped maintaining.

Pure JS rendering with no SSR fallback. Pretty common in older "related posts" apps. Block injects on DOMContentLoaded, Googlebot mostly ignores it, internal-link equity wasted, CLS hit. If you're doing this in 2026, stop.

Showing more than 3 or 4. Diminishing returns. The fourth and fifth items get clicks in the low single digits in the merchant data I've looked at. Real estate is better spent on a newsletter signup or a product card.

Scoring against every article. Cap the candidate set. The most recent 50 is plenty for any reasonable blog. If you genuinely have a 5000-article archive, build a separate batch job that pre-computes related posts and stores them in metafields. That's a different post though.

What shipping it taught me

A few things I didn't expect.

Most merchants under-tag. Their articles have one or two tags total, and Jaccard scores are mostly 0 or 0.5. The author and keyword signals carry more weight in practice than they should in theory. If you're building this for a single store, audit the tagging first — you'll get more lift from cleaning tags than from tweaking weights.
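
A quick way to run that audit, sketched under the assumption that you can export articles as objects with a handle and a tags array. The minTags threshold of 3 is an arbitrary starting point, not a recommendation from any data.

```javascript
// Summarize tagging health for articles shaped like
// { handle: string, tags: string[] } (an assumed shape).
function auditTags(articles, minTags = 3) {
  const tagCounts = new Map();
  const underTagged = [];

  for (const article of articles) {
    // Tags are compared exactly as stored; duplicates on one article count once.
    const tags = [...new Set(article.tags)];
    if (tags.length < minTags) underTagged.push(article.handle);
    for (const tag of tags) {
      tagCounts.set(tag, (tagCounts.get(tag) || 0) + 1);
    }
  }

  // A tag used on a single article can never contribute to an intersection.
  const singletonTags = [...tagCounts]
    .filter(([, count]) => count === 1)
    .map(([tag]) => tag);

  return { underTagged, singletonTags };
}
```

The underTagged list tells you which posts need attention, and singletonTags shows tags that add to the union without ever adding to an intersection, which drags Jaccard scores down.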

Carousels test worse than grids on mobile, from what I've seen. People scroll past carousels because they look like ads. A 2x2 grid that all renders above the fold of the post-article scroll position outperforms a swipeable carousel in the click-through data I've had access to. Hedge: this is from a handful of stores, not a controlled study. Try both.

The Section Rendering API approach saved me from a CLS regression on one store that was running a Liquid-rendered block but with images that loaded after first paint. Adding explicit width and height on the related-post thumbnails matters at least as much as the rendering pattern itself. Reserve the space.

I made an app for this — Better Related Blog Posts — if you'd rather not roll your own. It's on the App Store and it does the tag/author/keyword scoring plus the layouts (grid, list, carousel) on a Theme App Extension so there's no script tag and nothing to maintain. Happy to share more about how it's built if useful. But honestly, if you're comfortable in Liquid and JS, everything above is enough to build the V1 yourself in an afternoon. The hard part isn't the algorithm — it's getting the tagging clean enough that the algorithm has signal to work with.
