DEV Community

Macan

I built a shopping search engine in Rust that you talk to in plain words

Keyword search is bad at specific products. I'd know exactly what I wanted — "dark green waxed cotton jacket, under €200, not from a giant marketplace" —
and every engine buried me in Amazon listings and content-farm "best of" lists.

So I built Hubje: you describe a product in plain language and it returns real, buyable products from independent shops, ranked by how
well they match what you actually said. Here's the interesting engineering.

## Server-rendered Rust, no SPA

The whole site is Rust (axum) rendering HTML with Maud, a compile-time template macro. No
JS framework, no hydration, no API layer — handlers return HTML strings.

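  use maud::{html, Markup};

  // Renders one product card. `Product`, `img_url`, `img_srcset`, and
  // `CARD_SIZES` are defined elsewhere in the crate.
  fn product_card(p: &Product) -> Markup {
      html! {
          article class="rounded-2xl border border-slate-200 bg-white" {
              // Responsive image: the browser picks a width from `srcset`.
              img src=(img_url(&p.image, 480))
                  srcset=(img_srcset(&p.image, &[240, 360, 480, 600, 800]))
                  sizes=(CARD_SIZES) loading="lazy";
              h3 class="text-sm font-semibold text-ink" { (p.title) }
          }
      }
  }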

Why: a shopping site lives or dies by SEO, and server-rendered HTML is trivially crawlable and fast. htmx handles the few interactive bits (live search results swap in place)
as progressive enhancement, so URLs stay real and shareable. The "framework" is the type system.
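The card above calls `img_url` and `img_srcset` without showing them. Here is a minimal sketch of what such helpers could look like, assuming a `/img?src=…&w=…` URL scheme for the image proxy. That scheme and the `percent_encode` helper are my assumptions for illustration, not necessarily what Hubje uses.

```rust
/// Percent-encode everything except RFC 3986 unreserved characters, so the
/// source URL survives as a single query-string value.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for b in input.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Assumed (hypothetical) proxy URL scheme: /img?src=<encoded source>&w=<width>.
fn img_url(src: &str, width: u32) -> String {
    format!("/img?src={}&w={}", percent_encode(src), width)
}

/// One "<url> <width>w" candidate per requested width, comma-separated,
/// which is the shape the `srcset` attribute expects.
fn img_srcset(src: &str, widths: &[u32]) -> String {
    widths
        .iter()
        .map(|&w| format!("{} {}w", img_url(src, w), w))
        .collect::<Vec<_>>()
        .join(", ")
}

fn main() {
    let src = "https://shop.example/a b.jpg";
    println!("{}", img_url(src, 480));
    println!("{}", img_srcset(src, &[240, 480]));
}
```

Keeping the width in the URL means every rendition has its own cacheable address, which fits the "URLs stay real" approach.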

## Plain-language search

Search isn't keyword matching. It pulls live listings (via Exa), and an LLM ranks and filters them against your sentence, including soft
constraints like "under €200" or "minimalist". The same pipeline writes the one-line "why this pick" rationales and selects a top 3.

The genuinely fun part is the failure mode as a feature: when a description returns nothing buyable, that zero-result query is logged. Recurring gaps
auto-generate a curated buying guide page — so unmet demand turns into indexable content. The content engine feeds itself.
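The gap-detection idea can be sketched as a counter over normalized zero-result queries. This is a rough in-memory illustration only. The `GapTracker` name, the normalization rule, and the threshold are all assumptions, and a real setup would presumably persist the counts.

```rust
use std::collections::HashMap;

/// Counts zero-result queries and reports the ones that recur.
struct GapTracker {
    counts: HashMap<String, u32>,
    /// Hypothetical cut-off: how many misses make a gap "recurring".
    threshold: u32,
}

impl GapTracker {
    fn new(threshold: u32) -> Self {
        GapTracker { counts: HashMap::new(), threshold }
    }

    /// Lowercase and collapse whitespace so trivially different spellings of
    /// the same description share one counter.
    fn normalize(query: &str) -> String {
        query
            .split_whitespace()
            .map(str::to_lowercase)
            .collect::<Vec<_>>()
            .join(" ")
    }

    /// Record a search outcome. Returns true exactly once per query: on the
    /// miss that reaches the threshold, i.e. when a guide should be queued.
    fn record(&mut self, query: &str, result_count: usize) -> bool {
        if result_count > 0 {
            return false;
        }
        let n = self.counts.entry(Self::normalize(query)).or_insert(0);
        *n += 1;
        *n == self.threshold
    }
}

fn main() {
    let mut gaps = GapTracker::new(3);
    for q in ["green waxed jacket", "Green waxed  jacket", "green waxed jacket"] {
        if gaps.record(q, 0) {
            println!("recurring gap, queue a buying guide: {}", q);
        }
    }
}
```

Returning `true` only on the threshold-crossing miss keeps the same gap from being queued twice.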

## CWV obsession (self-host everything)

Organic is the only growth channel I can afford, so Core Web Vitals matter. I ended up removing every third-party request:

  • Fonts — vendored the variable woff2s, @font-face with font-display: swap + unicode-range so latin-ext only loads on an accented glyph. No Google Fonts round-trip.
  • htmx — self-hosted + defer, so zero render-blocking external JS.
  • Images — a Rust /img proxy that downscales and content-negotiates the format:
  let fmt = ImgFmt::from_accept(accept_header); // webp if offered, else jpeg
  // cache key + Vary: Accept so caches don't cross-serve

WebP (libwebp via the webp crate) is ~30% smaller than JPEG and builds fine for a distroless image. Cloudflare's field data now shows LCP 100% "good".
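For the format negotiation, here is a minimal sketch of what `ImgFmt::from_accept` might do. Only the name `ImgFmt::from_accept` comes from the snippet above. The enum variants, `content_type`, `ext`, and `cache_key` are assumptions. This version is deliberately conservative: it serves WebP only when `image/webp` is listed explicitly with a non-zero `q`, and ignores wildcards such as `image/*`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ImgFmt {
    WebP,
    Jpeg,
}

impl ImgFmt {
    /// WebP if the Accept header offers it explicitly, else JPEG.
    fn from_accept(accept: &str) -> Self {
        for item in accept.split(',') {
            let mut parts = item.split(';');
            let media_type = parts.next().unwrap_or("").trim();
            if !media_type.eq_ignore_ascii_case("image/webp") {
                continue;
            }
            // An explicit q=0 means "not acceptable".
            let refused = parts.any(|p| {
                let p = p.trim().to_ascii_lowercase();
                p.strip_prefix("q=")
                    .and_then(|q| q.trim().parse::<f32>().ok())
                    .map_or(false, |q| q == 0.0)
            });
            if !refused {
                return ImgFmt::WebP;
            }
        }
        ImgFmt::Jpeg
    }

    fn content_type(self) -> &'static str {
        match self {
            ImgFmt::WebP => "image/webp",
            ImgFmt::Jpeg => "image/jpeg",
        }
    }

    fn ext(self) -> &'static str {
        match self {
            ImgFmt::WebP => "webp",
            ImgFmt::Jpeg => "jpg",
        }
    }
}

/// The negotiated format is part of the cache key, so a WebP body is never
/// handed to a client that only asked for JPEG.
fn cache_key(src: &str, width: u32, fmt: ImgFmt) -> String {
    format!("{}|{}|{}", src, width, fmt.ext())
}

fn main() {
    let fmt = ImgFmt::from_accept("image/avif,image/webp,*/*;q=0.8");
    println!("{} -> {}", fmt.content_type(), cache_key("a.jpg", 480, fmt));
}
```

The response would still need `Vary: Accept`, as the snippet above notes, so that shared caches between the proxy and the browser make the same distinction.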

Plus the boring-but-required stuff: canonical URLs, full OpenGraph/Twitter, JSON-LD (Product with price/availability, ItemList, FAQ, Breadcrumb, Dataset),
noindex on thin pages, a real 404, trailing-slash 301s.

## Ad-blocker-proof analytics

GA and even Cloudflare's beacon get blocked by uBlock/AdGuard, which undercounts exactly this audience. So pageviews are counted server-side in the
request middleware (cookieless, aggregate-only), with day-salted hashes for unique visitors and a curated datacenter-IP list so that scrapers spoofing
browser UAs are counted as bots, not humans.
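A rough sketch of the two pieces, under loud assumptions: `DefaultHasher` stands in for what should be a keyed cryptographic hash in production (its output is not stable across Rust releases), the salt format is invented, and the CIDR range is a documentation placeholder rather than a real datacenter list.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::net::Ipv4Addr;

/// Same visitor + same day salt gives the same ID. Once the salt rotates, the
/// IDs change, so visitors cannot be linked across days.
/// NOTE: DefaultHasher is a stand-in for illustration, not a privacy-grade hash.
fn visitor_id(day_salt: &str, ip: &str, user_agent: &str) -> u64 {
    let mut h = DefaultHasher::new();
    day_salt.hash(&mut h);
    ip.hash(&mut h);
    user_agent.hash(&mut h);
    h.finish()
}

/// An IPv4 CIDR block stored as a masked base address.
struct Cidr {
    base: u32,
    mask: u32,
}

impl Cidr {
    fn new(addr: Ipv4Addr, prefix_len: u32) -> Self {
        let len = prefix_len.min(32);
        // A shift by 32 would overflow, so /0 is handled separately.
        let mask = if len == 0 { 0 } else { u32::MAX << (32 - len) };
        Cidr { base: u32::from(addr) & mask, mask }
    }

    fn contains(&self, ip: Ipv4Addr) -> bool {
        (u32::from(ip) & self.mask) == self.base
    }
}

/// True if the address falls in any listed range, regardless of what the
/// User-Agent header claims.
fn is_datacenter(ip: Ipv4Addr, ranges: &[Cidr]) -> bool {
    ranges.iter().any(|r| r.contains(ip))
}

fn main() {
    // 203.0.113.0/24 is a documentation range used as a placeholder.
    let ranges = [Cidr::new(Ipv4Addr::new(203, 0, 113, 0), 24)];
    let ip = Ipv4Addr::new(203, 0, 113, 42);
    let id = visitor_id("2024-05-01:secret", &ip.to_string(), "Mozilla/5.0");
    println!("visitor {:016x}, datacenter: {}", id, is_datacenter(ip, &ranges));
}
```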

## Deploy

include_str!/include_bytes! embed the CSS, fonts, htmx, and favicon straight into the binary, so the runtime image is just distroless + one static-ish
executable. Built with kaniko, deployed to a small k3s homelab cluster. The whole thing is one ~170MB binary.
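Serving the embedded assets can be as small as a `match` on the request path. In the sketch below, the paths and constant names are hypothetical, and inline literals stand in for the `include_str!`/`include_bytes!` calls so the example compiles without the actual files.

```rust
// In the real binary these would be embedded at compile time, for example:
//   const APP_CSS: &str = include_str!("../static/app.css"); // hypothetical path
// Inline literals are used here so the sketch builds on its own.
const APP_CSS: &str = "body{margin:0}";
const HTMX_JS: &[u8] = b"/* htmx would be embedded here */";

/// Map a request path to (content type, bytes) for an embedded asset.
/// The data lives in the binary, so there is no filesystem access at runtime.
fn static_asset(path: &str) -> Option<(&'static str, &'static [u8])> {
    match path {
        "/static/app.css" => Some(("text/css; charset=utf-8", APP_CSS.as_bytes())),
        "/static/htmx.min.js" => Some(("text/javascript; charset=utf-8", HTMX_JS)),
        _ => None,
    }
}

fn main() {
    for path in ["/static/app.css", "/static/missing.png"] {
        match static_asset(path) {
            Some((content_type, body)) => {
                println!("{} -> {} ({} bytes)", path, content_type, body.len())
            }
            None => println!("{} -> 404", path),
        }
    }
}
```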

## Honest tradeoffs

  • It's new — catalog coverage is thin, so niche searches sometimes return nothing.
  • Affiliate-funded (disclosed); commission never affects ranking.
  • LLM-in-the-loop search is a latency/cost tradeoff vs. a pure index; caching + a top-3-only LLM pass keeps it sane.
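The caching half of that last tradeoff could look roughly like the TTL cache below, keyed by the query text. This is a sketch under assumptions: `TtlCache`, its method name, and the 60-second TTL are illustrative, and the current time is passed in explicitly so the behaviour is easy to test.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Caches one value per key for a fixed time-to-live.
struct TtlCache<V> {
    ttl: Duration,
    entries: HashMap<String, (Instant, V)>,
}

impl<V: Clone> TtlCache<V> {
    fn new(ttl: Duration) -> Self {
        TtlCache { ttl, entries: HashMap::new() }
    }

    /// Return the cached value if it is younger than the TTL. Otherwise run
    /// `compute` (the expensive LLM pass, in this scenario) and store the result.
    fn get_or_insert_with(
        &mut self,
        key: &str,
        now: Instant,
        compute: impl FnOnce() -> V,
    ) -> V {
        if let Some((stored_at, value)) = self.entries.get(key) {
            if now.duration_since(*stored_at) < self.ttl {
                return value.clone();
            }
        }
        let value = compute();
        self.entries.insert(key.to_string(), (now, value.clone()));
        value
    }
}

fn main() {
    let mut cache = TtlCache::new(Duration::from_secs(60));
    let now = Instant::now();
    let first = cache.get_or_insert_with("green waxed jacket", now, || vec!["pick-1"]);
    // Within the TTL the closure is not called again.
    let second = cache.get_or_insert_with("green waxed jacket", now, || vec!["other"]);
    println!("{:?} {:?}", first, second);
}
```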

Try it with the most oddly-specific thing you've failed to find online and tell me if it surfaces anything sane: https://hubje.nl — feedback welcome,
especially on search quality.
