If you've ever tried to make your docs, blog posts, or changelogs searchable with Meilisearch, you know the drill: write a custom scraper, parse the content, transform it into the right shape, push it to an index, and hope you don't break search during re-indexing.
I got tired of writing that glue code for every project, so I built content-mill — a CLI and library that indexes static content into Meilisearch from a single YAML config.
The Problem
Meilisearch is fantastic for search, but getting your content into it is surprisingly manual. Every docs site, every changelog, every collection of markdown files needs its own extraction pipeline. And if you want zero-downtime re-indexing? That's more code on top.
Most existing solutions are either tied to a specific search provider (DocSearch, for example, targets Algolia) or require you to write a full crawler. If you just have some markdown files and a Meilisearch instance, I couldn't find anything lightweight that bridges the gap.
What content-mill Does
You describe your content sources and the document shape you want in a YAML config:
meili:
  host: http://localhost:7700
  apiKey: ${MEILI_MASTER_KEY}
sources:
  - name: docs
    type: mkdocs
    config: ./mkdocs.yml
    index: docs
    document:
      primaryKey: id
      fields:
        id: "{{ slug }}"
        title: "{{ heading }}"
        content: "{{ body }}"
        section: "{{ nav_section }}"
        url: "{{ path }}"
        type: "docs"
      searchableAttributes: [title, content]
      filterableAttributes: [section, type]
Then run:
npx @centrali-io/content-mill index --config content-mill.yml
That's it. content-mill reads your sources, extracts content, applies your field templates, and pushes everything to Meilisearch with atomic index swapping (so search never goes down during re-indexing).
Four Source Types, One Interface
content-mill ships with adapters for the content formats you're most likely already using:
mkdocs — Reads your mkdocs.yml, follows the nav tree, and parses each markdown page. You get nav_section context so you know which part of the docs each page belongs to.
markdown-dir — Recursively reads .md files from a directory. Supports YAML frontmatter, so you can pull version numbers, dates, or any metadata into your search index. Great for changelogs and blog posts.
json — Reads a JSON array (or directory of JSON files). Every key in each object becomes a template variable. Perfect for structured data you already have lying around.
html — Reads .html files, strips scripts/styles/nav/footer, and gives you clean text. Useful for indexing a built static site.
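To make the "one interface" idea concrete, here is a rough TypeScript sketch of what a source adapter contract could look like. The names `SourceAdapter`, `TemplateVars`, and `jsonAdapter` are made up for illustration and are not content-mill's actual types; the only behavior taken from the description above is that every key of a JSON object becomes a template variable.

```typescript
// Hypothetical shapes for illustration; not content-mill's real types.
type TemplateVars = Record<string, unknown>;

interface SourceAdapter<Options> {
  // Turn one configured source into a list of variable bags, one per document.
  extract(options: Options): TemplateVars[];
}

// A json-style adapter: every key of each object becomes a template variable.
const jsonAdapter: SourceAdapter<{ records: unknown }> = {
  extract({ records }) {
    if (!Array.isArray(records)) {
      throw new Error("json source must be an array of objects");
    }
    return records.map((record, i) => {
      if (record === null || typeof record !== "object" || Array.isArray(record)) {
        throw new Error(`record ${i} is not an object`);
      }
      // Copy the object so later steps cannot mutate the caller's data.
      return { ...(record as Record<string, unknown>) };
    });
  },
};

const vars = jsonAdapter.extract({
  records: [{ slug: "v1-2-0", heading: "Release 1.2.0", version: "1.2.0" }],
});
console.log(vars[0].heading); // usable in a field template as {{ heading }}
```

Whatever the real internals look like, the point is that each adapter only has to produce plain variable bags; everything after that (templating, chunking, indexing) can stay source-agnostic.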
Templating: You Control the Document Shape
The key design decision is that you define what your Meilisearch documents look like. Source adapters extract raw variables (slug, heading, body, path, frontmatter.*, etc.), and you map them to fields using {{ template }} syntax:
fields:
  id: "{{ slug }}-{{ chunk_index }}"
  title: "{{ chunk_heading }}"
  content: "{{ chunk_body }}"
  excerpt: "{{ body | truncate(200) }}"
  url: "{{ path }}#{{ chunk_heading | slugify }}"
Filters like truncate, slugify, lower, upper, and strip_md can be chained with pipes. This means you're not locked into someone else's schema — your search index looks exactly the way your frontend expects.
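If you're curious how this kind of pipe syntax can be evaluated, here is a minimal sketch in TypeScript. It is not content-mill's implementation: `render`, `lookup`, and the filter bodies are assumptions written for this example, and `strip_md` is left out to keep it short.

```typescript
type Vars = Record<string, unknown>;
type Filter = (input: string, ...args: string[]) => string;

// Illustrative filter implementations; the real ones may behave differently.
const filters: Record<string, Filter> = {
  lower: (s) => s.toLowerCase(),
  upper: (s) => s.toUpperCase(),
  truncate: (s, n = "100") => s.slice(0, Number(n)),
  slugify: (s) =>
    s.toLowerCase().trim().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, ""),
};

// Resolve a dotted path such as "frontmatter.version" against the variables.
function lookup(vars: Vars, path: string): string {
  let current: unknown = vars;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") return "";
    current = (current as Record<string, unknown>)[key];
  }
  return current === undefined || current === null ? "" : String(current);
}

// Replace each {{ name | filter(arg) | ... }} expression in a template.
function render(template: string, vars: Vars): string {
  return template.replace(/\{\{(.*?)\}\}/g, (_match, expr: string) => {
    const [name, ...steps] = expr.split("|").map((part) => part.trim());
    let value = lookup(vars, name);
    for (const step of steps) {
      const call = /^(\w+)(?:\((.*)\))?$/.exec(step);
      if (!call) throw new Error(`Bad filter expression: ${step}`);
      const filter = filters[call[1]];
      if (!filter) throw new Error(`Unknown filter: ${call[1]}`);
      const args = call[2] ? call[2].split(",").map((a) => a.trim()) : [];
      value = filter(value, ...args); // filters apply left to right
    }
    return value;
  });
}

const doc = { path: "/guide/install", chunk_heading: "Quick Start!" };
const url = render("{{ path }}#{{ chunk_heading | slugify }}", doc);
console.log(url); // "/guide/install#quick-start"
```

Each filter receives the output of the previous one, which is why `{{ body | truncate(200) | upper }}` would truncate first and uppercase second.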
Chunking for Granular Results
Whole-page results are often too broad for docs search. content-mill can split pages by heading level:
chunking:
  strategy: heading
  level: 2
This turns one long page into multiple documents — one per ## section — each with its own chunk_heading, chunk_body, and chunk_index. Your search results can now link directly to the relevant section instead of dumping users at the top of a page.
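Here is roughly what heading-based splitting involves, as a TypeScript sketch. The function name `chunkByHeading`, the zero-based `chunk_index`, and the choice to keep any text before the first heading as its own heading-less chunk are all assumptions for this example; content-mill's actual rules may differ.

```typescript
interface Chunk {
  chunk_index: number;
  chunk_heading: string;
  chunk_body: string;
}

// Split markdown into one chunk per heading of exactly `level` (2 means "## ").
function chunkByHeading(markdown: string, level: number): Chunk[] {
  const headingRe = new RegExp(`^#{${level}}\\s+(.+?)\\s*$`);
  const chunks: Chunk[] = [];
  let heading = ""; // text before the first matching heading has no heading
  let body: string[] = [];
  let inFence = false; // ignore "## ..." lines inside fenced code blocks

  const flush = () => {
    const text = body.join("\n").trim();
    if (heading !== "" || text !== "") {
      chunks.push({ chunk_index: chunks.length, chunk_heading: heading, chunk_body: text });
    }
    body = [];
  };

  for (const line of markdown.split(/\r?\n/)) {
    if (/^\s*(```|~~~)/.test(line)) inFence = !inFence;
    const match = inFence ? null : headingRe.exec(line);
    if (match) {
      flush(); // close the previous chunk before starting a new one
      heading = match[1];
    } else {
      body.push(line);
    }
  }
  flush();
  return chunks;
}

const page = [
  "# Guide",
  "Intro text.",
  "## Install",
  "Run npm install.",
  "### Details",
  "Deeper headings stay inside their parent chunk.",
  "## Configure",
  "Edit the YAML file.",
].join("\n");

const chunks = chunkByHeading(page, 2);
console.log(chunks.map((c) => c.chunk_heading)); // [ '', 'Install', 'Configure' ]
```

Combined with an id template like `{{ slug }}-{{ chunk_index }}`, each chunk gets a stable, unique primary key within its page.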
Zero-Downtime Re-indexing
Every indexing run uses Meilisearch's index swap:
- Documents go into a temp index (docs_tmp)
- Atomic swap with the live index (docs)
- Old index gets cleaned up
If something fails mid-way, your live index is untouched. No maintenance windows needed.
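For reference, the three steps above map onto Meilisearch's REST API roughly as follows. This is a sketch under assumptions, not content-mill's source: `reindex`, `waitForTask`, `call`, and the `Http` type are hypothetical names, the live index is assumed to exist already (the swap needs both indexes), and the HTTP function is injected so the flow can be tried without a server.

```typescript
// Minimal fetch-like signature; the global fetch should satisfy it.
type Http = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<any> }>;

interface MeiliOptions { host: string; apiKey: string; http: Http; pollMs?: number }

async function call(o: MeiliOptions, method: string, path: string, body?: unknown): Promise<any> {
  const res = await o.http(`${o.host}${path}`, {
    method,
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${o.apiKey}` },
    body: body === undefined ? undefined : JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`${method} ${path} failed with HTTP ${res.status}`);
  return res.json();
}

// Meilisearch write operations are asynchronous tasks; poll until one settles.
async function waitForTask(o: MeiliOptions, taskUid: number): Promise<void> {
  for (;;) {
    const task = await call(o, "GET", `/tasks/${taskUid}`);
    if (task.status === "succeeded") return;
    if (task.status === "failed" || task.status === "canceled") {
      throw new Error(`Task ${taskUid} ${task.status}`);
    }
    await new Promise((resolve) => setTimeout(resolve, o.pollMs ?? 250));
  }
}

async function reindex(o: MeiliOptions, index: string, docs: object[], primaryKey = "id"): Promise<void> {
  const tmp = `${index}_tmp`;
  // 1. Load documents into the temp index (created implicitly if missing).
  const add = await call(o, "POST", `/indexes/${tmp}/documents?primaryKey=${primaryKey}`, docs);
  await waitForTask(o, add.taskUid);
  // 2. Atomically swap temp and live; the live index must already exist.
  const swap = await call(o, "POST", "/swap-indexes", [{ indexes: [tmp, index] }]);
  await waitForTask(o, swap.taskUid);
  // 3. After the swap, the temp uid holds the old documents; drop it.
  const del = await call(o, "DELETE", `/indexes/${tmp}`);
  await waitForTask(o, del.taskUid);
}
```

Because the live index is only touched in step 2, a failure while loading documents leaves search untouched. A production version would also need to handle the first run (no live index yet) and clear out a stale temp index left behind by an earlier failed run.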
CI/CD in Two Lines
# GitHub Actions
- name: Index docs
  env:
    MEILI_MASTER_KEY: ${{ secrets.MEILI_MASTER_KEY }}
  run: npx @centrali-io/content-mill index --config content-mill.yml
Hook this into your release pipeline and your search index stays in sync with every deploy.
Use as a Library
Don't need the CLI? Import it directly:
import { loadConfig, indexAll } from '@centrali-io/content-mill';
const config = loadConfig('./content-mill.yml');
await indexAll(config, { dryRun: false });
Or build the config object in code if you prefer programmatic control.
Getting Started
npm install @centrali-io/content-mill
- Create a content-mill.yml with your Meilisearch connection and source definitions
- Run with --dry-run first to preview the extracted documents
- Run for real and check your Meilisearch dashboard
The full config reference and source type examples are in the README on GitHub.
content-mill is MIT licensed and open source. If you're using Meilisearch and have static content to index, I'd love to hear how it works for your use case. Issues and PRs welcome on GitHub.
Tags: #meilisearch #search #typescript #opensource