Mario Brosco

Posted on • Originally published at 42rows.com

wiki42: compile a markdown wiki into RAG-ready chunks

TL;DR

If you have a markdown wiki and want to embed it for RAG, wiki42 does the chunking right: one chunk per page, frontmatter as metadata, [[wikilinks]] resolved, multilingual E5 embeddings.

pip install wiki42
from wiki42 import compile_wiki

chunks = compile_wiki("./my-wiki", backend="cloud")  # or "local"
# → list of dicts ready for Pinecone, Chroma, Qdrant, FAISS, ...

Why one more chunker

Generic chunkers split on token count. Markdown wikis already have semantic units: pages. Splitting a page in the middle breaks retrieval.

wiki42:

  • treats 1 page = 1 chunk (whatever its length)
  • parses YAML frontmatter as searchable metadata
  • resolves [[wikilinks]] into cross-references for graph queries
  • generates multilingual E5 embeddings out of the box
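
To make the first three points concrete, here is a minimal standard-library sketch of the idea. It is not wiki42's implementation: the function names (`split_frontmatter`, `page_to_chunk`, `sketch_compile`) and the chunk keys (`id`, `text`, `metadata`, `links`) are assumptions made for illustration, and the frontmatter parser only handles flat `key: value` lines rather than full YAML.

```python
import re
from pathlib import Path

# Matches [[Target]], [[Target|label]] and [[Target#section]];
# only the target page name is captured.
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def split_frontmatter(text):
    """Split a leading '---' delimited block from the page body.

    Naive on purpose: flat 'key: value' lines only, not full YAML.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            break
    else:
        return {}, text  # no closing delimiter: treat everything as body
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep and key.strip():
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


def page_to_chunk(page_id, text):
    """Turn one page into one chunk, whatever its length."""
    meta, body = split_frontmatter(text)
    # Deduplicated, sorted link targets act as outgoing graph edges.
    links = sorted({target.strip() for target in WIKILINK_RE.findall(body)})
    return {"id": page_id, "text": body, "metadata": meta, "links": links}


def sketch_compile(root):
    """Walk a wiki directory and emit one chunk per .md file."""
    root = Path(root)
    return [
        page_to_chunk(
            path.relative_to(root).with_suffix("").as_posix(),
            path.read_text(encoding="utf-8"),
        )
        for path in sorted(root.rglob("*.md"))
    ]
```

Keeping `links` as a list of page names means a graph query can be a plain lookup by `id`. The embedding step is left out here because it depends on the model and backend you choose.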

It's a markdown-aware drop-in replacement for Pinecone's server-side embedding.

Why we open-sourced it

We built wiki42 at 42rows S.r.l. as the chunker behind our RAG products. We isolated the core and open-sourced it because nobody else was solving wiki chunking properly.

Feedback and PRs welcome.