DEV Community

Takahiro Sato
Takahiro Sato

Posted on

Query Markdown as a Database with mq-db: SQL, mq, and Interval Indexes

Managing large collections of Markdown documents (documentation sites, knowledge bases, notes) often means resorting to a messy combination of grep and regex.

mq-db takes a different approach: it parses every Markdown element into a typed block, builds interval and secondary indexes, and exposes the result through SQL and a jq-inspired query language called mq.

What mq-db Does

mq-db turns Markdown files into a queryable, embedded database with zero external dependencies.

# Index a set of documents
mq-db index docs/ --recursive --output store.mq-db

# Run SQL across all indexed files
mq-db sql "SELECT block_type, count(*) FROM blocks GROUP BY block_type" --db store.mq-db

# Or use the interactive TUI
mq-db tui --db store.mq-db
Enter fullscreen mode Exit fullscreen mode

Every Markdown element (heading, paragraph, code block, list item, table cell) becomes a Block row with a block_type, content, and structural metadata.

The Block Model

Parsing a Markdown file produces a flat vector of blocks:

struct Block {
    id: u32,
    document_id: u32,
    block_type: BlockType,  // Heading, Paragraph, Code, List, etc.
    content: String,
    span: Option<Span>,     // Source line/column
    pre: u32,               // Interval index: pre-order
    post: u32,              // Interval index: post-order
    properties: Properties, // depth, lang, ordered, etc.
}
Enter fullscreen mode Exit fullscreen mode

The pre and post fields encode heading hierarchy using a Nested Set (Pre-Post Order) traversal. An ancestor check ("is block A inside section B?") reduces to a single integer comparison:

A is_under B  ↔  B.pre < A.pre  AND  A.post < B.post
Enter fullscreen mode Exit fullscreen mode

This eliminates the need for expensive tree traversal at query time, keeping lookup performance at $O(1)$.

Index Layers

To ensure high performance, mq-db applies a cheapest-first indexing strategy across three layers before any blocks are read from disk:

  • Layer 1: Zone Maps. Per-document statistics stored in the .mq-db file. If a document cannot contain the requested heading text, language tag, or depth, it is skipped entirely.
  • Layer 2: Interval Index. Narrows the candidate range to a specific section using the (pre, post) pair of the anchor heading.
  • Layer 3: Secondary Indexes. Includes a BitmapIndex for block_type, a BTreeIndex for pre/post ranges, and a HashIndex for exact matches on content, lang, and depth. The SQL predicate pushdown picks an IndexHint automatically.

SQL Queries

The custom SQL engine is based on sqlparser and runs entirely in-memory with no SQLite dependency.

Block type distribution across all documents

SELECT block_type, count(*) AS total
FROM blocks
GROUP BY block_type
ORDER BY total DESC;
Enter fullscreen mode Exit fullscreen mode
┌─────────────┬───────┐
│ block_type  │ total │
├─────────────┼───────┤
│ paragraph   │   241 │
│ heading     │    89 │
│ code        │    73 │
└─────────────┴───────┘
Enter fullscreen mode Exit fullscreen mode

H1 titles per document

SELECT d.path, b.content
FROM blocks b
JOIN documents d ON d.id = b.document_id
WHERE b.block_type = 'heading' AND b.depth = 1;
Enter fullscreen mode Exit fullscreen mode

All content inside a section with under()

The under() scalar function leverages the interval index for fast hierarchy queries:

SELECT b.block_type, b.content
FROM blocks b
WHERE under(
  b.pre, b.post,
  (SELECT pre  FROM blocks WHERE block_type = 'heading' AND content = 'Architecture'),
  (SELECT post FROM blocks WHERE block_type = 'heading' AND content = 'Architecture')
)
ORDER BY b.pre;
Enter fullscreen mode Exit fullscreen mode

This instantly retrieves every block nested under the Architecture section across all indexed documents in their original order.

Inline mq with mq()

You can also run an mq program directly inside your SQL queries using the mq() scalar function:

SELECT mq('.h1 | to_text', content) AS title
FROM blocks
WHERE block_type = 'code' AND lang = 'markdown';
Enter fullscreen mode Exit fullscreen mode

mq Queries

For quick command-line extractions, the mq engine accepts native jq-style expressions:

mq-db mq ".h1" --db store.mq-db
mq-db mq 'select(.code_lang == "rust")' --db store.mq-db
mq-db mq ".h | select(.depth == 2)" --db store.mq-db --format markdown
Enter fullscreen mode Exit fullscreen mode

DDL: Custom In-Memory Tables

You can create custom tables from query results or define an explicit schema:

-- Snapshot headings into a custom table
CREATE TABLE headings AS
SELECT content, depth FROM blocks WHERE block_type = 'heading';

-- Explicit schema + insert
CREATE TABLE notes (id TEXT, body TEXT);
INSERT INTO notes VALUES ('1', 'Design decision: use interval index for hierarchy');

-- Inspect schema
SHOW TABLES;
DESC notes;

-- Clean up
DROP TABLE notes;
Enter fullscreen mode Exit fullscreen mode

Interactive REPL

mq-db repl --db store.mq-db --mode sql
Enter fullscreen mode Exit fullscreen mode
mq-db  (.help for commands  .quit to exit)
mode: sql  (.mode mq | .mode sql)

sql> SELECT content FROM blocks WHERE block_type = 'heading' LIMIT 3;
┌──────────────────┐
│ content          │
├──────────────────┤
│ Overview         │
│ Architecture     │
│ Query Engine     │
└──────────────────┘
(3 rows)

sql> .mode mq
→ mq mode
mq> .h2
## Architecture
## Query Engine
Enter fullscreen mode Exit fullscreen mode

HTTP Server

mq-db serve --db store.mq-db  # Listens on 127.0.0.1:7878
Enter fullscreen mode Exit fullscreen mode

The server exposes three simple endpoints:

Method Path Description
GET /health Returns internal engine status and document count
POST /sql Executes SQL queries and returns rows as JSON
POST /mq Evaluates mq expressions and returns results
curl -s -X POST http://127.0.0.1:7878/sql \
  -H 'Content-Type: application/json' \
  -d '{"query":"SELECT block_type, count(*) FROM blocks GROUP BY block_type"}'
Enter fullscreen mode Exit fullscreen mode

Library API

mq-db can also be integrated directly into your Rust projects as a library:

use mq_db::{DocumentStore, SqlEngine, MqEngine};

let mut store = DocumentStore::new();
store.add_file("docs/DESIGN.md")?;
store.add_str("# Hello\n\n## Architecture\n\nDetails\n")?;

// Chainable query API
let chunks = store.query()
    .documents(|doc| doc.zone_maps.heading_contents.contains("Architecture"))
    .under_heading("Architecture", Some(2))
    .filter(|b| matches!(b.block_type, BlockType::Paragraph | BlockType::Code))
    .blocks();

// SQL Evaluation
let engine = SqlEngine::new(&store)?;
let out = engine.execute(
    "SELECT content FROM blocks WHERE block_type = 'heading' ORDER BY pre"
)?;
print!("{}", out.to_table());

// Serialization
store.save("store.mq-db")?;
let store = DocumentStore::load("store.mq-db")?;
Enter fullscreen mode Exit fullscreen mode

Installation

# Install script (Linux / macOS)
curl -fsSL https://raw.githubusercontent.com/harehare/mq-db/main/bin/install.sh | bash

# Via Cargo
cargo install mq-db
Enter fullscreen mode Exit fullscreen mode

Use Cases

  • RAG Preprocessing: Extract precise section content by heading for LLM embedding pipelines.
  • Documentation Search: Run fast SQL queries across extensive doc sites without deploying heavy search infrastructure.
  • Structural Linting: Easily detect style violations (e.g., missing subheadings, empty sections, or incorrect list placement).
  • Knowledge Base Analytics: Aggregate, group, and discover patterns across thousands of personal or team notes.

Resources


⭐ If you find mq-db useful, dropping a star on GitHub is highly appreciated!

Top comments (0)