Managing large collections of Markdown documents (documentation sites, knowledge bases, notes) often means resorting to a messy combination of grep and regex.
mq-db takes a different approach: it parses every Markdown element into a typed block, builds interval and secondary indexes, and exposes the result through SQL and a jq-inspired query language called mq.
What mq-db Does
mq-db turns Markdown files into a queryable, embedded database with zero external dependencies.
# Index a set of documents
mq-db index docs/ --recursive --output store.mq-db
# Run SQL across all indexed files
mq-db sql "SELECT block_type, count(*) FROM blocks GROUP BY block_type" --db store.mq-db
# Or use the interactive TUI
mq-db tui --db store.mq-db
Every Markdown element (heading, paragraph, code block, list item, table cell) becomes a Block row with a block_type, content, and structural metadata.
The Block Model
Parsing a Markdown file produces a flat vector of blocks:
struct Block {
id: u32,
document_id: u32,
block_type: BlockType, // Heading, Paragraph, Code, List, etc.
content: String,
span: Option<Span>, // Source line/column
pre: u32, // Interval index: pre-order
post: u32, // Interval index: post-order
properties: Properties, // depth, lang, ordered, etc.
}
The pre and post fields encode heading hierarchy using a Nested Set (Pre-Post Order) traversal. An ancestor check ("is block A inside section B?") reduces to a single integer comparison:
A is_under B ↔ B.pre < A.pre AND A.post < B.post
This eliminates the need for expensive tree traversal at query time, keeping lookup performance at $O(1)$.
Index Layers
To ensure high performance, mq-db applies a cheapest-first indexing strategy across three layers before any blocks are read from disk:
-
Layer 1: Zone Maps. Per-document statistics stored in the
.mq-dbfile. If a document cannot contain the requested heading text, language tag, or depth, it is skipped entirely. -
Layer 2: Interval Index. Narrows the candidate range to a specific section using the
(pre, post)pair of the anchor heading. -
Layer 3: Secondary Indexes. Includes a
BitmapIndexforblock_type, aBTreeIndexforpre/postranges, and aHashIndexfor exact matches oncontent,lang, anddepth. The SQL predicate pushdown picks anIndexHintautomatically.
SQL Queries
The custom SQL engine is based on sqlparser and runs entirely in-memory with no SQLite dependency.
Block type distribution across all documents
SELECT block_type, count(*) AS total
FROM blocks
GROUP BY block_type
ORDER BY total DESC;
┌─────────────┬───────┐
│ block_type │ total │
├─────────────┼───────┤
│ paragraph │ 241 │
│ heading │ 89 │
│ code │ 73 │
└─────────────┴───────┘
H1 titles per document
SELECT d.path, b.content
FROM blocks b
JOIN documents d ON d.id = b.document_id
WHERE b.block_type = 'heading' AND b.depth = 1;
All content inside a section with under()
The under() scalar function leverages the interval index for fast hierarchy queries:
SELECT b.block_type, b.content
FROM blocks b
WHERE under(
b.pre, b.post,
(SELECT pre FROM blocks WHERE block_type = 'heading' AND content = 'Architecture'),
(SELECT post FROM blocks WHERE block_type = 'heading' AND content = 'Architecture')
)
ORDER BY b.pre;
This instantly retrieves every block nested under the Architecture section across all indexed documents in their original order.
Inline mq with mq()
You can also run an mq program directly inside your SQL queries using the mq() scalar function:
SELECT mq('.h1 | to_text', content) AS title
FROM blocks
WHERE block_type = 'code' AND lang = 'markdown';
mq Queries
For quick command-line extractions, the mq engine accepts native jq-style expressions:
mq-db mq ".h1" --db store.mq-db
mq-db mq 'select(.code_lang == "rust")' --db store.mq-db
mq-db mq ".h | select(.depth == 2)" --db store.mq-db --format markdown
DDL: Custom In-Memory Tables
You can create custom tables from query results or define an explicit schema:
-- Snapshot headings into a custom table
CREATE TABLE headings AS
SELECT content, depth FROM blocks WHERE block_type = 'heading';
-- Explicit schema + insert
CREATE TABLE notes (id TEXT, body TEXT);
INSERT INTO notes VALUES ('1', 'Design decision: use interval index for hierarchy');
-- Inspect schema
SHOW TABLES;
DESC notes;
-- Clean up
DROP TABLE notes;
Interactive REPL
mq-db repl --db store.mq-db --mode sql
mq-db (.help for commands .quit to exit)
mode: sql (.mode mq | .mode sql)
sql> SELECT content FROM blocks WHERE block_type = 'heading' LIMIT 3;
┌──────────────────┐
│ content │
├──────────────────┤
│ Overview │
│ Architecture │
│ Query Engine │
└──────────────────┘
(3 rows)
sql> .mode mq
→ mq mode
mq> .h2
## Architecture
## Query Engine
HTTP Server
mq-db serve --db store.mq-db # Listens on 127.0.0.1:7878
The server exposes three simple endpoints:
| Method | Path | Description |
|---|---|---|
GET |
/health |
Returns internal engine status and document count |
POST |
/sql |
Executes SQL queries and returns rows as JSON |
POST |
/mq |
Evaluates mq expressions and returns results |
curl -s -X POST http://127.0.0.1:7878/sql \
-H 'Content-Type: application/json' \
-d '{"query":"SELECT block_type, count(*) FROM blocks GROUP BY block_type"}'
Library API
mq-db can also be integrated directly into your Rust projects as a library:
use mq_db::{DocumentStore, SqlEngine, MqEngine};
let mut store = DocumentStore::new();
store.add_file("docs/DESIGN.md")?;
store.add_str("# Hello\n\n## Architecture\n\nDetails\n")?;
// Chainable query API
let chunks = store.query()
.documents(|doc| doc.zone_maps.heading_contents.contains("Architecture"))
.under_heading("Architecture", Some(2))
.filter(|b| matches!(b.block_type, BlockType::Paragraph | BlockType::Code))
.blocks();
// SQL Evaluation
let engine = SqlEngine::new(&store)?;
let out = engine.execute(
"SELECT content FROM blocks WHERE block_type = 'heading' ORDER BY pre"
)?;
print!("{}", out.to_table());
// Serialization
store.save("store.mq-db")?;
let store = DocumentStore::load("store.mq-db")?;
Installation
# Install script (Linux / macOS)
curl -fsSL https://raw.githubusercontent.com/harehare/mq-db/main/bin/install.sh | bash
# Via Cargo
cargo install mq-db
Use Cases
- RAG Preprocessing: Extract precise section content by heading for LLM embedding pipelines.
- Documentation Search: Run fast SQL queries across extensive doc sites without deploying heavy search infrastructure.
- Structural Linting: Easily detect style violations (e.g., missing subheadings, empty sections, or incorrect list placement).
- Knowledge Base Analytics: Aggregate, group, and discover patterns across thousands of personal or team notes.
Resources
⭐ If you find mq-db useful, dropping a star on GitHub is highly appreciated!

Top comments (0)