Takahiro Sato

Posted on Jul 27

Processing Markdown in Python with mq

#python #markdown #cli #ai

The markdown-query package brings the functionality of the mq Markdown processor to Python. mq uses a jq-like syntax to filter, transform, and extract data from Markdown documents.

Installation

pip install markdown-query

Basic Usage

The primary interface is the run() function, which takes three arguments: a query string, the Markdown content, and an optional Options object (pass None to use the defaults).

import mq

markdown = """# Title

This is a paragraph.

## Section

- Item 1
- Item 2

```javascript
console.log("hello");
```
"""

# Extract all headings
result = mq.run("select(.h1 || .h2)", markdown, None)
print(result.values)
# ['# Title', '## Section']

# Get heading text without markdown formatting
result = mq.run("select(.h1 || .h2) | to_text()", markdown, None)
print(result.values)
# ['Title', 'Section']
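Because result.values holds rendered Markdown strings such as '## Section', the heading query above is already enough to build a simple table of contents. Here is a minimal sketch. build_toc is a hypothetical helper (not part of markdown-query), and it assumes ATX-style '#' headings like the ones shown in the output above:

```python
def build_toc(headings: list[str]) -> list[str]:
    """Turn rendered Markdown headings (e.g. '## Section') into nested bullet lines."""
    toc = []
    for heading in headings:
        # The number of leading '#' characters is the heading level
        stripped = heading.lstrip("#")
        level = len(heading) - len(stripped)
        if level == 0:
            continue  # not an ATX heading, so skip it
        title = stripped.strip()
        # Indent two spaces per level below h1
        toc.append("  " * (level - 1) + f"- {title}")
    return toc


if __name__ == "__main__":
    for line in build_toc(["# Title", "## Section"]):
        print(line)
```

With mq you would pass the query result straight in: build_toc(mq.run("select(.h1 || .h2)", markdown, None).values).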

Working with Different Input Formats

The library supports multiple input formats through the Options class:

import mq

# Process HTML input
html = '<h1>Title</h1><p>Content</p><ul><li>Item</li></ul>'

options = mq.Options()
options.input_format = mq.InputFormat.HTML

result = mq.run(".h1 | upcase()", html, options)
print(result.values)
# ['# TITLE']

Available input formats:

InputFormat.MARKDOWN (default)
InputFormat.HTML
InputFormat.MDX
InputFormat.TEXT
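If you process mixed files, you can choose the format from the file extension. The sketch below is an assumption of how you might do that. The extension mapping and the input_format_name helper are hypothetical, and the returned strings simply match the InputFormat member names listed above:

```python
from pathlib import Path

# Hypothetical mapping from file extension to an InputFormat member name
_EXTENSION_TO_FORMAT = {
    ".md": "MARKDOWN",
    ".markdown": "MARKDOWN",
    ".html": "HTML",
    ".htm": "HTML",
    ".mdx": "MDX",
    ".txt": "TEXT",
}


def input_format_name(path: str) -> str:
    """Return the InputFormat member name for a file, defaulting to MARKDOWN."""
    suffix = Path(path).suffix.lower()
    return _EXTENSION_TO_FORMAT.get(suffix, "MARKDOWN")
```

You would then set options.input_format = getattr(mq.InputFormat, input_format_name(path)) before calling mq.run. Falling back to MARKDOWN mirrors the library's default.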

Query Examples

Extract Code Blocks

# Get all code blocks
code_blocks = mq.run("select(.code)", markdown, None)

# Get code block content as text
code_text = mq.run("select(.code) | to_text()", markdown, None)
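If you need the language tag as well as the body, you can post-process the values in Python. This sketch assumes that select(.code) returns each block rendered as fenced Markdown (by analogy with headings coming back as '# Title'); that format is not confirmed here. split_fenced_block is a hypothetical helper:

```python
def split_fenced_block(block: str) -> tuple[str, str]:
    """Split a rendered fenced code block into (language, body).

    Assumes the block opens with a fence of three or more backticks or
    tildes, optionally followed by an info string such as 'javascript'.
    """
    lines = block.strip("\n").splitlines()
    if not lines or lines[0][:3] not in ("`" * 3, "~" * 3):
        return "", block  # not a fenced block, so return it unchanged
    fence_char = lines[0][0]
    # The info string is whatever follows the run of fence characters
    info = lines[0].lstrip(fence_char).strip()
    language = info.split()[0] if info else ""
    body_lines = lines[1:]
    # Drop the closing fence line if present
    if body_lines and body_lines[-1].strip().startswith(fence_char * 3):
        body_lines = body_lines[:-1]
    return language, "\n".join(body_lines)
```

For the sample document, applying it to each entry of code_blocks.values would give ("javascript", 'console.log("hello");') if the assumed rendering holds.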

Filter by Content

# Find headings containing specific text
headings = mq.run('select(.h1 || .h2) | select(test("Section"))', markdown, None)

# Extract list items
items = mq.run(".[] | to_text()", markdown, None)
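The heading filter reads left to right. Each | stage receives every value produced by the previous stage, and select keeps a value only when its condition holds. The sketch below models those semantics in plain Python over a flat list of lines. It is a conceptual illustration with hypothetical pipe and select helpers, not how mq is implemented (mq queries parsed Markdown nodes rather than raw lines):

```python
import re
from typing import Callable, Iterable

# A stage maps one input value to zero or more output values
Stage = Callable[[str], list[str]]


def select(predicate: Callable[[str], bool]) -> Stage:
    """Keep a value if predicate(value) is true, otherwise drop it."""
    return lambda value: [value] if predicate(value) else []


def pipe(values: Iterable[str], *stages: Stage) -> list[str]:
    """Feed every value through each stage in turn, like 'a | b | c'."""
    current = list(values)
    for stage in stages:
        current = [out for value in current for out in stage(value)]
    return current


def is_h1_or_h2(line: str) -> bool:
    # One or two '#' followed by a space marks an h1 or h2 line
    return re.match(r"#{1,2} ", line) is not None


lines = ["# Title", "This is a paragraph.", "## Section", "- Item 1"]
print(pipe(lines, select(is_h1_or_h2), select(lambda v: re.search("Section", v) is not None)))
# ['## Section']
```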

Transform Content

# Convert headings to uppercase
upper_headings = mq.run("select(.h1 || .h2) | upcase()", markdown, None)

# Replace text in paragraphs
modified = mq.run('select(.paragraph) | gsub("paragraph", "text")', markdown, None)

Accessing Result Data

The run() function returns an MQResult object with a values list:

result = mq.run("select(.h1)", markdown, None)

# Access all values
for value in result.values:
    print(value)

# Access individual items
first_heading = result[0]
print(first_heading.text)           # Text content
print(first_heading.markdown_type)  # MarkdownType enum

# Iterate over results
for item in result:
    print(f"Type: {item.markdown_type}, Text: {item.text}")
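Since each item exposes markdown_type and text, grouping results by node type takes only a few lines. texts_by_type is a hypothetical helper that works with any objects carrying those two attributes. The check below uses SimpleNamespace stand-ins, and "Heading"/"List" are placeholder values, not the real MarkdownType members. With mq the dictionary keys will be whatever str() returns for the enum:

```python
from collections import defaultdict
from types import SimpleNamespace


def texts_by_type(items) -> dict[str, list[str]]:
    """Group the text of result items by their markdown_type.

    Works with any iterable of objects exposing `markdown_type` and `text`,
    such as the items of an MQResult as described above.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for item in items:
        groups[str(item.markdown_type)].append(item.text)
    return dict(groups)


# Stand-in objects for a quick local check; with mq you would pass `result`
sample = [
    SimpleNamespace(markdown_type="Heading", text="Title"),
    SimpleNamespace(markdown_type="Heading", text="Section"),
    SimpleNamespace(markdown_type="List", text="Item 1"),
]
print(texts_by_type(sample))
# {'Heading': ['Title', 'Section'], 'List': ['Item 1']}
```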

Integration with Other Tools

mq also pairs well with tools that produce Markdown. For example, Microsoft's MarkItDown converts web pages and documents to Markdown, which mq can then query:

from markitdown import MarkItDown
import mq

# Convert web pages to markdown, then process with mq
markitdown = MarkItDown()
result = markitdown.convert("https://example.com")

# Extract specific content
code_samples = mq.run(".code | to_text()", result.text_content, None)
all_links = mq.run(".link | to_html()", result.text_content, None)
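Assuming .link | to_html() yields one HTML anchor string per link (for example <a href="https://example.com">Example</a>), the standard library's html.parser can pull out the bare URLs. hrefs_from_html is a hypothetical helper, not part of markdown-query:

```python
from html.parser import HTMLParser


class _HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from HTMLParser
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def hrefs_from_html(fragments) -> list[str]:
    """Return unique link targets, in first-seen order, from HTML fragments."""
    parser = _HrefCollector()
    for fragment in fragments:
        parser.feed(fragment)
    parser.close()
    # dict.fromkeys removes duplicates while preserving order
    return list(dict.fromkeys(parser.hrefs))
```

You would call it as hrefs_from_html(all_links.values). Parsing with html.parser instead of a regex keeps quoting and entity handling correct.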

Configuration Options

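Beyond input_format, the Options class controls how results are rendered back to Markdown:

content = "<ul><li>Item 1</li><li>Item 2</li></ul>"

options = mq.Options()
options.input_format = mq.InputFormat.HTML
options.list_style = mq.ListStyle.PLUS                    # Use + for list markers
options.link_title_style = mq.TitleSurroundStyle.SINGLE   # Wrap link titles in single quotes
options.link_url_style = mq.UrlSurroundStyle.ANGLE        # Wrap link URLs in <angle brackets>

result = mq.run("select(.list)", content, options)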

Error Handling

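Queries that fail raise a RuntimeError. The Rust side of the binding raises PyRuntimeError, which surfaces in Python as the built-in RuntimeError: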

try:
    result = mq.run("invalid_query", markdown, None)
except RuntimeError as e:
    print(f"Query failed: {e}")
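When running many queries in a batch, you may prefer a failed query to yield an empty list instead of aborting the whole job. The sketch below is one way to do that. safe_values is a hypothetical wrapper that takes the run function as a parameter, so it can be tested without mq installed, and it assumes only the behavior described above (a result with a values list, or a RuntimeError):

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def safe_values(run: Callable[..., Any], query: str, content: str,
                options: Optional[Any] = None) -> list:
    """Call run(query, content, options) and return its values, or [] on failure.

    `run` is expected to behave like mq.run: return an object with a
    `values` list, or raise RuntimeError when the query fails.
    """
    try:
        return list(run(query, content, options).values)
    except RuntimeError as exc:
        # Log and continue instead of aborting the batch
        logger.warning("mq query %r failed: %s", query, exc)
        return []
```

Usage looks like headings = safe_values(mq.run, "select(.h1)", markdown). Swallowing errors hides typos in queries, so keep the plain try/except when a bad query should stop the program.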

Performance

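mq is written in Rust, and markdown-query ships it as a compiled native extension, so parsing and querying run as native code rather than in the Python interpreter. This keeps processing fast even for large Markdown files.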

The markdown-query package provides a straightforward way to apply mq's markdown processing capabilities in Python applications, from simple content extraction to complex document transformations.