The markdown-query package brings the functionality of the mq Markdown processor to Python. mq uses a jq-like syntax to filter, transform, and extract data from Markdown documents.
Installation
pip install markdown-query
Basic Usage
The primary interface is the run() function, which takes three parameters: a query string, the Markdown content, and an optional configuration object (None in the examples that need no configuration).
import mq
markdown = """# Title
This is a paragraph.
## Section
- Item 1
- Item 2
```javascript
console.log("hello");
```
"""
# Extract all headings
result = mq.run("select(.h1 || .h2)", markdown, None)
print(result.values)
# ['# Title', '## Section']
# Get heading text without markdown formatting
result = mq.run("select(.h1 || .h2) | to_text()", markdown, None)
print(result.values)
# ['Title', 'Section']
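The heading strings returned by the first query carry their level in the leading # characters, so they can be turned into a simple outline. The following is a minimal sketch, assuming the values look like the commented output above; heading_outline and render_outline are hypothetical helpers, not part of the package.

```python
def heading_outline(values):
    """Turn ATX heading strings such as '## Section' into (level, title) pairs."""
    outline = []
    for value in values:
        text = str(value).strip()
        marker = text.split(" ", 1)[0]
        # Only a run of 1-6 '#' characters followed by a space counts as a heading.
        if marker and set(marker) == {"#"} and len(marker) <= 6:
            outline.append((len(marker), text[len(marker):].strip()))
    return outline


def render_outline(outline, indent="  "):
    """Render (level, title) pairs as an indented plain-text outline."""
    return "\n".join(f"{indent * (level - 1)}- {title}" for level, title in outline)


# Sample input copied from the commented output of the first query.
print(render_outline(heading_outline(["# Title", "## Section"])))
```

With the package installed, the list passed to heading_outline would be result.values from the "select(.h1 || .h2)" query.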
Working with Different Input Formats
The library supports multiple input formats through the Options class:
import mq
# Process HTML input
html = '<h1>Title</h1><p>Content</p><ul><li>Item</li></ul>'
options = mq.Options()
options.input_format = mq.InputFormat.HTML
result = mq.run(".h1 | upcase()", html, options)
print(result.values)
# ['# TITLE']
Available input formats:
- InputFormat.MARKDOWN (default)
- InputFormat.HTML
- InputFormat.MDX
- InputFormat.TEXT
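When the input comes from files, the format can be chosen from the file suffix. This is a sketch only: the suffix table and both helper names are assumptions made for illustration, and the sole package names used are Options and the four InputFormat members listed above.

```python
from pathlib import Path

# Hypothetical mapping from file suffix to the InputFormat member names listed above.
SUFFIX_TO_FORMAT = {
    ".md": "MARKDOWN",
    ".markdown": "MARKDOWN",
    ".html": "HTML",
    ".htm": "HTML",
    ".mdx": "MDX",
    ".txt": "TEXT",
}


def format_name_for(path, default="MARKDOWN"):
    """Return the InputFormat member name that matches a file's suffix."""
    return SUFFIX_TO_FORMAT.get(Path(path).suffix.lower(), default)


def options_for(path, mq_module):
    """Build an Options object for a file, given the imported mq module."""
    options = mq_module.Options()
    options.input_format = getattr(mq_module.InputFormat, format_name_for(path))
    return options


print(format_name_for("notes/page.HTML"))
```

With the package installed, options_for("page.html", mq) would be passed as the third argument to mq.run().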
Query Examples
Extract Code Blocks
# Get all code blocks
code_blocks = mq.run("select(.code)", markdown, None)
# Get code block content as text
code_text = mq.run("select(.code) | to_text()", markdown, None)
Filter by Content
# Find headings containing specific text
headings = mq.run('select(.h1 || .h2) | select(test("Section"))', markdown, None)
# Extract list items
items = mq.run(".[] | to_text()", markdown, None)
Transform Content
# Convert headings to uppercase
upper_headings = mq.run("select(.h1 || .h2) | upcase()", markdown, None)
# Replace text in paragraphs
modified = mq.run('select(.paragraph) | gsub("paragraph", "text")', markdown, None)
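Several of these queries are often applied to the same document. One way to keep them together is a small helper that runs a dictionary of named queries; run_queries and QUERIES are hypothetical names, and the helper takes the run callable as an argument so it makes no assumption beyond the (query, content, options) signature and the values list described in this article.

```python
def run_queries(run, queries, content, options=None):
    """Run each named query against the same content.

    `run` is any callable with the (query, content, options) signature
    used in this article, for example mq.run.
    """
    results = {}
    for name, query in queries.items():
        results[name] = list(run(query, content, options).values)
    return results


# Query strings taken from the examples in this article.
QUERIES = {
    "headings": "select(.h1 || .h2) | to_text()",
    "code": "select(.code) | to_text()",
}

# Usage, assuming the package is installed and `markdown` holds the document:
#   import mq
#   extracted = run_queries(mq.run, QUERIES, markdown)
#   print(extracted["headings"])
```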
Accessing Result Data
The run() function returns an MQResult object with a values list:
result = mq.run("select(.h1)", markdown, None)
# Access all values
for value in result.values:
    print(value)
# Access individual items
first_heading = result[0]
print(first_heading.text) # Text content
print(first_heading.markdown_type) # MarkdownType enum
# Iterate over results
for item in result:
    print(f"Type: {item.markdown_type}, Text: {item.text}")
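Because each item exposes markdown_type and text, results can be grouped by node type. The sketch below uses a namedtuple as a stand-in for the real result items, and the type names "Heading" and "Code" are placeholders rather than the actual MarkdownType values.

```python
from collections import defaultdict, namedtuple


def group_by_type(items):
    """Group item texts by markdown_type, keeping document order within each group."""
    groups = defaultdict(list)
    for item in items:
        groups[str(item.markdown_type)].append(item.text)
    return dict(groups)


# Stand-in for the items an MQResult yields; only the two attributes
# shown in this article (markdown_type, text) are modeled.
Item = namedtuple("Item", ["markdown_type", "text"])

sample = [
    Item("Heading", "Title"),
    Item("Code", 'console.log("hello");'),
    Item("Heading", "Section"),
]
grouped = group_by_type(sample)
print(grouped)
```

With the package installed, the result of mq.run() could be passed to group_by_type directly, since the result object is iterable.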
Integration with Other Tools
The library works well with other Python markdown processing tools:
from markitdown import MarkItDown
import mq
# Convert web pages to markdown, then process with mq
markitdown = MarkItDown()
result = markitdown.convert("https://example.com")
# Extract specific content
code_samples = mq.run(".code | to_text()", result.text_content, None)
all_links = mq.run(".link | to_html()", result.text_content, None)
Configuration Options
The Options class provides additional configuration:
options = mq.Options()
options.input_format = mq.InputFormat.HTML
options.list_style = mq.ListStyle.PLUS # Use + for lists
options.link_title_style = mq.TitleSurroundStyle.SINGLE
options.link_url_style = mq.UrlSurroundStyle.ANGLE
result = mq.run("select(.list)", content, options)  # content: the HTML string to process
Error Handling
Queries that fail raise a RuntimeError (PyRuntimeError on the Rust side):
try:
    result = mq.run("invalid_query", markdown, None)
except RuntimeError as e:
    print(f"Query failed: {e}")
Performance
The library is built on Rust and compiled to native code, providing fast processing for large markdown files.
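To check how this holds for a particular workload, queries can be timed directly. The helper below is a generic sketch with a hypothetical name (best_time), and the document size in the usage comment is arbitrary.

```python
import time


def best_time(func, *args, repeat=5):
    """Return the fastest of `repeat` wall-clock timings of func(*args), in seconds."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Usage, assuming the package is installed:
#   import mq
#   large_doc = "# Title\n\nSome paragraph.\n\n" * 10_000
#   print(f"{best_time(mq.run, 'select(.h1)', large_doc, None):.4f} s")
```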
Support
- 🐛 Report bugs
- 💡 Request features
- ⭐ Star the project if you find it useful!
The markdown-query package provides a straightforward way to apply mq's Markdown processing capabilities in Python applications, from simple content extraction to complex document transformations.