mdka v1.5 is released. It is an HTML to Markdown converter written in Rust.
Bindings for Node.js are introduced in addition to those for Python.
From a Rust lover in training.
nabbisen/mdka-rs
An HTML to Markdown (MD) converter that balances conversion quality with runtime efficiency.
mdka
An HTML to Markdown converter written in Rust.
mdka balances conversion quality with runtime efficiency: readable output from real-world HTML, without sacrificing speed or memory.
"ka" means "化 (か)", pointing to conversion.
Why mdka?
There are several good HTML-to-Markdown converters in the Rust ecosystem. mdka's specific focus is:
- Reliable output from diverse HTML sources. It is built on scraper, which uses html5ever, the HTML5 parser from the Servo browser engine. html5ever applies the same parsing algorithm that web browsers use, so it handles malformed tags, deeply nested structures, CMS output, and SPA-rendered DOM without special-casing.
- Crash resistance. Conversion uses non-recursive DFS throughout, so the call stack does not overflow, no matter the nesting depth.
- Configurable pre-processing. Five conversion modes let you tune what gets kept or stripped, from noise-free LLM input to lossless archiving.
- Multi-language. The same Rust implementation is accessible from Node.js (napi-rs) and Python (PyO3).
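To illustrate the crash-resistance point, here is a minimal sketch of a non-recursive depth-first traversal that uses an explicit stack. It is not mdka's actual code: the `Tree` type, its `add` and `walk` methods, and the arena layout are hypothetical names chosen for illustration, and only the Rust standard library is used.

```rust
/// Illustrative arena-backed tree (hypothetical, not mdka's or scraper's type).
/// Nodes live in flat Vecs and refer to children by index, so neither
/// traversal nor drop needs recursion.
struct Tree {
    tags: Vec<String>,
    children: Vec<Vec<usize>>,
}

impl Tree {
    fn new() -> Self {
        Tree { tags: Vec::new(), children: Vec::new() }
    }

    /// Adds a node under `parent` (`None` for a root) and returns its index.
    fn add(&mut self, tag: &str, parent: Option<usize>) -> usize {
        let id = self.tags.len();
        self.tags.push(tag.to_string());
        self.children.push(Vec::new());
        if let Some(p) = parent {
            self.children[p].push(id);
        }
        id
    }

    /// Pre-order DFS from `root` using an explicit, heap-allocated stack.
    /// Returns `(node index, depth)` pairs in visit order.
    fn walk(&self, root: usize) -> Vec<(usize, usize)> {
        let mut visited = Vec::new();
        let mut stack = vec![(root, 0usize)];
        while let Some((id, depth)) = stack.pop() {
            visited.push((id, depth));
            // Push children in reverse so the first child is popped first.
            for &child in self.children[id].iter().rev() {
                stack.push((child, depth + 1));
            }
        }
        visited
    }
}
```

Because the pending nodes sit in a `Vec` on the heap, the nesting depth is bounded by available memory rather than by the thread's call stack, which is the property the bullet above describes.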