
nabbisen


mdka v1.5 is out - HTML to Markdown converter developed with Rust

mdka v1.5 has been released. It is an HTML to Markdown converter written in Rust.

This release introduces bindings for Node.js in addition to those for Python.

From a Rust lover in training 🤍

nabbisen / mdka-rs

An HTML to Markdown (MD) converter that balances conversion quality with runtime efficiency.

mdka

An HTML to Markdown converter written in Rust.


mdka balances conversion quality with runtime efficiency: readable output from real-world HTML, without sacrificing speed or memory.
"ka" means "化 (か)", pointing to conversion.


Why mdka?

There are several good HTML-to-Markdown converters in the Rust ecosystem. mdka's specific focus is:

  • Reliable output from diverse HTML sources. It is built on scraper, which uses html5ever, the HTML5 parser from the Servo browser engine. html5ever applies the same parsing algorithm that web browsers use, so it handles malformed tags, deeply nested structures, CMS output, and SPA-rendered DOM without special-casing.
  • Crash resistance. Conversion uses non-recursive DFS throughout, so the call stack does not overflow no matter the nesting depth.
  • Configurable pre-processing. Five conversion modes let you tune what gets kept or stripped, from noise-free LLM input to lossless archiving.
  • Multi-language. The same Rust implementation is accessible from Node.js (napi-rs) and Python (PyO3).
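
To illustrate the "non-recursive DFS" idea from the list above, here is a minimal sketch using only the standard library. It is not mdka's actual code: the `Node` and `Step` types, the `to_markdown` function, and the tiny tag-to-marker tables are all hypothetical names made up for this example. The point is only that an explicit stack on the heap replaces the call stack, so traversal depth is bounded by available memory rather than by thread stack size.

```rust
// Hypothetical toy DOM node; NOT the type used by mdka or scraper.
enum Node {
    Text(String),
    Element { tag: String, children: Vec<Node> },
}

// Work items kept on an explicit, heap-allocated stack.
enum Step<'a> {
    Enter(&'a Node), // visit a node
    Exit(&'a str),   // emit the closing marker for an element
}

// Opening Markdown marker for a tag (illustrative subset only).
fn open_marker(tag: &str) -> &'static str {
    match tag {
        "strong" => "**",
        "em" => "*",
        "h1" => "# ",
        _ => "",
    }
}

// Closing Markdown marker for a tag (illustrative subset only).
fn close_marker(tag: &str) -> &'static str {
    match tag {
        "strong" => "**",
        "em" => "*",
        "h1" | "p" => "\n\n",
        _ => "",
    }
}

// Depth-first traversal driven by a loop instead of recursion.
fn to_markdown(root: &Node) -> String {
    let mut out = String::new();
    let mut stack = vec![Step::Enter(root)];

    while let Some(step) = stack.pop() {
        match step {
            Step::Enter(Node::Text(text)) => out.push_str(text),
            Step::Enter(Node::Element { tag, children }) => {
                out.push_str(open_marker(tag));
                // Schedule the closing marker first so it pops last.
                stack.push(Step::Exit(tag.as_str()));
                // Push children in reverse so they pop in document order.
                for child in children.iter().rev() {
                    stack.push(Step::Enter(child));
                }
            }
            Step::Exit(tag) => out.push_str(close_marker(tag)),
        }
    }
    out
}
```

Calling `to_markdown` on a `p` element that contains the text `Hello ` followed by a `strong` element wrapping `world` walks the tree in document order and closes each element after its children. A real converter also has to handle whitespace, escaping, lists, links, and so on; this sketch leaves all of that out.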
…
