Osaigbovo dil Omere

Posted on • Originally published at substack.com

Defuddle Turns Any Web Page Into Clean Markdown From the Terminal

You have seen the problem. You want the article. The page gives you the article plus a sidebar, a comments section, a cookie notice, four related posts injected into the body, and whatever else the CMS decided to throw in.

Defuddle strips all of that. It extracts the main content from a web page and returns it as clean HTML or Markdown. One library, one job.

The fastest way to try it

No install needed:

npx defuddle parse https://example.com/article --markdown

That prints clean Markdown to stdout. Pipe it wherever you want.

npx defuddle parse https://example.com/article --markdown > output.md

Want the metadata too?

npx defuddle parse https://example.com/article --json

That returns the content alongside title, author, description, domain, publication date, and word count. Enough to generate frontmatter automatically.
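
Here is a minimal sketch of that frontmatter step. It assumes the JSON keys match the response fields named in this article (title, author, description, domain, published, wordCount, content); check the real output before relying on it. The `toFrontmatter` helper and the sample object are hypothetical, made up for illustration.

```javascript
// Sketch: build YAML frontmatter from Defuddle's JSON output.
// Assumption: the JSON keys are title, author, description, domain,
// published, wordCount, and content. Verify against the actual output.
function toFrontmatter(meta) {
  const keys = ['title', 'author', 'description', 'domain', 'published', 'wordCount'];
  const lines = keys
    // Skip fields the page did not provide
    .filter((key) => meta[key] !== undefined && meta[key] !== null && meta[key] !== '')
    // JSON.stringify quotes and escapes strings in a form YAML accepts
    .map((key) => `${key}: ${JSON.stringify(meta[key])}`);
  return ['---', ...lines, '---', '', meta.content ?? ''].join('\n');
}

// Hypothetical sample standing in for the parsed output of --json
const sample = {
  title: 'Example Article',
  author: 'Jane Doe',
  domain: 'example.com',
  wordCount: 1200,
  content: '# Example Article\n\nBody text.',
};

console.log(toFrontmatter(sample));
```

In practice you would replace `sample` with the result of `JSON.parse` on the CLI output. Missing fields are dropped rather than written as empty keys, so sparse pages still produce valid frontmatter.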

In Node.js

import { JSDOM } from 'jsdom';
import { Defuddle } from 'defuddle/node';

// Define the URL once so it can be passed to both JSDOM and Defuddle
const url = 'https://example.com/article';
const dom = await JSDOM.fromURL(url);
const result = await Defuddle(dom, url, { markdown: true });

console.log(result.title);
console.log(result.content);

The response object carries the metadata alongside the content: author, content, description, domain, published, wordCount, and more.

Why it is different from Readability

Mozilla Readability powers Firefox reader mode and has been the default choice for this kind of extraction. Defuddle is more forgiving: it removes less when uncertain, and it standardises HTML before returning it. Footnotes, code blocks, and math elements come back in consistent formats. If you are feeding the output into a Markdown converter, that consistency matters.

Worth knowing

Defuddle is explicitly a work in progress. kepano (Steph Ango, CEO of Obsidian) built it as the extraction layer for Obsidian Web Clipper. If you use Web Clipper, you have already been downstream of it.

Try the playground at defuddle.md before building anything on top of it.
