Every time I wanted to feed a web page into ChatGPT or Claude, I found myself doing the same tedious dance:
- Select all the text
- Copy-paste into a text editor
- Manually strip out navigation, ads, and cookie banners
- Reformat the headings and code blocks
- Wonder if I got all the relevant content
After doing this dozens of times, I built Page to Markdown — a Chrome extension that does it in one click.
What It Does
Click the extension icon on any web page. You get:
- Clean Markdown with headings, lists, tables, code blocks, images, and links preserved
- Token count so you know exactly how much of your context window you're using
- Word and character counts for quick reference
- Copy to clipboard or download as a .md file
It strips out everything you don't need: navigation bars, footers, sidebars, ads, cookie banners, social media widgets, and other boilerplate.
Why Markdown?
If you're building AI workflows, Markdown is the ideal intermediate format:
- LLMs understand it natively — headings convey hierarchy, code blocks preserve formatting
- It's compact — typically 50-70% fewer tokens than raw HTML
- It's portable — works with every LLM, RAG pipeline, and note-taking tool
- It's readable — you can verify what you're feeding into your prompt
How It Works (No API Calls)
The extension runs entirely in your browser. Zero API calls, zero data sent anywhere.
Here's the approach:
- Find the main content: it walks through <article>, <main>, [role='main'], .post-content, and falls back to <body>
- Strip non-content elements: removes <nav>, <footer>, <aside>, cookie banners, social widgets, ad containers, and modals
- Convert HTML to Markdown: a recursive converter handles each element type. Headings become #, lists become -, tables become pipe-delimited, code blocks get proper fencing
- Count tokens: estimates using the ~4 characters per token heuristic (close enough for planning purposes)
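To make the first two steps concrete, here is a minimal sketch, not the extension's actual source. The container selectors and the nav/footer/aside tags come from the list above; the function names and the class-based selectors for cookie banners, social widgets, ads, and modals are hypothetical stand-ins, since real pages name those elements in many different ways.

```javascript
// Candidate containers in priority order (from the list above).
const CONTENT_SELECTORS = ["article", "main", "[role='main']", ".post-content"];

// Elements treated as boilerplate. The tag names come from the post; the
// class and role selectors are hypothetical examples, not a complete list.
const BOILERPLATE_SELECTORS = [
  "nav", "footer", "aside",
  ".cookie-banner", ".social-share", ".ad", "[role='dialog']",
];

// Return the first matching container, or <body> as the fallback.
function findMainContent(doc) {
  for (const selector of CONTENT_SELECTORS) {
    const match = doc.querySelector(selector);
    if (match) return match;
  }
  return doc.body;
}

// Remove boilerplate from a deep clone so the live page is left untouched.
function stripBoilerplate(contentRoot) {
  const clone = contentRoot.cloneNode(true);
  for (const el of clone.querySelectorAll(BOILERPLATE_SELECTORS.join(","))) {
    el.remove();
  }
  return clone;
}
```

In a content script this might be called as `stripBoilerplate(findMainContent(document))`. Cloning first is a design choice in this sketch: it means the page the reader is looking at never changes.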
The whole thing is about 300 lines of JavaScript. No dependencies, no build step.
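For a sense of what a recursive converter like this can look like, here is a small sketch under stated assumptions. It is not the extension's code: it covers only headings, paragraphs, lists, code, and links, it skips tables and images, and the function names are made up for illustration. It reads only standard DOM properties (`nodeType`, `tagName`, `childNodes`, `textContent`, `getAttribute`).

```javascript
const ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE
const TEXT_NODE = 3;    // same value as Node.TEXT_NODE
const FENCE = "`".repeat(3); // Markdown code fence

// Convert every child of a node and join the results.
function convertChildren(node) {
  return Array.from(node.childNodes || []).map(toMarkdown).join("");
}

// Recursively convert one DOM node to Markdown.
function toMarkdown(node) {
  if (node.nodeType === TEXT_NODE) {
    // Collapse whitespace runs, roughly as a browser does when rendering.
    return node.textContent.replace(/\s+/g, " ");
  }
  if (node.nodeType !== ELEMENT_NODE) return "";

  const tag = node.tagName.toLowerCase();
  const heading = /^h([1-6])$/.exec(tag);
  if (heading) {
    const level = Number(heading[1]);
    return "#".repeat(level) + " " + convertChildren(node).trim() + "\n\n";
  }

  switch (tag) {
    case "p":
      return convertChildren(node).trim() + "\n\n";
    case "ul":
    case "ol":
      // Only element children matter here; whitespace between <li> tags is noise.
      return Array.from(node.childNodes)
        .filter((child) => child.nodeType === ELEMENT_NODE)
        .map(toMarkdown)
        .join("") + "\n";
    case "li":
      return "- " + convertChildren(node).trim() + "\n";
    case "pre":
      // Keep code verbatim: use the raw text instead of recursing.
      return FENCE + "\n" + node.textContent.replace(/\n$/, "") + "\n" + FENCE + "\n\n";
    case "code":
      return "`" + node.textContent + "`";
    case "a":
      return "[" + convertChildren(node).trim() + "](" + (node.getAttribute("href") || "") + ")";
    default:
      // Unknown wrappers (div, section, span, ...) just pass their children through.
      return convertChildren(node);
  }
}
```

Each element type gets one small case, and anything unrecognized falls through to its children, which is how a converter of this shape can stay short without a library.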
The Token Counting Angle
This is the feature I actually use most. Before pasting content into an LLM, I want to know:
- Will this fit in my context window?
- How much of my budget am I using?
- Should I split this across multiple prompts?
The extension shows the estimate (for example, ~1,580 tokens) right in the popup. Quick mental math, no surprises.
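The three questions above reduce to a little arithmetic on top of the ~4 characters per token heuristic. The sketch below is illustrative only: `planPrompt`, `contextWindow`, and `reservedForReply` are hypothetical names, and rounding up is an assumption, not necessarily what the extension does.

```javascript
const CHARS_PER_TOKEN = 4; // rough heuristic from the post, not a real tokenizer

// Estimate the token count from the character count, rounding up (an assumption).
function estimateTokens(text) {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Answer the three planning questions for a given context window.
// contextWindow and reservedForReply are in tokens; both names are hypothetical.
function planPrompt(text, contextWindow, reservedForReply = 0) {
  const tokens = estimateTokens(text);
  const budget = contextWindow - reservedForReply;
  return {
    tokens,                                    // estimated size of the content
    fits: tokens <= budget,                    // will it fit?
    shareOfBudget: tokens / budget,            // how much of the budget is used?
    promptsNeeded: Math.ceil(tokens / budget), // how many prompts if split evenly?
  };
}
```

For example, 6,320 characters of Markdown comes out at 1,580 estimated tokens, so `planPrompt(markdown, 8000, 1000)` would report that it fits in a single prompt and uses a bit under a quarter of the 7,000-token budget.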
Try It
The extension is free — 3 conversions per day, unlimited with Pro.
Privacy: No accounts, no tracking, no data leaves your browser.
Built by The CodeFather — making data offers you can't refuse.
What's your workflow for feeding web content into LLMs? I'd love to hear how others handle this.
Top comments (1)
Solid take. The 300-line no-build approach is the right call here — most markdown converters either pull in a Turndown-style 50KB dep or hit a server, neither of which I want on a hot path.
One thing I noticed that you may already be planning: the token count popup only fires on the full-page extraction. The flow I keep wanting is the same extraction but scoped to what I actually selected. If I'm reading a long doc and just want to send paragraphs 3-7 to Claude, I highlight, hit the extension, and get the token count for just that slice — not the whole page. The selection-based variant is maybe 40 more lines (window.getSelection() + Range, then run the same content walk on the cloned subtree).
Second small thing from actually using converters like this: the ~4-chars-per-token heuristic undercounts for code and table-heavy pages and overcounts for prose. gpt-tokenizer in pure JS is ~6KB and tracks the real count close enough that I'd trust it for budget planning. Worth it if you ever ship a 'split into chunks' feature.
The 3/day free + unlimited Pro is the kind of pricing I respect. I would have paid $5 once for the no-server version.