Every frontend developer has lived this ticket: "Here's our marketing site, rebuild it in React." Or the freelance version: "Can you move our Webflow site to Next.js so we stop paying the subscription?"
You open DevTools. You start copying styles. You eyeball paddings. You recreate the DOM by hand, section by section, and three days later you have something that's almost like the original, except the hero spacing is slightly off and nobody can tell you why.
I did this enough times that I built a tool to kill the boring 80% of it: paste a URL, get an editable Next.js + Tailwind project. This post is about why that conversion is genuinely hard, the problems that make "just scrape the HTML" a naive answer, and the architecture that the job forces on you.
The problem isn't getting the HTML. It's everything after.
If the job were "fetch the markup," curl would solve it. The hard part is that a modern website is not its HTML. It's:
Computed styles, not authored styles. The useful styling lives in the rendered CSSOM after the cascade, media queries, and inheritance all resolve, not in the stylesheet you can download.
JavaScript-rendered content. A large share of any modern page doesn't exist until JS runs. Static fetching returns an empty shell with a root div and nothing inside it. Whatever framework they used to build the site has to actually execute before there's anything worth reading.
Lazy-loaded everything. Images, sections, and whole below-the-fold blocks don't load until the user scrolls toward them. Grab the page on load and you capture a skeleton with half its content missing.
Design intent buried in pixel soup. The raw DOM might be thousands of wrapper nodes deep. A usable React tree is maybe a few dozen meaningful components. The distance between those two representations is the entire problem, and nothing in the page hands it to you.
So the real task was never scraping. It's reconstructing design intent from a fully rendered page and re-emitting it as clean, componentized code. Once you frame it that way, the architecture almost designs itself.
The pipeline
1. Render the page like a real browser would
You can't parse what hasn't rendered, so step one is to load the target in a real, headless browser environment rather than fetching raw HTML. A static HTTP request sees the shell; a real browser sees the page a human sees.
The non-obvious work is making sure the page is fully materialized before you capture anything. That means waiting for the network and the framework to settle instead of grabbing the DOM the instant it's available, and it means dealing with content that only appears on interaction. The single biggest correctness win here is driving the page through a full scroll before capture, stepping down the viewport and letting lazy-loaded images and sections trigger, so that what you capture is the complete page and not the part that happened to be visible on load. Skip this and every conversion silently drops everything below the fold.
This stage is where most naive "website to code" attempts quietly fail, because they treat the page as a document instead of a running application.
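To make the scroll-before-capture idea concrete, here is a minimal TypeScript sketch. It is not the tool's actual code. The ScrollablePage interface and its three methods are hypothetical stand-ins for whatever browser driver you use, and the maxSteps cap is an assumed safeguard for pages that keep growing as you scroll.

```typescript
// Hypothetical minimal page handle. A real driver (a headless browser API)
// would be adapted to this shape; none of these names come from a library.
interface ScrollablePage {
  getScrollHeight(): Promise<number>; // current document height in px
  scrollTo(y: number): Promise<void>; // scroll the viewport to offset y
  waitForSettle(): Promise<void>;     // wait until lazy content has loaded
}

// Step down the page one viewport at a time. The height is re-read after
// every step because lazy-loaded sections can make the document grow.
async function scrollThrough(
  page: ScrollablePage,
  viewportHeight: number,
  maxSteps = 200, // assumed safety cap for infinite-scroll pages
): Promise<number> {
  let y = 0;
  let steps = 0;
  while (steps < maxSteps) {
    await page.scrollTo(y);
    await page.waitForSettle();
    steps++;
    const height = await page.getScrollHeight();
    if (y + viewportHeight >= height) break; // the bottom is now in view
    y += viewportHeight;
  }
  await page.scrollTo(0); // return to the top so capture starts from a known state
  return steps;
}
```

The detail that matters is re-reading the height inside the loop: a scroll plan computed once from the initial height would stop early on any page whose lower sections only mount after scrolling.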
2. Read computed styles, not source CSS
This is the core decision that makes the whole thing tractable: instead of downloading and untangling someone's stylesheets, read the final computed style of each meaningful element from the live, rendered page. The browser has already done the hard work of resolving the cascade for you. You let it finish, then you read the answer.
The challenge is volume. The computed style of a single element exposes hundreds of properties, and the overwhelming majority are inherited defaults that carry no design intent. Emit all of them and you get unreadable, unusable output. So the meaningful step is filtering: keeping the properties that actually express layout and visual design (box model, positioning, flex and grid behavior, typography, color, background, borders, spacing) and discarding the noise of defaults. Getting that filter right is most of the difference between output a developer will accept and output they'll throw away.
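One way to picture the filtering step is as an allowlist plus a comparison against a baseline. The sketch below is illustrative only: the DESIGN_PROPERTIES list is a hypothetical, deliberately short allowlist, and the baseline argument (for example, the computed style of a bare element of the same tag) is an assumption about where "defaults" come from, not a description of how the tool does it.

```typescript
type StyleMap = Record<string, string>;

// Hypothetical allowlist: properties that tend to carry design intent.
const DESIGN_PROPERTIES: string[] = [
  "display", "position", "flex-direction", "justify-content", "align-items",
  "gap", "grid-template-columns", "width", "height",
  "margin-top", "margin-bottom", "padding-top", "padding-bottom",
  "font-family", "font-size", "font-weight", "line-height",
  "color", "background-color", "border-radius",
];

// Keep a property only if it is on the allowlist AND its value differs
// from the baseline, so inherited or default values are dropped.
function filterComputedStyle(computed: StyleMap, baseline: StyleMap): StyleMap {
  const kept: StyleMap = {};
  for (const prop of DESIGN_PROPERTIES) {
    const value = computed[prop];
    if (value !== undefined && value !== baseline[prop]) {
      kept[prop] = value;
    }
  }
  return kept;
}
```

In a browser the computed map would come from getComputedStyle on the live element; the function itself stays pure, so the filter can be tuned and tested without rendering anything.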
3. Turn a wrapper jungle into a component structure
A rendered page is deeply nested, and emitting it one-to-one produces code no one wants to maintain. The reconstruction step looks for structure a human would recognize: repeated patterns that should become reusable components (cards, list items, navigation links), redundant wrappers that can be flattened, and natural boundaries where one section ends and another begins. The goal is output that reads like something a developer would have written, not a literal transcript of the DOM.
This is the hardest part to get fully right, and it's where there's the most room left to improve. Honest status: it produces a sound starting structure, not always the exact componentization you'd have chosen by hand.
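Two of the ideas above, repeated patterns and redundant wrappers, can be sketched with a simple structural signature. This is one possible heuristic under stated assumptions, not the tool's algorithm: the DomNode shape, the styled flag, and the rule "an unstyled div with exactly one child is a redundant wrapper" are all hypothetical simplifications.

```typescript
// Hypothetical simplified node: just enough to reason about structure.
interface DomNode {
  tag: string;
  styled: boolean; // true if the node carries visual styles of its own
  children: DomNode[];
}

// A structural signature ignores text and styling and records only shape.
function signature(node: DomNode): string {
  return node.tag + "(" + node.children.map(signature).join(",") + ")";
}

// Group one parent's children by signature. Groups that repeat at least
// minRepeats times are candidates for a reusable component.
function findRepeatedSiblings(parent: DomNode, minRepeats = 2): Map<string, DomNode[]> {
  const groups = new Map<string, DomNode[]>();
  for (const child of parent.children) {
    const key = signature(child);
    const group = groups.get(key) ?? [];
    group.push(child);
    groups.set(key, group);
  }
  const repeated = new Map<string, DomNode[]>();
  groups.forEach((group, key) => {
    if (group.length >= minRepeats) repeated.set(key, group);
  });
  return repeated;
}

// Collapse chains of unstyled single-child divs into their only child.
function flattenWrappers(node: DomNode): DomNode {
  const children = node.children.map(flattenWrappers);
  const isBareWrapper = node.tag === "div" && !node.styled && children.length === 1;
  return isBareWrapper ? children[0] : { tag: node.tag, styled: node.styled, children };
}
```

Exact signature matching is brittle on real pages, where one card has an extra badge and another doesn't, which is part of why this step stays hard.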
4. Translate computed values into Tailwind
Because the output target is Tailwind, every captured visual value has to become a utility class. Concrete pixel and color values from the computed styles get mapped onto Tailwind's scale, falling back to explicit values where a design simply doesn't line up with the default scale. The aim is className strings that a developer reading the file would have plausibly typed themselves, rather than a wall of arbitrary values.
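For spacing, the mapping might look like the sketch below. It assumes Tailwind v3's default spacing scale at a 16px root (only a subset is listed), and the spacingClass helper and its tolerance are hypothetical names chosen for illustration. Values that miss the scale fall back to Tailwind's arbitrary-value syntax.

```typescript
// Subset of Tailwind v3's default spacing scale: key -> pixels at a 16px root.
const SPACING_PX: Record<string, number> = {
  "0": 0, "px": 1, "0.5": 2, "1": 4, "1.5": 6, "2": 8, "2.5": 10, "3": 12,
  "3.5": 14, "4": 16, "5": 20, "6": 24, "8": 32, "10": 40, "12": 48,
  "16": 64, "20": 80, "24": 96,
};

// Map one computed length (e.g. "24px") to a utility such as "p-6".
// prefix is the utility prefix ("p", "mt", "gap", ...).
function spacingClass(prefix: string, computed: string, tolerancePx = 0.25): string {
  const value = computed.trim();
  const match = /^(-?\d+(?:\.\d+)?)px$/.exec(value);
  if (!match) {
    // Not a plain pixel length: emit an arbitrary value. Tailwind reads
    // underscores inside brackets as spaces.
    return prefix + "-[" + value.replace(/\s+/g, "_") + "]";
  }
  const px = parseFloat(match[1]);
  for (const key of Object.keys(SPACING_PX)) {
    // A small tolerance absorbs sub-pixel rounding in computed values.
    if (Math.abs(SPACING_PX[key] - px) <= tolerancePx) return prefix + "-" + key;
  }
  return prefix + "-[" + px + "px]"; // off-scale: keep the exact value
}
```

Here spacingClass("p", "24px") gives "p-6", while spacingClass("mt", "13px") gives "mt-[13px]". How wide to make the tolerance is a judgment call: too strict and the output fills with arbitrary values, too loose and the layout drifts from the original.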
5. Localize the assets
A converted project that hotlinks back to the original site is a liability, so images, fonts, and other assets are pulled into the project itself and every reference to them is rewired to the local copy. The result stands on its own and runs without depending on the source site staying up.
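The rewiring half of this step can be sketched as two pure functions: one plans a local path for each remote URL, the other rewrites references in the generated source. Everything here is an assumption for illustration, including the /assets folder, the function names, and the simple "-1", "-2" suffix scheme for filename collisions. Downloading the files themselves is left out.

```typescript
// Plan a unique local path for every remote asset URL (absolute URLs assumed).
function planLocalAssets(urls: string[], publicDir = "/assets"): Map<string, string> {
  const plan = new Map<string, string>();
  const seen = new Map<string, number>(); // filename -> times used so far
  for (const url of urls) {
    if (plan.has(url)) continue;
    const clean = url.split(/[?#]/)[0]; // drop query string and fragment
    const name = clean.substring(clean.lastIndexOf("/") + 1) || "asset";
    const count = seen.get(name) ?? 0;
    seen.set(name, count + 1);
    const dot = name.lastIndexOf(".");
    let unique = name;
    if (count > 0) {
      // Same filename from a different URL: add a numeric suffix.
      unique = dot > 0
        ? name.slice(0, dot) + "-" + count + name.slice(dot)
        : name + "-" + count;
    }
    plan.set(url, publicDir + "/" + unique);
  }
  return plan;
}

// Replace every remote URL in the source text with its local path.
function rewriteReferences(source: string, plan: Map<string, string>): string {
  const pairs: [string, string][] = [];
  plan.forEach((local, remote) => pairs.push([remote, local]));
  // Longest URLs first, so a URL that is a prefix of another cannot clobber it.
  pairs.sort((a, b) => b[0].length - a[0].length);
  let out = source;
  for (const [remote, local] of pairs) {
    out = out.split(remote).join(local);
  }
  return out;
}
```

A production version would also need to handle relative URLs, url() references inside CSS, and responsive srcset lists, none of which this sketch covers.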
6. Emit a runnable Next.js project
The final stage is code generation into a real Next.js + Tailwind project with a sane file layout, formatted output, and the structure you'd expect from a fresh scaffold. The end state is a project that runs with npm install && npm run dev and shows you the rebuilt page.
What's still hard, and what I got wrong
If this section were all wins I wouldn't believe it either, so here's the real state:
Output is a strong starting point, not a pixel-perfect clone. On clean, content-driven sites it gets remarkably close. On complex sites it gets you most of the way and assumes you'll do cleanup. I aim for roughly 80%, and I say so up front, because the fastest way to lose a developer's trust is to promise 100% and hand them 80.
Heavily interactive and canvas/WebGL-driven pages are a wall. You can capture a faithful rendered frame; you cannot capture behavior that only exists at runtime. Complex animations tend to come through as their resting state rather than their motion.
It converts one page at a time right now. Full-site crawling is the obvious next step and it's genuinely harder than it sounds, because now you're reconstructing routing, shared layout, and a component library across pages instead of within one.
Componentization is "good," not "exactly how you'd do it." See step 3. This is where most of my remaining work is going.
Where it's at
I built this into a tool called url2code: paste a URL, preview the rebuilt page, download the Next.js + Tailwind project. It's in free closed beta while I harden exactly the rough edges above.
If you do website rebuilds or migrations and want to throw a real site at it and tell me where it breaks, I'm approving beta testers. Drop a comment or find it at url2code. The conversions that fail are the ones I most want to see, because that's my roadmap.
And if you've built anything in this space, I'd like to hear how you approached the componentization problem from step 3, deciding where one component ends and the next begins from nothing but a rendered DOM. It was the hardest part for me and I'm not convinced I've found the best approach yet.