Usman Mehfooz

Convert Any UI Image to a Multi-Page HTML Website With a UI Editor

(Screenshot: sample generated HTML)

This is a submission for the Google AI Studio Multimodal Challenge

What I Built

I built ProtoHTML, a web-based tool designed to bridge the gap between design and development. It transforms static website mockups (like screenshots or design files) into fully functional, multi-page HTML websites styled with Tailwind CSS.

(Screenshot: main UI)

The problem ProtoHTML solves is the tedious and time-consuming process of manually converting a visual design into code. For developers and designers, this "mockup-to-code" phase can be a major bottleneck. ProtoHTML automates this by using a powerful multimodal AI to analyze the images and write the code, turning a process that could take hours into one that takes just a few seconds.

Key features include:

  • Multi-Page Site Generation: Upload multiple image mockups at once to generate a complete website structure.
  • AI-Powered Code Generation: Leverages the gemini-2.5-flash-image-preview model to produce clean, semantic HTML and Tailwind CSS.
  • Live Editable Previews: Instantly preview the generated pages and edit text content directly in the browser, with the underlying code updating in real-time.
  • One-Click Export: Package the entire multi-page website into a single, downloadable .zip file, ready for immediate deployment (a minimal sketch of such an export follows this list).
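
The export step in particular is easy to picture in code. Below is a minimal browser-side sketch, assuming the JSZip library (the post doesn't say what ProtoHTML actually uses); `pages` is a hypothetical map from filename to generated HTML:

```typescript
import JSZip from "jszip";

// Bundle every generated page into one archive and trigger a download.
// `pages` is a hypothetical map, e.g. { "index.html": "<!DOCTYPE html>..." }.
async function downloadSiteAsZip(pages: Record<string, string>): Promise<void> {
  const zip = new JSZip();
  for (const [filename, html] of Object.entries(pages)) {
    zip.file(filename, html); // add each page at the archive root
  }
  const blob = await zip.generateAsync({ type: "blob" });

  // Standard browser-download dance: object URL + temporary anchor click.
  const url = URL.createObjectURL(blob);
  const anchor = document.createElement("a");
  anchor.href = url;
  anchor.download = "protohtml-site.zip";
  anchor.click();
  URL.revokeObjectURL(url);
}
```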

Demo

Note: the live demo runs on a free API key, so it may no longer be working.

You can try the live application here: https://ai-multi-page-architect-626025278302.us-west1.run.app/

Here is a video walkthrough of the application in action:

The application has a simple UI for uploading mockups and editing the results, and the AI returns clean HTML and Tailwind CSS as the output.

How I Used Google AI Studio

Google AI Studio was the complete development environment for building and iterating on ProtoHTML. The core of the application is powered by the gemini-2.5-flash-image-preview model (affectionately known as 'nano banana'), which I used during its free trial period on Sept 6-7. The model was a perfect fit thanks to its fast, powerful multimodal capabilities.

The key to getting high-quality, consistent output was prompt engineering. I crafted a detailed systemInstruction that sets the persona for the AI as an "expert senior frontend developer" and provides a strict set of rules it must follow. These rules dictate everything from the output format (raw HTML only) to technical requirements like including the Tailwind CSS CDN link, using semantic HTML5 tags, and implementing responsive design patterns.
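
The post doesn't reproduce the full instruction, so here is a hypothetical reconstruction of what a systemInstruction following those rules might look like (the exact wording ProtoHTML uses will differ):

```typescript
// Hypothetical reconstruction of the systemInstruction described above --
// illustrative only, not ProtoHTML's exact prompt.
const systemInstruction = `
You are an expert senior frontend developer.
Follow these rules strictly:
1. Respond with a single, complete, raw HTML file -- no markdown fences, no commentary.
2. Include the Tailwind CSS CDN script tag in the <head>.
3. Use semantic HTML5 tags (header, nav, main, section, article, footer).
4. Implement responsive design using Tailwind breakpoint prefixes (sm:, md:, lg:).
5. Match the layout, colors, and typography of the provided mockup as closely as possible.
`;
```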

Each API call is a multimodal request, sending both the image data and a concise text prompt (e.g., "Based on the provided image, generate the complete HTML file for the 'About Us' page now.") to the Gemini model. This combination lets the AI understand both the visual layout from the image and the specific context for the page from the text.
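
In code, such a request is straightforward with the @google/genai JavaScript SDK. This is a minimal sketch under my own assumptions (the function name, base64 input, and wiring are illustrative, not ProtoHTML's actual implementation):

```typescript
import { GoogleGenAI } from "@google/genai";

// See the earlier sketch for the full system instruction text.
const systemInstruction = "You are an expert senior frontend developer. ...";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Send one mockup image plus a short text prompt and get back raw HTML.
// `mockupBase64` is a hypothetical base64-encoded screenshot of the page.
async function generatePageHtml(mockupBase64: string, pageName: string): Promise<string> {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash-image-preview",
    contents: [
      { inlineData: { mimeType: "image/png", data: mockupBase64 } }, // the visual layout
      { text: `Based on the provided image, generate the complete HTML file for the "${pageName}" page now.` },
    ],
    config: { systemInstruction },
  });
  return response.text ?? "";
}
```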

Multimodal Features

The primary multimodal feature of ProtoHTML is Image-to-Code Generation. The application takes a visual input (a webpage mockup) and translates it into a structured, textual output (a complete HTML file with Tailwind CSS classes).

This functionality fundamentally enhances the user experience in several ways:

  • Accelerates Prototyping: It dramatically reduces the friction between a visual idea and a functional prototype. Users can go from a set of static images to an interactive, multi-page website in minutes, allowing for rapid iteration and feedback.
  • Empowers Non-Coders: Designers or project managers can bring their visions to life without needing to write a single line of code, making web development more accessible.
  • Creates a Tangible Feedback Loop: The most powerful part of the experience is the immediate connection between the visual input and the interactive output. Seeing your static mockup rendered as a live, editable webpage in the "Preview & Edit" tab is a genuine "wow" moment. It makes the AI's "understanding" of the image tangible and gives the user immediate control to refine the result (a sketch of how such an editable preview might be built follows this list).
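
The post doesn't show how the editable preview is wired up. One plausible approach, sketched below with illustrative names only, is to render the generated HTML into an iframe, turn on contentEditable, and serialize the DOM back to HTML on every edit so the code view stays in sync:

```typescript
// Illustrative sketch, not ProtoHTML's actual code: mount generated HTML in
// an iframe, make it editable, and report edited markup back to the caller.
function mountEditablePreview(
  iframe: HTMLIFrameElement,
  html: string,
  onCodeChange: (updatedHtml: string) => void,
): void {
  const doc = iframe.contentDocument;
  if (!doc) return;

  doc.open();
  doc.write(html);
  doc.close();

  doc.body.contentEditable = "true"; // let the user edit text in place
  doc.body.addEventListener("input", () => {
    // Keep the underlying code in sync with the user's edits in real time.
    onCodeChange("<!DOCTYPE html>\n" + doc.documentElement.outerHTML);
  });
}
```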
