Hassann

Posted on Jun 3 • Originally published at apidog.com

Screenshot to code with Qwen 3.7 Plus

Hand Qwen 3.7 Plus a UI screenshot and it can generate front-end code for a matching component in React, Vue, or plain HTML, styled with Tailwind CSS if you ask for it. Because the model can process images and code in the same request, a mockup, competitor page, or exported Figma frame can become a usable implementation starting point in one API call.


This guide walks through the implementation workflow: make the API call, write prompts that produce better code, iterate with rendered screenshots, and connect the generated UI to real or mocked APIs. For model background, see the Qwen 3.7 Plus overview. For request details, use the Qwen 3.7 Plus API guide. You can test the API and the endpoints your UI depends on in Apidog.

TL;DR

Send Qwen 3.7 Plus:

A screenshot.
A precise implementation prompt.
Your target framework and styling constraints.

The first response usually gets the structure right. To improve accuracy, render the generated component, screenshot the result, and send both the original screenshot and the current render back to the model for correction.

Qwen 3.7 Plus is useful for this workflow because it combines vision and coding, supports a large context window for detailed designs, and is cheap enough to iterate on. The main work is prompt design and visual refinement.

Why Qwen 3.7 Plus works for screenshot-to-code

Screenshot-to-code requires two capabilities at the same time:

Understanding the UI from an image.
Producing maintainable front-end code.

Qwen 3.7 Plus is positioned well for this because it pairs multimodal vision with coding ability. It scores around 60% on SWE-Bench Pro and 70.3 on Terminal-Bench, and its 1M-token context helps with large or tall UI screenshots. At $0.40 per million input tokens, the cost also makes iterative refinement practical.
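To make the cost point concrete, here is a back-of-the-envelope sketch. It uses only the $0.40 per million input tokens figure quoted above. The token count per request is an assumed input (how many tokens an image consumes depends on its resolution and is not stated here), and output-token pricing is ignored. `input_cost_usd` is a hypothetical helper name.

```python
INPUT_PRICE_PER_MILLION_USD = 0.40  # input price quoted in this article


def input_cost_usd(input_tokens: int, rounds: int = 1) -> float:
    """Estimate the input-token cost of `rounds` requests of `input_tokens` each."""
    if input_tokens < 0 or rounds < 0:
        raise ValueError("input_tokens and rounds must be non-negative")
    return input_tokens * rounds / 1_000_000 * INPUT_PRICE_PER_MILLION_USD


# Hypothetical figures: 5,000 input tokens per request, three refinement rounds.
estimate = input_cost_usd(5_000, rounds=3)
print(f"Estimated input cost: ${estimate:.4f}")
```

Replace the hypothetical token count with the usage figures returned by your own requests to get a realistic number.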

For a related workflow where the model operates a UI instead of rebuilding it, see the Qwen 3.7 Plus computer-use agent guide.

Make the basic API call

Send the screenshot as an image_url content part and include your instruction as text.

import os
import base64
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

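def screenshot_to_code(png_path, prompt):
    """Send a PNG screenshot and a text prompt to the model; return the reply text."""
    # Read the image and base64-encode it so it can be embedded in a data URL.
    with open(png_path, "rb") as image_file:
        b64 = base64.b64encode(image_file.read()).decode()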

    response = client.chat.completions.create(
        model="qwen3.7-plus",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": prompt,
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/png;base64,{b64}",
                        },
                    },
                ],
            }
        ],
    )

    return response.choices[0].message.content

code = screenshot_to_code(
    "mockup.png",
    "Rebuild this UI as a React component."
)

print(code)
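The function above hard-codes `image/png` in the data URL. If your screenshots may also be JPEG or WebP files, a small helper can pick the MIME type from the file extension. This is a sketch: `image_to_data_url` is a hypothetical name, and which image formats the endpoint accepts is an assumption to verify in the provider's documentation.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode an image file as a data URL, guessing the MIME type from the extension."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The returned string can be used as the `url` value of the `image_url` content part.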

Before shipping, confirm the current model ID in the Model Studio docs.

This minimal prompt works, but it is too vague for production-quality output. The prompt needs to define the stack, constraints, layout expectations, and output format.

Write a prompt that produces usable front-end code

A better prompt tells the model exactly what to generate.

Convert this UI screenshot into a single React component using Tailwind CSS.

Requirements:
- Match the layout, spacing, typography, and color palette as closely as possible.
- Make the component responsive down to a 375px mobile width.
- Use semantic HTML.
- Add accessible labels for inputs, buttons, and interactive elements.
- Use placeholder data where the screenshot shows dynamic content.
- Do not invent backend logic.
- Return only the component code, with no prose.

Include these details whenever possible:

Framework: React, Vue, Svelte, plain HTML, etc.
Styling approach: Tailwind CSS, CSS modules, inline CSS, component library.
Breakpoints: mobile, tablet, desktop widths.
Accessibility requirements.
How to handle dynamic data.
Whether to return a full page, a component, or only markup.
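If you send many screenshots, it can help to assemble the prompt from these details instead of retyping it. The sketch below rebuilds the example prompt from parameters. `build_prompt` and its argument names are hypothetical, and the wording of the requirements is adapted from the prompt shown earlier.

```python
def build_prompt(
    framework: str,
    styling: str,
    breakpoints: list[int],
    output_scope: str = "a single component",
) -> str:
    """Assemble a screenshot-to-code prompt from explicit constraints."""
    widths = ", ".join(f"{width}px" for width in sorted(breakpoints))
    lines = [
        f"Convert this UI screenshot into {output_scope} using {framework} with {styling}.",
        "",
        "Requirements:",
        "- Match the layout, spacing, typography, and color palette as closely as possible.",
        f"- Make the layout responsive at these widths: {widths}.",
        "- Use semantic HTML.",
        "- Add accessible labels for inputs, buttons, and interactive elements.",
        "- Use placeholder data where the screenshot shows dynamic content.",
        "- Do not invent backend logic.",
        "- Return only the code, with no prose.",
    ]
    return "\n".join(lines)


print(build_prompt("React", "Tailwind CSS", [1280, 375, 768]))
```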

For example, if you want Tailwind output, explicitly say so. The Tailwind CSS docs are a useful reference for the utility classes the model may generate.

If you have a component spec or design brief, include it with the screenshot. Written constraints reduce guessing. The article on what a design.md does for coding agents explains why this improves results.
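Even when the prompt asks for code only, a reply may still arrive wrapped in a Markdown code fence or with a sentence of preamble. A defensive sketch for pulling out the code before writing it to a file follows. `extract_code` is a hypothetical helper, and it assumes the reply contains at most one fenced block of interest.

```python
import re

FENCE = "`" * 3  # three backticks, built this way so the example contains no literal fence

# Opening fence, optional language tag, newline, body (captured), closing fence.
_BLOCK = re.compile(
    re.escape(FENCE) + r"[\w+-]*[ \t]*\n(.*?)" + re.escape(FENCE),
    re.DOTALL,
)


def extract_code(reply: str) -> str:
    """Return the body of the first fenced block, or the whole reply if there is none."""
    match = _BLOCK.search(reply)
    if match:
        return match.group(1).strip()
    return reply.strip()
```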

Use a visual feedback loop

The first generated component usually gets close on structure but misses details like:

Spacing.
Color values.
Font weights.
Border radius.
Icon placement.
Responsive behavior.

To close the gap, render the component locally, screenshot the result, then send both images back to Qwen 3.7 Plus.

Use a prompt like this:

Image 1 is the target design.
Image 2 is my current rendered implementation.

Tasks:
1. List the visual differences between image 1 and image 2.
2. Update the component so its rendered output matches image 1 more closely.
3. Return only the corrected component code.

A practical loop looks like this:

Generate the first component from the original screenshot.
Paste it into your project.
Render it in the browser.
Take a screenshot of the rendered result.
Send the original and rendered screenshots back to the model.
Apply the corrected code.
Repeat until the output is close enough.

Two or three iterations are usually enough to make the component visually close to the original. This is the same perceive-and-correct pattern used in a computer-use agent, applied to code instead of UI actions.
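The correction request can be sketched as a single user message that carries the instructions and both images in order. This reuses the `image_url` content-part format from the basic API call and assumes the endpoint accepts more than one image part per message. Including the current component code in the text is an addition made here, not part of the prompt above, and `build_feedback_messages` is a hypothetical helper.

```python
import base64

FEEDBACK_PROMPT = "\n".join([
    "Image 1 is the target design.",
    "Image 2 is my current rendered implementation.",
    "",
    "Tasks:",
    "1. List the visual differences between image 1 and image 2.",
    "2. Update the component so its rendered output matches image 1 more closely.",
    "3. Return only the corrected component code.",
])


def _png_part(png_bytes: bytes) -> dict:
    """Wrap raw PNG bytes as an image_url content part."""
    b64 = base64.b64encode(png_bytes).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}}


def build_feedback_messages(target_png: bytes, render_png: bytes, current_code: str) -> list[dict]:
    """Build one user message: instructions and code, then the target and render images."""
    text = f"{FEEDBACK_PROMPT}\n\nCurrent component code:\n{current_code}"
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            _png_part(target_png),   # image 1: target design
            _png_part(render_png),   # image 2: current render
        ],
    }]
```

Pass the returned list as `messages` to the same `client.chat.completions.create` call used for the first pass.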

Handle large production designs

Real product mockups are often too large to generate cleanly in one pass. Treat the screenshot as implementation input, not as a whole-page magic button.

1. Crop the screenshot

Instead of sending an entire dashboard, crop the specific area you want to build:

Header.
Sidebar.
Table.
Form.
Card grid.
Modal.
Settings panel.

Smaller screenshots reduce cost and improve output quality.

2. Downscale only as far as text stays readable

Downscale large images to reduce token usage, but keep text readable. If the model cannot read labels, buttons, or table headers, it will guess.
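One way to downscale consistently is to cap the longer side of the image and keep the aspect ratio. The sketch below only computes the target dimensions; the actual resize can then be done with an image library such as Pillow. `fit_within` is a hypothetical helper, and any particular `max_side` value is an assumption you should tune by checking that labels remain legible.

```python
def fit_within(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_side."""
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("dimensions must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    scale = max_side / longest
    # Keep at least one pixel on the short side of very thin images.
    return max(1, round(width * scale)), max(1, round(height * scale))


# Hypothetical example: a 2880x1800 capture capped at 1440 pixels on the long side.
print(fit_within(2880, 1800, 1440))
```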

3. Generate sections separately

For a dashboard, generate components independently:

Build only the sidebar navigation from this screenshot.
Use React and Tailwind CSS.
Return a reusable Sidebar component.

Then generate the main table:

Build only the data table section from this screenshot.
Use React and Tailwind CSS.
Use placeholder rows.
Return a reusable DataTable component.

Finally compose them in your app.

The large context window helps, but smaller scoped requests generally produce cleaner code.
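The per-section prompts above follow one pattern, so they can be generated from a short list. This is a sketch: the `Section` dataclass, `section_prompt`, and the `SECTIONS` list are hypothetical names, and each prompt is meant to be sent with its own cropped screenshot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    description: str     # what to build, e.g. "the sidebar navigation"
    component_name: str  # name of the component the model should return
    extra: str = ""      # optional additional instruction


def section_prompt(section: Section, stack: str = "React and Tailwind CSS") -> str:
    """Build a scoped prompt for one section of a larger screenshot."""
    lines = [
        f"Build only {section.description} from this screenshot.",
        f"Use {stack}.",
    ]
    if section.extra:
        lines.append(section.extra)
    lines.append(f"Return a reusable {section.component_name} component.")
    return "\n".join(lines)


SECTIONS = [
    Section("the sidebar navigation", "Sidebar"),
    Section("the data table section", "DataTable", extra="Use placeholder rows."),
]

for section in SECTIONS:
    print(section_prompt(section), end="\n\n")
```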

Prompt fixes for common problems

Use these additions when the output is close but not shippable.

Wrong colors

The model may approximate colors. Provide exact values when you have them.

Use these exact colors:
- Primary: #2563eb
- Background: #f8fafc
- Text: #0f172a
- Muted text: #64748b
- Border: #e2e8f0
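If your design tokens already live in code, the color block can be generated and validated instead of typed by hand. This is a minimal sketch: `palette_prompt` is a hypothetical helper, and it assumes six-digit hex values only.

```python
import re

_HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")


def palette_prompt(colors: dict[str, str]) -> str:
    """Render a role-to-hex mapping as an exact-colors prompt block."""
    lines = ["Use these exact colors:"]
    for role, value in colors.items():
        if not _HEX_COLOR.fullmatch(value):
            raise ValueError(f"{role}: expected #RRGGBB, got {value!r}")
        lines.append(f"- {role}: {value.lower()}")
    return "\n".join(lines)


print(palette_prompt({"Primary": "#2563eb", "Background": "#f8fafc"}))
```

Rejecting malformed values early avoids sending the model a typo that it would then reproduce faithfully.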

Invented icons

If the screenshot contains icons, specify the icon library.

Use lucide-react for icons.
Do not create custom SVG icons unless necessary.

Or:

Use Heroicons for all icons.
Choose the closest matching icon when the screenshot is ambiguous.

Made-up copy

If the model invents realistic text, force placeholders.

Use clearly marked placeholder content.
Do not invent real names, emails, companies, addresses, or metrics.

Too many nested divs

Ask for semantic markup and a flatter structure.

Use semantic HTML where possible:
- nav for navigation
- header for the top bar
- main for primary content
- section for content groups
- button for clickable actions

Avoid unnecessary wrapper divs.
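To see whether a regenerated version is actually flatter, you can compare the share of `div` elements before and after. The sketch below is a rough heuristic for plain HTML output only; JSX with embedded expressions may not parse reliably with Python's `html.parser`. `div_ratio` is a hypothetical helper, and no particular threshold is implied.

```python
from collections import Counter
from html.parser import HTMLParser


class _TagCounter(HTMLParser):
    """Count how often each start tag appears in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def div_ratio(html: str) -> float:
    """Return the fraction of elements that are <div>, or 0.0 for empty input."""
    parser = _TagCounter()
    parser.feed(html)
    parser.close()
    total = sum(parser.counts.values())
    return parser.counts["div"] / total if total else 0.0
```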

Poor responsive layout

Specify exact responsive behavior.

Responsive behavior:
- At desktop width, show sidebar and content side by side.
- Below 768px, collapse the sidebar into a top menu.
- At 375px mobile width, stack cards vertically and preserve readable spacing.

Turn the generated UI into a working app

The generated component is only the front end. A real feature still needs APIs for:

Fetching table data.
Submitting forms.
Loading user profiles.
Saving settings.
Running searches.
Handling pagination.
Returning validation errors.

Before wiring the component to a backend, define the API contract.

With Apidog, you can:

Define the endpoint contract.
Mock realistic responses.
Connect the generated UI to the mock API.
Test request and response behavior.
Hand the contract to backend developers.
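To illustrate what a contract pins down, here is a small Python sketch of a paginated response shape plus a function that produces clearly fake rows for it. This is not Apidog's format. The `GET /users` endpoint, the field names, and `mock_users_page` are all hypothetical and stand in for whatever your generated table actually needs.

```python
from typing import TypedDict


class UserRow(TypedDict):
    id: int
    name: str
    email: str


class UsersPage(TypedDict):
    items: list[UserRow]
    page: int
    page_size: int
    total: int


def mock_users_page(page: int, page_size: int, total: int = 42) -> UsersPage:
    """Return placeholder rows for one page of a hypothetical GET /users endpoint."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    end = min(start + page_size, total)  # the last page may be short or empty
    items: list[UserRow] = [
        {"id": i + 1, "name": f"Placeholder User {i + 1}", "email": f"user{i + 1}@example.com"}
        for i in range(start, end)
    ]
    return {"items": items, "page": page, "page_size": page_size, "total": total}
```

Writing the shape down first makes it obvious which states the component must handle, such as a short last page or an empty result.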

The Apidog spec-first mode guide walks through that workflow. It pairs well with AI-generated front ends for the same reason it works with APIs built in Cursor: the contract keeps the generated code grounded.

You can also download Apidog to mock and test the APIs behind the UI Qwen 3.7 Plus generates.

Example workflow: screenshot to working component

A practical implementation flow looks like this:

1. Export or capture the target UI as a PNG.
2. Crop it to one component or page section.
3. Send it to Qwen 3.7 Plus with a precise stack-specific prompt.
4. Add the generated code to your app.
5. Render it locally.
6. Screenshot the rendered version.
7. Send both screenshots back for correction.
8. Repeat until the UI is close enough.
9. Define the API contract in Apidog.
10. Mock the endpoint response.
11. Connect the component to the mock API.
12. Test the endpoint behavior before backend implementation.
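Steps 4 through 8 form a loop that can be sketched as a small driver function. Everything model-specific or browser-specific is injected as a callable, so the names `render`, `correct`, and `is_close_enough` are hypothetical placeholders: rendering might use a headless browser, correction would call the model with both screenshots, and the closeness check could be a human decision or an image diff. The default of three rounds follows the rough guidance given earlier.

```python
from typing import Callable


def refine_component(
    target_png: bytes,
    first_code: str,
    render: Callable[[str], bytes],
    correct: Callable[[bytes, bytes, str], str],
    is_close_enough: Callable[[bytes, bytes], bool],
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Run the render-and-correct loop; return the final code and corrections made."""
    code = first_code
    for corrections_so_far in range(max_rounds):
        rendered_png = render(code)
        if is_close_enough(target_png, rendered_png):
            return code, corrections_so_far
        # Ask for a corrected version based on the target, the render, and the code.
        code = correct(target_png, rendered_png, code)
    # Round budget used up; the last correction is returned without re-rendering.
    return code, max_rounds
```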

FAQ

What frameworks can Qwen 3.7 Plus target?

Any framework you specify in the prompt, such as React, Vue, Svelte, or plain HTML/CSS, combined with the styling approach you name, such as Tailwind CSS or a component library. Be explicit, because vague prompts usually produce generic markup.

How accurate is the first pass?

Usually close on layout and structure, but less precise on spacing, colors, and small visual details. The render-and-resend feedback loop is what improves it toward near-pixel accuracy.

Can it work from a Figma design?

Yes, if you export the Figma frame as an image. The model reads the rendered image, not the Figma file itself.

How do I reduce token cost?

Crop the image, downscale it while keeping text readable, and generate one section at a time instead of an entire page.

Does it build the backend too?

No. It generates front-end code that can call APIs. You still need to design, mock, and test those APIs separately. That is where Apidog fits into the workflow.

Bottom line

Qwen 3.7 Plus can turn screenshots into useful front-end code when you give it a precise prompt and iterate with visual feedback. Use it to generate the UI, then define and mock the required API contracts in Apidog so the generated component can become a working feature.

DEV Community