shashank ms

Posted on Jun 27

Building Design Tools with LLMs: A Step-by-Step Guide

#product #oxlo #ai

Large language models have moved beyond chat interfaces and into specialized tooling. For developers building design tools, LLMs can now generate layout code, critique visual hierarchies from screenshots, and enforce design systems through structured function calling. The challenge is not whether to integrate an LLM, but how to architect the pipeline so that long context windows, multimodal inputs, and iterative agent loops stay fast and cost-predictable.

Define the Design Domain

Start by narrowing the scope. An LLM-powered design tool might generate React components from prose, validate accessibility from screenshots, or manage a multi-step brand kit revision. Each use case demands different capabilities. Code generation requires a strong coding model. Visual critique needs a vision-enabled LLM. Agentic orchestration, where the model calls tools to update a canvas or fetch assets, benefits from models built for tool use.

Selecting Your Model Stack

Your backend should expose a mix of general reasoning, coding, vision, and image generation models without forcing you to manage multiple vendor SDKs. Oxlo.ai provides 45+ open-source and proprietary models across seven categories, all through a single OpenAI-compatible endpoint. For design tools, relevant options include:

General reasoning and chat: Llama 3.3 70B, Qwen 3 32B, and GPT-Oss 120B for interpreting user intent and design constraints.
Coding and layout generation: DeepSeek V3.2, DeepSeek R1 671B MoE, Qwen 3 Coder 30B, and Oxlo.ai Coder Fast for generating HTML, CSS, React, or SVG.
Vision analysis: Kimi K2.6 (with 131K context and vision), Kimi VL A3B, and Gemma 3 27B for reading mockups, wireframes, or screenshots.
Image generation: Oxlo.ai Image Pro, Oxlo.ai Image Ultra, Flux.1, and Stable Diffusion 3.5 for producing assets directly from the same API.
Agentic orchestration: GLM 5 (744B MoE), Minimax M2.5, and Qwen 3 32B for long-horizon tasks that require multiple tool calls.

Because Oxlo.ai is fully OpenAI SDK compatible, you can switch between these models by changing a single string. There are no cold starts on popular models, so UI iterations feel instantaneous.
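
Since switching models comes down to one string, it can help to keep the task-to-model mapping in a single place. The following is a minimal sketch, not a prescribed setup: the task names and the `pick_model` helper are hypothetical, and the model identifiers are simply the ones used in this guide's examples, so check them against the provider's current model list.

```python
# Hypothetical routing table. Task names are illustrative; the model
# identifiers are the ones that appear in this guide's code examples.
MODEL_BY_TASK = {
    "intent": "llama-3-3-70b",  # interpreting user intent and constraints
    "code": "deepseek-v3-2",    # HTML, CSS, React, or SVG generation
    "vision": "kimi-k2-6",      # reading mockups and screenshots
    "agent": "qwen3-32b",       # multi-step tool use
}


def pick_model(task: str, has_image: bool = False) -> str:
    """Return a model identifier for a task; image inputs go to the vision model."""
    if has_image:
        return MODEL_BY_TASK["vision"]
    try:
        return MODEL_BY_TASK[task]
    except KeyError:
        raise ValueError(f"unknown task: {task!r}") from None
```

The result can be passed directly as the `model` argument of `client.chat.completions.create`, so the rest of the client code stays unchanged when a mapping is updated.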

Crucially, Oxlo.ai uses request-based pricing: one flat cost per API request regardless of prompt length. Unlike token-based providers such as Together AI, Fireworks AI, OpenRouter, Replicate, or Anyscale, your cost does not scale with input length. When you pass a full design system document, a lengthy HTML fragment, or a base64-encoded image into the context window, the price stays flat. For long-context and agentic workloads, this can be 10-100x cheaper than token-based billing. See https://oxlo.ai/pricing for current plan details.

Configure the OpenAI SDK for Oxlo.ai

Since Oxlo.ai exposes a drop-in replacement for the OpenAI API, you can use the official Python or Node.js SDK. Set the base URL to https://api.oxlo.ai/v1 and pick a model suited to your step.

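import os

from openai import OpenAI

# Read the key from the environment instead of hard-coding it.
# OXLO_API_KEY is an assumed variable name; use whatever your deployment defines.
client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key=os.environ["OXLO_API_KEY"],
)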

def generate_component(prompt: str) -> str:
    response = client.chat.completions.create(
        model="deepseek-v3-2",  # coding and reasoning, also available on free tier
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a UI engineer. Output only HTML with Tailwind CSS classes. "
                    "Do not include explanation."
                )
            },
            {"role": "user", "content": prompt}
        ],
        temperature=0.2
    )
    return response.choices[0].message.content

html = generate_component("Design a pricing card with three tiers and a toggle for annual billing.")
print(html)
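
Even with an "output only HTML" instruction, models sometimes wrap their answer in a Markdown code fence. The helper below is a small sketch for cleaning that up before the markup reaches a renderer. The name `strip_code_fence` is hypothetical, and it only handles the common case where the whole reply is a single fenced block.

```python
import re

# Matches a reply that consists of exactly one fenced block, with an
# optional language tag after the opening fence.
_FENCE = re.compile(r"^```[\w-]*\s*\n(.*?)\n?```\s*$", re.DOTALL)


def strip_code_fence(text: str) -> str:
    """Return the fenced body if the reply is one fenced block, else the trimmed text."""
    stripped = text.strip()
    match = _FENCE.match(stripped)
    return match.group(1).strip() if match else stripped
```

A call such as `strip_code_fence(generate_component(...))` leaves unfenced replies untouched apart from trimming surrounding whitespace.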

Use JSON Mode for Design Tokens

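Design tools often need structured data, not prose. JSON mode constrains the response to syntactically valid JSON, which suits color palettes, spacing scales, or component props. It does not enforce a schema on its own, so name the expected keys in the prompt and validate the parsed result in your code.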

import json

response = client.chat.completions.create(
    model="llama-3-3-70b",
    messages=[
        {
            "role": "system",
            "content": "Extract design tokens into valid JSON with keys: colors, typography, spacing."
        },
        {
            "role": "user",
            "content": "Primary brand color is #0A2540, body font is Inter 16px, and base spacing unit is 8px."
        }
    ],
    response_format={"type": "json_object"}
)

tokens = json.loads(response.choices[0].message.content)
print(tokens)
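
Because JSON mode only guarantees well-formed JSON, it is worth checking the shape before the tokens reach the rest of the tool. This is a minimal sketch assuming the three top-level keys requested in the system prompt above; `parse_design_tokens` is a hypothetical helper name, and a real project might prefer a schema library.

```python
import json

# Top-level keys requested in the system prompt of the example above.
REQUIRED_KEYS = ("colors", "typography", "spacing")


def parse_design_tokens(raw: str) -> dict:
    """Parse model output and confirm the expected top-level keys are present."""
    try:
        tokens = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return valid JSON: {exc}") from exc
    if not isinstance(tokens, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = [key for key in REQUIRED_KEYS if key not in tokens]
    if missing:
        raise ValueError(f"missing design token keys: {missing}")
    return tokens
```

Replacing the bare `json.loads` call with `parse_design_tokens(response.choices[0].message.content)` turns a silently incomplete payload into an explicit error that the caller can retry on.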

Add Vision to Analyze Mockups

Vision models let you turn screenshots into structured feedback or code. This is useful for importing existing designs or reviewing user submissions.

response = client.chat.completions.create(
    model="kimi-k2-6",  # advanced reasoning, vision, 131K context
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "List accessibility issues in this landing page screenshot, then output corrected HTML."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/mockup.png"
                    }
                }
            ]
        }
    ],
    max_tokens=4096
)
print(response.choices[0].message.content)
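
User uploads rarely have a public URL. The OpenAI chat format also accepts images as base64 data URLs in the same `image_url` field, so a local file can be sent inline. The sketch below assumes the endpoint accepts data URLs the way the OpenAI API does, which is worth confirming for your provider; `image_to_data_url` and `image_part` are hypothetical helper names.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a data URL (data:<mime>;base64,<payload>)."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image file: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def image_part(path: str) -> dict:
    """Build the content part that takes the place of a remote image_url entry."""
    return {"type": "image_url", "image_url": {"url": image_to_data_url(path)}}
```

In the request above, `image_part("mockup.png")` would replace the dictionary that points at `https://example.com/mockup.png`.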

Build Agentic Workflows with Function Calling

Complex design tools behave like agents: they fetch assets, check contrast ratios, save to a database, and render previews. Models such as Qwen 3 32B, GLM 5, Kimi K2.6, and Minimax M2.5 excel at agentic tool use.

Define tools the model can invoke, then let it orchestrate a multi-step design task.

tools = [
    {
        "type": "function",
        "function": {
            "name": "save_component",
            "description": "Save a React component to the design library",
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "code": {"type": "string"}
                },
                "required": ["name", "code"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "check_contrast",
            "description": "Validate WCAG contrast between two hex colors",
            "parameters": {
                "type": "object",
                "properties": {
                    "foreground": {"type": "string"},
                    "background": {"type": "string"}
                },
                "required": ["foreground", "background"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="qwen3-32b",  # strong for agent workflows
    messages=[
        {
            "role": "system",
            "content": (
                "You are a design-system agent. Generate a button component, "
                "check its contrast, and save it if it passes WCAG AA."
            )
        },
        {"role": "user", "content": "Create a dark primary button using #0A2540 and #FFFFFF."}
    ],
    tools=tools,
    tool_choice="auto"
)

print(response.choices[0].message)
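
The request above only returns the model's first tool call; to finish the task, your code has to execute each call and send the result back. The following is a sketch of that loop under stated assumptions: `check_contrast` implements the WCAG 2.x contrast-ratio formula with the 4.5:1 AA threshold for normal-size text, `save_component` writes to an in-memory dictionary as a stand-in for a real design library, and `run_agent`, `HANDLERS`, and `LIBRARY` are hypothetical names.

```python
import json


def _channel(value: int) -> float:
    """Linearize one 8-bit sRGB channel (WCAG 2.x relative luminance)."""
    c = value / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    digits = hex_color.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected a 6-digit hex color, got {hex_color!r}")
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _channel(r) + 0.7152 * _channel(g) + 0.0722 * _channel(b)


def check_contrast(foreground: str, background: str) -> dict:
    """Contrast ratio between two colors; AA for normal-size text needs 4.5:1."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    ratio = (lighter + 0.05) / (darker + 0.05)
    return {"ratio": round(ratio, 2), "passes_aa": ratio >= 4.5}


LIBRARY: dict = {}  # in-memory stand-in for a real design library


def save_component(name: str, code: str) -> dict:
    LIBRARY[name] = code
    return {"saved": name}


HANDLERS = {"check_contrast": check_contrast, "save_component": save_component}


def run_agent(client, model: str, messages: list, tools: list, max_steps: int = 5):
    """Call the model, run any requested tools, and repeat until it answers in text."""
    for _ in range(max_steps):
        response = client.chat.completions.create(
            model=model, messages=messages, tools=tools, tool_choice="auto"
        )
        message = response.choices[0].message
        if not message.tool_calls:
            return message.content
        # Echo the assistant turn so each tool result can reference its call id.
        messages.append({
            "role": "assistant",
            "content": message.content,
            "tool_calls": [
                {
                    "id": call.id,
                    "type": "function",
                    "function": {
                        "name": call.function.name,
                        "arguments": call.function.arguments,
                    },
                }
                for call in message.tool_calls
            ],
        })
        for call in message.tool_calls:
            handler = HANDLERS[call.function.name]
            result = handler(**json.loads(call.function.arguments))
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(result),
            })
    raise RuntimeError("agent did not finish within max_steps")
```

The `max_steps` cap is a design choice rather than a requirement: it bounds the number of requests a single task can consume if the model keeps calling tools.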

Because agentic loops can involve many requests with long tool definitions and conversation history, flat request pricing keeps costs predictable as the context grows.

Optimize Costs and Scale

Design workflows are inherently iterative. A single session might include a system prompt with your entire design system documentation, multiple user messages, image uploads, and tool results. Under token-based billing, every additional paragraph in your prompt increases cost. Under Oxlo.ai's request-based model, the price remains flat per request no matter how large the input.

This matters for:

Long-context design specs: Pass entire CSS frameworks or component libraries into the system prompt without penalty.
Multi-turn refinement: Let users iterate on layout, color, and copy across ten or twenty turns.
Agentic tool chains: Allow the model to reason over long tool schemas and conversation histories.
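
To see how the two billing models compare for your own workload, the arithmetic can be sketched as below. Every price here is a parameter you supply: the functions contain no real rates, and the names are hypothetical. The token-based estimate assumes each turn resends the system prompt plus the full conversation so far, which is how stateless chat APIs typically behave.

```python
def token_cost(input_tokens: int, output_tokens: int,
               usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Cost of one request when billed per million input and output tokens."""
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000


def session_cost_token_based(turns: int, system_tokens: int, user_tokens_per_turn: int,
                             output_tokens_per_turn: int,
                             usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Total cost of a multi-turn session where history is resent on every turn."""
    total = 0.0
    history = 0
    for _ in range(turns):
        history += user_tokens_per_turn
        total += token_cost(system_tokens + history, output_tokens_per_turn,
                            usd_per_m_input, usd_per_m_output)
        history += output_tokens_per_turn  # the reply joins the next turn's input
    return total


def session_cost_request_based(turns: int, usd_per_request: float) -> float:
    """Total cost when each request has one flat price regardless of length."""
    return turns * usd_per_request
```

Because the resent history grows every turn, the token-based total rises faster than linearly with session length, while the request-based total is linear in the number of turns. Which one comes out lower depends entirely on the rates you plug in.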

Oxlo.ai offers a Free plan with 60 requests per day across 16+ free models, and it includes a 7-day full-access trial. Paid plans scale to Pro and Premium tiers, with Enterprise options for dedicated GPUs and unlimited volume. For teams currently using token-based inference, the Enterprise plan guarantees 30% savings over their existing provider. Details are at https://oxlo.ai/pricing.

Put It All Together

A production design tool pipeline might look like this:

Ingest: Accept text, images, or Figma exports via your frontend.
Reason: Route to Llama 3.3 70B or DeepSeek R1 671B MoE for planning, or to Kimi K2.6 if visual analysis is needed.
Generate: Use DeepSeek V3.2 or Oxlo.ai Coder Fast to emit code.
Validate: Invoke functions for linting, contrast checks, or snapshot tests.
Render: Stream the response back to the user interface.

All steps hit the same endpoint at https://api.oxlo.ai/v1, reducing vendor fragmentation.
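
For the Render step, the OpenAI SDK supports streaming by passing `stream=True`, which yields chunks whose text lives in `choices[0].delta.content`. The sketch below shows one way to turn that stream into plain text fragments for a UI. `iter_text` and `stream_component` are hypothetical helper names, and `client` is assumed to be the OpenAI client configured earlier in this guide.

```python
from typing import Iterable, Iterator


def iter_text(chunks: Iterable) -> Iterator[str]:
    """Yield the text fragments from a stream of chat completion chunks."""
    for chunk in chunks:
        if not chunk.choices:  # some chunks carry no choices (for example, usage data)
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # skip role-only and empty deltas
            yield delta


def stream_component(client, model: str, prompt: str) -> Iterator[str]:
    """Stream generated markup piece by piece instead of waiting for the full reply."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    yield from iter_text(stream)
```

A frontend handler can forward each fragment as it arrives, for example over server-sent events, so the preview fills in progressively.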

Next Steps

Start with the Free tier to prototype your design tool. Swap model identifiers to compare coding and vision models without rewriting client code. If your tool relies on long design documents or agentic loops, measure your inference costs against token-based alternatives. You will likely find that flat per-request pricing removes the friction from shipping rich, context-heavy creative software.

DEV Community