Serif COLAKEL
Building a Cost-Efficient Generative UI Architecture in React Native

In this article, I want to share the architecture we built for Generative UI in the Fonyx mobile application. This system is designed to deliver premium native UI experiences while keeping LLM costs extremely low.

Tool Calling + Metadata-Driven Rendering

Generative UI (GenUI) is emerging as one of the most powerful patterns for AI-native applications. Instead of returning plain text responses, Large Language Models can dynamically orchestrate real UI components inside applications.

However, many early Generative UI systems face serious production challenges:

  • extremely high token costs
  • slow response times
  • hallucinated datasets
  • unpredictable UI outputs

In the Fonyx mobile application, we implemented an architecture that avoids these pitfalls while still delivering a premium native UI experience.

The key idea behind this system is simple:

Metadata over Data

Instead of generating datasets, the model returns lightweight metadata describing what UI should render, while the client application fetches the actual data.

This dramatically improves:

  • performance
  • reliability
  • cost efficiency

The Core Principle: Metadata over Data

Many Generative UI systems ask the LLM to generate both:

  • UI structure
  • data payloads

Example of a common but inefficient approach:

{
  "component": "line_chart",
  "data": [
    { "date": "2024-01-01", "value": 10.21 },
    { "date": "2024-01-02", "value": 10.34 }
  ]
}

This creates two major problems.

Token Explosion

The LLM must generate large datasets as text, dramatically increasing token usage.

Higher Latency

Large responses increase generation time and Time-To-First-Token (TTFT).


The Metadata Approach

Instead, the model returns only the information needed to render the UI.

{
  "tool": "line_history_values",
  "args": {
    "fund_code": "AFT",
    "limit": 30
  }
}

The client application then performs the data request.

LLM → Select Component + Metadata
Client → Fetch Data
Client → Render Native Component

Benefits

| Benefit | Result |
| --- | --- |
| Lower token usage | Only metadata is generated |
| Faster responses | Minimal generation time |
| Higher reliability | Less hallucination risk |
| Native UX | Real UI components |

Generative UI Architecture

This system separates AI orchestration from UI rendering.

Generative UI Architecture Mermaid Diagram

Responsibility Split

| Layer | Responsibility |
| --- | --- |
| LLM | Decide which component should render |
| Client | Fetch real data |
| UI | Render the native interface |

This prevents a common anti-pattern:

LLMs generating raw datasets.
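On the client, this split can be sketched as a small dispatcher that turns the model's tool call into a render instruction, and degrades anything unrecognized to plain text. The type and tool names below are illustrative, not Fonyx's actual API:

```typescript
// Illustrative sketch: map a model tool call to a render instruction.
// Tool names and types are hypothetical placeholders.
type ToolCall = { name: string; arguments: Record<string, unknown> };

type RenderInstruction =
  | { kind: "component"; type: string; props: Record<string, unknown> }
  | { kind: "text"; message: string };

const KNOWN_TOOLS = new Set(["line_history_values", "fund_card"]);

const toRenderInstruction = (call: ToolCall): RenderInstruction => {
  if (!KNOWN_TOOLS.has(call.name)) {
    // Unknown tool: degrade to plain text instead of crashing the UI.
    return { kind: "text", message: `Unsupported tool: ${call.name}` };
  }
  // Only metadata flows through here; the client fetches real data later.
  return { kind: "component", type: call.name, props: call.arguments };
};
```

Because the LLM never emits data, the dispatcher stays a tiny, testable pure function.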


Professional Production Architecture

Large-scale Generative UI systems typically follow a three-layer architecture.

Professional Production Architecture Mermaid Diagram

Why this architecture works

| Layer | Role |
| --- | --- |
| LLM | Decision engine |
| Client | Orchestration |
| Backend | Data provider |

This structure keeps the system:

  • deterministic
  • scalable
  • cost-efficient

Tool Calling Strategy

Instead of returning free-text responses, the model uses structured tool calls.

Example tool definition:

{
  "name": "line_history_values",
  "description": "Render a fund performance chart",
  "parameters": {
    "type": "object",
    "properties": {
      "fund_code": { "type": "string" },
      "limit": { "type": "number" }
    },
    "required": ["fund_code"]
  }
}
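For OpenAI-compatible APIs, a definition like this is typically wrapped in a `tools` array on the chat request. A hedged sketch, since the exact envelope depends on your provider:

```typescript
// Sketch of an OpenAI-style chat request carrying the tool definition.
// The wrapper shape is an assumption; check your provider's API reference.
const lineHistoryTool = {
  type: "function",
  function: {
    name: "line_history_values",
    description: "Render a fund performance chart",
    parameters: {
      type: "object",
      properties: {
        fund_code: { type: "string" },
        limit: { type: "number" },
      },
      required: ["fund_code"],
    },
  },
};

const buildChatRequest = (userMessage: string) => ({
  model: "stepfun/step-3.5-flash",
  messages: [
    { role: "system", content: "You are a UI orchestration assistant." },
    { role: "user", content: userMessage },
  ],
  tools: [lineHistoryTool],
  tool_choice: "auto",
});
```

Keeping the definitions in a registry module makes it easy to add new GenUI components without touching the request code.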

System Prompt Strategy

A strong system prompt ensures the model only returns metadata.

Example:

You are a UI orchestration assistant.

Never generate datasets.

Only select tools and return minimal metadata.

This significantly improves tool-selection reliability.


LLM Request / Response Example

User Request

Show me the last 30 days performance of AFT fund

Request Sent to the Model

{
  "model": "stepfun/step-3.5-flash",
  "messages": [
    {
      "role": "system",
      "content": "You are a UI orchestration assistant."
    },
    {
      "role": "user",
      "content": "Show me the last 30 days performance of AFT fund"
    }
  ]
}

Model Response

{
  "tool_call": {
    "name": "line_history_values",
    "arguments": {
      "fund_code": "AFT",
      "limit": 30
    }
  }
}

Notice something important:

The LLM does not generate any dataset.
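Reading that response on the client should be defensive, since the model may emit malformed JSON. A minimal sketch mirroring the `tool_call` shape above:

```typescript
// Defensively extract the tool call from a raw model response string.
// Returns null for malformed JSON or a missing tool_call field.
type ToolCallPayload = { name: string; arguments: Record<string, unknown> };

const extractToolCall = (raw: string): ToolCallPayload | null => {
  try {
    const parsed = JSON.parse(raw) as { tool_call?: ToolCallPayload };
    return parsed.tool_call ?? null;
  } catch {
    return null; // malformed JSON from the model
  }
};
```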


Runtime Safety with Schema Validation

LLM outputs should never be trusted blindly.

Tool arguments must be validated before rendering UI.

Example validation using Zod:

import { z } from "zod";

export const LineHistorySchema = z.object({
  fund_code: z.string().min(3).max(5).toUpperCase(),
  limit: z.number().optional().default(30),
});

Parsing tool arguments:

const parseToolArgs = (args: string) => {
  try {
    const result = LineHistorySchema.safeParse(JSON.parse(args));

    if (!result.success) {
      console.error("Invalid tool arguments", result.error);
      return null;
    }

    return result.data;
  } catch {
    // JSON.parse throws on malformed JSON from the model.
    console.error("Malformed tool arguments");
    return null;
  }
};

Validation prevents:

  • runtime crashes
  • hallucinated parameters
  • invalid UI props

GenUI Renderer Pattern

Tool calls map to predefined UI components.

/**
 * Enum definitions mapping AI tool names to UI components.
 */
export enum GenUIComponent {
  LINE_HISTORY_VALUES = "line_history_values",
  NAV_CARD = "nav_card",
}

export type GenUIComponentProps =
  | {
      type: GenUIComponent.LINE_HISTORY_VALUES;
      props: Parameters<typeof UILineHistoryValues>[0];
    }
  | {
      type: GenUIComponent.NAV_CARD;
      props: Parameters<typeof UINavigationCard>[0];
    };

export const PickComponent = ({ type, props }: GenUIComponentProps) => {
  switch (type) {
    case GenUIComponent.LINE_HISTORY_VALUES:
      return <UILineHistoryValues {...props} />;

    case GenUIComponent.NAV_CARD:
      return <UINavigationCard {...props} />;

    default:
      return <Text>Unknown Component</Text>;
  }
};

export const UILineHistoryValues = (props: LineHistoryProps) => {
  // Client-side data fetching and rendering logic here
  // ...
  return <LineChart data={fetchedData} title={props.title} />;
};

export const UINavigationCard = (props: NavCardProps) => {
  // Client-side data fetching and rendering logic here
  // ...
  return <Card title={props.title} description={props.description} />;
};

Each component is responsible for:

  • Fetching its own data
  • Handling loading states
  • Rendering native UI

This keeps the AI layer extremely lightweight.
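Because each component fetches its own data, the request it issues can be derived purely from the validated tool metadata. A minimal sketch, assuming a hypothetical REST endpoint (the path is not Fonyx's actual API):

```typescript
// Hypothetical endpoint builder for UILineHistoryValues; the component
// would call fetch(lineHistoryUrl(props)) inside an effect or hook.
type LineHistoryArgs = { fund_code: string; limit?: number };

const lineHistoryUrl = ({ fund_code, limit = 30 }: LineHistoryArgs): string =>
  `/api/funds/${encodeURIComponent(fund_code)}/history?limit=${limit}`;
```

Keeping URL construction pure makes the data layer trivially unit-testable without mocking the network.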


GenUI Rendering Flow


Token Cost Comparison

Traditional GenUI systems often generate large JSON datasets.

Example:

{
  "data": [
    { "date": "2024-01-01", "value": 10.23 },
    { "date": "2024-01-02", "value": 10.45 }
  ]
}

This increases token usage dramatically.

Estimated token usage

| Approach | Tokens | Cost |
| --- | --- | --- |
| LLM generates dataset | 2,000-5,000 | High |
| Metadata only | 20-40 | Very low |

Reducing output size from 2,000-5,000 tokens to ~30 can cut output cost by roughly 60-150×.
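As a back-of-the-envelope check (the per-token price below is a placeholder, not a real rate; only the ratio matters):

```typescript
// Hypothetical $1 per 1M output tokens; used only to compare approaches.
const PRICE_PER_M_OUTPUT_TOKENS = 1.0;

const outputCostUSD = (tokens: number): number =>
  (tokens / 1_000_000) * PRICE_PER_M_OUTPUT_TOKENS;

const savings = outputCostUSD(2000) / outputCostUSD(30); // ≈ 67× cheaper
```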


Production GenUI Folder Structure (React Native)

A scalable project structure might look like this:

src/

  ai/
    llm/
      openrouterClient.ts

    tools/
      registry.ts
      lineHistory.tool.ts

    schemas/
      lineHistory.schema.ts

    renderer/
      PickComponent.tsx

  components/
    genui/
      UILineHistoryValues.tsx
      UINavigationCard.tsx

  services/
    apiClient.ts

  observability/
    aiTracing.ts

Key idea:

| Layer | Responsibility |
| --- | --- |
| ai/tools | Tool definitions |
| ai/schemas | Runtime validation |
| ai/renderer | Component picker |
| components/genui | Native UI components |
| services | API communication |

GenUI Caching Strategy

Caching prevents unnecessary LLM calls.

| Cache Layer | Purpose |
| --- | --- |
| Tool decision cache | Store LLM component decisions |
| API response cache | Reuse fetched datasets |
| Prompt cache | Avoid repeated prompts |

Example implementation:

const decisionCache = new Map<string, string>();

export const getCachedDecision = (prompt: string) => {
  return decisionCache.get(prompt);
};

export const setCachedDecision = (prompt: string, tool: string) => {
  decisionCache.set(prompt, tool);
};

This reduces both latency and token cost.
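One simple refinement is to normalize prompts before using them as cache keys, so trivially different phrasings share one entry. The normalization rules below are an assumption, not part of the Fonyx implementation:

```typescript
// Sketch: collapse case and whitespace so "Show AFT  chart" and
// "show aft chart" hit the same cache entry.
const normalizePrompt = (prompt: string): string =>
  prompt.trim().toLowerCase().replace(/\s+/g, " ");
```

Passing `normalizePrompt(prompt)` into the cache getters and setters raises the hit rate without changing the cache itself.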


AI Observability

Production AI systems must track:

  • token usage
  • latency
  • tool frequency
  • error rates

Example tracing middleware:

export const traceLLMCall = async <T>(fn: () => Promise<T>): Promise<T> => {
  const start = performance.now();

  const result = await fn();

  const duration = performance.now() - start;

  console.log("AI_CALL_DURATION", duration);

  return result;
};

Token tracking example:

console.log("prompt_tokens", response.usage.prompt_tokens);
console.log("completion_tokens", response.usage.completion_tokens);

Observability helps optimize both cost and performance.
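Token counts can also be accumulated across calls for per-session reporting. A small sketch, using the same `usage` field shape shown above:

```typescript
// Accumulates prompt/completion token counts across LLM calls.
type Usage = { prompt_tokens: number; completion_tokens: number };

const createUsageTracker = () => {
  let prompt = 0;
  let completion = 0;
  return {
    record(u: Usage) {
      prompt += u.prompt_tokens;
      completion += u.completion_tokens;
    },
    totals() {
      return { prompt, completion, total: prompt + completion };
    },
  };
};
```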

Advanced Workflow Management with Effect-TS

For more complex scenarios (multi-step data fetching, retries, fallbacks), we use Effect-TS.

Effect-TS provides a powerful functional runtime for handling asynchronous workflows.

Key benefits:

  • Typed error handling
  • Dependency injection
  • Declarative async pipelines

Example pipeline:

import { Effect, pipe } from "effect";

const parseArgs = (args: string) =>
  Effect.try({
    try: () => LineHistorySchema.parse(JSON.parse(args)),
    catch: (e) => new Error(`Parse Error: ${e}`),
  });

const fetchData = (props: LineHistoryProps) =>
  Effect.tryPromise({
    try: () =>
      fetch(`/api/funds/${props.fund_code}/history?limit=${props.limit}`).then(
        (res) => res.json(),
      ),
    catch: (e) => new Error(`Fetch Error: ${e}`),
  });

const renderGenUIProcess = (rawArgs: string) =>
  pipe(
    parseArgs(rawArgs),
    Effect.flatMap(fetchData),
    Effect.tap((data) => Effect.log(`Fetched ${data.length} records`)),
    Effect.catchAll((err) =>
      Effect.succeed({ error: true, message: err.message }),
    ),
  );

This ensures errors are tracked across:

  • parsing
  • data fetching
  • rendering

Performance Comparison

| Feature | Traditional GenUI | Fonyx GenUI |
| --- | --- | --- |
| Token cost | Very high | Extremely low |
| Latency | Slow | Very fast |
| Data handling | Generated by LLM | Client-side fetching |
| Reliability | Medium | High |
| UX quality | Markdown / text | Native UI |

Future Enhancements

The architecture opens the door for more advanced AI-native UX features.

  • Shared Element Transitions

Smooth transitions from chat messages to full-screen visualizations.

  • Local LLM Fallback

Simple navigation commands handled by on-device models.

  • Predictive UI Prefetching

Client can preload data for likely next actions suggested by the LLM.


Why Most Generative UI Systems Fail in Production

Many Generative UI demos look impressive but fail when deployed at scale.


LLMs Used as Rendering Engines

A common mistake is asking the model to generate UI layouts.

Example:

Generate a dashboard UI for this data

This leads to:

  • unpredictable layouts
  • inconsistent UI
  • difficult debugging

Better pattern:

LLM decides component
Application renders UI

Models Generating Raw Datasets

Some systems ask the LLM to generate datasets.

Problems:

  • huge token usage
  • hallucinated numbers
  • slow responses

Instead:

LLM → metadata
Client → fetch data

Lack of Schema Validation

Without validation:

  • invalid props crash UI
  • hallucinated parameters break components

Validation is mandatory.


Prompt-Centric Architectures

Large prompts cause:

  • high token cost
  • unpredictable results
  • slower responses

Structured tools are more reliable.


Final Insight

Generative UI works best when the LLM acts as a decision engine, not a rendering engine.

The ideal separation is:

LLM → decision layer
Client → data layer
UI → rendering layer

This architecture allows AI-powered applications to scale to:

  • millions of users
  • deterministic UI
  • minimal token cost

while still delivering dynamic, intelligent user experiences.

Happy Coding! 🚀
