In this article, I want to share the architecture we built for Generative UI in the Fonyx mobile application. This system is designed to deliver premium native UI experiences while keeping LLM costs extremely low.
Tool Calling + Metadata-Driven Rendering
Generative UI (GenUI) is emerging as one of the most powerful patterns for AI-native applications. Instead of returning plain-text responses, large language models (LLMs) can dynamically orchestrate real UI components inside an application.
However, many early Generative UI systems face serious production challenges:
- extremely high token costs
- slow response times
- hallucinated datasets
- unpredictable UI outputs
In the Fonyx mobile application, we took a different approach.
The key idea behind this system is simple:
Metadata over Data
Instead of generating datasets, the model returns lightweight metadata describing what UI should render, while the client application fetches the actual data.
This dramatically improves:
- performance
- reliability
- cost efficiency
The Core Principle: Metadata over Data
Many Generative UI systems ask the LLM to generate both:
- UI structure
- data payloads
Example of a common but inefficient approach:
```json
{
  "component": "line_chart",
  "data": [
    { "date": "2024-01-01", "value": 10.21 },
    { "date": "2024-01-02", "value": 10.34 }
  ]
}
```
This creates two major problems.
Token Explosion
The LLM must generate large datasets as text, dramatically increasing token usage.
Higher Latency
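Every output token adds decoding time, so large responses delay the moment the full UI payload is available. (Time-To-First-Token (TTFT) depends mostly on prompt size; it is the total generation time that grows with output length.)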
The Metadata Approach
Instead, the model returns only the information needed to render the UI.
```json
{
  "tool": "line_history_values",
  "args": {
    "fund_code": "AFT",
    "limit": 30
  }
}
```
The client application then performs the data request.
```text
LLM → Select Component + Metadata
Client → Fetch Data
Client → Render Native Component
```
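The three steps above can be sketched as one small orchestration function. This is a minimal sketch under stated assumptions, not the Fonyx implementation: `callModel` and `fetchHistory` are hypothetical injected dependencies standing in for the LLM client and the app's API client, and `RenderInstruction` is an illustrative type for what the native renderer receives.

```typescript
// Hypothetical shape: the model returns only a tool name and small arguments.
type ToolCall = { name: string; args: Record<string, unknown> };

// What the native renderer receives once the client has fetched real data.
type RenderInstruction =
  | { component: "line_history_values"; fundCode: string; points: number[] }
  | { component: "text"; message: string };

interface Deps {
  // Step 1: the LLM selects a component and returns metadata (no dataset).
  callModel: (prompt: string) => Promise<ToolCall | null>;
  // Step 2: the client fetches real data from its own backend.
  fetchHistory: (fundCode: string, limit: number) => Promise<number[]>;
}

export async function orchestrate(prompt: string, deps: Deps): Promise<RenderInstruction> {
  const call = await deps.callModel(prompt);
  if (call === null) {
    return { component: "text", message: "No matching component." };
  }
  const fundCode = call.args.fund_code;
  if (call.name === "line_history_values" && typeof fundCode === "string") {
    const rawLimit = call.args.limit;
    const limit = typeof rawLimit === "number" ? rawLimit : 30; // default window
    const points = await deps.fetchHistory(fundCode, limit);
    // Step 3: hand a fully populated instruction to the native renderer.
    return { component: "line_history_values", fundCode, points };
  }
  return { component: "text", message: `Unsupported tool: ${call.name}` };
}
```

Injecting the model and the fetcher keeps the decision layer and the data layer swappable, and makes the flow testable without network access.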
Benefits
| Benefit | Result |
|---|---|
| Lower token usage | Only metadata generated |
| Faster responses | Minimal generation time |
| Higher reliability | Less hallucination risk |
| Native UX | Real UI components |
Generative UI Architecture
This system separates AI orchestration from UI rendering.
Responsibility Split
| Layer | Responsibility |
|---|---|
| LLM | Decide which component should render |
| Client | Fetch real data |
| UI | Render native interface |
This prevents a common anti-pattern:
LLMs generating raw datasets.
Professional Production Architecture
Large-scale Generative UI systems typically follow a three-layer architecture.
Why this architecture works
| Layer | Role |
|---|---|
| LLM | decision engine |
| Client | orchestration |
| Backend | data provider |
This structure keeps the system:
- deterministic at the rendering layer (a validated tool call always maps to the same component)
- scalable
- cost-efficient
Tool Calling Strategy
Instead of returning free-text responses, the model uses structured tool calls.
Example tool definition:
```json
{
  "name": "line_history_values",
  "description": "Render a fund performance chart",
  "parameters": {
    "type": "object",
    "properties": {
      "fund_code": { "type": "string" },
      "limit": { "type": "integer" }
    },
    "required": ["fund_code"]
  }
}
```
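OpenAI-compatible chat APIs (OpenRouter included) expect each function definition to be wrapped in a `{ type: "function", function: ... }` entry of the request's `tools` array. A minimal typed sketch assuming that request format; `ToolDefinition`, `RequestTool`, and `toRequestTools` are illustrative names, not part of any SDK.

```typescript
// JSON Schema subset used by the tool definitions in this article.
interface ToolDefinition {
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, { type: "string" | "integer" | "number" | "boolean" }>;
    required?: string[];
  };
}

// Shape of one entry in the `tools` array of an OpenAI-compatible request.
interface RequestTool {
  type: "function";
  function: ToolDefinition;
}

export const lineHistoryTool: ToolDefinition = {
  name: "line_history_values",
  description: "Render a fund performance chart",
  parameters: {
    type: "object",
    properties: {
      fund_code: { type: "string" },
      limit: { type: "integer" }, // a day count should not be fractional
    },
    required: ["fund_code"],
  },
};

// Wrap plain definitions so they can be sent as `tools` in the request body.
export const toRequestTools = (defs: ToolDefinition[]): RequestTool[] =>
  defs.map((def): RequestTool => ({ type: "function", function: def }));
```

Typing the definitions means a misspelled schema type or a missing `required` array is caught at compile time rather than by the model at runtime.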
System Prompt Strategy
A strict system prompt instructs the model to return only metadata.
Example:
```text
You are a UI orchestration assistant.
Never generate datasets.
Only select tools and return minimal metadata.
```
This significantly improves tool-selection reliability.
LLM Request / Response Example
User Request
Show me the last 30 days performance of AFT fund
Request Sent to the Model
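The tool definitions travel with the request in the `tools` array; without them the model has no tool to call.
```json
{
  "model": "stepfun/step-3.5-flash",
  "messages": [
    { "role": "system", "content": "You are a UI orchestration assistant. Never generate datasets. Only select tools and return minimal metadata." },
    { "role": "user", "content": "Show me the last 30 days performance of AFT fund" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "line_history_values",
        "description": "Render a fund performance chart",
        "parameters": {
          "type": "object",
          "properties": {
            "fund_code": { "type": "string" },
            "limit": { "type": "integer" }
          },
          "required": ["fund_code"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
```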
Model Response (simplified)
```json
{
  "tool_call": {
    "name": "line_history_values",
    "arguments": {
      "fund_code": "AFT",
      "limit": 30
    }
  }
}
```
Notice something important:
The LLM does not generate any dataset.
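In OpenAI-compatible responses (the format OpenRouter returns), the tool call does not arrive as a top-level `tool_call` object as in the simplified example above: it sits in `choices[0].message.tool_calls`, and `function.arguments` is a JSON-encoded string. A minimal extraction sketch under that assumption; `ChatResponse` declares only the fields read here, and `extractToolCall` is a hypothetical helper. It returns the raw argument string so validation happens in one place.

```typescript
// Only the response fields this helper reads (OpenAI-compatible format).
interface ChatResponse {
  choices: Array<{
    message: {
      content: string | null;
      tool_calls?: Array<{
        id: string;
        type: "function";
        function: { name: string; arguments: string };
      }>;
    };
  }>;
}

export interface RawToolCall {
  name: string;
  rawArgs: string; // still untrusted JSON text; validate before rendering
}

export const extractToolCall = (res: ChatResponse): RawToolCall | null => {
  const call = res.choices[0]?.message.tool_calls?.[0];
  // No tool call means the model answered in plain text: render no component.
  if (!call || call.type !== "function") return null;
  return { name: call.function.name, rawArgs: call.function.arguments };
};
```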
Runtime Safety with Schema Validation
LLM outputs should never be trusted blindly.
Tool arguments must be validated before rendering UI.
Example validation using Zod:
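```ts
import { z } from "zod";

export const LineHistorySchema = z.object({
  fund_code: z.string().min(3).max(5).toUpperCase(),
  limit: z.number().int().positive().optional().default(30),
});

// Validated props type shared by the renderer and data-fetching code.
export type LineHistoryProps = z.infer<typeof LineHistorySchema>;
```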
Parsing tool arguments:
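```ts
export const parseToolArgs = (args: string): LineHistoryProps | null => {
  let json: unknown;
  try {
    json = JSON.parse(args); // the model may emit malformed JSON
  } catch {
    console.error("Tool arguments are not valid JSON");
    return null;
  }
  const result = LineHistorySchema.safeParse(json);
  if (!result.success) {
    console.error("Invalid tool arguments", result.error.issues);
    return null;
  }
  return result.data;
};
```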
Validation prevents:
- runtime crashes
- hallucinated parameters
- invalid UI props
GenUI Renderer Pattern
Tool calls map to predefined UI components.
```tsx
import React from "react";
import { Text } from "react-native";
// Project-specific pieces; names and paths are illustrative.
import { Card, LineChart } from "../../components/ui";
import { useFundHistory } from "../../services/useFundHistory";
import type { LineHistoryProps } from "../schemas/lineHistory.schema";

/**
 * Enum mapping AI tool names to UI components.
 */
export enum GenUIComponent {
  LINE_HISTORY_VALUES = "line_history_values",
  NAV_CARD = "nav_card",
}

type NavCardProps = { title: string; description: string };

export const UILineHistoryValues = (props: LineHistoryProps) => {
  // Client-side data fetching: the LLM supplied only fund_code and limit.
  const { data, isLoading } = useFundHistory(props.fund_code, props.limit);
  if (isLoading || !data) return <Text>Loading...</Text>;
  return <LineChart data={data} title={props.fund_code} />;
};

export const UINavigationCard = (props: NavCardProps) => (
  <Card title={props.title} description={props.description} />
);

export type GenUIComponentProps =
  | {
      type: GenUIComponent.LINE_HISTORY_VALUES;
      props: Parameters<typeof UILineHistoryValues>[0];
    }
  | {
      type: GenUIComponent.NAV_CARD;
      props: Parameters<typeof UINavigationCard>[0];
    };

export const PickComponent = (item: GenUIComponentProps) => {
  switch (item.type) {
    case GenUIComponent.LINE_HISTORY_VALUES:
      return <UILineHistoryValues {...item.props} />;
    case GenUIComponent.NAV_CARD:
      return <UINavigationCard {...item.props} />;
    default:
      return <Text>Unknown Component</Text>;
  }
};
```
Each component is responsible for:
- Fetching its own data
- Handling loading states
- Rendering native UI
This keeps the AI layer extremely lightweight.
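GenUI Rendering Flow
User prompt → LLM tool call → schema validation → PickComponent → component fetches its own data → native UI render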
Token Cost Comparison
Traditional GenUI systems often generate large JSON datasets.
Example:
```json
{
  "data": [
    { "date": "2024-01-01", "value": 10.23 },
    { "date": "2024-01-02", "value": 10.45 }
  ]
}
```
This increases token usage dramatically.
Estimated token usage
| Approach | Output tokens (approx.) | Cost |
|---|---|---|
| LLM generates dataset | 2000-5000 | High |
| Metadata only | 20-40 | Very Low |
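Cutting output from 2,000–5,000 tokens to ~30 tokens reduces output-token cost by roughly 65× to 165×. Input tokens (system prompt, tool definitions, chat history) are billed either way, so the saving on the total bill depends on prompt size.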
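To see why the dataset row grows while the metadata row stays flat, the two payload styles can be measured directly. This sketch uses the rough heuristic of about four characters per token (an assumption: real counts depend on the model's tokenizer, and numeric JSON often tokenizes less efficiently), so treat its results as order-of-magnitude estimates. The payload shapes mirror the JSON examples in this article; the dates and values are synthetic.

```typescript
// Rough heuristic: ~4 characters per token. Not a real tokenizer.
const CHARS_PER_TOKEN = 4;

export const estimateTokens = (text: string): number =>
  Math.ceil(text.length / CHARS_PER_TOKEN);

// Dataset-style response: the model would have to emit every point.
export const datasetPayload = (days: number): string =>
  JSON.stringify({
    component: "line_chart",
    data: Array.from({ length: days }, (_, i) => ({
      date: new Date(Date.UTC(2024, 0, 1 + i)).toISOString().slice(0, 10),
      value: Number((10 + i / 100).toFixed(2)), // synthetic values
    })),
  });

// Metadata-style response: size does not grow with the number of days.
export const metadataPayload = (days: number): string =>
  JSON.stringify({ tool: "line_history_values", args: { fund_code: "AFT", limit: days } });
```

For example, comparing `estimateTokens(datasetPayload(365))` with `estimateTokens(metadataPayload(365))` shows the gap for a one-year chart: the first grows linearly with the number of points, the second stays at a couple of dozen tokens.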
Production GenUI Folder Structure (React Native)
A scalable project structure might look like this:
```text
src/
  ai/
    llm/
      openrouterClient.ts
    tools/
      registry.ts
      lineHistory.tool.ts
    schemas/
      lineHistory.schema.ts
    renderer/
      PickComponent.tsx
  components/
    genui/
      UILineHistoryValues.tsx
      UINavigationCard.tsx
  services/
    apiClient.ts
  observability/
    aiTracing.ts
```
Key idea:
| Layer | Responsibility |
|---|---|
| ai/tools | tool definitions |
| ai/schemas | runtime validation |
| ai/renderer | component picker |
| components/genui | native UI components |
| services | API communication |
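A hypothetical `ai/tools/registry.ts` could pair each tool name with a validator, so that the LLM client, the schema layer, and the renderer agree on one list of allowed tools. This is a minimal dependency-free sketch: the hand-written `validate` function stands in for the Zod schemas shown earlier, and every name in it is illustrative.

```typescript
// Result of validating untrusted tool arguments.
type Validated<T> = { ok: true; value: T } | { ok: false; error: string };

interface ToolEntry<T> {
  description: string;
  validate: (input: unknown) => Validated<T>;
}

export interface LineHistoryArgs {
  fund_code: string;
  limit: number;
}

const lineHistory: ToolEntry<LineHistoryArgs> = {
  description: "Render a fund performance chart",
  validate: (input) => {
    if (typeof input !== "object" || input === null) return { ok: false, error: "not an object" };
    const { fund_code, limit = 30 } = input as Record<string, unknown>;
    if (typeof fund_code !== "string" || fund_code.length < 3 || fund_code.length > 5) {
      return { ok: false, error: "fund_code must be a 3-5 character string" };
    }
    if (typeof limit !== "number" || !Number.isInteger(limit) || limit <= 0) {
      return { ok: false, error: "limit must be a positive integer" };
    }
    return { ok: true, value: { fund_code: fund_code.toUpperCase(), limit } };
  },
};

// Single source of truth: only names listed here may reach the renderer.
export const toolRegistry = {
  line_history_values: lineHistory,
} as const;

export type ToolName = keyof typeof toolRegistry;

export const isToolName = (name: string): name is ToolName =>
  Object.prototype.hasOwnProperty.call(toolRegistry, name);

// Parse + validate in one step; malformed JSON is reported, not thrown.
export const validateToolCall = (name: string, rawArgs: string) => {
  if (!isToolName(name)) return { ok: false as const, error: `unknown tool: ${name}` };
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawArgs);
  } catch {
    return { ok: false as const, error: "arguments are not valid JSON" };
  }
  return toolRegistry[name].validate(parsed);
};
```

Rejecting unknown tool names at the registry means a hallucinated tool can never reach `PickComponent`.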
GenUI Caching Strategy
Caching prevents unnecessary LLM calls.
| Cache Layer | Purpose |
|---|---|
| Tool decision cache | store LLM component decisions |
| API response cache | reuse fetched datasets |
| Prompt cache | avoid re-processing repeated prompt prefixes (e.g. the system prompt) |
Example implementation:
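```ts
// Decision shape: tool name plus validated arguments.
type CachedDecision = { name: string; args: Record<string, unknown> };

const decisionCache = new Map<string, CachedDecision>();

export const getCachedDecision = (prompt: string): CachedDecision | undefined =>
  decisionCache.get(prompt);

export const setCachedDecision = (prompt: string, tool: CachedDecision): void => {
  decisionCache.set(prompt, tool);
};
```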
This reduces both latency and token cost.
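The `Map` above grows without bound and only hits on byte-identical prompts. A slightly more production-shaped sketch, assuming decisions may be reused for a fixed time window: keys are normalized (trimmed, lowercased, whitespace collapsed), entries expire after `ttlMs`, and the least recently used entry is evicted once `maxEntries` is reached, using `Map` insertion order. `DecisionCache` and its defaults are illustrative; whether a decision is safe to reuse (for example, prompts containing relative dates such as "today") is an application-level choice.

```typescript
export class DecisionCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly maxEntries: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(maxEntries = 500, ttlMs = 10 * 60 * 1000, now: () => number = Date.now) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, useful in tests
  }

  // "Show me  AFT " and "show me aft" map to the same key.
  static normalize(prompt: string): string {
    return prompt.trim().toLowerCase().replace(/\s+/g, " ");
  }

  get(prompt: string): V | undefined {
    const key = DecisionCache.normalize(prompt);
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.entries.delete(key); // expired entry
      return undefined;
    }
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(key);
    this.entries.set(key, hit);
    return hit.value;
  }

  set(prompt: string, value: V): void {
    const key = DecisionCache.normalize(prompt);
    this.entries.delete(key);
    if (this.entries.size >= this.maxEntries) {
      // Map iterates in insertion order, so the first key is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```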
AI Observability
Production AI systems must track:
- token usage
- latency
- tool frequency
- error rates
Example tracing middleware:
```ts
export const traceLLMCall = async <T>(fn: () => Promise<T>): Promise<T> => {
  const start = performance.now();
  try {
    return await fn();
  } finally {
    // Logged for both successful and failed calls.
    console.log("AI_CALL_DURATION", performance.now() - start);
  }
};
```
Token tracking example:
```ts
// `usage` is optional in OpenAI-compatible responses, so guard the access.
console.log("prompt_tokens", response.usage?.prompt_tokens);
console.log("completion_tokens", response.usage?.completion_tokens);
```
Observability helps optimize both cost and performance.
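Logging with `console.log` is enough to start, but the four metrics listed above are easier to aggregate as one structured record per call. A minimal sketch, assuming an OpenAI-compatible `usage` object on the response; `LLMTrace`, `tracedCall`, and the `sink` callback are hypothetical, and in production the sink would forward records to whatever analytics or tracing backend the app already uses.

```typescript
// One structured record per LLM call: latency, tokens, tool, and errors.
export interface LLMTrace {
  label: string;
  durationMs: number;
  promptTokens?: number;
  completionTokens?: number;
  toolName?: string;
  error?: string;
}

// Only the response fields read here (OpenAI-compatible shape).
interface TracedResponse {
  usage?: { prompt_tokens: number; completion_tokens: number };
  choices: Array<{ message: { tool_calls?: Array<{ function: { name: string } }> } }>;
}

export async function tracedCall<R extends TracedResponse>(
  label: string,
  fn: () => Promise<R>,
  sink: (trace: LLMTrace) => void,
): Promise<R> {
  const start = performance.now();
  try {
    const res = await fn();
    sink({
      label,
      durationMs: performance.now() - start,
      promptTokens: res.usage?.prompt_tokens,
      completionTokens: res.usage?.completion_tokens,
      toolName: res.choices[0]?.message.tool_calls?.[0]?.function.name,
    });
    return res;
  } catch (err) {
    // Failed calls are recorded too, so error rates can be computed.
    sink({ label, durationMs: performance.now() - start, error: String(err) });
    throw err;
  }
}
```

Recording `toolName` per call gives tool frequency for free, which helps spot tools the model over- or under-selects.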
Advanced Workflow Management with Effect-TS
For more complex scenarios (multi-step data fetching, retries, fallbacks), we use Effect-TS.
Effect-TS provides a powerful functional runtime for handling asynchronous workflows.
Key benefits:
- Typed error handling
- Dependency injection
- Declarative async pipelines
Example pipeline:
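```ts
import { Effect, pipe } from "effect";
import {
  LineHistorySchema,
  type LineHistoryProps,
} from "../schemas/lineHistory.schema";

const parseArgs = (args: string) =>
  Effect.try({
    try: () => LineHistorySchema.parse(JSON.parse(args)),
    catch: (e) => new Error(`Parse Error: ${e}`),
  });

// tryPromise (unlike Effect.promise) turns rejections into typed failures,
// so catchAll below also sees network and HTTP errors.
const fetchData = (props: LineHistoryProps) =>
  Effect.tryPromise({
    try: () =>
      fetch(`api/funds/${props.fund_code}/history?limit=${props.limit}`).then((res) => {
        if (!res.ok) throw new Error(`HTTP ${res.status}`);
        return res.json() as Promise<unknown[]>;
      }),
    catch: (e) => new Error(`Fetch Error: ${e}`),
  });

const renderGenUIProcess = (rawArgs: string) =>
  pipe(
    parseArgs(rawArgs),
    Effect.flatMap(fetchData),
    Effect.tap((data) => Effect.log(`Fetched ${data.length} records`)),
    Effect.catchAll((err) => Effect.succeed({ error: true, message: err.message })),
  );
```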
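This keeps failures from both parsing and data fetching in one typed error channel, so the rendering layer receives either the fetched data or a fallback error object instead of an uncaught exception.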
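The pipeline above shows the fallback path (`catchAll`) but not the retries mentioned earlier; Effect-TS provides those through its `Effect.retry` combinator and `Schedule` module. For readers not using Effect, the same retry-then-fallback shape can be sketched in plain TypeScript. This is a swapped-in, dependency-free technique, not how Fonyx implements it; `withRetry`, its default attempt count, and the linear backoff are illustrative choices.

```typescript
// Result type the renderer can switch on instead of catching exceptions.
export type FetchOutcome<T> = { ok: true; data: T } | { ok: false; message: string };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry a failing async task a few times with linear backoff, then fall back.
export async function withRetry<T>(
  task: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 200,
): Promise<FetchOutcome<T>> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return { ok: true, data: await task() };
    } catch (err) {
      lastError = err;
      // Wait longer after each failure, but not after the final attempt.
      if (i < attempts - 1) await sleep(baseDelayMs * (i + 1));
    }
  }
  // Fallback value, mirroring the catchAll branch of the Effect pipeline.
  return { ok: false, message: lastError instanceof Error ? lastError.message : String(lastError) };
}
```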
Performance Comparison
| Feature | Traditional GenUI | Fonyx GenUI |
|---|---|---|
| Token Cost | Very High | Extremely Low |
| Latency | Slow | Very Fast |
| Data Handling | Generated by LLM | Client-side fetching |
| Reliability | Medium | High |
| UX Quality | Markdown / Text | Native UI |
Future Enhancements
The architecture opens the door for more advanced AI-native UX features.
- Shared Element Transitions: smooth transitions from chat messages to full-screen visualizations.
- Local LLM Fallback: simple navigation commands handled by on-device models.
- Predictive UI Prefetching: the client preloads data for likely next actions suggested by the LLM.
Why Most Generative UI Systems Fail in Production
Many Generative UI demos look impressive but fail when deployed at scale.
LLMs Used as Rendering Engines
A common mistake is asking the model to generate UI layouts.
Example:
Generate a dashboard UI for this data
This leads to:
- unpredictable layouts
- inconsistent UI
- difficult debugging
Better pattern:
```text
LLM decides component
Application renders UI
```
Models Generating Raw Datasets
Some systems ask the LLM to generate datasets.
Problems:
- huge token usage
- hallucinated numbers
- slow responses
Instead:
```text
LLM → metadata
Client → fetch data
```
Lack of Schema Validation
Without validation:
- invalid props crash UI
- hallucinated parameters break components
Validation is mandatory.
Prompt-Centric Architectures
Large prompts cause:
- high token cost
- unpredictable results
- slower responses
Structured tools are more reliable.
Final Insight
Generative UI works best when the LLM acts as a decision engine, not a rendering engine.
The ideal separation is:
```text
LLM → decision layer
Client → data layer
UI → rendering layer
```
This architecture helps AI-powered applications:
- scale to millions of users
- keep the UI deterministic
- minimize token cost
while still delivering dynamic, intelligent user experiences.
Happy Coding! 🚀


