<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ye Allen</title>
    <description>The latest articles on DEV Community by Ye Allen (@ye_allen_).</description>
    <link>https://dev.to/ye_allen_</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3919611%2F58403f09-105c-4557-bc25-ab555b7b4a22.png</url>
      <title>DEV Community: Ye Allen</title>
      <link>https://dev.to/ye_allen_</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ye_allen_"/>
    <language>en</language>
    <item>
      <title>A Simple Way to Test Multiple AI Models with One API</title>
      <dc:creator>Ye Allen</dc:creator>
      <pubDate>Thu, 18 Jun 2026 15:07:49 +0000</pubDate>
      <link>https://dev.to/ye_allen_/a-simple-way-to-test-multiple-ai-models-with-one-api-547</link>
      <guid>https://dev.to/ye_allen_/a-simple-way-to-test-multiple-ai-models-with-one-api-547</guid>
      <description>&lt;p&gt;Building AI applications often starts with one model.&lt;/p&gt;

&lt;p&gt;But as the product grows, developers may need to test different models for different tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;fast chat replies&lt;/li&gt;
&lt;li&gt;reasoning&lt;/li&gt;
&lt;li&gt;coding&lt;/li&gt;
&lt;li&gt;multilingual support&lt;/li&gt;
&lt;li&gt;RAG applications&lt;/li&gt;
&lt;li&gt;AI agents&lt;/li&gt;
&lt;li&gt;automation workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Managing each model separately can become messy. Different providers may require different API keys, endpoints, pricing rules, and integration steps.&lt;/p&gt;

&lt;p&gt;VectorNode helps simplify this process.&lt;/p&gt;

&lt;p&gt;VectorNode is a multi-model AI API platform that allows developers to access and test models such as GPT, Claude, Gemini, DeepSeek, Qwen and more through one platform.&lt;/p&gt;

&lt;p&gt;For developers already using OpenAI-compatible APIs, the integration can be simple:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.VECTORNODE_API_KEY,
  baseURL: "https://www.vectronode.com/v1",
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
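
&lt;p&gt;Because the client above uses the OpenAI-compatible format, comparing models can come down to changing the &lt;code&gt;model&lt;/code&gt; string. The sketch below builds one identical request body per candidate model. The model identifiers and the &lt;code&gt;buildComparisonRequests&lt;/code&gt; helper are hypothetical names used for illustration, not part of any SDK.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Hypothetical model identifiers: replace them with names the platform lists.
const candidateModels: string[] = [
  "example-fast-model",
  "example-reasoning-model",
];

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
}

// Build one request body per model so every candidate receives the same prompt.
function buildComparisonRequests(models: string[], prompt: string): ChatRequest[] {
  return models.map(function (model: string): ChatRequest {
    return { model, messages: [{ role: "user", content: prompt }] };
  });
}

const requests = buildComparisonRequests(
  candidateModels,
  "Summarize this support ticket in one sentence."
);

for (const request of requests) {
  console.log(request.model, request.messages.length);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each body could then be passed to &lt;code&gt;client.chat.completions.create()&lt;/code&gt;, so the only difference between runs is the model that answers.&lt;/p&gt;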

</description>
      <category>ai</category>
      <category>api</category>
      <category>javascript</category>
      <category>llm</category>
    </item>
    <item>
      <title>How to Structure AI Model Access for Small Developer Teams</title>
      <dc:creator>Ye Allen</dc:creator>
      <pubDate>Tue, 16 Jun 2026 08:24:36 +0000</pubDate>
      <link>https://dev.to/ye_allen_/how-to-structure-ai-model-access-for-small-developer-teams-28k7</link>
      <guid>https://dev.to/ye_allen_/how-to-structure-ai-model-access-for-small-developer-teams-28k7</guid>
      <description>&lt;p&gt;Small developer teams often begin with one AI model and one simple API integration.&lt;/p&gt;

&lt;p&gt;That is usually the right way to start.&lt;/p&gt;

&lt;p&gt;A prototype does not need a complex architecture. A developer can connect a text model, test a few prompts, ship a first feature, and learn from real users.&lt;/p&gt;

&lt;p&gt;But once the product grows, the model layer becomes harder to manage.&lt;/p&gt;

&lt;p&gt;A chatbot may need fast text responses. A RAG feature may need stronger reasoning over retrieved documents. An agent workflow may need structured output and tool-use reliability. A creative product may need image, video, or audio models.&lt;/p&gt;

&lt;p&gt;At that point, the question is no longer only:&lt;/p&gt;

&lt;p&gt;"Which model should we use?"&lt;/p&gt;

&lt;p&gt;A better question is:&lt;/p&gt;

&lt;p&gt;"How should the product organize model access so we can test, switch, and scale without rewriting core logic?"&lt;/p&gt;

&lt;h2&gt;
  
  
  Start with workflows
&lt;/h2&gt;

&lt;p&gt;Before choosing models, list the workflows inside the product.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;support chat&lt;/li&gt;
&lt;li&gt;document summarization&lt;/li&gt;
&lt;li&gt;RAG answer generation&lt;/li&gt;
&lt;li&gt;agent planning&lt;/li&gt;
&lt;li&gt;JSON extraction&lt;/li&gt;
&lt;li&gt;image generation&lt;/li&gt;
&lt;li&gt;video generation&lt;/li&gt;
&lt;li&gt;audio transcription&lt;/li&gt;
&lt;li&gt;content generation&lt;/li&gt;
&lt;li&gt;automation workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each workflow has different requirements.&lt;/p&gt;

&lt;p&gt;Some workflows care most about latency. Some care about reasoning. Some care about structured output. Some care about media quality, completion time, or cost.&lt;/p&gt;

&lt;p&gt;This is why one fixed model usually does not fit every part of an AI product.&lt;/p&gt;

&lt;h2&gt;
  
  
  Create a model access layer
&lt;/h2&gt;

&lt;p&gt;A simple pattern is to keep model access separate from business logic.&lt;/p&gt;

&lt;p&gt;Instead of calling a specific provider directly from every feature, create an internal layer that maps product workflows to model capabilities.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;



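&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;support_chat_model = configurable
rag_reasoning_model = configurable
agent_planning_model = configurable
json_output_model = configurable
image_generation_model = configurable
video_generation_model = configurable
audio_model = configurable
fallback_model = configurable
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The application can then request a capability instead of depending on one fixed provider or model name everywhere.&lt;/p&gt;

&lt;p&gt;This makes the product easier to adjust later.&lt;/p&gt;

&lt;p&gt;If one model becomes too slow, too expensive, or unavailable, the team can test another option without rewriting the feature itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Keep routes configurable
&lt;/h2&gt;

&lt;p&gt;Model evaluation should not stop at the model name.&lt;/p&gt;

&lt;p&gt;The same type of capability may be available through different routes with different pricing, latency, and availability behavior.&lt;/p&gt;

&lt;p&gt;For each test, record:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;model name&lt;/li&gt;
&lt;li&gt;route or provider path&lt;/li&gt;
&lt;li&gt;request cost&lt;/li&gt;
&lt;li&gt;response latency&lt;/li&gt;
&lt;li&gt;timeout behavior&lt;/li&gt;
&lt;li&gt;error behavior&lt;/li&gt;
&lt;li&gt;output quality&lt;/li&gt;
&lt;li&gt;supported parameters&lt;/li&gt;
&lt;li&gt;workflow used for testing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This creates a better decision record than only remembering which model looked good in one demo.&lt;/p&gt;

&lt;h2&gt;
  
  
  Test API behavior, not only output
&lt;/h2&gt;

&lt;p&gt;A model can produce strong output and still create integration problems.&lt;/p&gt;

&lt;p&gt;Developer teams should also test:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;authentication&lt;/li&gt;
&lt;li&gt;streaming responses&lt;/li&gt;
&lt;li&gt;structured output&lt;/li&gt;
&lt;li&gt;async jobs&lt;/li&gt;
&lt;li&gt;timeout handling&lt;/li&gt;
&lt;li&gt;retry behavior&lt;/li&gt;
&lt;li&gt;error messages&lt;/li&gt;
&lt;li&gt;usage reporting&lt;/li&gt;
&lt;li&gt;asset retrieval&lt;/li&gt;
&lt;li&gt;unsupported parameters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An OpenAI-compatible API format can make many text-model integrations easier because existing SDKs and developer tools may already support that request structure.&lt;/p&gt;

&lt;p&gt;But it should be treated as one technical format, not the entire product strategy.&lt;/p&gt;

&lt;p&gt;Image, video, audio, and specialized models may require different endpoints, parameters, or job-based workflows. Good documentation should make those differences clear.&lt;/p&gt;

&lt;h2&gt;
  
  
  Build a small evaluation matrix
&lt;/h2&gt;

&lt;p&gt;A simple matrix is enough for early teams.&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Workflow&lt;/th&gt;&lt;th&gt;Main requirement&lt;/th&gt;&lt;th&gt;Primary model&lt;/th&gt;&lt;th&gt;Alternative&lt;/th&gt;&lt;th&gt;Key metric&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Support chat&lt;/td&gt;&lt;td&gt;Fast response&lt;/td&gt;&lt;td&gt;configurable&lt;/td&gt;&lt;td&gt;configurable&lt;/td&gt;&lt;td&gt;latency&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;RAG answers&lt;/td&gt;&lt;td&gt;reasoning quality&lt;/td&gt;&lt;td&gt;configurable&lt;/td&gt;&lt;td&gt;configurable&lt;/td&gt;&lt;td&gt;answer quality&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Agent tools&lt;/td&gt;&lt;td&gt;structured output&lt;/td&gt;&lt;td&gt;configurable&lt;/td&gt;&lt;td&gt;configurable&lt;/td&gt;&lt;td&gt;schema success&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Image generation&lt;/td&gt;&lt;td&gt;prompt accuracy&lt;/td&gt;&lt;td&gt;configurable&lt;/td&gt;&lt;td&gt;configurable&lt;/td&gt;&lt;td&gt;quality score&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Video generation&lt;/td&gt;&lt;td&gt;stable completion&lt;/td&gt;&lt;td&gt;configurable&lt;/td&gt;&lt;td&gt;configurable&lt;/td&gt;&lt;td&gt;completion quality&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Audio transcription&lt;/td&gt;&lt;td&gt;accurate text&lt;/td&gt;&lt;td&gt;configurable&lt;/td&gt;&lt;td&gt;configurable&lt;/td&gt;&lt;td&gt;error rate&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The exact models can change over time.&lt;/p&gt;

&lt;p&gt;The important point is that model choice should remain visible and configurable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Monitor after launch
&lt;/h2&gt;

&lt;p&gt;Testing does not end after integration.&lt;/p&gt;

&lt;p&gt;Production traffic is different from test prompts. Users may send unexpected inputs. Model behavior, pricing, and availability may also change.&lt;/p&gt;

&lt;p&gt;Track:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;successful request rate&lt;/li&gt;
&lt;li&gt;latency percentiles&lt;/li&gt;
&lt;li&gt;cost by workflow&lt;/li&gt;
&lt;li&gt;invalid outputs&lt;/li&gt;
&lt;li&gt;retries and timeouts&lt;/li&gt;
&lt;li&gt;route availability&lt;/li&gt;
&lt;li&gt;generation failures&lt;/li&gt;
&lt;li&gt;user corrections&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Add difficult real-world examples back into the evaluation dataset.&lt;/p&gt;

&lt;p&gt;This turns model selection into an ongoing product process instead of a one-time decision.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where VectorNode fits
&lt;/h2&gt;

&lt;p&gt;VectorNode is a pay-as-you-go multi-model AI API platform for developers building with text, image, video, and audio models.&lt;/p&gt;

&lt;p&gt;It helps independent developers and small AI teams test and access GPT, Claude, Gemini, DeepSeek, Qwen, and hundreds of other supported models through developer-friendly APIs.&lt;/p&gt;

&lt;p&gt;Developers can explore models, compare available options, test requests, and build AI apps, agents, RAG systems, chatbots, automation workflows, developer tools, and multimodal products without maintaining separate provider accounts, balances, and integrations for every model family.&lt;/p&gt;

&lt;p&gt;Learn more: &lt;a href="https://www.vectronode.com/"&gt;https://www.vectronode.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Start testing with VectorNode.&lt;/p&gt;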
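
&lt;p&gt;The configurable mapping described under "Create a model access layer" could be sketched in TypeScript roughly as follows. This is a minimal illustration under stated assumptions: the workflow names, the model identifiers, and the &lt;code&gt;resolveModel&lt;/code&gt; helper are all hypothetical, and a real product would load the mapping from configuration rather than hardcode it.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;type Workflow = "support_chat" | "rag_reasoning" | "json_output";

interface ModelChoice {
  primary: string;
  fallback?: string;
}

// Placeholder identifiers: in practice these would come from configuration.
const modelAccess: { [W in Workflow]: ModelChoice } = {
  support_chat: { primary: "example-chat-model" },
  rag_reasoning: { primary: "example-reasoning-model" },
  json_output: {
    primary: "example-json-model",
    fallback: "example-fallback-model",
  },
};

// Return the model for a workflow, skipping models currently marked unavailable.
function resolveModel(workflow: Workflow, unavailable: string[] = []): string {
  const choice = modelAccess[workflow];
  if (!unavailable.includes(choice.primary)) {
    return choice.primary;
  }
  const fallback = choice.fallback;
  if (fallback === undefined || unavailable.includes(fallback)) {
    throw new Error(`No available model for workflow: ${workflow}`);
  }
  return fallback;
}

console.log(resolveModel("support_chat"));
console.log(resolveModel("json_output", ["example-json-model"]));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Feature code only names a workflow, so swapping a slow or expensive model means editing the mapping and nothing else.&lt;/p&gt;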

</description>
      <category>ai</category>
      <category>api</category>
      <category>llm</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Build a Capability-Based Router for Multimodal AI Models</title>
      <dc:creator>Ye Allen</dc:creator>
      <pubDate>Mon, 15 Jun 2026 10:34:22 +0000</pubDate>
      <link>https://dev.to/ye_allen_/build-a-capability-based-router-for-multimodal-ai-models-4i2h</link>
      <guid>https://dev.to/ye_allen_/build-a-capability-based-router-for-multimodal-ai-models-4i2h</guid>
      <description>&lt;p&gt;AI applications rarely remain connected to a single model.&lt;/p&gt;

&lt;p&gt;A product may begin with text generation, then add structured output for agents, document reasoning for RAG, image generation, video creation, audio transcription, or speech synthesis.&lt;/p&gt;

&lt;p&gt;If every feature calls a provider directly, model-specific code quickly spreads across the application. Credentials, model names, request formats, timeouts, routes, and fallback behavior become mixed with product logic.&lt;/p&gt;

&lt;p&gt;A capability-based model access layer provides a cleaner alternative.&lt;/p&gt;

&lt;p&gt;Instead of asking the application to call a particular provider, each workflow requests a capability such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;fast text generation&lt;/li&gt;
&lt;li&gt;document reasoning&lt;/li&gt;
&lt;li&gt;structured agent output&lt;/li&gt;
&lt;li&gt;image generation&lt;/li&gt;
&lt;li&gt;video generation&lt;/li&gt;
&lt;li&gt;audio transcription&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The access layer selects a configured model and route for that capability.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem With Direct Provider Calls
&lt;/h2&gt;

&lt;p&gt;Consider an application with several AI workflows:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Support Chat -&amp;gt; Provider A
RAG Answers -&amp;gt; Provider B
Agent Tools -&amp;gt; Provider C
Image Creation -&amp;gt; Provider D
Video Creation -&amp;gt; Provider E
Audio Transcription -&amp;gt; Provider F
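&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A direct integration may work at first, but every provider introduces additional operational details:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;separate credentials&lt;/li&gt;
&lt;li&gt;different SDKs&lt;/li&gt;
&lt;li&gt;provider-specific model identifiers&lt;/li&gt;
&lt;li&gt;different request schemas&lt;/li&gt;
&lt;li&gt;inconsistent errors&lt;/li&gt;
&lt;li&gt;separate billing accounts&lt;/li&gt;
&lt;li&gt;different timeout behavior&lt;/li&gt;
&lt;li&gt;different usage dashboards&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Changing a model may require editing business logic instead of updating configuration.&lt;/p&gt;

&lt;p&gt;The goal of a model access layer is to keep these details behind a stable internal interface.&lt;/p&gt;

&lt;h2&gt;
  
  
  Define Capabilities First
&lt;/h2&gt;

&lt;p&gt;Start by describing what the product needs rather than selecting model brands immediately.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;type Capability =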
  | "support-chat"
  | "document-reasoning"
  | "structured-agent-output"
  | "image-generation"
  | "video-generation"
  | "audio-transcription";
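&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each capability can have different operational requirements.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;interface CapabilityRequirements {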
  streaming?: boolean;
  structuredOutput?: boolean;
  asynchronous?: boolean;
  maximumLatencyMs?: number;
  outputFormat?: string;
}
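&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A chatbot may need streaming and low latency. An agent may require valid JSON. A video workflow will usually be asynchronous and return an asset after the job completes.&lt;/p&gt;

&lt;p&gt;These differences should be explicit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Create a Configurable Model Target
&lt;/h2&gt;

&lt;p&gt;The application needs a configuration describing the selected model and route for every capability.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;interface ModelTarget {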
  model: string;
  route?: string;
  apiFormat: "openai-compatible" | "media-job" | "custom";
  timeoutMs: number;
}

type ModelAccessConfig = Record&amp;lt;Capability, {
  primary: ModelTarget;
  fallback?: ModelTarget;
}&amp;gt;;
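&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A sample configuration could look like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;const modelConfig: ModelAccessConfig = {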
  "support-chat": {
    primary: {
      model: "configured-chat-model",
      route: "interactive",
      apiFormat: "openai-compatible",
      timeoutMs: 15_000
    }
  },

  "document-reasoning": {
    primary: {
      model: "configured-reasoning-model",
      route: "standard",
      apiFormat: "openai-compatible",
      timeoutMs: 30_000
    }
  },

  "structured-agent-output": {
    primary: {
      model: "configured-agent-model",
      apiFormat: "openai-compatible",
      timeoutMs: 30_000
    },
    fallback: {
      model: "configured-fallback-model",
      apiFormat: "openai-compatible",
      timeoutMs: 30_000
    }
  },

  "image-generation": {
    primary: {
      model: "configured-image-model",
      route: "standard",
      apiFormat: "media-job",
      timeoutMs: 120_000
    }
  },

  "video-generation": {
    primary: {
      model: "configured-video-model",
      route: "standard",
      apiFormat: "media-job",
      timeoutMs: 600_000
    }
  },

  "audio-transcription": {
    primary: {
      model: "configured-audio-model",
      apiFormat: "custom",
      timeoutMs: 120_000
    }
  }
};
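&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Model identifiers remain configurable. Product code only refers to capabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use a Common Internal Request
&lt;/h2&gt;

&lt;p&gt;The next step is to define an internal request format.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;interface ModelRequest {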
  capability: Capability;
  input: unknown;
  metadata?: {
    application?: string;
    environment?: string;
    userId?: string;
  };
}

interface ModelResult&amp;lt;T = unknown&amp;gt; {
  success: boolean;
  model: string;
  route?: string;
  latencyMs: number;
  output?: T;
  error?: {
    code: string;
    message: string;
    retryable: boolean;
  };
}
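&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This is an internal contract, not a claim that every external model uses the same API.&lt;/p&gt;

&lt;p&gt;Text, image, video, and audio models may still require different adapters.&lt;/p&gt;

&lt;h2&gt;
  
  
  Add Format-Specific Adapters
&lt;/h2&gt;

&lt;p&gt;An OpenAI-compatible format can simplify access to many text models.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;interface ModelAdapter {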
  execute(
    target: ModelTarget,
    request: ModelRequest
  ): Promise&amp;lt;ModelResult&amp;gt;;
}

class OpenAICompatibleAdapter implements ModelAdapter {
  async execute(
    target: ModelTarget,
    request: ModelRequest
  ): Promise&amp;lt;ModelResult&amp;gt; {
    const startedAt = Date.now();

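    let response: Response;
    try {
      response = await fetch(
        `${process.env.AI_API_BASE_URL}/v1/chat/completions`,
        {
          method: "POST",
          headers: {
            "Authorization": `Bearer ${process.env.AI_API_KEY}`,
            "Content-Type": "application/json"
          },
          body: JSON.stringify({
            model: target.model,
            messages: request.input,
            stream: false
          }),
          signal: AbortSignal.timeout(target.timeoutMs)
        }
      );
    } catch (error) {
      // fetch rejects on network failures and when the timeout signal aborts,
      // so report those as retryable results instead of letting them throw.
      return {
        success: false,
        model: target.model,
        route: target.route,
        latencyMs: Date.now() - startedAt,
        error: {
          code: "NETWORK_OR_TIMEOUT",
          message: error instanceof Error ? error.message : String(error),
          retryable: true
        }
      };
    }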

    if (!response.ok) {
      return {
        success: false,
        model: target.model,
        route: target.route,
        latencyMs: Date.now() - startedAt,
        error: {
          code: `HTTP_${response.status}`,
          message: await response.text(),
          retryable: response.status === 429 || response.status &amp;gt;= 500
        }
      };
    }

    return {
      success: true,
      model: target.model,
      route: target.route,
      latencyMs: Date.now() - startedAt,
      output: await response.json()
    };
  }
}
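&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;OpenAI compatibility is useful, but it is only one technical format.&lt;/p&gt;

&lt;p&gt;Media models may require asynchronous job handling:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;class MediaJobAdapter implements ModelAdapter {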
  async execute(
    target: ModelTarget,
    request: ModelRequest
  ): Promise&amp;lt;ModelResult&amp;gt; {
    const startedAt = Date.now();

    const job = await this.createJob(target, request);
    const output = await this.waitForCompletion(
      job.id,
      target.timeoutMs
    );

    return {
      success: true,
      model: target.model,
      route: target.route,
      latencyMs: Date.now() - startedAt,
      output
    };
  }

  private async createJob(
    target: ModelTarget,
    request: ModelRequest
  ): Promise&amp;lt;{ id: string }&amp;gt; {
    // Send the provider-specific generation request.
    throw new Error("Implement the media job request");
  }

  private async waitForCompletion(
    jobId: string,
    timeoutMs: number
  ): Promise&amp;lt;unknown&amp;gt; {
    // Poll the job until it succeeds, fails, or times out.
    throw new Error("Implement job polling");
  }
}
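&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This separation prevents asynchronous media behavior from leaking into unrelated product workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Select the Correct Adapter
&lt;/h2&gt;

&lt;p&gt;The router can choose an adapter based on the configured API format.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;class ModelRouter {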
  constructor(
    private config: ModelAccessConfig,
    private adapters: Record&amp;lt;ModelTarget["apiFormat"], ModelAdapter&amp;gt;
  ) {}

  async execute(request: ModelRequest): Promise&amp;lt;ModelResult&amp;gt; {
    const capabilityConfig = this.config[request.capability];

    const primaryResult = await this.adapters[
      capabilityConfig.primary.apiFormat
    ].execute(capabilityConfig.primary, request);

    if (primaryResult.success || !capabilityConfig.fallback) {
      return primaryResult;
    }

    if (!primaryResult.error?.retryable) {
      return primaryResult;
    }

    return this.adapters[
      capabilityConfig.fallback.apiFormat
    ].execute(capabilityConfig.fallback, request);
  }
}
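&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Application code now requests a capability:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;const result = await router.execute({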
  capability: "document-reasoning",
  input: [
    {
      role: "user",
      content: "Summarize the retrieved documents."
    }
  ],
  metadata: {
    application: "knowledge-assistant",
    environment: "production"
  }
});
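&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The workflow does not need to know which model or route handled the request.&lt;/p&gt;

&lt;h2&gt;
  
  
  Be Careful With Fallbacks
&lt;/h2&gt;

&lt;p&gt;Fallback logic should not retry every failure automatically.&lt;/p&gt;

&lt;p&gt;A request may fail because of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;invalid authentication&lt;/li&gt;
&lt;li&gt;unsupported parameters&lt;/li&gt;
&lt;li&gt;malformed input&lt;/li&gt;
&lt;li&gt;rate limits&lt;/li&gt;
&lt;li&gt;temporary availability problems&lt;/li&gt;
&lt;li&gt;timeouts&lt;/li&gt;
&lt;li&gt;provider errors&lt;/li&gt;
&lt;li&gt;failed content validation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Only retry failures that are genuinely temporary.&lt;/p&gt;

&lt;p&gt;Fallback models may also produce different output structures. Agent and structured-output workflows should validate the result before returning it to the application.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;function isValidAgentOutput(value: unknown): boolean {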
  if (typeof value !== "object" || value === null) {
    return false;
  }

  return "action" in value &amp;amp;&amp;amp; "arguments" in value;
}
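&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Fallbacks improve resilience only when their behavior is tested and observable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Record Every Decision
&lt;/h2&gt;

&lt;p&gt;The access layer should generate a usage record for every request.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;interface UsageRecord {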
  timestamp: string;
  application?: string;
  capability: Capability;
  model: string;
  route?: string;
  success: boolean;
  latencyMs: number;
  estimatedCost?: number;
  errorCode?: string;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;These records make it possible to compare:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;success rate by capability&lt;/li&gt;
&lt;li&gt;latency by model and route&lt;/li&gt;
&lt;li&gt;spending by workflow&lt;/li&gt;
&lt;li&gt;timeout frequency&lt;/li&gt;
&lt;li&gt;fallback frequency&lt;/li&gt;
&lt;li&gt;generation failures&lt;/li&gt;
&lt;li&gt;invalid structured outputs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without this information, model selection becomes guesswork.&lt;/p&gt;

&lt;h2&gt;
  
  
  Test Before Production
&lt;/h2&gt;

&lt;p&gt;Before sending important workloads through the router, create evaluation cases based on realistic product inputs.&lt;/p&gt;

&lt;p&gt;For every capability, test:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;output quality&lt;/li&gt;
&lt;li&gt;request success rate&lt;/li&gt;
&lt;li&gt;latency distribution&lt;/li&gt;
&lt;li&gt;error behavior&lt;/li&gt;
&lt;li&gt;API parameter support&lt;/li&gt;
&lt;li&gt;route availability&lt;/li&gt;
&lt;li&gt;estimated cost&lt;/li&gt;
&lt;li&gt;fallback compatibility&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Text benchmarks alone are not sufficient for multimodal applications.&lt;/p&gt;

&lt;p&gt;Image, video, and audio workflows should also record resolution, duration, output format, job completion time, and asset retrieval behavior.&lt;/p&gt;

&lt;p&gt;Run a controlled pilot before moving production traffic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where VectorNode Fits
&lt;/h2&gt;

&lt;p&gt;VectorNode is a pay-as-you-go multi-model AI API platform for independent developers and small AI teams building with text, image, video, and audio models.&lt;/p&gt;

&lt;p&gt;It gives developers one account for testing and accessing GPT, Claude, Gemini, DeepSeek, Qwen, and hundreds of other supported models through developer-friendly APIs.&lt;/p&gt;

&lt;p&gt;Developers can use a Playground for initial testing, compare available model and routing options, review usage records, and work with different supported API formats.&lt;/p&gt;

&lt;p&gt;This can reduce the need to maintain a separate provider account, balance, and integration for every model family.&lt;/p&gt;

&lt;p&gt;VectorNode can support AI applications, agents, RAG systems, chatbots, automation workflows, developer tools, and multimodal products.&lt;/p&gt;

&lt;p&gt;Learn more: &lt;a href="https://www.vectronode.com/"&gt;https://www.vectronode.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Start testing with VectorNode.&lt;/p&gt;
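
&lt;p&gt;The comparisons listed under "Record Every Decision" depend on aggregating usage records. One possible sketch is shown below. It uses a trimmed-down version of the &lt;code&gt;UsageRecord&lt;/code&gt; shape, and the &lt;code&gt;summarizeByCapability&lt;/code&gt; helper and sample values are hypothetical illustrations rather than part of any platform API.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Trimmed-down record: only the fields this summary needs.
interface UsageRecord {
  capability: string;
  model: string;
  success: boolean;
  latencyMs: number;
}

interface CapabilitySummary {
  requests: number;
  successRate: number;
  averageLatencyMs: number;
}

interface RunningTotal {
  requests: number;
  successes: number;
  latencySum: number;
}

// Group records by capability and compute success rate and mean latency.
function summarizeByCapability(
  records: UsageRecord[]
): { [capability: string]: CapabilitySummary } {
  const totals: { [capability: string]: RunningTotal } = {};
  for (const record of records) {
    const entry =
      totals[record.capability] ?? { requests: 0, successes: 0, latencySum: 0 };
    entry.requests += 1;
    entry.successes += record.success ? 1 : 0;
    entry.latencySum += record.latencyMs;
    totals[record.capability] = entry;
  }

  const summary: { [capability: string]: CapabilitySummary } = {};
  for (const capability of Object.keys(totals)) {
    const total = totals[capability];
    summary[capability] = {
      requests: total.requests,
      successRate: total.successes / total.requests,
      averageLatencyMs: total.latencySum / total.requests,
    };
  }
  return summary;
}

// Made-up sample data for illustration only.
const sampleRecords: UsageRecord[] = [
  { capability: "support-chat", model: "configured-chat-model", success: true, latencyMs: 100 },
  { capability: "support-chat", model: "configured-chat-model", success: false, latencyMs: 300 },
  { capability: "image-generation", model: "configured-image-model", success: true, latencyMs: 2000 },
];

console.log(summarizeByCapability(sampleRecords));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Grouping by model or route instead of capability would follow the same pattern with a different key.&lt;/p&gt;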

</description>
      <category>ai</category>
      <category>typescript</category>
      <category>api</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Build a Config-Driven Evaluation Harness for Multimodal AI Models</title>
      <dc:creator>Ye Allen</dc:creator>
      <pubDate>Sun, 14 Jun 2026 08:26:12 +0000</pubDate>
      <link>https://dev.to/ye_allen_/build-a-config-driven-evaluation-harness-for-multimodal-ai-models-5ano</link>
      <guid>https://dev.to/ye_allen_/build-a-config-driven-evaluation-harness-for-multimodal-ai-models-5ano</guid>
      <description>&lt;p&gt;AI applications rarely depend on a single model forever.&lt;/p&gt;

&lt;p&gt;A product may begin with text generation, then add document analysis, image creation, audio processing, video generation, or agent workflows. As these requirements grow, developers need a repeatable way to test models without scattering provider-specific logic across the codebase.&lt;/p&gt;

&lt;p&gt;This tutorial presents a simple, config-driven evaluation harness for comparing AI models by workflow.&lt;/p&gt;

&lt;p&gt;The goal is not to create a universal benchmark. It is to make model decisions measurable, repeatable, and easier to update.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Evaluation Harness Should Do
&lt;/h2&gt;

&lt;p&gt;A practical evaluation harness should be able to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;define models and routes in configuration&lt;/li&gt;
&lt;li&gt;load realistic test cases&lt;/li&gt;
&lt;li&gt;run the same workflow against multiple models&lt;/li&gt;
&lt;li&gt;record latency and success status&lt;/li&gt;
&lt;li&gt;validate structured outputs&lt;/li&gt;
&lt;li&gt;estimate or record usage cost&lt;/li&gt;
&lt;li&gt;support text and asynchronous media jobs&lt;/li&gt;
&lt;li&gt;export comparable results&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The product should not need to know which provider serves a model. It should request a capability through a common internal interface.&lt;/p&gt;

&lt;h2&gt;
  
  
  Define the Core Types
&lt;/h2&gt;

&lt;p&gt;Start with a few TypeScript types:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;Modality&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;image&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;video&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;audio&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;ModelTarget&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;route&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;modality&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Modality&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;enabled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;TestCase&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;modality&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Modality&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;requiredFields&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt;
    &lt;span class="nl"&gt;maxLatencyMs&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;EvaluationResult&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;testCaseId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;targetId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;route&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;latencyMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;formatValid&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;error&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;output&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These types separate three concerns:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The model being tested&lt;/li&gt;
&lt;li&gt;The product workflow&lt;/li&gt;
&lt;li&gt;The recorded result&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That separation becomes important when one model is tested across several workflows or when multiple models are evaluated for the same task.&lt;/p&gt;
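
&lt;p&gt;The &lt;code&gt;formatValid&lt;/code&gt; flag on &lt;code&gt;EvaluationResult&lt;/code&gt; has to be computed somewhere. One possible check, assuming &lt;code&gt;expected.requiredFields&lt;/code&gt; lists top-level keys of a JSON object, is sketched below. The &lt;code&gt;hasRequiredFields&lt;/code&gt; helper is a hypothetical name introduced for this illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Return true when the output is an object that contains every required key.
function hasRequiredFields(
  output: unknown,
  requiredFields: string[] = []
): boolean {
  if (typeof output !== "object" || output === null) {
    return false;
  }
  const record = output as { [key: string]: unknown };
  return requiredFields.every(function (field: string): boolean {
    return field in record;
  });
}

console.log(hasRequiredFields({ action: "search", arguments: {} }, ["action", "arguments"]));
console.log(hasRequiredFields({ action: "search" }, ["action", "arguments"]));
console.log(hasRequiredFields("not an object", ["action"]));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This only checks that the keys are present. A stricter harness might validate value types against a schema as well.&lt;/p&gt;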

&lt;h2&gt;
  
  
  Keep Model Targets in Configuration
&lt;/h2&gt;

&lt;p&gt;Avoid hardcoding model decisions throughout the application.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;targets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ModelTarget&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;fast-support-model&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;SUPPORT_MODEL&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;configured-text-model&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;modality&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;rag-reasoning-model&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;RAG_MODEL&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;configured-reasoning-model&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;modality&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;product-image-model&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;IMAGE_MODEL&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;configured-image-model&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;modality&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;image&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model identifiers above are placeholders. In a real application, use identifiers supported by the selected AI API platform.&lt;/p&gt;

&lt;p&gt;Configuration makes it easier to test new models, compare routes, or respond to availability changes without rewriting business logic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Create Workflow-Based Test Cases
&lt;/h2&gt;

&lt;p&gt;Public benchmarks are useful for discovery, but internal tests should represent the actual product.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;testCases&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;TestCase&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;support-001&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;support_chat&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;modality&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Explain how to reset an API credential safely.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;maxLatencyMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;agent-001&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;agent_structured_output&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;modality&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Return a support ticket with title, priority, and summary.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;requiredFields&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;title&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;priority&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;summary&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="na"&gt;maxLatencyMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;image-001&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;product_image&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;modality&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;image&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;A clean studio product image on a neutral background&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;maxLatencyMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;60000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A useful dataset should contain normal requests, difficult inputs, formatting requirements, multilingual examples, and known failure cases.&lt;/p&gt;

&lt;p&gt;Start with 10 to 30 examples for each important workflow. A small, relevant dataset is more useful than a large collection of unrelated prompts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Create an Adapter Interface
&lt;/h2&gt;

&lt;p&gt;Text, image, video, and audio APIs may use different request formats. Hide those differences behind an adapter.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;ModelAdapter&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ModelTarget&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;test&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;TestCase&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A text adapter could use a familiar chat-completion format:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TextModelAdapter&lt;/span&gt; &lt;span class="k"&gt;implements&lt;/span&gt; &lt;span class="nx"&gt;ModelAdapter&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="nx"&gt;baseUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="nx"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ModelTarget&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;test&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;TestCase&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;baseUrl&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/chat/completions`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;Authorization&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;...(&lt;/span&gt;&lt;span class="nx"&gt;test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Request failed with status &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set &lt;code&gt;baseUrl&lt;/code&gt;, credentials, model names, and endpoint paths according to the documentation of the platform being used.&lt;/p&gt;

&lt;p&gt;OpenAI-compatible request formats can simplify many text integrations because familiar SDKs and tools may already support them. However, image, video, audio, and specialized models may require separate endpoints or asynchronous processing.&lt;/p&gt;
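
&lt;p&gt;The test cases earlier do not all share one input shape: &lt;code&gt;support-001&lt;/code&gt; supplies &lt;code&gt;messages&lt;/code&gt;, while &lt;code&gt;agent-001&lt;/code&gt; supplies a &lt;code&gt;task&lt;/code&gt; string. A text adapter may therefore want to normalize the input before sending it. The following is a minimal sketch, assuming an OpenAI-compatible body with &lt;code&gt;model&lt;/code&gt; and &lt;code&gt;messages&lt;/code&gt;; the helper name &lt;code&gt;toChatBody&lt;/code&gt; and its rules are hypothetical and not part of any platform API.&lt;/p&gt;

```typescript
// Hypothetical helper: builds an OpenAI-compatible chat body from a test input.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatBody {
  model: string;
  messages: ChatMessage[];
}

function isRecord(value: unknown): value is { [key: string]: unknown } {
  if (typeof value !== "object") return false;
  return value !== null;
}

function toChatBody(model: string, input: unknown): ChatBody {
  if (!isRecord(input)) {
    throw new Error("Test input must be an object");
  }

  // Case 1: the test case already provides a messages array.
  const messages = input["messages"];
  if (Array.isArray(messages)) {
    return { model, messages: messages as ChatMessage[] };
  }

  // Case 2: a plain task string becomes a single user message.
  const task = input["task"];
  if (typeof task === "string") {
    return { model, messages: [{ role: "user", content: task }] };
  }

  // Anything else is rejected instead of being sent in an unknown shape.
  throw new Error("Test input needs either messages or task");
}
```

&lt;p&gt;Inside the adapter, &lt;code&gt;JSON.stringify(toChatBody(target.model, test.input))&lt;/code&gt; could then replace the object spread, so a malformed test case fails before any request is made.&lt;/p&gt;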

&lt;h2&gt;
  
  
  Run and Record Each Evaluation
&lt;/h2&gt;

&lt;p&gt;The runner measures latency and captures errors without stopping the entire test suite.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;adapter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ModelAdapter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ModelTarget&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;test&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;TestCase&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;EvaluationResult&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;startedAt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;performance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;adapter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;test&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;latencyMs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;performance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;startedAt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;testCaseId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;targetId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;route&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;route&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;latencyMs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;formatValid&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;validateOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;testCaseId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;targetId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;route&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;route&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;latencyMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;performance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;startedAt&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="na"&gt;formatValid&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="k"&gt;instanceof&lt;/span&gt; &lt;span class="nb"&gt;Error&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Unknown error&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
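
&lt;p&gt;To run a whole suite, each enabled target can be paired with the test cases that share its modality. The sketch below shows one possible way to plan those pairs. The function name &lt;code&gt;planRuns&lt;/code&gt; and the &lt;code&gt;PlannedRun&lt;/code&gt; type are hypothetical, and the types are trimmed-down stand-ins for the ones used in this article.&lt;/p&gt;

```typescript
// Trimmed-down stand-ins for the article's types (assumed shapes).
type Modality = "text" | "image" | "video" | "audio";

interface ModelTarget {
  id: string;
  model: string;
  modality: Modality;
  enabled: boolean;
}

interface TestCase {
  id: string;
  workflow: string;
  modality: Modality;
}

// Hypothetical pairing of one target with one test case.
interface PlannedRun {
  target: ModelTarget;
  test: TestCase;
}

function planRuns(targets: ModelTarget[], tests: TestCase[]): PlannedRun[] {
  const runs: PlannedRun[] = [];

  for (const target of targets) {
    // Disabled targets stay in the configuration but are skipped.
    if (!target.enabled) continue;

    for (const test of tests) {
      // A text model only receives text test cases, and so on.
      if (test.modality === target.modality) {
        runs.push({ target, test });
      }
    }
  }

  return runs;
}
```

&lt;p&gt;Each planned run can then be passed to &lt;code&gt;evaluate&lt;/code&gt; in sequence. Because &lt;code&gt;evaluate&lt;/code&gt; catches errors and returns a result either way, one failed request does not abort the loop.&lt;/p&gt;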



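&lt;p&gt;The validation function can begin simply. This version checks only that each required field exists as a top-level key of the output, so the adapter needs to return the parsed structured output rather than the raw API response:&lt;/p&gt;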

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;validateOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;TestCase&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;expected&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;requiredFields&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;object&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;requiredFields&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;every&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;field&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;field&lt;/span&gt; &lt;span class="k"&gt;in &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For production evaluations, add schema validation with a library such as Zod or a JSON Schema validator such as Ajv.&lt;/p&gt;
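
&lt;p&gt;With an OpenAI-compatible chat response, the generated text usually sits at &lt;code&gt;choices[0].message.content&lt;/code&gt;, so required fields have to be checked on the parsed content rather than on the response envelope. The sketch below illustrates that step under the assumption that the model was asked to return a JSON object only. The helper names &lt;code&gt;extractStructuredOutput&lt;/code&gt; and &lt;code&gt;hasRequiredFields&lt;/code&gt; are hypothetical.&lt;/p&gt;

```typescript
// Hypothetical helpers; assumes an OpenAI-compatible chat-completion response.
type JsonObject = { [key: string]: unknown };

function isObject(value: unknown): value is JsonObject {
  if (typeof value !== "object") return false;
  if (value === null) return false;
  return !Array.isArray(value);
}

// Returns the JSON object found in choices[0].message.content, or null.
function extractStructuredOutput(response: unknown): JsonObject | null {
  if (!isObject(response)) return null;

  const choices = response["choices"];
  if (!Array.isArray(choices)) return null;

  const first: unknown = choices[0];
  if (!isObject(first)) return null;

  const message = first["message"];
  if (!isObject(message)) return null;

  const content = message["content"];
  if (typeof content !== "string") return null;

  try {
    const parsed: unknown = JSON.parse(content);
    return isObject(parsed) ? parsed : null;
  } catch {
    // The model returned text that is not valid JSON.
    return null;
  }
}

function hasRequiredFields(response: unknown, requiredFields: string[]): boolean {
  const parsed = extractStructuredOutput(response);
  if (parsed === null) return false;
  return requiredFields.every((field) => field in parsed);
}
```

&lt;p&gt;A response whose content is plain prose or malformed JSON is reported as invalid rather than throwing, which keeps format failures visible in the results without stopping the suite.&lt;/p&gt;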

&lt;h2&gt;
  
  
  Handle Asynchronous Media Jobs
&lt;/h2&gt;

&lt;p&gt;Video and some image or audio APIs may return a job identifier instead of the final asset.&lt;/p&gt;

&lt;p&gt;The adapter should then:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Submit the generation request.&lt;/li&gt;
&lt;li&gt;Store the returned job ID.&lt;/li&gt;
&lt;li&gt;Poll the documented status endpoint.&lt;/li&gt;
&lt;li&gt;Stop after a configured timeout.&lt;/li&gt;
&lt;li&gt;Record the completion time.&lt;/li&gt;
&lt;li&gt;Save the resulting asset URL and metadata.&lt;/li&gt;
&lt;/ol&gt;
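
&lt;p&gt;The polling steps above can be sketched as follows. This is a minimal illustration, not a platform API: the &lt;code&gt;JobStatus&lt;/code&gt; shape, its &lt;code&gt;state&lt;/code&gt; values, and the status URL are assumptions, and authentication is omitted. The status check is passed in as a parameter so the loop can be tested without a network call.&lt;/p&gt;

```typescript
// Hypothetical job-status shape; real platforms document their own fields.
interface JobStatus {
  state: "pending" | "succeeded" | "failed";
  assetUrl?: string;
}

interface PollOptions {
  intervalMs: number;
  timeoutMs: number;
}

function sleep(ms: number) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Default status check. The URL layout is an assumption and auth is omitted.
async function fetchStatus(statusUrl: string) {
  const response = await fetch(statusUrl);
  if (!response.ok) {
    throw new Error(`Status check failed with status ${response.status}`);
  }
  return (await response.json()) as JobStatus;
}

// Polls until the job leaves the pending state or the timeout is reached.
async function pollJob(statusUrl: string, options: PollOptions, getStatus = fetchStatus) {
  const startedAt = Date.now();

  while (true) {
    const status = await getStatus(statusUrl);
    const elapsedMs = Date.now() - startedAt;

    if (status.state !== "pending") {
      return { status, timedOut: false, jobCompletionMs: elapsedMs };
    }

    // Stop if waiting one more interval would exceed the configured timeout.
    if (elapsedMs + options.intervalMs > options.timeoutMs) {
      return { status, timedOut: true, jobCompletionMs: elapsedMs };
    }

    await sleep(options.intervalMs);
  }
}
```

&lt;p&gt;A fixed interval keeps the sketch short. A real adapter might prefer a growing delay between checks and should respect any rate limits the platform documents.&lt;/p&gt;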

&lt;p&gt;Media evaluation records may include:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;MediaMetadata&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;width&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;height&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;durationSeconds&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;format&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;jobCompletionMs&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;assetUrl&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes it possible to compare operational behavior as well as creative quality.&lt;/p&gt;

&lt;h2&gt;
  
  
  Compare Models by Workflow
&lt;/h2&gt;

&lt;p&gt;Do not produce one global model ranking.&lt;/p&gt;

&lt;p&gt;A model that performs well for document reasoning may not be the best option for support chat. A high-quality image model may be too slow for an interactive editing workflow.&lt;/p&gt;

&lt;p&gt;Summarize results by workflow:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;WorkflowSummary&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;successRate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;averageLatencyMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;formatSuccessRate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;averageCost&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
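
&lt;p&gt;One way to fill such a summary is to group the recorded results by workflow and model and then compute the rates per group. The sketch below assumes a trimmed-down result shape with only the fields the summary needs; the function name &lt;code&gt;summarize&lt;/code&gt; is hypothetical, and cost is left out because it depends on platform pricing.&lt;/p&gt;

```typescript
// Trimmed-down result shape; only the fields the summary needs (assumed).
interface EvaluationResult {
  workflow: string;
  model: string;
  success: boolean;
  latencyMs: number;
  formatValid: boolean;
}

interface WorkflowSummary {
  workflow: string;
  model: string;
  successRate: number;
  averageLatencyMs: number;
  formatSuccessRate: number;
}

function summarize(results: EvaluationResult[]): WorkflowSummary[] {
  // Group results by workflow and model.
  const groups: { [key: string]: EvaluationResult[] } = {};
  for (const result of results) {
    const key = `${result.workflow}::${result.model}`;
    const group = groups[key] ?? [];
    group.push(result);
    groups[key] = group;
  }

  return Object.values(groups).map((group) => {
    const count = group.length;
    const successes = group.filter((r) => r.success).length;
    const validFormats = group.filter((r) => r.formatValid).length;
    // Latency is averaged over all attempts, including failed ones.
    const totalLatency = group.reduce((sum, r) => sum + r.latencyMs, 0);

    return {
      workflow: group[0].workflow,
      model: group[0].model,
      successRate: successes / count,
      averageLatencyMs: totalLatency / count,
      formatSuccessRate: validFormats / count,
    };
  });
}
```

&lt;p&gt;Averaging latency over failed attempts as well is a design choice. Filtering to successful requests first may be more appropriate when failures return quickly and would otherwise make a model look faster than it is.&lt;/p&gt;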



&lt;p&gt;The final selection should balance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;output quality&lt;/li&gt;
&lt;li&gt;successful request rate&lt;/li&gt;
&lt;li&gt;response latency&lt;/li&gt;
&lt;li&gt;formatting reliability&lt;/li&gt;
&lt;li&gt;route availability&lt;/li&gt;
&lt;li&gt;usage cost&lt;/li&gt;
&lt;li&gt;workflow requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Keep the selected model and route configurable after evaluation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Continue Testing After Launch
&lt;/h2&gt;

&lt;p&gt;Initial evaluation is only the beginning.&lt;/p&gt;

&lt;p&gt;Production traffic will expose new inputs, failure patterns, and user expectations. Add difficult production examples back into the test dataset and rerun them whenever the model, route, or prompt settings change.&lt;/p&gt;

&lt;p&gt;Useful production metrics include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;request success rate&lt;/li&gt;
&lt;li&gt;latency percentiles&lt;/li&gt;
&lt;li&gt;cost by workflow&lt;/li&gt;
&lt;li&gt;invalid structured outputs&lt;/li&gt;
&lt;li&gt;timeout and retry frequency&lt;/li&gt;
&lt;li&gt;media generation failures&lt;/li&gt;
&lt;li&gt;route availability&lt;/li&gt;
&lt;li&gt;user corrections&lt;/li&gt;
&lt;/ul&gt;
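
&lt;p&gt;Latency percentiles are worth a short illustration, because an average can hide a slow tail. The sketch below uses the nearest-rank method, which is only one of several percentile definitions; the function name &lt;code&gt;percentile&lt;/code&gt; and the sample values are made up for the example.&lt;/p&gt;

```typescript
// Nearest-rank percentile: the smallest sample with at least p percent
// of all samples at or below it. Other definitions interpolate instead.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("At least one sample is required");
  }
  if (!(p > 0) || p > 100) {
    throw new Error("p must be greater than 0 and at most 100");
  }

  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...samples].sort((a, b) => a - b);

  // Multiply before dividing to avoid floating-point drift in the rank.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Made-up latencies in milliseconds for one workflow.
const latenciesMs = [120, 80, 450, 95, 2100, 130, 110, 90, 105, 100];

const p50 = percentile(latenciesMs, 50); // 105
const p95 = percentile(latenciesMs, 95); // 2100
console.log({ p50, p95 });
```

&lt;p&gt;In this sample the median looks healthy while the 95th percentile is about twenty times higher, which is the kind of gap that matters for interactive workflows.&lt;/p&gt;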

&lt;p&gt;This creates an evaluation process based on real product behavior instead of a one-time demonstration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using VectorNode for Model Evaluation
&lt;/h2&gt;

&lt;p&gt;VectorNode is a pay-as-you-go multi-model AI API platform for independent developers and small AI teams building with text, image, video, and audio models.&lt;/p&gt;

&lt;p&gt;It provides one account for testing and accessing GPT, Claude, Gemini, DeepSeek, Qwen, and hundreds of other supported models through developer-friendly APIs.&lt;/p&gt;

&lt;p&gt;Developers can use its Playground for initial testing, compare available models and routes, and then move representative evaluations into their own test harness.&lt;/p&gt;

&lt;p&gt;This approach is useful for AI applications, agents, RAG systems, chatbots, automation workflows, developer tools, and multimodal products.&lt;/p&gt;

&lt;p&gt;Learn more:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.vectronode.com/" rel="noopener noreferrer"&gt;https://www.vectronode.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Start testing with VectorNode.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>typescript</category>
      <category>api</category>
      <category>webdev</category>
    </item>
    <item>
      <title>How to Build AI Workflows with Unified Model Access</title>
      <dc:creator>Ye Allen</dc:creator>
      <pubDate>Fri, 12 Jun 2026 13:17:41 +0000</pubDate>
      <link>https://dev.to/ye_allen_/how-to-build-ai-workflows-with-unified-model-access-2alg</link>
      <guid>https://dev.to/ye_allen_/how-to-build-ai-workflows-with-unified-model-access-2alg</guid>
      <description>&lt;p&gt;AI applications often begin with a single model call.&lt;/p&gt;

&lt;p&gt;A developer sends a prompt, receives a response, and builds the first working feature. This is the right way to prototype quickly.&lt;/p&gt;

&lt;p&gt;But production AI products usually do not stay that simple.&lt;/p&gt;

&lt;p&gt;A chatbot may need fast responses. A RAG system may need stronger reasoning over documents. An AI agent may need reliable tool use. A developer tool may need better coding behavior. An automation workflow may need predictable structured output.&lt;/p&gt;

&lt;p&gt;These workflows have different requirements.&lt;/p&gt;

&lt;p&gt;That is why developers need a better way to organize model access.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start with the workflow
&lt;/h2&gt;

&lt;p&gt;Before choosing a model, it helps to define the workflow.&lt;/p&gt;

&lt;p&gt;For example, an AI product may include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;support chat&lt;/li&gt;
&lt;li&gt;document Q&amp;amp;A&lt;/li&gt;
&lt;li&gt;content generation&lt;/li&gt;
&lt;li&gt;code assistance&lt;/li&gt;
&lt;li&gt;agent planning&lt;/li&gt;
&lt;li&gt;structured extraction&lt;/li&gt;
&lt;li&gt;workflow automation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each workflow may need different model behavior.&lt;/p&gt;

&lt;p&gt;A support chat workflow may prioritize latency. A document Q&amp;amp;A workflow may prioritize reasoning. An automation workflow may prioritize structured output. A developer tool may prioritize code quality.&lt;/p&gt;

&lt;p&gt;The application should not treat every AI request as the same type of task.&lt;/p&gt;

&lt;h2&gt;
  
  
  Create a model access layer
&lt;/h2&gt;

&lt;p&gt;A practical pattern is to place a model access layer between the product and the model provider.&lt;/p&gt;

&lt;p&gt;Instead of calling a model directly from every feature, the application calls an internal AI layer.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
// Features name a workflow; the internal AI layer decides which model handles it.
const result = await ai.run({
  workflow: "support_chat",
  input: userMessage
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
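
&lt;p&gt;The internals of &lt;code&gt;ai.run&lt;/code&gt; are not shown above. One possible shape, as a minimal sketch: the &lt;code&gt;createAI&lt;/code&gt; factory, the &lt;code&gt;workflowConfig&lt;/code&gt; table, the placeholder model names, and the injected &lt;code&gt;callModel&lt;/code&gt; function are all assumptions for illustration.&lt;/p&gt;

```javascript
// Hypothetical workflow table; model names are placeholders, not real model IDs.
const workflowConfig = {
  support_chat: { model: "fast-model-placeholder", systemPrompt: "Answer briefly and politely." },
  document_qa: { model: "reasoning-model-placeholder", systemPrompt: "Answer only from the provided context." }
};

// Builds the internal AI layer. callModel is injected so the provider
// client can be swapped without touching product features.
function createAI(callModel) {
  return {
    run: async function ({ workflow, input }) {
      const config = workflowConfig[workflow];
      if (!config) {
        throw new Error("Unknown workflow: " + workflow);
      }
      const output = await callModel({
        model: config.model,
        messages: [
          { role: "system", content: config.systemPrompt },
          { role: "user", content: input }
        ]
      });
      return { workflow: workflow, model: config.model, output: output };
    }
  };
}

// Stand-in transport so the sketch runs without network access.
async function fakeCallModel(request) {
  return "[" + request.model + "] reply to: " + request.messages[1].content;
}

const ai = createAI(fakeCallModel);
ai.run({ workflow: "support_chat", input: "Where is my order?" }).then(function (result) {
  console.log(result.output);
});
```

&lt;p&gt;In a real application, &lt;code&gt;fakeCallModel&lt;/code&gt; would be replaced by a function that calls the actual provider client.&lt;/p&gt;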

</description>
      <category>ai</category>
      <category>api</category>
      <category>llm</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Building AI Apps with a Model Access Layer</title>
      <dc:creator>Ye Allen</dc:creator>
      <pubDate>Thu, 11 Jun 2026 12:04:28 +0000</pubDate>
      <link>https://dev.to/ye_allen_/building-ai-apps-with-a-model-access-layer-mip</link>
      <guid>https://dev.to/ye_allen_/building-ai-apps-with-a-model-access-layer-mip</guid>
      <description>&lt;p&gt;AI applications usually start with one model.&lt;/p&gt;

&lt;p&gt;That is normal.&lt;/p&gt;

&lt;p&gt;A developer may begin with one chat completion endpoint, one SDK, one model name, and one simple use case. The first version of the product works. A chatbot replies. A RAG system answers questions. An internal tool summarizes documents. An automation workflow generates structured output.&lt;/p&gt;

&lt;p&gt;But once the product becomes real, the model layer often becomes more complicated.&lt;/p&gt;

&lt;p&gt;One workflow may need fast responses. Another may need stronger reasoning. A customer-facing chatbot may need stable latency. A document workflow may need longer context. An agent may need reliable tool use. A content workflow may need different generation styles.&lt;/p&gt;

&lt;p&gt;At that point, model access becomes part of the application architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem with single-model integration
&lt;/h2&gt;

&lt;p&gt;Single-model integration is easy to start with, but it can become limiting later.&lt;/p&gt;

&lt;p&gt;The application code may contain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;model names spread across different files&lt;/li&gt;
&lt;li&gt;provider-specific request formats&lt;/li&gt;
&lt;li&gt;hardcoded base URLs&lt;/li&gt;
&lt;li&gt;duplicated retry logic&lt;/li&gt;
&lt;li&gt;unclear usage tracking&lt;/li&gt;
&lt;li&gt;no simple way to compare model behavior&lt;/li&gt;
&lt;li&gt;no clean path for adding another model later&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This does not always matter in a prototype.&lt;/p&gt;

&lt;p&gt;But for a real AI product, it can slow down development.&lt;/p&gt;

&lt;p&gt;If every workflow depends directly on one model integration, changing model strategy becomes harder than it should be.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is a model access layer?
&lt;/h2&gt;

&lt;p&gt;A model access layer is a clean boundary between your application logic and the AI models it uses.&lt;/p&gt;

&lt;p&gt;Instead of letting every feature call a model directly, the application sends requests through a controlled access layer.&lt;/p&gt;

&lt;p&gt;That layer can handle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;model selection&lt;/li&gt;
&lt;li&gt;request formatting&lt;/li&gt;
&lt;li&gt;model switching&lt;/li&gt;
&lt;li&gt;usage tracking&lt;/li&gt;
&lt;li&gt;latency monitoring&lt;/li&gt;
&lt;li&gt;fallback behavior&lt;/li&gt;
&lt;li&gt;workflow-specific configuration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This does not need to be complicated.&lt;/p&gt;

&lt;p&gt;The goal is simple: keep product logic separate from model access.&lt;/p&gt;
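
&lt;p&gt;Two of the responsibilities listed above, fallback behavior and usage tracking, might be sketched like this. The &lt;code&gt;runWithFallback&lt;/code&gt; helper, the &lt;code&gt;callModel&lt;/code&gt; signature, and the placeholder model names are hypothetical and only illustrate the idea.&lt;/p&gt;

```javascript
// Tries each model in order and records usage for every attempt.
// callModel and the model names are hypothetical stand-ins.
async function runWithFallback(models, input, callModel, usageLog) {
  let lastError = null;
  for (const model of models) {
    const startedAt = Date.now();
    try {
      const output = await callModel(model, input);
      usageLog.push({ model: model, ok: true, latencyMs: Date.now() - startedAt });
      return { model: model, output: output };
    } catch (error) {
      usageLog.push({ model: model, ok: false, latencyMs: Date.now() - startedAt });
      lastError = error;
    }
  }
  throw new Error("All models failed: " + (lastError ? lastError.message : "no models configured"));
}

// Stand-in transport: the primary model always fails in this demo.
async function demoCallModel(model, input) {
  if (model === "primary-placeholder") {
    throw new Error("simulated timeout");
  }
  return model + " answered: " + input;
}

const usageLog = [];
runWithFallback(["primary-placeholder", "backup-placeholder"], "ping", demoCallModel, usageLog)
  .then(function (result) {
    console.log(result.model, usageLog.length);
  });
```

&lt;p&gt;Because every attempt is logged, failed primary calls stay visible even when the fallback succeeds.&lt;/p&gt;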

&lt;h2&gt;
  
  
  Why this matters for AI apps
&lt;/h2&gt;

&lt;p&gt;Modern AI products are not all the same.&lt;/p&gt;

&lt;p&gt;A chatbot, an agent, a RAG app, and an automation workflow may all use language models, but they do not have the same requirements.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A chatbot needs fast and stable conversations.&lt;/li&gt;
&lt;li&gt;A RAG app needs strong document reasoning.&lt;/li&gt;
&lt;li&gt;An AI agent needs tool-use reliability.&lt;/li&gt;
&lt;li&gt;An automation workflow needs structured output.&lt;/li&gt;
&lt;li&gt;A developer tool may need better coding performance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If these workflows are forced into the same model path, the product becomes harder to improve.&lt;/p&gt;

&lt;p&gt;A model access layer gives each workflow room to use the model that fits its job.&lt;/p&gt;

&lt;h2&gt;
  
  
  A simple architecture
&lt;/h2&gt;

&lt;p&gt;A practical architecture may look like this:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
// Resolve the model per workflow instead of hardcoding a model name at the call site.
const response = await aiClient.chat.completions.create({
  model: workflowModel("support_chat"),
  messages,
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
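
&lt;p&gt;The &lt;code&gt;workflowModel&lt;/code&gt; helper is not defined in the snippet above. A minimal sketch of what it could look like follows; the default table, the placeholder model names, and the &lt;code&gt;MODEL_&lt;/code&gt; environment variable convention are assumptions, not an established API.&lt;/p&gt;

```javascript
// Default model per workflow; the names are placeholders, not real model IDs.
const defaultModels = {
  support_chat: "fast-model-placeholder",
  rag_answer: "reasoning-model-placeholder"
};

// Resolves the model for a workflow. An environment variable such as
// MODEL_SUPPORT_CHAT (a naming convention assumed for this sketch) overrides
// the default, so switching models needs no code change.
function workflowModel(workflow, env) {
  const source = env || process.env;
  const override = source["MODEL_" + workflow.toUpperCase()];
  if (override) return override;
  const fallback = defaultModels[workflow];
  if (!fallback) {
    throw new Error("No model configured for workflow: " + workflow);
  }
  return fallback;
}

console.log(workflowModel("support_chat", {}));
console.log(workflowModel("support_chat", { MODEL_SUPPORT_CHAT: "other-model-placeholder" }));
```

&lt;p&gt;Passing &lt;code&gt;env&lt;/code&gt; explicitly is optional; it mainly makes the helper easy to test.&lt;/p&gt;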

</description>
      <category>ai</category>
      <category>api</category>
      <category>llm</category>
      <category>webdev</category>
    </item>
    <item>
      <title>How to Evaluate AI Models by Workflow in a Real App</title>
      <dc:creator>Ye Allen</dc:creator>
      <pubDate>Wed, 10 Jun 2026 06:40:29 +0000</pubDate>
      <link>https://dev.to/ye_allen_/how-to-evaluate-ai-models-by-workflow-in-a-real-app-1ohh</link>
      <guid>https://dev.to/ye_allen_/how-to-evaluate-ai-models-by-workflow-in-a-real-app-1ohh</guid>
      <description>&lt;p&gt;AI applications often begin with one model and one prompt.&lt;/p&gt;

&lt;p&gt;That is fine for a prototype. But real products usually grow into multiple workflows: support chat, RAG answers, document summaries, structured data extraction, agent planning, content generation, and automation tasks.&lt;/p&gt;

&lt;p&gt;Each workflow may need different model behavior.&lt;/p&gt;

&lt;p&gt;A support workflow may need speed. A RAG workflow may need stronger reasoning over retrieved context. A JSON extraction workflow may need reliable structure. An AI agent may need planning and tool-use consistency.&lt;/p&gt;

&lt;p&gt;This is why developers should evaluate AI models by workflow, not by model popularity alone.&lt;/p&gt;

&lt;p&gt;VectorNode is an AI model access platform for developers, AI builders, and automation workflows. It helps teams access GPT, Claude, Gemini, DeepSeek, Qwen, and more through a unified, OpenAI-compatible API.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.vectronode.com/" rel="noopener noreferrer"&gt;https://www.vectronode.com/&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why workflow-based evaluation matters
&lt;/h2&gt;

&lt;p&gt;The question should not only be:&lt;/p&gt;

&lt;p&gt;Which model is best?&lt;/p&gt;

&lt;p&gt;A better question is:&lt;/p&gt;

&lt;p&gt;Which model is best for this workflow?&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Workflow&lt;/th&gt;
&lt;th&gt;What matters&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Support chat&lt;/td&gt;
&lt;td&gt;latency, tone, consistency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RAG answers&lt;/td&gt;
&lt;td&gt;context use, grounding, clarity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JSON extraction&lt;/td&gt;
&lt;td&gt;schema validity, repeatability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent planning&lt;/td&gt;
&lt;td&gt;reasoning, next-step quality&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Content generation&lt;/td&gt;
&lt;td&gt;structure, style, usefulness&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Automation tasks&lt;/td&gt;
&lt;td&gt;reliability, predictable output&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A model that works well for one workflow may not be the best choice for another.&lt;/p&gt;

&lt;h2&gt;
  
  
  A simple evaluation structure
&lt;/h2&gt;

&lt;p&gt;Start by defining the workflows in your product.&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
// One entry per workflow: the goal it serves and the checks used to evaluate it.
const workflows = {
  support_chat: {
    goal: "Answer common user questions quickly",
    checks: ["latency", "clarity", "tone"]
  },
  rag_answer: {
    goal: "Answer using retrieved context",
    checks: ["grounding", "completeness", "source relevance"]
  },
  json_extraction: {
    goal: "Return structured JSON",
    checks: ["schema validity", "field accuracy"]
  },
  agent_planning: {
    goal: "Plan the next action",
    checks: ["reasoning", "tool-use fit"]
  }
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
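
&lt;p&gt;To make one of these checks concrete, here is a hedged sketch of how the &lt;code&gt;json_extraction&lt;/code&gt; checks ("schema validity" and "field accuracy") might be scored for a single output. The &lt;code&gt;checkJsonExtraction&lt;/code&gt; function and the sample data are invented for illustration, and "schema validity" is simplified to "parses as a JSON object containing every expected field".&lt;/p&gt;

```javascript
// Checks one json_extraction output against expected fields.
// The expected-field object and the sample values are assumptions for illustration.
function checkJsonExtraction(rawOutput, expected) {
  let parsed;
  try {
    parsed = JSON.parse(rawOutput);
  } catch (error) {
    return { schemaValid: false, fieldAccuracy: 0 };
  }
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    return { schemaValid: false, fieldAccuracy: 0 };
  }
  const fields = Object.keys(expected);
  let present = 0;
  let correct = 0;
  for (const field of fields) {
    if (Object.prototype.hasOwnProperty.call(parsed, field)) present += 1;
    if (parsed[field] === expected[field]) correct += 1;
  }
  return {
    schemaValid: present === fields.length,
    fieldAccuracy: fields.length === 0 ? 1 : correct / fields.length
  };
}

const expected = { name: "Ada", plan: "pro" };
console.log(checkJsonExtraction('{"name":"Ada","plan":"free"}', expected));
console.log(checkJsonExtraction("not json", expected));
```

&lt;p&gt;Running a check like this over many saved inputs gives a repeatable score per model rather than a single impression.&lt;/p&gt;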

</description>
      <category>ai</category>
      <category>api</category>
      <category>llm</category>
      <category>automation</category>
    </item>
    <item>
      <title>Building AI Automation Workflows with One Model Access Layer</title>
      <dc:creator>Ye Allen</dc:creator>
      <pubDate>Tue, 09 Jun 2026 05:56:40 +0000</pubDate>
      <link>https://dev.to/ye_allen_/building-ai-automation-workflows-with-one-model-access-layer-4hk5</link>
      <guid>https://dev.to/ye_allen_/building-ai-automation-workflows-with-one-model-access-layer-4hk5</guid>
      <description>&lt;p&gt;Modern AI automation workflows rarely stay simple for long.&lt;/p&gt;

&lt;p&gt;A small internal tool may start with one model and one prompt. A few weeks later, the same product may need faster responses for chat, stronger reasoning for planning, better structured output for data extraction, and different model behavior for multilingual users.&lt;/p&gt;

&lt;p&gt;That is where many developers start to feel the limits of a single-model setup.&lt;/p&gt;

&lt;p&gt;Instead of wiring every workflow directly to one model provider, it can be cleaner to design a model access layer. This layer gives the application one place to manage model access, routing, testing, usage, and future changes.&lt;/p&gt;

&lt;p&gt;VectorNode is an AI model access platform for developers, AI builders, and automation workflows. It helps teams access GPT, Claude, Gemini, DeepSeek, Qwen, and more through a unified, OpenAI-compatible API.&lt;/p&gt;

&lt;p&gt;Website: &lt;a href="https://www.vectronode.com/" rel="noopener noreferrer"&gt;https://www.vectronode.com/&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem with Direct Model Integration
&lt;/h2&gt;

&lt;p&gt;Direct model integration is simple at the beginning.&lt;/p&gt;

&lt;p&gt;You choose a model, add an SDK, write a prompt, and ship the feature.&lt;/p&gt;

&lt;p&gt;For example, an automation workflow might do this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Receive a customer message.&lt;/li&gt;
&lt;li&gt;Send the message to an AI model.&lt;/li&gt;
&lt;li&gt;Generate a response.&lt;/li&gt;
&lt;li&gt;Save the result.&lt;/li&gt;
&lt;li&gt;Trigger the next action.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That works when the product is small.&lt;/p&gt;

&lt;p&gt;But as the workflow grows, new questions appear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which model should handle fast classification?&lt;/li&gt;
&lt;li&gt;Which model should handle long reasoning?&lt;/li&gt;
&lt;li&gt;Which model is best for structured JSON output?&lt;/li&gt;
&lt;li&gt;Which model should be used for multilingual responses?&lt;/li&gt;
&lt;li&gt;How do we test model changes without rewriting the app?&lt;/li&gt;
&lt;li&gt;How do we compare results across different models?&lt;/li&gt;
&lt;li&gt;How do we monitor cost, latency, and output quality?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If every workflow talks directly to a different model provider, the application can become harder to maintain.&lt;/p&gt;

&lt;p&gt;The problem is not just API access. The real problem is operational flexibility: being able to change models, prompts, and routing without rewriting the application.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is a Model Access Layer?
&lt;/h2&gt;

&lt;p&gt;A model access layer is a dedicated part of the application that manages how the product connects to AI models.&lt;/p&gt;

&lt;p&gt;Instead of spreading model logic across many files, the application sends AI tasks through one internal layer.&lt;/p&gt;

&lt;p&gt;That layer can manage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API credentials&lt;/li&gt;
&lt;li&gt;base URL configuration&lt;/li&gt;
&lt;li&gt;model names&lt;/li&gt;
&lt;li&gt;prompt templates&lt;/li&gt;
&lt;li&gt;routing rules&lt;/li&gt;
&lt;li&gt;fallback behavior&lt;/li&gt;
&lt;li&gt;usage logging&lt;/li&gt;
&lt;li&gt;latency tracking&lt;/li&gt;
&lt;li&gt;response format validation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This approach helps developers separate product logic from model access logic.&lt;/p&gt;

&lt;p&gt;The product can say:&lt;/p&gt;

&lt;p&gt;“Generate a support answer.”&lt;/p&gt;

&lt;p&gt;The model access layer decides:&lt;/p&gt;

&lt;p&gt;“Use this model, this prompt, this timeout, this response format, and this logging rule.”&lt;/p&gt;

&lt;p&gt;That separation becomes more valuable as AI workflows become more complex.&lt;/p&gt;
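
&lt;p&gt;One of those decisions, the timeout, can be enforced in the access layer with a small wrapper. This is a sketch using only standard promises; &lt;code&gt;withTimeout&lt;/code&gt; and &lt;code&gt;slowCall&lt;/code&gt; are hypothetical names, and the wrapper stops waiting without cancelling the underlying request.&lt;/p&gt;

```javascript
// Rejects if the model call takes longer than timeoutMs.
// Note that this only stops waiting; it does not cancel the underlying request.
function withTimeout(promise, timeoutMs) {
  let timer;
  const timeout = new Promise(function (resolve, reject) {
    timer = setTimeout(function () {
      reject(new Error("Timed out after " + timeoutMs + " ms"));
    }, timeoutMs);
  });
  return Promise.race([promise, timeout]).finally(function () {
    clearTimeout(timer);
  });
}

// Stand-in for a slow model call.
function slowCall(delayMs) {
  return new Promise(function (resolve) {
    setTimeout(function () { resolve("done"); }, delayMs);
  });
}

withTimeout(slowCall(10), 200).then(function (value) {
  console.log(value);
});
withTimeout(slowCall(200), 10).catch(function (error) {
  console.log(error.message);
});
```

&lt;p&gt;Keeping the timeout value in the access layer means each workflow can have its own limit without product code knowing about it.&lt;/p&gt;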

&lt;h2&gt;
  
  
  Why Automation Workflows Need Flexibility
&lt;/h2&gt;

&lt;p&gt;Automation workflows are different from simple chat interfaces.&lt;/p&gt;

&lt;p&gt;A chatbot mainly responds to users. An automation workflow may trigger multiple AI steps in sequence.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Classify an incoming message.&lt;/li&gt;
&lt;li&gt;Extract structured fields.&lt;/li&gt;
&lt;li&gt;Decide whether human review is needed.&lt;/li&gt;
&lt;li&gt;Generate a reply.&lt;/li&gt;
&lt;li&gt;Summarize the interaction.&lt;/li&gt;
&lt;li&gt;Update a CRM.&lt;/li&gt;
&lt;li&gt;Trigger another workflow.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each step may have different model requirements.&lt;/p&gt;

&lt;p&gt;Classification may need speed. Extraction may need consistent JSON. Planning may need stronger reasoning. Summarization may need a larger context window. Multilingual responses may need a different model.&lt;/p&gt;

&lt;p&gt;Using one model for everything is possible, but it may not always be the best architecture.&lt;/p&gt;

&lt;p&gt;A unified AI API gives developers a cleaner way to test different models for different workflow steps without rebuilding the integration each time.&lt;/p&gt;
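
&lt;p&gt;The first few steps of such a sequence, each with its own model, might be wired together as follows. This is only a sketch: &lt;code&gt;stepModels&lt;/code&gt;, &lt;code&gt;handleMessage&lt;/code&gt;, the placeholder model names, and the rule that complaints go to human review are all assumptions for illustration.&lt;/p&gt;

```javascript
// Per-step model assignments; the model names are placeholders.
const stepModels = {
  classify: "fast-model-placeholder",
  extract: "structured-output-model-placeholder",
  reply: "general-model-placeholder"
};

// Runs classify, extract, review decision, and reply in sequence.
// callModel(model, task, text) is a hypothetical transport injected by the caller.
async function handleMessage(text, callModel) {
  const category = await callModel(stepModels.classify, "classify", text);
  const fields = JSON.parse(await callModel(stepModels.extract, "extract", text));
  // Assumed rule for this sketch: complaints are routed to a human.
  const needsReview = category === "complaint";
  const reply = needsReview
    ? null
    : await callModel(stepModels.reply, "reply", text);
  return { category: category, fields: fields, needsReview: needsReview, reply: reply };
}

// Stand-in transport with canned answers so the sketch runs offline.
async function cannedModel(model, task, text) {
  if (task === "classify") return text.includes("refund") ? "complaint" : "question";
  if (task === "extract") return JSON.stringify({ length: text.length });
  return "Thanks for reaching out.";
}

handleMessage("How do I reset my password?", cannedModel).then(function (result) {
  console.log(result);
});
```

&lt;p&gt;Changing the model for one step is then a one-line edit to &lt;code&gt;stepModels&lt;/code&gt; rather than a change to the pipeline.&lt;/p&gt;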

&lt;h2&gt;
  
  
  A Practical Integration Pattern
&lt;/h2&gt;

&lt;p&gt;A simple pattern is to keep all model configuration outside the business logic.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
import OpenAI from "openai";

// Credentials and the base URL come from the environment, not from business logic.
const client = new OpenAI({
  apiKey: process.env.VECTORNODE_API_KEY,
  baseURL: process.env.VECTORNODE_BASE_URL
});

// Runs one workflow step; the caller supplies the model and prompt from configuration.
export async function runAIWorkflowStep({
  model,
  systemPrompt,
  userInput
}) {
  const response = await client.chat.completions.create({
    model,
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: userInput }
    ]
  });

  // Text of the first completion choice.
  return response.choices[0].message.content;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>llm</category>
      <category>automation</category>
    </item>
    <item>
      <title>Building Model-Agnostic AI Apps with One API Layer</title>
      <dc:creator>Ye Allen</dc:creator>
      <pubDate>Mon, 08 Jun 2026 06:22:38 +0000</pubDate>
      <link>https://dev.to/ye_allen_/building-model-agnostic-ai-apps-with-one-api-layer-g6p</link>
      <guid>https://dev.to/ye_allen_/building-model-agnostic-ai-apps-with-one-api-layer-g6p</guid>
      <description>&lt;p&gt;AI applications should not be locked too tightly to one model.&lt;/p&gt;

&lt;p&gt;That does not mean every product needs many models on day one. A prototype can start with one model and one simple request. That is often the fastest way to test an idea.&lt;/p&gt;

&lt;p&gt;But once an AI feature becomes part of a real product, the architecture starts to matter.&lt;/p&gt;

&lt;p&gt;A chatbot may need fast answers. A RAG workflow may need stronger reasoning over retrieved documents. An AI agent may need planning and tool use. A content system may need long-form writing. A developer tool may need stronger code understanding. An automation workflow may need reliable structured output.&lt;/p&gt;

&lt;p&gt;These use cases do not always need the same model.&lt;/p&gt;

&lt;p&gt;This is why developers should think about model-agnostic AI app architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  What model-agnostic means
&lt;/h2&gt;

&lt;p&gt;A model-agnostic AI app is not designed around one fixed model connection.&lt;/p&gt;

&lt;p&gt;Instead, the product separates application logic from model access logic.&lt;/p&gt;

&lt;p&gt;A simple structure looks like this:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Product feature
  -&amp;gt; AI service layer
  -&amp;gt; Model access layer
  -&amp;gt; Selected AI model
  -&amp;gt; Response parser
  -&amp;gt; Product result
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>llm</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Building AI Automation Workflows with a Unified Model Access Layer</title>
      <dc:creator>Ye Allen</dc:creator>
      <pubDate>Thu, 04 Jun 2026 07:38:08 +0000</pubDate>
      <link>https://dev.to/ye_allen_/building-ai-automation-workflows-with-a-unified-model-access-layer-1jlc</link>
      <guid>https://dev.to/ye_allen_/building-ai-automation-workflows-with-a-unified-model-access-layer-1jlc</guid>
      <description>&lt;p&gt;AI automation workflows are becoming more common in developer products.&lt;/p&gt;

&lt;p&gt;A team may use AI to summarize support tickets, classify leads, draft internal reports, enrich CRM records, generate structured JSON, or power an agent that calls other tools.&lt;/p&gt;

&lt;p&gt;At first, many of these workflows begin with one model and one simple API call.&lt;/p&gt;

&lt;p&gt;That works for a prototype.&lt;/p&gt;

&lt;p&gt;But as the workflow becomes part of a real product, developers usually need more control. Different automation steps may need different model behavior. Some tasks need speed. Some need stronger reasoning. Some need better structured output. Some need multilingual responses. Some need stable formatting that can be passed into another system.&lt;/p&gt;

&lt;p&gt;This is where a unified model access layer becomes useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem with one fixed AI model path
&lt;/h2&gt;

&lt;p&gt;A simple AI workflow might look like this:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Trigger
  -&amp;gt; Send prompt to one model
  -&amp;gt; Receive response
  -&amp;gt; Continue workflow
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>llm</category>
      <category>automation</category>
    </item>
    <item>
      <title>How to Evaluate AI Model Access Before Building an AI App</title>
      <dc:creator>Ye Allen</dc:creator>
      <pubDate>Wed, 03 Jun 2026 12:58:12 +0000</pubDate>
      <link>https://dev.to/ye_allen_/how-to-evaluate-ai-model-access-before-building-an-ai-app-11oo</link>
      <guid>https://dev.to/ye_allen_/how-to-evaluate-ai-model-access-before-building-an-ai-app-11oo</guid>
      <description>&lt;p&gt;AI products rarely stay simple for long.&lt;/p&gt;

&lt;p&gt;A prototype may start with one model and one prompt. But once the product becomes a real application, the requirements change. A chatbot needs fast responses. A RAG app needs stronger reasoning over retrieved documents. An AI agent needs planning, tool use, and structured output. An automation workflow may need repeatable text generation across many small tasks.&lt;/p&gt;

&lt;p&gt;That is why developers should evaluate AI model access before they build too much application logic around one model.&lt;/p&gt;

&lt;p&gt;This article explains a practical way to think about AI model access for production apps, agents, RAG systems, chatbots, and automation workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem with choosing one model too early
&lt;/h2&gt;

&lt;p&gt;A common mistake is to pick one model at the beginning and build the whole product around it.&lt;/p&gt;

&lt;p&gt;That can work for a demo, but real products usually need different model behavior in different places.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A support chatbot may need speed and stable tone.&lt;/li&gt;
&lt;li&gt;A RAG system may need stronger reasoning over long context.&lt;/li&gt;
&lt;li&gt;An AI agent may need better instruction following.&lt;/li&gt;
&lt;li&gt;A coding assistant may need stronger programming ability.&lt;/li&gt;
&lt;li&gt;An automation workflow may need predictable structured output.&lt;/li&gt;
&lt;li&gt;A multilingual app may need better language coverage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are different workloads. They should not always be evaluated with the same prompt, the same model, or the same success metric.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start from workflows, not model names
&lt;/h2&gt;

&lt;p&gt;Instead of asking “Which model is best?”, ask a better question:&lt;/p&gt;

&lt;p&gt;“What does this workflow need to do?”&lt;/p&gt;

&lt;p&gt;A simple workflow map may look like this:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
support_chat        -&amp;gt; fast answers and stable tone
rag_answer          -&amp;gt; reasoning over retrieved context
agent_planning      -&amp;gt; instruction following and step planning
content_draft       -&amp;gt; repeatable text generation
code_helper         -&amp;gt; programming help and explanation quality
json_output         -&amp;gt; reliable structured output
multilingual_reply  -&amp;gt; language quality and consistency
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
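
&lt;p&gt;Once each workflow has its own success criteria, candidate models can be compared per workflow instead of globally. A minimal sketch, assuming evaluation scores have already been normalized to a 0 to 1 range; the weights, metric names, and placeholder model names below are invented for illustration.&lt;/p&gt;

```javascript
// Workflow-specific weights; the metric names and numbers are illustrative assumptions.
const weights = {
  support_chat: { speed: 0.7, quality: 0.3 },
  rag_answer: { speed: 0.2, quality: 0.8 }
};

// Hypothetical evaluation results, each metric normalized to a 0 to 1 range.
const results = [
  { model: "model-a-placeholder", speed: 0.9, quality: 0.6 },
  { model: "model-b-placeholder", speed: 0.5, quality: 0.9 }
];

// Returns the candidate with the highest weighted score for one workflow.
function pickModel(workflow, candidates) {
  const w = weights[workflow];
  if (!w) throw new Error("No weights for workflow: " + workflow);
  if (candidates.length === 0) return null;
  const scored = candidates.map(function (candidate) {
    return {
      model: candidate.model,
      score: w.speed * candidate.speed + w.quality * candidate.quality
    };
  });
  // Sort by score, highest first.
  scored.sort(function (a, b) { return b.score - a.score; });
  return scored[0];
}

console.log(pickModel("support_chat", results).model);
console.log(pickModel("rag_answer", results).model);
```

&lt;p&gt;With these sample numbers the two workflows select different models, which is the point of evaluating by workflow.&lt;/p&gt;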

</description>
      <category>ai</category>
      <category>api</category>
      <category>llm</category>
      <category>automation</category>
    </item>
    <item>
      <title>How to Connect AI Models to Automation Workflows with One API</title>
      <dc:creator>Ye Allen</dc:creator>
      <pubDate>Tue, 02 Jun 2026 09:29:12 +0000</pubDate>
      <link>https://dev.to/ye_allen_/how-to-connect-ai-models-to-automation-workflows-with-one-api-3a49</link>
      <guid>https://dev.to/ye_allen_/how-to-connect-ai-models-to-automation-workflows-with-one-api-3a49</guid>
      <description>&lt;p&gt;Modern automation workflows rarely stop at one AI model.&lt;/p&gt;

&lt;p&gt;A product team may use one model for customer support, another for document analysis, another for code-related tasks, and another for multilingual content generation. A solo builder may connect AI models to n8n, internal tools, chatbots, or background jobs. An AI app may need to test GPT, Claude, Gemini, DeepSeek, Qwen, and other models before choosing the best option for each workflow.&lt;/p&gt;

&lt;p&gt;The challenge is not only access to models. The bigger challenge is organizing model access in a way that is stable, testable, and easy to maintain.&lt;/p&gt;

&lt;p&gt;This is where an AI model access platform becomes useful.&lt;/p&gt;

&lt;p&gt;VectorNode is an AI model access platform for developers, AI builders, and automation workflows. It helps teams access GPT, Claude, Gemini, DeepSeek, Qwen, and more through one unified API.&lt;/p&gt;

&lt;p&gt;Website: &lt;a href="https://www.vectronode.com/" rel="noopener noreferrer"&gt;https://www.vectronode.com/&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why automation workflows need flexible model access
&lt;/h2&gt;

&lt;p&gt;Automation workflows are different from simple chat demos.&lt;/p&gt;

&lt;p&gt;A real workflow may include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;reading incoming messages&lt;/li&gt;
&lt;li&gt;summarizing documents&lt;/li&gt;
&lt;li&gt;extracting structured data&lt;/li&gt;
&lt;li&gt;classifying support tickets&lt;/li&gt;
&lt;li&gt;generating replies&lt;/li&gt;
&lt;li&gt;routing tasks to different systems&lt;/li&gt;
&lt;li&gt;calling tools or APIs&lt;/li&gt;
&lt;li&gt;checking output quality&lt;/li&gt;
&lt;li&gt;retrying failed steps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each step may need a different type of model behavior.&lt;/p&gt;

&lt;p&gt;For example, a lightweight model may be enough for classification. A stronger reasoning model may be better for multi-step planning. A coding-focused model may help with technical support. A multilingual model may work better for global users.&lt;/p&gt;

&lt;p&gt;If every model uses a separate integration path, the workflow becomes harder to maintain. Developers need to manage different credentials, request formats, error behavior, logging, and testing processes.&lt;/p&gt;

&lt;p&gt;A unified API can reduce this complexity.&lt;/p&gt;
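
&lt;p&gt;With a single integration path, concerns such as retrying failed steps only need to be written once. A hedged sketch of retry with exponential backoff follows; &lt;code&gt;retryStep&lt;/code&gt;, &lt;code&gt;backoffDelays&lt;/code&gt;, and the demo step are hypothetical names, and the delays are kept short so the example runs quickly.&lt;/p&gt;

```javascript
function sleep(ms) {
  return new Promise(function (resolve) { setTimeout(resolve, ms); });
}

// Exponential backoff schedule: base, 2x base, 4x base, ...
function backoffDelays(retries, baseMs) {
  return Array.from({ length: retries }, function (unused, i) {
    return baseMs * Math.pow(2, i);
  });
}

// Runs step once, then retries after each delay in retryDelaysMs until it succeeds.
// If every attempt fails, the last error is rethrown.
async function retryStep(step, retryDelaysMs) {
  let lastError = new Error("step was never attempted");
  // A leading 0 means the first attempt runs immediately.
  for (const delayMs of [0].concat(retryDelaysMs)) {
    if (delayMs !== 0) await sleep(delayMs);
    try {
      return await step();
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}

// Demo step that fails twice and then succeeds.
let calls = 0;
async function flakyStep() {
  calls += 1;
  if (calls !== 3) throw new Error("temporary failure " + calls);
  return "ok after " + calls + " calls";
}

retryStep(flakyStep, backoffDelays(3, 10)).then(console.log);
```

&lt;p&gt;In production the retry should usually be limited to errors that are likely to be transient, such as timeouts or rate limits.&lt;/p&gt;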

&lt;h2&gt;
  
  
  The problem with single-model workflows
&lt;/h2&gt;

&lt;p&gt;Many AI workflows start with one model because it is simpler.&lt;/p&gt;

&lt;p&gt;That is usually fine for a prototype. But as the workflow grows, several problems appear:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The selected model may not be best for every task.&lt;/li&gt;
&lt;li&gt;The team has no easy way to compare model output.&lt;/li&gt;
&lt;li&gt;Cost and latency are harder to optimize.&lt;/li&gt;
&lt;li&gt;Fallback behavior is difficult to design.&lt;/li&gt;
&lt;li&gt;Changing models may require code changes.&lt;/li&gt;
&lt;li&gt;Monitoring becomes fragmented across providers.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is especially important for automation builders. A workflow that runs hundreds or thousands of times should not depend on assumptions that were made during the first prototype.&lt;/p&gt;

&lt;p&gt;A better approach is to separate product logic from model access.&lt;/p&gt;

&lt;h2&gt;
  
  
  A better structure: one model access layer
&lt;/h2&gt;

&lt;p&gt;Instead of connecting every workflow step directly to a specific model provider, developers can use a model access layer.&lt;/p&gt;

&lt;p&gt;The product or automation workflow sends requests to one API layer. That API layer manages access to multiple models.&lt;/p&gt;

&lt;p&gt;A simplified structure looks like this:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Automation workflow
  -&amp;gt; Model access layer
    -&amp;gt; GPT
    -&amp;gt; Claude
    -&amp;gt; Gemini
    -&amp;gt; DeepSeek
    -&amp;gt; Qwen
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>llm</category>
      <category>automation</category>
    </item>
  </channel>
</rss>
