Build a Capability-Based Router for Multimodal AI Models

#ai #typescript #api #webdev

AI applications rarely remain connected to a single model.

A product may begin with text generation, then add structured output for agents, document reasoning for RAG, image generation, video creation, audio transcription, or speech synthesis.

If every feature calls a provider directly, model-specific code quickly spreads across the application. Credentials, model names, request formats, timeouts, routes, and fallback behavior become mixed with product logic.

A capability-based model access layer provides a cleaner alternative.

Instead of calling a particular provider directly, each workflow requests a capability, such as:

fast text generation
document reasoning
structured agent output
image generation
video generation
audio transcription

The access layer selects a configured model and route for that capability.

The Problem With Direct Provider Calls

Consider an application with several AI workflows:


```text
Support Chat -> Provider A
RAG Answers -> Provider B
Agent Tools -> Provider C
Image Creation -> Provider D
Video Creation -> Provider E
Audio Transcription -> Provider F
```

A direct integration may work at first, but every provider introduces additional operational details:

separate credentials
different SDKs
provider-specific model identifiers
different request schemas
inconsistent errors
separate billing accounts
different timeout behavior
different usage dashboards

Changing a model may require editing business logic instead of updating configuration.

The goal of a model access layer is to keep these details behind a stable internal interface.

Define Capabilities First

Start by describing what the product needs rather than selecting model brands immediately.

```typescript
type Capability =
  | "support-chat"
  | "document-reasoning"
  | "structured-agent-output"
  | "image-generation"
  | "video-generation"
  | "audio-transcription";
```

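
TypeScript types disappear at runtime. A capability name that arrives from outside the process, such as an HTTP body or a queue message, still needs a check before it reaches the router. Below is a minimal sketch. It assumes the capability list lives in a `const` array, so the union type and the runtime check share one source. `CAPABILITIES`, `isCapability`, and `parseCapability` are illustrative names, not part of the original design.

```typescript
// Hypothetical single source of truth: the union type is derived from this array.
const CAPABILITIES = [
  "support-chat",
  "document-reasoning",
  "structured-agent-output",
  "image-generation",
  "video-generation",
  "audio-transcription",
] as const;

type Capability = (typeof CAPABILITIES)[number];

// Type guard for values that cross a system boundary (request bodies, queues).
function isCapability(value: unknown): value is Capability {
  return (
    typeof value === "string" &&
    (CAPABILITIES as readonly string[]).includes(value)
  );
}

// Rejects unknown names before they can reach the router.
function parseCapability(value: unknown): Capability {
  if (!isCapability(value)) {
    throw new Error(`Unknown capability: ${String(value)}`);
  }
  return value;
}
```

With this approach, adding a capability means adding one string to the array. Both the type and the guard pick it up automatically.
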
Each capability can have different operational requirements.

```typescript
interface CapabilityRequirements {
  streaming?: boolean;
  structuredOutput?: boolean;
  asynchronous?: boolean;
  maximumLatencyMs?: number;
  outputFormat?: string;
}
```

A chatbot may need streaming and low latency. An agent may require valid JSON. A video workflow will usually be asynchronous and return an asset after the job completes.

These differences should be explicit.

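
One way to make those differences explicit is to keep one `CapabilityRequirements` entry per capability. A candidate model can then be checked against that entry before it goes into the configuration. This is a sketch under stated assumptions. `ModelFeatures` is a hypothetical description of what a model and route support. The latency limits and formats in `requirements` are placeholders, not recommendations.

```typescript
type Capability =
  | "support-chat"
  | "document-reasoning"
  | "structured-agent-output"
  | "image-generation"
  | "video-generation"
  | "audio-transcription";

interface CapabilityRequirements {
  streaming?: boolean;
  structuredOutput?: boolean;
  asynchronous?: boolean;
  maximumLatencyMs?: number;
  outputFormat?: string;
}

// Hypothetical description of what a candidate model and route can do.
interface ModelFeatures {
  streaming: boolean;
  structuredOutput: boolean;
  typicalLatencyMs: number;
  outputFormats: string[];
}

// Example requirements; the numbers and formats are placeholders.
const requirements: Record<Capability, CapabilityRequirements> = {
  "support-chat": { streaming: true, maximumLatencyMs: 2_000 },
  "document-reasoning": { maximumLatencyMs: 30_000 },
  "structured-agent-output": { structuredOutput: true },
  "image-generation": { asynchronous: true, outputFormat: "png" },
  "video-generation": { asynchronous: true, outputFormat: "mp4" },
  "audio-transcription": { outputFormat: "text" },
};

// Returns the names of unmet requirements; an empty array means the model fits.
function unmetRequirements(
  required: CapabilityRequirements,
  features: ModelFeatures
): string[] {
  const problems: string[] = [];
  if (required.streaming && !features.streaming) {
    problems.push("streaming");
  }
  if (required.structuredOutput && !features.structuredOutput) {
    problems.push("structuredOutput");
  }
  if (
    required.maximumLatencyMs !== undefined &&
    features.typicalLatencyMs > required.maximumLatencyMs
  ) {
    problems.push("maximumLatencyMs");
  }
  if (
    required.outputFormat !== undefined &&
    !features.outputFormats.includes(required.outputFormat)
  ) {
    problems.push("outputFormat");
  }
  return problems;
}
```

`asynchronous` is not checked here. In this design, that flag decides which adapter handles the request rather than which model qualifies.
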
Create a Configurable Model Target

The application needs a configuration describing the selected model and route for every capability.

```typescript
interface ModelTarget {
  model: string;
  route?: string;
  apiFormat: "openai-compatible" | "media-job" | "custom";
  timeoutMs: number;
}

type ModelAccessConfig = Record<Capability, {
  primary: ModelTarget;
  fallback?: ModelTarget;
}>;
```

A sample configuration could look like this:

```typescript
const modelConfig: ModelAccessConfig = {
  "support-chat": {
    primary: {
      model: "configured-chat-model",
      route: "interactive",
      apiFormat: "openai-compatible",
      timeoutMs: 15_000
    }
  },

  "document-reasoning": {
    primary: {
      model: "configured-reasoning-model",
      route: "standard",
      apiFormat: "openai-compatible",
      timeoutMs: 30_000
    }
  },

  "structured-agent-output": {
    primary: {
      model: "configured-agent-model",
      apiFormat: "openai-compatible",
      timeoutMs: 30_000
    },
    fallback: {
      model: "configured-fallback-model",
      apiFormat: "openai-compatible",
      timeoutMs: 30_000
    }
  },

  "image-generation": {
    primary: {
      model: "configured-image-model",
      route: "standard",
      apiFormat: "media-job",
      timeoutMs: 120_000
    }
  },

  "video-generation": {
    primary: {
      model: "configured-video-model",
      route: "standard",
      apiFormat: "media-job",
      timeoutMs: 600_000
    }
  },

  "audio-transcription": {
    primary: {
      model: "configured-audio-model",
      apiFormat: "custom",
      timeoutMs: 120_000
    }
  }
};
```

Model identifiers remain configurable. Product code only refers to capabilities.

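
To keep model identifiers out of the code entirely, the configured values can be read from the environment at startup. The checked-in names then act as defaults. Below is a minimal sketch. The `MODEL_<CAPABILITY>` and `MODEL_<CAPABILITY>_ROUTE` variable names are a hypothetical convention. The environment is passed in as a plain object so the function is easy to test; in Node.js you would pass `process.env`.

```typescript
interface ModelTarget {
  model: string;
  route?: string;
  apiFormat: "openai-compatible" | "media-job" | "custom";
  timeoutMs: number;
}

type Env = Record<string, string | undefined>;

// Hypothetical convention: "document-reasoning" -> "MODEL_DOCUMENT_REASONING".
function envKey(capability: string, suffix: "" | "_ROUTE" = ""): string {
  return `MODEL_${capability.toUpperCase().replace(/-/g, "_")}${suffix}`;
}

// Returns a copy of the target with model and route overridden when the
// corresponding variables are set to non-empty values.
function withEnvOverrides(
  capability: string,
  target: ModelTarget,
  env: Env
): ModelTarget {
  const model = env[envKey(capability)]?.trim();
  const route = env[envKey(capability, "_ROUTE")]?.trim();
  return {
    ...target,
    model: model ? model : target.model,
    route: route ? route : target.route,
  };
}
```

Because the function returns a new object, the checked-in configuration stays unchanged. Switching models becomes a deployment setting rather than a code change.
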
Use a Common Internal Request

The next step is to define an internal request format.

```typescript
interface ModelRequest {
  capability: Capability;
  input: unknown;
  metadata?: {
    application?: string;
    environment?: string;
    userId?: string;
  };
}

interface ModelResult<T = unknown> {
  success: boolean;
  model: string;
  route?: string;
  latencyMs: number;
  output?: T;
  error?: {
    code: string;
    message: string;
    retryable: boolean;
  };
}
```

This is an internal contract, not a claim that every external model uses the same API. Text, image, video, and audio models may still require different adapters.

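Add Format-Specific Adapters

Each adapter implements a shared `ModelAdapter` interface. For text, an OpenAI-compatible format can simplify access to many models through a single adapter:

```typescript
interface ModelAdapter {
  execute(
    target: ModelTarget,
    request: ModelRequest
  ): Promise<ModelResult>;
}

class OpenAICompatibleAdapter implements ModelAdapter {
  async execute(
    target: ModelTarget,
    request: ModelRequest
  ): Promise<ModelResult> {
    const startedAt = Date.now();

    try {
      const response = await fetch(
        `${process.env.AI_API_BASE_URL}/v1/chat/completions`,
        {
          method: "POST",
          headers: {
            "Authorization": `Bearer ${process.env.AI_API_KEY}`,
            "Content-Type": "application/json"
          },
          body: JSON.stringify({
            model: target.model,
            messages: request.input,
            stream: false
          }),
          // Aborts the request, and rejects fetch, once the timeout elapses.
          signal: AbortSignal.timeout(target.timeoutMs)
        }
      );

      if (!response.ok) {
        return {
          success: false,
          model: target.model,
          route: target.route,
          latencyMs: Date.now() - startedAt,
          error: {
            code: `HTTP_${response.status}`,
            message: await response.text(),
            // Rate limits and server errors may succeed on a later attempt.
            retryable: response.status === 429 || response.status >= 500
          }
        };
      }

      return {
        success: true,
        model: target.model,
        route: target.route,
        latencyMs: Date.now() - startedAt,
        output: await response.json()
      };
    } catch (error) {
      // fetch rejects on timeouts and network failures instead of returning
      // a response; an unparseable JSON body also ends up here.
      const timedOut =
        (error as { name?: string } | null)?.name === "TimeoutError";

      return {
        success: false,
        model: target.model,
        route: target.route,
        latencyMs: Date.now() - startedAt,
        error: {
          code: timedOut ? "TIMEOUT" : "REQUEST_FAILED",
          message: String(error),
          retryable: true
        }
      };
    }
  }
}
```
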
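
The adapter above returns the provider's raw JSON as `output`. Product code usually wants only the assistant text, so a small extractor keeps that parsing in one place. This sketch assumes the common non-streaming chat-completions response shape (`choices[0].message.content`). Routes that return a different body would need their own extractor, and `extractMessageText` is an illustrative name.

```typescript
// Minimal view of a non-streaming chat-completions response body.
interface ChatCompletionBody {
  choices?: Array<{
    message?: { role?: string; content?: string | null };
  }>;
}

// Returns the first choice's text, or undefined when the body has none.
function extractMessageText(output: unknown): string | undefined {
  if (typeof output !== "object" || output === null) {
    return undefined;
  }
  const body = output as ChatCompletionBody;
  const content = body.choices?.[0]?.message?.content;
  return typeof content === "string" ? content : undefined;
}
```

Returning `undefined` instead of throwing lets the caller decide whether a missing answer counts as a failure for that capability.
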
OpenAI compatibility is useful, but it is only one technical format. Media models may require asynchronous job handling:

```typescript
class MediaJobAdapter implements ModelAdapter {
  async execute(
    target: ModelTarget,
    request: ModelRequest
  ): Promise<ModelResult> {
    const startedAt = Date.now();

    try {
      const job = await this.createJob(target, request);
      const output = await this.waitForCompletion(
        job.id,
        target.timeoutMs
      );

      return {
        success: true,
        model: target.model,
        route: target.route,
        latencyMs: Date.now() - startedAt,
        output
      };
    } catch (error) {
      // Report job failures and timeouts as results so the router can decide
      // what to do. Conservative default: long media jobs are not retried.
      return {
        success: false,
        model: target.model,
        route: target.route,
        latencyMs: Date.now() - startedAt,
        error: {
          code: "MEDIA_JOB_FAILED",
          message: error instanceof Error ? error.message : String(error),
          retryable: false
        }
      };
    }
  }

  private async createJob(
    target: ModelTarget,
    request: ModelRequest
  ): Promise<{ id: string }> {
    // Send the provider-specific generation request.
    throw new Error("Implement the media job request");
  }

  private async waitForCompletion(
    jobId: string,
    timeoutMs: number
  ): Promise<unknown> {
    // Poll the job until it succeeds, fails, or times out.
    throw new Error("Implement job polling");
  }
}
```

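
`waitForCompletion` is left unimplemented above. One common shape for it is a polling loop with a fixed interval and an overall deadline. This is a sketch. `JobStatus` and the `getStatus` callback stand in for whatever status endpoint a media route actually exposes, and the 2-second default interval is an arbitrary choice.

```typescript
// Hypothetical job status shape; real media APIs name these fields differently.
type JobStatus =
  | { state: "queued" | "running" }
  | { state: "succeeded"; output: unknown }
  | { state: "failed"; reason: string };

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until the job succeeds or fails, or until timeoutMs has elapsed.
async function pollJob(
  jobId: string,
  getStatus: (jobId: string) => Promise<JobStatus>,
  timeoutMs: number,
  intervalMs = 2_000
): Promise<unknown> {
  const deadline = Date.now() + timeoutMs;

  while (Date.now() < deadline) {
    const status = await getStatus(jobId);

    if (status.state === "succeeded") {
      return status.output;
    }
    if (status.state === "failed") {
      throw new Error(`Job ${jobId} failed: ${status.reason}`);
    }

    // Wait before the next poll, but never past the deadline.
    await sleep(Math.min(intervalMs, Math.max(0, deadline - Date.now())));
  }

  throw new Error(`Job ${jobId} did not finish within ${timeoutMs} ms`);
}
```

Inside `MediaJobAdapter`, `waitForCompletion` could delegate to `pollJob` with a provider-specific `getStatus`. Where a provider offers them, exponential backoff or completion callbacks are alternatives for jobs that run for minutes.
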
This separation prevents asynchronous media behavior from leaking into unrelated product workflows.

Select the Correct Adapter

The router can choose an adapter based on the configured API format.

```typescript
class ModelRouter {
  constructor(
    private readonly config: ModelAccessConfig,
    private readonly adapters: Record<ModelTarget["apiFormat"], ModelAdapter>
  ) {}

  async execute(request: ModelRequest): Promise<ModelResult> {
    const { primary, fallback } = this.config[request.capability];

    const primaryResult = await this.adapters[primary.apiFormat]
      .execute(primary, request);

    // Return successes, and failures that have no fallback or are not retryable.
    if (primaryResult.success || !fallback || !primaryResult.error?.retryable) {
      return primaryResult;
    }

    // Retryable failure with a configured fallback: try the fallback once.
    return this.adapters[fallback.apiFormat].execute(fallback, request);
  }
}
```

Application code now requests a capability:

```typescript
const result = await router.execute({
  capability: "document-reasoning",
  input: [
    {
      role: "user",
      content: "Summarize the retrieved documents."
    }
  ],
  metadata: {
    application: "knowledge-assistant",
    environment: "production"
  }
});
```

The workflow does not need to know which model or route handled the request.

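Be Careful With Fallbacks

Fallback logic should not retry every failure automatically.

A request may fail because of:

invalid authentication
unsupported parameters
malformed input
rate limits
temporary availability problems
timeouts
provider errors
failed content validation

Only retry failures that are genuinely temporary, such as rate limits, timeouts, and short-lived availability or server errors. Invalid credentials or malformed input will usually fail again no matter which target receives the request.
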
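
For HTTP-based routes, that rule can be written down as a small classification step. This sketch assumes conventional HTTP semantics:

- 429 for rate limits
- 408 and 504 for timeouts
- 502 and 503 for temporary unavailability
- 401 and 403 for authentication
- 400 and 422 for malformed requests

Providers do not always follow these conventions, so the mapping is a starting point. `FailureKind`, `classifyFailure`, and `isRetryable` are illustrative names.

```typescript
// Failure categories loosely following the list above; names are illustrative.
type FailureKind =
  | "authentication"
  | "invalid-request"
  | "rate-limit"
  | "timeout"
  | "unavailable"
  | "provider-error"
  | "unknown";

// Maps an HTTP status code, or a client-side timeout, to a failure kind.
// Assumes conventional HTTP semantics; check each provider's documentation.
function classifyFailure(status: number | "timeout"): FailureKind {
  if (status === "timeout") return "timeout";
  if (status === 408 || status === 504) return "timeout";
  if (status === 401 || status === 403) return "authentication";
  if (status === 400 || status === 422) return "invalid-request";
  if (status === 429) return "rate-limit";
  if (status === 502 || status === 503) return "unavailable";
  if (status >= 500) return "provider-error";
  return "unknown";
}

// Only temporary conditions are worth sending to a fallback target.
const RETRYABLE_KINDS: ReadonlySet<FailureKind> = new Set<FailureKind>([
  "rate-limit",
  "timeout",
  "unavailable",
  "provider-error",
]);

function isRetryable(kind: FailureKind): boolean {
  return RETRYABLE_KINDS.has(kind);
}
```

An adapter could then set `retryable: isRetryable(classifyFailure(response.status))` instead of an inline status check. That keeps the retry policy in one place.
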
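Fallback models may also produce different output structures. Agent and structured-output workflows should validate the result before returning it to the application.

```typescript
interface AgentOutput {
  action: string;
  arguments: Record<string, unknown>;
}

// Type guard: narrows an unknown model response to AgentOutput.
function isValidAgentOutput(value: unknown): value is AgentOutput {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return typeof candidate.action === "string" &&
    typeof candidate.arguments === "object" && candidate.arguments !== null;
}
```
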
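
The router above falls back only on retryable errors and never inspects the output. A helper can combine both rules: it tries targets in order and accepts the first result that passes validation. This is a sketch. `AttemptOutcome` loosely mirrors `ModelResult`, and `firstValidResult` is a hypothetical helper, not part of the router as written.

```typescript
// Hypothetical outcome shape, modeled loosely on ModelResult.
interface AttemptOutcome {
  success: boolean;
  output?: unknown;
  retryable?: boolean;
  errorMessage?: string;
}

interface Attempt {
  label: string; // for example "primary" or "fallback"
  run: () => Promise<AttemptOutcome>;
}

// Tries attempts in order and returns the first output that passes validation.
// A non-retryable failure stops immediately; a retryable failure or an
// invalid output moves on to the next attempt.
async function firstValidResult<T>(
  attempts: Attempt[],
  validate: (value: unknown) => value is T
): Promise<{ label: string; output: T }> {
  const problems: string[] = [];

  for (const attempt of attempts) {
    const outcome = await attempt.run();

    if (outcome.success) {
      const output = outcome.output;
      if (validate(output)) {
        return { label: attempt.label, output };
      }
      problems.push(`${attempt.label}: output failed validation`);
      continue;
    }

    problems.push(`${attempt.label}: ${outcome.errorMessage ?? "failed"}`);
    if (!outcome.retryable) {
      break;
    }
  }

  throw new Error(`No valid result (${problems.join("; ")})`);
}
```

The collected `problems` strings are a natural input for the usage records described next. They show how often a fallback was needed and why.
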
Fallbacks improve resilience only when their behavior is tested and observable.

Record Every Decision

The access layer should generate a usage record for every request.

```typescript
interface UsageRecord {
  timestamp: string;
  application?: string;
  capability: Capability;
  model: string;
  route?: string;
  success: boolean;
  latencyMs: number;
  estimatedCost?: number;
  errorCode?: string;
}
```

These records make it possible to compare:

success rate by capability
latency by model and route
spending by workflow
timeout frequency
fallback frequency
generation failures
invalid structured outputs

Without this information, model selection becomes guesswork.

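
The function below sketches that comparison. It groups records by capability and reports request count, success rate, and a 95th-percentile latency computed with the nearest-rank method. It declares a trimmed `UsageRecord` so it runs on its own. The grouping key and the choice of p95 are illustrative assumptions.

```typescript
// Trimmed copy of UsageRecord so this block runs on its own.
interface UsageRecord {
  timestamp: string;
  capability: string;
  model: string;
  route?: string;
  success: boolean;
  latencyMs: number;
}

interface CapabilitySummary {
  requests: number;
  successRate: number; // fraction between 0 and 1
  p95LatencyMs: number;
}

// Nearest-rank percentile; expects a non-empty array.
function percentile(values: number[], p: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

function summarizeByCapability(
  records: UsageRecord[]
): Map<string, CapabilitySummary> {
  // Group records by capability.
  const groups = new Map<string, UsageRecord[]>();
  for (const record of records) {
    const group = groups.get(record.capability) ?? [];
    group.push(record);
    groups.set(record.capability, group);
  }

  // Summarize each group.
  const summaries = new Map<string, CapabilitySummary>();
  groups.forEach((group, capability) => {
    const successes = group.filter((record) => record.success).length;
    summaries.set(capability, {
      requests: group.length,
      successRate: successes / group.length,
      p95LatencyMs: percentile(group.map((record) => record.latencyMs), 95),
    });
  });
  return summaries;
}
```

Changing the grouping key from `capability` to `model` or `route` gives the per-model latency comparison from the list above.
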
Test Before Production

Before sending important workloads through the router, create evaluation cases based on realistic product inputs.

For every capability, test:

output quality
request success rate
latency distribution
error behavior
API parameter support
route availability
estimated cost
fallback compatibility

Text benchmarks alone are not sufficient for multimodal applications. Image, video, and audio workflows should also record resolution, duration, output format, job completion time, and asset retrieval behavior.

Run a controlled pilot before moving production traffic.

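
One way to run such a pilot is to send a fixed, stable share of users to the candidate configuration. The sketch below hashes a user ID with 32-bit FNV-1a into one of 100 buckets, so the same user always lands in the same arm. `pilotArm`, the salt value, and the `"pilot"`/`"current"` labels are illustrative assumptions.

```typescript
// 32-bit FNV-1a over UTF-16 code units (adequate for bucketing, not security).
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Assigns a user to the pilot or current configuration.
// The same user ID and salt always produce the same arm.
function pilotArm(
  userId: string,
  pilotPercent: number,
  salt = "router-pilot-1"
): "pilot" | "current" {
  const bucket = fnv1a32(`${salt}:${userId}`) % 100; // 0..99
  return bucket < pilotPercent ? "pilot" : "current";
}
```

Changing the salt reshuffles which users are in the pilot, which is useful when starting a new experiment. Raising `pilotPercent` gradually moves more users over without reassigning those already in the pilot.
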
Where VectorNode Fits

VectorNode is a pay-as-you-go multi-model AI API platform for independent developers and small AI teams building with text, image, video, and audio models.

It gives developers one account for testing and accessing GPT, Claude, Gemini, DeepSeek, Qwen, and hundreds of other supported models through developer-friendly APIs.

Developers can use a Playground for initial testing, compare available model and routing options, review usage records, and work with different supported API formats.

This can reduce the need to maintain a separate provider account, balance, and integration for every model family.

VectorNode can support AI applications, agents, RAG systems, chatbots, automation workflows, developer tools, and multimodal products.

Learn more:
https://www.vectronode.com/

Start testing with VectorNode.

DEV Community
