DEV Community: YAIT

The Complete Guide to yait_aichain's Model Registry

YAIT — Wed, 08 Jul 2026 07:19:00 +0000

Most AI frameworks make you hardcode provider details into every function call. You write openai.ChatCompletion.create(model="gpt-4") in forty places, then OpenAI deprecates the endpoint. You spend a Tuesday afternoon doing find-and-replace across your codebase. Two months later, you want to add Anthropic. Another Tuesday afternoon, gone.

The yait_aichain Model Registry exists so you never lose another Tuesday.

It's a single abstraction layer that maps logical model names to provider-specific configurations. You reference a model by its registry key — "openai/gpt4o" or "anthropic/claude-sonnet" — and the registry handles resolution, parameter defaults, and provider routing. Change one line in the registry, and every Skill, Chain, and Agent that references that model updates automatically.

This guide covers everything: how the registry works internally, how to configure it, how to register custom models, and how to use environment variables to keep secrets out of your source code.

How the Registry Fits Into the Architecture

The Model class is one of five core primitives in yait_aichain:

from yait_aichain import Model, Skill, Chain, Pool, Agent

A Skill performs a single LLM operation. A Chain sequences Skills. A Pool runs them in parallel. An Agent orchestrates dynamically. None of these do anything without a Model — it's the connection between your logic and the actual inference provider.

When you instantiate a Model, the registry resolves the model key, injects default parameters, and returns a configured object ready to pass into a Skill:

model = Model("openai/gpt4o")
skill = Skill(model=model, instruction="Summarize the input text.")
result = skill.run("Paste your long document here.")

skill.run() accepts a string and returns a string synchronously. The Skill never knows or cares which provider is behind the model — it sends a prompt, gets a response. That separation is the entire point.

The Built-In Model Registry

Out of the box, yait_aichain ships with a pre-configured registry covering the most common providers and models. You don't need to set up anything to start using them. Just reference the key.

Registry Key Format

Every model key follows the pattern provider/model_name:

Key	Provider	Underlying Model
`openai/gpt4o`	OpenAI	gpt-4o
`openai/gpt4o-mini`	OpenAI	gpt-4o-mini
`openai/gpt4-turbo`	OpenAI	gpt-4-turbo
`anthropic/claude-sonnet`	Anthropic	claude-3-5-sonnet-20241022
`anthropic/claude-haiku`	Anthropic	claude-3-5-haiku-20241022
`anthropic/claude-opus`	Anthropic	claude-3-opus-20240229
`google/gemini-pro`	Google	gemini-1.5-pro
`google/gemini-flash`	Google	gemini-1.5-flash
`mistral/mistral-large`	Mistral	mistral-large-latest

The slash is not decorative. The segment before it tells the registry which provider adapter to load; the segment after it identifies the specific model variant. This convention means the registry can route to the correct API client without any additional configuration from you.
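The routing rule described above can be sketched in a few lines. This is a minimal illustration, not the library's actual resolver: `split_registry_key` is a hypothetical helper name, and splitting at the first slash is an assumption.

```python
def split_registry_key(key: str) -> tuple:
    """Split 'provider/model_name' at the first slash (hypothetical helper)."""
    provider, sep, model_name = key.partition("/")
    if not sep or not provider or not model_name:
        raise ValueError(f"expected 'provider/model_name', got {key!r}")
    return provider, model_name


# The provider segment would select the adapter; the rest names the variant.
print(split_registry_key("openai/gpt4o"))
print(split_registry_key("anthropic/claude-sonnet"))
```

A key with no slash fails fast with a ValueError instead of being routed to a guessed provider.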

The built-in registry uses fully dated model names (e.g., claude-3-5-sonnet-20241022) where the provider requires them, and the provider's own alias (e.g., gpt-4o or mistral-large-latest) where it does not. If you need to target a different version, override the entry as shown in the custom registration section below.

Default Parameters

Each registered model carries a set of sensible defaults:

model = Model("openai/gpt4o")
# Defaults applied:
#   temperature: 0.7
#   max_tokens: 4096
#   top_p: 1.0

You override any of these at instantiation:

model = Model("openai/gpt4o", temperature=0.2, max_tokens=512)

The override applies only to that instance. The registry's defaults remain untouched for the next caller.
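One way to picture why an override cannot leak into the registry is copy-then-update. The sketch below is an assumption about the mechanism, not the library's code; `REGISTRY_DEFAULTS` and `resolve_parameters` are hypothetical names.

```python
from copy import deepcopy

# Hypothetical stand-in for the registry's stored defaults.
REGISTRY_DEFAULTS = {
    "openai/gpt4o": {"temperature": 0.7, "max_tokens": 4096, "top_p": 1.0},
}


def resolve_parameters(key: str, **overrides) -> dict:
    """Return the defaults for `key` with per-instance overrides layered on top."""
    params = deepcopy(REGISTRY_DEFAULTS[key])  # copy, so the registry is never mutated
    params.update(overrides)
    return params


tuned = resolve_parameters("openai/gpt4o", temperature=0.2, max_tokens=512)
fresh = resolve_parameters("openai/gpt4o")
print(tuned)
print(fresh)
```

The second call still sees the original defaults because the first call only ever modified its own copy.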

Configuring Models via YAML

For production systems, model configuration shouldn't be scattered across Python files. yait_aichain supports YAML-based configuration that defines your full model setup in one place.

The YAML Schema

A valid configuration file follows this structure:

version: "2.0"

models:
  openai/gpt4o:
    provider: openai
    model_name: gpt-4o
    parameters:
      temperature: 0.7
      max_tokens: 4096
      top_p: 1.0

  anthropic/claude-sonnet:
    provider: anthropic
    model_name: claude-3-5-sonnet-20241022
    parameters:
      temperature: 0.7
      max_tokens: 4096

  custom/my-finetuned:
    provider: openai
    model_name: ft:gpt-4o:my-org:custom-model:abc123
    parameters:
      temperature: 0.3
      max_tokens: 2048

The version field is required and must be "2.0" for the current release. The models block is a dictionary where each key becomes the registry key you'll reference in code.

Loading a YAML Configuration

from yait_aichain import Model

Model.load_registry("path/to/models.yaml")

model = Model("custom/my-finetuned")

When you call load_registry(), the models defined in YAML merge with the built-in registry. If a key in your YAML matches a built-in key, your configuration wins. This lets you override defaults without forking the library.
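The merge behavior can be modeled with plain dictionaries. This sketch assumes a clashing key replaces the whole built-in entry (rather than merging field by field), which the article does not specify; `merge_registries` is a hypothetical name.

```python
def merge_registries(builtin: dict, user: dict) -> dict:
    """Layer user entries over built-in ones; on a key clash the user entry wins."""
    merged = dict(builtin)  # shallow copy keeps the built-in mapping untouched
    merged.update(user)
    return merged


builtin = {
    "openai/gpt4o": {"provider": "openai", "model_name": "gpt-4o"},
    "anthropic/claude-sonnet": {
        "provider": "anthropic",
        "model_name": "claude-3-5-sonnet-20241022",
    },
}
user = {
    "openai/gpt4o": {"provider": "openai", "model_name": "gpt-4o-2024-08-06"},
    "custom/my-finetuned": {
        "provider": "openai",
        "model_name": "ft:gpt-4o:my-org:custom-model:abc123",
    },
}

registry = merge_registries(builtin, user)
print(sorted(registry))
print(registry["openai/gpt4o"]["model_name"])
```

Built-in keys that the user file does not mention survive the merge, which is the difference from the 1.x replace-everything behavior.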

Schema Validation

The YAML loader validates your file against the expected schema on load. If you misspell a field or provide an invalid type, you get an error immediately — not five minutes into a pipeline run when the model finally gets called.

Common validation errors:

Missing version field: Every config file must declare version: "2.0".
Invalid provider value: Must match a supported provider adapter (openai, anthropic, google, mistral, or a custom-registered provider).
Parameter type mismatch: temperature must be a float between 0.0 and 2.0. max_tokens must be a positive integer.
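The three checks listed above can be approximated on an already-parsed config dictionary. This is a rough sketch under stated assumptions: `validate_config` is a hypothetical function, custom-registered providers are ignored, and integers are accepted for temperature because YAML parses `0` as an int.

```python
SUPPORTED_PROVIDERS = {"openai", "anthropic", "google", "mistral"}


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_config(config: dict) -> list:
    """Return a list of problems; an empty list means the config passed."""
    errors = []
    if config.get("version") != "2.0":
        errors.append('version: must be "2.0"')
    for key, entry in config.get("models", {}).items():
        provider = entry.get("provider")
        if provider not in SUPPORTED_PROVIDERS:
            errors.append(f"{key}: unsupported provider {provider!r}")
        params = entry.get("parameters", {})
        temperature = params.get("temperature")
        if temperature is not None and not (
            _is_number(temperature) and 0.0 <= temperature <= 2.0
        ):
            errors.append(f"{key}: temperature must be between 0.0 and 2.0")
        max_tokens = params.get("max_tokens")
        if max_tokens is not None and not (
            _is_number(max_tokens) and isinstance(max_tokens, int) and max_tokens > 0
        ):
            errors.append(f"{key}: max_tokens must be a positive integer")
    return errors


good = {
    "version": "2.0",
    "models": {
        "openai/gpt4o": {
            "provider": "openai",
            "parameters": {"temperature": 0.7, "max_tokens": 4096},
        }
    },
}
bad = {
    "models": {
        "custom/broken": {
            "provider": "opnai",
            "parameters": {"temperature": 3.5, "max_tokens": 0},
        }
    }
}
print(validate_config(good))
for problem in validate_config(bad):
    print(problem)
```

Collecting every problem before reporting, rather than stopping at the first, tends to shorten the fix-and-reload loop.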

Environment Variables

API keys and provider-specific settings should never appear in YAML files or Python source code. yait_aichain reads them from environment variables with a consistent naming convention.

Tip: Set AICHAIN_LOG_LEVEL=DEBUG whenever you're configuring the registry for the first time. The registry logs every resolution step, so a misconfigured key or parameter shows up in the log as soon as the model is resolved instead of surfacing later as a provider error. See the Debugging section for sample output.

Required Variables by Provider

Provider	Environment Variable	Description
OpenAI	`OPENAI_API_KEY`	Your OpenAI API key
Anthropic	`ANTHROPIC_API_KEY`	Your Anthropic API key
Google	`GOOGLE_API_KEY`	Your Google AI API key
Mistral	`MISTRAL_API_KEY`	Your Mistral API key
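A startup check built from the table above can report every missing key at once. This is a sketch, not a library feature: `API_KEY_VARS` and `missing_api_keys` are hypothetical names, and the `env` argument exists only so the check can be exercised without touching the real environment.

```python
import os

# Provider-to-variable mapping copied from the table above.
API_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
    "mistral": "MISTRAL_API_KEY",
}


def missing_api_keys(providers, env=None) -> list:
    """Return the variable names that are unset or empty for the given providers."""
    env = os.environ if env is None else env
    return [API_KEY_VARS[p] for p in providers if not env.get(API_KEY_VARS[p])]


# Example with an explicit mapping instead of the process environment.
print(missing_api_keys(["openai", "anthropic"], env={"OPENAI_API_KEY": "sk-example"}))
```

Calling it with only the providers your registry actually references avoids demanding keys you never use.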

Optional Configuration Variables

Beyond API keys, you can control library behavior through additional environment variables:

Variable	Default	Description
`AICHAIN_DEFAULT_MODEL`	`openai/gpt4o`	The model used when no model key is specified
`AICHAIN_LOG_LEVEL`	`WARNING`	Logging verbosity: `DEBUG`, `INFO`, `WARNING`, `ERROR`
`AICHAIN_TIMEOUT`	`30`	Request timeout in seconds
`AICHAIN_MAX_RETRIES`	`3`	Number of retry attempts on transient failures
`AICHAIN_REGISTRY_PATH`	`None`	Path to a YAML config file, loaded automatically on import

AICHAIN_REGISTRY_PATH is particularly useful in containerized deployments. Instead of calling Model.load_registry() in your application code, you set the environment variable and the registry configures itself when yait_aichain is first imported:

export AICHAIN_REGISTRY_PATH=/etc/aichain/models.yaml
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...

from yait_aichain import Model, Skill

# Registry already loaded from /etc/aichain/models.yaml
model = Model("custom/my-finetuned")
skill = Skill(model=model, instruction="Extract key entities from the text.")
result = skill.run("Apple introduced the M4 chip at its May 2024 iPad Pro event.")

No load_registry() call. No API key in sight. The environment handles it.

Using .env Files in Development

For local development, create a .env file in your project root:

OPENAI_API_KEY=sk-dev-abc123
ANTHROPIC_API_KEY=sk-ant-dev-xyz789
AICHAIN_LOG_LEVEL=DEBUG
AICHAIN_DEFAULT_MODEL=openai/gpt4o-mini

yait_aichain automatically detects and loads .env files from the current working directory. Add .env to your .gitignore. This is non-negotiable.

Registering Custom Models Programmatically

YAML works well for static configurations. But sometimes you need to register models at runtime — maybe you're dynamically selecting a fine-tuned model based on user input, or you're integrating a self-hosted model behind a custom API.

Basic Custom Registration

from yait_aichain import Model

Model.register(
    key="custom/llama-local",
    provider="openai",  # compatible API format
    model_name="meta-llama/Llama-3-70b",
    base_url="http://localhost:8080/v1",
    parameters={
        "temperature": 0.5,
        "max_tokens": 2048,
    }
)

model = Model("custom/llama-local")

The base_url parameter is critical for self-hosted models. If you're running vLLM, Ollama, or any OpenAI-compatible server, set provider to "openai" and point base_url to your server. The OpenAI adapter handles the rest.

Overriding Built-In Models

You can re-register a built-in key to change its behavior globally:

Model.register(
    key="openai/gpt4o",
    provider="openai",
    model_name="gpt-4o-2024-08-06",  # pin to specific version
    parameters={
        "temperature": 0.3,  # lower default for your use case
        "max_tokens": 8192,
    }
)

After this call, every Model("openai/gpt4o") instantiation in your application uses the pinned version with your custom defaults. This is how you roll out model version updates safely: change the registry, not the call sites.

Practical Patterns

Pattern 1: Environment-Based Model Selection

Run different models in development and production without changing code:

# Development
export AICHAIN_DEFAULT_MODEL=openai/gpt4o-mini

# Production
export AICHAIN_DEFAULT_MODEL=openai/gpt4o

from yait_aichain import Model, Skill

# Uses whatever AICHAIN_DEFAULT_MODEL points to
model = Model()
skill = Skill(model=model, instruction="Classify the support ticket by urgency.")
result = skill.run("My account has been locked for three days and I can't log in.")

In development you're spending fractions of a cent per call with gpt-4o-mini. In production you get the full capability of gpt-4o. Same code, different bill.

Pattern 2: Multi-Provider Fallback Chain

Use the registry to define a primary and fallback model, then build a Chain that degrades gracefully when the primary provider is unavailable:

from yait_aichain import Model, Skill, Chain

primary_model = Model("openai/gpt4o")
fallback_model = Model("anthropic/claude-sonnet")

primary_skill = Skill(
    model=primary_model,
    instruction="Generate a detailed product description."
)

fallback_skill = Skill(
    model=fallback_model,
    instruction="Generate a detailed product description."
)

chain = Chain(skills=[primary_skill, fallback_skill], mode="fallback")

try:
    result = chain.run("Noise-cancelling wireless headphones, over-ear, 30hr battery.")
except Exception as e:
    print(f"Both providers failed: {e}")

In "fallback" mode, the Chain runs primary_skill first. If it raises a provider error, the Chain automatically retries with fallback_skill. Because both models are resolved through the registry, switching providers later means changing one key — not rewriting API calls.
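The control flow behind a fallback can be illustrated with plain callables. This is a simplified model of the idea, not the Chain implementation: `run_with_fallback` is a hypothetical helper, and it catches every exception where a real implementation would presumably catch provider errors only.

```python
from typing import Callable, Sequence


def run_with_fallback(steps: Sequence[Callable[[str], str]], text: str) -> str:
    """Try each callable in order; return the first result that does not raise."""
    errors = []
    for step in steps:
        try:
            return step(text)
        except Exception as exc:  # simplification: treat any exception as a provider failure
            errors.append(exc)
    raise RuntimeError(f"all {len(steps)} steps failed: {errors}")


def primary(text: str) -> str:
    # Stand-in for a provider that is currently down.
    raise ConnectionError("primary provider unavailable")


def fallback(text: str) -> str:
    # Stand-in for a healthy second provider.
    return f"description of {text}"


print(run_with_fallback([primary, fallback], "headphones"))
```

The final RuntimeError carries every collected error, so the caller can see why each provider failed rather than only the last one.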

Pattern 3: Centralized YAML Config for Team Projects

In a shared repository, put your model configuration in a checked-in YAML file (without secrets) and let environment variables supply the keys:

# config/models.yaml
version: "2.0"

models:
  project/summarizer:
    provider: openai
    model_name: gpt-4o
    parameters:
      temperature: 0.3
      max_tokens: 1024

  project/classifier:
    provider: anthropic
    model_name: claude-3-5-haiku-20241022
    parameters:
      temperature: 0.0
      max_tokens: 128

  project/generator:
    provider: google
    model_name: gemini-1.5-pro
    parameters:
      temperature: 0.8
      max_tokens: 4096

from yait_aichain import Model, Skill

Model.load_registry("config/models.yaml")

summarizer = Skill(
    model=Model("project/summarizer"),
    instruction="Summarize in 3 bullet points."
)

classifier = Skill(
    model=Model("project/classifier"),
    instruction="Classify as positive, negative, or neutral."
)

result = summarizer.run("Your input text here.")

Every team member uses the same model configurations. Nobody accidentally runs gpt-3.5-turbo when the project requires gpt-4o. The YAML file is the single source of truth for model settings; the code is the single source of truth for behavior.

Debugging Registry Issues

When something goes wrong, set AICHAIN_LOG_LEVEL=DEBUG and check the output. The registry logs every resolution step:

DEBUG:aichain.registry: Resolving model key 'openai/gpt4o'
DEBUG:aichain.registry: Found in YAML override config
DEBUG:aichain.registry: Applied parameters: temperature=0.3, max_tokens=8192
DEBUG:aichain.registry: Provider adapter: openai
DEBUG:aichain.registry: Model instance created successfully

Common issues and their fixes:

"Model key not found" — You're referencing a key that doesn't exist in the built-in registry or your custom config. Check for typos. Keys are case-sensitive.
"Provider API key not set" — The environment variable for the provider is missing. Set OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.
"Connection refused on base_url" — For self-hosted models, verify the server is running and the URL is correct. Include the /v1 suffix if the server expects it.
"YAML validation error" — Run your YAML through a linter first. The most common culprit is tabs instead of spaces, or misaligned nested keys.

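For the "Model key not found" case, a typo is often one or two characters away from a real key. The sketch below uses the standard library's difflib to suggest candidates; `KNOWN_KEYS` simply repeats the built-in table from this guide, and `suggest_keys` is a hypothetical helper, not something the library is known to provide.

```python
import difflib

# Built-in keys as listed in the registry table earlier in this guide.
KNOWN_KEYS = [
    "openai/gpt4o",
    "openai/gpt4o-mini",
    "openai/gpt4-turbo",
    "anthropic/claude-sonnet",
    "anthropic/claude-haiku",
    "anthropic/claude-opus",
    "google/gemini-pro",
    "google/gemini-flash",
    "mistral/mistral-large",
]


def suggest_keys(key: str, known=KNOWN_KEYS) -> list:
    """Return up to three similar keys, or an empty list if `key` is already valid."""
    if key in known:
        return []
    return difflib.get_close_matches(key, known, n=3, cutoff=0.6)


print(suggest_keys("openai/gpt40"))   # zero typed instead of the letter o
print(suggest_keys("OpenAI/gpt4o"))   # wrong case
```

Because the comparison is case-sensitive, a key with the wrong capitalization is treated as a near miss rather than a match, which mirrors the registry's own behavior as described above.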
What Changed in Version 2.0

If you're migrating from an earlier version, the registry saw significant changes in 2.0:

The version: "2.0" field is now required in YAML config files. Files without it will fail validation.
Model keys now use the provider/model format consistently. Old-style keys without the provider prefix are no longer supported: a 1.x key like "gpt4o" must become "openai/gpt4o".
Model.load_registry() merges instead of replacing. In 1.x, loading a YAML file wiped the built-in registry. In 2.0, your definitions layer on top.
.env file auto-loading was added. You no longer need a third-party library to load them.
AICHAIN_REGISTRY_PATH is new. It enables zero-code registry configuration for containerized deployments.

A typical 1.x instantiation looked like this:

# yait_aichain 1.x — no longer supported
model = Model("gpt4o")

The 2.0 equivalent:

# yait_aichain 2.0
model = Model("openai/gpt4o")

Search your codebase for Model(" and confirm every key contains a slash. That's the entire migration for most projects.
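That search can be automated with a small regular expression. This is a rough sketch: `find_legacy_keys` is a hypothetical helper, and it only catches keys written as string literals directly inside `Model(...)`, not keys built from variables.

```python
import re

# Matches Model("...") or Model('...') and captures the key string.
MODEL_CALL = re.compile(r"""Model\(\s*["']([^"']+)["']""")


def find_legacy_keys(source: str) -> list:
    """Return every literal Model key in `source` that lacks a provider prefix."""
    return [key for key in MODEL_CALL.findall(source) if "/" not in key]


sample = "\n".join([
    'model_a = Model("gpt4o")',
    'model_b = Model("openai/gpt4o")',
    "model_c = Model('claude-sonnet', temperature=0.2)",
    "model_d = Model()",
])
print(find_legacy_keys(sample))
```

Run it over each source file's text (for example via `pathlib.Path.read_text`) and fix whatever it reports.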

Summary

The Model Registry is the configuration backbone of yait_aichain. A consistent Model("provider/name") interface, centralized YAML configuration, environment-driven secrets, and clean override semantics mean that when a provider deprecates a model version, you change one registry entry and move on. Thirty seconds, not a Tuesday afternoon.

Building a Multimodal AI Pipeline: Text → Image → Text Across Three Providers

YAIT — Fri, 26 Jun 2026 11:02:00 +0000

Three providers, three modalities, under 55 lines of Python — and a PNG file on disk at the end. Claude writes a sunset description, an image generation model paints it, and Qwen Vision analyzes the result. Each model does one thing well; the script wires them together.

This article walks through building exactly that pipeline using yait_aichain's Skill and Model primitives. We'll go step by step: generate text with Claude, turn that text into an image, then feed the image to Qwen Vision for analysis.

What We're Building

The pipeline has three stages:

Text → Text (Claude claude-3-5-sonnet-20241022): Generate a one-sentence description of a sunset.
Text → Image (imagine-image-pro): Turn that description into a 1024×1024 image.
Image → Text (Qwen qwen-vl-max): Feed the generated image to a vision model and ask what it sees.

Each stage uses a different provider — Anthropic, xAI, and DashScope. The output of one stage becomes the input of the next.

Prerequisites

You need three API keys, each set as an environment variable:

export ANTHROPIC_API_KEY="your-anthropic-key"
export XAI_API_KEY="your-xai-key"
export DASHSCOPE_API_KEY="your-dashscope-key"

Install the library:

pip install yait_aichain

No extra dependencies for image handling — Python's base64 and pathlib modules cover the file I/O. yait_aichain handles provider routing internally, so you won't need to install Anthropic, xAI, or DashScope SDKs separately.

The Two Primitives You Need to Know

Model represents a connection to a specific model at a specific provider. You pass the model name and an API key — no provider-specific client classes, no adapter patterns to memorize.

Skill is a single unit of work. It takes a Model, an input (structured as messages), and optionally an output configuration. Call .run() and it executes. The message format uses a parts list inside each message, which is how yait_aichain handles multimodal content uniformly — text, images, and mixed content all go through the same structure.

Stage 1: Generating Text with Claude

import os, sys, base64, pathlib
from yait_aichain import Model, Skill

text_skill = Skill(
    model = Model("claude-3-5-sonnet-20241022", api_key=os.environ["ANTHROPIC_API_KEY"]),
    input = {"messages": [{"role": "user", "parts": ["Describe a sunset in one sentence."]}]},
)

description = text_skill.run()
print(f"[text → text · Claude]\n{description}\n")

The input dictionary contains a messages list — identical in shape to what you'd see in a chat API. Each message has a role and a parts list. For plain text, parts is just a list of strings.

Notice the use of os.environ["KEY"] rather than os.getenv("KEY"). This is a deliberate choice I prefer for multi-provider scripts: os.getenv silently returns None when a key is missing, which pushes the error down to the provider's API where the message is far less useful. os.environ raises a KeyError immediately with the variable name. When you're juggling three different API keys for the first time, you want to know which one is missing.
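The difference is easy to see in isolation with nothing but the standard library. The variable name below is hypothetical and is removed first so the example behaves the same on every machine.

```python
import os

VAR = "EXAMPLE_PROVIDER_API_KEY"  # hypothetical name, used only for this demo
os.environ.pop(VAR, None)         # make sure it is unset

# os.getenv: a missing variable quietly becomes None.
silent = os.getenv(VAR)

# os.environ[...]: a missing variable raises KeyError carrying the name.
try:
    os.environ[VAR]
    message = "no error"
except KeyError as exc:
    message = f"missing environment variable: {exc.args[0]}"

print(silent)
print(message)
```

With `os.getenv`, the None would travel into the Model constructor and fail somewhere further down the call stack.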

text_skill.run() returns the model's response as a string. On a typical call, you'll get something like:

"The sun melted into the horizon, painting the sky in layered bands of amber, rose, and deep violet as the ocean mirrored its fading warmth."

That string becomes the input for Stage 2.

Why `parts` Instead of `content`?

The parts list is the design decision that makes multimodal work without special-casing. A text-only message uses ["some string"]. A message with an image uses a dictionary inside parts. A message with both uses both. Same field, same structure, every modality.

Stage 2: Turning Text Into an Image

We take Claude's text output and pass it to the image generation model as a prompt:

image_skill = Skill(
    model  = Model("imagine-image-pro", api_key=os.environ["XAI_API_KEY"]),
    input  = {"messages": [{"role": "user", "parts": [description]}]},
    output = {"modalities": ["image"], "format": {"type": "image", "size": "1024x1024"}},
)

image    = image_skill.run()
img_path = pathlib.Path("output_sunset.png")
img_path.write_bytes(base64.b64decode(image["base64"]))
print(f"[text → image]\nsaved → {img_path}\n")

Two things to notice here.

The output configuration. This is the first time we specify how the response should come back. "modalities": ["image"] tells the Skill we expect an image. The "format" dictionary specifies the type and dimensions. Without this, the model might return text describing how it would generate an image — which is not helpful.

The return value. When a Skill produces an image, .run() returns a dictionary with at least two keys: "base64" (the image data) and "mime_type" (e.g., "image/png"). We decode the base64 data and write it to disk.

pathlib.Path("output_sunset.png") writes to the current working directory rather than using __file__. That's deliberate — __file__ is undefined in interactive environments like Jupyter notebooks or a REPL and raises a NameError. A relative path works consistently across all contexts.
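The decode-and-write step can be tried without calling any provider. The sketch below assumes the result dictionary has the "base64" and "mime_type" keys described above; `save_image` and the extension table are hypothetical additions, and the payload is just the 8-byte PNG signature rather than a real image.

```python
import base64
import pathlib
import tempfile

# Small explicit mapping; extend it for whatever MIME types your provider returns.
EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg", "image/webp": ".webp"}


def save_image(image: dict, directory: pathlib.Path, stem: str) -> pathlib.Path:
    """Decode image["base64"] and write it with an extension matching its MIME type."""
    suffix = EXTENSIONS.get(image.get("mime_type"), ".bin")
    path = directory / f"{stem}{suffix}"
    path.write_bytes(base64.b64decode(image["base64"]))
    return path


png_signature = b"\x89PNG\r\n\x1a\n"
fake_image = {
    "base64": base64.b64encode(png_signature).decode("ascii"),
    "mime_type": "image/png",
}

saved = save_image(fake_image, pathlib.Path(tempfile.mkdtemp()), "output_sunset")
print(saved.name, saved.stat().st_size)
```

Choosing the suffix from the MIME type avoids writing JPEG bytes into a file named .png if the provider ever returns a different format.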

A Note on Image Sizes

"1024x1024" is a common default for image generation models. If you pass a size the model doesn't support, you'll get an error at runtime rather than a silently resized image. Check your provider's documentation for supported dimensions before you assume.

Stage 3: Analyzing the Image with Qwen Vision

The image from Stage 2 goes into Qwen's vision-language model:

vision_skill = Skill(
    model = Model("qwen-vl-max", api_key=os.environ["DASHSCOPE_API_KEY"]),
    input = {
        "messages": [{
            "role": "user",
            "parts": [
                {"type": "image", "source": {"kind": "base64",
                                              "data": image["base64"],
                                              "mime": image["mime_type"]}},
                {"type": "text",  "text": "What do you see in this image?"},
            ],
        }]
    },
)

analysis = vision_skill.run()
print(f"[image → text · Qwen]\n{analysis}")

The parts list now contains two items:

An image part — a dictionary with "type": "image" and a "source" object. The source specifies "kind": "base64", the actual base64 data, and the MIME type — both pulled directly from Stage 2's output dictionary.
A text part — a dictionary with "type": "text" and the question.

Same parts structure as Stage 1. The only difference is that instead of bare strings, we use typed dictionaries to describe each piece of content. The vision model receives the image and the question in a single message. Qwen's response comes back as a plain string, something like:

"The image shows a vivid sunset over an ocean. The sky displays gradients of orange, pink, and purple. The sun is partially below the horizon, with its reflection stretching across calm water."

The Complete Script

"""
Multimodal pipeline: Text → Image → Text, three different providers.

  1. text  → text   Claude  (claude-3-5-sonnet-20241022)
  2. text  → image           (imagine-image-pro)
  3. image → text   Qwen    (qwen-vl-max)

Required env vars:
    ANTHROPIC_API_KEY
    XAI_API_KEY
    DASHSCOPE_API_KEY
"""

import os, sys, base64, pathlib
from yait_aichain import Model, Skill

# ── 1. Text → Text (Claude) ──────────────────────────────────────────────────
text_skill = Skill(
    model = Model("claude-3-5-sonnet-20241022", api_key=os.environ["ANTHROPIC_API_KEY"]),
    input = {"messages": [{"role": "user", "parts": ["Describe a sunset in one sentence."]}]},
)

try:
    description = text_skill.run()
except Exception as e:
    print(f"Stage 1 failed: {e}"); sys.exit(1)
print(f"[text → text · Claude]\n{description}\n")

# ── 2. Text → Image ──────────────────────────────────────────────────────────
image_skill = Skill(
    model  = Model("imagine-image-pro", api_key=os.environ["XAI_API_KEY"]),
    input  = {"messages": [{"role": "user", "parts": [description]}]},
    output = {"modalities": ["image"], "format": {"type": "image", "size": "1024x1024"}},
)

try:
    image = image_skill.run()
except Exception as e:
    print(f"Stage 2 failed: {e}"); sys.exit(1)
img_path = pathlib.Path("output_sunset.png")
img_path.write_bytes(base64.b64decode(image["base64"]))
print(f"[text → image]\nsaved → {img_path}\n")

# ── 3. Image → Text (Qwen Vision) ────────────────────────────────────────────
vision_skill = Skill(
    model = Model("qwen-vl-max", api_key=os.environ["DASHSCOPE_API_KEY"]),
    input = {
        "messages": [{
            "role": "user",
            "parts": [
                {"type": "image", "source": {"kind": "base64",
                                              "data": image["base64"],
                                              "mime": image["mime_type"]}},
                {"type": "text",  "text": "What do you see in this image?"},
            ],
        }]
    },
)

try:
    analysis = vision_skill.run()
except Exception as e:
    print(f"Stage 3 failed: {e}"); sys.exit(1)
print(f"[image → text · Qwen]\n{analysis}")

Three providers. Two modality transitions. Each stage wrapped in its own try/except so a failure at Stage 2 tells you it was Stage 2 — not a cryptic traceback from somewhere inside a provider SDK you didn't even know you were calling.

How the Stages Connect

There's no special "chaining" API. The variable description (a string) goes directly into image_skill's input. The variable image (a dictionary) gets its fields plucked out for vision_skill's input. Regular Python variables carry data between stages.

When you need to transform data between stages — truncating a description to 200 characters before image generation, for instance — you write normal Python between the calls. No callbacks, no middleware, no pipeline DSL. This is actually one of the things I like about this approach: the "pipeline" is just a script.
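The truncation mentioned above might look like this. `truncate_prompt` is a hypothetical helper; cutting at a word boundary is a design choice of this sketch, not something the pipeline requires.

```python
def truncate_prompt(text: str, limit: int = 200) -> str:
    """Cut `text` to at most `limit` characters, preferring a word boundary."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    if text[limit].isspace():
        # The cut already falls exactly between two words.
        return cut.rstrip()
    head, sep, _ = cut.rpartition(" ")
    # Drop the partial last word; fall back to a hard cut if there is no space.
    return head.rstrip() if sep else cut


print(truncate_prompt("aaa bbb ccc", 8))
print(truncate_prompt("abcdefghij", 4))
```

In the script it would sit between Stage 1 and Stage 2, applied to `description` before the string is placed into the image Skill's parts list.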

The parts list is what keeps the interface uniform across modalities:

Text-only: "parts": ["your string here"]
Image-only: "parts": [{"type": "image", "source": {...}}]
Mixed: "parts": [image_dict, text_dict]

One structure, every model, every modality.
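Small builder functions keep these dictionaries consistent when a script constructs many messages. The helpers below are hypothetical conveniences that reproduce the shapes shown in this article; they are not part of yait_aichain as far as this article shows.

```python
def text_part(text: str) -> dict:
    """Typed text part, as used in the Stage 3 message."""
    return {"type": "text", "text": text}


def image_part(b64_data: str, mime: str) -> dict:
    """Typed image part carrying inline base64 data."""
    return {"type": "image", "source": {"kind": "base64", "data": b64_data, "mime": mime}}


def user_message(*parts) -> dict:
    """Wrap any mix of bare strings and typed parts into one user message."""
    return {"role": "user", "parts": list(parts)}


text_only = {"messages": [user_message("Describe a sunset in one sentence.")]}
mixed = {
    "messages": [
        user_message(
            image_part("aGVsbG8=", "image/png"),  # placeholder base64, not a real image
            text_part("What do you see in this image?"),
        )
    ]
}
print(text_only)
print(mixed["messages"][0]["parts"][1])
```

Either dictionary has the same outer shape as the `input` argument used in the three stages.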

Swapping Providers

Notice what's absent from the Skill configurations: no Anthropic client initialization, no provider-specific headers, no DashScope SDK imports. The Model constructor takes a model name and an API key; provider routing happens internally. Swapping the image generation model means changing one string and one environment variable — nothing else in the script changes.

Extending the Pipeline

Once you have this pattern, extensions are straightforward.

Add a fourth stage. Take Qwen's analysis and feed it to a text model for summarization or translation. Another Skill, another Model, same shape.
Branch instead of chain. Generate 3 different images from the same description using 3 different models. Compare the results by feeding all of them to the vision model in separate Skill calls.
Save intermediate results. The script already saves the image to disk. Add JSON logging for the text outputs and you have a full audit trail of the pipeline's execution.

The models do the hard work. The code connects them — and stays out of the way.

AIchain Agent: Plan, Act, Reflect

YAIT — Sat, 20 Jun 2026 10:02:00 +0000

A Chain knows every step before it runs. You define step one, step two, step three — and it executes them in order. That works when the problem is well-understood. But what happens when you don't know the steps in advance? When the output of one step determines whether you need two more steps or five? When a search returns nothing useful and the whole approach needs to change mid-run?

That's where Agent comes in. It plans, acts, observes what happened, and decides what to do next. The difference between a Chain and an Agent is the difference between following a script and thinking.

The Problem Agents Solve

Consider a task like: "Find the official documentation for Qdrant, identify its main sections, and summarize each one." You don't know ahead of time how many searches you'll need, whether the first URL will be correct, or whether the page content will be structured enough to extract sections from. The number of steps depends on what actually happens at runtime.

If you try to hard-code this as a Chain, you'll either over-engineer it with branching logic for every edge case, or you'll build something brittle that fails the moment reality doesn't match your assumptions. And it will. Reality always does.

An Agent handles this naturally. It makes a plan, executes the first step, looks at the result, and adjusts. Maybe it needs one search. Maybe it needs three. The Agent figures that out as it goes.

A Minimal Agent

Here's the simplest possible Agent — one model, one tool, one task:

import os
from yait_aichain.models import Model
from yait_aichain.agent import Agent
from yait_aichain.tools import searchPerplexity

agent = Agent(
    orchestrator=Model("claude-sonnet-4-6", api_key=os.getenv("ANTHROPIC_API_KEY")),
    tools=[searchPerplexity(api_key=os.getenv("PERPLEXITY_API_KEY"))],
    max_steps=5,
    mode="waterfall",
)

result = agent.run("What are the main differences between Qdrant and Pinecone?")
print(result.output)
print(f"steps={result.steps_taken}  tokens={result.tokens_used:,}")

The orchestrator model handles planning and reflection — it decides which tool to call, evaluates the result, and determines the next action. The tools list defines what the agent can do. And max_steps caps how far it can go.

The result object gives you three things worth caring about: .output (the final answer), .steps_taken (how many steps actually ran), and .tokens_used (total tokens consumed). Enough to understand what happened and what it cost.

Two Modes: Fixed-Plan and Adaptive

Agents in yait_aichain support two execution modes, and the choice between them shapes how the agent behaves.

Fixed-Plan (waterfall)

In waterfall mode, the agent builds a complete plan before executing anything. All steps laid out upfront, then run in order. The plan structure is fixed, but reflection still happens between steps — the agent can stop early if the task is already done, or retry a failed step. What it can't do is add new steps or rearrange the remaining ones.

This gives you predictability. You can look at the plan and know roughly what the agent will do. It's the right choice when the task has a natural structure — "search, then summarize, then format" — even if you're not sure whether the search will need a retry.

Adaptive (agile)

Agile mode is different. After every step, the agent looks at what just happened and can rewrite all remaining steps. Maybe the first search revealed that the question has two parts, so the agent adds a second search it didn't originally plan. Maybe a step returned exactly what was needed, so the agent skips three planned steps and jumps straight to the final answer.

Here's an adaptive agent with multiple tools:

import os
from yait_aichain.models import Model
from yait_aichain.agent import Agent
from yait_aichain.tools import searchPerplexity, fetchPage, convertToMD

agent = Agent(
    orchestrator=Model("claude-sonnet-4-6", api_key=os.getenv("ANTHROPIC_API_KEY")),
    tools=[
        searchPerplexity(api_key=os.getenv("PERPLEXITY_API_KEY")),
        fetchPage(),
        convertToMD(),
    ],
    max_steps=8,
    mode="agile",
)

result = agent.run(
    "Find the official Qdrant documentation homepage URL, "
    "then fetch that page and tell me what the main sections are."
)

print(result.output)
print(f"steps={result.steps_taken}  tokens={result.tokens_used:,}")

This agent has three tools. searchPerplexity finds the URL. fetchPage retrieves the raw page content. convertToMD strips the HTML down to Markdown so the model can read the structure cleanly. The agent decides on its own which tool to call at each step, and it can change its plan based on what each tool returns.

That flexibility isn't free. The execution path is less predictable, and the agent may burn more tokens exploring approaches that don't pan out.

Use waterfall when the task has a known shape. Use agile when it doesn't.
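The contrast between the two modes can be reduced to a toy loop. This is a deliberately simplified model, not the Agent's real planner: `run_plan`, `act`, and `replan` are hypothetical names, there is no LLM involved, and the early-stop and retry behavior of waterfall mode is left out.

```python
def run_plan(plan, act, max_steps, replan=None):
    """Run steps in order, never executing more than max_steps of them.

    With replan=None the remaining steps stay fixed (waterfall-like).
    With a replan callback the remaining steps may be rewritten after
    every observation (agile-like).
    """
    remaining = list(plan)
    executed = []
    while remaining and len(executed) < max_steps:
        step = remaining.pop(0)
        observation = act(step)
        executed.append(step)
        if replan is not None:
            remaining = replan(remaining, observation)
    return executed


def act(step):
    # Toy tool: the first search reveals that the question has two parts.
    return "two parts" if step == "search" else "ok"


def replan(remaining, observation):
    # Insert a follow-up search when the observation calls for one.
    if observation == "two parts":
        return ["search again"] + remaining
    return remaining


fixed = run_plan(["search", "summarize"], act, max_steps=5)
adaptive = run_plan(["search", "summarize"], act, max_steps=5, replan=replan)
print(fixed)
print(adaptive)
```

The adaptive run takes an extra step the original plan never contained, which is exactly where the added token cost comes from.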

max_steps Is Not Optional

An agent without max_steps is an infinite loop waiting to happen.

Without a hard cap, the agent keeps planning and executing until it exhausts the model's context window or burns through your token budget. In development, that's an awkward wait and a surprising bill. In production, it's an outage.

# Missing max_steps. The agent runs until something breaks.
agent = Agent(
    orchestrator=Model("claude-sonnet-4-6", api_key=os.getenv("ANTHROPIC_API_KEY")),
    tools=[searchPerplexity(api_key=os.getenv("PERPLEXITY_API_KEY"))],
    mode="agile",
)

Always set max_steps. For a single-tool task like "search and summarize," 5 steps covers one search, a possible retry, and the synthesis pass with room to spare. For multi-tool workflows — search, fetch, convert, analyze — 8 to 10 reflects the realistic step count without giving the agent room to spiral.

You should also check whether the agent hit its ceiling before finishing:

max_steps = 8  # the same value you passed to Agent(max_steps=...)

if result.steps_taken >= max_steps:
    # The agent ran out of steps before completing the task.
    # Log, retry with a higher limit, or surface an error to the user.
    print(f"Warning: agent hit max_steps={max_steps}. Output may be incomplete.")
else:
    print(result.output)

Don't skip this check. An agent that hit max_steps may return a plausible-looking but incomplete answer, and you won't know unless you look.
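One way to act on that check is to retry with a higher cap. The sketch below is illustrative only: RunResult, fake_agent, and run_with_escalation are hypothetical stand-ins, not part of yait_aichain. The only thing assumed about the real result object is the steps_taken field used above.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Stand-in for the agent result; only the two fields used above."""
    output: str
    steps_taken: int


def run_with_escalation(run_agent, limits=(5, 8, 12)):
    """Call run_agent(max_steps) with growing limits until a run ends under its cap.

    Returns (result, limit_used, completed).
    """
    result = None
    for limit in limits:
        result = run_agent(limit)
        if result.steps_taken < limit:
            return result, limit, True   # finished with steps to spare
    return result, limits[-1], False     # every attempt hit its ceiling


def fake_agent(max_steps, steps_needed=7):
    """Hypothetical agent: needs 7 steps and is cut off if the cap is lower."""
    steps = min(steps_needed, max_steps)
    output = "full answer" if steps_needed < max_steps else "partial answer"
    return RunResult(output=output, steps_taken=steps)


result, limit, completed = run_with_escalation(fake_agent)
print(limit, completed, result.output)
```

In real code, run_agent would build an Agent with the given max_steps and call agent.run(...). Keep the list of limits short, because every retry repeats the earlier token spend.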

When to Use Agent vs. Chain

If you already know the steps, use Chain. It's deterministic, cheaper, and easier to debug. Every time.

Use Agent when the number or nature of steps can't be determined before execution begins — when the task requires reacting to intermediate results, when the path from question to answer isn't a straight line.

My practical heuristic: if you can draw the workflow on a whiteboard before writing code, that's a Chain. If you'd need to draw multiple possible workflows with "it depends" arrows between them, that's an Agent.

The Full API Surface

For reference, here's what you can configure:

Agent(
    orchestrator: Model,        # required — plans and reflects
    executors:    list[Model],  # optional — cheaper models for tool-call steps
    tools:        list,         # optional — tools available to the agent
    mode:         str,          # "waterfall" | "agile"  (default: "waterfall")
    max_steps:    int,          # hard cap on execution depth
)

The executors parameter lets you assign a cheaper or faster model to run individual tool-call steps while keeping a more capable model as the orchestrator:

import os
from yait_aichain.models import Model
from yait_aichain.agent import Agent
from yait_aichain.tools import searchPerplexity, fetchPage, convertToMD

agent = Agent(
    orchestrator=Model("claude-sonnet-4-6", api_key=os.getenv("ANTHROPIC_API_KEY")),
    executors=[Model("gemini-2.5-flash", api_key=os.getenv("GOOGLE_API_KEY"))],
    tools=[
        searchPerplexity(api_key=os.getenv("PERPLEXITY_API_KEY")),
        fetchPage(),
        convertToMD(),
    ],
    max_steps=8,
    mode="agile",
)

The orchestrator handles the reasoning-heavy work — deciding what to do next, evaluating results, writing the final answer. The executor handles the mechanical steps in between: calling a tool and passing the output back. A capable model where it matters, a faster cheaper one where it doesn't. That split alone can meaningfully reduce costs on longer workflows.

AIchain Tools: Search, Conversion, Embeddings

YAIT — Wed, 17 Jun 2026 13:11:00 +0000

An LLM knows everything up to its training cutoff and nothing after. Ask it about yesterday's stock price, hand it a PDF, or expect it to tell you which of your 10,000 support tickets are duplicates — and you'll hit a wall. Tools are how you close that gap.

This is article #6 in the series on building AI pipelines with yait_aichain. Three categories today: search for live data, Markdown conversion for universal document ingestion, and embeddings for tasks well beyond basic RAG.

The Design Philosophy: Thin Core, Pluggable Everything

Before any code, one architectural decision is worth understanding. yait_aichain deliberately keeps its core dependency surface small. Only one tool ships inside the package as a hard dependency: Markdown conversion. Everything else — search providers, embeddings, vector stores, rerankers — is either an optional extra you install separately or a pluggable MCP (Model Context Protocol) server running as a separate process.

Why? Specialized tools change fast. Search APIs come and go. OpenAI has shipped three generations of embedding models since 2022, and the pace hasn't slowed. Tying all of that into the core library would create a fragile dependency tree that breaks every time a provider ships a breaking change. So the universal stuff lives inside. The specialized things plug in from outside.

Search: Giving the Model Live Data

A model trained on data through April 2024 can't tell you what happened yesterday. Search tools fix that.

yait_aichain includes a Perplexity search integration as an optional extra. Install it first, then import:

pip install "yait-aichain[perplexity]"  # quotes keep shells like zsh from expanding the brackets

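import os
from yait_aichain.tools import searchPerplexity

# Read the API key from the environment instead of hardcoding it.
search = searchPerplexity(api_key=os.getenv("PERPLEXITY_API_KEY"))
result = search.run("latest LLM benchmarks 2025")
print(result)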

One function call, one string back — grounded, citation-backed, pulled from live web results.

But Perplexity isn't the only option. You might prefer Brave Search for its independence from big-tech indices, or SerpAPI for its structured Google results. These providers connect through MCP servers — external processes that yait_aichain discovers and calls at runtime:

"""
Start the MCP server first:
    python your_newsapi_server.py

MCP server: newsapi on http://127.0.0.1:8009/mcp

Tools available:
    search_news        — search articles (requires: q, sources, or domains)
    get_top_headlines  — breaking news
    get_sources        — list all available news sources
"""

from yait_aichain.tools import Tool
# The Skill/Chain connects to MCP tools via the protocol at runtime.
# No MCP-specific dependencies live inside yait_aichain itself.

The MCP pattern is intentional. A NewsAPI server has its own API key, its own rate limits, its own Python dependencies. Keeping that in a separate process means your core pipeline stays clean. Swap Brave for SerpAPI by pointing at a different MCP server — no code changes in your Chain.

convertToMD: The 80% Solution for Data Prep

Here's a scenario every developer hits: you need an LLM to process a PDF report, a DOCX contract, a PowerPoint deck, and a URL. Each format needs a different parser. Each parser has its own quirks, its own edge cases, its own dependencies.

Or you use one line:

from yait_aichain.tools import convertToMD

tool = convertToMD()

# Convert a URL
web_content = tool.run("https://example.com")
print(web_content[:500])

# Convert a local PDF
pdf_content = tool.run("quarterly_report.pdf")

# Convert a DOCX file
doc_content = tool.run("contract_v3.docx")

Under the hood, convertToMD wraps Microsoft's open-source markitdown library, bundled as a hard dependency of yait_aichain — no separate install required. It handles PDF, DOCX, PPTX, XLSX, HTML, and URLs out of the box. The output is always Markdown, which happens to be the format LLMs consume best.

No configuration objects. No format detection logic. No parsing pipelines. One call turns nearly any document into clean Markdown ready for a Skill or Chain. That's why it's the only tool that lives inside yait_aichain as a hard dependency — document-to-Markdown conversion is universal enough to earn its place in the core.

Embeddings: Beyond Basic RAG

When developers hear "embeddings," they usually think retrieval-augmented generation — chunk documents, embed them, store them in a vector database, retrieve at query time. That's one use case. Not the only one.

from yait_aichain.tools import Embedding

emb = Embedding(model="text-embedding-3-small")
vectors = emb.embed([
    "How do I reset my password?",
    "I can't log into my account",
    "What are your pricing plans?",
    "Password reset not working",
])

Each call returns a list of float vectors. What you do with those vectors determines the use case.

Semantic search. Skip keyword matching entirely. A query for "authentication issues" will surface tickets about password resets, SSO failures, and 2FA problems — because the meaning matches, not the words.

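Deduplication. Compare cosine similarity between support tickets. Tickets 0, 1, and 3 above should score much closer to each other than to ticket 2: they describe the same login problem in different words. Pick a similarity threshold (0.92 is a reasonable starting point, but the right value depends on the embedding model and your data, so tune it on a labeled sample) and you can automatically flag likely duplicates across thousands of entries without a human reading each one.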

Clustering. Feed the vectors into k-means or HDBSCAN. Group product reviews by theme without writing a single regex. Find natural topic boundaries in a corpus of research papers.
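The deduplication idea can be sketched in plain Python. The three-dimensional vectors below are made-up toy values standing in for real embeddings (which have hundreds or thousands of dimensions), and find_duplicates is a hypothetical helper, not a yait_aichain function.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def find_duplicates(vectors, threshold=0.92):
    """Return (i, j, similarity) for every pair at or above the threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(vectors), 2):
        sim = cosine_similarity(a, b)
        if sim >= threshold:
            pairs.append((i, j, round(sim, 3)))
    return pairs


# Toy vectors: indices 0, 1, 3 point the same way; index 2 points elsewhere.
vectors = [
    [0.90, 0.10, 0.00],  # "How do I reset my password?"
    [0.80, 0.20, 0.10],  # "I can't log into my account"
    [0.00, 0.10, 0.90],  # "What are your pricing plans?"
    [0.85, 0.15, 0.05],  # "Password reset not working"
]

duplicates = find_duplicates(vectors)
for i, j, sim in duplicates:
    print(f"tickets {i} and {j}: similarity {sim}")
```

The all-pairs comparison is quadratic in the number of tickets. That is fine for a few thousand entries; beyond that, a vector index that returns nearest neighbours is the usual route.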

The VectorDB class gives you an in-process vector store for quick prototyping:

from yait_aichain.tools import VectorDB, vectorQuery

db = VectorDB()
db.add(
    documents=["Password reset guide", "Pricing FAQ", "SSO troubleshooting"],
    ids=["d1", "d2", "d3"],
)

results = vectorQuery(db, query="I can't log in", top_k=2)
for r in results:
    print(r)

Note: vectorQuery is a standalone function rather than a method on VectorDB because it operates across multiple database instances in pipeline contexts — the separation is intentional.

When result quality matters and your document set is large, add a reranking step:

from yait_aichain.tools import Reranker

reranker = Reranker()
ranked = reranker.rerank(
    query="latest AI benchmarks",
    documents=["doc A about benchmarks", "doc B about pricing", "doc C about model eval"],
)
print(ranked)

The reranker takes coarse results from vector search and reorders them with a cross-encoder — more expensive per comparison, but on published benchmarks like BEIR, cross-encoder reranking consistently improves precision at top-k positions over vector search alone. Think of results as your candidate pool and ranked as the final ordered list you pass to the model.

Custom Tools: Build Your Own

Every tool in yait_aichain follows the same pattern: subclass Tool, define name and description, implement run(self, input: str). The single-string signature is what lets the framework invoke your tool through the standard LLM tool-calling protocol — the model sends a string (or a JSON blob you parse yourself), your tool returns a string. Here's a live currency exchange tool:

import json
import urllib3
from yait_aichain.tools import Tool
from yait_aichain import Model, Skill

class FXRateTool(Tool):
    """Fetch live FX rate from frankfurter.app."""
    name = "fx_rate"
    description = (
        "Return the current exchange rate between two currencies. "
        "Input must be a JSON string with keys 'base' and 'target', "
        "e.g. '{\"base\": \"USD\", \"target\": \"EUR\"}'."
    )

    def run(self, input: str) -> str:
        # Production code should validate input and handle network errors.
        params = json.loads(input)
        base = params.get("base", "USD")
        target = params.get("target", "EUR")
        http = urllib3.PoolManager()
        resp = http.request(
            "GET",
            f"https://api.frankfurter.app/latest?from={base}&to={target}",
        )
        data = json.loads(resp.data.decode())
        rate = data["rates"][target]
        return f"1 {base} = {rate} {target}"

# Use it standalone
tool = FXRateTool()
print(tool.run('{"base": "USD", "target": "EUR"}'))  # e.g. "1 USD = 0.8842 EUR"; the live rate will differ

# Or plug it into a Skill
model = Model("claude-3-5-sonnet-20241022")
skill = Skill(
    model=model,
    input={
        "messages": [
            {
                "role": "user",
                # yait_aichain uses "content" as the universal message field
                # across all supported providers.
                "content": "What is the USD to EUR rate? Use the fx_rate tool.",
            }
        ]
    },
    tools=[FXRateTool()],
)

The tools=[] parameter on Skill accepts any list of Tool instances. The model sees the tool names and descriptions, decides when to call them, and incorporates the results into its response. You write the logic once. The framework handles the tool-calling protocol across all eight supported providers — OpenAI, Anthropic, Google Gemini, xAI Grok, Mistral, Groq, Ollama, and any OpenAI-compatible endpoint.
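The comment in FXRateTool.run notes that production code should validate its input. Here is a minimal sketch of what that could look like, using only the standard library. parse_fx_input is a hypothetical helper name, and the three-letter check is an assumption about what counts as a valid currency code, not a rule taken from the library or from frankfurter.app.

```python
import json


def parse_fx_input(raw: str) -> tuple[str, str]:
    """Parse the tool's JSON input and return (base, target) in upper case.

    Raises ValueError with a readable message if the input is malformed,
    so the model gets something useful back instead of a stack trace.
    """
    try:
        params = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"input is not valid JSON: {exc}") from exc
    if not isinstance(params, dict):
        raise ValueError("input must be a JSON object")

    codes = []
    for key in ("base", "target"):
        value = params.get(key)
        looks_valid = (
            isinstance(value, str)
            and len(value) == 3
            and value.isascii()
            and value.isalpha()
        )
        if not looks_valid:
            raise ValueError(f"'{key}' must be a 3-letter currency code")
        codes.append(value.upper())
    return codes[0], codes[1]


print(parse_fx_input('{"base": "usd", "target": "EUR"}'))
```

Inside run(), you would call this helper first and return the error text as the tool's string result when it raises, so the model can correct its own call.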

What Lives Where

Inside yait_aichain	Outside (pluggable)
`convertToMD` — Markdown conversion	Search (Perplexity via optional extra; Brave, SerpAPI via MCP)
`Tool` base class	Custom tools you write
Core: Model, Skill, Chain, Pool, Agent	MCP servers with their own dependencies
—	Embeddings, VectorDB, Reranker (optional extras)

The boundary is deliberate. Markdown conversion is universal — every pipeline needs to ingest documents. Search providers, embedding models, and vector stores are opinionated choices that vary by project. Keep them outside and your base pip install yait-aichain stays light. Add only what you actually use.

Tools extend what a language model can do: read today's news, parse your company's PDFs, find semantic patterns across thousands of documents. The API surface stays small. The capabilities grow with your project.

AIchain Pool: Parallel Calls Instead of Sequential

YAIT — Sun, 14 Jun 2026 15:39:00 +0000

You have 50 documents and you're running them through an LLM in a loop. The first one finishes at the 2-second mark. The fiftieth finishes at the 100-second mark — not because it's harder, but because it waited in line behind the other 49. Pool runs all 50 at the same time.

The Problem With Loops

Every developer who works with LLMs writes this code eventually:

import os
from yait_aichain.models import Model
from yait_aichain.skills import Skill

skill = Skill(
    model=Model("claude-sonnet-4-6", api_key=os.getenv("ANTHROPIC_API_KEY")),
    input={"messages": [{"role": "user", "parts": [
        "Summarise in two sentences:\n\n{text}"
    ]}]},
)

documents = [{"text": f"Document {i} content..."} for i in range(50)]

results = []
for doc in documents:
    result = skill.run(doc)
    results.append(result)

It works. It's readable. And it's painfully slow.

Each LLM call takes roughly 2 seconds. Multiply that by 50 documents and you're staring at your terminal for almost two minutes. The calls are completely independent — document 37 doesn't need the result of document 12. Yet document 37 sits idle, waiting its turn. That's a scheduling problem, not a computation problem.

I ran into this directly while building a task that pulled N files or links and produced a consolidated report. The sequential version was logically fine but just hemorrhaged time. I needed to fire everything at once without rewriting the Skill logic — no new prompt templates, no restructured code, just a different execution model. That's what Pool is.

Pool: Parallel Map for LLM Calls

Pool takes one Skill (or Chain) and a list of inputs, then launches all of them concurrently. Think of it as Array.map() where every element runs in parallel against an LLM.

import os
from yait_aichain.models import Model
from yait_aichain.skills import Skill
from yait_aichain.pool import Pool, DONE, FAILED

skill = Skill(
    model=Model("claude-sonnet-4-6", api_key=os.getenv("ANTHROPIC_API_KEY")),
    input={"messages": [{"role": "user", "parts": [
        "Summarise in two sentences:\n\n{text}"
    ]}]},
)

items = [
    {"text": "Artificial intelligence is transforming..."},
    {"text": "Quantum computing promises..."},
    {"text": "Climate change is accelerating..."},
    {"text": "Blockchain technology enables..."},
    {"text": "Gene editing with CRISPR..."},
]

pool = Pool(skill, items=items, max_flows=5)
results = pool.run()

for result in results:
    print(result)

s = pool.status
print(f"done={s[DONE]}  failed={s[FAILED]}")

Three things worth noticing:

The Skill didn't change. Same model, same prompt template, same {text} placeholder. Pool wraps existing logic — it doesn't demand new logic.
pool.run() returns a list in the same order as the input. Item 0 in, result 0 out. No need to track which response belongs to which document.
Status tracking is built in. pool.status gives you a dict with DONE and FAILED counts, so you know exactly what succeeded and what didn't.

The math is straightforward. Five items averaging ~2 seconds each, running concurrently: wall-clock time drops from ~10 seconds to ~2 seconds. The overhead is network jitter and provider-side queuing, not sequential waiting. Scale to 50 items and the gap gets embarrassing. Exact numbers depend on your provider and network conditions, but the shape of the improvement is consistent: you wait for roughly one round-trip, not N. Token usage, and therefore cost, stays the same, because Pool makes the same calls and only changes when they run.

Controlling Concurrency With `max_flows`

Running everything at once sounds great until your API provider starts returning 429 errors.

LLM providers enforce rate limits, and those limits vary by model, tier, and account type. Blasting 200 concurrent requests is a reliable way to get throttled regardless of your tier. Check your provider's current documentation before tuning this number — don't just guess.

max_flows is your throttle. It sets the maximum number of calls in flight at any given moment:

pool = Pool(skill, items=documents, max_flows=10)

With max_flows=10, Pool processes 50 documents in waves of 10 concurrent calls. Still dramatically faster than sequential, but it keeps you within reasonable rate-limit bounds. Dial it up or down depending on what your provider will tolerate.

Failures Don't Sink the Ship

Batch jobs have a classic failure mode: item 23 out of 50 throws an error and the whole run aborts. You fix the issue, restart from scratch, and wait through items 1–22 again. Deeply annoying.

Pool handles this differently. Each item gets its own outcome — DONE or FAILED. One failed call doesn't stop the rest. After pool.run() completes, check pool.status for the breakdown:

s = pool.status
print(f"done={s[DONE]}  failed={s[FAILED]}")
# done=48  failed=2

You get 48 good results and a count of the 2 that failed. Because results come back in input order, each failure can be traced to its input, so you reprocess those two and not the entire batch.
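To make the pattern concrete, here is a sketch of bounded, failure-tolerant parallel execution built on the standard library's ThreadPoolExecutor. It illustrates the general technique, not how Pool is implemented. run_all and summarise are hypothetical, and DONE and FAILED here are plain local strings that only borrow the names used above.

```python
from concurrent.futures import ThreadPoolExecutor

DONE, FAILED = "done", "failed"  # local markers for this sketch only


def run_all(fn, items, max_flows):
    """Run fn over items with at most max_flows calls in flight.

    Returns one (state, value) pair per item, in input order.
    A failing item records its exception instead of aborting the batch.
    """
    def safe_call(item):
        try:
            return DONE, fn(item)
        except Exception as exc:
            return FAILED, exc

    with ThreadPoolExecutor(max_workers=max_flows) as executor:
        # executor.map yields results in the order of the inputs.
        return list(executor.map(safe_call, items))


def summarise(item):
    """Stand-in for an LLM call; fails on empty text."""
    if not item["text"]:
        raise ValueError("empty document")
    return item["text"][:20]


items = [{"text": "alpha"}, {"text": ""}, {"text": "gamma"}]
outcomes = run_all(summarise, items, max_flows=2)

# Pair each outcome with its input to find what needs a retry.
failed_items = [item for item, (state, _) in zip(items, outcomes) if state == FAILED]
print(f"done={len(items) - len(failed_items)}  failed={len(failed_items)}")
print("retry:", failed_items)
```

Threads suit this workload because the calls spend their time waiting on the network, not on the CPU.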

Pool + Chain: Multi-Step Pipelines in Parallel

Pool isn't limited to a single Skill. It accepts a Chain as its runner, which means multi-step workflows parallelize just as easily.

Here's a real example: fetch a web page, convert it to Markdown, then summarize it — all in parallel across multiple URLs.

import os
from yait_aichain.models import Model
from yait_aichain.skills import Skill
from yait_aichain.chain import Chain
from yait_aichain.pool import Pool, DONE, FAILED
from yait_aichain.tools import convertToMD

fetch = convertToMD()

summarise = Skill(
    model=Model("claude-sonnet-4-6", api_key=os.getenv("ANTHROPIC_API_KEY")),
    input={"messages": [{"role": "user", "parts": [
        "Summarise in one sentence:\n\n{result}"
    ]}]},
)

# Each Chain step is a tuple of (runner, output_key, input_mapping).
# Here: fetch writes its output to "result", mapped from the item's "source" field.
# The summarise Skill then reads {result} from that output key.
per_item = Chain(steps=[
    (fetch, "result", {"input": "source"}),  # fetch page; store as "result"
    summarise,                                # summarise reads {result}
])

items = [
    {"source": "https://fr.lipsum.com"},
    {"source": "https://de.lipsum.com"},
    {"source": "https://es.lipsum.com"},
]

pool = Pool(per_item, items=items, max_flows=3)
results = pool.run()

for item, result in zip(items, results):
    print(f"[{item['source']}]\n{result}\n")

s = pool.status
print(f"done={s[DONE]}  failed={s[FAILED]}")

The Chain step tuple has three elements: the runner, the key under which its output is stored, and a mapping from the runner's input name to the key that holds the value. So (fetch, "result", {"input": "source"}) means: run fetch using the item's source field as input, store the output under "result". The summarise Skill then receives {result} via its prompt template. Each URL goes through the full fetch-then-summarize pipeline independently, and Pool runs all three chains at once.

When Pool Changes the Math

Consider a weekly report pulling data from 200 sources — summarize each one, then combine. Sequential at ~2 seconds per call: roughly 400 seconds, nearly 7 minutes. With max_flows=20, you process those 200 items in 10 waves: roughly 20 seconds total. What used to need a scheduled overnight job now finishes while you're still looking at the screen.
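The arithmetic behind those numbers fits in a few lines. This is a back-of-the-envelope model that assumes every call takes the same time and that items run in strict waves; real schedulers and real latencies are messier, so treat the result as an estimate. estimated_wall_clock is a hypothetical helper.

```python
import math


def estimated_wall_clock(n_items, max_flows, seconds_per_call=2.0):
    """Estimate total seconds when items run in waves of max_flows."""
    waves = math.ceil(n_items / max_flows)
    return waves * seconds_per_call


print(estimated_wall_clock(200, 1))    # sequential: one item per wave
print(estimated_wall_clock(200, 20))   # 10 waves
print(estimated_wall_clock(50, 10))    # 5 waves
```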

The API surface is intentionally small:

Pool(runner, items, max_flows) — runner is a Skill or Chain; items is a list of dicts; max_flows caps concurrency
pool.run() — executes everything, returns results in input order
pool.status — returns {DONE: int, FAILED: int} after the run

No async/await boilerplate. No callbacks. Define your Skill, list your inputs, set a concurrency cap, and Pool handles the scheduling.

AIchain: Connecting Steps into a Pipeline

YAIT — Thu, 11 Jun 2026 10:12:00 +0000

One LLM call solves a simple task. Real tasks look nothing like that. Real tasks are: fetch a page, extract the key points, rewrite them for a specific audience, translate the result into five languages. Four steps, each with its own job. Chain is the glue between them.

The Problem with "One Big Prompt"

When developers first start building with LLMs, there's a natural instinct to pack everything into a single elaborate prompt. "Read this URL, summarize it, rewrite it in a friendly tone, and translate it to Spanish." Sometimes it works. Often it doesn't — you've handed one model a vague, multi-part job with no clear boundaries, and when the output is wrong, you can't tell which sub-task failed. Was it the summary? The rewrite? The translation? Token limits compound the problem, and output formatting across sub-tasks becomes inconsistent with nothing to enforce boundaries.

A better pipeline is several simple steps, each with a clear boundary of responsibility. One step fetches. Another writes. Another edits. Another translates. Each step does one thing well, and the output of one becomes the input of the next.

That's what Chain does in yait_aichain.

How Chain Works

A Chain is an ordered sequence of steps. Each step is a Skill, a tool, or even an Agent. They all share one accumulated variable dictionary — outputs flow forward automatically. No manual variable shuffling, no temp files.

Here's the simplest version: two skills, two different providers.

from yait_aichain import Model, Skill, Chain

writer = Skill(
    model=Model(provider="openai", model="gpt-4o-mini"),
    prompt="Write a short paragraph about {topic}.",
)

reviewer = Skill(
    model=Model(provider="anthropic", model="claude-3-5-haiku-20241022"),
    prompt="Review this paragraph and suggest one improvement:\n\n{result}",
)

chain = Chain(steps=[writer, reviewer])
result = chain.run(variables={"topic": "the importance of clean code"})

print(result)

Step 1 runs GPT-4o-mini. Its output is stored under the default key "result" in the accumulated variable dict. Step 2 picks up {result} in its prompt template and sends it to Claude. Two models, two providers, no plumbing.

Notice that each step uses its own model. You're not locked into one provider for the entire pipeline. Use GPT for generation, Claude for review, a cheap model for translation — whatever makes sense for each task.

Mixing Tools and Skills

Not every step needs an LLM. Sometimes the first step is just fetching data. Chain handles that too.

from yait_aichain import Model, Skill, Chain
from yait_aichain.tools import convertToMD

summariser = Skill(
    model=Model(provider="anthropic", model="claude-3-5-haiku-20241022"),
    prompt="Summarise this page in three bullet points:\n\n{result}",
)

chain = Chain(steps=[convertToMD(), summariser])  # pass a tool instance, not the class
result = chain.run(variables={"url": "https://en.wikipedia.org/wiki/Python_(programming_language)"})

print(result)

convertToMD is a tool — it fetches a URL and converts the page to Markdown. No LLM involved. It stores its output under the default key "result", which is why the summariser's prompt references {result} directly. Tools and skills sit side by side in the same pipeline.

Named Keys and Variable Remapping

When pipelines get longer, the default "result" key isn't enough. You want each step's output to have a meaningful name so downstream steps can reference exactly what they need.

Pass a 2-tuple of (step, output_key) to give a step's output an explicit name:

chain = Chain(steps=[
    (summariser,  "summary"),
    (translator,  "final"),
])

Now the summariser's output lives under accumulated["summary"], and the translator's prompt can reference {summary} directly. Every step has access to every previously accumulated variable — not just the one immediately before it.

If a skill's prompt expects a variable name that doesn't match what's in the accumulated dict, remap it with a 3-tuple of (step, output_key, remap_dict):

# Stored as "body_content" earlier in the pipeline,
# but this skill's prompt expects {current_section_content}
(summariser, "summary", {"current_section_content": "body_content"})

The remap dict maps prompt_variable → accumulated_key. Here, Chain reads the value stored under "body_content" and passes it to summariser as current_section_content. Here's a minimal working example:

from yait_aichain import Model, Skill, Chain

extractor = Skill(
    model=Model(provider="openai", model="gpt-4o-mini"),
    prompt="Extract the main argument from this text:\n\n{text}",
)

summariser = Skill(
    model=Model(provider="openai", model="gpt-4o-mini"),
    prompt="Summarise this argument in one sentence:\n\n{current_section_content}",
)

chain = Chain(steps=[
    (extractor,  "body_content"),
    (summariser, "summary", {"current_section_content": "body_content"}),
])

result = chain.run(variables={"text": "Your source text here..."})
print(result)

So steps come in three forms: a bare runner, a 2-tuple with an output key, or a 3-tuple with an output key and a remap dict. Pick whichever fits the complexity of that step.
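If the accumulated-dict mechanics feel abstract, the toy runner below mimics them with plain functions. It is a sketch of the behaviour described above, not the library's implementation: run_chain and make_step are hypothetical, and the runners are string templates standing in for Skills.

```python
def make_step(template):
    """Build a stand-in runner that fills a template from its inputs."""
    return lambda inputs: template.format(**inputs)


def run_chain(steps, variables):
    """Run steps in order, sharing one accumulated variable dict."""
    accumulated = dict(variables)
    for step in steps:
        # Normalise the three step forms: bare runner, 2-tuple, 3-tuple.
        if not isinstance(step, tuple):
            runner, output_key, remap = step, "result", {}
        elif len(step) == 2:
            runner, output_key = step
            remap = {}
        else:
            runner, output_key, remap = step

        # Every step sees everything accumulated so far...
        inputs = dict(accumulated)
        # ...plus any remapped names: prompt_variable -> accumulated_key.
        for prompt_var, accumulated_key in remap.items():
            inputs[prompt_var] = accumulated[accumulated_key]

        accumulated[output_key] = runner(inputs)
    return accumulated


extractor = make_step("ARGUMENT({text})")
summariser = make_step("SUMMARY({current_section_content})")

final = run_chain(
    [
        (extractor, "body_content"),
        (summariser, "summary", {"current_section_content": "body_content"}),
    ],
    {"text": "clean code matters"},
)
print(final["summary"])
```

The remap copies a value under a second name for one step only; the accumulated dict itself still stores it under "body_content".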

The Real-World Pipeline

Here's the scenario from the introduction: fetch a source URL, write content from it, make it readable, translate it. Four steps, each with one job. Named keys keep every output identifiable.

from yait_aichain import Model, Skill, Chain
from yait_aichain.tools import convertToMD

writer = Skill(
    model=Model(provider="openai", model="gpt-4o"),
    prompt="Based on this source material, write a concise blog post:\n\n{result}",
)

editor = Skill(
    model=Model(provider="anthropic", model="claude-3-5-sonnet-20241022"),
    prompt="Make this text more readable and engaging. Keep it concise:\n\n{draft}",
)

translator = Skill(
    model=Model(provider="openai", model="gpt-4o-mini"),
    prompt="Translate the following into {language}:\n\n{edited}",
)

chain = Chain(steps=[
    convertToMD(),  # tool instance; its output lands under the default "result" key
    (writer,     "draft"),
    (editor,     "edited"),  # {draft} is already in the accumulated dict, so no remap is needed
    (translator, "final"),
])

result = chain.run(variables={
    "url": "https://example.com/source-article",
    "language": "French",
})

print(result)

Four steps. Three different models — GPT-4o for writing, Claude Sonnet for editing, GPT-4o-mini for translation. One tool. To run this for multiple languages, loop over the language values:

for language in ["French", "German", "Japanese", "Spanish", "Portuguese"]:
    result = chain.run(variables={
        "url": "https://example.com/source-article",
        "language": language,
    })
    print(f"--- {language} ---")
    print(result)

Could an Agent handle this? Sure. But why introduce probabilistic decision-making when the sequence is fixed? You don't need a planner deciding "what to do next" when the answer is always step 1, then 2, then 3, then 4. Chain gives you deterministic, repeatable execution — the same inputs produce the same sequence of calls every time.

Debugging with chain.history

When a four-step pipeline produces unexpected output, you need to know which step went sideways. After chain.run() completes, inspect chain.history:

print(chain.history)

chain.history is a list of dicts, one per step. Here's what the first two entries look like:

[
  {
    "step_index": 0,
    "runner": "convertToMD",
    "inputs": {"url": "https://example.com/source-article"},
    "output": "# Example Article\n\nMarkdown content here..."
  },
  {
    "step_index": 1,
    "runner": "Skill(gpt-4o)",
    "inputs": {"result": "# Example Article\n\nMarkdown content here..."},
    "output": "Here is a concise blog post based on the source material..."
  },
  ...
]

You look at step 2's output, see the editor mangled your formatting, fix that one prompt, and re-run. No diagnostic print statements scattered through your code, no running the whole pipeline blind hoping something changes.
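Raw history dumps get long fast, since every entry carries full page or draft text. A small helper can print one truncated line per step. The sketch assumes only the entry shape shown above (step_index, runner, and a string output). summarise_history is a hypothetical helper, and the sample data copies the example entries.

```python
def summarise_history(history, width=60):
    """Return one short line per step: index, runner, truncated output."""
    lines = []
    for entry in history:
        output = entry["output"].replace("\n", " ")
        if len(output) > width:
            output = output[:width] + "..."
        lines.append(f'[{entry["step_index"]}] {entry["runner"]}: {output}')
    return lines


# Sample data in the shape shown above; real runs would pass chain.history.
sample_history = [
    {
        "step_index": 0,
        "runner": "convertToMD",
        "inputs": {"url": "https://example.com/source-article"},
        "output": "# Example Article\n\nMarkdown content here...",
    },
    {
        "step_index": 1,
        "runner": "Skill(gpt-4o)",
        "inputs": {"result": "# Example Article\n\nMarkdown content here..."},
        "output": "Here is a concise blog post based on the source material...",
    },
]

for line in summarise_history(sample_history):
    print(line)
```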

This is the same step-by-step visibility that visual workflow tools offer through node inspectors — but it's plain Python. No YAML config files, no browser-based editors. Define your pipeline in code, run it, read the history.

When to Use Chain vs. an Agent

Simple rule: if the sequence of steps is known in advance, use Chain. If the system needs to decide at runtime which tools to call and in what order based on an ambiguous goal, use an Agent.

Most production workflows are deterministic sequences. Content pipelines, data processing, report generation, ETL flows — fixed sequences with variable inputs. Chain was built for exactly this: a code-first alternative to visual workflow tools, without the overhead.

Keep steps simple, keep them separated, and let each one do exactly one thing. That's what makes a pipeline maintainable six months later when you've completely forgotten how it works.

AIchain Reasoning: One Parameter for Every Provider

YAIT — Sun, 07 Jun 2026 11:47:00 +0000

OpenAI calls it reasoning_effort. Anthropic calls it budget_tokens. Google calls it thinkingBudget. Kimi calls it thinking. Qwen calls it enable_thinking. DeepSeek skips the parameter entirely and routes you to a separate model. Same idea, six implementations — and real cost implications if you use it carelessly.

Every major model provider now ships some version of "let the model think longer before answering." The concept is identical: allocate extra compute at inference time so the model can work through a problem step by step before committing to a response. But each provider wraps that in a different API parameter, with different value types, different scaling behavior, and different documentation you'll need to dig through.

I got tired of reading all of them.

Six Providers, Six Parameter Names, One Idea

Here's what the landscape actually looks like when you strip away the marketing:

Provider	Native Parameter	Value Format
OpenAI	`reasoning_effort`	`"low"`, `"medium"`, `"high"`
Anthropic	`budget_tokens`	Integer (token count)
Google	`thinkingBudget`	Integer (token count)
Kimi	`thinking`	Enabled / routes to a thinking model
DeepSeek	(no param)	Routes entire request to `deepseek-reasoner`
Qwen	`enable_thinking`	Boolean

OpenAI gives you a string enum. Anthropic and Google want a raw token budget — but on different scales. Kimi toggles a flag or swaps the model entirely. DeepSeek doesn't even have a parameter; it redirects your call to a completely separate model. Qwen gives you a boolean, take it or leave it.

If you're building anything that runs across multiple providers — for cost optimization, fallback routing, or A/B testing — this fragmentation is a genuine problem. You end up writing provider-specific branching logic that has nothing to do with your actual application.

One Parameter That Translates Everywhere

In yait_aichain, the reasoning interface is a single key in the model options:

from yait_aichain import Model, Skill

skill = Skill(
    model=Model("gpt-5", options={"reasoning": "high"}),  # a reasoning-capable model; gpt-4o does not accept reasoning_effort
    input={"messages": [{"role": "user", "parts": ["{prompt}"]}]},
)
result = skill.run(variables={"prompt": "Solve this step by step: ..."})
print(result.content)  # result.content holds the model's response text

The reasoning option accepts three levels ("low", "medium", "high"), or you omit it entirely for standard inference. The library handles translation to each provider's native format automatically. For Qwen, enable_thinking is set to True for any non-null reasoning level and False when reasoning is omitted. There are no partial levels on Qwen's side, so "low", "medium", and "high" all map to True.

Swapping the model is a one-line change, and the reasoning behavior follows:

from yait_aichain import Model, Skill

PROMPT = {
    "messages": [{"role": "user", "parts": ["Prove that sqrt(2) is irrational. Show your work."]}]
}

models = [
    Model("gpt-5",              options={"reasoning": "high"}),  # → reasoning_effort: "high"
    Model("claude-sonnet-4-5",  options={"reasoning": "high"}),  # → budget_tokens: large value
    Model("gemini-2.5-pro",     options={"reasoning": "high"}),  # → thinkingBudget: large value
    Model("kimi-k2",            options={"reasoning": "high"}),  # → routes to thinking variant
    Model("deepseek-chat",      options={"reasoning": "high"}),  # → routes to deepseek-reasoner
    Model("qwen-plus",          options={"reasoning": "high"}),  # → enable_thinking: true
]

for model in models:
    skill = Skill(model=model, input=PROMPT)
    result = skill.run()
    print(f"{model.name}: {result.content[:200]}")  # model.name is the identifier passed to Model()

Six providers, six different native implementations, zero branching logic in your code. The provider is inferred from the model name prefix — gpt-* routes to OpenAI, claude-* to Anthropic, gemini-* to Google, and so on — and the reasoning parameter gets translated accordingly.
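
Prefix-based routing of this kind can be sketched in a few lines. The table and the function name infer_provider below are hypothetical; the library's real rules may differ, and the kimi-, deepseek-, and qwen- prefixes are inferred from the model names in the list above rather than documented.

```python
# Hypothetical prefix table; the real library's routing rules may differ.
PREFIX_TO_PROVIDER = {
    "gpt-": "openai",
    "claude-": "anthropic",
    "gemini-": "google",
    "kimi-": "kimi",
    "deepseek-": "deepseek",
    "qwen-": "qwen",
}


def infer_provider(model_name):
    """Return the provider whose prefix matches the model name."""
    for prefix, provider in PREFIX_TO_PROVIDER.items():
        if model_name.startswith(prefix):
            return provider
    raise ValueError(f"cannot infer provider for model: {model_name!r}")


for name in ["gpt-4o", "claude-sonnet-4-5", "gemini-2.5-pro", "qwen-plus"]:
    print(name, "->", infer_provider(name))
```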

Some providers have special quirks that the library absorbs. DeepSeek doesn't have a reasoning toggle; it has an entirely separate model called deepseek-reasoner. When you pass options={"reasoning": "high"} with deepseek-chat, the library silently reroutes your request:

from yait_aichain import Model, Skill

# Without reasoning: hits deepseek-chat
skill = Skill(
    model=Model("deepseek-chat"),
    input={"messages": [{"role": "user", "parts": ["Explain gradient descent."]}]},
)

# With reasoning: library automatically routes to deepseek-reasoner
skill_reasoning = Skill(
    model=Model("deepseek-chat", options={"reasoning": "high"}),
    input={"messages": [{"role": "user", "parts": ["Explain gradient descent."]}]},
)

Kimi works similarly. The library maps kimi-k2 to Kimi's appropriate API model slug, and the reasoning flag triggers the thinking variant. You can also target the thinking model directly — both approaches hit the same endpoint:

from yait_aichain import Model, Skill

# Direct model targeting
skill_a = Skill(
    model=Model("kimi-k2-thinking"),
    input={"messages": [{"role": "user", "parts": ["Find the bug in this code: {code}"]}]},
)

# Reasoning flag on the base model — library resolves to the same endpoint
skill_b = Skill(
    model=Model("kimi-k2", options={"reasoning": "high"}),
    input={"messages": [{"role": "user", "parts": ["Find the bug in this code: {code}"]}]},
)
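
The rerouting for DeepSeek and Kimi amounts to a lookup from base model to reasoning variant. The sketch below assumes a hypothetical resolve_model helper and a two-entry table built from the model names used in this article; it illustrates the idea and is not the library's code.

```python
# Assumed routing table: base model -> reasoning variant.
REASONING_VARIANT = {
    "deepseek-chat": "deepseek-reasoner",
    "kimi-k2": "kimi-k2-thinking",
}


def resolve_model(name, reasoning=None):
    """Return the model that should actually receive the request."""
    if reasoning is None:
        return name  # standard inference: keep the model as requested
    # Models without a separate variant keep their name; their reasoning
    # level is expressed as a request parameter instead.
    return REASONING_VARIANT.get(name, name)


print(resolve_model("deepseek-chat"))
print(resolve_model("deepseek-chat", reasoning="high"))
print(resolve_model("kimi-k2", reasoning="high"))
```

Under these assumptions, targeting kimi-k2-thinking directly and setting the reasoning flag on kimi-k2 resolve to the same name, which matches the behavior described above.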

The Most Expensive Token Is the One You Didn't Need

Before you turn reasoning on everywhere, do the cost math. Reasoning tokens are billed as output tokens, and they accumulate fast. A single call to OpenAI's o3 at high effort can generate 10–20× the token count of the same task on gpt-4o. Anthropic's extended thinking bills thinking tokens at the output rate: a call that uses a full 10,000-token budget on Claude Sonnet, at a list price of $15 per million output tokens, adds roughly $0.15 just for the thinking.

Scale that across thousands of requests and the math turns ugly quickly.
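
A back-of-the-envelope calculator makes the scaling visible. The token count, price, and call volume below are deliberately hypothetical round numbers, not quotes from any provider's price list; substitute your own figures.

```python
def thinking_cost(thinking_tokens, usd_per_million_output, calls=1):
    """Extra spend in USD for reasoning tokens billed at the output rate."""
    return thinking_tokens * usd_per_million_output * calls / 1_000_000


# Hypothetical inputs: 4,000 thinking tokens per call at $10 per million.
per_call = thinking_cost(4_000, 10.0)
at_volume = thinking_cost(4_000, 10.0, calls=50_000)
print(f"per call: ${per_call:.2f}, at 50,000 calls: ${at_volume:,.2f}")
```

A few cents per call looks harmless until it is multiplied by request volume, which is the comparison worth making before enabling reasoning by default.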

Google's technical report on Gemini 2.5 Pro shows meaningful gains on the MATH benchmark when thinking is enabled. But that result applies to math problems. Not every task is a math problem.

A rough breakdown of where reasoning earns its cost and where it doesn't:

Worth the extra tokens:

Competitive programming
Formal proofs
Complex debugging
Multi-constraint optimization
Multi-step logical reasoning

Rarely worth it:

Text summarization
Sentiment analysis
Simple classification
Creative writing
Data formatting
Translation

The pattern is consistent: reasoning shines on tasks with a verifiably correct answer. If there's a right answer and a wrong answer — a proof that holds or doesn't, code that passes tests or fails — reasoning will find it more reliably. For generative or subjective tasks, you're paying a premium for the model to overthink something that doesn't benefit from overthinking. Standard inference is faster, cheaper, and produces output that's just as good.

When to Turn the Dial

Start with no reasoning. Measure quality. If it's failing on correctness — wrong calculations, flawed logic, missed edge cases — bump to "medium" and measure again. Only go to "high" when medium falls short on tasks you can objectively evaluate. This matters because "medium" often captures most of the quality gain at a fraction of the cost of "high".

The practical workflow: define your eval set, run it at each reasoning level, compare quality scores against token costs. The universal parameter makes this straightforward — changing reasoning level is one string, swapping the underlying model is one line. You're comparing apples to apples across providers without rewriting your integration each time.
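
That workflow can be sketched as a small harness. Everything here is illustrative: run_case is a hypothetical callable standing in for whatever invokes your model, exact-match scoring is only one possible quality measure, and the fake runner exists so the example runs offline.

```python
LEVELS = [None, "low", "medium", "high"]  # None means standard inference


def evaluate(cases, run_case):
    """Return {level: (accuracy, total_tokens)} for every reasoning level.

    cases: list of (prompt, expected_answer) pairs.
    run_case(prompt, level): returns (answer, tokens_used).
    """
    report = {}
    for level in LEVELS:
        correct = 0
        tokens = 0
        for prompt, expected in cases:
            answer, used = run_case(prompt, level)
            correct += int(answer == expected)
            tokens += used
        report[level] = (correct / len(cases), tokens)
    return report


def cheapest_passing(report, min_accuracy):
    """Pick the level with the fewest tokens that meets the quality bar."""
    passing = [(tokens, level) for level, (acc, tokens) in report.items()
               if acc >= min_accuracy]
    if not passing:
        raise LookupError("no reasoning level met the quality bar")
    return min(passing, key=lambda pair: pair[0])[1]


def fake_run(prompt, level):
    # Toy stand-in: only "medium" and "high" get the answer right.
    answer = "4" if level in ("medium", "high") else "5"
    tokens = {None: 50, "low": 200, "medium": 800, "high": 3000}[level]
    return answer, tokens


report = evaluate([("2 + 2 = ?", "4")], fake_run)
print(cheapest_passing(report, min_accuracy=1.0))
```

In a real run, run_case would wrap a Skill built with the chosen reasoning level, and the token count would come from whatever usage data your provider reports.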

One practical note: you'll need valid API credentials for each provider set as environment variables before any of this works. The library picks up standard provider key names automatically, but an auth error will stop you before reasoning ever comes into play.
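
A quick pre-flight check catches missing credentials before the first request. The variable names below are assumptions; use whatever names your providers and the library actually expect.

```python
import os

# Assumed variable names; adjust to the providers you actually call.
REQUIRED_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_AI_API_KEY"]


def missing_keys(env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


missing = missing_keys()
if missing:
    print("Missing credentials:", ", ".join(missing))
```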

The provider fragmentation will keep growing as each company iterates on its approach. You don't need to track all of it. You need the model to think when thinking helps, and skip it when it doesn't — and one parameter handles that regardless of which provider you're calling.

AIchain Skill: A Prompt as a Reusable Object

YAIT — Thu, 04 Jun 2026 08:29:00 +0000

A prompt buried in an f-string is technical debt. You can't test it. You can't save it to a file. You can't hand it to a colleague and say "here's the extraction logic." The moment you hard-code a prompt into a string, you've welded your application logic to a single, frozen instruction — one that will rot as models improve and requirements shift.

Skill fixes that. It turns a prompt into a first-class object: something you can construct, parameterize, persist, reload, version-control, and swap models underneath — all without changing a single line of business logic.

The Problem with Naked Strings

Consider the typical approach:

response = client.chat(f"What is {topic} in one sentence?")

This line mixes three concerns: the prompt template, the runtime data, and the model that executes it. Change any one of those, and you're editing the same line. Want to test the prompt with different inputs? You rewrite the call. Want to try a cheaper model? You rewire the client. Want a non-developer on your team to review the prompt? You point them at application code.

SQL solved this exact problem decades ago. Write a parameterized query once, bind values at execution time, and the database engine is a separate concern entirely. Skill applies the same separation to prompts.
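
For readers who have not used prepared statements, this is the pattern being borrowed, shown with Python's built-in sqlite3 module. The table and rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE terms (name TEXT, field TEXT)")
conn.executemany(
    "INSERT INTO terms VALUES (?, ?)",
    [("gradient descent", "ml"), ("qubit", "physics")],
)

# The query text is written once; only the bound value changes per call.
QUERY = "SELECT name FROM terms WHERE field = ?"


def names_in(field):
    """Run the fixed query with a value bound at execution time."""
    return [row[0] for row in conn.execute(QUERY, (field,))]


print(names_in("ml"))
print(names_in("physics"))
```

The query string never changes and never has data spliced into it. A Skill treats the prompt template the same way.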

Skill in Practice

Install the library and its serialization dependency:

pip install yait-aichain pyyaml

Here's the simplest Skill — a one-sentence explainer with a {topic} variable:

import os
from yait_aichain.models import Model
from yait_aichain.skills import Skill

skill = Skill(
    model=Model("claude-sonnet-4-6", api_key=os.getenv("ANTHROPIC_API_KEY")),
    input={
        "messages": [{
            "role": "user",
            "parts": ["What is {topic} in one sentence?"],
        }]
    },
)

result = skill.run(variables={"topic": "machine learning"})
print(result)

Two things to notice. First, {topic} is a placeholder inside the prompt template — not an f-string resolved at definition time. The value gets substituted only when you call .run(variables={...}). Second, the model is declared at construction, not at call time. The Skill knows which model it targets.

That separation matters. The same Skill instance can run with {"topic": "quantum computing"}, then {"topic": "gradient descent"}, without rebuilding anything. The template stays fixed; the data varies. Exactly like a prepared statement.
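
The difference from an f-string is easiest to see in a toy renderer. This is an illustration of deferred substitution only, not how Skill is implemented; the render function is hypothetical.

```python
# The template is plain data: nothing is substituted when it is defined.
TEMPLATE = {
    "messages": [{"role": "user", "parts": ["What is {topic} in one sentence?"]}]
}


def render(template, variables):
    """Return a filled-in copy of the template, leaving the original intact."""
    return {
        "messages": [
            {
                "role": message["role"],
                "parts": [part.format(**variables) for part in message["parts"]],
            }
            for message in template["messages"]
        ]
    }


first = render(TEMPLATE, {"topic": "quantum computing"})
second = render(TEMPLATE, {"topic": "gradient descent"})
print(first["messages"][0]["parts"][0])
print(second["messages"][0]["parts"][0])
```

Because the template is never mutated, the same object can be rendered any number of times with different variables.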

One Prompt, Three Models

The key differentiator of a Skill: it's built around the model most effective for a specific task — balancing quality, speed, and cost. What one model handled well six months ago, another can do dramatically better today. Skill lets you act on that without rewriting your logic.

Here's what that looks like in code:

import os
from yait_aichain.models import Model
from yait_aichain.skills import Skill

PROMPT = {
    "messages": [{
        "role": "user",
        "parts": ["What is {topic} in one sentence?"],
    }]
}

models = [
    Model("claude-sonnet-4-6", api_key=os.getenv("ANTHROPIC_API_KEY")),
    Model("gpt-4o-mini",       api_key=os.getenv("OPENAI_API_KEY")),
    Model("gemini-2.5-flash",  api_key=os.getenv("GOOGLE_AI_API_KEY")),
]

for model in models:
    skill  = Skill(model=model, input=PROMPT)
    result = skill.run(variables={"topic": "machine learning"})
    print(f"[{model.name}]\n{result}\n")

The prompt template is defined once. The loop swaps only the Model. You get three answers from three providers — Anthropic, OpenAI, Google — with zero changes to the prompt logic. (model.name is a string attribute on Model that returns the identifier you passed at construction, used here purely for labeling output.) When a newer model returns a better answer at a fraction of the cost, you change one string and move on.

Save, Version, Share

The moment a prompt lives in a Python file, it's coupled to that file's deployment cycle. Edit the prompt, redeploy the app. That's fine until you want a prompt engineer — or your future self at 11pm — to iterate on prompts independently.

Skill supports serialization to YAML. Same idea as versioning a migration script: track every change in Git, roll back when new wording degrades output quality.

import os
from yait_aichain.models import Model
from yait_aichain.skills import Skill

YAML_PATH = "validator_skill.yaml"

# Build and save
skill = Skill(
    model=Model("claude-sonnet-4-6", api_key=os.getenv("ANTHROPIC_API_KEY")),
    input={
        "messages": [{
            "role": "user",
            "parts": ["Review the following text and return only the corrected version: {text}"],
        }]
    },
)
skill.save(YAML_PATH)

# Later — reload and run
loaded_skill = Skill.load(YAML_PATH, api_key=os.getenv("ANTHROPIC_API_KEY"))
result = loaded_skill.run(variables={"text": "The quick brown fox jump over the lazy dog."})
print(result)

A few details worth highlighting:

API keys never hit disk. skill.save() serializes the model name, prompt template, and configuration — but intentionally strips the API key. You pass it from the environment when you call Skill.load(). No secrets in your repo.
YAML is human-readable. A teammate can open validator_skill.yaml, tweak the prompt wording, and commit it — without touching Python.
The file is an artifact. Store it in a prompts/ directory, tag it v1.2, run automated tests against it in CI. If new wording degrades output quality, git revert and you're done.
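
The key-stripping behavior can be pictured as a filter applied before writing and a merge applied after reading. The sketch below is an assumption-laden illustration: the field names are invented, and it uses json from the standard library as a stand-in for YAML so it runs without extra dependencies.

```python
import json


def to_saved_form(config):
    """Return a copy of the config with the secret removed, safe to write to disk."""
    return {key: value for key, value in config.items() if key != "api_key"}


def from_saved_form(saved, api_key):
    """Rebuild a usable config by re-attaching a key supplied at load time."""
    return {**saved, "api_key": api_key}


config = {
    "model": "claude-sonnet-4-6",
    "api_key": "sk-not-a-real-key",
    "input": {"messages": [{"role": "user", "parts": ["Review: {text}"]}]},
}

on_disk = json.dumps(to_saved_form(config), indent=2)
print(on_disk)  # the serialized text contains no api_key field

restored = from_saved_form(json.loads(on_disk), api_key="key-from-environment")
```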

Why Objects Beat Strings

When a prompt becomes an object, workflows that strings simply can't support become straightforward:

Testing. Write a unit test that loads a Skill, runs it against five known inputs, and asserts output structure. Run that test on every PR.
Versioning. Store Skills as YAML files in a prompts/ directory. Git tracks every change, who made it, and when — just as it tracks schema migrations.
Model migration. When a newer, faster, or cheaper model appears, update the model name in the YAML file. The prompt template and your application code stay untouched.
Collaboration. A domain expert writes the prompt. A developer writes the pipeline. Neither blocks the other.
Independent deployment. Ship new prompt versions without redeploying application code. Load the latest YAML from a shared path or artifact store.
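
As a rough idea of what the testing workflow could look like, here is a unittest sketch. FakeSkill is a stand-in so the example runs offline; in a real suite the setUp method would load the saved Skill instead, and the assertions would reflect whatever structure your output must have.

```python
import unittest


class FakeSkill:
    """Offline stand-in for a loaded Skill; returns a cleaned-up string."""

    def run(self, variables):
        return variables["text"].strip().capitalize()


KNOWN_INPUTS = ["hello world.", "  spaced out. ", "already Fine.", "x", "two  words"]


class SkillOutputTest(unittest.TestCase):
    def setUp(self):
        # A real suite would load the YAML artifact here.
        self.skill = FakeSkill()

    def test_output_structure(self):
        for text in KNOWN_INPUTS:
            with self.subTest(text=text):
                result = self.skill.run(variables={"text": text})
                self.assertIsInstance(result, str)
                self.assertTrue(result)  # non-empty
                self.assertEqual(result, result.strip())  # no stray whitespace


suite = unittest.defaultTestLoader.loadTestsFromTestCase(SkillOutputTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Tests against a live model are nondeterministic, so structural assertions like these tend to be more stable than exact-string comparisons.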

The API at a Glance

# Construct
Skill(model: Model, input: dict)

# Run with variable substitution
skill.run(variables: dict = {}) -> str

# Persist
skill.save(path: str) -> None

# Restore (api_key is a keyword argument)
Skill.load(path: str, *, api_key: str) -> Skill

Four operations. That's the entire surface area. A Skill does exactly one thing — execute a parameterized prompt against a specific model — and exposes exactly the controls you need to manage it over time.

From String to Strategy

The shift from f-string to Skill is small in code and large in practice. You go from a prompt that's invisible, untestable, and welded to one model — to one that's named, versioned, portable, and model-aware.

Next in this series: Chain — what happens when one Skill's output feeds into another. But it all starts here, with one prompt treated as a real artifact, not a throwaway string.

AIChain!? Why Another LLM Library?

YAIT — Sun, 31 May 2026 09:30:00 +0000

You wrote an OpenAI integration. Then added Anthropic. Then Gemini. Now look at your code — it's three different applications wearing a trench coat pretending to be one.

The Three-SDK Problem

Every major AI provider ships its own SDK. Reasonable enough — until you need to support more than one. Here's what happens in practice.

OpenAI wants messages with content strings. Anthropic wants a separate system parameter and its own message format. Google's Gemini SDK uses generate_content with Part objects. Three providers, three client initializations, three response shapes, three error handling paths.

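You end up with code that looks like this (trimmed, but you've seen the real thing):

from anthropic import Anthropic
from openai import OpenAI

def ask(provider: str, model: str, prompt: str, api_key: str) -> str:
    """Send one user prompt and return the reply text, branching per provider."""
    messages = [{"role": "user", "content": prompt}]
    if provider == "openai":
        client = OpenAI(api_key=api_key)
        response = client.chat.completions.create(model=model, messages=messages)
        return response.choices[0].message.content
    elif provider == "anthropic":
        client = Anthropic(api_key=api_key)
        # max_tokens is required by Anthropic's Messages API
        response = client.messages.create(model=model, max_tokens=1024, messages=messages)
        return response.content[0].text
    elif provider == "google":
        # yet another pattern entirely
        ...
    raise ValueError(f"unsupported provider: {provider}")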

This isn't a hypothetical. This is Tuesday.

And here's the thing people miss: this divergence is intentional. Every provider consciously locks you into their ecosystem. That's business, not coincidence. Different parameter names, different response shapes, different auth patterns — all of it raises the switching cost just enough to keep you where you are.

Models move fast, too. Whoever was ahead six months ago may not be the leader today. Claude overtook GPT-4 on coding benchmarks like SWE-bench, then Gemini 1.5 landed a million-token context window, and suddenly you need to evaluate all three for your use case. But switching means rewriting integration logic from scratch. Every time.

"Just Use LangChain"

The obvious answer. LangChain abstracts providers behind a common interface. Problem solved, right?

Not quite.

Install langchain and watch your dependency tree explode. Running pip install langchain pulls in a long list of transitive packages, starting with langchain-core, and a real project soon adds langchain-community, langchain-openai, and a constellation of other sub-packages. The abstraction layers stack up: Runnables, Chains, OutputParsers, PromptTemplates, each with its own configuration surface.

For a complex agentic system, that overhead might pay for itself. But if you just want to send the same prompt to three models and compare results? You're hauling a shipping container to carry a sandwich.

I tried this path. The project turned into an immovable monster — not because of my code, but because of everything underneath it. Upgrading one sub-package broke three others. Debugging meant reading through abstraction layers I didn't ask for. The library demanded more attention than the actual task.

The best abstraction is the one you don't notice. If you're thinking about the library instead of the problem, something went wrong.

aichain: Change One Line, Leave Everything Else

That failure mode is the specific thing aichain is designed to avoid. Where LangChain builds up, aichain strips down: a thin normalization layer with no abstraction tower to debug and no sprawling dependency graph to maintain.

The pitch is simple: 8 providers, 1 interface, zero lock-in.
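
To show what a thin normalization layer means in practice, here is a sketch that pulls the reply text out of three differently shaped responses. It is not aichain's internal code: extract_text is a hypothetical helper, and the sample payloads are hand-written dicts that mirror the JSON shapes each provider's HTTP API returns.

```python
def extract_text(provider, response):
    """Pull the reply text out of a provider-shaped response dict."""
    if provider == "openai":
        return response["choices"][0]["message"]["content"]
    if provider == "anthropic":
        return response["content"][0]["text"]
    if provider == "google":
        return response["candidates"][0]["content"]["parts"][0]["text"]
    raise ValueError(f"unsupported provider: {provider!r}")


# Hand-written sample payloads, trimmed to the fields that matter here.
samples = {
    "openai": {"choices": [{"message": {"role": "assistant", "content": "hi"}}]},
    "anthropic": {"content": [{"type": "text", "text": "hi"}]},
    "google": {"candidates": [{"content": {"parts": [{"text": "hi"}]}}]},
}

for provider, payload in samples.items():
    print(provider, "->", extract_text(provider, payload))
```

Three shapes go in and one string comes out. That is the whole job of the layer, and it is why the calling code never needs to branch.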

Installation note: The package name and the import name differ. Install with pip install aichain, but import from yait_aichain in your code — as shown in all examples below.

Here's a complete working example — a single prompt sent to one model:

import os
from yait_aichain import Model, Skill

skill = Skill(
    model=Model("claude-sonnet-4-6", api_key=os.getenv("ANTHROPIC_API_KEY")),
    input={
        "messages": [{
            "role": "user",
            "parts": ["What is {topic} in one sentence?"],
        }]
    },
)

# Pass the template variable at runtime
result = skill.run(variables={"topic": "machine learning"})
print(result)

Model takes a model name string and figures out the provider automatically. Skill takes a model and a prompt. .run() gives you back a string. No output parsers, no runnable sequences, no callback handlers.

Now here's where it gets interesting. Want to compare three providers? Same prompt, same logic, one-line swap:

import os
from yait_aichain import Model, Skill

# No template variables here — the prompt is fully hardcoded,
# so .run() takes no arguments.
PROMPT = {
    "messages": [{
        "role": "user",
        "parts": ["What is machine learning in one sentence?"],
    }]
}

models = [
    Model("claude-sonnet-4-6", api_key=os.getenv("ANTHROPIC_API_KEY")),
    Model("gpt-4o-mini",       api_key=os.getenv("OPENAI_API_KEY")),
    Model("gemini-2.5-flash",  api_key=os.getenv("GOOGLE_AI_API_KEY")),
]

for model in models:
    skill  = Skill(model=model, input=PROMPT)
    result = skill.run()
    print(f"[{model.name}]\n{result}\n")  # model.name returns the string passed to Model()

Three providers. One prompt definition. Zero conditional logic. The Model("claude-sonnet-4-6") line is the only thing that determines which provider gets called. Swap "claude-sonnet-4-6" for "gpt-4o-mini" or "gemini-2.5-flash" or "grok-3" — the rest of your code doesn't change. Not one line.

What This Actually Means for Your Workflow

Benchmarking

A new model drops. You add one Model() line to your comparison loop and rerun. No integration work.

Cost Optimization

Your Claude bill is climbing. Switch your non-critical paths to gemini-2.5-flash by changing a string. Test it. If quality holds, ship it.

Resilience

Your primary provider goes down. A fallback is one model-name swap away — because your prompt logic, your variable handling, your output processing are already provider-agnostic.
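
A fallback wrapper along these lines is one way to act on that. The helper run_with_fallback and the two toy runners are hypothetical; in practice each callable would wrap skill.run for a Skill built on a different Model, and you would catch the specific error types your providers raise.

```python
def run_with_fallback(runners, prompt):
    """Try each (name, callable) in order; return (name, text) from the first success."""
    errors = []
    for name, run in runners:
        try:
            return name, run(prompt)
        except Exception as exc:  # real code should catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def primary(prompt):
    raise ConnectionError("provider unavailable")  # simulated outage


def backup(prompt):
    return f"echo: {prompt}"


print(run_with_fallback([("primary", primary), ("backup", backup)], "ping"))
```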

The template variable system ({topic}, {text}, etc.) means your prompts are reusable across models without reformatting. Define once, run everywhere.

The Right Tool for the Job

Model and Skill will carry you surprisingly far. When your requirements grow, the library grows with you — Chain for multi-step pipelines, Pool for parallel execution, Agent for autonomous workflows. Embedding, VectorDB, and Reranker are there when you need them on the data side. You reach for these when the problem demands them, not because the library herds you through them just to send a single prompt.

aichain doesn't put retrieval pipelines or agent frameworks in the path of a simple call, because most tasks don't need them. It exists because I got tired of rewriting the same integration logic three times with different parameter names. If that sounds familiar, the GitHub repo has runnable examples that take about a minute to get working.