YAIT

Posted on May 31

AIChain!? Why Another LLM Library?

#ai #api #llm #python

You wrote an OpenAI integration. Then added Anthropic. Then Gemini. Now look at your code — it's three different applications wearing a trench coat pretending to be one.

The Three-SDK Problem

Every major AI provider ships its own SDK. Reasonable enough — until you need to support more than one. Here's what happens in practice.

OpenAI wants messages with content strings. Anthropic wants a separate system parameter and its own message format. Google's Gemini SDK uses generate_content with Part objects. Three providers, three client initializations, three response shapes, three error handling paths.
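To make the divergence concrete, here is a minimal sketch of the same conversation shaped for two of those providers. The helper names (`to_openai`, `to_anthropic`) are mine and purely illustrative, and nothing here calls a real API; only the payload shapes follow the providers' request formats as I understand them.

```python
# Illustrative sketch: one neutral conversation, two provider-specific payloads.
# Helper names are hypothetical; no network calls are made.

def to_openai(system, messages, model):
    """Chat Completions style: the system prompt is just another message."""
    return {
        "model": model,
        "messages": [{"role": "system", "content": system}, *messages],
    }


def to_anthropic(system, messages, model, max_tokens=1024):
    """Messages API style: system is a top-level field, max_tokens is required."""
    return {
        "model": model,
        "system": system,
        "max_tokens": max_tokens,
        "messages": list(messages),
    }


conversation = [{"role": "user", "content": "What is machine learning?"}]

openai_payload = to_openai("Be brief.", conversation, "gpt-4o-mini")
anthropic_payload = to_anthropic("Be brief.", conversation, "claude-sonnet-4-6")

print(openai_payload["messages"][0]["role"])  # system
print("system" in anthropic_payload)          # True
```

Same intent, two different dictionaries, and that is before you get to response parsing or error handling.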

You end up with code that looks like this (pseudocode, but you've seen the real thing):

if provider == "openai":
    client = OpenAI(api_key=...)
    response = client.chat.completions.create(model=..., messages=...)
    text = response.choices[0].message.content
elif provider == "anthropic":
    client = Anthropic(api_key=...)
    response = client.messages.create(model=..., max_tokens=..., messages=...)
    text = response.content[0].text
elif provider == "google":
    # yet another pattern entirely
    ...

This isn't a hypothetical. This is Tuesday.

And here's the thing people miss: this divergence is intentional. Every provider consciously locks you into their ecosystem. That's business, not coincidence. Different parameter names, different response shapes, different auth patterns — all of it raises the switching cost just enough to keep you where you are.

Models move fast, too. Whoever was ahead six months ago may not be the leader today. Claude overtook GPT-4 on coding benchmarks like SWE-bench, then Gemini 1.5 landed a million-token context window, and suddenly you need to evaluate all three for your use case. But switching means rewriting integration logic from scratch. Every time.

"Just Use LangChain"

The obvious answer. LangChain abstracts providers behind a common interface. Problem solved, right?

Not quite.

Install langchain and watch your dependency tree explode. Running pip install langchain pulls in dozens of transitive packages, starting with langchain-core, and a real project quickly adds langchain-community, langchain-openai, and a constellation of other sub-packages. The abstraction layers stack up: Runnables, Chains, OutputParsers, PromptTemplates, each with its own configuration surface.

For a complex agentic system, that overhead might pay for itself. But if you just want to send the same prompt to three models and compare results? You're hauling a shipping container to carry a sandwich.

I tried this path. The project turned into an immovable monster — not because of my code, but because of everything underneath it. Upgrading one sub-package broke three others. Debugging meant reading through abstraction layers I didn't ask for. The library demanded more attention than the actual task.

The best abstraction is the one you don't notice. If you're thinking about the library instead of the problem, something went wrong.

aichain: Change One Line, Leave Everything Else

That failure mode is the specific thing aichain is designed to avoid. Where LangChain builds up, aichain strips down: a thin normalization layer with no abstraction tower to debug and no sprawling dependency graph to maintain.

The pitch is simple: 8 providers, 1 interface, zero lock-in.

Installation note: The package name and the import name differ. Install with pip install aichain, but import from yait_aichain in your code — as shown in all examples below.

Here's a complete working example — a single prompt sent to one model:

import os
from yait_aichain import Model, Skill

skill = Skill(
    model=Model("claude-sonnet-4-6", api_key=os.getenv("ANTHROPIC_API_KEY")),
    input={
        "messages": [{
            "role": "user",
            "parts": ["What is {topic} in one sentence?"],
        }]
    },
)

# Pass the template variable at runtime
result = skill.run(variables={"topic": "machine learning"})
print(result)

Model takes a model name string and figures out the provider automatically. Skill takes a model and a prompt. .run() gives you back a string. No output parsers, no runnable sequences, no callback handlers.
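The provider lookup could be as simple as a prefix table. A minimal sketch, assuming that approach; the table contents and the `infer_provider` name are illustrative guesses, not aichain's actual internals:

```python
# Hypothetical sketch of name-based provider detection.
# The prefix table and function name are assumptions, not aichain internals.
PROVIDER_PREFIXES = {
    "claude-": "anthropic",
    "gpt-": "openai",
    "gemini-": "google",
    "grok-": "xai",
}


def infer_provider(model_name: str) -> str:
    """Return the provider whose prefix matches the model name."""
    for prefix, provider in PROVIDER_PREFIXES.items():
        if model_name.startswith(prefix):
            return provider
    raise ValueError(f"Unknown model name: {model_name!r}")


print(infer_provider("claude-sonnet-4-6"))  # anthropic
print(infer_provider("gemini-2.5-flash"))   # google
```

Failing loudly on an unknown name is a deliberate choice in this sketch: a silent default provider would hide typos in model strings.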

Now here's where it gets interesting. Want to compare three providers? Same prompt, same logic, one-line swap:

import os
from yait_aichain import Model, Skill

# No template variables here — the prompt is fully hardcoded,
# so .run() takes no arguments.
PROMPT = {
    "messages": [{
        "role": "user",
        "parts": ["What is machine learning in one sentence?"],
    }]
}

models = [
    Model("claude-sonnet-4-6", api_key=os.getenv("ANTHROPIC_API_KEY")),
    Model("gpt-4o-mini",       api_key=os.getenv("OPENAI_API_KEY")),
    Model("gemini-2.5-flash",  api_key=os.getenv("GOOGLE_AI_API_KEY")),
]

for model in models:
    skill  = Skill(model=model, input=PROMPT)
    result = skill.run()
    print(f"[{model.name}]\n{result}\n")  # model.name returns the string passed to Model()

Three providers. One prompt definition. Zero conditional logic. The Model("claude-sonnet-4-6") line is the only thing that determines which provider gets called. Swap "claude-sonnet-4-6" for "gpt-4o-mini" or "gemini-2.5-flash" or "grok-3" — the rest of your code doesn't change. Not one line.

What This Actually Means for Your Workflow

Benchmarking

A new model drops. You add one Model() line to your comparison loop and rerun. No integration work.

Cost Optimization

Your Claude bill is climbing. Switch your non-critical paths to gemini-2.5-flash by changing a string. Test it. If quality holds, ship it.

Resilience

Your primary provider goes down. A fallback is one model-name swap away — because your prompt logic, your variable handling, your output processing are already provider-agnostic.
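A fallback loop in that spirit might look like the sketch below. It assumes nothing about aichain beyond "something callable returns text": each runner is a zero-argument callable, which in practice could be the bound `run` method of a Skill built as in the examples above. The `run_with_fallback` name and the stub functions are hypothetical.

```python
# Sketch of a fallback loop. Each "runner" is any zero-argument callable that
# returns text. The stubs below stand in for real skills so this runs offline.
def run_with_fallback(runners):
    """Try each runner in order and return the first successful result."""
    errors = []
    for runner in runners:
        try:
            return runner()
        except Exception as exc:  # real code should catch narrower error types
            errors.append(exc)
    raise RuntimeError(f"All {len(errors)} providers failed: {errors}")


def primary():
    raise ConnectionError("primary provider is down")


def backup():
    return "answer from backup model"


print(run_with_fallback([primary, backup]))  # answer from backup model
```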

The template variable system ({topic}, {text}, etc.) means your prompts are reusable across models without reformatting. Define once, run everywhere.
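The idea behind the substitution fits in a few lines. A minimal sketch, assuming plain `str.format` semantics applied to each entry in `parts`; the `render` function is hypothetical and is not how aichain necessarily implements it:

```python
# Hypothetical sketch of {placeholder} substitution over message parts.
# It mirrors the prompt shape used in this post, not aichain's internals.
def render(prompt: dict, variables: dict) -> dict:
    """Return a copy of the prompt with {name} placeholders filled in."""
    return {
        "messages": [
            {
                "role": message["role"],
                "parts": [part.format(**variables) for part in message["parts"]],
            }
            for message in prompt["messages"]
        ]
    }


template = {
    "messages": [{"role": "user", "parts": ["What is {topic} in one sentence?"]}]
}

rendered = render(template, {"topic": "machine learning"})
print(rendered["messages"][0]["parts"][0])
# What is machine learning in one sentence?
```

Because the rendered prompt is a new dictionary, the template itself stays untouched and can be reused for the next model or the next variable set. Note that `str.format` raises `KeyError` on a missing variable and needs literal braces doubled.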

The Right Tool for the Job

Model and Skill will carry you surprisingly far. When your requirements grow, the library grows with you — Chain for multi-step pipelines, Pool for parallel execution, Agent for autonomous workflows. Embedding, VectorDB, and Reranker are there when you need them on the data side. You reach for these when the problem demands them, not because the library herds you through them just to send a single prompt.

aichain keeps retrieval pipelines and agent frameworks out of the default path because most tasks don't need them. It exists because I got tired of rewriting the same integration logic three times with different parameter names. If that sounds familiar, the GitHub repo has runnable examples that take about a minute to get working.

Top comments (5)

Self-Correcting Systems • Jun 1

The "change one line, leave everything else" pitch is exactly right at the provider abstraction layer. The harder question is one layer up: in the agent's memory and instruction store that sits on top of the model.

When you swap claude-sonnet-4-6 for gpt-4o-mini, your Skill and Chain logic stays the same. But if your agent carries persistent memory (instructions, policies, preferences), the authority structure of that memory may not transfer cleanly. An instruction that Claude implicitly treats as a hard constraint may need to be stated differently to carry the same weight with GPT-4o. The abstraction handles provider differences at the API level. Memory authority lives above the API.

I've been running experiments on this in agent memory stores: retrieval systems that optimize for query relevance rather than for what a memory is authorized to govern. The finding is that provider-swapping at the model layer is not fully decoupled from the memory layer, because different models weight retrieved instructions differently.

That is not a criticism of aichain; it is a problem that lives above what any provider-abstraction library should own. But it is a real source of behavior divergence when you swap models.

To add one criterion to Harjot's list: does the library expose what was retrieved and why, not just what was called and returned? Inspectable control flow is the right goal. Inspectable memory retrieval is the harder version of the same problem.

YAIT • Jun 1

Oh, yeah! I mostly agree, with one caveat. Of course, you can't blindly swap Claude for GPT, or vice versa. But if GPT was the best yesterday and Claude is the best today, and cheaper too, I want to be able to compare quality, memory, patterns, and so on. And if the swap doesn't break my pipeline, product, or agent, and actually makes it better and cheaper, then I'd like to change one line.

Secondly, if I want to find the model that suits my specific task, rather than settle for a skill that behaves differently from model to model, that's exactly what the library is for. It stores settings for the models where the skill has proven itself best.

Self-Correcting Systems • Jun 2

Totally fair. I think the key distinction is between model portability and behavior portability.

Changing one line to swap GPT/Claude/Gemini is valuable if the surrounding pipeline can prove the behavior still holds. I’m not against that. I actually think that’s the right direction.

Where I get cautious is when “the skill works on Model A” quietly becomes “the skill is model-independent.” In practice, models differ in memory behavior, refusal style, tool-use habits, formatting discipline, latency, cost, and how they handle ambiguous authority. So the library storing model-specific settings makes sense to me, especially if those settings are backed by task-level evidence.

The strongest version, in my view, is:

skill definition
model-specific config
evaluation trace for that skill/model pair
known failure modes
cost/latency profile
easy swap only after the replacement passes the same task checks

Then swapping one line is not blind abstraction. It’s evidence-backed routing. That’s the part I’m most interested in.

Harjot Singh • May 31

Why another LLM library is the honest question to lead with, and the answer that justifies a new one is usually a sharper opinion, not more features. The crowded libraries (LangChain et al.) try to do everything and end up as leaky abstractions that hide the decisions that actually matter. A new library earns its place by taking a strong stance: thin where you want control, opinionated where the right default is clear, and transparent so you can see what it's doing rather than fighting a black box. The thing I'd want to know about aichain is what it refuses to do, because in this space restraint is the feature. A library that exposes the loop, the tool calls, and the context plainly beats one that buries them under magic, since when an agent misbehaves you need to reach the wiring. The two design choices I'd judge it on: does it keep the control flow inspectable and in my hands (vs. hidden in a framework's runtime), and does it bake in the reliability primitives (bounded loops, validation hooks, retries) as defaults rather than leaving them to me? Be opinionated about the right defaults, transparent about the mechanics. That thin-and-inspectable-beats-magic instinct is core to how I think about agent tooling in Moonshift. What's aichain's core opinion, the thing it deliberately does differently from the existing libraries?

YAIT • Jun 1

A primitive is a sentence, not a system. Model, Skill, Chain, Pool, Agent — each is one Python object you can hold in your head. No DAGs, no executors, no graph compilers, no runtime engines. The control flow is your Python code.

What it refuses to do:

No hidden runtime. When you call chain.run() you get a for-loop calling skill.run() calling model.client._post(). You can step through it in a debugger. There is no scheduler, no message bus, no event loop you didn't write.
No magic prompt rewriting. Your prompt is your prompt. The library doesn't inject system instructions, doesn't reformat your messages "for the provider," doesn't add tool-use scaffolding behind your back.
No string-typed abstractions over the model APIs. When OpenAI calls it reasoning_effort and Anthropic calls it budget_tokens, aichain maps reasoning="high" to both — but the translation is in a 20-line table you can read, not a plugin system.
No "memory" primitive. Memory is whatever Python data structure you want. Lists, dicts, your VectorDB, a file. The library doesn't get an opinion on your state.
No DSL. No YAML pipeline DSL, no graph-builder API. YAML save/load is for persistence of an already-built object, not for declaring pipelines.
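To give a feel for the translation-table idea, here is a rough sketch. It is illustrative only: the token budgets are placeholder numbers, the `translate_reasoning` name is made up for this comment, and the real table in the library may look different.

```python
# Hypothetical translation table: one neutral "reasoning" level mapped to
# provider-specific request parameters. Budget numbers are placeholders.
REASONING_TABLE = {
    "openai": {
        "low": {"reasoning_effort": "low"},
        "medium": {"reasoning_effort": "medium"},
        "high": {"reasoning_effort": "high"},
    },
    "anthropic": {
        "low": {"thinking": {"type": "enabled", "budget_tokens": 1024}},
        "medium": {"thinking": {"type": "enabled", "budget_tokens": 4096}},
        "high": {"thinking": {"type": "enabled", "budget_tokens": 16384}},
    },
}


def translate_reasoning(provider: str, level: str) -> dict:
    """Look up the provider-specific parameters for a neutral level."""
    try:
        return REASONING_TABLE[provider][level]
    except KeyError:
        raise ValueError(f"No mapping for {provider!r} / {level!r}") from None


print(translate_reasoning("openai", "high"))  # {'reasoning_effort': 'high'}
```

The point of keeping it as plain data is that a reader can check every mapping at a glance, and an unsupported combination fails with a clear error instead of being silently dropped.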

On the two design choices you'd judge it on:

Does it keep the control flow inspectable and in my hands?

Yes - because there is no separate control flow. Chain.run() is 50 lines of Python; Pool.run() is a ThreadPoolExecutor you could write yourself; Agent.run() is a for step in range(max_steps): plan → act → reflect. The composition is the source code.
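For a sense of scale, the two patterns named above fit in a handful of lines each. This is a sketch of the general shapes, not the library's source: `run_pool` and `run_agent` are hypothetical names, the tasks are stand-in callables, and the plan/act/reflect cycle is collapsed into a single `step_fn`.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pool(tasks, max_workers=4):
    """Run zero-argument callables in parallel, returning results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(task) for task in tasks]
        return [future.result() for future in futures]


def run_agent(step_fn, max_steps=5):
    """Call step_fn(step) until it returns a non-None answer or steps run out."""
    for step in range(max_steps):
        answer = step_fn(step)
        if answer is not None:
            return answer
    raise RuntimeError(f"No answer after {max_steps} steps")


# Stand-in callables so the sketch runs without any API keys.
results = run_pool([lambda: "a", lambda: "b", lambda: "c"])
print(results)  # ['a', 'b', 'c']

print(run_agent(lambda step: "done" if step == 2 else None))  # done
```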

Does it bake in reliability primitives as defaults?

Yes, where the right default is clear (bounded loops, typed errors, retries on transient HTTP failures), and opt-in where the right default depends on your taste (validators, custom retry policies, model fallback chains). The line is "make the safe choice automatic, but never silently."
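As an illustration of the "retry on transient failures, but never silently" idea, here is a generic sketch. `TransientError`, `with_retries`, and the delay values are assumptions made up for this example; they are not aichain's error types or defaults.

```python
import time


class TransientError(Exception):
    """Stand-in for a retryable failure such as an HTTP 429 or 503."""


def with_retries(call, max_attempts=3, base_delay=0.01):
    """Retry call() on TransientError with exponential backoff, then re-raise."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            time.sleep(base_delay * 2 ** (attempt - 1))


attempts = {"count": 0}


def flaky():
    """Fail twice, then succeed, to exercise the retry path."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("temporary failure")
    return "ok"


print(with_retries(flaky))  # ok
```

Only the error type marked as transient is retried; anything else propagates immediately, and the final failure is re-raised rather than swallowed.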

The one-sentence version:

aichain is a thin set of composable primitives that lets you read the for-loop. The library's job is to translate between providers and provide reliable defaults – your code is the pipeline.

If you want a runtime that owns the control flow, use LangGraph or CrewAI. If you want a library that gets out of your way, aichain.