Lynkr

Posted on Jun 30

I Built an LLM Gateway That Extends Claude Pro/Max Users with Azure AI Foundry, Amazon Bedrock, Local Models

#ai #opensource #claudecode #aws

AI coding tools have gotten very good.

But the infrastructure behind them is still weirdly inefficient.

Most tools assume one provider, one lane, one billing path.

That means the same expensive model or subscription ends up handling everything:

reading files
summarizing logs
quick repo questions
multi-file refactors
architecture planning
long debugging sessions

That is the wrong abstraction.

A coding workflow is not one type of problem. So it should not be forced through one type of model path.

That idea is what pushed me to build Lynkr.

Lynkr is an open-source LLM gateway for AI coding tools that lets me combine:

Claude Pro/Max subscription access
Azure AI Foundry-hosted models
Amazon Bedrock-hosted models
and local/free models like Ollama

behind one routing layer.

The problem with single-lane AI coding

If you use a premium coding assistant every day, you have probably seen this already.

A lot of the workload is not actually premium reasoning work.

For example:

"open this file"
"search for auth middleware"
"summarize this module"
"show me where this class is used"
"read these test failures"

These are useful requests, but they are not the same as:

"refactor this subsystem"
"design a safer auth flow"
"debug this multi-step failure"
"trace this agent loop bug"
"rewrite this implementation across five files"

Yet most tools send both classes of work through the same expensive path.

That creates three problems:

1) You waste premium capacity

If a subscription-backed or premium model handles every tiny prompt, you burn good capacity on low-value tasks.

2) You stay locked into one provider

Even if you already have access to Azure, AWS, or local models, your coding workflow is often tied to one vendor path.

3) You lose resilience

If one provider is rate-limited, degraded, or just not the best fit for a task, you have no routing layer to adjust.

The idea behind Lynkr

Lynkr sits between AI coding tools and model providers.

It works as an LLM gateway, which means the coding tool talks to Lynkr, and Lynkr decides what to do next.

That lets the gateway:

route by complexity
compress bulky tool outputs
cache repeated requests
switch providers without changing the client workflow
use different backends for different classes of tasks

The part I am most excited about is hybrid routing across:

Claude Pro/Max
Azure AI Foundry
Amazon Bedrock

What "extending Claude Pro/Max" means

The simplest version looks like this:

simple tasks → local/free model
hard coding tasks → Claude Pro/Max subscription
enterprise workloads → Azure AI Foundry
fallback or alternate routing → Amazon Bedrock

So instead of replacing Claude, Azure, or Bedrock, the gateway combines them.

This is the key idea: extend your Claude Pro/Max usage instead of burning it on everything.

Example workflow

Imagine a coding session that looks like this:

"Read the auth middleware and summarize it."

Route to a cheap local model.
"Search all routes that call this helper."

Still cheap/local.
"Refactor this auth flow to support tenant isolation."

Route to Claude Pro/Max.
"Generate an enterprise-safe variant for our internal stack."

Route to Azure AI Foundry.
"Azure is unavailable or rate-limited."

Fallback to Bedrock.

That is a much more natural way to run coding agents than pretending every prompt deserves the same model path.

Why Claude Pro/Max + Azure + Bedrock is interesting

This combination matters because each lane solves a different problem.

Claude Pro/Max

Great for high-quality coding and reasoning tasks where you already have subscription value.

Azure AI Foundry

Useful when a team wants enterprise-hosted models, internal approvals, or Azure-aligned infrastructure.

Amazon Bedrock

Useful for AWS-native orgs, alternate model access, or fallback when you want another enterprise provider path.

Local models

Useful for cheap, frequent, low-stakes tasks that should not consume premium capacity at all.

Putting these together in one gateway gives you a better operational model than any one of them alone.

Why this matters for coding agents specifically

I think coding is one of the best use cases for an LLM gateway because coding workflows are:

tool-heavy
repetitive
multi-step
full of structured outputs
sensitive to token waste
often spread across many turns

That means a gateway can add value in several ways.

1) Complexity-based routing

Not every prompt deserves the same model.

2) Cost control

Cheap requests stay cheap.

3) Better use of subscriptions

Premium capacity gets reserved for tasks that actually need it.

4) Enterprise compatibility

Teams can use Azure AI Foundry or Bedrock where policy or procurement matters.

5) Resilience

If one provider path fails, the workflow can continue.

Where MCP and agent workflows fit in

Another reason this matters is MCP and agentic tooling.

As coding tools become more agentic, they use more:

tool schemas
file reads
command outputs
structured results
long multi-turn sessions

That creates a lot of overhead and a lot of repeated context.

A gateway is the right place to optimize that.

That is also why I think the future is not just better models.

It is better routing, caching, tool handling, and workload separation around those models.

What I wanted Lynkr to do

I did not want just another OpenAI-compatible endpoint.

I wanted a gateway that could actually help with real coding economics and workflow design.

For me, that means:

keeping the coding tool workflow the same
preserving subscription value
combining subscription + cloud + local lanes
supporting enterprise backends
reducing waste on easy tasks

Who this is for

I think this is especially useful for:

Claude Code users who want more mileage from Pro/Max
teams using Azure AI Foundry for approved enterprise model access
AWS teams already standardizing on Bedrock
developers mixing local models with premium coding assistants
MCP and agent workflow builders who need an LLM gateway

Final thought

I do not think the next big improvement in AI coding comes only from stronger base models.

A lot of value will come from better infrastructure around them:

better routing
better caching
better cost control
better tool handling
better use of multiple model lanes in one workflow

That is the direction I am building toward with Lynkr.

GitHub: https://github.com/Fast-Editor/Lynkr
Ps:- This is fully following Anthropic TOS because lynkr wraps around your existing claude code

DEV Community