<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vishal VeeraReddy</title>
    <description>The latest articles on DEV Community by Vishal VeeraReddy (@vishal_veerareddy_9cdd17d).</description>
    <link>https://dev.to/vishal_veerareddy_9cdd17d</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3645387%2F794ced23-25c9-41ed-863a-401839a48d59.png</url>
      <title>DEV Community: Vishal VeeraReddy</title>
      <link>https://dev.to/vishal_veerareddy_9cdd17d</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vishal_veerareddy_9cdd17d"/>
    <language>en</language>
    <item>
      <title>One npm Install That Makes Every AI Coding Tool Work With Every LLM Provider</title>
      <dc:creator>Vishal VeeraReddy</dc:creator>
      <pubDate>Thu, 16 Apr 2026 01:27:01 +0000</pubDate>
      <link>https://dev.to/vishal_veerareddy_9cdd17d/one-npm-install-that-makes-every-ai-coding-tool-work-with-every-llm-provider-4c7o</link>
      <guid>https://dev.to/vishal_veerareddy_9cdd17d/one-npm-install-that-makes-every-ai-coding-tool-work-with-every-llm-provider-4c7o</guid>
      <description>&lt;p&gt;Quick question: how many API keys are in your &lt;code&gt;.env&lt;/code&gt; right now just for AI coding tools?&lt;/p&gt;

&lt;p&gt;If you use Claude Code (Anthropic key), Codex (OpenAI key), and Cursor (another OpenAI key) — that's three providers, three billing accounts, three rate limit systems, zero flexibility.&lt;/p&gt;

&lt;p&gt;I built Lynkr to collapse all of that into one proxy.&lt;/p&gt;

&lt;h3&gt;
  
  
  What It Does
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code ──┐
Codex CLI ────┤
Cursor ───────┤──→ Lynkr (localhost:8081) ──→ Any LLM Provider
Cline ────────┤
Continue ─────┤
LangChain ────┤
Vercel AI ────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lynkr auto-detects which tool is connecting and speaks its language:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic Messages API&lt;/strong&gt; for Claude Code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI Responses API&lt;/strong&gt; for Codex CLI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI Chat Completions&lt;/strong&gt; for everything else (Cursor, Cline, Continue, KiloCode, LangChain, Vercel AI SDK, any OpenAI-compatible client)&lt;/li&gt;
&lt;/ul&gt;
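&lt;p&gt;You can see the detection from the command line: the same proxy answers both dialects. A minimal sketch, assuming the default port; &lt;code&gt;"model": "auto"&lt;/code&gt; is the complexity-routed model ID used in the LangChain example below:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Anthropic Messages format (what Claude Code speaks)
curl http://localhost:8081/v1/messages \
  -H "content-type: application/json" \
  -d '{"model": "auto", "max_tokens": 64, "messages": [{"role": "user", "content": "hi"}]}'

# OpenAI Chat Completions format (what Cursor, Cline, and friends speak)
curl http://localhost:8081/v1/chat/completions \
  -H "content-type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "hi"}]}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Same port, two protocols; Lynkr replies in whichever dialect it was asked in.&lt;/p&gt;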

&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; lynkr
lynkr start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then configure each tool to point at Lynkr:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Claude Code&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:8081

&lt;span class="c"&gt;# Codex CLI (~/.codex/config.toml)&lt;/span&gt;
&lt;span class="c"&gt;# base_url = "http://localhost:8081/v1"&lt;/span&gt;

&lt;span class="c"&gt;# Cursor&lt;/span&gt;
&lt;span class="c"&gt;# Settings → Models → Base URL: http://localhost:8081/v1&lt;/span&gt;

&lt;span class="c"&gt;# LangChain&lt;/span&gt;
&lt;span class="c"&gt;# ChatOpenAI(base_url="http://localhost:8081/v1", api_key="sk-lynkr")&lt;/span&gt;

&lt;span class="c"&gt;# Literally any OpenAI-compatible tool&lt;/span&gt;
&lt;span class="c"&gt;# OPENAI_BASE_URL=http://localhost:8081/v1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All of them hit the same Lynkr instance. Same provider. Same routing. Same optimization.&lt;/p&gt;

&lt;h3&gt;
  
  
  12+ Backends
&lt;/h3&gt;

&lt;p&gt;Pick your provider:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Free (local)&lt;/span&gt;
&lt;span class="nv"&gt;MODEL_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ollama

&lt;span class="c"&gt;# Cheap cloud&lt;/span&gt;
&lt;span class="nv"&gt;MODEL_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;openrouter    &lt;span class="c"&gt;# 100+ models&lt;/span&gt;
&lt;span class="nv"&gt;MODEL_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;deepseek      &lt;span class="c"&gt;# 1/10 Anthropic cost&lt;/span&gt;
&lt;span class="nv"&gt;MODEL_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;zai           &lt;span class="c"&gt;# 1/7 Anthropic cost&lt;/span&gt;

&lt;span class="c"&gt;# Enterprise cloud&lt;/span&gt;
&lt;span class="nv"&gt;MODEL_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;bedrock       &lt;span class="c"&gt;# AWS, 100+ models&lt;/span&gt;
&lt;span class="nv"&gt;MODEL_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;vertex        &lt;span class="c"&gt;# Google, Gemini 2.5&lt;/span&gt;
&lt;span class="nv"&gt;MODEL_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;databricks    &lt;span class="c"&gt;# Claude Opus 4.6&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or mix them across complexity tiers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;TIER_SIMPLE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ollama:qwen2.5-coder
&lt;span class="nv"&gt;TIER_MEDIUM&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;openrouter:deepseek-r1
&lt;span class="nv"&gt;TIER_COMPLEX&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;databricks:claude-sonnet-4-5
&lt;span class="nv"&gt;TIER_REASONING&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;vertex:gemini-2.5-pro
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Simple requests (rename a variable) → free local model.&lt;br&gt;
Complex requests (refactor auth across 23 files) → top-tier cloud model.&lt;/p&gt;

&lt;p&gt;The routing engine makes this decision automatically using 5-phase complexity analysis — including Graphify, which reads your actual codebase AST across 19 languages to detect high-risk changes.&lt;/p&gt;
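&lt;p&gt;You can watch it decide. A quick sketch, assuming the default port (the stats endpoint shows up again in the enterprise section below, and the exact response fields may vary by version):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# a trivial request and a heavier one
curl -s http://localhost:8081/v1/chat/completions \
  -H "content-type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "rename variable x to y"}]}'

curl -s http://localhost:8081/v1/chat/completions \
  -H "content-type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "refactor auth across the whole repo"}]}'

# then see which tier each one landed on
curl http://localhost:8081/v1/routing/stats
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;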
&lt;h3&gt;
  
  
  For Agent Builders: LangChain, CrewAI, AutoGen
&lt;/h3&gt;

&lt;p&gt;This is where Lynkr shines for automation. If you're building agents that make hundreds of LLM calls per pipeline run, most of those calls are simple (read a file, parse JSON, format output). Only a few require deep reasoning.&lt;/p&gt;

&lt;p&gt;Without Lynkr: every call hits GPT-4o at $15/MTok. 200 calls × $0.03 = $6/run.&lt;/p&gt;

&lt;p&gt;With Lynkr: 140 calls hit free Ollama, 40 hit OpenRouter ($0.005 each), 20 hit Databricks ($0.02 each). Total: $0.60/run. 90% savings.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Nothing changes in your agent code
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;

&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8081/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-lynkr&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# Lynkr routes based on complexity
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Your existing chains, agents, and tools work unchanged
&lt;/span&gt;&lt;span class="n"&gt;agent_executor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Refactor the payment module&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Token Compression Stack
&lt;/h3&gt;

&lt;p&gt;On top of routing, every request passes through 7 optimization phases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Smart tool selection&lt;/strong&gt; — only relevant tools sent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code Mode&lt;/strong&gt; — 100+ tool defs → 4 meta-tools (96% reduction, saves 16,800 tokens/request)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Distill&lt;/strong&gt; — delta rendering via Jaccard similarity (60-80% savings)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt cache&lt;/strong&gt; — SHA-256 keyed LRU&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory dedup&lt;/strong&gt; — removes repeated context across turns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;History compression&lt;/strong&gt; — sliding window with structural dedup&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Headroom sidecar&lt;/strong&gt; — optional ML compression (47-92%)&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Enterprise: Circuit Breakers, Telemetry, Hot-Reload
&lt;/h3&gt;

&lt;p&gt;For teams running this in production:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Health check&lt;/span&gt;
curl http://localhost:8081/health

&lt;span class="c"&gt;# List all providers and models&lt;/span&gt;
curl http://localhost:8081/v1/providers
curl http://localhost:8081/v1/models

&lt;span class="c"&gt;# Routing analytics&lt;/span&gt;
curl http://localhost:8081/v1/routing/stats
curl http://localhost:8081/v1/routing/accuracy

&lt;span class="c"&gt;# Change config without restart&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8081/v1/admin/reload

&lt;span class="c"&gt;# Prometheus metrics&lt;/span&gt;
curl http://localhost:8081/metrics
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Circuit breakers auto-detect provider failures. After 5 failed requests, incoming calls fail instantly instead of timing out. Half-open probes test recovery every 60 seconds. When 2 probes succeed, traffic resumes. No manual intervention.&lt;/p&gt;
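&lt;p&gt;If you want to watch a breaker recover, one crude check is to poll the health endpoint on the same cadence as the probes. A sketch (the exact health payload depends on your Lynkr version):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# poll once per probe interval and watch traffic resume after recovery
while true; do
  curl -s http://localhost:8081/health
  echo
  sleep 60
done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;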

&lt;h3&gt;
  
  
  Get Started
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; lynkr &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; lynkr start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;699 tests. Apache 2.0. Node.js only. Zero infrastructure.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;github.com/Fast-Editor/Lynkr&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're managing multiple AI coding tools or building LLM-powered agents, Lynkr consolidates everything into one proxy with intelligent routing and real cost savings.&lt;/p&gt;

&lt;p&gt;Star it if it helps. PRs welcome.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Run OpenClaw/Clawdbot for FREE with Lynkr (No API Bills)</title>
      <dc:creator>Vishal VeeraReddy</dc:creator>
      <pubDate>Sun, 01 Feb 2026 02:00:41 +0000</pubDate>
      <link>https://dev.to/vishal_veerareddy_9cdd17d/run-openclawclawdbot-for-free-with-lynkr-no-api-bills-3kg2</link>
      <guid>https://dev.to/vishal_veerareddy_9cdd17d/run-openclawclawdbot-for-free-with-lynkr-no-api-bills-3kg2</guid>
      <description>&lt;p&gt;&lt;em&gt;Your personal AI assistant running 24/7 — without burning through API credits&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;If you've tried &lt;a href="https://openclaw.ai" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; (also known as Clawdbot), you know it's incredible. An AI assistant that lives in WhatsApp/Telegram, manages your calendar, clears your inbox, checks you in for flights — all while you chat naturally.&lt;/p&gt;

&lt;p&gt;But there's a catch: &lt;strong&gt;it needs an LLM backend&lt;/strong&gt;, and Anthropic API bills add up fast.&lt;/p&gt;

&lt;p&gt;What if I told you that you can run OpenClaw &lt;strong&gt;completely free&lt;/strong&gt; using local models? Enter &lt;strong&gt;Lynkr&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  🔗 What is Lynkr?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;Lynkr&lt;/a&gt; is a universal LLM proxy that lets you route OpenClaw requests to &lt;strong&gt;any model provider&lt;/strong&gt; — including free local models via Ollama.&lt;/p&gt;

&lt;p&gt;The magic? OpenClaw thinks it's talking to Anthropic, but Lynkr transparently routes requests to your local GPU instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  💡 Why This Matters
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem with direct Anthropic API:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;💸 Bills explode quickly (OpenClaw runs 24/7)&lt;/li&gt;
&lt;li&gt;⚠️ Potential ToS concerns with automated assistants&lt;/li&gt;
&lt;li&gt;🔒 Your data goes to external servers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;With Lynkr + Ollama:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ &lt;strong&gt;$0/month&lt;/strong&gt; — runs entirely on your machine&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;ToS compliant&lt;/strong&gt; — no API abuse concerns&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;100% private&lt;/strong&gt; — data never leaves your computer&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Smart fallback&lt;/strong&gt; — route to cloud only when needed&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🚀 Setup Guide (15 minutes)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Install Ollama
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# macOS/Linux&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh

&lt;span class="c"&gt;# Pull Kimi K2.5 (recommended for coding/assistant tasks)&lt;/span&gt;
ollama pull kimi-k2.5

&lt;span class="c"&gt;# Also grab an embeddings model for semantic search&lt;/span&gt;
ollama pull nomic-embed-text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Install Lynkr
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Option A: NPM (recommended)&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; lynkr

&lt;span class="c"&gt;# Option B: Clone repo&lt;/span&gt;
git clone https://github.com/Fast-Editor/Lynkr.git
&lt;span class="nb"&gt;cd &lt;/span&gt;Lynkr
npm &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Configure Lynkr
&lt;/h3&gt;

&lt;p&gt;Create your &lt;code&gt;.env&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Copy example config&lt;/span&gt;
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Edit &lt;code&gt;.env&lt;/code&gt; with these settings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Primary provider: Ollama (FREE, local)&lt;/span&gt;
&lt;span class="nv"&gt;MODEL_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ollama
&lt;span class="nv"&gt;OLLAMA_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;kimi-k2.5
&lt;span class="nv"&gt;OLLAMA_ENDPOINT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:11434

&lt;span class="c"&gt;# Enable hybrid routing (local first, cloud fallback)&lt;/span&gt;
&lt;span class="nv"&gt;PREFER_OLLAMA&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true
&lt;/span&gt;&lt;span class="nv"&gt;OLLAMA_MAX_TOOLS_FOR_ROUTING&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3

&lt;span class="c"&gt;# Fallback provider (optional - for complex requests)&lt;/span&gt;
&lt;span class="nv"&gt;FALLBACK_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true
&lt;/span&gt;&lt;span class="nv"&gt;FALLBACK_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;openrouter
&lt;span class="nv"&gt;OPENROUTER_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-or-v1-your-key  &lt;span class="c"&gt;# Only needed if using fallback&lt;/span&gt;

&lt;span class="c"&gt;# Embeddings for semantic search&lt;/span&gt;
&lt;span class="nv"&gt;OLLAMA_EMBEDDINGS_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;nomic-embed-text

&lt;span class="c"&gt;# Token optimization (60-80% cost reduction on cloud fallback)&lt;/span&gt;
&lt;span class="nv"&gt;TOKEN_TRACKING_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true
&lt;/span&gt;&lt;span class="nv"&gt;TOOL_TRUNCATION_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true
&lt;/span&gt;&lt;span class="nv"&gt;HISTORY_COMPRESSION_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: Start Lynkr
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# If installed via npm&lt;/span&gt;
lynkr

&lt;span class="c"&gt;# If cloned repo&lt;/span&gt;
npm start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🚀 Lynkr proxy running on http://localhost:8081
📊 Provider: ollama (kimi-k2.5)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 5: Configure OpenClaw/Clawdbot
&lt;/h3&gt;

&lt;p&gt;In your OpenClaw configuration, set:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setting&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model/auth provider&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Copilot&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Copilot auth method&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Copilot Proxy (local)&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Copilot Proxy base URL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;http://localhost:8081/v1&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model ID&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;kimi-k2.5&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That's it! Your OpenClaw now runs through Lynkr → Ollama → Kimi K2.5, completely free.&lt;/p&gt;
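&lt;p&gt;Before handing it your inbox, it's worth a quick sanity check that the proxy is actually serving your local model. Assuming the default port from Step 4 (output shape may vary):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# should list kimi-k2.5 among the available models
curl http://localhost:8081/v1/models
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;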

&lt;h2&gt;
  
  
  ⚡ How Hierarchical Routing Works
&lt;/h2&gt;

&lt;p&gt;The killer feature is &lt;strong&gt;smart routing&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OpenClaw Request
       ↓
   Is it simple?
    /        \
  Yes         No
   ↓           ↓
Ollama     Cloud Fallback
(FREE)     (with caching)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lynkr analyzes each request:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Simple requests&lt;/strong&gt; (&amp;lt; 3 tools) → Ollama (free)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex requests&lt;/strong&gt; → Cloud fallback (with heavy caching/compression)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means even if you enable cloud fallback, you'll use it sparingly.&lt;/p&gt;
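&lt;p&gt;An easy way to confirm a simple request stays local: send one through the proxy, then check that Ollama did the work. A sketch, assuming the setup above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# a simple, tool-free request should route to Ollama
curl -s http://localhost:8081/v1/chat/completions \
  -H "content-type: application/json" \
  -d '{"model": "kimi-k2.5", "messages": [{"role": "user", "content": "say hi"}]}'

# the local model should show recent activity
ollama ps
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;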

&lt;h2&gt;
  
  
  💰 Cost Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setup&lt;/th&gt;
&lt;th&gt;Monthly Cost&lt;/th&gt;
&lt;th&gt;Privacy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Direct Anthropic API&lt;/td&gt;
&lt;td&gt;$100-300+&lt;/td&gt;
&lt;td&gt;❌ Cloud&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lynkr + Ollama only&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ 100% Local&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lynkr + Hybrid routing&lt;/td&gt;
&lt;td&gt;~$5-15&lt;/td&gt;
&lt;td&gt;✅ Mostly Local&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  🔒 Why This is ToS-Safe
&lt;/h2&gt;

&lt;p&gt;Running OpenClaw directly against Anthropic's API at scale can raise ToS concerns (automated usage, high volume, etc.).&lt;/p&gt;

&lt;p&gt;With Lynkr:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Local models&lt;/strong&gt; = no external API terms apply&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Your hardware&lt;/strong&gt; = your rules&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fallback is minimal&lt;/strong&gt; = within normal usage patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🧠 Advanced: Memory &amp;amp; Compression
&lt;/h2&gt;

&lt;p&gt;Lynkr includes enterprise features that further reduce costs:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Long-Term Memory (Titans-inspired):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;MEMORY_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true
&lt;/span&gt;&lt;span class="nv"&gt;MEMORY_RETRIEVAL_LIMIT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5
&lt;span class="nv"&gt;MEMORY_SURPRISE_THRESHOLD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0.3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Headroom Compression (47-92% token reduction):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;HEADROOM_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true
&lt;/span&gt;&lt;span class="nv"&gt;HEADROOM_SMART_CRUSHER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true
&lt;/span&gt;&lt;span class="nv"&gt;HEADROOM_CACHE_ALIGNER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These features mean even when you hit cloud fallback, you're using far fewer tokens.&lt;/p&gt;

&lt;h2&gt;
  
  
  🎯 Recommended Models
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Ollama Model&lt;/th&gt;
&lt;th&gt;Pull Command&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;General Assistant&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;kimi-k2.5&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ollama pull kimi-k2.5&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Coding Tasks&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;qwen2.5-coder:latest&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ollama pull qwen2.5-coder:latest&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fast/Light&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;llama3.2:3b&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ollama pull llama3.2:3b&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Embeddings&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;nomic-embed-text&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ollama pull nomic-embed-text&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  🏃 TL;DR
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh
ollama pull kimi-k2.5
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; lynkr

&lt;span class="c"&gt;# Configure (.env)&lt;/span&gt;
&lt;span class="nv"&gt;MODEL_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ollama
&lt;span class="nv"&gt;OLLAMA_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;kimi-k2.5
&lt;span class="nv"&gt;PREFER_OLLAMA&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;

&lt;span class="c"&gt;# Run&lt;/span&gt;
lynkr

&lt;span class="c"&gt;# Point OpenClaw to http://localhost:8081/v1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; OpenClaw running 24/7, $0/month, 100% private.&lt;/p&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;⭐ &lt;strong&gt;&lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;Lynkr on GitHub&lt;/a&gt;&lt;/strong&gt; — Star if this helped!&lt;/li&gt;
&lt;li&gt;📚 &lt;strong&gt;&lt;a href="https://deepwiki.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;Lynkr Documentation&lt;/a&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;🦀 &lt;strong&gt;&lt;a href="https://openclaw.ai" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt;&lt;/strong&gt; — The AI assistant&lt;/li&gt;
&lt;li&gt;🦙 &lt;strong&gt;&lt;a href="https://ollama.ai" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt;&lt;/strong&gt; — Local LLM runtime&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Questions? Drop a comment below or join the &lt;a href="https://discord.gg/openclaw" rel="noopener noreferrer"&gt;OpenClaw Discord&lt;/a&gt;!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
    <item>
      <title>How I Cut My AI Coding Tool Costs by 70% (And You Can Too)</title>
      <dc:creator>Vishal VeeraReddy</dc:creator>
      <pubDate>Sun, 01 Feb 2026 01:45:11 +0000</pubDate>
      <link>https://dev.to/vishal_veerareddy_9cdd17d/how-i-cut-my-ai-coding-tool-costs-by-70-and-you-can-too-ol0</link>
      <guid>https://dev.to/vishal_veerareddy_9cdd17d/how-i-cut-my-ai-coding-tool-costs-by-70-and-you-can-too-ol0</guid>
      <description>&lt;p&gt;&lt;em&gt;Run Cursor, Claude Code, Cline, and more on ANY LLM — including free local models&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;If you're like me, you've probably fallen in love with AI coding assistants. Tools like &lt;strong&gt;Cursor&lt;/strong&gt;, &lt;strong&gt;Claude Code CLI&lt;/strong&gt;, &lt;strong&gt;Cline&lt;/strong&gt;, and &lt;strong&gt;OpenClaw/Clawdbot&lt;/strong&gt; have genuinely transformed how I write code. But there's a catch — they're expensive.&lt;/p&gt;

&lt;p&gt;Between API costs and subscription fees, I was burning through $100-300/month just on AI coding tools. That's when I built &lt;strong&gt;Lynkr&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  🔗 What is Lynkr?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;Lynkr&lt;/a&gt; is an open-source universal LLM proxy that lets you run your favorite AI coding tools on &lt;strong&gt;any model provider&lt;/strong&gt; — including completely free local models via Ollama.&lt;/p&gt;

&lt;p&gt;Think of it as a universal adapter. Your tools think they're talking to their native API, but Lynkr transparently routes requests to whatever backend you choose.&lt;/p&gt;

&lt;h2&gt;
  
  
  💡 The Problem Lynkr Solves
&lt;/h2&gt;

&lt;p&gt;Here's what frustrates developers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Vendor lock-in&lt;/strong&gt; — Cursor only works with OpenAI/Anthropic. Claude Code CLI only works with Anthropic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Expensive APIs&lt;/strong&gt; — Claude API costs add up fast, especially for heavy coding sessions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No local option&lt;/strong&gt; — Want to use your RTX 4090 for coding assistance? Too bad.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise restrictions&lt;/strong&gt; — Many companies can't send code to external APIs&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Lynkr fixes all of this.&lt;/p&gt;

&lt;h2&gt;
  
  
  🏗️ How It Works
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────┐     ┌─────────┐     ┌──────────────────┐
│ Cursor      │     │         │     │ Ollama (local)   │
│ Claude Code │────▶│  Lynkr  │────▶│ AWS Bedrock      │
│ Cline       │     │  Proxy  │     │ Azure OpenAI     │
│ OpenClaw    │     │         │     │ OpenRouter       │
└─────────────┘     └─────────┘     └──────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lynkr acts as a drop-in replacement for the Anthropic API. It:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Receives requests from your AI coding tool&lt;/li&gt;
&lt;li&gt;Translates them to your target provider's format&lt;/li&gt;
&lt;li&gt;Streams responses back seamlessly&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Your tools don't know the difference.&lt;/p&gt;
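&lt;p&gt;You can exercise that translation directly. A minimal sketch, assuming Lynkr's default port: the request is plain Anthropic Messages format, but the answer comes from whichever backend you configured.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:8081/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-proxy",
    "max_tokens": 128,
    "messages": [{"role": "user", "content": "Explain what a circuit breaker does"}]
  }'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;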

&lt;h2&gt;
  
  
  🚀 Supported Providers
&lt;/h2&gt;

&lt;p&gt;Lynkr supports &lt;strong&gt;12+ providers&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ollama&lt;/strong&gt; - 100% local, FREE&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Bedrock&lt;/strong&gt; - Enterprise-grade, ~60% cheaper&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Azure OpenAI&lt;/strong&gt; - Enterprise-grade&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Azure Anthropic&lt;/strong&gt; - Claude on Azure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenRouter&lt;/strong&gt; - 100+ models via single API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI&lt;/strong&gt; - Direct GPT access&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Vertex AI&lt;/strong&gt; - Gemini models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Databricks&lt;/strong&gt; - Enterprise ML platform&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Z.AI (Zhipu)&lt;/strong&gt; - ~1/7 cost of Anthropic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LM Studio&lt;/strong&gt; - Local models with GUI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;llama.cpp&lt;/strong&gt; - Local GGUF models&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  📦 Quick Start (5 minutes)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Option 1: Run locally with Ollama (FREE)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Ollama&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh

&lt;span class="c"&gt;# Pull a coding model&lt;/span&gt;
ollama pull qwen2.5-coder:latest

&lt;span class="c"&gt;# Clone and configure Lynkr&lt;/span&gt;
git clone https://github.com/Fast-Editor/Lynkr.git
&lt;span class="nb"&gt;cd &lt;/span&gt;Lynkr
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env

&lt;span class="c"&gt;# Edit .env:&lt;/span&gt;
&lt;span class="nv"&gt;MODEL_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ollama
&lt;span class="nv"&gt;OLLAMA_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;qwen2.5-coder:latest
&lt;span class="nv"&gt;OLLAMA_ENDPOINT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:11434

&lt;span class="c"&gt;# Start&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Option 2: Use with AWS Bedrock
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone and configure&lt;/span&gt;
git clone https://github.com/Fast-Editor/Lynkr.git
&lt;span class="nb"&gt;cd &lt;/span&gt;Lynkr
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env

&lt;span class="c"&gt;# Edit .env:&lt;/span&gt;
&lt;span class="nv"&gt;MODEL_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;bedrock
&lt;span class="nv"&gt;AWS_BEDROCK_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-bedrock-api-key
&lt;span class="nv"&gt;AWS_BEDROCK_REGION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-east-1
&lt;span class="nv"&gt;AWS_BEDROCK_MODEL_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;anthropic.claude-3-5-sonnet-20241022-v2:0

&lt;span class="c"&gt;# Start&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Option 3: OpenRouter (Simplest Cloud Setup)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Edit .env:&lt;/span&gt;
&lt;span class="nv"&gt;MODEL_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;openrouter
&lt;span class="nv"&gt;OPENROUTER_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-or-v1-your-key
&lt;span class="nv"&gt;OPENROUTER_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;anthropic/claude-3.5-sonnet

npm start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Configure Your Tool
&lt;/h3&gt;

&lt;p&gt;Point your AI coding tool to Lynkr:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# For Claude Code CLI&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;dummy
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:8081

&lt;span class="c"&gt;# Now use Claude Code normally!&lt;/span&gt;
claude &lt;span class="s2"&gt;"Refactor this function"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  💰 Real Cost Comparison
&lt;/h2&gt;

&lt;p&gt;Here's what I was spending vs. what I spend now:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Before (Direct API)&lt;/th&gt;
&lt;th&gt;After (Lynkr + Bedrock)&lt;/th&gt;
&lt;th&gt;Savings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code CLI&lt;/td&gt;
&lt;td&gt;$150/month&lt;/td&gt;
&lt;td&gt;$45/month&lt;/td&gt;
&lt;td&gt;70%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Heavy Cursor usage&lt;/td&gt;
&lt;td&gt;$100/month&lt;/td&gt;
&lt;td&gt;$30/month&lt;/td&gt;
&lt;td&gt;70%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;With Ollama&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0/month&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The local Ollama option is genuinely free. If you have a decent GPU (RTX 3080+), models like &lt;code&gt;qwen2.5-coder&lt;/code&gt; run surprisingly well.&lt;/p&gt;
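&lt;p&gt;If you're unsure whether your GPU is up to it, you can get a feel for local latency and quality before wiring anything into Lynkr:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# rough benchmark of the local model on a typical coding prompt
time ollama run qwen2.5-coder:latest "Write a Python function that reverses a linked list"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;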

&lt;h2&gt;
  
  
  🔒 Enterprise Use Cases
&lt;/h2&gt;

&lt;p&gt;Lynkr shines in enterprise environments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Air-gapped networks&lt;/strong&gt;: Run entirely local with Ollama&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance&lt;/strong&gt;: Keep code on AWS/Azure infrastructure you control&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost control&lt;/strong&gt;: Set usage limits and track spending per team&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit trails&lt;/strong&gt;: Log all requests for compliance&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  ⚡ Advanced Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid Routing&lt;/strong&gt;: Use Ollama for simple requests, fall back to cloud for complex ones (see the sketch after this list)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token Optimization&lt;/strong&gt;: 60-80% cost reduction through smart compression&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-Term Memory&lt;/strong&gt;: Titans-inspired memory system for context persistence&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Headroom Compression&lt;/strong&gt;: 47-92% token reduction via intelligent context compression&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hot Reload&lt;/strong&gt;: Config changes apply without restart&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smart Tool Selection&lt;/strong&gt;: Automatic tool filtering to reduce token usage&lt;/li&gt;
&lt;/ul&gt;
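&lt;p&gt;For the hybrid routing feature above, here's a minimal &lt;code&gt;.env&lt;/code&gt; sketch. It only uses variables that appear elsewhere in this series; treat the thresholds as starting points:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# local first, cloud only when the request is complex
MODEL_PROVIDER=ollama
OLLAMA_MODEL=qwen2.5-coder:latest
PREFER_OLLAMA=true
OLLAMA_MAX_TOOLS_FOR_ROUTING=3

FALLBACK_ENABLED=true
FALLBACK_PROVIDER=openrouter
OPENROUTER_API_KEY=sk-or-v1-your-key
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;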

&lt;h2&gt;
  
  
  🤝 Contributing
&lt;/h2&gt;

&lt;p&gt;Lynkr is open source (Apache 2.0 license). Contributions welcome:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🐛 Bug reports and fixes&lt;/li&gt;
&lt;li&gt;🔌 New provider integrations&lt;/li&gt;
&lt;li&gt;📖 Documentation improvements&lt;/li&gt;
&lt;li&gt;⭐ Stars on GitHub!&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try It Today
&lt;/h2&gt;

&lt;p&gt;Stop overpaying for AI coding tools. With Lynkr, you can:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Save 60-80%&lt;/strong&gt; using AWS Bedrock or Azure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pay nothing&lt;/strong&gt; using local Ollama models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep code private&lt;/strong&gt; in enterprise environments&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;⭐ &lt;strong&gt;Star on GitHub&lt;/strong&gt;: &lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;github.com/Fast-Editor/Lynkr&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;📚 &lt;strong&gt;Full Documentation&lt;/strong&gt;: &lt;a href="https://deepwiki.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;deepwiki.com/Fast-Editor/Lynkr&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What AI coding tools do you use? Have you tried running them locally? Let me know in the comments!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>productivity</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>I Slashed My AI Coding Bills by 65% With This One Weird Trick.</title>
      <dc:creator>Vishal VeeraReddy</dc:creator>
      <pubDate>Wed, 31 Dec 2025 05:57:34 +0000</pubDate>
      <link>https://dev.to/vishal_veerareddy_9cdd17d/i-slashed-my-ai-coding-bills-by-65-with-this-one-weird-trick-3hn3</link>
      <guid>https://dev.to/vishal_veerareddy_9cdd17d/i-slashed-my-ai-coding-bills-by-65-with-this-one-weird-trick-3hn3</guid>
<description>&lt;h2&gt;
  
  
  The Problem Every Dev Using AI Assistants Faces
&lt;/h2&gt;

&lt;p&gt;You know that moment when you're using Claude Code CLI, crushing it with AI-powered coding, and then you check your Anthropic bill at the end of the month?&lt;/p&gt;

&lt;p&gt;Yeah. $347 for me last month. 😱&lt;/p&gt;

&lt;p&gt;And here's the kicker: 65% of my requests were literally just "write a hello world function" or "explain this error message" - stuff that could easily run on my laptop. I was paying premium API rates for queries that a local 7B model could handle in 300ms.&lt;/p&gt;

&lt;p&gt;So I did what any reasonable developer would do: I spent a weekend building a solution that now saves me hundreds of dollars monthly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Meet Lynkr: The Claude Code "Jailbreak" Nobody Asked For
&lt;/h2&gt;

&lt;p&gt;Lynkr is a self-hosted proxy that sits between Claude Code CLI and... well, literally any LLM backend you want.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Databricks? ✅&lt;/li&gt;
&lt;li&gt;Azure? ✅&lt;/li&gt;
&lt;li&gt;OpenRouter with 100+ models? ✅&lt;/li&gt;
&lt;li&gt;Local Ollama models that cost $0 per request? ✅✅✅&lt;/li&gt;
&lt;li&gt;llama.cpp with your own GGUF quantized models? ✅✅✅✅&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But here's where it gets interesting...&lt;/p&gt;

&lt;h2&gt;
  
  
  The 3-Tier Routing System That Changed Everything
&lt;/h2&gt;

&lt;p&gt;Instead of sending every single request to expensive cloud APIs, Lynkr automatically routes based on complexity:&lt;/p&gt;

&lt;h2&gt;
  
  
  🏎️ Tier 1: Local/Free (0-2 tools needed)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Ollama or llama.cpp running on your machine&lt;/li&gt;
&lt;li&gt;Response time: 100-500ms&lt;/li&gt;
&lt;li&gt;Cost: $0.00&lt;/li&gt;
&lt;li&gt;Handles: "explain this code", "write a function", "fix this bug"&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  💰 Tier 2: Mid-Tier Cloud (3-14 tools)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;OpenRouter with GPT-4o-mini ($0.15 per 1M tokens)&lt;/li&gt;
&lt;li&gt;Response time: 300-1500ms&lt;/li&gt;
&lt;li&gt;Cost: ~$0.0002 per request&lt;/li&gt;
&lt;li&gt;Handles: Multi-file refactoring, moderate complexity&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  🏢 Tier 3: Enterprise (15+ tools)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Databricks or Azure Anthropic (Claude Opus/Sonnet)&lt;/li&gt;
&lt;li&gt;Response time: 500-2500ms&lt;/li&gt;
&lt;li&gt;Cost: Standard API rates&lt;/li&gt;
&lt;li&gt;Handles: Complex analysis, heavy workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The proxy automatically decides which tier to use. No configuration. No manual routing. It just works.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Results Speak For Themselves
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Here's what happened after I switched:
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before Lynkr&lt;/th&gt;
&lt;th&gt;After Lynkr&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Avg Response Time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1500-2500ms&lt;/td&gt;
&lt;td&gt;400-800ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;70% faster&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Monthly API Bill&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$347&lt;/td&gt;
&lt;td&gt;$122&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;65% cheaper&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Local Request %&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;68%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0 cost on 68% of requests&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Downtime Impact&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;100% blocked&lt;/td&gt;
&lt;td&gt;0% (fallback works)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;∞% more reliable&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That's not a typo. I'm getting 70% faster responses while spending 65% less money.&lt;/p&gt;

&lt;h2&gt;
  
  
  Automatic Fallback = Zero Downtime
&lt;/h2&gt;

&lt;p&gt;The killer feature nobody talks about: if your local Ollama server crashes (mine does, frequently), Lynkr &lt;strong&gt;automatically falls back&lt;/strong&gt; to the next tier.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Request → Try Ollama → [Connection Refused]
       → Try OpenRouter → [Rate Limited]  
       → Try Databricks → ✅ Success
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  MCP Server Integration (Because Why Not)
&lt;/h3&gt;

&lt;p&gt;Want to integrate GitHub, Jira, Slack, or literally any other tool via Model Context Protocol?&lt;br&gt;
Just drop a manifest file in &lt;code&gt;~/.claude/mcp&lt;/code&gt; and Lynkr automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Discovers it&lt;/li&gt;
&lt;li&gt;Launches the MCP server&lt;/li&gt;
&lt;li&gt;Exposes the tools to your AI assistant&lt;/li&gt;
&lt;li&gt;Sandboxes it in Docker (optional but recommended)&lt;/li&gt;
&lt;/ul&gt;
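&lt;p&gt;For example, registering a GitHub MCP server could look roughly like this. The manifest shape here is purely illustrative; check the Lynkr docs for the exact schema:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# hypothetical manifest; field names are a guess, not the documented schema
mkdir -p ~/.claude/mcp
cat &amp;gt; ~/.claude/mcp/github.json &amp;lt;&amp;lt;'EOF'
{
  "name": "github",
  "command": "npx",
  "args": ["-y", "@modelcontextprotocol/server-github"]
}
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;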
&lt;h3&gt;
  
  
  Production-Ready From Day One
&lt;/h3&gt;

&lt;p&gt;I learned from my mistakes. This isn't a weekend hack held together with duct tape:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Circuit breakers (no cascading failures)&lt;/li&gt;
&lt;li&gt;✅ Load shedding (503s when overloaded, not crashes)&lt;/li&gt;
&lt;li&gt;✅ Prometheus metrics API (because you can't improve what you don't measure)&lt;/li&gt;
&lt;li&gt;✅ Kubernetes health checks (liveness + readiness probes)&lt;/li&gt;
&lt;li&gt;✅ Graceful shutdown (zero-downtime deployments)&lt;/li&gt;
&lt;li&gt;✅ Request ID correlation (debug production issues in seconds)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Quick Install (curl)
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -fsSL https://raw.githubusercontent.com/vishalveerareddy123/Lynkr/main/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;For your &lt;code&gt;.env&lt;/code&gt;, pick one of these templates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Template 1: Databricks Only (Simple)
bash# .env
MODEL_PROVIDER=databricks
DATABRICKS_API_BASE=https://your-workspace.cloud.databricks.com
DATABRICKS_API_KEY=dapi1234567890abcdef
DATABRICKS_ENDPOINT_PATH=/serving-endpoints/databricks-claude-sonnet-4-5/invocations

PORT=8080
WORKSPACE_ROOT=/path/to/your/project
PROMPT_CACHE_ENABLED=true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Template 2: Ollama Only (100% Local)
bash# .env
MODEL_PROVIDER=ollama
OLLAMA_ENDPOINT=http://localhost:11434
OLLAMA_MODEL=qwen2.5-coder:latest
OLLAMA_TIMEOUT_MS=120000

PORT=8080
WORKSPACE_ROOT=/path/to/your/project
PROMPT_CACHE_ENABLED=true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Template 3: Hybrid Routing (Cost Optimized)
bash# .env
MODEL_PROVIDER=databricks
PREFER_OLLAMA=true
FALLBACK_ENABLED=true

# Ollama (Free Tier)
OLLAMA_ENDPOINT=http://localhost:11434
OLLAMA_MODEL=qwen2.5-coder:latest
OLLAMA_MAX_TOOLS_FOR_ROUTING=3

# OpenRouter (Mid Tier)
OPENROUTER_API_KEY=sk-or-v1-your-key-here
OPENROUTER_MODEL=openai/gpt-4o-mini
OPENROUTER_MAX_TOOLS_FOR_ROUTING=15

# Databricks (Heavy Tier)
DATABRICKS_API_BASE=https://your-workspace.cloud.databricks.com
DATABRICKS_API_KEY=dapi1234567890abcdef

PORT=8080
WORKSPACE_ROOT=/path/to/your/project
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. You're now running Claude Code CLI on the backend of your choice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Use Cases (AKA "Will This Actually Help Me?")
&lt;/h2&gt;

&lt;h3&gt;
  
  
  For Indie Developers
&lt;/h3&gt;

&lt;p&gt;Use free Ollama models for 90% of your work. Only pay for complex tasks. Your $347/month bill becomes $35/month.&lt;/p&gt;

&lt;h3&gt;
  
  
  For Enterprise Teams
&lt;/h3&gt;

&lt;p&gt;Route simple queries to on-premise llama.cpp servers. Complex queries go to your Databricks workspace. &lt;strong&gt;Data never leaves your network&lt;/strong&gt; for simple requests.&lt;/p&gt;

&lt;h3&gt;
  
  
  For AI Researchers
&lt;/h3&gt;

&lt;p&gt;Test your own fine-tuned models with Claude Code CLI. Compare them side-by-side with GPT-4, Claude, Gemini via OpenRouter.&lt;/p&gt;

&lt;h3&gt;
  
  
  For Privacy-Conscious Devs
&lt;/h3&gt;

&lt;p&gt;Run Ollama or llama.cpp locally. Code never leaves your machine unless you explicitly need cloud capabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Part Where I Show You The Code
&lt;/h2&gt;

&lt;p&gt;Okay fine, here's how the hybrid routing actually works under the hood:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;javascript// Simplified version - actual code has more checks
async function routeRequest(request) {
  const toolCount = request.tools?.length || 0;

  // Tier 1: Local/Free (0-2 tools)
  if (toolCount &amp;lt;= 2 &amp;amp;&amp;amp; config.PREFER_OLLAMA) {
    try {
      return await ollamaClient.send(request);
    } catch (err) {
      logger.warn('Ollama failed, falling back to cloud');
      // Fallback to next tier...
    }
  }

  // Tier 2: Mid-Tier (3-14 tools)
  if (toolCount &amp;lt;= 14 &amp;amp;&amp;amp; config.OPENROUTER_API_KEY) {
    try {
      return await openRouterClient.send(request);
    } catch (err) {
      logger.warn('OpenRouter failed, falling back to Databricks');
      // Fallback to next tier...
    }
  }

  // Tier 3: Enterprise (15+ tools)
  return await databricksClient.send(request);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The circuit breaker wraps each client, so after 5 consecutive failures, requests fail fast (100ms instead of 30s timeout).&lt;/p&gt;

&lt;h3&gt;
  
  
  Models That Actually Work Well
&lt;/h3&gt;

&lt;p&gt;Through extensive testing, here's what actually performs:&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For Ollama (local):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;qwen2.5-coder:7b&lt;/code&gt; - Best for code generation&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;llama3.1:8b&lt;/code&gt; - Best for general tasks&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mistral:7b&lt;/code&gt; - Fastest responses&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;For OpenRouter (mid-tier):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;openai/gpt-5.1&lt;/code&gt; - Best value ($0.15/1M tokens)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;meta-llama/llama-3.1-8b-instruct:free&lt;/code&gt; - Actually free (rate limited)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;For llama.cpp (maximum control):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Any GGUF model works&lt;/li&gt;
&lt;li&gt;I use Qwen2.5-Coder-7B-Instruct-Q5_K_M.gguf&lt;/li&gt;
&lt;li&gt;Point to your llama.cpp server's OpenAI-compatible endpoint&lt;/li&gt;
&lt;/ul&gt;



&lt;h2&gt;
  
  
  The Catches (Because Nothing's Perfect)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Ollama doesn't support all Claude features&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No extended thinking mode&lt;/li&gt;
&lt;li&gt;No prompt caching (Lynkr adds its own though)&lt;/li&gt;
&lt;li&gt;Tool calling works but varies by model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. You need to run local inference&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ollama = ~8GB RAM for 7B models&lt;/li&gt;
&lt;li&gt;llama.cpp = ~6GB RAM with quantization&lt;/li&gt;
&lt;li&gt;Not great for 4GB laptops&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Initial setup requires some config&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Environment variables for API keys&lt;/li&gt;
&lt;li&gt;Workspace paths&lt;/li&gt;
&lt;li&gt;Model selection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But the wizard handles 90% of this automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get Started Now
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;https://github.com/Fast-Editor/Lynkr&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Docs&lt;/strong&gt;: fast-editor.github.io/Lynkr/&lt;br&gt;
&lt;strong&gt;npm&lt;/strong&gt;: npm install -g lynkr&lt;br&gt;
Apache licensed. PRs welcome. Built with Node.js, SQLite, and determination.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Future Roadmap
&lt;/h2&gt;

&lt;p&gt;Things I'm working on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Response caching layer (Redis-backed)&lt;/li&gt;
&lt;li&gt;[ ] Per-file diff comments (like Claude's review UX)&lt;/li&gt;
&lt;li&gt;[ ] Better LSP integration for more languages&lt;/li&gt;
&lt;li&gt;[ ] Claude Skills compatibility layer&lt;/li&gt;
&lt;li&gt;[ ] Historical metrics dashboard&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Look, I'm not saying Anthropic's hosted service is bad. It's excellent. But for developers who want:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Control over their infrastructure&lt;/li&gt;
&lt;li&gt;Cost optimization&lt;/li&gt;
&lt;li&gt;Privacy for simple queries&lt;/li&gt;
&lt;li&gt;Custom model integration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Lynkr gives you all of that while keeping the Claude Code CLI experience you already love.&lt;/p&gt;

&lt;p&gt;Try it for a week. Track your costs. I bet you'll see similar savings.&lt;/p&gt;

&lt;p&gt;And if you don't? Well, it's open source. Make it better and send a PR. 😉&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Questions? Comments? Roasts?&lt;/strong&gt; Drop them below. I'll answer everything except "why did you waste a weekend on this" (because I saved $225 already).&lt;/p&gt;

&lt;p&gt;⭐ Star the repo if you found this useful: &lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;https://github.com/Fast-Editor/Lynkr&lt;/a&gt;&lt;/p&gt;




</description>
      <category>ai</category>
      <category>programming</category>
      <category>opensource</category>
      <category>openai</category>
    </item>
    <item>
<title>Emulating the Claude Code Backend for Databricks LLM Models (with MCP, Git Tools, and Prompt Caching)</title>
      <dc:creator>Vishal VeeraReddy</dc:creator>
      <pubDate>Thu, 04 Dec 2025 06:40:51 +0000</pubDate>
      <link>https://dev.to/vishal_veerareddy_9cdd17d/emulating-the-claude-code-backend-for-databricks-llm-modelswith-mcp-git-tools-and-prompt-caching-1a60</link>
      <guid>https://dev.to/vishal_veerareddy_9cdd17d/emulating-the-claude-code-backend-for-databricks-llm-modelswith-mcp-git-tools-and-prompt-caching-1a60</guid>
      <description>&lt;p&gt;Claude Code has quickly become one of my favorite tools for repo-aware AI workflows. It understands your codebase, navigates files, summarizes diffs, runs tools, and integrates with Git—all through a simple CLI.&lt;/p&gt;

&lt;p&gt;But there’s a catch:&lt;br&gt;
The Claude Code CLI expects to speak directly to Anthropic’s hosted backend.&lt;/p&gt;

&lt;p&gt;That means if you want to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;use Databricks-hosted Claude models,&lt;/li&gt;
&lt;li&gt;route requests through Azure's Anthropic &lt;code&gt;/anthropic/v1/messages&lt;/code&gt; endpoint,&lt;/li&gt;
&lt;li&gt;extend Claude Code with local tools and Model Context Protocol (MCP) servers,&lt;/li&gt;
&lt;li&gt;add prompt caching,&lt;/li&gt;
&lt;li&gt;or simply run your own backend for experimentation…&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…you're out of luck.&lt;/p&gt;

&lt;p&gt;So I built Lynkr, a self-hosted Claude Code–compatible proxy that solves this.&lt;/p&gt;

&lt;p&gt;👉 GitHub: &lt;a href="https://github.com/vishalveerareddy123/Lynkr" rel="noopener noreferrer"&gt;https://github.com/vishalveerareddy123/Lynkr&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  🚀 What Lynkr Does
&lt;/h2&gt;

&lt;p&gt;At a high level: Lynkr is an HTTP proxy that emulates the Claude Code backend, forwards requests to Databricks or Azure Anthropic, and wires in workspace tools, Git helpers, prompt caching, and MCP servers.&lt;/p&gt;

&lt;p&gt;You can continue using the regular Claude Code CLI, but point it at your own backend:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code CLI → Lynkr → Databricks / Azure Anthropic / MCP tools
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This lets you keep the familiar development workflow while customizing everything under the hood.&lt;/p&gt;

&lt;h2&gt;
  
  
  🔧 Core Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Provider Adapters
&lt;/h3&gt;

&lt;p&gt;Built-in support for two upstream providers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Databricks (default)&lt;/li&gt;
&lt;li&gt;Azure Anthropic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Requests are normalized so the CLI sees standard Claude-style responses.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Repo Intelligence
&lt;/h3&gt;

&lt;p&gt;Lynkr builds a lightweight SQLite index of your workspace:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Symbol definitions &amp;amp; references&lt;/li&gt;
&lt;li&gt;Framework &amp;amp; dependency hints&lt;/li&gt;
&lt;li&gt;Language mix&lt;/li&gt;
&lt;li&gt;Lint/test config discovery&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It also generates a CLAUDE.md summary that gives the model structured context about your project.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Git Workflow Integration
&lt;/h3&gt;

&lt;p&gt;Includes Git helpers similar to Claude Code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;status, diff, stage, commit, push, pull&lt;/li&gt;
&lt;li&gt;diff review summaries&lt;/li&gt;
&lt;li&gt;release-note generation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Plus policy guards:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;POLICY_GIT_ALLOW_PUSH&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;POLICY_GIT_REQUIRE_TESTS&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;POLICY_GIT_TEST_COMMAND&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Prompt Caching
&lt;/h3&gt;

&lt;p&gt;A local LRU+TTL cache keyed by prompt signature:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;speeds up repeated prompts&lt;/li&gt;
&lt;li&gt;reduces Databricks/Azure tokens&lt;/li&gt;
&lt;li&gt;avoids re-running identical analysis steps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tool-invoking turns bypass caching to avoid unsafe side effects.&lt;/p&gt;
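&lt;p&gt;An easy way to see the cache working is to time the same tool-free request twice; the second call should return from the local LRU. A sketch, assuming the default configuration shown below:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# first call hits the upstream provider; the repeat should be a cache hit
time curl -s http://localhost:8080/v1/messages \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-proxy", "messages": [{"role": "user", "content": "Summarize the repo layout."}]}'

time curl -s http://localhost:8080/v1/messages \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-proxy", "messages": [{"role": "user", "content": "Summarize the repo layout."}]}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;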

&lt;h3&gt;
  
  
  5. MCP Orchestration
&lt;/h3&gt;

&lt;p&gt;Lynkr automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;discovers MCP manifests&lt;/li&gt;
&lt;li&gt;launches servers&lt;/li&gt;
&lt;li&gt;wraps them with JSON-RPC&lt;/li&gt;
&lt;li&gt;exposes all tools back to the assistant&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Optional Docker sandboxing isolates MCP tools when needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Workspace Tools
&lt;/h3&gt;

&lt;p&gt;Includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;repo indexing&lt;/li&gt;
&lt;li&gt;symbol search&lt;/li&gt;
&lt;li&gt;diff review&lt;/li&gt;
&lt;li&gt;test runner&lt;/li&gt;
&lt;li&gt;file I/O tools&lt;/li&gt;
&lt;li&gt;lightweight task tracker (TODOs stored in SQLite)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  7. Full Transparency
&lt;/h3&gt;

&lt;p&gt;Everything is logged (Pino-based structured logs), including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;request/response traces&lt;/li&gt;
&lt;li&gt;repo indexer events&lt;/li&gt;
&lt;li&gt;prompt cache hits/misses&lt;/li&gt;
&lt;li&gt;MCP registry diagnostics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No black boxes.&lt;/p&gt;

&lt;h2&gt;
  
  
  🧱 Architecture Overview
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code CLI
        ↓ (HTTP)
Lynkr Proxy (Express)
  ├─ Orchestrator (agent loop)
  ├─ Prompt Cache (LRU + TTL)
  ├─ Session DB (SQLite)
  ├─ Repo Indexer (Tree-sitter + CLAUDE.md)
  ├─ Tool Registry (workspace + git + diff + test)
  ├─ MCP Registry (JSON-RPC bridge)
  └─ Provider Adapters (Databricks / Azure Anthropic)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The codebase is intentionally small and hackable—everything lives in src/.&lt;/p&gt;

&lt;h2&gt;
  
  
  🛠️ Installing Lynkr
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Node.js 18+&lt;/li&gt;
&lt;li&gt;npm&lt;/li&gt;
&lt;li&gt;Databricks or Azure Anthropic credentials&lt;/li&gt;
&lt;li&gt;(Optional) Docker for MCP sandboxing&lt;/li&gt;
&lt;li&gt;(Optional) Claude Code CLI&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Install from npm
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm install -g lynkr
lynkr start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;or via Homebrew:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew tap vishalveerareddy123/lynkr
brew install vishalveerareddy123/lynkr/lynkr
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;or from source:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/vishalveerareddy123/Lynkr.git
cd Lynkr
npm install
npm start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  ⚙️ Configuring the Proxy
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Databricks
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;MODEL_PROVIDER=databricks
DATABRICKS_API_BASE=https://your-workspace.cloud.databricks.com
DATABRICKS_API_KEY=your-api-key
WORKSPACE_ROOT=/path/to/repo
PORT=8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Azure Anthropic
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;MODEL_PROVIDER=azure-anthropic
AZURE_ANTHROPIC_ENDPOINT=https://your-resource.services.ai.azure.com/anthropic/v1/messages
AZURE_ANTHROPIC_API_KEY=your-api-key
AZURE_ANTHROPIC_VERSION=2023-06-01
WORKSPACE_ROOT=/path/to/repo
PORT=8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  🧩 Hooking Up Claude Code CLI
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;export ANTHROPIC_BASE_URL=http://localhost:8080
export ANTHROPIC_API_KEY=dummy   # required by CLI but unused by Lynkr
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Then run the CLI normally inside your repo.&lt;/p&gt;

&lt;p&gt;Everything—tool calls, chat, diffs, navigation—flows through your proxy.&lt;/p&gt;

&lt;h2&gt;
  
  
  🔍 Example: calling a tool
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:8080/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-proxy",
    "messages": [{ "role": "user", "content": "Rebuild the repo index." }],
    "tools": [{
      "name": "workspace_index_rebuild",
      "type": "function",
      "input_schema": { "type": "object" }
    }],
    "tool_choice": {
      "type": "function",
      "function": { "name": "workspace_index_rebuild" }
    }
  }'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  🐛 Troubleshooting Highlights
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Missing path → check your tool arguments&lt;/li&gt;
&lt;li&gt;Git commands blocked → check &lt;code&gt;POLICY_GIT_ALLOW_PUSH&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;MCP server not discovered → check manifest locations&lt;/li&gt;
&lt;li&gt;Prompt cache not working → ensure no tools are used in the request&lt;/li&gt;
&lt;li&gt;Web fetch returns HTML scaffolding → JS execution is not supported (use JSON APIs)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🗺️ Roadmap
&lt;/h2&gt;

&lt;p&gt;Coming next:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;per-file threaded diff comments&lt;/li&gt;
&lt;li&gt;risk scoring on diffs&lt;/li&gt;
&lt;li&gt;LSP bridging for deeper language understanding&lt;/li&gt;
&lt;li&gt;declarative “skills” layer&lt;/li&gt;
&lt;li&gt;historical coverage and test dashboards&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🎯 Why I Built This
&lt;/h2&gt;

&lt;p&gt;I love the Claude Code UX, but I wanted the ability to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;run everything locally&lt;/li&gt;
&lt;li&gt;plug in Databricks and Azure Anthropic&lt;/li&gt;
&lt;li&gt;add my own tools and MCP servers&lt;/li&gt;
&lt;li&gt;see and debug all internal behavior&lt;/li&gt;
&lt;li&gt;experiment quickly without platform constraints&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re exploring AI-assisted development on Databricks or Azure—and want more control over your backend—Lynkr might be useful.&lt;/p&gt;

&lt;p&gt;👉 GitHub link: &lt;a href="https://github.com/vishalveerareddy123/Lynkr" rel="noopener noreferrer"&gt;https://github.com/vishalveerareddy123/Lynkr&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;⭐ Contributions, ideas, and issues welcome.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>databricks</category>
      <category>llm</category>
      <category>node</category>
    </item>
  </channel>
</rss>
