<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vishal VeeraReddy</title>
    <description>The latest articles on DEV Community by Vishal VeeraReddy (@lynkr).</description>
    <link>https://dev.to/lynkr</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3645387%2F794ced23-25c9-41ed-863a-401839a48d59.png</url>
      <title>DEV Community: Vishal VeeraReddy</title>
      <link>https://dev.to/lynkr</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/lynkr"/>
    <language>en</language>
    <item>
      <title>Lynkr vs LiteLLM vs OpenRouter vs PortKey: Choosing an LLM Gateway in 2026</title>
      <dc:creator>Vishal VeeraReddy</dc:creator>
      <pubDate>Wed, 27 May 2026 00:58:39 +0000</pubDate>
      <link>https://dev.to/lynkr/lynkr-vs-litellm-vs-openrouter-vs-portkey-choosing-an-llm-gateway-in-2026-ea0</link>
      <guid>https://dev.to/lynkr/lynkr-vs-litellm-vs-openrouter-vs-portkey-choosing-an-llm-gateway-in-2026-ea0</guid>
      <description>&lt;h1&gt;
  
  
  Lynkr vs LiteLLM vs OpenRouter vs PortKey: Choosing an LLM Gateway in 2026
&lt;/h1&gt;

&lt;p&gt;If you're building anything on top of LLMs in 2026 — a chatbot, an agent, a coding tool, an internal AI app — you've probably hit the same wall I did:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One provider goes down and your product dies with it.&lt;/li&gt;
&lt;li&gt;Your OpenAI bill is climbing faster than your MRR.&lt;/li&gt;
&lt;li&gt;You want to try a cheaper model, but switching means rewriting code.&lt;/li&gt;
&lt;li&gt;Your team is now juggling 4 different SDKs for 4 different providers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The answer is an &lt;strong&gt;LLM gateway&lt;/strong&gt; — a proxy that sits between your app and every LLM provider, giving you one API, automatic failover, cost routing, and observability.&lt;/p&gt;

&lt;p&gt;There are four serious contenders in this space right now: &lt;strong&gt;Lynkr&lt;/strong&gt;, &lt;strong&gt;LiteLLM&lt;/strong&gt;, &lt;strong&gt;OpenRouter&lt;/strong&gt;, and &lt;strong&gt;PortKey&lt;/strong&gt;. I've shipped production code on all four. Here's an honest comparison.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Full disclosure: I built Lynkr. I'll try to be fair about where the others are stronger.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Lynkr&lt;/th&gt;
&lt;th&gt;LiteLLM&lt;/th&gt;
&lt;th&gt;OpenRouter&lt;/th&gt;
&lt;th&gt;PortKey&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Setup&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;npm install -g lynkr&lt;/code&gt; (3 lines)&lt;/td&gt;
&lt;td&gt;Python + Docker + Postgres&lt;/td&gt;
&lt;td&gt;Account signup, no self-host&lt;/td&gt;
&lt;td&gt;Docker + YAML config&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Self-hosted&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌ (SaaS only)&lt;/td&gt;
&lt;td&gt;✅ (paid tier)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude Code / Codex / Cursor native&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;⚠️ (manual config)&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;⚠️ (manual config)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Local models (Ollama, llama.cpp)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ first-class&lt;/td&gt;
&lt;td&gt;⚠️ Ollama only&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Token optimization (caching/dedup)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Built-in (60-80%)&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;⚠️ Provider caching only&lt;/td&gt;
&lt;td&gt;✅ Caching layer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Auto-failover&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Observability dashboard&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Basic&lt;/td&gt;
&lt;td&gt;✅ Strong&lt;/td&gt;
&lt;td&gt;✅ Strong&lt;/td&gt;
&lt;td&gt;✅ Strongest&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;License&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Apache 2.0&lt;/td&gt;
&lt;td&gt;MIT&lt;/td&gt;
&lt;td&gt;Proprietary&lt;/td&gt;
&lt;td&gt;Mixed (OSS + paid)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best for&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Devs who want zero-config + coding tools&lt;/td&gt;
&lt;td&gt;Python teams w/ existing infra&lt;/td&gt;
&lt;td&gt;Quick prototyping&lt;/td&gt;
&lt;td&gt;Enterprise observability&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  1. Lynkr — Zero-config gateway with first-class coding-tool support
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A self-hosted Node.js proxy that exposes both OpenAI and Anthropic wire protocols, routing to 12+ providers underneath.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it wins:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Drop-in for Claude Code, Codex CLI, and Cursor.&lt;/strong&gt; Set one env var (&lt;code&gt;ANTHROPIC_BASE_URL=http://localhost:8081&lt;/code&gt;) and your existing tools transparently use any backend — Ollama, Bedrock, OpenRouter, Azure, DeepSeek. No other gateway in this list speaks the Anthropic protocol natively, which means none of them work as drop-ins for Claude Code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built-in token optimization&lt;/strong&gt; (smart tool selection, prompt caching, memory dedup) shaves 60-80% off token counts on top of provider savings.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3-command install:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; lynkr
   &lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:8081
   lynkr start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Local-first.&lt;/strong&gt; Ollama, llama.cpp, LM Studio, MLX are all first-class providers, not afterthoughts. Run Claude Code on free local models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apache 2.0&lt;/strong&gt;, self-hosted, your data never leaves your infra.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Where it loses:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Observability is basic — log-level only. If you need a polished dashboard with per-team usage charts, PortKey or LiteLLM are ahead.&lt;/li&gt;
&lt;li&gt;Newer project, smaller community than LiteLLM (~700 tests passing, growing).&lt;/li&gt;
&lt;li&gt;Node.js only — if your team is Python-first, the LiteLLM SDK feels more native.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pick Lynkr if:&lt;/strong&gt; You want a coding-tool gateway that works in 60 seconds, or you want to run local models with the tools you already use.&lt;/p&gt;

&lt;p&gt;🔗 &lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;https://github.com/Fast-Editor/Lynkr&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  2. LiteLLM — The mature Python-native gateway
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; The granddaddy of LLM gateways. A Python library and proxy server that normalizes 100+ providers to the OpenAI API format.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it wins:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Massive provider coverage.&lt;/strong&gt; Hands down the most LLM providers supported — every obscure model you can name.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strong Python SDK.&lt;/strong&gt; If your app is Python, &lt;code&gt;from litellm import completion&lt;/code&gt; feels native.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise features:&lt;/strong&gt; team management, budgets, virtual keys, SSO, audit logs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mature dashboard&lt;/strong&gt; (LiteLLM UI) with per-key spend tracking.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Battle-tested&lt;/strong&gt; — used by Microsoft, Anthropic internal teams, and tons of YC startups.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Where it loses:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Setup is heavy.&lt;/strong&gt; Production deployment wants Docker + Postgres + Redis. Not a "3 commands and go" experience.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No Anthropic protocol support.&lt;/strong&gt; Can't drop into Claude Code as a transparent backend.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No token optimization layer.&lt;/strong&gt; You pay full token cost.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local model support is shallow&lt;/strong&gt; — Ollama works, but llama.cpp/MLX are second-class.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pick LiteLLM if:&lt;/strong&gt; You have a Python codebase, need enterprise features (teams, budgets, SSO), and you're comfortable running Postgres.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. OpenRouter — Quick prototyping, zero self-hosting
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A hosted SaaS that aggregates 100+ models behind one OpenAI-compatible API. You pay them, they pay the providers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it wins:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Literally zero setup.&lt;/strong&gt; Sign up, get an API key, change your base URL. Done in 60 seconds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single bill&lt;/strong&gt; instead of managing 5 provider accounts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built-in fallback&lt;/strong&gt; — if one model fails, route to another automatically.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-discovery&lt;/strong&gt; of new models — they add them as providers release them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Great for prototyping&lt;/strong&gt; when you want to A/B test models without commitment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Where it loses:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Not self-hosted.&lt;/strong&gt; Your prompts and completions transit their infrastructure. For many enterprises, that's a non-starter.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No local model support.&lt;/strong&gt; Cloud-only by design.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No Anthropic protocol&lt;/strong&gt; — doesn't work with Claude Code, Cursor, or anything that expects Anthropic's API shape.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Markup on tokens.&lt;/strong&gt; They take a small margin on every API call (~5%).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No token optimization.&lt;/strong&gt; You pay full token cost, plus their margin.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pick OpenRouter if:&lt;/strong&gt; You're prototyping, you don't care about self-hosting, and you want the simplest possible "try any model" experience.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. PortKey — Enterprise observability + gateway
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A gateway + observability platform that emphasizes prompt management, evals, and production monitoring.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it wins:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Best-in-class observability.&lt;/strong&gt; Per-request tracing, prompt versioning, eval pipelines, latency/cost dashboards.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt management built in.&lt;/strong&gt; Treat prompts like code with versions, A/B tests, and rollback.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Caching layer&lt;/strong&gt; — semantic + exact-match caching out of the box.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Guardrails&lt;/strong&gt; — built-in PII filtering, content moderation, response validation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SOC 2, HIPAA&lt;/strong&gt; options for regulated industries.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Where it loses:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Configuration is heavy.&lt;/strong&gt; YAML-driven, with a learning curve. Not for weekend hacking.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The good stuff is paid.&lt;/strong&gt; Self-hosted is free, but team features and advanced observability require their cloud or enterprise tier.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coding-tool integration is manual&lt;/strong&gt; — no native drop-in for Claude Code or Codex.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Doesn't shine for local models.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pick PortKey if:&lt;/strong&gt; You're an enterprise that needs deep observability, governance, and prompt management more than you need raw provider count.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to choose — by use case
&lt;/h2&gt;

&lt;h3&gt;
  
  
  "I want to run Claude Code on free local models"
&lt;/h3&gt;

&lt;p&gt;→ &lt;strong&gt;Lynkr.&lt;/strong&gt; It's the only one in this list that natively speaks Anthropic's protocol, which is what Claude Code expects. Three commands and you're running Claude Code on Ollama for $0/day.&lt;/p&gt;

&lt;h3&gt;
  
  
  "I'm prototyping and just want to try every model fast"
&lt;/h3&gt;

&lt;p&gt;→ &lt;strong&gt;OpenRouter.&lt;/strong&gt; Sign up, swap base URL, done. Don't self-host until you have to.&lt;/p&gt;

&lt;h3&gt;
  
  
  "I have a Python production codebase with team budgets and SSO needs"
&lt;/h3&gt;

&lt;p&gt;→ &lt;strong&gt;LiteLLM.&lt;/strong&gt; Mature, Python-native, every enterprise feature.&lt;/p&gt;

&lt;h3&gt;
  
  
  "I need deep observability, prompt versioning, and compliance"
&lt;/h3&gt;

&lt;p&gt;→ &lt;strong&gt;PortKey.&lt;/strong&gt; Most polished dashboards and governance features.&lt;/p&gt;

&lt;h3&gt;
  
  
  "I'm building a multi-provider product and want token costs minimized"
&lt;/h3&gt;

&lt;p&gt;→ &lt;strong&gt;Lynkr&lt;/strong&gt; (for the built-in 60-80% optimization) &lt;strong&gt;or LiteLLM&lt;/strong&gt; (for breadth).&lt;/p&gt;




&lt;h2&gt;
  
  
  The honest landscape in 2026
&lt;/h2&gt;

&lt;p&gt;LLM gateways used to be a "nice to have." In 2026 they're table stakes — provider outages, pricing changes, and the explosion of capable open models mean &lt;strong&gt;no serious app should be hard-wired to one provider&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The right gateway depends on what you're building:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Coding tools and local-model fans:&lt;/strong&gt; Lynkr.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python production apps with team management:&lt;/strong&gt; LiteLLM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quick prototyping with zero ops:&lt;/strong&gt; OpenRouter.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regulated enterprise with deep observability:&lt;/strong&gt; PortKey.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The good news: all four are viable. The bad news: most teams pick the wrong one because they didn't realize the others existed.&lt;/p&gt;

&lt;p&gt;If you're paying any LLM bill today, the highest-leverage hour you can spend this week is &lt;strong&gt;switching to a gateway&lt;/strong&gt;. Pick one, point your app at it, and never let a provider outage take you down again.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What gateway are you running, and what do you wish it did better? Drop a comment — I'd love to see what's working and what isn't.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>webdev</category>
      <category>devops</category>
    </item>
    <item>
      <title>Run Hermes Agent on Any Model — Free, Local, and Cost-Routed</title>
      <dc:creator>Vishal VeeraReddy</dc:creator>
      <pubDate>Fri, 22 May 2026 05:22:50 +0000</pubDate>
      <link>https://dev.to/lynkr/hermes-lynkr-the-self-improving-agent-meets-the-universal-llm-proxy-3n11</link>
      <guid>https://dev.to/lynkr/hermes-lynkr-the-self-improving-agent-meets-the-universal-llm-proxy-3n11</guid>
      <description>&lt;p&gt;If you've spent any time wrestling with AI coding tools and agents in 2026, you've hit two walls:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Provider lock-in.&lt;/strong&gt; Claude Code expects Anthropic. Codex expects OpenAI. Your shiny new agent framework wants whatever its README assumes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent amnesia.&lt;/strong&gt; Every session starts from zero. Your "AI assistant" doesn't actually learn anything about you, your codebase, or the work you did yesterday.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Two open-source projects address those problems head-on — and they pair beautifully together.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/NousResearch/hermes-agent" rel="noopener noreferrer"&gt;Hermes Agent&lt;/a&gt;&lt;/strong&gt; (by Nous Research) — a self-improving AI agent with a built-in learning loop, multi-platform presence, and a serious tool ecosystem.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;Lynkr&lt;/a&gt;&lt;/strong&gt; — a self-hosted universal LLM proxy that lets any AI tool talk to any model provider.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This post explains what each one is, why they exist, and shows you the exact steps to run &lt;strong&gt;Hermes through Lynkr&lt;/strong&gt; so you can route Hermes to Databricks, Bedrock, Ollama, llama.cpp, Azure, OpenRouter — or all of them with automatic cost-tier routing.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is Hermes Agent?
&lt;/h2&gt;

&lt;p&gt;Hermes is an open-source AI agent (MIT-licensed, built by &lt;a href="https://nousresearch.com" rel="noopener noreferrer"&gt;Nous Research&lt;/a&gt;) that you actually live inside, not just call.&lt;/p&gt;

&lt;p&gt;What makes it different from "yet another agent":&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A closed learning loop.&lt;/strong&gt; Hermes curates its own memory, autonomously creates &lt;em&gt;skills&lt;/em&gt; (procedural memory) after complex tasks succeed, improves them during use, and searches its own past conversations via SQLite FTS5. It's the only agent I've seen that gets meaningfully better the longer you use it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lives where you do.&lt;/strong&gt; A single gateway process plugs into Telegram, Discord, Slack, WhatsApp, Signal, Email, and a real terminal TUI. Send a voice memo from your phone, get a transcribed answer back, continue the same thread from your laptop later.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runs anywhere.&lt;/strong&gt; Seven terminal backends — local, Docker, SSH, Singularity, Modal, Daytona, Vercel Sandbox. Run it on a $5 VPS or a GPU cluster. Modal/Daytona give you serverless persistence — hibernates when idle, wakes on demand.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built-in cron.&lt;/strong&gt; "Every weekday at 8am, summarize my GitHub notifications and send to Telegram." That's a one-line cron job in natural language.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Delegates and parallelizes.&lt;/strong&gt; Spawns isolated subagents for parallel workstreams; results come back without flooding your context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provider-agnostic by design.&lt;/strong&gt; OpenRouter, Nous Portal, NovitaAI, NVIDIA NIM, Xiaomi MiMo, z.ai/GLM, Kimi/Moonshot, MiniMax, Hugging Face, OpenAI, or your own endpoint. Switch with &lt;code&gt;hermes model&lt;/code&gt; — no code changes.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Architecture in one paragraph
&lt;/h3&gt;

&lt;p&gt;The core is &lt;code&gt;AIAgent&lt;/code&gt; in &lt;code&gt;run_agent.py&lt;/code&gt; — a synchronous tool-calling loop over OpenAI-format messages. &lt;code&gt;model_tools.py&lt;/code&gt; orchestrates ~40 built-in tools auto-discovered from &lt;code&gt;tools/&lt;/code&gt;. The CLI (&lt;code&gt;cli.py&lt;/code&gt;, ~11k LOC) handles slash commands, prompt_toolkit input, Rich rendering, and a data-driven skin engine. Provider profiles live under &lt;code&gt;plugins/model-providers/&amp;lt;name&amp;gt;/&lt;/code&gt; and contribute &lt;code&gt;base_url&lt;/code&gt;, &lt;code&gt;env_vars&lt;/code&gt;, &lt;code&gt;api_mode&lt;/code&gt;, and &lt;code&gt;fallback_models&lt;/code&gt; — the runtime resolver merges those with &lt;code&gt;custom_providers&lt;/code&gt; from &lt;code&gt;config.yaml&lt;/code&gt; to figure out where to send each request. That last detail is what makes Lynkr integration trivial.&lt;/p&gt;

&lt;h3&gt;
  
  
  Install Hermes in one line
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then &lt;code&gt;hermes&lt;/code&gt; to start chatting.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is Lynkr?
&lt;/h2&gt;

&lt;p&gt;Lynkr is a self-hosted Node.js proxy that sits &lt;strong&gt;between any AI coding tool and any LLM provider&lt;/strong&gt;. One environment variable change, and your tool works with whatever backend you want.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code / Cursor / Codex / Cline / Continue / Hermes / Vercel AI SDK
                                |
                              Lynkr  (http://localhost:8081)
                                |
   Ollama | Bedrock | Databricks | OpenRouter | Azure | OpenAI | llama.cpp | LM Studio | z.ai | Vertex | Moonshot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What's actually inside
&lt;/h3&gt;

&lt;p&gt;I went through the source. Lynkr is more than a "translate request, forward, translate response" proxy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Format conversion.&lt;/strong&gt; Anthropic ↔ OpenAI ↔ Codex Responses API ↔ Databricks ↔ Bedrock — handled in &lt;code&gt;src/clients/&lt;/code&gt; (&lt;code&gt;openai-format.js&lt;/code&gt;, &lt;code&gt;responses-format.js&lt;/code&gt;, &lt;code&gt;databricks.js&lt;/code&gt;, &lt;code&gt;bedrock-utils.js&lt;/code&gt;, etc.).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tier-based routing.&lt;/strong&gt; &lt;code&gt;src/routing/&lt;/code&gt; analyzes prompt complexity, agentic intent, risk, and latency, then routes to a &lt;code&gt;TIER_SIMPLE&lt;/code&gt; / &lt;code&gt;TIER_STANDARD&lt;/code&gt; / &lt;code&gt;TIER_COMPLEX&lt;/code&gt; model. Cheap stuff goes to Ollama; gnarly stuff goes to a frontier cloud model. This is where the headline "60–80% cost savings" comes from.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resilience.&lt;/strong&gt; Circuit breaker (cockatiel), retries, DNS logging, prompt cache injection.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP integration + Code Mode.&lt;/strong&gt; Auto-discovers MCP servers and can collapse 100+ MCP tool definitions into 4 meta-tools (~96% token reduction).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability built in.&lt;/strong&gt; Telemetry, latency tracking, usage reporting (&lt;code&gt;lynkr usage&lt;/code&gt; shows AI spend and tier savings), trajectory export as JSONL for training (&lt;code&gt;lynkr trajectory&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;699 passing tests.&lt;/strong&gt; Routing, format conversion, streaming, error resilience, memory store, prompt cache — it's seriously tested for a side-project proxy.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Install Lynkr in one line
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/Fast-Editor/Lynkr/main/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or via npm: &lt;code&gt;npm install -g pino-pretty &amp;amp;&amp;amp; npm install -g lynkr&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Use Them Together?
&lt;/h2&gt;

&lt;p&gt;Hermes already supports a long list of providers natively. Why bolt Lynkr in front?&lt;/p&gt;

&lt;p&gt;Three concrete reasons:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Unify your enterprise creds
&lt;/h3&gt;

&lt;p&gt;Your company has a Databricks endpoint serving Claude, an AWS Bedrock account with cross-region inference profiles, an Azure OpenAI deployment, &lt;em&gt;and&lt;/em&gt; a private Ollama box. With Lynkr, all of those live behind &lt;strong&gt;one&lt;/strong&gt; OpenAI-compatible URL. Hermes points at that URL and stops caring which backend is serving the request.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Automatic cost-tier routing
&lt;/h3&gt;

&lt;p&gt;This is the killer feature. Hermes can switch models with &lt;code&gt;/model&lt;/code&gt;, but Lynkr will switch &lt;em&gt;per request&lt;/em&gt; based on complexity. Simple tool calls and short prompts go to free local Ollama. Heavy reasoning goes to your premium cloud model. You don't think about it — Lynkr's &lt;code&gt;complexity-analyzer.js&lt;/code&gt; and &lt;code&gt;risk-analyzer.js&lt;/code&gt; decide.&lt;/p&gt;

&lt;p&gt;Run &lt;code&gt;lynkr usage&lt;/code&gt; afterward to see the actual savings.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Centralized observability for every agent + tool
&lt;/h3&gt;

&lt;p&gt;If you run Hermes + Claude Code + Cursor + Codex all on the same machine — and a lot of us do — Lynkr becomes a single chokepoint for spend, telemetry, prompt caching, and trajectory capture across all of them. You get one usage report instead of four dashboards.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to Use Lynkr With Hermes
&lt;/h2&gt;

&lt;p&gt;The integration is genuinely 3 minutes of work because both tools speak OpenAI-compatible HTTP.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Start Lynkr with a backend
&lt;/h3&gt;

&lt;p&gt;Pick whatever provider you want Lynkr to route to. For a local-first setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# .env in your Lynkr directory (or just exports)&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;MODEL_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ollama
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OLLAMA_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;qwen2.5-coder:latest
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OLLAMA_ENDPOINT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:11434

lynkr start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or for tier routing across providers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;TIER_SIMPLE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ollama:qwen2.5-coder:latest
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;TIER_STANDARD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;openrouter:anthropic/claude-3.5-haiku
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;TIER_COMPLEX&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;bedrock:anthropic.claude-3-5-sonnet-20241022-v2:0
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENROUTER_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-or-...
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;AWS_BEDROCK_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;...
lynkr start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lynkr now listens on &lt;code&gt;http://localhost:8081&lt;/code&gt; (OpenAI-compatible) and &lt;code&gt;http://localhost:8081/v1/messages&lt;/code&gt; (Anthropic-compatible).&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Register Lynkr as a custom provider in Hermes
&lt;/h3&gt;

&lt;p&gt;Hermes resolves providers through &lt;code&gt;plugins/model-providers/&amp;lt;name&amp;gt;/&lt;/code&gt; profiles &lt;strong&gt;plus&lt;/strong&gt; a &lt;code&gt;custom_providers&lt;/code&gt; list in your &lt;code&gt;~/.hermes/config.yaml&lt;/code&gt;. Add an entry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;custom_providers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;lynkr&lt;/span&gt;
    &lt;span class="na"&gt;base_url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://localhost:8081/v1&lt;/span&gt;
    &lt;span class="na"&gt;api_mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;chat_completions&lt;/span&gt;
    &lt;span class="na"&gt;env_var&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LYNKR_API_KEY&lt;/span&gt;      &lt;span class="c1"&gt;# any string works — Lynkr doesn't validate&lt;/span&gt;
    &lt;span class="na"&gt;models&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;auto&lt;/span&gt;                    &lt;span class="c1"&gt;# Lynkr's tier router picks the actual model&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;qwen2.5-coder:latest&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;anthropic/claude-3.5-sonnet&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then set the key (any value):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hermes config &lt;span class="nb"&gt;set &lt;/span&gt;env.LYNKR_API_KEY sk-lynkr
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Point Hermes at Lynkr
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hermes model custom:lynkr/auto
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or interactively: run &lt;code&gt;hermes model&lt;/code&gt;, pick &lt;code&gt;custom:lynkr&lt;/code&gt;, choose &lt;code&gt;auto&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That's it. Every Hermes turn now flows through Lynkr, which routes to the right backend based on tier and complexity. Run a few turns, then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;lynkr usage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;…and you'll see the per-tier spend breakdown and dollars saved versus a single-frontier-model baseline.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bonus: voice memo → Hermes → Lynkr → cheapest model
&lt;/h3&gt;

&lt;p&gt;Because Hermes already has Telegram and voice memo transcription wired in, this whole stack means:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Record a voice memo on your phone → Hermes transcribes it → routes the request through Lynkr → Lynkr picks Ollama for the "what time is it in Tokyo" parts and Sonnet for the "refactor this function" parts → reply comes back to your phone.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You built that in 5 minutes with two &lt;code&gt;npm&lt;/code&gt;/&lt;code&gt;bash&lt;/code&gt; installers and a YAML edit.&lt;/p&gt;




&lt;h2&gt;
  
  
  When NOT to Use Lynkr With Hermes
&lt;/h2&gt;

&lt;p&gt;Being honest:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You only use one provider.&lt;/strong&gt; Hermes already supports it natively. Adding Lynkr is extra latency and another process to babysit.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You need streaming reasoning tokens from a specific model.&lt;/strong&gt; Make sure Lynkr's format converter for that provider preserves what you need — it does for most cases, but verify before betting on it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You're on a constrained environment.&lt;/strong&gt; Lynkr is Node 20+. Hermes is Python 3.11. That's two runtimes on a Raspberry Pi.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For everything else — multi-provider workflows, enterprise creds, cost optimization, observability — the combination is hard to beat.&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Need&lt;/th&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;A real AI agent that learns, remembers, and lives across Telegram/Discord/CLI&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Hermes&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Route any AI tool to any LLM provider with automatic cost tiers&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Lynkr&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Both&lt;/td&gt;
&lt;td&gt;Point Hermes at Lynkr via &lt;code&gt;custom_providers&lt;/code&gt; in &lt;code&gt;config.yaml&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Links
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Hermes Agent: &lt;a href="https://github.com/NousResearch/hermes-agent" rel="noopener noreferrer"&gt;https://github.com/NousResearch/hermes-agent&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Hermes docs: &lt;a href="https://hermes-agent.nousresearch.com/docs" rel="noopener noreferrer"&gt;https://hermes-agent.nousresearch.com/docs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Lynkr: &lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;https://github.com/Fast-Editor/Lynkr&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Lynkr docs: &lt;a href="https://fast-editor.github.io/Lynkr/" rel="noopener noreferrer"&gt;https://fast-editor.github.io/Lynkr/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you build something with this combo, drop a comment — I'd love to see what stacks people are putting together.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>agents</category>
      <category>llm</category>
    </item>
    <item>
      <title>What is Google Gemini Spark? A Deep Dive Into Google's 24/7 Personal AI Agent</title>
      <dc:creator>Vishal VeeraReddy</dc:creator>
      <pubDate>Fri, 22 May 2026 05:16:13 +0000</pubDate>
      <link>https://dev.to/lynkr/what-is-google-gemini-spark-a-deep-dive-into-googles-247-personal-ai-agent-3iji</link>
      <guid>https://dev.to/lynkr/what-is-google-gemini-spark-a-deep-dive-into-googles-247-personal-ai-agent-3iji</guid>
      <description>&lt;h1&gt;
  
  
  What is Google Gemini Spark? A Deep Dive Into Google's 24/7 Personal AI Agent
&lt;/h1&gt;

&lt;p&gt;If you watched Google I/O 2026, one announcement quietly stole the show — not because it was the flashiest, but because it represents a fundamental shift in how we'll interact with computers. That announcement was &lt;strong&gt;Gemini Spark&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Spark isn't another chatbot. It isn't a smarter Search. It's Google's first serious attempt at a &lt;strong&gt;24/7 autonomous personal agent&lt;/strong&gt; — software that keeps working when your phone is in your pocket, when your laptop is closed, and when you're asleep.&lt;/p&gt;

&lt;p&gt;Let's break down what it actually is, how it works, and why it matters for developers.&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gemini Spark&lt;/strong&gt; is a persistent, autonomous AI agent that lives across Gmail, Docs, Calendar, and the rest of Google Workspace.&lt;/li&gt;
&lt;li&gt;It runs &lt;strong&gt;24/7 in the background&lt;/strong&gt;, even when your device is closed.&lt;/li&gt;
&lt;li&gt;It's powered by &lt;strong&gt;Gemini 3.5 Flash&lt;/strong&gt; (with Pro coming soon) and built on top of the open &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt; so third-party tools can plug in.&lt;/li&gt;
&lt;li&gt;It launched first for &lt;strong&gt;Google AI Ultra subscribers ($100/month)&lt;/strong&gt; in the US, with broader rollout to follow.&lt;/li&gt;
&lt;li&gt;For developers, Spark + MCP is the most important integration surface Google has shipped in years.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  1. What Exactly Is Gemini Spark?
&lt;/h2&gt;

&lt;p&gt;Gemini Spark is a &lt;strong&gt;personal agent&lt;/strong&gt; that Google describes as a "24/7 collaborator." Unlike previous AI features that respond when you ask, Spark is &lt;strong&gt;proactive&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It &lt;strong&gt;reads&lt;/strong&gt; your incoming email and flags what needs action.&lt;/li&gt;
&lt;li&gt;It &lt;strong&gt;drafts&lt;/strong&gt; replies, schedules, and follow-ups before you ask.&lt;/li&gt;
&lt;li&gt;It &lt;strong&gt;tracks&lt;/strong&gt; ongoing tasks across days and weeks.&lt;/li&gt;
&lt;li&gt;It &lt;strong&gt;executes&lt;/strong&gt; multi-step workflows that span multiple apps.&lt;/li&gt;
&lt;li&gt;It keeps doing all of the above &lt;strong&gt;while your phone is locked&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of it as the difference between hiring a contractor (Gemini chat) and hiring a full-time assistant (Spark). One responds to tickets. The other owns outcomes.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. How It Works Under the Hood
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Model Layer: Gemini 3.5 Flash
&lt;/h3&gt;

&lt;p&gt;Spark runs on &lt;strong&gt;Gemini 3.5 Flash&lt;/strong&gt;, Google's new default model. Key specs Google announced:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;~&lt;strong&gt;4x faster token output&lt;/strong&gt; than competing frontier models&lt;/li&gt;
&lt;li&gt;Beats the older Gemini 3.1 Pro on coding and agentic benchmarks&lt;/li&gt;
&lt;li&gt;Optimized for the kind of long-running, low-latency tool use that agents need&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Flash is the right choice here because agents make &lt;strong&gt;a lot&lt;/strong&gt; of small decisions ("should I draft this? wait for more context? ask the user?"). Latency compounds.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Protocol Layer: MCP (Model Context Protocol)
&lt;/h3&gt;

&lt;p&gt;This is the part most developers missed. Instead of building a proprietary plugin system, &lt;strong&gt;Google adopted MCP&lt;/strong&gt; — the open standard originally pushed by Anthropic — as the way third-party tools connect to Spark.&lt;/p&gt;

&lt;p&gt;This is huge. It means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One MCP server you build can serve Claude, Gemini Spark, and any other MCP-compatible host.&lt;/li&gt;
&lt;li&gt;You don't need to maintain a separate "Google plugin" SDK.&lt;/li&gt;
&lt;li&gt;Tool definitions, auth, and resource exposure all follow one spec.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A minimal MCP tool looks roughly like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp.server&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Server&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp.types&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TextContent&lt;/span&gt;

&lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-tool-server&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@server.list_tools&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;list_tools&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_invoice_status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Look up status of an invoice by ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;inputSchema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;invoice_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;invoice_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nd"&gt;@server.call_tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;TextContent&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_invoice_status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;lookup_invoice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;invoice_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;TextContent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Register this with Spark and it can now check invoice status on your behalf — at 3am, while you're asleep, when an email asking about an invoice arrives.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Runtime Layer: Background Execution
&lt;/h3&gt;

&lt;p&gt;The genuinely new piece is the &lt;strong&gt;persistent runtime&lt;/strong&gt;. Most "AI assistants" stop existing the moment you close the tab. Spark keeps a server-side execution context tied to your account that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Subscribes to events (new email, calendar change, doc edit).&lt;/li&gt;
&lt;li&gt;Wakes the agent loop on relevant triggers.&lt;/li&gt;
&lt;li&gt;Executes tool calls (MCP, Workspace APIs, Search).&lt;/li&gt;
&lt;li&gt;Surfaces results via the &lt;strong&gt;Android Halo&lt;/strong&gt; — a small communication band at the top of the phone screen showing what background agents are doing.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The Halo is a UX innovation worth noticing: it solves the "what is my agent secretly doing?" trust problem by always making background work visible.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. What Spark Can Actually Do Today
&lt;/h2&gt;

&lt;p&gt;From the I/O 2026 demos and rollout notes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Email triage&lt;/strong&gt; — Reads inbox, drafts replies, surfaces what needs your attention.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schedule management&lt;/strong&gt; — Reschedules meetings, finds slots, sends invites.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Daily Brief&lt;/strong&gt; — Morning digest pulling from Gmail, Calendar, and Tasks, ranked by priority.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-app workflows&lt;/strong&gt; — "Find the contract Sarah sent last month, summarize the changes, and email her the redline" → executes end-to-end.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistent monitoring&lt;/strong&gt; — "Watch this listing and ping me if the price drops below X" runs indefinitely.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  4. Pricing: The Catch
&lt;/h2&gt;

&lt;p&gt;Spark launched on Google's new &lt;strong&gt;Ultra tier ($100/month)&lt;/strong&gt;. More importantly, Google scrapped per-day prompt limits and moved to a &lt;strong&gt;compute-used&lt;/strong&gt; billing model. You're charged based on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prompt complexity&lt;/li&gt;
&lt;li&gt;Features invoked (Spark, Flow, Omni)&lt;/li&gt;
&lt;li&gt;Length of the conversation/agent run&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The allowance refreshes roughly every 5 hours. A heavy debugging session — or a Spark agent that runs hot for an afternoon — can drain it fast. Build accordingly.&lt;/p&gt;




&lt;h2&gt;
  
  
  5. Why This Matters for Developers
&lt;/h2&gt;

&lt;p&gt;Three concrete reasons Spark should be on your radar:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. MCP is now a two-vendor standard
&lt;/h3&gt;

&lt;p&gt;With both Anthropic and Google supporting MCP, it's the closest thing to a universal "tools for agents" spec we have. Build MCP servers, not proprietary integrations.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The unit of software is shifting
&lt;/h3&gt;

&lt;p&gt;For ten years we built apps that respond to clicks. The next decade is about building &lt;strong&gt;services that agents can drive&lt;/strong&gt;. That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stable, well-documented APIs&lt;/li&gt;
&lt;li&gt;Clear tool descriptions (the LLM has to pick yours over a competitor's)&lt;/li&gt;
&lt;li&gt;Idempotent operations (agents will retry)&lt;/li&gt;
&lt;li&gt;Streaming/long-running job patterns (agents wait for things)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Background-first design
&lt;/h3&gt;

&lt;p&gt;If your product can be useful while the user isn't looking at it, Spark is a distribution channel. "What can my app do for the user at 2pm Tuesday when they're in a meeting?" is now a real product question.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. How to Start Building for Spark Today
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Stand up an MCP server&lt;/strong&gt; exposing your product's core capabilities. The official Python and TypeScript SDKs are at &lt;a href="https://modelcontextprotocol.io" rel="noopener noreferrer"&gt;modelcontextprotocol.io&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write tool descriptions like marketing copy.&lt;/strong&gt; The model picks tools based on the description — clarity wins.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Make every operation idempotent and resumable.&lt;/strong&gt; Background agents crash, retry, and resume.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test with multiple hosts.&lt;/strong&gt; Same MCP server should work in Claude Desktop, Gemini Spark, and any other MCP client. If it doesn't, your server is doing something non-standard.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Design for partial autonomy.&lt;/strong&gt; The best agent UX is "I drafted this — approve?" not "I sent this." At least at first.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Closing Thought
&lt;/h2&gt;

&lt;p&gt;Gemini Spark is the clearest signal yet that the &lt;strong&gt;agent era is the product era of the next decade&lt;/strong&gt;. The companies that win it won't be the ones with the smartest model — they'll be the ones whose software is the easiest for agents to use.&lt;/p&gt;

&lt;p&gt;If you ship developer tools, APIs, or SaaS, your roadmap question for 2026 is no longer "how do we add AI features?" It's &lt;strong&gt;"how do we become the tool an agent reaches for?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Spark is Google placing its bet. Time to place yours.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Found this useful? Drop a comment with what you're building for the agent era — I'd love to see it.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>google</category>
      <category>ai</category>
      <category>agents</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Run OpenHands on Any Model You Want</title>
      <dc:creator>Vishal VeeraReddy</dc:creator>
      <pubDate>Fri, 22 May 2026 05:09:18 +0000</pubDate>
      <link>https://dev.to/lynkr/run-openhands-on-any-model-you-want-1mnd</link>
      <guid>https://dev.to/lynkr/run-openhands-on-any-model-you-want-1mnd</guid>
      <description>&lt;p&gt;There's a quiet shift happening in how serious developers are using AI in 2026. The hype cycle has moved past "ask the chatbot" and landed somewhere more interesting: autonomous coding agents that actually open files, run commands, and ship work — paired with self-hosted routing layers that decide which model handles which turn so you don't go broke doing it.&lt;/p&gt;

&lt;p&gt;This post is about the two open-source projects I've quietly stitched together into my daily driver — and the setup has all but replaced the closed tools I used to pay a small fortune for. &lt;strong&gt;OpenHands&lt;/strong&gt; is the agent: a sandboxed, autonomous software engineer that opens files, runs commands, writes tests, and ships PRs. &lt;strong&gt;Lynkr&lt;/strong&gt; is the router: a self-hosted proxy that sits in front of every LLM provider on the market and decides, request by request, which one should answer. One runs the work. The other decides what the work is worth. Together, they run locally, leak nothing to a third-party SaaS, and cost a fraction of anything closed you can buy in 2026.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is OpenHands?
&lt;/h2&gt;

&lt;p&gt;OpenHands is the most credible open-source answer to closed AI coding agents like Devin, Cursor's background agents, and Codex CLI. It grew out of the &lt;strong&gt;OpenDevin&lt;/strong&gt; research project (renamed in early 2025) and is now maintained by &lt;strong&gt;All-Hands-AI&lt;/strong&gt;, a venture-backed company with an $18.8M Series A. The repo lives at &lt;code&gt;All-Hands-AI/OpenHands&lt;/code&gt;, ships under the MIT license, and at version 1.7.0 (May 2026) has crossed &lt;strong&gt;74,400+ GitHub stars&lt;/strong&gt;, &lt;strong&gt;9,400+ forks&lt;/strong&gt;, &lt;strong&gt;102 releases&lt;/strong&gt;, and &lt;strong&gt;6,700+ commits&lt;/strong&gt; — making it by a wide margin the most-adopted open agent framework in the world. The codebase is roughly 63% Python and 36% TypeScript. To understand why it works, it helps to look at the architecture, the agent design, the runtime, and the customization surface in turn.&lt;/p&gt;

&lt;h3&gt;
  
  
  The architecture: event-sourced, modular, deploy-anywhere
&lt;/h3&gt;

&lt;p&gt;The V1 SDK that landed alongside the November 2025 paper &lt;em&gt;"The OpenHands Software Agent SDK"&lt;/em&gt; is a clean reimagining of the original V0 monolith. Where V0 had 140+ configuration fields spread across 15 classes and tightly coupled the agent to the sandbox, V1 splits the system into &lt;strong&gt;four independent packages&lt;/strong&gt;: an &lt;strong&gt;SDK&lt;/strong&gt; for agent definitions, a &lt;strong&gt;Tools&lt;/strong&gt; layer for action handlers, a &lt;strong&gt;Workspace&lt;/strong&gt; layer for execution environments, and a &lt;strong&gt;Server&lt;/strong&gt; for hosting. Everything is stateless and immutable — agents are configuration objects, not living entities — and all mutable context lives in a single event-sourced &lt;code&gt;ConversationState&lt;/code&gt; object. The conversation itself is a typed stream of &lt;strong&gt;Action&lt;/strong&gt; and &lt;strong&gt;Observation&lt;/strong&gt; Pydantic events, both immutable, both replayable. This is the single most important design decision in OpenHands: an autonomous agent is modeled as a pure function from event history to next event, run in a loop. Pause, resume, fork, deterministic replay, and full audit trails come for free. The same code runs in a Jupyter notebook locally or against a remote container farm in production — the &lt;code&gt;Conversation&lt;/code&gt; factory transparently picks &lt;code&gt;LocalConversation&lt;/code&gt; (in-process) or &lt;code&gt;RemoteConversation&lt;/code&gt; (HTTP/WebSocket to a containerized server) based on config, with zero code changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  The agent: CodeActAgent and the "code as universal tool" insight
&lt;/h3&gt;

&lt;p&gt;The flagship agent is &lt;strong&gt;CodeActAgent&lt;/strong&gt;, and its premise comes from the original CodeAct paper: instead of handing the LLM 20 bespoke tools each wrapped in their own JSON schema, give it &lt;strong&gt;bash, Python (via Jupyter), and a browser DSL&lt;/strong&gt;, and let it express any action as code. Want to read a file? &lt;code&gt;cat&lt;/code&gt;. Want to refactor across the repo? Python. Want to verify a fix lands? Run the tests. Want to scrape documentation? Drive the browser. Empirically this generalizes far better than tool-per-task designs and dramatically reduces parsing errors, because the model is doing what it's already best at — writing code — instead of filling in arbitrary JSON. Tool use is unified through a single &lt;strong&gt;Action → Execution → Observation&lt;/strong&gt; contract. The SDK also ships &lt;strong&gt;MCPToolDefinition&lt;/strong&gt; extending the standard &lt;code&gt;ToolDefinition&lt;/code&gt; interface, so Model Context Protocol tools plug in alongside native ones with no glue code, and a built-in &lt;strong&gt;RouterLLM&lt;/strong&gt; that can switch models mid-conversation (e.g., escalate to a multimodal model only when an image actually appears in context). A &lt;strong&gt;LLMSecurityAnalyzer&lt;/strong&gt; scores every proposed tool call as LOW / MEDIUM / HIGH / UNKNOWN and can pause for confirmation on dangerous operations — a layer most agent frameworks simply don't have.&lt;/p&gt;

&lt;h3&gt;
  
  
  The runtime: a sandboxed Docker microservice
&lt;/h3&gt;

&lt;p&gt;Agents never touch your host directly. Every action runs inside a sandboxed runtime container that exposes an &lt;strong&gt;action execution server over a REST API&lt;/strong&gt;, which the OpenHands backend talks to in a tight loop: send Action, receive Observation, repeat. The container ships its own bash shell, Jupyter kernel, headless browser, and a pluggable skills system, plus optional &lt;strong&gt;VS Code Web&lt;/strong&gt; access on a tokenized URL so you can drop into the sandbox visually if you want to. Images are managed through a clever &lt;strong&gt;three-tier tagging system&lt;/strong&gt; — source-hash, lock-hash, versioned — so rebuilds are incremental and reproducible across machines. The runtime is pluggable too: &lt;strong&gt;Docker&lt;/strong&gt; is the default, &lt;strong&gt;LocalRuntime&lt;/strong&gt; runs the action server on the host for fastest iteration, and &lt;strong&gt;RemoteRuntime&lt;/strong&gt; targets remote container infrastructure for fleet-scale deployment. Plugin backends include &lt;strong&gt;E2B&lt;/strong&gt;, &lt;strong&gt;Modal&lt;/strong&gt;, and &lt;strong&gt;Daytona&lt;/strong&gt; for teams that already have sandbox infrastructure they trust. Storage supports bind mounts and Docker named volumes with optional &lt;strong&gt;copy-on-write overlay mode&lt;/strong&gt; for isolation.&lt;/p&gt;

&lt;h3&gt;
  
  
  The customization surface: microagents (V0) / Skills (V1)
&lt;/h3&gt;

&lt;p&gt;OpenHands is opinionated but extensible through what V0 calls &lt;strong&gt;microagents&lt;/strong&gt; and V1 renames to &lt;strong&gt;Skills&lt;/strong&gt; (both terms still work — V1 reads the V0 layout for backward compatibility). You drop a &lt;code&gt;.openhands/microagents/&lt;/code&gt; directory at the root of your repo, add markdown files like &lt;code&gt;repo.md&lt;/code&gt;, &lt;code&gt;frontend.md&lt;/code&gt;, or &lt;code&gt;migrations.md&lt;/code&gt;, and the main agent loads them on demand. A &lt;code&gt;repo.md&lt;/code&gt; gives the agent the high-level mental model of your codebase — directory layout, build commands, test conventions, where the gotchas are — so it doesn't have to rediscover them every session. Triggered microagents activate only when their keywords appear in the conversation, so you can teach the agent narrow domain knowledge without bloating every prompt. This is where the project's "agents that ship in production" claim earns its keep: tuning OpenHands for your specific codebase is a markdown commit, not a fork.&lt;/p&gt;

&lt;h3&gt;
  
  
  The interfaces: four ways in
&lt;/h3&gt;

&lt;p&gt;You can use OpenHands as a &lt;strong&gt;CLI&lt;/strong&gt; (terminal-native, like Aider or Claude Code), a &lt;strong&gt;local GUI&lt;/strong&gt; (React SPA backed by a REST/WebSocket API, the default &lt;code&gt;docker run&lt;/code&gt; experience), the &lt;strong&gt;hosted cloud&lt;/strong&gt; at &lt;code&gt;all-hands.dev&lt;/code&gt; (free tier on Minimax models, paid tiers on frontier providers, GitHub/GitLab login), or as a &lt;strong&gt;headless service&lt;/strong&gt; driven programmatically via the SDK. The headless mode is what powers the &lt;strong&gt;OpenHands Resolver&lt;/strong&gt;: connect it to a GitHub repo, label an issue, and OpenHands spins up a sandboxed runtime, analyzes the issue, edits the code, runs the tests, and opens a PR — fully autonomous. The same Resolver pattern works for GitLab. For enterprise teams, the deployment story extends to private Kubernetes clusters with RBAC, Slack/Jira/Linear connectors, and source-available enterprise features.&lt;/p&gt;

&lt;h3&gt;
  
  
  The numbers that matter
&lt;/h3&gt;

&lt;p&gt;OpenHands publishes against the standard benchmark: &lt;strong&gt;77.6% on SWE-Bench Verified&lt;/strong&gt; with Claude 3.5 Sonnet Thinking on the V0 harness, and &lt;strong&gt;72.8% SWE-Bench Verified + 67.9% GAIA&lt;/strong&gt; with Claude Sonnet 4.5 on the V1 SDK. Those aren't the highest numbers ever posted, but they are the highest by an open-source, self-hostable system you can actually inspect, fork, and run yourself. In January 2026 the team launched the &lt;strong&gt;OpenHands Index&lt;/strong&gt; — a continuously-updated leaderboard evaluating models across five real engineering categories (Issue Resolution, Greenfield Development, Frontend Development, Software Testing, and Information Gathering) by ability, cost, and execution time. That tells you something about where the project is heading: away from "agent as parlor trick" and toward "agent as measured, reproducible infrastructure."&lt;/p&gt;

&lt;h3&gt;
  
  
  What ties it together
&lt;/h3&gt;

&lt;p&gt;The throughline is that OpenHands is not trying to be a chatbot or an autocomplete. It's trying to be the runtime layer for autonomous software agents — composable, sandboxed, observable, model-agnostic, and equally happy in a notebook or behind a load balancer. It uses &lt;strong&gt;LiteLLM&lt;/strong&gt; under the hood for LLM dispatch, which means &lt;strong&gt;100+ providers&lt;/strong&gt; are reachable today (Claude, GPT, Gemini, Bedrock, Vertex, Azure, OpenRouter, Ollama, llama.cpp, anything LiteLLM speaks) with no code changes. That last property is what makes the Lynkr pairing not just possible but obvious.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Lynkr?
&lt;/h2&gt;

&lt;p&gt;Lynkr is a self-hosted Node.js proxy that sits between your AI coding tools and the dozen-plus LLM providers worth using, listening on &lt;code&gt;http://localhost:8081&lt;/code&gt; and presenting both Anthropic Messages and OpenAI Chat Completions APIs simultaneously. From the outside it looks like a normal LLM endpoint — point Claude Code, Cursor, Aider, Codex CLI, or OpenHands at it with one environment variable and you're done — but on the inside it does something none of those tools do on their own: it analyzes every request across 15 weighted dimensions (including an AST-based knowledge graph called &lt;strong&gt;Graphify&lt;/strong&gt; that understands code structure across 19 languages, detecting god nodes, community cohesion, and architectural blast radius) and routes it to one of four model tiers — &lt;code&gt;simple&lt;/code&gt;, &lt;code&gt;medium&lt;/code&gt;, &lt;code&gt;complex&lt;/code&gt;, or &lt;code&gt;reasoning&lt;/code&gt; — based on what the request actually needs. Supported backends include local-and-free options (&lt;strong&gt;Ollama&lt;/strong&gt;, &lt;strong&gt;llama.cpp&lt;/strong&gt;, &lt;strong&gt;LM Studio&lt;/strong&gt;, &lt;strong&gt;MLX Server&lt;/strong&gt;) and cloud providers (&lt;strong&gt;AWS Bedrock&lt;/strong&gt; with 100+ models, &lt;strong&gt;OpenRouter&lt;/strong&gt; with 100+ more, &lt;strong&gt;Azure OpenAI&lt;/strong&gt;, &lt;strong&gt;Azure Anthropic&lt;/strong&gt;, &lt;strong&gt;OpenAI&lt;/strong&gt;, &lt;strong&gt;Google Vertex&lt;/strong&gt;, &lt;strong&gt;Databricks&lt;/strong&gt;, &lt;strong&gt;Moonshot&lt;/strong&gt;, &lt;strong&gt;Z.AI&lt;/strong&gt;, &lt;strong&gt;DeepSeek&lt;/strong&gt;). On top of routing, Lynkr layers a seven-phase token optimization pipeline — smart tool selection, &lt;strong&gt;Code Mode&lt;/strong&gt; which collapses 100+ MCP tools into 4 meta-tools (≈96% tool-overhead reduction), Distill structural compression, SHA-256-keyed LRU prompt caching, memory deduplication, sliding-window history compression, and an optional ML-based headroom sidecar (Smart Crusher, CCR, LLMLingua) — plus a &lt;strong&gt;Titans-inspired long-term memory subsystem&lt;/strong&gt; that stores observations in a SQLite FTS5 database (&lt;code&gt;lynkr.db&lt;/code&gt;) scored on surprise, recency, and relevance, and injects only the relevant slice back into system prompts on future requests. Production deployments get Prometheus metrics at &lt;code&gt;/metrics&lt;/code&gt;, Kubernetes-ready health checks, circuit breakers with half-open probe recovery, hot-reloadable config via &lt;code&gt;POST /v1/admin/reload&lt;/code&gt;, SQLite-backed routing telemetry with P50/P95/P99 latencies and 0–100 quality scoring on every decision, and load shedding under pressure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why You Should Use Lynkr With OpenHands
&lt;/h2&gt;

&lt;p&gt;The honest case for stacking these two has four layers, and each one matters more than the last.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost, but not the dumb kind.&lt;/strong&gt; OpenHands is brilliant precisely because it doesn't ask permission — it just does the work, which means it burns through tokens at a rate that's painful if every turn hits a frontier model. A long session can quietly cost $20+ in Opus tokens because the agent doesn't know — and shouldn't have to know — that a &lt;code&gt;mv foo.py bar.py&lt;/code&gt; doesn't deserve a $15-per-million model. Lynkr's complexity analysis is the missing brain: simple file moves and grep calls drop to &lt;code&gt;simple&lt;/code&gt; tier (Haiku, GPT-4o-mini, local Qwen), real architectural reasoning gets routed to &lt;code&gt;reasoning&lt;/code&gt; tier (Opus, GPT-5, Gemini Ultra). Users typically report 60–80% lower spend without a meaningful drop in output quality, because the expensive models still get called — just for the turns that actually need them. Unlike OpenRouter, Lynkr doesn't take a 5.5% cut of your credits, and unlike LiteLLM's proxy, the token optimization pipeline is built in rather than something you bolt on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Provider redundancy that doesn't require you to redeploy.&lt;/strong&gt; When Anthropic has a bad afternoon — and they do — your OpenHands session doesn't stop. Lynkr's circuit breakers detect the failure, route around it to a configured fallback (Bedrock Claude, Azure Anthropic, OpenAI, whatever you've set), and quietly recover via half-open probes when the primary comes back. You change zero code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Local models when you want them, frontier when you don't.&lt;/strong&gt; Drop in Ollama or LM Studio as your &lt;code&gt;simple&lt;/code&gt; tier and the cheapest turns in your session cost literally nothing — they never leave your machine. The same OpenHands install can be 100% offline-capable for development tasks and seamlessly burst to cloud Opus for the hard problems. No other agent + router combination on the market does this without significant glue code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One config owns your model strategy.&lt;/strong&gt; This is the underrated win. As models drop monthly (GPT-5.1, Sonnet 4.7, Gemini 3, the next open-weights surprise from DeepSeek), you stop rewiring your tools — you change one line in Lynkr's config and hot-reload. OpenHands keeps doing what it does. Cursor, Claude Code, Aider, and every other tool pointed at the same Lynkr instance get the new strategy for free.&lt;/p&gt;

&lt;p&gt;A note on philosophy: nothing about your code, prompts, or context ever passes through a third-party SaaS on the way to the model. Lynkr is self-hosted, your provider keys live in your environment, and the routing decisions are auditable in a local SQLite database. For anyone working in a regulated environment — or who just doesn't love handing prompt logs to an intermediary — this matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Use It
&lt;/h2&gt;

&lt;p&gt;The full setup is three steps and roughly five minutes.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Install and start Lynkr
&lt;/h3&gt;

&lt;p&gt;Pick whichever feels right:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# One-line install (recommended)&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/Fast-Editor/Lynkr/main/install.sh | bash

&lt;span class="c"&gt;# Or npm&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; lynkr

&lt;span class="c"&gt;# Or Homebrew&lt;/span&gt;
brew tap vishalveerareddy123/lynkr &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; brew &lt;span class="nb"&gt;install &lt;/span&gt;lynkr

&lt;span class="c"&gt;# Or Docker&lt;/span&gt;
docker-compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Configure your providers
&lt;/h3&gt;

&lt;p&gt;Lynkr reads from environment variables (or a &lt;code&gt;.env&lt;/code&gt; file). A reasonable starter config that mixes a local model for cheap turns with cloud frontier models for hard ones:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Tier definitions&lt;/span&gt;
&lt;span class="nv"&gt;SIMPLE_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ollama
&lt;span class="nv"&gt;SIMPLE_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;qwen2.5-coder:latest

&lt;span class="nv"&gt;MEDIUM_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;openai
&lt;span class="nv"&gt;MEDIUM_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gpt-4o-mini

&lt;span class="nv"&gt;COMPLEX_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;anthropic
&lt;span class="nv"&gt;COMPLEX_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;claude-sonnet-4-6

&lt;span class="nv"&gt;REASONING_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;anthropic
&lt;span class="nv"&gt;REASONING_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;claude-opus-4-7

&lt;span class="c"&gt;# Provider credentials&lt;/span&gt;
&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-...
&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-ant-...

&lt;span class="c"&gt;# Optimizations&lt;/span&gt;
&lt;span class="nv"&gt;SEMANTIC_CACHE_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true
&lt;/span&gt;&lt;span class="nv"&gt;CODE_MODE_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true
&lt;/span&gt;&lt;span class="nv"&gt;MEMORY_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start it and verify:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;lynkr start
curl http://localhost:8081/v1/models   &lt;span class="c"&gt;# should return a JSON model list&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Point OpenHands at Lynkr
&lt;/h3&gt;

&lt;p&gt;Three environment variables on the OpenHands container do the entire job. The &lt;code&gt;LLM_BASE_URL&lt;/code&gt; tells LiteLLM where to send requests, the &lt;code&gt;LLM_API_KEY&lt;/code&gt; is a placeholder (Lynkr accepts anything because the real auth lives upstream), and the &lt;code&gt;openai/&lt;/code&gt; prefix on &lt;code&gt;LLM_MODEL&lt;/code&gt; tells LiteLLM to use OpenAI wire format against Lynkr's &lt;code&gt;/v1/chat/completions&lt;/code&gt; endpoint — even when the model on the other side is Claude or Gemini.&lt;/p&gt;

&lt;p&gt;Full copy-pasteable command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;--pull&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;always &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;AGENT_SERVER_IMAGE_REPOSITORY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ghcr.io/openhands/agent-server &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;AGENT_SERVER_IMAGE_TAG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1.19.1-python &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;LOG_ALL_EVENTS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;LLM_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"http://host.docker.internal:8081/v1"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;LLM_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"sk-lynkr"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;LLM_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"openai/claude-sonnet-4-6"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; /var/run/docker.sock:/var/run/docker.sock &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; ~/.openhands:/.openhands &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 3000:3000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--add-host&lt;/span&gt; host.docker.internal:host-gateway &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; openhands-app &lt;span class="se"&gt;\&lt;/span&gt;
  docker.openhands.dev/openhands/openhands:1.7
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;code&gt;http://localhost:3000&lt;/code&gt;, give OpenHands a task, and watch the proxy do its job. If you tail &lt;code&gt;lynkr&lt;/code&gt; logs in another terminal you'll see the routing decisions in real time — which tier each request landed on, latency, cached vs cold, and the quality score.&lt;/p&gt;

&lt;p&gt;If you prefer &lt;code&gt;config.toml&lt;/code&gt; over env vars, OpenHands also accepts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[llm]&lt;/span&gt;
&lt;span class="py"&gt;model&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"openai/claude-sonnet-4-6"&lt;/span&gt;
&lt;span class="py"&gt;base_url&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"http://localhost:8081/v1"&lt;/span&gt;
&lt;span class="py"&gt;api_key&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"sk-lynkr"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same pattern works from OpenHands' GUI under &lt;code&gt;Settings → LLM → Advanced&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Actually Happening Under the Hood
&lt;/h2&gt;

&lt;p&gt;When OpenHands fires a request, here's the full lifecycle:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;OpenHands → LiteLLM&lt;/strong&gt; formats the call as OpenAI Chat Completions and ships it to &lt;code&gt;http://host.docker.internal:8081/v1/chat/completions&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lynkr ingests&lt;/strong&gt; the request, runs token counting and budget enforcement.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimization pipeline&lt;/strong&gt; applies prompt caching (SHA-256 LRU), memory deduplication, tool truncation (Code Mode collapses MCP tool definitions), and history compression.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complexity analysis&lt;/strong&gt; scores the request across 15 dimensions including Graphify AST signals; the router picks a tier.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Format translation&lt;/strong&gt; converts to whatever wire protocol the destination needs — Bedrock Converse, Vertex Gemini, Anthropic Messages, or stays OpenAI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provider invocation&lt;/strong&gt; via the unified &lt;code&gt;invokeModel()&lt;/code&gt; abstraction. If the circuit breaker is open, the request silently fails over.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Response translation&lt;/strong&gt; converts back to OpenAI Chat Completions for LiteLLM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Telemetry&lt;/strong&gt; writes the routing decision, latency percentiles, and quality score to SQLite for later inspection.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory subsystem&lt;/strong&gt; scores the exchange for surprise / recency / relevance and stores any high-signal observations for future injection.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;OpenHands sees a perfectly normal OpenAI response. It has no idea any of that happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gotchas Worth Knowing
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Don't&lt;/strong&gt; also set &lt;code&gt;ANTHROPIC_BASE_URL&lt;/code&gt; inside the OpenHands container. OpenHands talks via LiteLLM, not the Anthropic SDK directly — the &lt;code&gt;LLM_*&lt;/code&gt; env vars are the correct lever.&lt;/li&gt;
&lt;li&gt;On Linux, the &lt;code&gt;--add-host host.docker.internal:host-gateway&lt;/code&gt; flag is what lets the container resolve back to your host. The command above already includes it; don't drop it.&lt;/li&gt;
&lt;li&gt;Keep the &lt;code&gt;openai/&lt;/code&gt; prefix on &lt;code&gt;LLM_MODEL&lt;/code&gt;. Without it, LiteLLM tries to use the Anthropic SDK against an OpenAI-shaped endpoint and fails confusingly.&lt;/li&gt;
&lt;li&gt;If you're running Lynkr in Docker too, put both containers on the same Docker network and reference Lynkr by container name instead of &lt;code&gt;host.docker.internal&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;LLM_API_KEY&lt;/code&gt; value is ignored by Lynkr but &lt;strong&gt;must be set to something&lt;/strong&gt; — LiteLLM refuses to send requests without an API key field present.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;The reason this combo works isn't that either project is doing something magical in isolation — it's that they respect the same boundary. OpenHands knows it shouldn't care which model is on the other side of LiteLLM; Lynkr knows it shouldn't care which tool is asking. That clean separation is what makes the stack composable. Tomorrow you can swap OpenHands for Aider or Cline without touching Lynkr's config. Next month you can add Kimi K2 or whatever Mistral ships next without touching OpenHands. The agent layer and the routing layer evolve independently, and your wallet quietly benefits from both.&lt;/p&gt;

&lt;p&gt;That's the version of the AI coding stack worth betting on in 2026: open, local-first, model-agnostic, and configurable from one file you actually control.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you want the source: OpenHands lives at &lt;a href="https://github.com/All-Hands-AI/OpenHands" rel="noopener noreferrer"&gt;github.com/All-Hands-AI/OpenHands&lt;/a&gt;, Lynkr at &lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;github.com/Fast-Editor/Lynkr&lt;/a&gt;. Both accept contributions, and both maintainers ship fast.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>automation</category>
      <category>opensource</category>
    </item>
    <item>
      <title>HTML Is the New Markdown</title>
      <dc:creator>Vishal VeeraReddy</dc:creator>
      <pubDate>Sun, 17 May 2026 07:47:23 +0000</pubDate>
      <link>https://dev.to/lynkr/html-is-the-new-markdown-3hf8</link>
      <guid>https://dev.to/lynkr/html-is-the-new-markdown-3hf8</guid>
      <description>&lt;p&gt;&lt;em&gt;A response to Thariq Shihipar's "HTML is the new markdown" post — and a practical answer for anyone watching their per-request costs creep up.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The tweet that started it
&lt;/h2&gt;

&lt;p&gt;On May 8, Thariq Shihipar — a member of the Claude Code team at Anthropic — &lt;a href="https://x.com/trq212/status/2052811606032269638" rel="noopener noreferrer"&gt;posted&lt;/a&gt; what is now one of the most-discussed dev takes of the quarter:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"HTML is the new markdown. I've stopped writing markdown files for almost everything and switched to using Claude Code to generate HTML for me."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The case he made was genuinely compelling. HTML, he argued, lets Claude do things markdown simply can't:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Inline SVG diagrams&lt;/strong&gt; instead of fenced ASCII art&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interactive widgets&lt;/strong&gt; — sliders, toggles, collapsible sections — so a PR review or architectural doc becomes a navigable artifact instead of a wall of text&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;In-page navigation&lt;/strong&gt;, color-coded severity, real layout&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic structure&lt;/strong&gt; the model can hang richer reasoning on&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thariq's example prompt — &lt;em&gt;"Help me review this PR by creating an HTML artifact… color-code findings by severity"&lt;/em&gt; — captures the appeal in one line. The output looks like a small internal tool, not a chat log. Once you see it, going back to plain markdown feels like trading a dashboard for a printout.&lt;/p&gt;

&lt;p&gt;Simon Willison &lt;a href="https://simonwillison.net/2026/May/8/unreasonable-effectiveness-of-html/" rel="noopener noreferrer"&gt;agreed enough to write it up&lt;/a&gt; the same week. Then the dissents arrived.&lt;/p&gt;

&lt;h2&gt;
  
  
  The counter-argument: this is expensive
&lt;/h2&gt;

&lt;p&gt;The most-shared rebuttal came from Kurtis Redux's &lt;a href="https://kurtis-redux.medium.com/the-unreasonable-ineffectiveness-of-html-5bd01ae1e879" rel="noopener noreferrer"&gt;"The Unreasonable &lt;em&gt;Ineffectiveness&lt;/em&gt; of HTML"&lt;/a&gt;. Three points landed:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;HTML is 2–4× slower to generate&lt;/strong&gt; than the markdown equivalent — and roughly 2–4× more output tokens to render the same information.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verbose markup dilutes model attention.&lt;/strong&gt; The longer the output, the easier it is for the model to lose the plot, repeat itself, or hallucinate structure halfway down.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output tokens are the expensive tokens.&lt;/strong&gt; Across every major API price card, output is 3–5× the cost of input. Recommending 2–4× more output is a real line item.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Then came the uncomfortable subtext from a few corners of the timeline: &lt;em&gt;Anthropic sells by token. Of course a Claude Code engineer wants you generating more of them.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I don't think that's fair to Thariq — he's been &lt;a href="https://x.com/trq212/status/2024574133011673516" rel="noopener noreferrer"&gt;shipping&lt;/a&gt; thoughtful, often unflattering-to-Anthropic posts about prompt caching and token efficiency for a year. But the structural concern stands: the people best positioned to tell us how to use these tools also have the most to gain when we use them more.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real question
&lt;/h2&gt;

&lt;p&gt;The argument has been framed as a binary: rich-but-expensive HTML, or cheap-but-flat markdown. Pick a side.&lt;/p&gt;

&lt;p&gt;That framing is wrong. The question that actually matters is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Can I get the HTML output Thariq is excited about — without paying the HTML token bill?&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The answer is yes, and most of it has nothing to do with which model you're using. It has to do with everything that happens &lt;em&gt;around&lt;/em&gt; the model call. That's the slice I've been working on with &lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;Lynkr&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's actually inflating your token count
&lt;/h2&gt;

&lt;p&gt;Before talking about HTML output, it's worth being honest about where tokens &lt;em&gt;actually&lt;/em&gt; go in a Claude Code-style session. In a typical agentic loop, the output HTML is the &lt;strong&gt;smallest&lt;/strong&gt; chunk. The big spenders are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The system prompt and tool definitions&lt;/strong&gt; — re-sent every single turn. Claude Code ships with 50+ tools, each with verbose JSON schemas. That's tens of thousands of input tokens per turn before the user has typed anything.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conversation history&lt;/strong&gt; — every prior assistant message, every tool call, every tool result, replayed each step.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool results&lt;/strong&gt; — &lt;code&gt;cargo build&lt;/code&gt;, &lt;code&gt;pytest -v&lt;/code&gt;, &lt;code&gt;eslint .&lt;/code&gt;, &lt;code&gt;git diff&lt;/code&gt;. A failing test run can dump 40k tokens of stack trace.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP tool catalogs&lt;/strong&gt; — wire up half a dozen MCP servers and your tool schema can balloon past 100k tokens.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The HTML the model &lt;em&gt;writes&lt;/em&gt; is real cost, but it's measured in hundreds-to-low-thousands of output tokens. The HTML-vs-markdown debate is fighting over the tip while the iceberg sits unchallenged.&lt;/p&gt;

&lt;p&gt;If you want HTML outputs without a wince at the invoice, the lever isn't "stop generating HTML." The lever is "stop sending the model a quarter-million tokens of repeated context to &lt;em&gt;get&lt;/em&gt; to that HTML."&lt;/p&gt;

&lt;h2&gt;
  
  
  How Lynkr attacks the iceberg
&lt;/h2&gt;

&lt;p&gt;Lynkr is a self-hosted proxy that sits between your coding tool (Claude Code, Cursor, Codex CLI, Cline, Continue) and any of 13+ LLM providers. It speaks Anthropic's &lt;code&gt;/v1/messages&lt;/code&gt; and OpenAI's &lt;code&gt;/v1/chat/completions&lt;/code&gt;, so the client doesn't know it's there. What it does in the middle is the interesting part. A few of the layers most relevant to the HTML-output discussion:&lt;/p&gt;

&lt;h3&gt;
  
  
  0. Preflight short-circuit (zero tokens)
&lt;/h3&gt;

&lt;p&gt;The most extreme expression of the thesis. Before any model call happens, Lynkr can run a user-supplied shell precondition — &lt;code&gt;pytest path/to/test.py&lt;/code&gt;, &lt;code&gt;tsc --noEmit&lt;/code&gt;, &lt;code&gt;lint --quiet&lt;/code&gt;, whatever proves the work is already done. If every command exits 0, Lynkr returns a synthetic "preflight satisfied" response without calling the LLM at all.&lt;/p&gt;

&lt;p&gt;The use case sounds niche until you run a real agent loop: a CI-triggered "fix the failing test" request that arrives 90 seconds &lt;em&gt;after&lt;/em&gt; the test was already fixed on another branch. An idempotent agent retry. A scheduled refactor job whose work was finished by a previous run. Today, every one of those wastes a full agentic loop — sometimes hundreds of thousands of tokens — to rediscover that the answer is "nothing to do."&lt;/p&gt;

&lt;p&gt;The most expensive HTML output is the one you generate to explain a problem that no longer exists. Preflight is the layer that just doesn't.&lt;/p&gt;

&lt;p&gt;Opt-in via &lt;code&gt;preflight_commands&lt;/code&gt; in the request payload, gated on &lt;code&gt;LYNKR_PREFLIGHT_ENABLED=true&lt;/code&gt;. Inspired by the CodexSaver routing patterns.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Smart tool selection (~50–70% token reduction on tool defs)
&lt;/h3&gt;

&lt;p&gt;Claude Code sends every registered tool definition on every turn. Lynkr classifies the incoming request — is this a code edit, a search, a refactor, a "generate an HTML artifact" task? — and ships only the tools that classification actually needs. The model doesn't see the 40 it won't use.&lt;/p&gt;

&lt;p&gt;For an "HTML artifact" task specifically, you don't need &lt;code&gt;Bash&lt;/code&gt;, &lt;code&gt;Glob&lt;/code&gt;, &lt;code&gt;Git&lt;/code&gt;, the test runners, or MCP tools. You need read, write, and maybe web search. The savings compound across every turn of the loop.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Tool result compression
&lt;/h3&gt;

&lt;p&gt;When the model calls &lt;code&gt;pytest&lt;/code&gt; and gets back 40k tokens of output, Lynkr's tool-result compressor detects the pattern (test runner, linter, build, git diff) and compresses it before the model sees it on the &lt;em&gt;next&lt;/em&gt; turn — usually keeping the failing assertions and dropping the redundant traceback frames. A tee cache holds the full version if anything downstream needs it.&lt;/p&gt;

&lt;p&gt;In an agentic HTML-generation flow ("read the codebase, then produce an HTML report"), tool results are typically the largest single token category. Compressing them shrinks every subsequent turn.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. History compression (Distill-style structural dedup)
&lt;/h3&gt;

&lt;p&gt;Long sessions accumulate near-duplicate state: the model re-reads the same file four times, re-runs a slightly-different version of the same query, repeats its own reasoning. Lynkr's history compressor deduplicates older turns structurally while preserving the most recent N verbatim. The model still has the context; it doesn't pay for it twice.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Code Mode for MCP (~96% reduction)
&lt;/h3&gt;

&lt;p&gt;If you've wired up MCP servers, this one's almost embarrassing. A typical MCP catalog — Linear, GitHub, Notion, Slack, a few internal ones — is 100+ tool definitions and tens of thousands of tokens &lt;em&gt;every turn&lt;/em&gt;. Lynkr's Code Mode replaces all of them with four meta-tools (&lt;code&gt;listResources&lt;/code&gt;, &lt;code&gt;readResource&lt;/code&gt;, &lt;code&gt;callTool&lt;/code&gt;, &lt;code&gt;searchTools&lt;/code&gt;) and lets the model discover the rest at runtime. Same capability surface, ~96% fewer tokens spent on the catalog.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Risk-aware smart routing across model tiers
&lt;/h3&gt;

&lt;p&gt;This is the one most directly relevant to the HTML debate. Generating a polished HTML artifact really does want a strong model. But the &lt;strong&gt;fifteen prior turns&lt;/strong&gt; that gathered the data, ran searches, and figured out what to put in that artifact often don't.&lt;/p&gt;

&lt;p&gt;Lynkr routes on two orthogonal axes: a 15-dimension &lt;strong&gt;complexity&lt;/strong&gt; analysis (how hard is the task) and an independent &lt;strong&gt;risk&lt;/strong&gt; score (how dangerous is it if the model gets this wrong). A request that touches &lt;code&gt;auth/*&lt;/code&gt;, &lt;code&gt;payments/*&lt;/code&gt;, &lt;code&gt;migrations/*&lt;/code&gt;, &lt;code&gt;.env&lt;/code&gt;, or &lt;code&gt;.github/workflows&lt;/code&gt;, or that names high-risk concepts like authentication or deployment, is forced to the COMPLEX tier regardless of complexity score. Cheap tier for the cheap turns; frontier tier whenever the blast radius warrants it.&lt;/p&gt;

&lt;p&gt;The four configurable tiers — &lt;code&gt;TIER_SIMPLE&lt;/code&gt;, &lt;code&gt;TIER_MEDIUM&lt;/code&gt;, &lt;code&gt;TIER_COMPLEX&lt;/code&gt;, &lt;code&gt;TIER_REASONING&lt;/code&gt; — let you map each to whatever provider:model pair makes sense. A greeting goes to a local Ollama model and costs zero. A file-read summarization goes to Haiku. The final "now write the HTML artifact" call goes to Opus or Sonnet, where it belongs.&lt;/p&gt;

&lt;p&gt;Pay frontier prices for frontier work and high-risk work. Pay nothing for the rest.&lt;/p&gt;

&lt;p&gt;Optionally, &lt;code&gt;LYNKR_VISIBLE_ROUTING=true&lt;/code&gt; attaches a human-readable interaction block to each response — tier, provider, model, risk level, estimated savings — so the routing decision shows up &lt;em&gt;inside&lt;/em&gt; Claude Code instead of buried in headers no one reads.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Semantic and prompt caching
&lt;/h3&gt;

&lt;p&gt;Identical requests hit the prompt cache (SHA-256 keyed LRU, 5min TTL). Near-identical requests — same intent, different phrasing — hit the semantic cache (embedding cosine similarity at 0.95). For repeated "generate an HTML report of X" prompts within a working session, the second call can be effectively free.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Headroom sidecar (optional ML compression)
&lt;/h3&gt;

&lt;p&gt;For users who really want to push it, there's an optional Python sidecar running LLMLingua-style transforms (Smart Crusher, Continuous Context Reduction, Tool Crusher, Cache Aligner, Rolling Window). Pure context-side compression — the model output doesn't change, the input it operates on gets smaller.&lt;/p&gt;

&lt;h3&gt;
  
  
  8. Token budget enforcement
&lt;/h3&gt;

&lt;p&gt;Each model's context window is tracked. At 85% utilization, Lynkr automatically applies adaptive compression rather than letting the request blow past the window mid-generation. You don't get the 400-error-then-retry burn that costs you the input tokens twice.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this looks like in practice
&lt;/h2&gt;

&lt;p&gt;A concrete scenario: you ask Claude Code to "review this PR and produce an HTML report grouped by severity" — exactly Thariq's example.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Without a proxy layer&lt;/strong&gt;, that's 8–12 model turns. Each turn re-sends the full system prompt, 50+ tool definitions, the growing conversation history, and accumulated tool results. The final turn produces ~2,000 tokens of HTML. Total input cost across the run: somewhere between 400k and 1.2M tokens depending on repo size. The HTML output is maybe 0.5% of the bill.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With Lynkr in the middle&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If the PR's findings are already merged or the failing check has self-resolved, preflight returns "satisfied" without a single model token spent.&lt;/li&gt;
&lt;li&gt;Smart tool selection trims the per-turn tool-def overhead by ~60%.&lt;/li&gt;
&lt;li&gt;Tool result compression keeps &lt;code&gt;git diff&lt;/code&gt; and test output from dominating subsequent turns.&lt;/li&gt;
&lt;li&gt;History compression dedupes the redundant intermediate state.&lt;/li&gt;
&lt;li&gt;Risk-aware routing sends the search/read turns to a cheap tier; the final HTML generation — and anything touching protected paths — goes to your strongest model.&lt;/li&gt;
&lt;li&gt;The semantic cache catches the second time you ask for the same review.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You keep the rich HTML output. You stop paying for the parts of the run that don't earn their tokens.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;Thariq is right. HTML output really is better for the kinds of artifacts coding agents produce — and "we should use markdown because tokens are expensive" is an argument from a 32k-context-window world we no longer live in.&lt;/p&gt;

&lt;p&gt;The critics are also right. HTML is more expensive than markdown to generate, the cost matters, and "use the richer format" without a story for cost control is a recommendation that benefits the seller more than the buyer.&lt;/p&gt;

&lt;p&gt;What's missing from both sides of the timeline is the recognition that &lt;strong&gt;the output format is not the cost lever&lt;/strong&gt;. The cost lever is everything upstream of the final response: which tools you send, how you compress prior turns, which tier you route to, what you cache, whether the work even needs to run at all.&lt;/p&gt;

&lt;p&gt;Generate the HTML. Then put a smarter proxy in front of the model so you can afford to do it again tomorrow — and so that sometimes, when the answer is already on disk, you don't have to.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Lynkr is open source (Apache 2.0) and self-hosted: &lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;https://github.com/Fast-Editor/Lynkr&lt;/a&gt;. Install with &lt;code&gt;npm i -g lynkr&lt;/code&gt;, point your coding tool's base URL at &lt;code&gt;localhost:8080&lt;/code&gt;, and you're routing.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you've measured your own HTML-vs-markdown token deltas in real workflows, I'd love to compare numbers — reply with your results.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>html</category>
      <category>productivity</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Open Cowork : A Free, Alternative to claude cowork</title>
      <dc:creator>Vishal VeeraReddy</dc:creator>
      <pubDate>Thu, 14 May 2026 23:05:00 +0000</pubDate>
      <link>https://dev.to/lynkr/open-cowork-lynkr-a-free-local-ai-agent-that-actually-does-the-work-204p</link>
      <guid>https://dev.to/lynkr/open-cowork-lynkr-a-free-local-ai-agent-that-actually-does-the-work-204p</guid>
      <description>&lt;h4&gt;
  
  
  How to set up a desktop AI agent on your own machine, route its model calls through a local proxy, and stop paying $20/month for chatbots that only talk.
&lt;/h4&gt;




&lt;p&gt;There is a meaningful gap between AI tools that &lt;em&gt;describe&lt;/em&gt; what to do and AI tools that &lt;em&gt;do&lt;/em&gt; it. ChatGPT, Claude, Gemini, Perplexity — these are chat boxes. You ask, they answer in words, and you are still the one opening PowerPoint, dragging files around, summarizing the webpage by hand. The actual work hasn't moved.&lt;/p&gt;

&lt;p&gt;The newer category — agentic desktop apps — closes that gap. Anthropic shipped Claude Cowork. The open-source community shipped Open Cowork, which does the same thing without locking you to one vendor. Pair it with Lynkr, a local Anthropic-compatible proxy, and you get a complete AI agent stack running on your own machine with no monthly subscription.&lt;/p&gt;

&lt;p&gt;This article explains what Open Cowork is, what Lynkr is, why they pair well together, and how to set up the integration end to end.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part I — What Is Open Cowork?
&lt;/h2&gt;

&lt;p&gt;Open Cowork is an open-source desktop AI agent application for Windows and macOS. The project lives at &lt;code&gt;github.com/OpenCoworkAI/open-cowork&lt;/code&gt; and ships as a one-click Electron installer — no terminal commands, no Python setup, no manual dependency wrangling. You download, install, and it opens as a normal desktop app.&lt;/p&gt;

&lt;p&gt;The shape of the app is unusual compared to a chat tool. It is structured around three primary surfaces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;chat window&lt;/strong&gt; where you describe what you want&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;workspace folder&lt;/strong&gt; on your filesystem that the agent can read from and write to&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;trace panel&lt;/strong&gt; that shows the agent's reasoning and every tool it executes in real time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What makes it different from a chat tool is that the AI doesn't just respond. It executes. It generates real files, opens your browser, reads pages, and interacts with other applications on your computer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Core capabilities
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Document generation through the Skills system.&lt;/strong&gt; Open Cowork ships with built-in workflows for producing PPTX, DOCX, XLSX, and PDF files. You ask for a 10-slide pitch deck and you get a real PowerPoint file with editable text, a chosen color scheme, and structured content. Not a markdown summary. Not a description of what the slides should contain. An actual file you can open in Keynote or PowerPoint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP (Model Context Protocol) integration.&lt;/strong&gt; Open Cowork connects to external services through MCP connectors — browser automation, Notion, custom internal apps. Once a connector is configured, the AI can drive those services as part of its workflow. Open a webpage, extract structured data, push it into a Notion database.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GUI automation (computer use).&lt;/strong&gt; The app supports controlling other desktop applications by reading the screen and operating the mouse and keyboard. The recommended model for this surface is Gemini-3-Pro per their documentation, but any vision-capable model works.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sandbox isolation via WSL2 and Lima.&lt;/strong&gt; Every command the agent executes runs inside a virtual machine — WSL2 on Windows, Lima on macOS. The host filesystem is protected. Even if the AI is told to run something destructive, it physically cannot reach the parts of your computer you didn't authorize.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Remote control through Feishu and Slack.&lt;/strong&gt; You can send tasks to Open Cowork from your phone via Slack messages or Feishu (the Chinese equivalent). The agent executes them on your desktop machine and reports back through the same channel.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multimodal input.&lt;/strong&gt; Drag images, PDFs, and other files directly into the chat input. The agent can read them as part of the conversation.&lt;/p&gt;

&lt;h3&gt;
  
  
  How it compares to Claude Cowork
&lt;/h3&gt;

&lt;p&gt;Claude Cowork is Anthropic's first-party desktop agent app. Open Cowork is the community's open-source implementation of the same idea. The functional capabilities overlap substantially:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Claude Cowork&lt;/th&gt;
&lt;th&gt;Open Cowork&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;MCP + Skills&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sandbox isolation&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Remote control (Slack/Feishu)&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GUI operation (computer use)&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model flexibility&lt;/td&gt;
&lt;td&gt;Claude only&lt;/td&gt;
&lt;td&gt;Claude, GPT, Gemini, DeepSeek, GLM, MiniMax, Kimi, Ollama, custom&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost&lt;/td&gt;
&lt;td&gt;Anthropic subscription&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The biggest practical difference is model flexibility. Claude Cowork only works with Claude, which means you pay Anthropic for every call. Open Cowork lets you plug in any provider, including a local model running on Ollama — which is where Lynkr comes in.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configuration model
&lt;/h3&gt;

&lt;p&gt;Open Cowork is configured through three primary mechanisms:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;.env&lt;/code&gt; file in the project root.&lt;/strong&gt; The Anthropic SDK respects standard environment variables: &lt;code&gt;ANTHROPIC_AUTH_TOKEN&lt;/code&gt;, &lt;code&gt;ANTHROPIC_BASE_URL&lt;/code&gt;, &lt;code&gt;CLAUDE_MODEL&lt;/code&gt;. This is the simplest way to wire up a custom backend.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In-app Settings panel.&lt;/strong&gt; The GUI has a full API Configuration page with provider presets (Anthropic, OpenAI, OpenRouter, Gemini, Ollama) and a custom-provider option with editable base URL, selectable protocol (&lt;code&gt;anthropic&lt;/code&gt; / &lt;code&gt;openai&lt;/code&gt; / &lt;code&gt;gemini&lt;/code&gt;), and a free-text API key field.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Workspace folder selection.&lt;/strong&gt; On first launch, you pick a folder on your filesystem that the agent will read from and write to. Everything happens inside that folder. The Lima or WSL2 VM mounts this folder so the agent can operate on it without ever touching the rest of your disk.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part II — What Is Lynkr?
&lt;/h2&gt;

&lt;p&gt;Lynkr is a Node.js proxy that exposes a unified Anthropic-compatible API surface at &lt;code&gt;/v1/messages&lt;/code&gt; while internally routing each request to whichever provider and model is most appropriate.&lt;/p&gt;

&lt;p&gt;The core capabilities relevant to this integration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic and OpenAI API compatibility.&lt;/strong&gt; Lynkr accepts requests in either format on the same port. Tools using the Anthropic SDK and tools using the OpenAI SDK can hit the same instance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complexity-based routing.&lt;/strong&gt; Lynkr analyzes each incoming request, scores it for complexity, and routes simple requests to a local Ollama model and complex requests to Claude Opus or Sonnet.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-request model override.&lt;/strong&gt; Clients can specify an exact model and Lynkr will honor it, otherwise it picks based on the configured tier.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token budget enforcement.&lt;/strong&gt; Lynkr tracks cumulative spend per project and can throttle or downshift when budgets are exceeded.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Telemetry dashboard.&lt;/strong&gt; Every request is logged with provider, latency, tokens, and cost. The dashboard at &lt;code&gt;http://localhost:8081/dashboard&lt;/code&gt; shows live throughput, request volume by provider, and cumulative spend.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For Open Cowork specifically, three of these matter most: &lt;strong&gt;complexity routing&lt;/strong&gt; (most agent calls are short summarization or tool-result handling, which local Ollama handles fine), &lt;strong&gt;telemetry&lt;/strong&gt; (see exactly how much each session costs), and &lt;strong&gt;API compatibility&lt;/strong&gt; (Open Cowork's Anthropic SDK calls flow through Lynkr without any code change).&lt;/p&gt;




&lt;h2&gt;
  
  
  Part III — Why Pair Them?
&lt;/h2&gt;

&lt;p&gt;A typical Open Cowork session involves dozens of model calls per task. The agent plans, then makes a tool call, then summarizes the tool result, then plans the next step, then makes another tool call, and so on. Each round trip is a separate API call. A single "organize my downloads folder" task can fire fifteen to twenty model calls before finishing.&lt;/p&gt;

&lt;p&gt;If every one of those calls hits Claude Opus, you are looking at roughly $0.10-$0.30 per task. A working session of fifty tasks runs $5-$15. Continuous daily use is hundreds of dollars per month. For most people, this defeats the entire point of using a "free" open-source app.&lt;/p&gt;

&lt;p&gt;Most of those calls don't need a frontier model. Summarizing a tool result is a 200-token job. Picking the next file to read is a 100-token job. Deciding whether to continue or stop is a 50-token job. These are exactly the kinds of tasks where a local model running on Ollama produces identical-quality output at zero cost.&lt;/p&gt;

&lt;p&gt;Routing Open Cowork's calls through Lynkr changes the economics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trivial agent calls&lt;/strong&gt; (tool result handling, status checks, simple reasoning) → local Ollama → $0&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document generation and planning&lt;/strong&gt; → cloud Sonnet → ~$0.05 per call&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GUI automation and computer use&lt;/strong&gt; → cloud Opus or Gemini-3-Pro → $0.30 per call&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In practice this comes out to a 10-20x reduction in cost per session with no loss of fidelity on the operations that actually need a capable model.&lt;/p&gt;

&lt;p&gt;The second benefit is privacy. The calls routed to local Ollama never leave your machine. Files you reference, prompts you write, and intermediate reasoning all stay on your laptop. For users working with sensitive data, internal company information, or anything under compliance constraints, this is the difference between Open Cowork being usable and being a non-starter.&lt;/p&gt;

&lt;p&gt;The third benefit is model choice. Lynkr can route different request shapes to different providers — a Gemini call for vision tasks, a Sonnet call for planning, a local Qwen call for short summarization. Open Cowork sees one base URL and one model name. Lynkr does the dispatch.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part IV — Setup Guide
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Node.js 22+&lt;/td&gt;
&lt;td&gt;Run Open Cowork and Lynkr&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lima (macOS) or WSL2 (Windows)&lt;/td&gt;
&lt;td&gt;Sandbox isolation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ollama 0.4+&lt;/td&gt;
&lt;td&gt;Local model inference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lynkr&lt;/td&gt;
&lt;td&gt;The proxy&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Pull a coding-aware model into Ollama for the simple-tier routing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull qwen2.5-coder:7b
&lt;span class="c"&gt;# optional: a larger model for moderate-complexity calls&lt;/span&gt;
ollama pull minimax-m2.5:cloud
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 1 — Start Lynkr
&lt;/h3&gt;

&lt;p&gt;From your Lynkr installation directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;node bin/cli.js start &lt;span class="nt"&gt;--port&lt;/span&gt; 8081
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lynkr starts, auto-discovers your local Ollama instance, and is ready to accept requests at &lt;code&gt;http://localhost:8081&lt;/code&gt;. Verify it's running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:8081/health
&lt;span class="c"&gt;# {"status":"ok"}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open the dashboard at &lt;code&gt;http://localhost:8081/dashboard&lt;/code&gt; in your browser and leave it visible — you'll use it later to confirm routing is working.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2 — Clone and install Open Cowork
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/OpenCoworkAI/open-cowork.git
&lt;span class="nb"&gt;cd &lt;/span&gt;open-cowork
npm &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The install takes about five minutes. It downloads the bundled Node binaries, builds the WSL agent (Windows) or Lima agent (macOS), bundles the MCP servers, and rebuilds &lt;code&gt;better-sqlite3&lt;/code&gt; against Electron's ABI. Don't be alarmed by the number of packages — Electron apps are heavy.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3 — Configure the Lynkr backend
&lt;/h3&gt;

&lt;p&gt;Create &lt;code&gt;.env&lt;/code&gt; in the Open Cowork root:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Any non-empty string. Lynkr doesn't validate the key for local providers.&lt;/span&gt;
&lt;span class="nv"&gt;ANTHROPIC_AUTH_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;local&lt;/span&gt;

&lt;span class="c"&gt;# Point the Anthropic SDK at Lynkr instead of api.anthropic.com.&lt;/span&gt;
&lt;span class="nv"&gt;ANTHROPIC_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:8081

&lt;span class="c"&gt;# The model name Open Cowork sends. Lynkr's router may override this&lt;/span&gt;
&lt;span class="c"&gt;# based on request complexity — leaving it as a cloud model name is fine.&lt;/span&gt;
&lt;span class="nv"&gt;CLAUDE_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;claude-sonnet-4-6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the entire integration. Open Cowork uses the Anthropic SDK under the hood, the SDK respects &lt;code&gt;ANTHROPIC_BASE_URL&lt;/code&gt;, and every call now flows through Lynkr before reaching any actual provider.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4 — Launch Open Cowork
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first launch takes 60-90 seconds because vite is starting, the sandbox agents are being built, and Electron is opening for the first time. When the desktop window appears, you'll be prompted to pick a workspace folder. Create one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; ~/cowork-workspace
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Point Open Cowork at &lt;code&gt;~/cowork-workspace&lt;/code&gt;. This is now the only directory the agent can read from and write to.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5 — Verify the integration
&lt;/h3&gt;

&lt;p&gt;Before sending your first message, make sure the Lynkr dashboard is open in another window. Then in Open Cowork, send a simple test prompt:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Read any text file in my workspace and tell me what's inside.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Watch the Lynkr dashboard. You should see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Request count incrementing in real time&lt;/li&gt;
&lt;li&gt;A mix of providers in the breakdown (likely &lt;code&gt;ollama&lt;/code&gt; for the simple turns, &lt;code&gt;anthropic&lt;/code&gt; for the planning turns)&lt;/li&gt;
&lt;li&gt;Latency distribution showing local Ollama calls completing in 1-3s and cloud calls in 5-15s&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If nothing shows up on the dashboard, Open Cowork is hitting Anthropic directly — the SDK in their agent runtime bypassed our env var. This is uncommon but possible. The fix is a one-line change to where the SDK client is constructed in their codebase to honor &lt;code&gt;ANTHROPIC_BASE_URL&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6 — Try a real task
&lt;/h3&gt;

&lt;p&gt;Drop a file into your workspace folder and send a task that exercises the full agent loop:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Organize the files in this folder by type. Move images into an "images" subfolder, documents into "docs", and code into "code". Tell me what you moved.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You'll see the trace panel on the right populate with each tool call: &lt;code&gt;list_files&lt;/code&gt;, &lt;code&gt;read_file&lt;/code&gt;, &lt;code&gt;move_file&lt;/code&gt;, and so on. Every one of those tool result summaries fires a model call. Most of them should route to local Ollama via Lynkr. The high-level plan and final summary may route to Sonnet.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part V — Advanced Configuration
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Routing tiers tuned for an agent loop
&lt;/h3&gt;

&lt;p&gt;The default Lynkr complexity scoring works well for chat workloads, but agent loops have a different shape. The bulk of calls are tool-result handling and planning between tool calls, both of which are short. You probably want to lower the threshold that pushes calls to Sonnet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"routing"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tiers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"SIMPLE"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"qwen2.5-coder:7b"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"MODERATE"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"anthropic"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"claude-sonnet-4-6"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"COMPLEX"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"anthropic"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"claude-opus-4-7"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"complexity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"thresholds"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"simple"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"complex"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lower thresholds mean more calls go to Ollama. Tune until you see a routing mix of roughly 70% local, 25% Sonnet, 5% Opus on the dashboard.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vision-capable routing for GUI tasks
&lt;/h3&gt;

&lt;p&gt;Open Cowork's computer-use feature needs a model that can read screenshots. Local Ollama coding models don't support vision. Configure Lynkr to detect image content in requests and route them to Gemini-3-Pro or Claude Sonnet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"routing"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"vision"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"google"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gemini-3.1-pro-preview"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This keeps cost low on text-only calls while routing the heavy GUI-interpretation calls to a model that can actually see the screen.&lt;/p&gt;

&lt;h3&gt;
  
  
  Per-session budget caps
&lt;/h3&gt;

&lt;p&gt;If you want to prevent any single Open Cowork session from going over a spend ceiling, set a budget env var:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;LYNKR_BUDGET_USD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;2.00
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lynkr will refuse new requests once the limit is hit, preventing a runaway loop on a malformed prompt from burning through a hundred dollars overnight.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multiple workspaces, same backend
&lt;/h3&gt;

&lt;p&gt;If you want different Open Cowork workspaces (work, personal, side project) to use different routing policies, run multiple Lynkr instances on different ports and point each Open Cowork install at the right one. The Lynkr CLI supports &lt;code&gt;--config &amp;lt;path&amp;gt;&lt;/code&gt; to load a workspace-specific routing config.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part VI — Troubleshooting
&lt;/h2&gt;

&lt;h3&gt;
  
  
  "I sent a message but nothing showed up on the Lynkr dashboard"
&lt;/h3&gt;

&lt;p&gt;The Open Cowork agent runtime (the &lt;code&gt;pi-coding-agent&lt;/code&gt; library it depends on) is not your standard Anthropic SDK usage. It may construct its HTTP client in a way that ignores &lt;code&gt;ANTHROPIC_BASE_URL&lt;/code&gt;. To check, look at network traffic during a session — if requests are going to &lt;code&gt;api.anthropic.com&lt;/code&gt; instead of &lt;code&gt;localhost:8081&lt;/code&gt;, the env var is being bypassed.&lt;/p&gt;

&lt;p&gt;The fix is small but requires a code change. Find where the Anthropic client is instantiated in Open Cowork's source (likely under &lt;code&gt;src/main/claude/&lt;/code&gt; based on the file structure) and confirm it reads &lt;code&gt;process.env.ANTHROPIC_BASE_URL&lt;/code&gt; when constructing the client. If it doesn't, add it.&lt;/p&gt;

&lt;h3&gt;
  
  
  "The agent says it can't read my PDF"
&lt;/h3&gt;

&lt;p&gt;Three likely causes, in order of probability:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Lynkr routed the request to a local Ollama model that doesn't support PDF content blocks.&lt;/strong&gt; PDF support is a Claude-specific feature. If your routing tier sent the request to Qwen or MiniMax, the PDF was silently dropped. Force a Claude route by setting &lt;code&gt;CLAUDE_MODEL=claude-sonnet-4-6&lt;/code&gt; more aggressively or saying "use Sonnet" in your prompt.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Lima sandbox doesn't have a PDF parser installed.&lt;/strong&gt; If you put the PDF in your workspace and asked the agent to read it via filesystem tools, it tried &lt;code&gt;cat report.pdf&lt;/code&gt;, got binary garbage, then tried &lt;code&gt;pdftotext&lt;/code&gt; and got "command not found." Install poppler-utils inside the Lima VM:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   limactl shell default
   &lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; poppler-utils
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The PDF was dragged into chat but exceeded the multimodal API size limits.&lt;/strong&gt; Anthropic caps PDFs at 32MB and about 100 pages. Move the file into your workspace folder instead and reference it by name.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  "Streaming cuts off mid-response"
&lt;/h3&gt;

&lt;p&gt;Usually means Lynkr routed to a model with a smaller context window than the conversation needed. The fix is either to use a model with more context (Opus or Sonnet) for that session, or to start a new conversation so the history compresses.&lt;/p&gt;

&lt;h3&gt;
  
  
  "Lima won't start"
&lt;/h3&gt;

&lt;p&gt;On macOS, Lima needs to bootstrap once before Open Cowork can use it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;limactl start default
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that initial start, Open Cowork manages it automatically.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part VII — When This Stack Pays Off
&lt;/h2&gt;

&lt;p&gt;Open Cowork plus Lynkr makes sense when you fit roughly this profile:&lt;/p&gt;

&lt;p&gt;You do enough AI-driven work that subscription costs feel like a real expense. You want files generated, not files described. You work with information that should stay on your machine. You don't want to bet your workflow on one vendor's pricing or feature roadmap. And you have a laptop capable enough to run a local model — anything M-series or a recent Intel chip with 16GB+ of RAM will do fine.&lt;/p&gt;

&lt;p&gt;If you fit two or more of those, the math works. The setup is a one-time investment of maybe two hours. After that, the agent runs itself, the proxy handles routing transparently, and you stop seeing line items for AI tools on your monthly statement.&lt;/p&gt;

&lt;p&gt;If you don't fit that profile — if you only use AI occasionally, if cost isn't a concern, if you don't care about privacy — then a managed product like Claude Cowork is genuinely simpler. There's no shame in paying for convenience. But for the people who do fit, this stack is one of the few open-source combinations where the open-source version is &lt;em&gt;better&lt;/em&gt; than the commercial alternative, not just cheaper.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Open Cowork closes the gap between AI tools that talk and AI tools that work. It generates real files, drives your browser, connects to other apps, and runs everything inside a sandbox so it can't break your machine.&lt;/p&gt;

&lt;p&gt;Lynkr makes Open Cowork affordable. By routing the boring 70% of calls to a local model and reserving the cloud models for the calls that actually need them, the per-session cost drops from dollars to cents.&lt;/p&gt;

&lt;p&gt;The integration is two lines of &lt;code&gt;.env&lt;/code&gt; configuration. The whole stack runs on your own machine. There's no subscription, no telemetry going to a vendor, no monthly bill.&lt;/p&gt;

&lt;p&gt;If you've been waiting for the open-source AI agent ecosystem to grow up, this is it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Open Cowork: &lt;code&gt;github.com/OpenCoworkAI/open-cowork&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Lynkr: &lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;https://github.com/Fast-Editor/Lynkr&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Ollama: &lt;code&gt;ollama.com&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you set this up and run into something I didn't cover here, drop a note in the comments. I'm collecting failure modes for a follow-up post.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>opensource</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to use Vercel's Deepsec with ollama</title>
      <dc:creator>Vishal VeeraReddy</dc:creator>
      <pubDate>Thu, 14 May 2026 01:38:46 +0000</pubDate>
      <link>https://dev.to/lynkr/how-to-use-vercels-deepsec-with-ollama-20pa</link>
      <guid>https://dev.to/lynkr/how-to-use-vercels-deepsec-with-ollama-20pa</guid>
      <description>&lt;h5&gt;
  
  
  How to run continuous, AI-powered security audits on your codebase — routed through a local proxy that picks the cheapest viable model for each file.
&lt;/h5&gt;




&lt;p&gt;Most security scanners feel like spam filters from 2005. They flag every &lt;code&gt;eval()&lt;/code&gt;, every string concatenation that looks vaguely SQL-ish, and every hard-coded constant longer than ten characters. The signal-to-noise ratio is so low that teams either stop running them or learn to ignore the output entirely.&lt;/p&gt;

&lt;p&gt;The newer generation of scanners uses LLMs to read the code the way a human reviewer would — understanding &lt;em&gt;intent&lt;/em&gt; before flagging &lt;em&gt;pattern&lt;/em&gt;. Deepsec, from Vercel, is one of the most usable tools in this category. The problem is that running it across a real codebase, file by file, with a frontier model, gets expensive quickly.&lt;/p&gt;

&lt;p&gt;This article explains what deepsec is, what it actually catches, and how to point its AI calls at Lynkr — a local Anthropic-compatible proxy — to route simple files to a local Ollama model and reserve the expensive cloud model for the cases that genuinely need it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part I — What Is Deepsec?
&lt;/h2&gt;

&lt;p&gt;Deepsec is an AI security scanner published by Vercel as an npm package. The shape of it is unusual compared to other security tools: it is not a service, not a CI bot, not a SaaS dashboard. It is a workspace-style CLI you install into a &lt;code&gt;.deepsec/&lt;/code&gt; directory inside your repo, configure with project context, and run on demand.&lt;/p&gt;

&lt;p&gt;The output is a folder of findings, organized by severity, that you read like a code review.&lt;/p&gt;

&lt;h3&gt;
  
  
  The pipeline
&lt;/h3&gt;

&lt;p&gt;Deepsec runs in stages, and each stage has a different cost profile:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Scan (free).&lt;/strong&gt; A pure regex pass over your source tree. It identifies files that contain patterns potentially worth deeper analysis — auth flows, database queries, request handlers, crypto operations, environment variable use, file system access. This stage is cheap, fast, and runs on every file. It is the funnel.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Process (paid, AI).&lt;/strong&gt; The files surfaced by scan are read by an LLM (Claude Opus by default, ~$0.30 per file). The model receives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The full file contents&lt;/li&gt;
&lt;li&gt;The project context you supplied in &lt;code&gt;INFO.md&lt;/code&gt; (auth primitives, threat model, false-positive paths)&lt;/li&gt;
&lt;li&gt;Custom matchers you have written for this codebase&lt;/li&gt;
&lt;li&gt;The deepsec built-in matcher prompts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model produces findings: severity, location, explanation, recommended fix.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Revalidate (paid, AI).&lt;/strong&gt; A second AI pass over each finding to confirm it is real. This step alone dramatically reduces false-positive rates because it forces the model to defend each conclusion against the surrounding context. Without revalidation, AI scanners hallucinate findings about as often as they find real bugs. With it, the signal becomes usable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Triage (paid, AI).&lt;/strong&gt; Findings are clustered, deduplicated, and ranked. The output is a folder of markdown files under &lt;code&gt;findings/&lt;/code&gt; organized by severity: &lt;code&gt;CRITICAL&lt;/code&gt;, &lt;code&gt;HIGH&lt;/code&gt;, &lt;code&gt;HIGH_BUG&lt;/code&gt;, &lt;code&gt;MEDIUM&lt;/code&gt;, &lt;code&gt;LOW&lt;/code&gt;, &lt;code&gt;BUG&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Export.&lt;/strong&gt; Findings get written out in the format you want — markdown directory, JSON for CI, or a single report file.&lt;/p&gt;

&lt;h3&gt;
  
  
  What it actually catches
&lt;/h3&gt;

&lt;p&gt;Deepsec is built around the kinds of issues that traditional SAST tools miss because they require context, not just pattern matching:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Auth bypass via missing guard.&lt;/strong&gt; A new endpoint that forgot to call the auth middleware, in a codebase where 99% of endpoints do.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IDOR.&lt;/strong&gt; An endpoint that accepts a resource ID and queries the database by ID but never checks ownership.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Path traversal in user-provided filenames.&lt;/strong&gt; Not "anything that calls &lt;code&gt;fs.readFile&lt;/code&gt;" — only the cases where the filename came from a request and was not validated.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token leakage in logs.&lt;/strong&gt; A &lt;code&gt;logger.info&lt;/code&gt; call that includes the entire request body, where the request body sometimes contains a session token.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unsafe deserialization.&lt;/strong&gt; A &lt;code&gt;JSON.parse&lt;/code&gt; of untrusted input that gets passed to a function expecting a specific shape, without validation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Race conditions in auth flows.&lt;/strong&gt; Double-spend patterns, TOCTOU bugs in permission checks, session fixation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SSRF in URL fetchers.&lt;/strong&gt; A &lt;code&gt;fetch(url)&lt;/code&gt; where &lt;code&gt;url&lt;/code&gt; came from user input and the code does not block private IP ranges or localhost.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What makes it work is the project context. When you tell deepsec "auth in this codebase is enforced by &lt;code&gt;requireUser()&lt;/code&gt; middleware, exported from &lt;code&gt;src/auth/middleware.ts&lt;/code&gt;" — it will flag every request handler that doesn't call it. A regex scanner cannot do this. A generic LLM scanner without project context flags too many false positives to be useful.&lt;/p&gt;

&lt;h3&gt;
  
  
  The configuration model
&lt;/h3&gt;

&lt;p&gt;Deepsec is configured through three files inside &lt;code&gt;.deepsec/&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;deepsec.config.ts&lt;/code&gt;&lt;/strong&gt; — the project list. Each entry points at a codebase to scan (path relative to &lt;code&gt;.deepsec/&lt;/code&gt;). One workspace can scan multiple repos.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;defineConfig&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;deepsec/config&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nf"&gt;defineConfig&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;projects&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-code&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;root&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;..&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;api-gateway&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;root&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;../../api-gateway&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;data/&amp;lt;project-id&amp;gt;/INFO.md&lt;/code&gt;&lt;/strong&gt; — the hand-curated project context. This is the most important file in the entire setup. The scanner injects it into every AI batch, so verbose context dilutes signal. Target 50-100 lines covering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What the codebase does (one paragraph)&lt;/li&gt;
&lt;li&gt;The auth shape (3-5 most important primitives by name)&lt;/li&gt;
&lt;li&gt;The threat model (2-4 sentences, ranked by impact)&lt;/li&gt;
&lt;li&gt;Project-specific patterns to flag (3-5 unique patterns)&lt;/li&gt;
&lt;li&gt;Known false-positives (paths or patterns that look risky but are intentional)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;.env.local&lt;/code&gt;&lt;/strong&gt; — the API token. Anthropic key, OpenAI key, or Vercel AI Gateway token. If you have Claude Code or Codex logged in locally, deepsec can reuse that subscription and skip the token entirely for non-sandbox stages.&lt;/p&gt;

&lt;h3&gt;
  
  
  What deepsec is not
&lt;/h3&gt;

&lt;p&gt;It is worth being explicit about the scope.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It is &lt;strong&gt;not&lt;/strong&gt; a SAST scanner replacement. It will miss issues that require deep dataflow analysis or whole-program reasoning. Use it alongside Semgrep / CodeQL, not instead of them.&lt;/li&gt;
&lt;li&gt;It is &lt;strong&gt;not&lt;/strong&gt; a runtime tool. It does not watch traffic, replay requests, or instrument your application. It reads source code.&lt;/li&gt;
&lt;li&gt;It is &lt;strong&gt;not&lt;/strong&gt; continuous by default. You run it manually or on schedule via CI. The findings are a snapshot, not a stream.&lt;/li&gt;
&lt;li&gt;It is &lt;strong&gt;not&lt;/strong&gt; free at scale. A repository of 1,000 source files at $0.30 each is $300 per full run. You will want incremental modes (only scan changed files) and routing tiers in production.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last point is exactly where Lynkr comes in.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part II — A Quick Recap on Lynkr
&lt;/h2&gt;

&lt;p&gt;Lynkr is a Node.js proxy that exposes a unified Anthropic-compatible API surface (&lt;code&gt;/v1/messages&lt;/code&gt;) while internally routing each request to whichever provider and model is most appropriate.&lt;/p&gt;

&lt;p&gt;The core capabilities that matter for this article:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI-API-compatible alias.&lt;/strong&gt; Lynkr also speaks OpenAI's API format on the same port, so tools using either SDK can hit the same instance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complexity-based routing.&lt;/strong&gt; Lynkr scores each incoming request for complexity and routes simple requests to a local Ollama model, complex requests to Claude Opus or Sonnet.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-request model override.&lt;/strong&gt; Clients can specify a model explicitly (e.g. &lt;code&gt;claude-opus-4-7&lt;/code&gt;) and Lynkr honors it, otherwise picking from the configured tier.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token budget enforcement.&lt;/strong&gt; Lynkr tracks cumulative spend per project and can throttle or downshift when budgets are exceeded.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Telemetry and tracing.&lt;/strong&gt; Every request is logged with provider, latency, tokens, and cost. The dashboard at &lt;code&gt;localhost:8081/dashboard&lt;/code&gt; shows live throughput.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For deepsec specifically, two of these matter most: &lt;strong&gt;complexity routing&lt;/strong&gt; (route boring files to a local model, expensive files to a cloud model) and &lt;strong&gt;telemetry&lt;/strong&gt; (see exactly how much each scan costs and where the time goes).&lt;/p&gt;




&lt;h2&gt;
  
  
  Part III — Why Pair Them?
&lt;/h2&gt;

&lt;p&gt;Deepsec's economics are simple but unforgiving: cost scales linearly with file count multiplied by model price. A repository of 1,000 source files, scanned with Opus at default settings, is roughly $300 per full pass. Running it nightly is $9,000/month before revalidation. Revalidation roughly doubles that.&lt;/p&gt;

&lt;p&gt;Most of those files don't need a frontier model. A 30-line CSS file. A static config object. A pure-utility module that does string formatting. A test file. These get the same expensive Opus treatment as the auth middleware, even though a smaller model would reach the same "nothing here" verdict in a tenth the time and a hundredth the cost.&lt;/p&gt;

&lt;p&gt;Routing deepsec's AI calls through Lynkr changes the math. Lynkr analyzes each request, looks at the file content and project context being passed to the model, and decides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trivially safe file&lt;/strong&gt; (CSS, static config, test fixture, generated code) → local Ollama (&lt;code&gt;qwen2.5-coder:7b&lt;/code&gt;) → $0&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Application code that touches request handling or storage&lt;/strong&gt; → cloud Sonnet → ~$0.05/file&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auth middleware, payment processing, crypto, or files with high risk signals&lt;/strong&gt; → cloud Opus → $0.30/file&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A typical codebase has the trivial-to-critical ratio at roughly 70/25/5. The blended cost drops from $300 to about $20 for a full scan — a 15× reduction with no loss of fidelity on the files that actually matter.&lt;/p&gt;

&lt;p&gt;There is a secondary benefit: privacy. The files going to local Ollama never leave your machine. For codebases under compliance constraints (HIPAA, SOC2 with data residency requirements, government work), this is the difference between deepsec being usable and being a non-starter.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part IV — Setup Guide
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Node.js 20+&lt;/td&gt;
&lt;td&gt;Run deepsec and Lynkr&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pnpm&lt;/td&gt;
&lt;td&gt;Install deepsec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ollama 0.4+&lt;/td&gt;
&lt;td&gt;Local model inference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lynkr&lt;/td&gt;
&lt;td&gt;The proxy (we built this in the previous article)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Pull at least one coding-aware Ollama model. For security scanning, models trained on code work best:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull qwen2.5-coder:7b
&lt;span class="c"&gt;# optional: a bigger fallback for complex files&lt;/span&gt;
ollama pull qwen2.5-coder:32b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 1 — Install deepsec into your repo
&lt;/h3&gt;

&lt;p&gt;From the root of the repository you want to scan:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; .deepsec &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd&lt;/span&gt; .deepsec
pnpm init
pnpm add deepsec
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create &lt;code&gt;deepsec.config.ts&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;defineConfig&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;deepsec/config&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nf"&gt;defineConfig&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;projects&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;myapp&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;root&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;..&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then scaffold the project data folder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pnpm deepsec init-project ..
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates &lt;code&gt;data/myapp/INFO.md&lt;/code&gt; and &lt;code&gt;data/myapp/SETUP.md&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2 — Fill in INFO.md
&lt;/h3&gt;

&lt;p&gt;Open &lt;code&gt;data/myapp/INFO.md&lt;/code&gt; in your coding agent (Claude Code, Cursor, etc.) and have it follow &lt;code&gt;data/myapp/SETUP.md&lt;/code&gt; to populate the context. The setup prompt is good — it tells the agent exactly what to read and what to write.&lt;/p&gt;

&lt;p&gt;If you are doing this by hand, the rubric is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# myapp&lt;/span&gt;

&lt;span class="gu"&gt;## What this codebase does&lt;/span&gt;
Next.js SaaS application. Multi-tenant. Customers sign up, integrate
their Stripe account, and use the dashboard to monitor subscription
churn. Routes under &lt;span class="sb"&gt;`/app/*`&lt;/span&gt; are authenticated; &lt;span class="sb"&gt;`/marketing/*`&lt;/span&gt; is
public.

&lt;span class="gu"&gt;## Auth shape&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`requireUser(req)`&lt;/span&gt; — server-side auth check, throws &lt;span class="sb"&gt;`UnauthorizedError`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`getCurrentUser()`&lt;/span&gt; — client-side hook
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`withTenantScope(query, tenantId)`&lt;/span&gt; — every DB query MUST be wrapped
&lt;span class="p"&gt;-&lt;/span&gt; Sessions stored in HTTP-only &lt;span class="sb"&gt;`__session`&lt;/span&gt; cookie, signed with HMAC

&lt;span class="gu"&gt;## Threat model&lt;/span&gt;
Highest impact: cross-tenant data leak via missing &lt;span class="sb"&gt;`withTenantScope`&lt;/span&gt;
or IDOR in resource endpoints. Second: Stripe webhook spoofing.
Third: token leakage via logs or error pages.

&lt;span class="gu"&gt;## Project-specific patterns to flag&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Any DB query under &lt;span class="sb"&gt;`src/db/queries/`&lt;/span&gt; that doesn't call &lt;span class="sb"&gt;`withTenantScope`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Any handler in &lt;span class="sb"&gt;`src/app/api/`&lt;/span&gt; that doesn't call &lt;span class="sb"&gt;`requireUser`&lt;/span&gt; first
&lt;span class="p"&gt;-&lt;/span&gt; Stripe webhook handlers that don't verify &lt;span class="sb"&gt;`stripe-signature`&lt;/span&gt; header
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`console.log`&lt;/span&gt; or &lt;span class="sb"&gt;`logger.info`&lt;/span&gt; calls passing entire &lt;span class="sb"&gt;`req`&lt;/span&gt; or &lt;span class="sb"&gt;`res`&lt;/span&gt;

&lt;span class="gu"&gt;## Known false-positives&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`src/app/api/public/*`&lt;/span&gt; — intentionally unauthenticated
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`src/db/queries/migrations/*`&lt;/span&gt; — runs without tenant scope by design
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`__tests__/fixtures/*`&lt;/span&gt; — fake tokens, not real secrets
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The quality of your &lt;code&gt;INFO.md&lt;/code&gt; determines the quality of your scan. Spend an hour on this. It will pay back the time within the first run.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3 — Add the Lynkr base URL
&lt;/h3&gt;

&lt;p&gt;This is the integration point. Deepsec uses the Anthropic SDK under the hood. The Anthropic SDK respects the &lt;code&gt;ANTHROPIC_BASE_URL&lt;/code&gt; environment variable. Point it at Lynkr.&lt;/p&gt;

&lt;p&gt;In &lt;code&gt;.deepsec/.env.local&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Any non-empty string — Lynkr doesn't validate for local providers&lt;/span&gt;
&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;local&lt;/span&gt;

&lt;span class="c"&gt;# Route all Anthropic calls through Lynkr instead of api.anthropic.com&lt;/span&gt;
&lt;span class="nv"&gt;ANTHROPIC_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:8081
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you are running Lynkr on a different host (a shared dev box, a homelab server), substitute the appropriate address.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4 — Configure Lynkr's routing tiers
&lt;/h3&gt;

&lt;p&gt;In your Lynkr config (typically &lt;code&gt;~/.lynkr/config.json&lt;/code&gt; or wherever you store it), define how to route deepsec's traffic. The key configuration is the complexity threshold and the tier mapping:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"routing"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tiers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"SIMPLE"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"qwen2.5-coder:7b"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"MODERATE"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"anthropic"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"claude-sonnet-4-6"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"COMPLEX"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"anthropic"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"claude-opus-4-7"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"complexity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"thresholds"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"simple"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"complex"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;70&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The defaults work fine for most cases, but for security scanning you may want to lower the complex threshold — better to over-route to Opus on files that touch auth than under-route and miss something.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5 — Run a scan
&lt;/h3&gt;

&lt;p&gt;From &lt;code&gt;.deepsec/&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Free regex pass — surfaces files worth deeper analysis&lt;/span&gt;
pnpm deepsec scan

&lt;span class="c"&gt;# AI processing — routed through Lynkr&lt;/span&gt;
pnpm deepsec process &lt;span class="nt"&gt;--concurrency&lt;/span&gt; 5

&lt;span class="c"&gt;# AI revalidation — cuts false positives&lt;/span&gt;
pnpm deepsec revalidate &lt;span class="nt"&gt;--concurrency&lt;/span&gt; 5

&lt;span class="c"&gt;# Export findings as a folder of markdown files&lt;/span&gt;
pnpm deepsec &lt;span class="nb"&gt;export&lt;/span&gt; &lt;span class="nt"&gt;--format&lt;/span&gt; md-dir &lt;span class="nt"&gt;--out&lt;/span&gt; ./findings
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first run will take longer because nothing is cached. Subsequent runs are incremental — only changed files are reprocessed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6 — Inspect findings
&lt;/h3&gt;

&lt;p&gt;Open &lt;code&gt;.deepsec/findings/&lt;/code&gt; and you will see a directory per severity:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;findings/
├── CRITICAL/
│   └── 001-cross-tenant-leak-in-billing-endpoint.md
├── HIGH/
│   ├── 002-missing-stripe-signature-verification.md
│   └── 003-idor-in-user-settings-handler.md
├── MEDIUM/
│   └── 004-error-message-leaks-internal-path.md
└── LOW/
    └── 005-noisy-logging-includes-user-agent.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each finding is a markdown file with the location, severity, explanation, code excerpt, and recommended fix. They are written to be readable as a code review — drop them into a PR description, paste them into Linear, hand them to the engineer responsible.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part V — Verifying the Routing
&lt;/h2&gt;

&lt;p&gt;After running a scan, check the Lynkr dashboard at &lt;code&gt;http://localhost:8081/dashboard&lt;/code&gt;. You should see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A spike in request volume during the scan&lt;/li&gt;
&lt;li&gt;A mix of providers (Ollama vs. Anthropic) in the request breakdown&lt;/li&gt;
&lt;li&gt;Latency distribution showing local Ollama calls completing in 1-3s and cloud calls in 5-15s&lt;/li&gt;
&lt;li&gt;Cumulative cost tracked per request&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If everything is routing to Anthropic, your complexity scoring is too aggressive. If everything is routing to Ollama, you have a config error and the scan results will be useless — security analysis genuinely needs a capable model on the high-risk files.&lt;/p&gt;

&lt;p&gt;A healthy distribution for a real codebase looks like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;60-75% local (qwen2.5-coder:7b)&lt;/li&gt;
&lt;li&gt;20-30% cloud Sonnet&lt;/li&gt;
&lt;li&gt;5-10% cloud Opus&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Adjust thresholds until you see numbers in that range.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part VI — Advanced Patterns
&lt;/h2&gt;

&lt;h3&gt;
  
  
  CI integration with budget caps
&lt;/h3&gt;

&lt;p&gt;Run deepsec in your CI pipeline only on files changed in the PR, with Lynkr enforcing a hard budget ceiling per run. Add to your GitHub Actions workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Security scan changed files&lt;/span&gt;
  &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.LYNKR_KEY }}&lt;/span&gt;
    &lt;span class="na"&gt;ANTHROPIC_BASE_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ vars.LYNKR_URL }}&lt;/span&gt;
    &lt;span class="na"&gt;DEEPSEC_BUDGET_USD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2.00"&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;cd .deepsec&lt;/span&gt;
    &lt;span class="s"&gt;git diff --name-only origin/main...HEAD &amp;gt; /tmp/changed&lt;/span&gt;
    &lt;span class="s"&gt;pnpm deepsec scan --files-from /tmp/changed&lt;/span&gt;
    &lt;span class="s"&gt;pnpm deepsec process --concurrency 3&lt;/span&gt;
    &lt;span class="s"&gt;pnpm deepsec export --format json --out ./findings.json&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lynkr will refuse new requests once &lt;code&gt;$2.00&lt;/code&gt; is spent on that scan, preventing a runaway CI cost on a 200-file PR.&lt;/p&gt;

&lt;h3&gt;
  
  
  Custom matchers in INFO.md
&lt;/h3&gt;

&lt;p&gt;The most valuable additions to deepsec are project-specific matchers — things only your codebase knows. A few real examples I have seen pay off immediately:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;"Any function that accepts a &lt;code&gt;customerId&lt;/code&gt; parameter from a request must also accept and validate &lt;code&gt;tenantId&lt;/code&gt;. If only &lt;code&gt;customerId&lt;/code&gt; is checked, that is a cross-tenant access bug."&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"&lt;code&gt;process.env.DEBUG_AUTH_BYPASS&lt;/code&gt; is a dev-only flag. Any code path gated only by this flag, with no other auth check, is critical regardless of where it appears."&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"All Redis keys must include the tenant prefix. A &lt;code&gt;redis.get(\&lt;/code&gt;user:${id}&lt;code&gt;)&lt;/code&gt; without tenant prefix is a tenant-scoping bug."&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These patterns live in the &lt;code&gt;## Project-specific patterns to flag&lt;/code&gt; section of &lt;code&gt;INFO.md&lt;/code&gt;. Add new ones as you discover them — each one prevents an entire class of bug from ever shipping.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scheduled deep scans + on-demand fast scans
&lt;/h3&gt;

&lt;p&gt;Run a different cadence for different scan types:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Trigger&lt;/th&gt;
&lt;th&gt;Scope&lt;/th&gt;
&lt;th&gt;Routing&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Per PR&lt;/td&gt;
&lt;td&gt;Changed files only&lt;/td&gt;
&lt;td&gt;Ollama-heavy via Lynkr&lt;/td&gt;
&lt;td&gt;~$0.10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Nightly&lt;/td&gt;
&lt;td&gt;Full repo, scan + process&lt;/td&gt;
&lt;td&gt;Mixed&lt;/td&gt;
&lt;td&gt;~$10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Weekly&lt;/td&gt;
&lt;td&gt;Full repo, scan + process + revalidate&lt;/td&gt;
&lt;td&gt;Mixed&lt;/td&gt;
&lt;td&gt;~$20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pre-release&lt;/td&gt;
&lt;td&gt;Full repo with strict revalidation&lt;/td&gt;
&lt;td&gt;Sonnet-heavy via Lynkr&lt;/td&gt;
&lt;td&gt;~$50&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The Lynkr dashboard's cost tracking makes it easy to verify these envelopes are being hit.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part VII — When Deepsec Catches Something Real
&lt;/h2&gt;

&lt;p&gt;A worked example from a real codebase, anonymized.&lt;/p&gt;

&lt;p&gt;The codebase had a billing portal at &lt;code&gt;/app/billing&lt;/code&gt; that let customers download invoices as PDFs. The handler took an &lt;code&gt;invoiceId&lt;/code&gt; query parameter, looked up the invoice in the database, and streamed the PDF.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;GET&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;NextRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;requireUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;invoiceId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;nextUrl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;searchParams&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;invoice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;invoices&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findUnique&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;where&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;invoiceId&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;invoice&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pdfBytes&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A regex scanner would not flag this. The auth check is there. The query is parameterized. The output is binary data. By every shallow signal, this is fine code.&lt;/p&gt;

&lt;p&gt;Deepsec, with the &lt;code&gt;INFO.md&lt;/code&gt; context that every DB query must be wrapped in &lt;code&gt;withTenantScope&lt;/code&gt;, flagged it as CRITICAL:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Cross-tenant IDOR in billing PDF download&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;findUnique({ where: { id: invoiceId } })&lt;/code&gt; looks up an invoice by ID&lt;br&gt;
with no tenant scoping. Any authenticated user can request any invoice&lt;br&gt;
ID and receive the PDF, including invoices belonging to other tenants.&lt;/p&gt;

&lt;p&gt;Fix:&lt;/p&gt;


&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;invoice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;invoices&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findFirst&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;where&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;withTenantScope&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;invoiceId&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;invoice&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Not found&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;404&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/blockquote&gt;

&lt;p&gt;That single finding paid for the entire scan budget for the year. This is the kind of bug that pattern-matching tools cannot find because it requires understanding &lt;em&gt;what auth means in this codebase&lt;/em&gt; — and that understanding lives in &lt;code&gt;INFO.md&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part VIII — Limitations to Know Going In
&lt;/h2&gt;

&lt;p&gt;A few things will trip you up if you treat deepsec as a complete solution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It does not understand framework conventions you have not documented.&lt;/strong&gt; If your Next.js app has unusual middleware composition that grants auth on a route prefix, deepsec doesn't know that. You will get false positives until you write it into &lt;code&gt;INFO.md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Generated code wastes tokens.&lt;/strong&gt; Anything under &lt;code&gt;dist/&lt;/code&gt;, &lt;code&gt;.next/&lt;/code&gt;, &lt;code&gt;build/&lt;/code&gt;, or autogenerated GraphQL types should be in your &lt;code&gt;.deepsecignore&lt;/code&gt; or excluded via the scan config. Lynkr's complexity routing will route this to local Ollama, but you are still paying inference latency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Revalidation is not a magic bullet.&lt;/strong&gt; It catches the obvious false positives, not the subtle ones. You will still need to triage by hand. Budget 1-2 hours of human review per 100 findings.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Local Ollama models miss things.&lt;/strong&gt; This is the trade-off you accept when routing aggressively to local inference. For most files this is fine. For files in the &lt;code&gt;## Project-specific patterns to flag&lt;/code&gt; paths, force them to cloud routing explicitly via Lynkr's per-path overrides.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Deepsec is the first AI security scanner I have run on a real codebase where the output was actually worth reading. The findings are specific, the language is direct, and the false-positive rate after revalidation is low enough that engineers stop ignoring them.&lt;/p&gt;

&lt;p&gt;Pairing it with Lynkr solves the only real objection — cost. With local-first routing for the 70% of files that don't need a frontier model, scanning becomes cheap enough to run on every PR, nightly across the whole repo, and weekly with strict revalidation, all within a budget that doesn't require a meeting to approve.&lt;/p&gt;

&lt;p&gt;The setup takes maybe two hours end to end: install deepsec, write a good &lt;code&gt;INFO.md&lt;/code&gt;, install Lynkr, point the base URL, run the first scan. After that it runs itself.&lt;/p&gt;

&lt;p&gt;Worth the afternoon.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Deepsec: &lt;code&gt;https://github.com/vercel-labs/deepsec&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Lynkr: &lt;code&gt;https://github.com/Fast-Editor/Lynkr&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Ollama: &lt;code&gt;ollama.com&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you run security scans on your codebase already and have war stories — good or bad — drop them in the comments. I am collecting patterns for a follow-up post on writing custom matchers.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>productivity</category>
      <category>programming</category>
    </item>
    <item>
      <title>Open-Design : Run a Local AI Design Studio for Free</title>
      <dc:creator>Vishal VeeraReddy</dc:creator>
      <pubDate>Tue, 12 May 2026 05:59:09 +0000</pubDate>
      <link>https://dev.to/lynkr/open-design-lynkr-run-a-local-ai-design-studio-for-free-450n</link>
      <guid>https://dev.to/lynkr/open-design-lynkr-run-a-local-ai-design-studio-for-free-450n</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;A Claude Design Alternative&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  Open-Design: Run a Local AI Design Studio for Free
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;How to wire up a self-hosted design generation stack that rivals Figma's AI features — without sending your work to a third-party cloud.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;There is a quiet corner of the open-source world where two tools have been evolving in parallel, each solving a different half of the same problem. &lt;strong&gt;Open-design&lt;/strong&gt; wants to be the AI-native design canvas. &lt;strong&gt;Lynkr&lt;/strong&gt; wants to be the intelligent router that connects any AI model to any client. Together, they form a surprisingly capable stack: a browser-based design studio powered entirely by models you run yourself.&lt;/p&gt;

&lt;p&gt;This article explains what each tool does, why they pair well, and walks you through the exact steps to get both running — from a blank machine to a working design session in under twenty minutes.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part I — What Is Open-Design?
&lt;/h2&gt;

&lt;p&gt;Open-design (github: &lt;code&gt;nexu-io/open-design&lt;/code&gt;) is a web application that lets you describe what you want to build in plain English and receive a live, editable HTML design in return. Think of it as a Figma alternative where the "draw" action is replaced by a conversation.&lt;/p&gt;

&lt;h3&gt;
  
  
  The core idea
&lt;/h3&gt;

&lt;p&gt;You open a project, type "Create a SaaS pricing page with three tiers, a purple gradient header, and a FAQ section below the fold," and the assistant responds with a fully rendered HTML page sitting in a split pane next to the chat. You can iterate on it — "make the CTA button larger and change the font to Inter" — and the design updates in place.&lt;/p&gt;

&lt;p&gt;What makes this different from just asking ChatGPT for HTML and copying the output into a file is the &lt;em&gt;structure&lt;/em&gt; open-design wraps around the conversation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Project workspaces&lt;/strong&gt; — each project has its own conversation history, file panel, and design system binding. Your palette and typography rules travel with the project.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Design files panel&lt;/strong&gt; — generated HTML is not dropped in the chat as a code block. It is parsed, saved as a project file, and opened as a rendered tab in a file viewer with an iframe preview.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skills&lt;/strong&gt; — prebuilt workflow templates (landing page, deck, dashboard, prototype) that inject opinionated system prompts so the model follows proven patterns rather than making it up.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Design systems&lt;/strong&gt; — you can attach a &lt;code&gt;DESIGN.md&lt;/code&gt; that defines your color tokens, spacing scale, and component rules. The model reads this as authoritative and binds every artifact to it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live artifacts&lt;/strong&gt; — a second artifact type for data-driven outputs (dashboards, reports) that can be refreshed on a schedule by re-running the generation with updated data.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Two execution modes
&lt;/h3&gt;

&lt;p&gt;Open-design ships with two ways to run the AI backend.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Daemon mode&lt;/strong&gt; spins up a local CLI agent (Codex, Claude Code, Gemini CLI) on your machine. The daemon manages the agent process, streams its output, and interprets the tool calls it makes — reading files, writing artifacts, running shell commands. This is the "full agentic" mode where the AI can explore your file system, install dependencies, and produce multi-file outputs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;API mode&lt;/strong&gt; (also called BYOK — Bring Your Own Key) skips the local agent entirely. You point open-design at any OpenAI-compatible or Anthropic-compatible endpoint, hand it an API key, and it sends requests directly. The AI cannot use tools in this mode; it can only output &lt;code&gt;&amp;lt;artifact&amp;gt;&lt;/code&gt; blocks, which open-design parses and routes to the Design Files panel. This is the simpler, faster mode — and it is exactly the mode Lynkr plugs into.&lt;/p&gt;

&lt;h3&gt;
  
  
  The artifact contract
&lt;/h3&gt;

&lt;p&gt;Both modes share one key convention: the &lt;code&gt;&amp;lt;artifact&amp;gt;&lt;/code&gt; tag.&lt;/p&gt;

&lt;p&gt;When the AI wants to produce a design, it wraps its HTML inside:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;artifact&lt;/span&gt; &lt;span class="na"&gt;identifier=&lt;/span&gt;&lt;span class="s"&gt;"landing-page.html"&lt;/span&gt; &lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"text/html"&lt;/span&gt; &lt;span class="na"&gt;title=&lt;/span&gt;&lt;span class="s"&gt;"Landing Page"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="cp"&gt;&amp;lt;!DOCTYPE html&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;html&lt;/span&gt; &lt;span class="na"&gt;lang=&lt;/span&gt;&lt;span class="s"&gt;"en"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  ...
&lt;span class="nt"&gt;&amp;lt;/html&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/artifact&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open-design's streaming parser watches the incoming text for this tag. The moment it sees &lt;code&gt;&amp;lt;artifact&lt;/code&gt;, it opens a new file entry in the Design panel, streams the inner HTML into an iframe as the model generates it, and — when the closing &lt;code&gt;&amp;lt;/artifact&amp;gt;&lt;/code&gt; arrives — saves the file to the project folder and opens it as a tab. The user sees a live preview build up line by line, like watching someone draw.&lt;/p&gt;

&lt;p&gt;The chat view simultaneously strips the artifact out (showing only any prose the model wrote around it) so the conversation stays clean.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part II — What Is Lynkr?
&lt;/h2&gt;

&lt;p&gt;Lynkr is an AI proxy server that presents a unified Anthropic-compatible API surface (&lt;code&gt;/v1/messages&lt;/code&gt;) while internally routing requests to whichever provider and model makes sense for the task.&lt;/p&gt;

&lt;h3&gt;
  
  
  The problem it solves
&lt;/h3&gt;

&lt;p&gt;If you run multiple AI providers — Ollama locally, Anthropic in the cloud, maybe Azure OpenAI for enterprise workloads — every client that talks to one of them is tightly coupled to that provider's API format. Swap the provider, update every client.&lt;/p&gt;

&lt;p&gt;Lynkr breaks that coupling. Clients always speak Anthropic's message format. Lynkr translates on the fly to whatever the target provider expects and translates the response back.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Lynkr actually does
&lt;/h3&gt;

&lt;p&gt;Beyond simple translation, Lynkr runs an &lt;strong&gt;agent loop&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It receives a message request.&lt;/li&gt;
&lt;li&gt;It forwards the request to the target model (Ollama, Anthropic, Azure, OpenRouter).&lt;/li&gt;
&lt;li&gt;If the model responds with tool calls, Lynkr executes them server-side (web search, file reads, subtask delegation) and feeds the results back to the model.&lt;/li&gt;
&lt;li&gt;The loop repeats until the model produces a final text answer or a terminal condition is hit.&lt;/li&gt;
&lt;li&gt;The final response is returned in Anthropic SSE format.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This means clients that speak Anthropic's streaming protocol — including open-design in API mode — get a fully resolved, agentic response even when the underlying model is a local Ollama instance that has no native tool support.&lt;/p&gt;

&lt;h3&gt;
  
  
  Routing intelligence
&lt;/h3&gt;

&lt;p&gt;Lynkr analyzes each incoming request for complexity (simple lookup vs. multi-step reasoning vs. heavy code generation) and routes to the appropriate tier:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Simple / fast&lt;/strong&gt; → local Ollama model (zero cloud cost, low latency)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex / reasoning&lt;/strong&gt; → cloud model (Anthropic, OpenAI, Azure)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent tasks&lt;/strong&gt; → multi-step loop with tool execution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of this is transparent to the client. Open-design sends one request; Lynkr decides where it goes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Other capabilities
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Token budget enforcement&lt;/strong&gt; — Lynkr tracks token usage and compresses conversation history before it would overflow the context window, preserving the most recent and most important turns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Headroom sidecar&lt;/strong&gt; — a small Python service that monitors GPU/CPU memory and tells Lynkr how much headroom is available for local inference, enabling dynamic load shedding when the machine is under pressure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session memory&lt;/strong&gt; — vector-search-backed conversation memory that lets the model recall context from previous turns in long projects.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Telemetry and tracing&lt;/strong&gt; — structured logs, latency metrics, and per-provider cost tracking.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dashboard&lt;/strong&gt; — a web UI at &lt;code&gt;/dashboard&lt;/code&gt; that shows live request throughput, provider health, and routing decisions.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Part III — Why They Pair Well
&lt;/h2&gt;

&lt;p&gt;Open-design in API mode needs an endpoint that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Speaks Anthropic's &lt;code&gt;/v1/messages&lt;/code&gt; format with SSE streaming.&lt;/li&gt;
&lt;li&gt;Understands the design system context injected in the system prompt.&lt;/li&gt;
&lt;li&gt;Produces clean &lt;code&gt;&amp;lt;artifact&amp;gt;&lt;/code&gt; HTML blocks without hallucinating tool calls or emitting ANSI escape codes into the CSS.&lt;/li&gt;
&lt;li&gt;Routes to whatever model the user has available — local or cloud.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Lynkr provides all four. You point open-design at &lt;code&gt;http://localhost:8081&lt;/code&gt;, select "Anthropic" as the protocol, set any string as the API key, and Lynkr handles the rest.&lt;/p&gt;

&lt;p&gt;The integration is also meaningful in the other direction. Lynkr needs clients that exercise its capabilities in realistic ways. Open-design's rich system prompts, multi-turn conversations, and artifact streaming are exactly the kind of traffic that surfaces edge cases — model hallucinations, ANSI corruption in streamed HTML, token overflows — that make a proxy useful to stress-test.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part IV — Setup Guide
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Docker&lt;/td&gt;
&lt;td&gt;24+&lt;/td&gt;
&lt;td&gt;Run open-design&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Node.js&lt;/td&gt;
&lt;td&gt;20+&lt;/td&gt;
&lt;td&gt;Run Lynkr&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ollama&lt;/td&gt;
&lt;td&gt;0.4+&lt;/td&gt;
&lt;td&gt;Local model inference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Git&lt;/td&gt;
&lt;td&gt;any&lt;/td&gt;
&lt;td&gt;Clone repos&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You also need at least one model pulled in Ollama. For design generation, &lt;code&gt;minimax-m2.5:cloud&lt;/code&gt; gives strong results (it reasons visually and follows HTML conventions well). For lighter machines, &lt;code&gt;qwen2.5-coder:7b&lt;/code&gt; or &lt;code&gt;llama3.1:8b&lt;/code&gt; are workable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull minimax-m2.5:cloud
&lt;span class="c"&gt;# or for lighter machines:&lt;/span&gt;
ollama pull qwen2.5-coder:7b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Step 1 — Install and start Lynkr
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone the repo&lt;/span&gt;
git clone https://github.com/Fast-Editor/Lynkr
&lt;span class="nb"&gt;cd &lt;/span&gt;lynkr

&lt;span class="c"&gt;# Install dependencies&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt;

&lt;span class="c"&gt;# Start Lynkr on port 8081&lt;/span&gt;
node bin/cli.js start &lt;span class="nt"&gt;--port&lt;/span&gt; 8081
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lynkr starts, discovers your local Ollama instance automatically, and is ready to accept requests at &lt;code&gt;http://localhost:8081/v1/messages&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You can verify it is running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://localhost:8081/health | jq &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;span class="c"&gt;# { "status": "ok", "providers": ["ollama", ...] }&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To check the dashboard, open &lt;code&gt;http://localhost:8081/dashboard&lt;/code&gt; in your browser.&lt;/p&gt;




&lt;h3&gt;
  
  
  Step 2 — Run open-design in Docker
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Pull and run open-design&lt;/span&gt;
docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; open-design &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 7456:7456 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--add-host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;host.docker.internal:host-gateway &lt;span class="se"&gt;\&lt;/span&gt;
  nexuio/open-design:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--add-host&lt;/code&gt; flag is critical. Open-design runs inside a container, and it needs to reach Lynkr which runs on your host machine. &lt;code&gt;host.docker.internal&lt;/code&gt; is the hostname that resolves to your host from inside a Docker container.&lt;/p&gt;

&lt;p&gt;Open your browser at &lt;code&gt;http://localhost:7456&lt;/code&gt;. You should see the open-design welcome screen.&lt;/p&gt;




&lt;h3&gt;
  
  
  Step 3 — Configure API mode to use Lynkr
&lt;/h3&gt;

&lt;p&gt;In the open-design UI:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Click the &lt;strong&gt;Settings&lt;/strong&gt; gear icon (top right).&lt;/li&gt;
&lt;li&gt;Under &lt;strong&gt;Execution &amp;amp; model&lt;/strong&gt;, click the &lt;strong&gt;BYOK&lt;/strong&gt; tab (right side — "API provider").&lt;/li&gt;
&lt;li&gt;Select the &lt;strong&gt;Anthropic&lt;/strong&gt; protocol tab.&lt;/li&gt;
&lt;li&gt;Under &lt;strong&gt;Quick fill provider&lt;/strong&gt;, choose &lt;strong&gt;Custom provider&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Set &lt;strong&gt;API Key&lt;/strong&gt; to any non-empty string (e.g. &lt;code&gt;local&lt;/code&gt;). Lynkr does not validate it for local providers.&lt;/li&gt;
&lt;li&gt;Set &lt;strong&gt;Model&lt;/strong&gt; to the Ollama model you pulled — e.g. &lt;code&gt;minimax-m2.5:cloud&lt;/code&gt; or &lt;code&gt;qwen2.5-coder:7b&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Set &lt;strong&gt;Base URL&lt;/strong&gt; to:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   http://host.docker.internal:8081
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Click &lt;strong&gt;Test&lt;/strong&gt; to confirm the connection, then close Settings.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyc7vtygp9w3gz92l34cy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyc7vtygp9w3gz92l34cy.png" alt=" " width="800" height="709"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The Settings panel with BYOK selected, Anthropic protocol active, and Base URL pointed at Lynkr. The model shown is claude-opus-4-5 but you can type any Ollama model name — the field accepts free text.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That is it. Open-design will now route all AI requests through Lynkr, which routes them to your local Ollama instance.&lt;/p&gt;


&lt;h3&gt;
  
  
  Step 4 — Create your first design
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Click &lt;strong&gt;New Project&lt;/strong&gt; and name it anything — "Landing Page Test".&lt;/li&gt;
&lt;li&gt;When prompted for project kind, choose &lt;strong&gt;Prototype&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Type your first prompt in the chat:&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;Create a modern SaaS landing page with a dark hero section, gradient headline, three feature cards below, and a "Get Early Access" CTA button. Use a deep blue and violet color scheme.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ol&gt;
&lt;li&gt;Hit enter and watch what happens.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Lynkr receives the request, routes it to Ollama, the model generates an extended thinking block, then produces an &lt;code&gt;&amp;lt;artifact&amp;gt;&lt;/code&gt; HTML block. Lynkr streams this back to open-design. Open-design's streaming parser detects the &lt;code&gt;&amp;lt;artifact&amp;gt;&lt;/code&gt; tag, opens a Design panel tab, and renders the HTML in real time — line by line, as the model writes it.&lt;/p&gt;

&lt;p&gt;When the stream ends, the file is saved automatically to the project and the Design tab snaps into focus with your rendered page.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/opendesign-workspace-output.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/opendesign-workspace-output.png" alt="Open-design workspace showing the chat pane on the left, Design Files tab in the center with a layers panel listing 107 editable elements, and a rendered HTML pitch deck slide preview on the right" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The full open-design workspace after a generation completes. Left: the conversation. Center: the Design Files panel with a structured layers tree. Right: the live rendered preview — this is a 10-slide editorial pitch deck generated from a single prompt in about 60 seconds.&lt;/em&gt;&lt;/p&gt;


&lt;h3&gt;
  
  
  Step 5 — Iterate and refine
&lt;/h3&gt;

&lt;p&gt;The conversation history is preserved per-project. Each turn, open-design builds the full message history and sends it to Lynkr, so the model has context from every prior design decision.&lt;/p&gt;

&lt;p&gt;Try follow-up prompts like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Make the hero headline larger, add a subtle animated gradient background to the hero section, and add a navigation bar at the top with the logo on the left and three nav links on the right.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;or:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Add a testimonials section below the feature cards with three customer quotes and avatar placeholders. Keep the same color palette.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Each response produces an updated artifact. Open-design saves the new version and opens it, preserving the previous version in the file history.&lt;/p&gt;


&lt;h2&gt;
  
  
  Part V — Advanced Configuration
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Using a cloud model as fallback
&lt;/h3&gt;

&lt;p&gt;Lynkr supports routing heavy or complex requests to a cloud provider while keeping simple requests local. Add your Anthropic API key to Lynkr's config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# In lynkr directory&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-ant-...

node bin/cli.js start &lt;span class="nt"&gt;--port&lt;/span&gt; 8081
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lynkr's complexity analyzer will automatically route multi-step reasoning requests to &lt;code&gt;claude-sonnet-4-6&lt;/code&gt; and keep lighter requests on Ollama — without any change to your open-design configuration.&lt;/p&gt;

&lt;h3&gt;
  
  
  Binding a design system
&lt;/h3&gt;

&lt;p&gt;Open-design lets you create a design system with a &lt;code&gt;DESIGN.md&lt;/code&gt; that defines your brand tokens. Once bound to a project, this file is injected into every system prompt. The model reads it as authoritative and will not invent colors, fonts, or spacing outside what you defined.&lt;/p&gt;

&lt;p&gt;A minimal &lt;code&gt;DESIGN.md&lt;/code&gt; looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Colors&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Primary: #5B21B6 (violet-800)
&lt;span class="p"&gt;-&lt;/span&gt; Accent: #7C3AED (violet-600)
&lt;span class="p"&gt;-&lt;/span&gt; Background: #0F0F23
&lt;span class="p"&gt;-&lt;/span&gt; Surface: #1A1A2E
&lt;span class="p"&gt;-&lt;/span&gt; Text: #F8FAFC

&lt;span class="gu"&gt;## Typography&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Headings: Inter, weight 700
&lt;span class="p"&gt;-&lt;/span&gt; Body: Inter, weight 400
&lt;span class="p"&gt;-&lt;/span&gt; Code: JetBrains Mono

&lt;span class="gu"&gt;## Spacing&lt;/span&gt;
Base unit: 4px. All spacing is multiples of 4.

&lt;span class="gu"&gt;## Buttons&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Primary: solid violet-600, white text, 8px radius, 14px 28px padding
&lt;span class="p"&gt;-&lt;/span&gt; Ghost: transparent, violet-600 border, violet-600 text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this bound, every artifact the model generates will use exactly these values — no hallucinated hex codes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Attaching skills
&lt;/h3&gt;

&lt;p&gt;Skills are workflow templates that inject expert-level instructions for specific artifact types. Open-design ships with several built-in skills. You can also write your own &lt;code&gt;SKILL.md&lt;/code&gt; and publish it to the open-design skill registry.&lt;/p&gt;

&lt;p&gt;A good skill for landing pages would define the exact section order, component patterns (hero → social proof → features → CTA), and validation checklist the model must pass before emitting the artifact.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part VI — Troubleshooting
&lt;/h2&gt;

&lt;h3&gt;
  
  
  "Done Xs" in the chat but no design appears
&lt;/h3&gt;

&lt;p&gt;This means the model produced output but it was not recognized as an artifact. The most common causes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The model used tool calls instead of outputting &lt;code&gt;&amp;lt;artifact&amp;gt;&lt;/code&gt; blocks.&lt;/strong&gt; Some models trained on agent data (like MiniMax or Qwen) reflexively try to call file-writing tools instead of producing direct output. Lynkr handles this by detecting the hallucinated tool calls, dropping them, and injecting a redirect message: &lt;em&gt;"You don't have any tools available. Output the result directly as an &lt;code&gt;&amp;lt;artifact&amp;gt;&lt;/code&gt; block."&lt;/em&gt; The model then produces the correct output on the follow-up. If you see this happening, it is normal — it adds one extra round trip but the design still arrives.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The HTML failed validation.&lt;/strong&gt; Open-design requires that artifact HTML start with &lt;code&gt;&amp;lt;!DOCTYPE html&amp;gt;&lt;/code&gt; or &lt;code&gt;&amp;lt;html&lt;/code&gt;. If the model produces a prose response inside &lt;code&gt;&amp;lt;artifact&amp;gt;&lt;/code&gt; tags (e.g. "I updated the header section"), it fails the structural check and is not saved. Try a more explicit prompt: &lt;em&gt;"Output the complete HTML document in an artifact block."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Connection refused from the Docker container.&lt;/strong&gt; Make sure you used &lt;code&gt;host.docker.internal&lt;/code&gt; as the base URL (not &lt;code&gt;localhost&lt;/code&gt;). From inside a Docker container, &lt;code&gt;localhost&lt;/code&gt; refers to the container itself, not your host machine.&lt;/p&gt;

&lt;h3&gt;
  
  
  ANSI escape codes appearing in the generated HTML
&lt;/h3&gt;

&lt;p&gt;This would have caused CSS rules like &lt;code&gt;* { box-sizing: border-box }&lt;/code&gt; to appear as &lt;code&gt;▸ { box-sizing: border-box }&lt;/code&gt; with colored terminal output embedded in the HTML. Lynkr's latest version detects when text content looks like HTML and bypasses the ANSI markdown renderer, keeping the HTML clean. If you see this, make sure you are on the latest Lynkr commit.&lt;/p&gt;

&lt;h3&gt;
  
  
  The model keeps trying to explore the file system
&lt;/h3&gt;

&lt;p&gt;If your Ollama model was trained on agent data (Claude Code, Codex), its first instinct is to run &lt;code&gt;ls&lt;/code&gt; and read files before generating anything. Lynkr injects a system-level note when it detects a tool-less request: &lt;em&gt;"You have NO tools available. Output ONLY text content directly."&lt;/em&gt; Combined with open-design's system prompt, this redirects the model within one or two turns.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part VII — What This Stack Enables
&lt;/h2&gt;

&lt;p&gt;Running open-design with Lynkr gives you a local AI design studio with some properties that are hard to get from fully-managed SaaS alternatives:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Privacy.&lt;/strong&gt; Your prompts, your designs, and your conversation history never leave your machine (when using local Ollama models). Sensitive product ideas, unreleased brand concepts, confidential UI specs — none of it is sent to a third-party cloud.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost control.&lt;/strong&gt; With Ollama models, inference is free. Lynkr's token budget enforcement and complexity routing mean you only pay cloud API costs for the requests that genuinely need it. A typical design session with &lt;code&gt;minimax-m2.5:cloud&lt;/code&gt; on Ollama costs nothing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model flexibility.&lt;/strong&gt; You are not locked to one vendor. If a better open-source design model releases tomorrow, you pull it with &lt;code&gt;ollama pull&lt;/code&gt; and update the model name in open-design settings. The rest of the stack does not change.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Composability.&lt;/strong&gt; Lynkr exposes a standard Anthropic API surface, which means anything that speaks Anthropic can use it — Claude Code, Codex, Continue.dev, your own scripts. Open-design is just one client. You can run others against the same Lynkr instance simultaneously.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Neither open-design nor Lynkr is trying to replace your existing design workflow wholesale. They are building blocks — a canvas that understands artifacts, and a router that understands models. Assembled correctly, they remove the most annoying friction from early-stage design: the gap between &lt;em&gt;I know what I want this to look like&lt;/em&gt; and &lt;em&gt;here is the actual HTML&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The integration is not yet seamless for every model (some still need the redirect injection to break out of agent mode), but the core loop — describe it, see it, iterate — works reliably once both services are running.&lt;/p&gt;

&lt;p&gt;If you are running a product team and want to prototype faster without signing up for another SaaS tool, or if you are building in public and want full ownership of your AI-generated design assets, this stack is worth an afternoon of setup time.&lt;/p&gt;

&lt;p&gt;Both projects are open source. Links below.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Open-design: &lt;code&gt;github.com/nexu-io/open-design&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Lynkr: &lt;code&gt;https://github.com/Fast-Editor/Lynkr&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Ollama: &lt;code&gt;ollama.ai&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you found this useful, share it with someone building with open-source AI tools. Comments and questions are open below.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>One npm Install That Makes Every AI Coding Tool Work With Every LLM Provider</title>
      <dc:creator>Vishal VeeraReddy</dc:creator>
      <pubDate>Thu, 16 Apr 2026 01:27:01 +0000</pubDate>
      <link>https://dev.to/lynkr/one-npm-install-that-makes-every-ai-coding-tool-work-with-every-llm-provider-4c7o</link>
      <guid>https://dev.to/lynkr/one-npm-install-that-makes-every-ai-coding-tool-work-with-every-llm-provider-4c7o</guid>
      <description>&lt;p&gt;Quick question: how many API keys are in your &lt;code&gt;.env&lt;/code&gt; right now just for AI coding tools?&lt;/p&gt;

&lt;p&gt;If you use Claude Code (Anthropic key), Codex (OpenAI key), and Cursor (another OpenAI key) — that's three providers, three billing accounts, three rate limit systems, zero flexibility.&lt;/p&gt;

&lt;p&gt;I built Lynkr to collapse all of that into one proxy.&lt;/p&gt;

&lt;h3&gt;
  
  
  What It Does
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code ──┐
Codex CLI ────┤
Cursor ───────┤──→ Lynkr (localhost:8081) ──→ Any LLM Provider
Cline ────────┤
Continue ─────┤
LangChain ────┤
Vercel AI ────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lynkr auto-detects which tool is connecting and speaks its language:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic Messages API&lt;/strong&gt; for Claude Code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI Responses API&lt;/strong&gt; for Codex CLI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI Chat Completions&lt;/strong&gt; for everything else (Cursor, Cline, Continue, KiloCode, LangChain, Vercel AI SDK, any OpenAI-compatible client)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; lynkr
lynkr start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then configure each tool to point at Lynkr:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Claude Code&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:8081

&lt;span class="c"&gt;# Codex CLI (~/.codex/config.toml)&lt;/span&gt;
&lt;span class="c"&gt;# base_url = "http://localhost:8081/v1"&lt;/span&gt;

&lt;span class="c"&gt;# Cursor&lt;/span&gt;
&lt;span class="c"&gt;# Settings → Models → Base URL: http://localhost:8081/v1&lt;/span&gt;

&lt;span class="c"&gt;# LangChain&lt;/span&gt;
&lt;span class="c"&gt;# ChatOpenAI(base_url="http://localhost:8081/v1", api_key="sk-lynkr")&lt;/span&gt;

&lt;span class="c"&gt;# Literally any OpenAI-compatible tool&lt;/span&gt;
&lt;span class="c"&gt;# OPENAI_BASE_URL=http://localhost:8081/v1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All of them hit the same Lynkr instance. Same provider. Same routing. Same optimization.&lt;/p&gt;

&lt;h3&gt;
  
  
  12+ Backends
&lt;/h3&gt;

&lt;p&gt;Pick your provider:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Free (local)&lt;/span&gt;
&lt;span class="nv"&gt;MODEL_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ollama

&lt;span class="c"&gt;# Cheap cloud&lt;/span&gt;
&lt;span class="nv"&gt;MODEL_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;openrouter    &lt;span class="c"&gt;# 100+ models&lt;/span&gt;
&lt;span class="nv"&gt;MODEL_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;deepseek      &lt;span class="c"&gt;# 1/10 Anthropic cost&lt;/span&gt;
&lt;span class="nv"&gt;MODEL_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;zai           &lt;span class="c"&gt;# 1/7 Anthropic cost&lt;/span&gt;

&lt;span class="c"&gt;# Enterprise cloud&lt;/span&gt;
&lt;span class="nv"&gt;MODEL_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;bedrock       &lt;span class="c"&gt;# AWS, 100+ models&lt;/span&gt;
&lt;span class="nv"&gt;MODEL_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;vertex        &lt;span class="c"&gt;# Google, Gemini 2.5&lt;/span&gt;
&lt;span class="nv"&gt;MODEL_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;databricks    &lt;span class="c"&gt;# Claude Opus 4.6&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or mix them across complexity tiers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;TIER_SIMPLE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ollama:qwen2.5-coder
&lt;span class="nv"&gt;TIER_MEDIUM&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;openrouter:deepseek-r1
&lt;span class="nv"&gt;TIER_COMPLEX&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;databricks:claude-sonnet-4-5
&lt;span class="nv"&gt;TIER_REASONING&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;vertex:gemini-2.5-pro
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Simple requests (rename a variable) → free local model.&lt;br&gt;
Complex requests (refactor auth across 23 files) → top-tier cloud model.&lt;/p&gt;

&lt;p&gt;The routing engine makes this decision automatically using 5-phase complexity analysis — including Graphify, which reads your actual codebase AST across 19 languages to detect high-risk changes.&lt;/p&gt;
&lt;h3&gt;
  
  
  For Agent Builders: LangChain, CrewAI, AutoGen
&lt;/h3&gt;

&lt;p&gt;This is where Lynkr shines for automation. If you're building agents that make hundreds of LLM calls per pipeline run, most of those calls are simple (read a file, parse JSON, format output). Only a few require deep reasoning.&lt;/p&gt;

&lt;p&gt;Without Lynkr: every call hits GPT-4o at $15/MTok. 200 calls × $0.03 = $6/run.&lt;/p&gt;

&lt;p&gt;With Lynkr: 140 calls hit free Ollama, 40 hit OpenRouter ($0.005 each), 20 hit Databricks ($0.02 each). Total: $0.60/run. 90% savings.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Nothing changes in your agent code
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;

&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8081/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-lynkr&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# Lynkr routes based on complexity
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Your existing chains, agents, and tools work unchanged
&lt;/span&gt;&lt;span class="n"&gt;agent_executor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Refactor the payment module&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Token Compression Stack
&lt;/h3&gt;

&lt;p&gt;On top of routing, every request passes through 7 optimization phases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Smart tool selection&lt;/strong&gt; — only relevant tools sent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code Mode&lt;/strong&gt; — 100+ tool defs → 4 meta-tools (96% reduction, saves 16,800 tokens/request)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Distill&lt;/strong&gt; — delta rendering via Jaccard similarity (60-80% savings)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt cache&lt;/strong&gt; — SHA-256 keyed LRU&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory dedup&lt;/strong&gt; — removes repeated context across turns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;History compression&lt;/strong&gt; — sliding window with structural dedup&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Headroom sidecar&lt;/strong&gt; — optional ML compression (47-92%)&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Enterprise: Circuit Breakers, Telemetry, Hot-Reload
&lt;/h3&gt;

&lt;p&gt;For teams running this in production:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Health check&lt;/span&gt;
curl http://localhost:8081/health

&lt;span class="c"&gt;# List all providers and models&lt;/span&gt;
curl http://localhost:8081/v1/providers
curl http://localhost:8081/v1/models

&lt;span class="c"&gt;# Routing analytics&lt;/span&gt;
curl http://localhost:8081/v1/routing/stats
curl http://localhost:8081/v1/routing/accuracy

&lt;span class="c"&gt;# Change config without restart&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8081/v1/admin/reload

&lt;span class="c"&gt;# Prometheus metrics&lt;/span&gt;
curl http://localhost:8081/metrics
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Circuit breakers auto-detect provider failures. After 5 failed requests, incoming calls fail instantly instead of timing out. Half-open probes test recovery every 60 seconds. When 2 probes succeed, traffic resumes. No manual intervention.&lt;/p&gt;

&lt;h3&gt;
  
  
  Get Started
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; lynkr &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; lynkr start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;699 tests. Apache 2.0. Node.js only. Zero infrastructure.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;github.com/Fast-Editor/Lynkr&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're managing multiple AI coding tools or building LLM-powered agents, Lynkr consolidates everything into one proxy with intelligent routing and real cost savings.&lt;/p&gt;

&lt;p&gt;Star it if it helps. PRs welcome.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Run OpenClaw/Clawdbot for FREE with Lynkr (No API Bills)</title>
      <dc:creator>Vishal VeeraReddy</dc:creator>
      <pubDate>Sun, 01 Feb 2026 02:00:41 +0000</pubDate>
      <link>https://dev.to/lynkr/run-openclawclawdbot-for-free-with-lynkr-no-api-bills-3kg2</link>
      <guid>https://dev.to/lynkr/run-openclawclawdbot-for-free-with-lynkr-no-api-bills-3kg2</guid>
      <description>&lt;p&gt;&lt;em&gt;Your personal AI assistant running 24/7 — without burning through API credits&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;If you've tried &lt;a href="https://openclaw.ai" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; (also known as Clawdbot), you know it's incredible. An AI assistant that lives in WhatsApp/Telegram, manages your calendar, clears your inbox, checks you in for flights — all while you chat naturally.&lt;/p&gt;

&lt;p&gt;But there's a catch: &lt;strong&gt;it needs an LLM backend&lt;/strong&gt;, and Anthropic API bills add up fast.&lt;/p&gt;

&lt;p&gt;What if I told you that you can run OpenClaw &lt;strong&gt;completely free&lt;/strong&gt; using local models? Enter &lt;strong&gt;Lynkr&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  🔗 What is Lynkr?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;Lynkr&lt;/a&gt; is a universal LLM proxy that lets you route OpenClaw requests to &lt;strong&gt;any model provider&lt;/strong&gt; — including free local models via Ollama.&lt;/p&gt;

&lt;p&gt;The magic? OpenClaw thinks it's talking to Anthropic, but Lynkr transparently routes requests to your local GPU instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  💡 Why This Matters
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem with direct Anthropic API:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;💸 Bills explode quickly (OpenClaw runs 24/7)&lt;/li&gt;
&lt;li&gt;⚠️ Potential ToS concerns with automated assistants&lt;/li&gt;
&lt;li&gt;🔒 Your data goes to external servers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;With Lynkr + Ollama:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ &lt;strong&gt;$0/month&lt;/strong&gt; — runs entirely on your machine&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;ToS compliant&lt;/strong&gt; — no API abuse concerns&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;100% private&lt;/strong&gt; — data never leaves your computer&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Smart fallback&lt;/strong&gt; — route to cloud only when needed&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🚀 Setup Guide (15 minutes)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Install Ollama
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# macOS/Linux&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh

&lt;span class="c"&gt;# Pull Kimi K2.5 (recommended for coding/assistant tasks)&lt;/span&gt;
ollama pull kimi-k2.5

&lt;span class="c"&gt;# Also grab an embeddings model for semantic search&lt;/span&gt;
ollama pull nomic-embed-text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Install Lynkr
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Option A: NPM (recommended)&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; lynkr

&lt;span class="c"&gt;# Option B: Clone repo&lt;/span&gt;
git clone https://github.com/Fast-Editor/Lynkr.git
&lt;span class="nb"&gt;cd &lt;/span&gt;Lynkr
npm &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Configure Lynkr
&lt;/h3&gt;

&lt;p&gt;Create your &lt;code&gt;.env&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Copy example config&lt;/span&gt;
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Edit &lt;code&gt;.env&lt;/code&gt; with these settings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Primary provider: Ollama (FREE, local)&lt;/span&gt;
&lt;span class="nv"&gt;MODEL_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ollama
&lt;span class="nv"&gt;OLLAMA_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;kimi-k2.5
&lt;span class="nv"&gt;OLLAMA_ENDPOINT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:11434

&lt;span class="c"&gt;# Enable hybrid routing (local first, cloud fallback)&lt;/span&gt;
&lt;span class="nv"&gt;PREFER_OLLAMA&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true
&lt;/span&gt;&lt;span class="nv"&gt;OLLAMA_MAX_TOOLS_FOR_ROUTING&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3

&lt;span class="c"&gt;# Fallback provider (optional - for complex requests)&lt;/span&gt;
&lt;span class="nv"&gt;FALLBACK_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true
&lt;/span&gt;&lt;span class="nv"&gt;FALLBACK_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;openrouter
&lt;span class="nv"&gt;OPENROUTER_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-or-v1-your-key  &lt;span class="c"&gt;# Only needed if using fallback&lt;/span&gt;

&lt;span class="c"&gt;# Embeddings for semantic search&lt;/span&gt;
&lt;span class="nv"&gt;OLLAMA_EMBEDDINGS_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;nomic-embed-text

&lt;span class="c"&gt;# Token optimization (60-80% cost reduction on cloud fallback)&lt;/span&gt;
&lt;span class="nv"&gt;TOKEN_TRACKING_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true
&lt;/span&gt;&lt;span class="nv"&gt;TOOL_TRUNCATION_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true
&lt;/span&gt;&lt;span class="nv"&gt;HISTORY_COMPRESSION_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: Start Lynkr
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# If installed via npm&lt;/span&gt;
lynkr

&lt;span class="c"&gt;# If cloned repo&lt;/span&gt;
npm start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🚀 Lynkr proxy running on http://localhost:8081
📊 Provider: ollama (kimi-k2.5)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 5: Configure OpenClaw/Clawdbot
&lt;/h3&gt;

&lt;p&gt;In your OpenClaw configuration, set:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setting&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model/auth provider&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Copilot&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Copilot auth method&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Copilot Proxy (local)&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Copilot Proxy base URL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;http://localhost:8081/v1&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model ID&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;kimi-k2.5&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That's it! Your OpenClaw now runs through Lynkr → Ollama → Kimi K2.5, completely free.&lt;/p&gt;

&lt;h2&gt;
  
  
  ⚡ How Hierarchical Routing Works
&lt;/h2&gt;

&lt;p&gt;The killer feature is &lt;strong&gt;smart routing&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OpenClaw Request
       ↓
   Is it simple?
    /        \
  Yes         No
   ↓           ↓
Ollama     Cloud Fallback
(FREE)     (with caching)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lynkr analyzes each request:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Simple requests&lt;/strong&gt; (&amp;lt; 3 tools) → Ollama (free)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex requests&lt;/strong&gt; → Cloud fallback (with heavy caching/compression)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means even if you enable cloud fallback, you'll use it sparingly.&lt;/p&gt;

&lt;h2&gt;
  
  
  💰 Cost Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setup&lt;/th&gt;
&lt;th&gt;Monthly Cost&lt;/th&gt;
&lt;th&gt;Privacy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Direct Anthropic API&lt;/td&gt;
&lt;td&gt;$100-300+&lt;/td&gt;
&lt;td&gt;❌ Cloud&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lynkr + Ollama only&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ 100% Local&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lynkr + Hybrid routing&lt;/td&gt;
&lt;td&gt;~$5-15&lt;/td&gt;
&lt;td&gt;✅ Mostly Local&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  🔒 Why This is ToS-Safe
&lt;/h2&gt;

&lt;p&gt;Running OpenClaw directly against Anthropic's API at scale can raise ToS concerns (automated usage, high volume, etc.).&lt;/p&gt;

&lt;p&gt;With Lynkr:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Local models&lt;/strong&gt; = no external API terms apply&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Your hardware&lt;/strong&gt; = your rules&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fallback is minimal&lt;/strong&gt; = within normal usage patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🧠 Advanced: Memory &amp;amp; Compression
&lt;/h2&gt;

&lt;p&gt;Lynkr includes enterprise features that further reduce costs:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Long-Term Memory (Titans-inspired):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;MEMORY_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true
&lt;/span&gt;&lt;span class="nv"&gt;MEMORY_RETRIEVAL_LIMIT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5
&lt;span class="nv"&gt;MEMORY_SURPRISE_THRESHOLD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0.3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Headroom Compression (47-92% token reduction):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;HEADROOM_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true
&lt;/span&gt;&lt;span class="nv"&gt;HEADROOM_SMART_CRUSHER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true
&lt;/span&gt;&lt;span class="nv"&gt;HEADROOM_CACHE_ALIGNER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These features mean even when you hit cloud fallback, you're using far fewer tokens.&lt;/p&gt;

&lt;h2&gt;
  
  
  🎯 Recommended Models
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Ollama Model&lt;/th&gt;
&lt;th&gt;Pull Command&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;General Assistant&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;kimi-k2.5&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ollama pull kimi-k2.5&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Coding Tasks&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;qwen2.5-coder:latest&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ollama pull qwen2.5-coder:latest&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fast/Light&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;llama3.2:3b&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ollama pull llama3.2:3b&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Embeddings&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;nomic-embed-text&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ollama pull nomic-embed-text&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  🏃 TL;DR
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh
ollama pull kimi-k2.5
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; lynkr

&lt;span class="c"&gt;# Configure (.env)&lt;/span&gt;
&lt;span class="nv"&gt;MODEL_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ollama
&lt;span class="nv"&gt;OLLAMA_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;kimi-k2.5
&lt;span class="nv"&gt;PREFER_OLLAMA&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;

&lt;span class="c"&gt;# Run&lt;/span&gt;
lynkr

&lt;span class="c"&gt;# Point OpenClaw to http://localhost:8081/v1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; OpenClaw running 24/7, $0/month, 100% private.&lt;/p&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;⭐ &lt;strong&gt;&lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;Lynkr on GitHub&lt;/a&gt;&lt;/strong&gt; — Star if this helped!&lt;/li&gt;
&lt;li&gt;📚 &lt;strong&gt;&lt;a href="https://deepwiki.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;Lynkr Documentation&lt;/a&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;🦀 &lt;strong&gt;&lt;a href="https://openclaw.ai" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt;&lt;/strong&gt; — The AI assistant&lt;/li&gt;
&lt;li&gt;🦙 &lt;strong&gt;&lt;a href="https://ollama.ai" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt;&lt;/strong&gt; — Local LLM runtime&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Questions? Drop a comment below or join the &lt;a href="https://discord.gg/openclaw" rel="noopener noreferrer"&gt;OpenClaw Discord&lt;/a&gt;!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
    <item>
      <title>How I Cut My AI Coding Tool Costs by 70% (And You Can Too)</title>
      <dc:creator>Vishal VeeraReddy</dc:creator>
      <pubDate>Sun, 01 Feb 2026 01:45:11 +0000</pubDate>
      <link>https://dev.to/lynkr/how-i-cut-my-ai-coding-tool-costs-by-70-and-you-can-too-ol0</link>
      <guid>https://dev.to/lynkr/how-i-cut-my-ai-coding-tool-costs-by-70-and-you-can-too-ol0</guid>
      <description>&lt;p&gt;&lt;em&gt;Run Cursor, Claude Code, Cline, and more on ANY LLM — including free local models&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;If you're like me, you've probably fallen in love with AI coding assistants. Tools like &lt;strong&gt;Cursor&lt;/strong&gt;, &lt;strong&gt;Claude Code CLI&lt;/strong&gt;, &lt;strong&gt;Cline&lt;/strong&gt;, and &lt;strong&gt;OpenClaw/Clawdbot&lt;/strong&gt; have genuinely transformed how I write code. But there's a catch — they're expensive.&lt;/p&gt;

&lt;p&gt;Between API costs and subscription fees, I was burning through $100-300/month just on AI coding tools. That's when I built &lt;strong&gt;Lynkr&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  🔗 What is Lynkr?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;Lynkr&lt;/a&gt; is an open-source universal LLM proxy that lets you run your favorite AI coding tools on &lt;strong&gt;any model provider&lt;/strong&gt; — including completely free local models via Ollama.&lt;/p&gt;

&lt;p&gt;Think of it as a universal adapter. Your tools think they're talking to their native API, but Lynkr transparently routes requests to whatever backend you choose.&lt;/p&gt;

&lt;h2&gt;
  
  
  💡 The Problem Lynkr Solves
&lt;/h2&gt;

&lt;p&gt;Here's what frustrates developers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Vendor lock-in&lt;/strong&gt; — Cursor only works with OpenAI/Anthropic. Claude Code CLI only works with Anthropic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Expensive APIs&lt;/strong&gt; — Claude API costs add up fast, especially for heavy coding sessions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No local option&lt;/strong&gt; — Want to use your RTX 4090 for coding assistance? Too bad.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise restrictions&lt;/strong&gt; — Many companies can't send code to external APIs&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Lynkr fixes all of this.&lt;/p&gt;

&lt;h2&gt;
  
  
  🏗️ How It Works
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────┐     ┌─────────┐     ┌──────────────────┐
│ Cursor      │     │         │     │ Ollama (local)   │
│ Claude Code │────▶│  Lynkr  │────▶│ AWS Bedrock      │
│ Cline       │     │  Proxy  │     │ Azure OpenAI     │
│ OpenClaw    │     │         │     │ OpenRouter       │
└─────────────┘     └─────────┘     └──────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lynkr acts as a drop-in replacement for the Anthropic API. It:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Receives requests from your AI coding tool&lt;/li&gt;
&lt;li&gt;Translates them to your target provider's format&lt;/li&gt;
&lt;li&gt;Streams responses back seamlessly&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Your tools don't know the difference.&lt;/p&gt;

&lt;h2&gt;
  
  
  🚀 Supported Providers
&lt;/h2&gt;

&lt;p&gt;Lynkr supports &lt;strong&gt;12+ providers&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ollama&lt;/strong&gt; - 100% local, FREE&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Bedrock&lt;/strong&gt; - Enterprise-grade, ~60% cheaper&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Azure OpenAI&lt;/strong&gt; - Enterprise-grade&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Azure Anthropic&lt;/strong&gt; - Claude on Azure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenRouter&lt;/strong&gt; - 100+ models via single API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI&lt;/strong&gt; - Direct GPT access&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Vertex AI&lt;/strong&gt; - Gemini models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Databricks&lt;/strong&gt; - Enterprise ML platform&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Z.AI (Zhipu)&lt;/strong&gt; - ~1/7 cost of Anthropic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LM Studio&lt;/strong&gt; - Local models with GUI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;llama.cpp&lt;/strong&gt; - Local GGUF models&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  📦 Quick Start (5 minutes)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Option 1: Run locally with Ollama (FREE)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Ollama&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh

&lt;span class="c"&gt;# Pull a coding model&lt;/span&gt;
ollama pull qwen2.5-coder:latest

&lt;span class="c"&gt;# Clone and configure Lynkr&lt;/span&gt;
git clone https://github.com/Fast-Editor/Lynkr.git
&lt;span class="nb"&gt;cd &lt;/span&gt;Lynkr
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env

&lt;span class="c"&gt;# Edit .env:&lt;/span&gt;
&lt;span class="nv"&gt;MODEL_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ollama
&lt;span class="nv"&gt;OLLAMA_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;qwen2.5-coder:latest
&lt;span class="nv"&gt;OLLAMA_ENDPOINT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:11434

&lt;span class="c"&gt;# Start&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Option 2: Use with AWS Bedrock
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone and configure&lt;/span&gt;
git clone https://github.com/Fast-Editor/Lynkr.git
&lt;span class="nb"&gt;cd &lt;/span&gt;Lynkr
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env

&lt;span class="c"&gt;# Edit .env:&lt;/span&gt;
&lt;span class="nv"&gt;MODEL_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;bedrock
&lt;span class="nv"&gt;AWS_BEDROCK_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-bedrock-api-key
&lt;span class="nv"&gt;AWS_BEDROCK_REGION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-east-1
&lt;span class="nv"&gt;AWS_BEDROCK_MODEL_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;anthropic.claude-3-5-sonnet-20241022-v2:0

&lt;span class="c"&gt;# Start&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Option 3: OpenRouter (Simplest Cloud Setup)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Edit .env:&lt;/span&gt;
&lt;span class="nv"&gt;MODEL_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;openrouter
&lt;span class="nv"&gt;OPENROUTER_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-or-v1-your-key
&lt;span class="nv"&gt;OPENROUTER_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;anthropic/claude-3.5-sonnet

npm start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Configure Your Tool
&lt;/h3&gt;

&lt;p&gt;Point your AI coding tool to Lynkr:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# For Claude Code CLI&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;dummy
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:8081

&lt;span class="c"&gt;# Now use Claude Code normally!&lt;/span&gt;
claude &lt;span class="s2"&gt;"Refactor this function"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  💰 Real Cost Comparison
&lt;/h2&gt;

&lt;p&gt;Here's what I was spending vs. what I spend now:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Before (Direct API)&lt;/th&gt;
&lt;th&gt;After (Lynkr + Bedrock)&lt;/th&gt;
&lt;th&gt;Savings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code CLI&lt;/td&gt;
&lt;td&gt;$150/month&lt;/td&gt;
&lt;td&gt;$45/month&lt;/td&gt;
&lt;td&gt;70%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Heavy Cursor usage&lt;/td&gt;
&lt;td&gt;$100/month&lt;/td&gt;
&lt;td&gt;$30/month&lt;/td&gt;
&lt;td&gt;70%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;With Ollama&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0/month&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The local Ollama option is genuinely free. If you have a decent GPU (RTX 3080+), models like &lt;code&gt;qwen2.5-coder&lt;/code&gt; run surprisingly well.&lt;/p&gt;

&lt;h2&gt;
  
  
  🔒 Enterprise Use Cases
&lt;/h2&gt;

&lt;p&gt;Lynkr shines in enterprise environments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Air-gapped networks&lt;/strong&gt;: Run entirely local with Ollama&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance&lt;/strong&gt;: Keep code on AWS/Azure infrastructure you control&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost control&lt;/strong&gt;: Set usage limits and track spending per team&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit trails&lt;/strong&gt;: Log all requests for compliance&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  ⚡ Advanced Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid Routing&lt;/strong&gt;: Use Ollama for simple requests, fallback to cloud for complex ones&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token Optimization&lt;/strong&gt;: 60-80% cost reduction through smart compression&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-Term Memory&lt;/strong&gt;: Titans-inspired memory system for context persistence&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Headroom Compression&lt;/strong&gt;: 47-92% token reduction via intelligent context compression&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hot Reload&lt;/strong&gt;: Config changes apply without restart&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smart Tool Selection&lt;/strong&gt;: Automatic tool filtering to reduce token usage&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🤝 Contributing
&lt;/h2&gt;

&lt;p&gt;Lynkr is open source (MIT license). Contributions welcome:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🐛 Bug reports and fixes&lt;/li&gt;
&lt;li&gt;🔌 New provider integrations&lt;/li&gt;
&lt;li&gt;📖 Documentation improvements&lt;/li&gt;
&lt;li&gt;⭐ Stars on GitHub!&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try It Today
&lt;/h2&gt;

&lt;p&gt;Stop overpaying for AI coding tools. With Lynkr, you can:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Save 60-80%&lt;/strong&gt; using AWS Bedrock or Azure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pay nothing&lt;/strong&gt; using local Ollama models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep code private&lt;/strong&gt; in enterprise environments&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;⭐ &lt;strong&gt;Star on GitHub&lt;/strong&gt;: &lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;github.com/Fast-Editor/Lynkr&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;📚 &lt;strong&gt;Full Documentation&lt;/strong&gt;: &lt;a href="https://deepwiki.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;deepwiki.com/Fast-Editor/Lynkr&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What AI coding tools do you use? Have you tried running them locally? Let me know in the comments!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>productivity</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>I Slashed My AI Coding Bills by 65% With This One Weird Trick.</title>
      <dc:creator>Vishal VeeraReddy</dc:creator>
      <pubDate>Wed, 31 Dec 2025 05:57:34 +0000</pubDate>
      <link>https://dev.to/lynkr/i-slashed-my-ai-coding-bills-by-65-with-this-one-weird-trick-3hn3</link>
      <guid>https://dev.to/lynkr/i-slashed-my-ai-coding-bills-by-65-with-this-one-weird-trick-3hn3</guid>
      <description>&lt;p&gt;The Problem Every Dev Using AI Assistants Faces.You know that moment when you're using Claude Code CLI, crushing it with AI-powered coding, and then you check your Anthropic bill at the end of the month?&lt;br&gt;
Yeah. $347 for me last month. 😱&lt;br&gt;
And here's the kicker: 65% of my requests were literally just "write a hello world function" or "explain this error message" - stuff that could easily run on my laptop.&lt;br&gt;
I was paying premium API rates for queries that a local 7B model could handle in 300ms.&lt;br&gt;
So I did what any reasonable developer would do: I spent a weekend building a solution that now saves me hundreds of dollars monthly.&lt;br&gt;
Meet Lynkr: The Claude Code "Jailbreak" Nobody Asked For&lt;br&gt;
Lynkr is a self-hosted proxy that sits between Claude Code CLI and... well, literally any LLM backend you want.&lt;br&gt;
Databricks? ✅&lt;br&gt;
Azure? ✅&lt;br&gt;
OpenRouter with 100+ models? ✅&lt;br&gt;
Local Ollama models that cost $0 per request? ✅✅✅&lt;br&gt;
llama.cpp with your own GGUF quantized models? ✅✅✅✅&lt;br&gt;
But here's where it gets interesting...&lt;br&gt;
The 3-Tier Routing System That Changed Everything&lt;br&gt;
Instead of sending every single request to expensive cloud APIs, Lynkr automatically routes based on complexity:&lt;/p&gt;

&lt;p&gt;🏎️ &lt;/p&gt;
&lt;h2&gt;
  
  
  Tier 1: Local/Free (0-2 tools needed)
&lt;/h2&gt;

&lt;p&gt;Ollama or llama.cpp running on your machine&lt;br&gt;
Response time: 100-500ms&lt;br&gt;
Cost: $0.00&lt;br&gt;
Handles: "explain this code", "write a function", "fix this bug"&lt;/p&gt;
&lt;h2&gt;
  
  
  💰 Tier 2: Mid-Tier Cloud (3-14 tools)
&lt;/h2&gt;

&lt;p&gt;OpenRouter with GPT-4o-mini ($0.15 per 1M tokens)&lt;br&gt;
Response time: 300-1500ms&lt;br&gt;
Cost: ~$0.0002 per request&lt;br&gt;
Handles: Multi-file refactoring, moderate complexity&lt;/p&gt;
&lt;h2&gt;
  
  
  🏢 Tier 3: Enterprise (15+ tools)
&lt;/h2&gt;

&lt;p&gt;Databricks or Azure Anthropic (Claude Opus/Sonnet)&lt;br&gt;
Response time: 500-2500ms&lt;br&gt;
Cost: Standard API rates&lt;br&gt;
Handles: Complex analysis, heavy workflows&lt;/p&gt;

&lt;p&gt;The proxy automatically decides which tier to use. No configuration. No manual routing. It just works.&lt;br&gt;
The Results Speak For Themselves&lt;/p&gt;
&lt;h3&gt;
  
  
  Here's what happened after I switched:
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before Lynkr&lt;/th&gt;
&lt;th&gt;After Lynkr&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Avg Response Time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1500-2500ms&lt;/td&gt;
&lt;td&gt;400-800ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;70% faster&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Monthly API Bill&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$347&lt;/td&gt;
&lt;td&gt;$122&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;65% cheaper&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Local Request %&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;68%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0 cost on 68% of requests&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Downtime Impact&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;100% blocked&lt;/td&gt;
&lt;td&gt;0% (fallback works)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;∞% more reliable&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That's not a typo. I'm getting 70% faster responses while spending 65% less money.&lt;br&gt;
Automatic Fallback = Zero Downtime&lt;/p&gt;

&lt;p&gt;The killer feature nobody talks about: if your local Ollama server crashes (mine does, frequently), Lynkr &lt;strong&gt;automatically falls back&lt;/strong&gt; to the next tier.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Request → Try Ollama → [Connection Refused]
       → Try OpenRouter → [Rate Limited]  
       → Try Databricks → ✅ Success
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  MCP Server Integration (Because Why Not)
&lt;/h3&gt;

&lt;p&gt;Want to integrate GitHub, Jira, Slack, or literally any other tool via Model Context Protocol?&lt;br&gt;
Just drop a manifest file in ~/.claude/mcp and Lynkr automatically:&lt;/p&gt;

&lt;p&gt;Discovers it&lt;br&gt;
Launches the MCP server&lt;br&gt;
Exposes the tools to your AI assistant&lt;br&gt;
Sandboxes it in Docker (optional but recommended)&lt;/p&gt;
&lt;h3&gt;
  
  
  Production-Ready From Day One
&lt;/h3&gt;

&lt;p&gt;I learned from my mistakes. This isn't a weekend hack held together with duct tape:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Circuit breakers (no cascading failures)&lt;/li&gt;
&lt;li&gt;✅ Load shedding (503s when overloaded, not crashes)&lt;/li&gt;
&lt;li&gt;✅ Prometheus metrics api(because you can't improve what you don't measure)&lt;/li&gt;
&lt;li&gt;✅ Kubernetes health checks (liveness + readiness probes)&lt;/li&gt;
&lt;li&gt;✅ Graceful shutdown (zero-downtime deployments)&lt;/li&gt;
&lt;li&gt;✅ Request ID correlation (debug production issues in seconds)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Quick Install (curl)
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -fsSL https://raw.githubusercontent.com/vishalveerareddy123/Lynkr/main/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;For .env&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Template 1: Databricks Only (Simple)
bash# .env
MODEL_PROVIDER=databricks
DATABRICKS_API_BASE=https://your-workspace.cloud.databricks.com
DATABRICKS_API_KEY=dapi1234567890abcdef
DATABRICKS_ENDPOINT_PATH=/serving-endpoints/databricks-claude-sonnet-4-5/invocations

PORT=8080
WORKSPACE_ROOT=/path/to/your/project
PROMPT_CACHE_ENABLED=true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Template 2: Ollama Only (100% Local)
bash# .env
MODEL_PROVIDER=ollama
OLLAMA_ENDPOINT=http://localhost:11434
OLLAMA_MODEL=qwen2.5-coder:latest
OLLAMA_TIMEOUT_MS=120000

PORT=8080
WORKSPACE_ROOT=/path/to/your/project
PROMPT_CACHE_ENABLED=true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Template 3: Hybrid Routing (Cost Optimized)
bash# .env
MODEL_PROVIDER=databricks
PREFER_OLLAMA=true
FALLBACK_ENABLED=true

# Ollama (Free Tier)
OLLAMA_ENDPOINT=http://localhost:11434
OLLAMA_MODEL=qwen2.5-coder:latest
OLLAMA_MAX_TOOLS_FOR_ROUTING=3

# OpenRouter (Mid Tier)
OPENROUTER_API_KEY=sk-or-v1-your-key-here
OPENROUTER_MODEL=openai/gpt-4o-mini
OPENROUTER_MAX_TOOLS_FOR_ROUTING=15

# Databricks (Heavy Tier)
DATABRICKS_API_BASE=https://your-workspace.cloud.databricks.com
DATABRICKS_API_KEY=dapi1234567890abcdef

PORT=8080
WORKSPACE_ROOT=/path/to/your/project
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. You're now running Claude Code CLI with:&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Use Cases (AKA "Will This Actually Help Me?")
&lt;/h2&gt;

&lt;h3&gt;
  
  
  For Indie Developers
&lt;/h3&gt;

&lt;p&gt;Use free Ollama models for 90% of your work. Only pay for complex tasks. Your $347/month bill becomes $35/month.&lt;/p&gt;

&lt;h3&gt;
  
  
  For Enterprise Teams
&lt;/h3&gt;

&lt;p&gt;Route simple queries to on-premise llama.cpp servers. Complex queries go to your Databricks workspace. &lt;strong&gt;Data never leaves your network&lt;/strong&gt; for simple requests.&lt;/p&gt;

&lt;h3&gt;
  
  
  For AI Researchers
&lt;/h3&gt;

&lt;p&gt;Test your own fine-tuned models with Claude Code CLI. Compare them side-by-side with GPT-4, Claude, Gemini via OpenRouter.&lt;/p&gt;

&lt;h3&gt;
  
  
  For Privacy-Conscious Devs
&lt;/h3&gt;

&lt;p&gt;Run Ollama or llama.cpp locally. Code never leaves your machine unless you explicitly need cloud capabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Part Where I Show You The Code
&lt;/h2&gt;

&lt;p&gt;Okay fine, here's how the hybrid routing actually works under the hood:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;javascript// Simplified version - actual code has more checks
async function routeRequest(request) {
  const toolCount = request.tools?.length || 0;

  // Tier 1: Local/Free (0-2 tools)
  if (toolCount &amp;lt;= 2 &amp;amp;&amp;amp; config.PREFER_OLLAMA) {
    try {
      return await ollamaClient.send(request);
    } catch (err) {
      logger.warn('Ollama failed, falling back to cloud');
      // Fallback to next tier...
    }
  }

  // Tier 2: Mid-Tier (3-14 tools)
  if (toolCount &amp;lt;= 14 &amp;amp;&amp;amp; config.OPENROUTER_API_KEY) {
    try {
      return await openRouterClient.send(request);
    } catch (err) {
      logger.warn('OpenRouter failed, falling back to Databricks');
      // Fallback to next tier...
    }
  }

  // Tier 3: Enterprise (15+ tools)
  return await databricksClient.send(request);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The circuit breaker wraps each client, so after 5 consecutive failures, requests fail fast (100ms instead of 30s timeout).&lt;/p&gt;

&lt;h3&gt;
  
  
  Models That Actually Work Well
&lt;/h3&gt;

&lt;p&gt;Through extensive testing, here's what actually performs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;For Ollama (Local):

qwen2.5-coder:7b - Best for code generation
llama3.1:8b - Best for general tasks
mistral:7b - Fastest responses

For OpenRouter (Mid-Tier):

openai/gpt-5.1 - Best value ($0.15/1M tokens)
meta-llama/llama-3.1-8b-instruct:free - Actually free (rate limited)

For llama.cpp (Maximum Control):

Any GGUF model works
I use Qwen2.5-Coder-7B-Instruct-Q5_K_M.gguf
Point to your llama.cpp server's OpenAI-compatible endpoint
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Catches (Because Nothing's Perfect)&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Ollama doesn't support all Claude features&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No extended thinking mode&lt;br&gt;
No prompt caching (Lynkr adds its own though)&lt;br&gt;
Tool calling works but varies by model&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You need to run local inference&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Ollama = ~8GB RAM for 7B models&lt;br&gt;
llama.cpp = ~6GB RAM with quantization&lt;br&gt;
Not great for 4GB laptops&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Initial setup requires some config&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Environment variables for API keys&lt;br&gt;
Workspace paths&lt;br&gt;
Model selection&lt;/p&gt;

&lt;p&gt;But the wizard handles 90% of this automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get Started Now
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;https://github.com/Fast-Editor/Lynkr&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Docs&lt;/strong&gt;: fast-editor.github.io/Lynkr/&lt;br&gt;
&lt;strong&gt;npm&lt;/strong&gt;: npm install -g lynkr&lt;br&gt;
Apache licensed. PRs welcome. Built with Node.js, SQLite, and determination.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Future Roadmap
&lt;/h2&gt;

&lt;p&gt;Things I'm working on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Response caching layer (Redis-backed)&lt;/li&gt;
&lt;li&gt;[ ] Per-file diff comments (like Claude's review UX)&lt;/li&gt;
&lt;li&gt;[ ] Better LSP integration for more languages&lt;/li&gt;
&lt;li&gt;[ ] Claude Skills compatibility layer&lt;/li&gt;
&lt;li&gt;[ ] Historical metrics dashboard&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Look, I'm not saying Anthropic's hosted service is bad. It's excellent. But for developers who want:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Control over their infrastructure&lt;/li&gt;
&lt;li&gt;Cost optimization&lt;/li&gt;
&lt;li&gt;Privacy for simple queries&lt;/li&gt;
&lt;li&gt;Custom model integration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Lynkr gives you all of that while keeping the Claude Code CLI experience you already love.&lt;/p&gt;

&lt;p&gt;Try it for a week. Track your costs. I bet you'll see similar savings.&lt;/p&gt;

&lt;p&gt;And if you don't? Well, it's open source. Make it better and send a PR. 😉&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Questions? Comments? Roasts?&lt;/strong&gt; Drop them below. I'll answer everything except "why did you waste a weekend on this" (because I saved $225 already).&lt;/p&gt;

&lt;p&gt;⭐ Star the repo if you found this useful: &lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;https://github.com/Fast-Editor/Lynkr&lt;/a&gt;&lt;/p&gt;




</description>
      <category>ai</category>
      <category>programming</category>
      <category>opensource</category>
      <category>openai</category>
    </item>
  </channel>
</rss>
