razashariff

Posted on May 26

Zero-cost, Zero Trust AI: secure agents on local Qwen with MCPS

#ai #llm #privacy #security

Run AI agents on free, local Qwen, keep every byte on your own hardware, and prove cryptographically what they did. Signer and verifier included. For AI builders and architects.

By the end of this you will have an AI agent that costs nothing per token, never sends a byte off your own hardware, and can prove -- cryptographically -- exactly what it did and that no one tampered with it. Signer and verifier, both included. About fifteen lines of code.

With 30 years in security leadership and breach prevention behind us, we recognised why adoption of locally hosted models has been slow: the security concerns holding it back are valid.

Our stack is built to change that. Meet MCPS and local LLMs on your own hosting.

That is the whole promise. Let me earn it.

The question your architects keep asking

Where does your prompt actually go?

With a hosted model API, the honest answer is: across your trust boundary, on every single call. Your prompts, your customers' data, your internal context -- all of it leaves the building and lands inside someone else's tenancy. For a regulated team, that one sentence is the difference between a green light and a twelve-week security review.

And the cost is no longer hypothetical. In May 2026 Microsoft began cancelling its internal Claude Code licenses, moving staff to Copilot CLI by June 30. The reported reason was not quality -- engineers liked the tool. It was that token-based billing burned through the annual AI budget in months; flat seat pricing had hidden the true per-token spend (Windows Central).

If Microsoft cannot predict its metered AI bill, neither can you.

There is another way to build, and it has gotten very good.

Free brain, signed hands

Here is the one idea this whole article turns on: free brain, signed hands.

The brain is the LLM: a free, open model -- Qwen -- running locally. It costs nothing per token and it runs on your own machine or server. The hands are the tool calls the agent makes, and every one of them is cryptographically signed, identity-bound, and replay-proof, with a verifier you run on your side of the wire.

The half that local models do not give you

Running a model locally solves privacy. It does not solve integrity or identity.

A local agent that calls tools is still, by default, anonymous and unsigned. Nothing proves which agent made a call. Nothing stops a captured request being replayed. Nothing detects a tampered argument before it hits your database. You have moved the brain in-house and left the hands bare.

This is the fence I want to build around the approach, because it is exactly where most "run it locally" guides stop. Local privacy without per-call integrity is half a security model.

So we built the other half. MCPS is the security layer we wrote for the Model Context Protocol -- think of it as the secure version of MCP. It signs every tool call with a P-256 key, binds it to a verifiable agent identity (AgentPass), and rejects anything unsigned, tampered, or replayed. The design is published as an IETF Internet-Draft, draft-sharif-mcps-secure-mcp.
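To make the three rejections concrete, here is a minimal sketch of a signed tool-call envelope. It is not the MCPS wire format: the field names, the `sign_call` and `verify_call` helpers, the freshness window, and the in-memory nonce set are all hypothetical, and HMAC-SHA256 from the standard library stands in for the P-256 ECDSA signature so the example runs without third-party packages.

```python
import hashlib
import hmac
import json
import secrets
import time

# Assumption: a shared demo key. MCPS uses P-256 key pairs, not a shared secret.
DEMO_KEY = b"demo-key-not-for-production"
MAX_AGE_SECONDS = 30  # hypothetical freshness window


def _canonical(body: dict) -> bytes:
    # Stable serialisation so signer and verifier hash identical bytes.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def sign_call(agent_id: str, tool: str, args: dict, key: bytes = DEMO_KEY) -> dict:
    """Build an envelope binding the agent, tool, arguments, nonce and time."""
    body = {
        "agent": agent_id,
        "tool": tool,
        "args": args,
        "nonce": secrets.token_hex(16),
        "ts": int(time.time()),
    }
    sig = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return {"body": body, "sig": sig}


def verify_call(envelope: dict, seen_nonces: set, key: bytes = DEMO_KEY) -> str:
    """Return "ok", or the reason the call is rejected."""
    body, sig = envelope.get("body"), envelope.get("sig")
    if not isinstance(body, dict) or not isinstance(sig, str):
        return "unsigned"
    expected = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode(), sig.encode()):
        return "tampered"
    if abs(time.time() - body["ts"]) > MAX_AGE_SECONDS:
        return "stale"
    if body["nonce"] in seen_nonces:
        return "replayed"
    seen_nonces.add(body["nonce"])
    return "ok"
```

Verifying the same envelope twice against one nonce set returns "ok" and then "replayed"; changing an argument after signing returns "tampered". A real deployment would swap the shared key for per-agent key pairs, so the verifier holds only public keys.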

MCPS is currently integrated in live production at US-based FinTech organisations.

The MCP ecosystem is enormous -- the official SDKs have been downloaded hundreds of millions of times -- and almost none of that traffic is signed. That is the gap.

We checked that the data really stays local

Claims about "your data never leaves" should be demonstrated, not asserted. So before writing a word of this, we watched what the model actually talks to.

While Qwen generated a few thousand characters of output, we sampled every network connection the Ollama process held:

# run this during a real inference
lsof -nP -iTCP -a -c ollama | grep ESTABLISHED

Every endpoint was 127.0.0.1 -- loopback. The client, and the model's own internal runner, talking to themselves. Ollama was bound to 127.0.0.1 only: not exposed to the LAN, let alone the internet. Zero external connections. The prompt never left the machine.
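If you want the same check as a pass/fail result rather than something you eyeball, the captured `lsof` text can be parsed. This is a rough sketch, assuming the usual `lsof -nP` layout where the NAME column reads `local->peer` followed by `(ESTABLISHED)`; the function names are ours, not part of any tool.

```python
import ipaddress


def endpoints(lsof_output: str) -> list[str]:
    """Collect both addresses of every ESTABLISHED TCP line in lsof -nP output."""
    hosts = []
    for line in lsof_output.splitlines():
        if "(ESTABLISHED)" not in line or "->" not in line:
            continue
        # NAME column, e.g. 127.0.0.1:11434->127.0.0.1:52344
        name = line.split()[-2]
        for side in name.split("->", 1):
            # Drop the port, then the brackets lsof puts around IPv6 addresses.
            hosts.append(side.rsplit(":", 1)[0].strip("[]"))
    return hosts


def non_loopback(lsof_output: str) -> list[str]:
    """Return every address that is not loopback; empty means nothing left the machine."""
    return [h for h in endpoints(lsof_output)
            if not ipaddress.ip_address(h).is_loopback]
```

Feed it the output of the command above. An empty list matches what we observed; anything else is an address worth investigating.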

You do not have to trust our screenshot. Here is the acid test, and it takes ten seconds:

Turn off Wi-Fi. Run the same prompt. It still answers.

If it works with no network, it provably needs none. That is a sentence you can put in front of an auditor.
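For a repeatable version of the Wi-Fi test, a short script can send the prompt straight to Ollama's local HTTP API. The sketch below assumes Ollama's default address (`127.0.0.1:11434`) and its `/api/generate` endpoint; `build_request` and `ask` are illustrative helper names, and the loopback guard is our own addition.

```python
import json
import urllib.request
from urllib.parse import urlparse

# Ollama's default local endpoint; adjust if you changed OLLAMA_HOST.
OLLAMA_URL = "http://127.0.0.1:11434/api/generate"
LOOPBACK_HOSTS = {"127.0.0.1", "localhost", "::1"}


def build_request(prompt: str, model: str = "qwen3:14b",
                  url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build the POST request, refusing any endpoint that is not loopback."""
    host = urlparse(url).hostname
    if host not in LOOPBACK_HOSTS:
        raise ValueError(f"refusing non-loopback endpoint: {host}")
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )


def ask(prompt: str) -> str:
    """Send the prompt to the local model and return its text response."""
    with urllib.request.urlopen(build_request(prompt), timeout=300) as resp:
        return json.loads(resp.read())["response"]
```

Run `ask("...")` once with the network up and once with it down. If both calls return text, the inference path does not depend on an external connection.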

It maps to the standards your reviewers cite

This is not a hobby setup. The architecture lines up with the guidance security teams are already being measured against:

Concern	Where it is covered
MCP tool-call integrity, identity, replay	OWASP MCP Security Cheat Sheet
AI agent verification controls (C10)	OWASP AISVS
MCP security design considerations	NSA MCP guidance, May 2026
Data residency / sovereignty	model + tools run on-premise or in your own cloud; no third-party processor

The NSA put MCP security design in writing in May 2026. Signing tool calls is no longer a nice-to-have you have to justify -- it is the direction the guidance is already pointing.

Build it in three steps

All free. All local. Signer and verifier both yours.

1. Run a free model locally. Qwen via Ollama, OpenAI-compatible, fully offline.

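# start the local server first (skip if the Ollama app or service is already running)
ollama serve &
# one-time download; this is the only step that needs a network connection
ollama pull qwen3:14b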

2. The agent signs. The SDK gives the agent an AgentPass identity and MCPS-signs every tool call. It runs on stock Qwen-Agent -- no fork, just a runtime hook.

from secure_qwen import SecureQwenAgent

agent = SecureQwenAgent(
    model="qwen3:14b",
    mcp_servers={"tools": {"command": "python", "args": ["server.py"]}},
)
for msg in agent.run("add 17 and 25 with secure_add"):
    print(msg)

3. The verifier enforces. One line wraps your MCP server. Unsigned, tampered, or replayed calls are rejected at the gate, before they reach your tools or data.

from mcp_secure import secure_mcp

secure_mcp(server)   # signature + identity + replay checked here. fail-closed.
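"Fail-closed" is the property worth understanding here: if verification cannot positively succeed, the tool does not run. The sketch below shows that pattern in isolation. It is not how `secure_mcp` is implemented; `fail_closed`, `CallRejected`, the envelope shape, and the placeholder verifier are all hypothetical.

```python
from functools import wraps


class CallRejected(Exception):
    """Raised when a tool call does not pass verification."""


def fail_closed(verify):
    """Wrap a tool handler so it runs only when verify(envelope) returns True."""
    def decorator(handler):
        @wraps(handler)
        def gated(envelope):
            try:
                allowed = verify(envelope) is True
            except Exception as exc:
                # An error inside the verifier counts as a rejection, not a pass.
                raise CallRejected("verifier error") from exc
            if not allowed:
                raise CallRejected("verification failed")
            return handler(**envelope["args"])
        return gated
    return decorator


# Placeholder check for the demo only; a real verifier validates a signature.
def demo_verify(envelope):
    return envelope.get("sig") == "valid-demo-signature"


@fail_closed(demo_verify)
def secure_add(a, b):
    return a + b
```

The design choice is that the handler is reachable through exactly one path, and that path requires an explicit `True`. A missing signature, a falsy result, or a crashing verifier all end in `CallRejected`.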

Want DeepSeek instead of Qwen? Same code, swap one line: model="deepseek-r1:14b". The security layer is model-agnostic on purpose -- it does not care which free brain you bolt the signed hands onto.

Verify what you downloaded

Supply-chain integrity cuts both ways: a security tool you cannot verify is just another dependency to worry about. Every release ships a signed hash manifest, so you can check it before you run a line of it:

# integrity: do the files match the manifest?
shasum -a 256 -c SHA256SUMS

# authenticity: was the manifest signed by our release key?
openssl dgst -sha256 -verify release-pubkey.pem -signature SHA256SUMS.sig SHA256SUMS

P-256 ECDSA, the same primitive MCPS uses on the wire. If either check fails, do not run it.
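If you would rather run the integrity half inside a Python install script, the standard library is enough. This sketch assumes a `shasum`-style manifest (hex digest, whitespace, file name per line) and covers only the hash comparison; the signature on the manifest still needs the `openssl` step above. The helper names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in chunks so large artifacts do not load into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_manifest(manifest: Path) -> dict[str, bool]:
    """Map each file listed in the manifest to True if its hash matches."""
    results = {}
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # shasum marks binary mode with a leading '*'
        target = manifest.parent / name
        # A missing file is reported as a failure rather than skipped.
        results[name] = target.is_file() and sha256_file(target) == expected.lower()
    return results
```

Treat any `False` in the result the same way as a failed `shasum -c`: stop and do not run the release.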

What it actually costs

Nothing, per token. You pay for the hardware once and for the electricity to run it; after that, no one bills you per call. There is no meter, no surprise invoice at the end of the quarter, and no budget that quietly evaporates because a few agents got chatty. That is the lesson buried in the Microsoft story: the problem was never the model, it was the metering.

Local inference turns a variable, unpredictable operating cost into a fixed, owned capability.
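The fixed-versus-metered argument reduces to a break-even calculation you can run with your own numbers. The function below is a back-of-the-envelope sketch; the parameter names are ours, and any figures you plug in are your assumptions, not measurements from this article.

```python
def break_even_tokens(hardware_cost: float,
                      metered_price_per_million: float,
                      local_power_cost_per_million: float = 0.0) -> float:
    """Token volume at which owned hardware becomes cheaper than metered billing."""
    saving_per_million = metered_price_per_million - local_power_cost_per_million
    if saving_per_million <= 0:
        raise ValueError("local running cost must be below the metered price")
    return hardware_cost / saving_per_million * 1_000_000
```

With placeholder inputs of 2000 for hardware and 10 per million metered tokens, `break_even_tokens(2000, 10)` gives 200 million tokens. Past that volume, every additional call widens the gap in favour of the local setup.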

Build on Qwen. Build secure. Build to comply.

That is the contribution I want to leave you with. A free model gives you economics and privacy. MCPS and AgentPass give you the integrity and identity that local models leave bare. Together they are a stack you can run on your own hardware, prove to an auditor, and never hand to a third party.

Signer and verifier, both yours. Free brain, signed hands.

Read the architecture and the standards mapping: agentpass.co.uk/qwen-builders
The protocol: MCPS Internet-Draft
The identity layer: AgentPass

Want to build now? The SDK, the verifier, and the signed manifest are ready. Contact us at contact@agentsign.dev and we will get you running on secure local Qwen today.

The SDK is licensed BUSL-1.1: free to run, self-host, and modify; not for resale. It converts to Apache 2.0 in 2030.

DEV Community