DEV Community

greymoth

Why Kairon runs a separate gRPC authorization service

When you're building a multi-tenant platform where users run autonomous trading agents, "just check a middleware flag" isn't a safety model. It's a hope.

This is how we ended up with Guardian: a standalone Node.js gRPC server, listening on port 50052, that every agent execution must pass through before a single order can fire.

The problem with inline auth checks

Our initial instinct was the usual: tRPC middleware, a capability check on the procedure, done. It works fine for UI-driven actions where a bad outcome is a 403 and a sad user. It does not work when the "action" is an autonomous agent executing a trading strategy with real capital.

The failure modes are different. A misconfigured middleware might pass a stale session. A quota check might race against a concurrent execution. An unhandled exception might default-allow instead of default-deny. In a UI context those are bugs. In an agent runtime they're incidents.
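The last failure mode is the easiest to miss in review, because both versions look reasonable. Here is a minimal TypeScript sketch of the same capability check written fail-open and fail-closed. `lookupCapability` is a hypothetical stand-in for anything that can throw (a stale session store, a bad config, a network error), not Kairon code.

```typescript
// Hypothetical sketch: one capability check, written two ways.
// lookupCapability stands in for any dependency that might throw.

type Decision = { allowed: boolean; reason: string };

// Fail-open: an error is swallowed and the caller proceeds anyway.
function checkFailOpen(lookupCapability: () => boolean): Decision {
  try {
    return lookupCapability()
      ? { allowed: true, reason: "ok" }
      : { allowed: false, reason: "capability_denied" };
  } catch {
    return { allowed: true, reason: "check_errored" }; // default-allow: the incident
  }
}

// Fail-closed: anything other than an explicit "yes" is a "no".
function checkFailClosed(lookupCapability: () => boolean): Decision {
  try {
    return lookupCapability()
      ? { allowed: true, reason: "ok" }
      : { allowed: false, reason: "capability_denied" };
  } catch {
    return { allowed: false, reason: "check_errored" }; // default-deny
  }
}

const brokenLookup = (): boolean => {
  throw new Error("session store timed out");
};

console.log(checkFailOpen(brokenLookup).allowed); // true
console.log(checkFailClosed(brokenLookup).allowed); // false
```

The only difference is what the `catch` branch returns, which is exactly why a default-allow can slip through unnoticed.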

We needed authorization to be:

  • Explicit -- every execution path calls it, no exceptions
  • Fail-closed -- if the auth service is unreachable, the run is rejected
  • Auditable -- every decision is a record, not a log line
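One way to get the first property from the compiler rather than from code review is to make the executor accept only a token that the gate alone can mint. This is a sketch of that idea using a TypeScript branded type; `AuthorizedRun`, `authorize`, and `executeAgent` are hypothetical names, and the post does not say Kairon enforces "explicit" this way.

```typescript
// Hypothetical sketch: the executor only accepts a token minted by the gate,
// so a code path that skips authorization does not type-check.

declare const authorizedBrand: unique symbol;

type AuthorizedRun = {
  readonly agentId: string;
  readonly decisionId: string;
  readonly [authorizedBrand]: true; // type-level brand only; no runtime field
};

type GateVerdict =
  | { allowed: true; decisionId: string }
  | { allowed: false; reason: string };

// The single place an AuthorizedRun is created.
function authorize(agentId: string, verdict: GateVerdict): AuthorizedRun | null {
  if (!verdict.allowed) return null; // denied: nothing to execute with
  return { agentId, decisionId: verdict.decisionId } as AuthorizedRun;
}

function executeAgent(run: AuthorizedRun): string {
  return `executing ${run.agentId} under decision ${run.decisionId}`;
}

const run = authorize("agent-7", { allowed: true, decisionId: "d-001" });
if (run !== null) console.log(executeAgent(run)); // executing agent-7 under decision d-001

// executeAgent({ agentId: "agent-7", decisionId: "forged" }); // compile error: brand missing
```

The brand exists only at the type level, so it costs nothing at runtime. A deliberate `as` cast can still forge a token, so this complements the runtime gate rather than replacing it.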

What Guardian does

Guardian exposes a proto3 service with three RPCs:

```proto
service GuardianService {
  rpc CheckCapability(CapabilityRequest) returns (CapabilityResponse);
  rpc CheckQuota(QuotaRequest) returns (QuotaResponse);
  rpc AuthorizeAgentRun(AgentRunRequest) returns (AgentRunResponse);
}
```

AuthorizeAgentRun is the gate. It calls CheckCapability, then CheckQuota, then writes an execution record. If any step fails or Guardian is unreachable, the run is rejected with reason guardian_unavailable. No silent pass-through.
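The sequence above can be sketched in TypeScript as a single function with its dependencies injected. This is a hedged illustration, not Guardian's source: the request and response fields, the `GuardianDeps` interface, and every reason string other than `guardian_unavailable` are assumptions.

```typescript
// Hypothetical sketch of AuthorizeAgentRun: capability, then quota, then an
// execution record. Field names and most reason strings are illustrative.

interface AgentRunRequest {
  orgId: string;
  agentId: string;
  capability: string;
}

interface AgentRunResponse {
  allowed: boolean;
  reason: string;
  executionId?: string;
}

interface GuardianDeps {
  checkCapability(orgId: string, capability: string): boolean;
  checkQuota(orgId: string): boolean;
  // Persists the decision and returns the new record's id; may throw.
  writeExecutionRecord(req: AgentRunRequest, allowed: boolean, reason: string): string;
}

function authorizeAgentRun(deps: GuardianDeps, req: AgentRunRequest): AgentRunResponse {
  try {
    let reason = "ok";
    if (!deps.checkCapability(req.orgId, req.capability)) {
      reason = "capability_denied";
    } else if (!deps.checkQuota(req.orgId)) {
      reason = "quota_exceeded";
    }
    const allowed = reason === "ok";
    // Allow or deny, the decision is recorded before the caller hears about it.
    const executionId = deps.writeExecutionRecord(req, allowed, reason);
    return { allowed, reason, executionId };
  } catch {
    // Any step throwing, including the record write, rejects the run.
    return { allowed: false, reason: "guardian_unavailable" };
  }
}

const inMemoryDeps: GuardianDeps = {
  checkCapability: (_orgId, capability) => capability === "trade",
  checkQuota: () => true,
  writeExecutionRecord: () => "exec-1",
};
console.log(authorizeAgentRun(inMemoryDeps, { orgId: "org-1", agentId: "agent-7", capability: "trade" }));
```

Recording denials as well as approvals follows from the "every decision is a record" requirement, and treating a failed record write as a rejection keeps the gate fail-closed even when the audit store is what broke.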

Why a separate process

Two reasons: practical and principled.

Practical: because Guardian runs in its own process, its hard rate limits are enforced at the infrastructure level and stay isolated from memory pressure on the API server.
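The post doesn't say which algorithm Guardian uses for its hard limits. A token bucket is one common choice, so here is a minimal per-org sketch under that assumption. The capacity of 2 and refill of 1 token per second are made-up numbers, and the clock is passed in so the logic is deterministic.

```typescript
// Hypothetical sketch: a per-org token bucket as one way to enforce a hard rate limit.
// Capacity and refill rate are illustrative, not Kairon's configuration.

class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSec: number;
  private tokens: number;
  private lastMs: number;

  constructor(capacity: number, refillPerSec: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // start full
    this.lastMs = nowMs;
  }

  // Refill for the elapsed time, then take one token if one is available.
  tryTake(nowMs: number): boolean {
    const elapsedSec = Math.max(0, nowMs - this.lastMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastMs = Math.max(this.lastMs, nowMs);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false; // hard limit: reject instead of queueing
  }
}

const buckets = new Map<string, TokenBucket>();

function allowRun(orgId: string, nowMs: number): boolean {
  let bucket = buckets.get(orgId);
  if (bucket === undefined) {
    bucket = new TokenBucket(2, 1, nowMs);
    buckets.set(orgId, bucket);
  }
  return bucket.tryTake(nowMs);
}

console.log(allowRun("org-1", 0), allowRun("org-1", 0), allowRun("org-1", 0)); // true true false
```

Keeping the buckets in Guardian's memory is what isolates them from the API server. An in-process limiter like this only holds for a single Guardian instance, so running several replicas would need shared state.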

Principled: a separate service audits independently. Our kairon_org_audit_log table has exactly one writer with one responsibility.
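A minimal sketch of what "exactly one writer" can look like in code, assuming an in-memory array in place of the kairon_org_audit_log table and illustrative record fields: only the log object can append, and every other component receives a read-only view.

```typescript
// Hypothetical sketch: an append-only decision log with exactly one writer.
// An in-memory array stands in for kairon_org_audit_log; fields are illustrative.

interface AuditRecord {
  readonly seq: number;
  readonly orgId: string;
  readonly agentId: string;
  readonly allowed: boolean;
  readonly reason: string;
  readonly atMs: number;
}

// What every component other than Guardian gets: read access only.
interface AuditReader {
  all(): readonly AuditRecord[];
}

class AuditLog implements AuditReader {
  private readonly records: AuditRecord[] = [];

  // The only mutation: append a frozen record with the next sequence number.
  append(entry: Omit<AuditRecord, "seq">): AuditRecord {
    const record: AuditRecord = Object.freeze({ seq: this.records.length + 1, ...entry });
    this.records.push(record);
    return record;
  }

  // Return a copy so callers cannot reorder or drop entries.
  all(): readonly AuditRecord[] {
    return [...this.records];
  }
}

const auditLog = new AuditLog(); // owned by Guardian, the single writer
const apiServerView: AuditReader = auditLog; // no append() at the type level
auditLog.append({ orgId: "org-1", agentId: "agent-7", allowed: true, reason: "ok", atMs: 0 });
console.log(apiServerView.all().length); // 1
```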

The tradeoff

Every agent execution pays for a gRPC round-trip, and that latency is deliberate. Trading agent authorization isn't latency-sensitive: if your strategy breaks because auth took 2ms, the strategy has bigger problems.
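Deliberate latency still needs a bound, or a hung Guardian would stall every run. Here is a hedged sketch of a client-side deadline that fails closed; `callGuardian` is a hypothetical stand-in for the real gRPC call, and the deadline value is up to the caller. gRPC clients also support per-call deadlines natively; `Promise.race` is used here only so the example runs without dependencies.

```typescript
// Hypothetical sketch: bound the Guardian round-trip and fail closed on timeout or error.
// callGuardian stands in for the real gRPC client call.

type RunDecision = { allowed: boolean; reason: string };

async function authorizeWithDeadline(
  callGuardian: () => Promise<RunDecision>,
  deadlineMs: number,
): Promise<RunDecision> {
  let timer: ReturnType<typeof setTimeout> | undefined = undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("deadline exceeded")), deadlineMs);
  });
  try {
    // Whichever settles first wins: Guardian's answer or the deadline.
    return await Promise.race([callGuardian(), deadline]);
  } catch {
    // Timeout, connection refused, or any other client-side error: reject the run.
    return { allowed: false, reason: "guardian_unavailable" };
  } finally {
    clearTimeout(timer); // don't leave a stray timer running after the race settles
  }
}

const slowGuardian = () =>
  new Promise<RunDecision>((resolve) => setTimeout(() => resolve({ allowed: true, reason: "ok" }), 200));

authorizeWithDeadline(slowGuardian, 50).then((d) => console.log(d.reason)); // guardian_unavailable
```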

What we gained is a single place where "should this agent run?" is answered and recorded, with an immutable sequence of authorization decisions to replay when something goes wrong.
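One way to use that sequence when something goes wrong is to re-evaluate each recorded decision against the policy and flag disagreements. This sketch assumes each record stores the inputs it was decided on; the `DecisionRecord` fields and the quota policy are hypothetical.

```typescript
// Hypothetical sketch: replay recorded decisions against a policy and report
// the sequence numbers where the recorded outcome disagrees with the policy.

interface DecisionRecord {
  readonly seq: number;
  readonly inputs: { hasCapability: boolean; runsToday: number; dailyQuota: number };
  readonly allowed: boolean;
}

type Policy = (inputs: DecisionRecord["inputs"]) => boolean;

// Walk the records in order; filter preserves the original sequence.
function replay(records: readonly DecisionRecord[], policy: Policy): number[] {
  return records.filter((r) => policy(r.inputs) !== r.allowed).map((r) => r.seq);
}

const currentPolicy: Policy = (i) => i.hasCapability && i.runsToday < i.dailyQuota;

const recorded: DecisionRecord[] = [
  { seq: 1, inputs: { hasCapability: true, runsToday: 3, dailyQuota: 10 }, allowed: true },
  { seq: 2, inputs: { hasCapability: false, runsToday: 0, dailyQuota: 10 }, allowed: false },
  { seq: 3, inputs: { hasCapability: true, runsToday: 10, dailyQuota: 10 }, allowed: true }, // over quota, yet allowed
];

const diverged = replay(recorded, currentPolicy); // returns [3]
console.log(diverged);
```

The same replay run against a proposed policy change shows which past decisions it would have flipped.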

Building this at kairon.trade. Source: github.com/greymoth-jp.
