Cognivern - Spend OS For Agent Teams

#ai #githubchallenge #githubcopilot

This is a submission for the GitHub Finish-Up-A-Thon Challenge

What I Built

Cognivern is a control plane for agent operations — a SpendOS for agent teams.

As AI agents proliferate across development workflows, a quiet crisis is brewing: no one really controls what agents spend, on what, on behalf of whom, or why. Every agent gets what amounts to a blank check — against model APIs, against wallets, against third-party services. Cognivern exists to fix that.

The platform unifies governed wallet spend and AI spend governance across IDE, CLI, and agent workflows into a single auditable control layer. The core promise is simple: move fast without blank checks. Every spend decision can be policy-checked, privacy-preserving, efficiency-aware, and audit-ready — before it executes.

This matters especially in emerging markets and for teams building on-chain infrastructure, where cost overruns from runaway agents aren't just annoying; they're existential. A team on a thin budget can't afford to burn it chasing a misconfigured prompt loop.

At its core, Cognivern provides:

- Policy evaluation: enforce who/what/when rules before any spend executes
- Privacy-native operations: evaluate sensitive policy context via confidential paths using Fhenix FHE (Fully Homomorphic Encryption)
- AI spend governance: model/runtime usage visibility and optimization alongside financial controls
- Audit trails: persist decision evidence (decisionId, attestation, run context) for continuous accountability
- Multi-provider AI routing: ChainGPT as the primary Web3-native LLM, with Fireworks, OpenAI, Gemini, Anthropic, and others as fallbacks

The stack is TypeScript + Solidity, deployed across X Layer Testnet (execution and policy), Filecoin Calibration (audit storage), and Fhenix (confidential policy state). The frontend lives at cognivern.vercel.app and includes a PromptOS terminal for natural-language governance interaction.
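
To make the policy-evaluation and audit-trail ideas above concrete, here is a minimal sketch of a who/what/when check that returns a decision record. It is an illustration only, not Cognivern's actual code: the `SpendRequest`, `Policy`, and `Decision` types, the `evaluateSpend` function, and the counter-based `decisionId` format are all hypothetical.

```typescript
// Hypothetical types for illustration; none of these names come from the Cognivern codebase.
type SpendRequest = {
  agentId: string;
  vendor: string;
  amountUsd: number;
  hourUtc: number; // 0-23, the hour at which the spend would execute
};

type Policy = {
  id: string;
  version: number;
  allowedAgents: string[]; // "who"
  allowedVendors: string[]; // "what"
  allowedHoursUtc: [number, number]; // "when": inclusive start, exclusive end
  maxAmountUsd: number;
};

type Decision = {
  decisionId: string;
  policyId: string;
  policyVersion: number;
  approved: boolean;
  reasons: string[]; // every rule that failed; empty when approved
};

// Assumption: a simple in-memory counter stands in for a real ID generator.
let decisionCounter = 0;

// Checks every rule before any spend executes and records why a request failed.
function evaluateSpend(req: SpendRequest, policy: Policy): Decision {
  const reasons: string[] = [];
  if (!policy.allowedAgents.includes(req.agentId)) reasons.push("agent not allowed");
  if (!policy.allowedVendors.includes(req.vendor)) reasons.push("vendor not allowed");
  const [start, end] = policy.allowedHoursUtc;
  if (req.hourUtc < start || req.hourUtc >= end) reasons.push("outside allowed hours");
  if (req.amountUsd > policy.maxAmountUsd) reasons.push("amount exceeds limit");

  decisionCounter += 1;
  return {
    decisionId: `dec-${decisionCounter}`,
    policyId: policy.id,
    policyVersion: policy.version,
    approved: reasons.length === 0,
    reasons,
  };
}
```

Collecting every failed rule, rather than stopping at the first, gives the audit log a complete explanation for each denial.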

Demo

🔗 Live app: cognivern.vercel.app

🔗 API: cognivern.thisyearnofear.com

🔗 PromptOS Terminal: cognivern.vercel.app/os

🔗 Source: github.com/thisyearnofear/cognivern

Key flows you can explore:

- Submit a spend request through the dashboard and watch policy evaluation fire in real time
- Use the PromptOS terminal to interact with governance rules in natural language
- Inspect the audit log: every decision has a decisionId and attestation
- Try the encrypted spend path (Fhenix), where policy is evaluated over encrypted inputs, so the server never sees the raw values

The Comeback Story

Cognivern started as a hackathon project with a clear thesis but rough edges everywhere. The core governance loop worked, but it was held together with duct tape: no proper workspace isolation, no rate limiting, brittle contract interactions, and a frontend that was functional but not something you'd confidently hand to an operator.

Here's what changed during the finish-up:

Infrastructure hardening — Added per-workspace and per-API-key rate limiters with sliding windows, deep health checks, and circuit-breaker patterns. Moved to TypeScript strict mode throughout. Built out a unified CI pipeline.
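
For readers unfamiliar with sliding-window rate limiting, here is a minimal sketch of the technique. It is a generic sliding-window log limiter under stated assumptions, not the project's implementation: the `SlidingWindowLimiter` class and its `allow` method are hypothetical names, and state lives in memory rather than in a shared store.

```typescript
// Hypothetical sliding-window log limiter, keyed by workspace ID or API key.
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the request if `key` has made fewer than `limit`
  // requests in the window ending at `nowMs`; otherwise returns false.
  allow(key: string, nowMs: number = Date.now()): boolean {
    const cutoff = nowMs - this.windowMs;
    // Keep only timestamps that are still inside the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(nowMs);
    this.hits.set(key, recent);
    return allowed;
  }
}
```

One instance keyed by workspace ID and a second keyed by API key would give the two independent limits described above. Passing `nowMs` explicitly keeps the limiter deterministic in tests.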

172 tests, spanning unit, integration, and E2E via Playwright. The project went from "it works on my machine" to something with real regression coverage.

Multi-workspace and policy versioning — Each workspace now has independent API keys, rate limits, and a full policy version history. This was the feature that turned a demo into something a real team could adopt.
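
One way to picture per-workspace policy versioning is an append-only history per workspace. The sketch below is an assumption about how such a model could look, not the data model Cognivern uses: `PolicyRules`, `PolicyVersion`, `PolicyHistory`, and `historyFor` are all hypothetical names.

```typescript
// Hypothetical data model for illustration only.
type PolicyRules = { maxAmountUsd: number; allowedVendors: string[] };
type PolicyVersion = { version: number; rules: PolicyRules; createdAt: string };

class PolicyHistory {
  private readonly versions: PolicyVersion[] = [];

  // Appends a new version; earlier versions are never edited in place.
  publish(rules: PolicyRules, createdAt: string = new Date().toISOString()): PolicyVersion {
    const entry: PolicyVersion = {
      version: this.versions.length + 1,
      // Copy the rules so later changes to the caller's object cannot alter history.
      rules: { ...rules, allowedVendors: [...rules.allowedVendors] },
      createdAt,
    };
    this.versions.push(entry);
    return entry;
  }

  // Most recent version, or undefined if nothing has been published yet.
  latest(): PolicyVersion | undefined {
    return this.versions[this.versions.length - 1];
  }

  // Looks up a specific version number (1-based).
  at(version: number): PolicyVersion | undefined {
    return this.versions[version - 1];
  }
}

// Each workspace gets its own independent history.
const historiesByWorkspace = new Map<string, PolicyHistory>();

function historyFor(workspaceId: string): PolicyHistory {
  let history = historiesByWorkspace.get(workspaceId);
  if (!history) {
    history = new PolicyHistory();
    historiesByWorkspace.set(workspaceId, history);
  }
  return history;
}
```

Because versions are only ever appended, an audit record that stores a version number can always be traced back to the exact rules that were in force.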

Fhenix Wave 5–7 — The FHE integration went from a proof-of-concept to a full institutional demo: encrypted policies, MEV-protected execution, selective auditor disclosure, two-phase FHE resolution with resolveDecision, sealed-bid vendor selection, and a Privara confidential payroll flow. Also migrated from Helium testnet to Arbitrum Sepolia.

ChainGPT integration — Brought in ChainGPT as the primary AI provider for Web3-native governance queries, with the Smart Contract Auditor running as runtime pre-spend defense. This felt like the missing piece — governance AI that actually understands on-chain context.
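
The primary-plus-fallbacks routing can be sketched as an ordered list of providers tried in turn. This is a simplified illustration, not the project's router: the `Provider` shape and the `routeWithFallback` function are hypothetical, and any real SDK call would live inside each provider's `complete` function.

```typescript
// Hypothetical provider interface; no real SDK is called in this sketch.
type Provider = {
  name: string;
  complete: (prompt: string) => Promise<string>;
};

type RoutedResult = {
  provider: string; // which provider produced the answer
  text: string;
  failures: string[]; // errors from providers tried before the successful one
};

// Tries providers in order and returns the first successful completion.
// Throws only if every provider fails.
async function routeWithFallback(providers: Provider[], prompt: string): Promise<RoutedResult> {
  const failures: string[] = [];
  for (const p of providers) {
    try {
      const text = await p.complete(prompt);
      return { provider: p.name, text, failures };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      failures.push(`${p.name}: ${message}`);
    }
  }
  throw new Error(`all providers failed: ${failures.join("; ")}`);
}
```

Putting the primary provider first in the array and the fallbacks after it reproduces the ordering described above. Returning the `failures` list lets the caller log which providers were skipped and why.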

Operator UX — PromptOS terminal integrated into the sidebar, voice input via ElevenLabs STT, self-service onboarding flow, animated workspace mode toggles, full mobile responsiveness.

The project went from ~60% production-ready to ~93%. The remaining 7% is mostly production key management and a few contract audit items before mainnet.

My Experience with GitHub Copilot

Cognivern is a project with a lot of moving parts — Solidity contracts, TypeScript APIs, multi-chain deployment scripts, FHE integration, and a React frontend — often all in motion at the same time. Copilot was the connective tissue that kept things moving without constant context-switching tax.

A few specific ways it earned its keep:

Boilerplate elimination for the governance endpoints. The API has 12+ endpoints with consistent patterns — request validation, policy lookup, decision logging, response shaping. Writing the first one from scratch was fine; Copilot handled the rest, often getting the full shape right on the first suggestion.

Solidity contract work. The ConfidentialSpendPolicy contract for Fhenix was genuinely novel — FHE operations aren't something most developers have pattern-matched on. Copilot's suggestions weren't always right, but they were useful scaffolding that surfaced the right questions. The back-and-forth of accepting, rejecting, and editing suggestions was faster than writing from scratch.

Test generation. Getting to 172 tests would have taken much longer without Copilot helping generate test cases from the function signatures and existing test patterns. It's particularly good at the "write 10 edge case tests for this validator" kind of ask.

README and documentation. The architecture docs, developer guide, and deployment docs are detailed. Copilot helped maintain consistent voice and structure across them, and was surprisingly good at inferring the right level of technical detail for each audience.

The honest take: Copilot didn't make hard architectural decisions easier. The FHE integration design, the multi-chain deployment strategy, the policy versioning data model — those required real thinking. But it absorbed a huge amount of the mechanical work and kept me in flow during the push to get this finished.

Find me on Farcaster and Lens — always building at the intersection of AI, emerging markets, and on-chain infrastructure.