<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Prateek Chaudhary</title>
    <description>The latest articles on DEV Community by Prateek Chaudhary (@prateekdalal).</description>
    <link>https://dev.to/prateekdalal</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3758371%2F4138c958-6921-4ed0-ba53-bb39b6748925.jpeg</url>
      <title>DEV Community: Prateek Chaudhary</title>
      <link>https://dev.to/prateekdalal</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/prateekdalal"/>
    <language>en</language>
    <item>
      <title>Kakveda v1.0.3 – One-Line AI Agent Governance Integration</title>
      <dc:creator>Prateek Chaudhary</dc:creator>
      <pubDate>Tue, 17 Feb 2026 14:01:17 +0000</pubDate>
      <link>https://dev.to/prateekdalal/kakveda-v103-one-line-ai-agent-governance-integration-2al5</link>
      <guid>https://dev.to/prateekdalal/kakveda-v103-one-line-ai-agent-governance-integration-2al5</guid>
      <description>&lt;p&gt;Just released Kakveda v1.0.3.&lt;/p&gt;

&lt;p&gt;Big change: SDK-first integration.&lt;/p&gt;

&lt;p&gt;Before:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Manual warn calls&lt;/li&gt;
&lt;li&gt;Manual event publishing&lt;/li&gt;
&lt;li&gt;Separate registration scripts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from kakveda_sdk import KakvedaAgent

agent = KakvedaAgent()

agent.execute(
    prompt="export data",
    tool_name="data_exporter",
    execute_fn=my_func
)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That’s it.&lt;/p&gt;

&lt;p&gt;SDK handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Preflight policy checks&lt;/li&gt;
&lt;li&gt;Trace ingestion&lt;/li&gt;
&lt;li&gt;Dashboard registration&lt;/li&gt;
&lt;li&gt;Heartbeat&lt;/li&gt;
&lt;li&gt;Retry &amp;amp; circuit breaker&lt;/li&gt;
&lt;li&gt;Strict mode enforcement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Removed legacy helpers to simplify the developer experience.&lt;/p&gt;
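&lt;p&gt;The shape of this pattern is easy to sketch. Everything below is a toy illustration of a governed execute call, not the actual kakveda_sdk internals: the blocked-tool list, the strict-mode rule, and the retry policy are all invented for the example.&lt;/p&gt;

```python
# Toy sketch of the governed-execute pattern: preflight policy check,
# then controlled execution with a simple retry loop.
# The policy and names here are illustrative, not real kakveda_sdk code.

BLOCKED_TOOLS = {"raw_shell"}  # stand-in for a real policy engine

def preflight(tool_name: str, strict: bool) -> bool:
    """Return True if execution may proceed."""
    if tool_name in BLOCKED_TOOLS:
        if strict:
            raise PermissionError(f"policy blocked tool: {tool_name}")
        return False
    return True

def governed_execute(prompt, tool_name, execute_fn, strict=False, retries=2):
    """Run execute_fn only if preflight passes; retry transient failures."""
    if not preflight(tool_name, strict):
        return None
    last_err = None
    for _ in range(retries + 1):
        try:
            return execute_fn(prompt)
        except RuntimeError as err:  # retry only transient failures
            last_err = err
    raise last_err

print(governed_execute("export data", "data_exporter", lambda p: f"ran: {p}"))
# ran: export data
```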

&lt;p&gt;If you're building AI agents in production, especially multi-agent systems, feedback is welcome.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>python</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Building Failure Intelligence for AI Agents</title>
      <dc:creator>Prateek Chaudhary</dc:creator>
      <pubDate>Mon, 16 Feb 2026 16:04:11 +0000</pubDate>
      <link>https://dev.to/prateekdalal/building-failure-intelligence-for-ai-agents-170d</link>
      <guid>https://dev.to/prateekdalal/building-failure-intelligence-for-ai-agents-170d</guid>
      <description>&lt;p&gt;When you run AI agents in production, you quickly realize:&lt;/p&gt;

&lt;p&gt;The dangerous failures aren’t random.&lt;br&gt;
They’re recurring patterns.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Similar hallucination structures&lt;/li&gt;
&lt;li&gt;Repeated tool-call mistakes&lt;/li&gt;
&lt;li&gt;Prompt injection variants&lt;/li&gt;
&lt;li&gt;Context leakage patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most tools give you logs.&lt;br&gt;
Some give you tracing.&lt;br&gt;
Few give you structured failure memory.&lt;/p&gt;

&lt;p&gt;I’ve been exploring a model where:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Every failure becomes a canonical entity&lt;/li&gt;
&lt;li&gt;A deterministic fingerprint is generated for executions&lt;/li&gt;
&lt;li&gt;New executions are matched against historical failures&lt;/li&gt;
&lt;li&gt;A policy engine maps confidence → allow / warn / block&lt;/li&gt;
&lt;/ol&gt;
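&lt;p&gt;The loop above can be sketched in a few lines of Python. To be clear, the fingerprint recipe, the confidence thresholds, and the decision names below are my own illustration of the idea, not Kakveda's actual implementation:&lt;/p&gt;

```python
import hashlib
import json

# Recurrence counts per failure fingerprint (illustrative in-memory store).
FAILURE_HISTORY = {}

def fingerprint(prompt: str, tool_name: str) -> str:
    """Deterministic fingerprint: the same inputs always map to the same entity."""
    canonical = json.dumps({"prompt": prompt.strip().lower(), "tool": tool_name},
                           sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

def record_failure(prompt: str, tool_name: str) -> None:
    """Every failure becomes a canonical entity keyed by its fingerprint."""
    fp = fingerprint(prompt, tool_name)
    FAILURE_HISTORY[fp] = FAILURE_HISTORY.get(fp, 0) + 1

def decide(prompt: str, tool_name: str) -> str:
    """Map match confidence to allow / warn / block (thresholds are made up)."""
    hits = FAILURE_HISTORY.get(fingerprint(prompt, tool_name), 0)
    if hits >= 3:
        return "block"
    if hits >= 1:
        return "warn"
    return "allow"

record_failure("export data", "data_exporter")
print(decide("export data", "data_exporter"))  # warn
print(decide("list users", "db_reader"))       # allow
```

&lt;p&gt;Because the fingerprint is deterministic, the gate is reproducible: the same execution matched against the same history always yields the same decision.&lt;/p&gt;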

&lt;p&gt;The key idea:&lt;/p&gt;

&lt;p&gt;Don’t modify the LLM.&lt;br&gt;
Don’t rely only on prompts.&lt;br&gt;
Insert a deterministic governance layer before execution.&lt;/p&gt;

&lt;p&gt;This turns failure history into enforcement intelligence.&lt;/p&gt;

&lt;p&gt;Still early, but curious to hear thoughts. Repo: &lt;a href="https://github.com/prateekdevisingh/kakveda" rel="noopener noreferrer"&gt;https://github.com/prateekdevisingh/kakveda&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;How are others handling repeat failure patterns in agent-based systems?&lt;/p&gt;

&lt;p&gt;#opensource #llm #agents #devops #aigovernance&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>llm</category>
    </item>
    <item>
      <title>Operating AI in Production Is an Ops Problem</title>
      <dc:creator>Prateek Chaudhary</dc:creator>
      <pubDate>Sat, 07 Feb 2026 11:32:34 +0000</pubDate>
      <link>https://dev.to/prateekdalal/operating-ai-in-production-is-an-ops-problem-2io6</link>
      <guid>https://dev.to/prateekdalal/operating-ai-in-production-is-an-ops-problem-2io6</guid>
      <description>&lt;p&gt;Over the last year, I’ve been working with LLMs and AI systems that actually run in production — not demos, not notebooks, not proof-of-concepts.&lt;/p&gt;

&lt;p&gt;What surprised me most wasn’t model behavior.&lt;br&gt;
It was how quickly operational assumptions broke.&lt;/p&gt;

&lt;p&gt;From an ops and platform perspective, AI systems don’t fail like models.&lt;br&gt;
They fail like systems.&lt;/p&gt;

&lt;h2&gt;What breaks first in real environments&lt;/h2&gt;

&lt;p&gt;When AI systems move into production, the early issues are rarely about accuracy.&lt;/p&gt;

&lt;p&gt;Instead, teams struggle with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Unclear decision boundaries&lt;/li&gt;
&lt;li&gt;Non-reproducible behavior&lt;/li&gt;
&lt;li&gt;Missing audit trails&lt;/li&gt;
&lt;li&gt;No safe rollback paths&lt;/li&gt;
&lt;li&gt;Uncomfortable “why did this happen?” questions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most existing tooling focuses on observing outputs.&lt;br&gt;
Very little focuses on governing behavior.&lt;/p&gt;

&lt;h2&gt;Observability helps, but it’s reactive&lt;/h2&gt;

&lt;p&gt;We already know how to observe software:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Logs&lt;/li&gt;
&lt;li&gt;Metrics&lt;/li&gt;
&lt;li&gt;Traces&lt;/li&gt;
&lt;li&gt;Alerts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI observability tools extend this to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Drift&lt;/li&gt;
&lt;li&gt;Cost&lt;/li&gt;
&lt;li&gt;Latency&lt;/li&gt;
&lt;li&gt;Token usage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All useful — but mostly after the fact.&lt;/p&gt;

&lt;p&gt;In production systems, knowing what happened is not enough.&lt;br&gt;
You also need to know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Whether it should have happened&lt;/li&gt;
&lt;li&gt;Whether it should happen again&lt;/li&gt;
&lt;li&gt;Whether it should be allowed at all&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;The core mismatch&lt;/h2&gt;

&lt;p&gt;LLMs reason probabilistically.&lt;br&gt;
Production systems expect determinism.&lt;/p&gt;

&lt;p&gt;Trying to force AI to behave like traditional software doesn’t work.&lt;br&gt;
But letting AI directly execute decisions inside deterministic systems also doesn’t work.&lt;/p&gt;

&lt;p&gt;So we started experimenting with a different boundary:&lt;/p&gt;

&lt;p&gt;AI can reason.&lt;br&gt;
Deterministic systems decide.&lt;br&gt;
Execution must remain controlled.&lt;/p&gt;

&lt;h2&gt;Separating reasoning from execution&lt;/h2&gt;

&lt;p&gt;Once you separate these concerns, a lot of things become clearer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI suggestions can be evaluated before execution&lt;/li&gt;
&lt;li&gt;Policies can block or correct unsafe actions&lt;/li&gt;
&lt;li&gt;Failures become structured signals, not surprises&lt;/li&gt;
&lt;li&gt;Accountability boundaries become explicit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a familiar pattern in ops — just applied to intelligence.&lt;/p&gt;
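&lt;p&gt;The boundary is small enough to show in code. This is a minimal sketch of the idea only; the action schema, the allow-list, and the stand-in "AI" below are all invented for illustration:&lt;/p&gt;

```python
# Minimal sketch of the boundary: the "AI" only suggests an action,
# a deterministic policy decides, and execution stays under control.
# The action names and rules are illustrative, not from any real system.

ALLOWED_ACTIONS = {"read_report", "summarize"}

def ai_suggest(task: str) -> dict:
    """Stand-in for an LLM: proposes an action but never executes it."""
    action = "delete_database" if "cleanup" in task else "read_report"
    return {"action": action, "task": task}

def policy_decide(suggestion: dict) -> bool:
    """Deterministic gate: the same suggestion always gets the same decision."""
    return suggestion["action"] in ALLOWED_ACTIONS

def run(task: str) -> str:
    suggestion = ai_suggest(task)        # probabilistic reasoning
    if not policy_decide(suggestion):    # deterministic decision
        return f"blocked: {suggestion['action']}"
    return f"executed: {suggestion['action']}"  # controlled execution

print(run("quarterly numbers"))  # executed: read_report
print(run("cleanup old data"))   # blocked: delete_database
```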

&lt;h2&gt;Why I started working on Kakveda&lt;/h2&gt;

&lt;p&gt;This line of thinking led me to start working on Kakveda, an open-source project focused on intelligence monitoring, observability, and deterministic control for AI systems.&lt;/p&gt;

&lt;p&gt;The goal isn’t to replace models or agents.&lt;br&gt;
It’s to supervise them.&lt;/p&gt;

&lt;p&gt;Kakveda sits around AI systems and focuses on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Observing how AI behaves over time&lt;/li&gt;
&lt;li&gt;Enforcing rules before actions execute&lt;/li&gt;
&lt;li&gt;Capturing failures as first-class events&lt;/li&gt;
&lt;li&gt;Keeping execution predictable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In short: making AI systems operable.&lt;/p&gt;

&lt;h2&gt;What Kakveda is not&lt;/h2&gt;

&lt;p&gt;To be clear, Kakveda is not:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A prompt framework&lt;/li&gt;
&lt;li&gt;An agent toolkit&lt;/li&gt;
&lt;li&gt;An LLM wrapper&lt;/li&gt;
&lt;li&gt;A chatbot platform&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It doesn’t try to make AI smarter.&lt;br&gt;
It tries to make AI safer to run.&lt;/p&gt;

&lt;h2&gt;Why open source&lt;/h2&gt;

&lt;p&gt;Governance and control layers should not be opaque.&lt;/p&gt;

&lt;p&gt;If AI already introduces uncertainty, the systems supervising it should be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Inspectable&lt;/li&gt;
&lt;li&gt;Auditable&lt;/li&gt;
&lt;li&gt;Adaptable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Open source allows this to evolve based on real failures, not theoretical design.&lt;/p&gt;

&lt;p&gt;Kakveda is early-stage and opinionated — and that’s intentional.&lt;/p&gt;

&lt;h2&gt;The bigger takeaway&lt;/h2&gt;

&lt;p&gt;As AI adoption grows, the most important question won’t be:&lt;/p&gt;

&lt;p&gt;“How powerful is this model?”&lt;/p&gt;

&lt;p&gt;It will be:&lt;/p&gt;

&lt;p&gt;“Do we understand and control what this system is allowed to do?”&lt;/p&gt;

&lt;p&gt;That’s an ops question.&lt;br&gt;
And ops questions deserve first-class systems.&lt;/p&gt;

&lt;p&gt;If you’re operating AI systems in production — especially from a DevOps, SRE, or platform perspective — I’d love to hear what’s breaking for you.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>ai</category>
      <category>opensource</category>
      <category>observability</category>
    </item>
  </channel>
</rss>
