jindy zhao

Posted on May 22 • Originally published at kv-shepherd.io

Why KubeVirt Needs a Governance Layer — and How We Built One

#kubevirt #kubernetes #opensource #devops

KubeVirt puts VMs on Kubernetes, but it leaves a set of governance questions
open: who can request a VM, who approves it, where quotas are enforced, and
where the audit trail lives.

KubeVirt Shepherd is an open-source governance platform for KubeVirt. Approval
workflows, RBAC, and audit logging have been part of its core architecture from
the start. The project grew out of internal use in a financial-services
Kubernetes environment and is Apache 2.0 licensed.

Website: kv-shepherd.io
Live demo: demo.kv-shepherd.io
GitHub: kv-shepherd/shepherd
Discord: discord.gg/9P2wtpPMUe

The Problem

In most KubeVirt environments, the VM lifecycle looks like this:

Developer wants a VM → kubectl apply → done.

This works for dev clusters. In production, harder questions surface:

Who approved this VM? KubeVirt has no built-in approval flow.
Which team owns it? No resource-to-team mapping.
A VM has been idle for three months — who cleans it up? No lifecycle governance.
Security incident — where are the operation records? No platform-level audit trail.
Multiple clusters — how do you enforce policy consistently? No unified governance plane.

The existing options each come with trade-offs:

Approach	Trade-off
OpenShift Virtualization	Full-featured, but tightly coupled to the OpenShift ecosystem
Raw kubectl + K8s RBAC	Too low-level — no approval flow, no self-service UI
Build your own portal	Ongoing development and maintenance cost

What Shepherd Does

Capability	Description
Approval workflows	Every VM operation — create, modify, start, stop, delete — goes through a structured request → approve → deliver flow
Dual-layer RBAC	Platform-level roles plus System → Service → VM membership inheritance, with environment scoping
Audit trail	Every resource change records the actor, timestamp, and payload
Multi-cluster	Manage VMs across multiple Kubernetes/KubeVirt clusters from one control plane
Console access	Browser-based VNC and serial console with approval-aware entrypoints
i18n	Chinese and English UI included
Auth provider plugins	SDK for LDAP, OIDC, and custom identity source integrations
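
To make the request → approve → deliver flow and the audit record concrete, here is a minimal sketch in Go. It is not Shepherd's actual code: the state names, the `Request` and `AuditEntry` types, and the `Transition` method are all hypothetical, chosen only to show how a fixed transition table can refuse out-of-order steps while recording actor, timestamp, and payload for every accepted change.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// State is a hypothetical request state; the names are illustrative only.
type State string

const (
	Pending   State = "pending"
	Approved  State = "approved"
	Rejected  State = "rejected"
	Delivered State = "delivered"
)

// AuditEntry captures the actor, timestamp, and payload of one change.
type AuditEntry struct {
	Actor   string
	At      time.Time
	Payload string
}

// Request is a VM operation moving through the flow.
type Request struct {
	Operation string // e.g. "create", "stop", "delete"
	Requester string
	State     State
	Audit     []AuditEntry
}

// allowed lists the legal transitions:
// pending -> approved or rejected, approved -> delivered.
var allowed = map[State][]State{
	Pending:  {Approved, Rejected},
	Approved: {Delivered},
}

// ErrIllegalTransition is returned when a step is attempted out of order.
var ErrIllegalTransition = errors.New("illegal state transition")

// NewRequest creates a request in the pending state.
func NewRequest(requester, operation string) *Request {
	return &Request{Operation: operation, Requester: requester, State: Pending}
}

// Transition moves the request to next if the table permits it and appends
// an audit entry. On failure the request is left unchanged.
func (r *Request) Transition(actor string, next State) error {
	for _, s := range allowed[r.State] {
		if s == next {
			r.Audit = append(r.Audit, AuditEntry{
				Actor:   actor,
				At:      time.Now().UTC(),
				Payload: fmt.Sprintf("%s: %s -> %s", r.Operation, r.State, next),
			})
			r.State = next
			return nil
		}
	}
	return fmt.Errorf("%w: %s -> %s", ErrIllegalTransition, r.State, next)
}

func main() {
	req := NewRequest("alice", "create")

	// Delivery before approval is refused and leaves no audit entry.
	if err := req.Transition("worker", Delivered); err != nil {
		fmt.Println("refused:", err)
	}

	_ = req.Transition("bob", Approved)
	_ = req.Transition("worker", Delivered)
	fmt.Println("final state:", req.State, "audit entries:", len(req.Audit))
}
```

Keeping the legal transitions in one table means a new operation type or an extra approval stage changes data rather than control flow, which is one plausible way to keep such a workflow auditable.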

Architecture Choices

Web UI (React 19 · Next.js 16)
  ↓ REST / WebSocket
Go Backend (Gin · Ent ORM · River Queue)
  ↓
PostgreSQL 18 (single data store)
  ↓
Kubernetes / KubeVirt Clusters (client-go · multi-cluster)

A few decisions that shaped the project:

PostgreSQL-only runtime

Shepherd deliberately avoids Redis and external message queues. PostgreSQL
handles business state, audit data, encrypted credentials, and background jobs
(via River).

The practical benefit: async tasks and business data commit in the same
database transaction — either both succeed or both roll back. This avoids the
partial-failure scenarios that are common when a separate message queue is
involved (e.g., a quota is deducted but the VM never gets created).
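
The "both succeed or both roll back" property can be illustrated with a toy model in Go. This sketch does not use PostgreSQL or River. The `Store`, `state`, `InTx`, and `RequestVM` names are hypothetical, and the in-memory "transaction" only imitates what a real database transaction provides. It shows why writing the quota change and the job row inside one transaction rules out the "quota deducted, VM never created" case.

```go
package main

import (
	"errors"
	"fmt"
)

// state is a toy stand-in for the database: a quota counter and a job table.
type state struct {
	quotaLeft int
	jobs      []string
}

// Store holds the committed state.
type Store struct{ committed state }

var (
	ErrQuotaExceeded = errors.New("quota exceeded")
	ErrInvalidJob    = errors.New("invalid job")
)

// InTx runs fn against a private copy of the state and publishes the copy
// only if fn returns nil, so every write made by fn commits or none does.
func (s *Store) InTx(fn func(tx *state) error) error {
	draft := state{
		quotaLeft: s.committed.quotaLeft,
		jobs:      append([]string(nil), s.committed.jobs...),
	}
	if err := fn(&draft); err != nil {
		return err // rollback: the draft is discarded
	}
	s.committed = draft // commit
	return nil
}

// RequestVM deducts quota and enqueues the provisioning job in one transaction.
func RequestVM(s *Store, vmName string, cost int) error {
	return s.InTx(func(tx *state) error {
		if tx.quotaLeft < cost {
			return ErrQuotaExceeded
		}
		tx.quotaLeft -= cost // business write

		// Simulated failure of the job write, after the quota write.
		if vmName == "" {
			return ErrInvalidJob
		}
		tx.jobs = append(tx.jobs, "create-vm:"+vmName) // job write
		return nil
	})
}

func main() {
	s := &Store{committed: state{quotaLeft: 4}}

	err := RequestVM(s, "web-1", 2)
	fmt.Println("first:", err, "quota:", s.committed.quotaLeft, "jobs:", len(s.committed.jobs))

	// The job write fails here, so the quota deduction is rolled back too.
	err = RequestVM(s, "", 2)
	fmt.Println("second:", err, "quota:", s.committed.quotaLeft, "jobs:", len(s.committed.jobs))
}
```

With a separate message broker, the quota write and the enqueue would be two independent operations, and a crash between them would leave exactly the inconsistent state this model rules out.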

The operational benefit: one database to back up, monitor, and scale. A
deployment has a single stateful dependency instead of three, as it would with
a Redis + RabbitMQ + PostgreSQL stack.

Contract-first API

The OpenAPI spec is the single source of truth for both the Go backend and the
TypeScript frontend. Server types and client types are generated from the spec.
A CI gate blocks merges if the generated code drifts from the spec.
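
A drift gate of this kind usually amounts to regenerating the code and comparing it with what is committed. The Go sketch below shows that comparison under stated assumptions: the `DriftedFiles` function and the file paths are hypothetical, and the project's real gate may work differently (for example, by running the generator and checking for a non-empty diff).

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// DriftedFiles compares the generated files committed to the repository with
// a fresh run of the generator. It returns the sorted paths that differ, are
// missing from the commit, or are stale leftovers no longer generated.
func DriftedFiles(committed, regenerated map[string][]byte) []string {
	drift := []string{}
	for path, want := range regenerated {
		got, ok := committed[path]
		if !ok || !bytes.Equal(got, want) {
			drift = append(drift, path)
		}
	}
	for path := range committed {
		if _, ok := regenerated[path]; !ok {
			drift = append(drift, path)
		}
	}
	sort.Strings(drift)
	return drift
}

func main() {
	// Hypothetical file names and contents, for illustration only.
	committed := map[string][]byte{
		"api/server.gen.go": []byte("type VM struct{ Name string }"),
		"web/client.gen.ts": []byte("export interface VM { name: string }"),
	}
	regenerated := map[string][]byte{
		"api/server.gen.go": []byte("type VM struct{ Name string; CPU int }"),
		"web/client.gen.ts": []byte("export interface VM { name: string }"),
	}

	if drift := DriftedFiles(committed, regenerated); len(drift) > 0 {
		fmt.Println("generated code is out of date:", drift)
		// A CI job would exit non-zero here to block the merge.
	}
}
```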

Architecture Decision Records

The project maintains 53 ADRs that document key decisions — ORM selection,
async model, transaction strategy, concurrency patterns, and more. ADRs are
immutable once accepted; changes require a new ADR that supersedes the old one.
CI gates enforce compliance with active ADRs.

This matters because architectural decisions tend to erode over time as a
codebase grows. Making them explicit and enforceable keeps the architecture
consistent as the project evolves.
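
The immutable-then-supersede rule can be expressed as a small data model. The Go sketch below is illustrative only: the `ADR` type, its statuses, and the `Amend` and `Supersede` functions are assumptions, not the project's tooling, and the IDs and titles are made up.

```go
package main

import (
	"errors"
	"fmt"
)

// Status is a hypothetical ADR lifecycle status.
type Status string

const (
	Proposed   Status = "proposed"
	Accepted   Status = "accepted"
	Superseded Status = "superseded"
)

// ADR is a minimal architecture decision record.
type ADR struct {
	ID           int
	Title        string
	Body         string
	Status       Status
	SupersededBy int // 0 while the ADR has not been superseded
}

// ErrImmutable is returned when an ADR is edited after leaving the proposed status.
var ErrImmutable = errors.New("only proposed ADRs can be edited")

// Amend edits the body, which is only legal while the ADR is still proposed.
func (a *ADR) Amend(body string) error {
	if a.Status != Proposed {
		return ErrImmutable
	}
	a.Body = body
	return nil
}

// Supersede accepts a proposed replacement and marks the old ADR as
// superseded by it. The old ADR's text is preserved, never rewritten.
func Supersede(old, replacement *ADR) error {
	if old.Status != Accepted {
		return fmt.Errorf("ADR %d is %s; only accepted ADRs can be superseded", old.ID, old.Status)
	}
	if replacement.Status != Proposed {
		return fmt.Errorf("ADR %d is %s; the replacement must be proposed", replacement.ID, replacement.Status)
	}
	replacement.Status = Accepted
	old.Status = Superseded
	old.SupersededBy = replacement.ID
	return nil
}

func main() {
	old := &ADR{ID: 1, Title: "Original decision", Body: "v1", Status: Accepted}
	next := &ADR{ID: 2, Title: "Revised decision", Body: "v2", Status: Proposed}

	// Editing an accepted ADR in place is refused.
	fmt.Println("amend:", old.Amend("rewritten"))

	// The change goes through a new ADR instead.
	fmt.Println("supersede:", Supersede(old, next))
	fmt.Println(old.ID, old.Status, "superseded by", old.SupersededBy)
}
```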

How Shepherd Compares to OpenShift Virtualization

Shepherd and OpenShift Virtualization operate at different levels:

Dimension	OpenShift Virtualization	Shepherd
Scope	Full enterprise virtualization platform	Governance layer for KubeVirt VMs
Multi-cluster	Requires RHACM	Built in
Approval workflows	Available	Core architecture
Self-service model	Operator-driven	Request → approve → deliver
Vendor dependency	OpenShift ecosystem	Any Kubernetes distribution
License	Commercial	Apache 2.0

If your team is already on OpenShift and satisfied with its VM governance,
that stack likely covers your needs. If you run vanilla KubeVirt and want a
governance layer without platform lock-in, Shepherd may be worth a look.

Try It

Online demo (no setup)

Open demo.kv-shepherd.io in your browser. The
instance is pre-seeded with sample data. You can walk through the full flow:
log in, browse VMs, submit a request, approve it, and check the audit log.

Self-hosted (Docker Compose)

Create a working directory, then run the deploy script on a VPS or local machine:

mkdir -p shepherd-deploy && cd shepherd-deploy
curl -fsSL https://raw.githubusercontent.com/kv-shepherd/shepherd/main/deploy/prod/deploy-prod.sh | \
  bash -s -- --release-images --with-seed

Helm charts are also available for Kubernetes-native installs:

helm repo add shepherd https://kv-shepherd.github.io/helm-charts
helm repo update
helm upgrade --install shepherd shepherd/shepherd \
  --namespace shepherd --create-namespace

See docs/DEPLOYMENT.md
for external PostgreSQL, domain/TLS configuration, and the security checklist.

Current Status

Shepherd is in Alpha. The core governance paths — approval workflows, RBAC,
audit trails, VM lifecycle management — have been validated through internal
production use. The Alpha label reflects deliberate caution while broader
external feedback is gathered.

What is planned next:

Finish live E2E validation across all major paths
Harden deployment documentation and upgrade guidance
Keep the PostgreSQL-only runtime baseline through V1

Features tracked as RFCs for future versions include VM snapshots, clone
workflows, external approval system adapters, and event archiving. See
ROADMAP.md.

Get Involved

Shepherd is a solo-maintained project at this stage. All forms of participation
are welcome — code, bug reports, documentation, or simply sharing your
experience:

Try the demo and share your impressions
Report bugs or request features via GitHub Issues
Join the conversation on Discord
Star the repo if you find it useful — it helps with visibility

The project is Apache 2.0 licensed. Contributions follow the
DCO sign-off model.

Links:

Website: https://www.kv-shepherd.io
Live demo: https://demo.kv-shepherd.io
GitHub: https://github.com/kv-shepherd/shepherd
Discord: https://discord.gg/9P2wtpPMUe
