Shouvik Palit

Posted on Jun 21

Ship Happens: How I Made a 3B Local Model Trustworthy for Kubernetes (By Never Trusting It)

#kubernetes #ai #devops #go

Small local models are surprisingly capable. They're also wrong in
small, easy-to-miss ways: a deprecated API version here, a
placeholder value that looks real until you actually deploy it
there. The usual fix is better prompting. I tried something
different: stop trying to make the model right, and just make sure
nothing wrong ever ships.

The idea

Ship Happens lets you describe a Kubernetes deployment in plain
English. A local model (qwen2.5:3b, via Ollama) drafts the
manifest. Before anything counts as "done," that manifest gets
tested against a real local cluster (kind) using a server-side
dry-run:

kubectl apply --dry-run=server -f manifest.yaml

If the cluster rejects it, the actual error message, not a
vague "try again", gets fed straight back into the model:

func RegenerateFromError(instruction, imageName, clusterError string) (string, error) {
    correction := fmt.Sprintf(
        "A previous attempt was tested against a real Kubernetes "+
            "cluster and REJECTED with this exact error:\n\n%s\n\n"+
            "Fix the YAML to resolve this specific error.",
        clusterError)
    // ...
}

Then it gets checked again. The model never gets to just "try
again" blindly. Every retry is grounded in what the cluster
actually said was wrong.
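
The loop described above can be sketched roughly like this. It is a minimal illustration, not the project's actual code: the Generator and Validator types, generateUntilAccepted, serverDryRun, and the attempt cap are all hypothetical names introduced here. The only real external call is the kubectl apply --dry-run=server command shown earlier, fed from stdin with -f -.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os/exec"
	"strings"
)

// Generator drafts a manifest. clusterError is empty on the first attempt
// and carries the previous rejection message on every retry.
type Generator func(clusterError string) (string, error)

// Validator returns a non-empty clusterError when the manifest is rejected.
type Validator func(manifest string) (clusterError string, err error)

// serverDryRun pipes the manifest to kubectl and returns stderr on rejection.
// Assumption: any non-zero exit is treated as a rejection, which also covers
// cases such as an unreachable cluster.
func serverDryRun(manifest string) (string, error) {
	cmd := exec.Command("kubectl", "apply", "--dry-run=server", "-f", "-")
	cmd.Stdin = strings.NewReader(manifest)
	var stderr bytes.Buffer
	cmd.Stderr = &stderr
	err := cmd.Run()
	if err == nil {
		return "", nil
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return strings.TrimSpace(stderr.String()), nil
	}
	return "", err // kubectl could not be started at all
}

var errGaveUp = errors.New("manifest still rejected after all attempts")

// generateUntilAccepted retries generation, feeding each rejection back in,
// until the validator accepts the manifest or maxAttempts is used up.
func generateUntilAccepted(gen Generator, validate Validator, maxAttempts int) (string, error) {
	clusterError := ""
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		manifest, err := gen(clusterError)
		if err != nil {
			return "", fmt.Errorf("attempt %d: generate: %w", attempt, err)
		}
		clusterError, err = validate(manifest)
		if err != nil {
			return "", fmt.Errorf("attempt %d: validate: %w", attempt, err)
		}
		if clusterError == "" {
			return manifest, nil
		}
	}
	return "", fmt.Errorf("%w: last error: %s", errGaveUp, clusterError)
}

func main() {
	// Stubs stand in for the model and the cluster so the sketch runs anywhere.
	// In real use, validate would be serverDryRun and gen would call the model.
	gen := func(clusterError string) (string, error) {
		if clusterError == "" {
			return "apiVersion: policy/v1beta1", nil
		}
		return "apiVersion: policy/v1", nil
	}
	validate := func(manifest string) (string, error) {
		if strings.Contains(manifest, "v1beta1") {
			return "stub rejection: policy/v1beta1 is not served", nil
		}
		return "", nil
	}
	manifest, err := generateUntilAccepted(gen, validate, 3)
	fmt.Println(manifest, err)
}
```

Capping the number of attempts matters: a model that keeps producing the same rejected manifest should end in a visible failure, not an endless loop.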

What this caught, that I didn't expect

1. Missing `---` silently merges resources

Ask for a Deployment + Service and the model sometimes forgets the
YAML document separator between them. The result isn't a parse
error. It's worse. Without ---, you get one YAML document with
duplicate keys, and most parsers (including the one kubectl uses)
just keep the last value for each duplicate key. The Deployment's
spec silently gets overwritten by the Service's spec. Half your
config vanishes with zero warning.

// Models often show a wrong example before the right one, or just
// forget the separator. Either way, this normalizes it before
// validation ever runs:
func fixMissingDocumentSeparators(yamlContent string) string {
    // detects a second "apiVersion:" line without a preceding ---
    // and inserts one automatically
}
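
One way the body of that function could look is sketched below. This is a guess at the approach the comments describe, not the repository's implementation. It assumes every resource starts with an unindented apiVersion: line, so a manifest that puts kind: first would be split in the wrong place.

```go
package main

import (
	"fmt"
	"strings"
)

// fixMissingDocumentSeparators inserts "---" before any top-level
// "apiVersion:" line that follows another one in the same YAML document.
// Indented apiVersion keys (for example inside ownerReferences) are ignored
// because only lines starting at column zero are considered.
func fixMissingDocumentSeparators(yamlContent string) string {
	lines := strings.Split(yamlContent, "\n")
	out := make([]string, 0, len(lines)+1)
	seenAPIVersion := false
	for _, line := range lines {
		switch {
		case strings.TrimRight(line, " \t\r") == "---":
			// An existing separator starts a fresh document.
			seenAPIVersion = false
		case strings.HasPrefix(line, "apiVersion:"):
			if seenAPIVersion {
				out = append(out, "---")
			}
			seenAPIVersion = true
		}
		out = append(out, line)
	}
	return strings.Join(out, "\n")
}

func main() {
	merged := "apiVersion: apps/v1\nkind: Deployment\napiVersion: v1\nkind: Service\n"
	fmt.Print(fixMissingDocumentSeparators(merged))
}
```

Running it on already-separated input leaves the text unchanged, so it is safe to apply before every validation pass.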

2. Placeholder images that pass every check

The model would write image: your-app-image:v1, syntactically
perfect YAML. Dry-run validates schema, not whether the string
happens to be a real, pullable image. This passes validation
cleanly and only fails 20 minutes later, once the pod has been
scheduled and the kubelet tries to pull the image, as
ImagePullBackOff.

The fix wasn't a smarter prompt (I tried that; the model just
produced a different placeholder). It was deterministic
substitution:

var knownImages = map[string]string{
    "nginx": "nginx:latest", "redis": "redis:7", "postgres": "postgres:16",
    // ...
}

func fixKnownPlaceholderImage(yamlContent string) (string, bool) {
    // finds a known technology keyword inside the placeholder text
    // itself ("your-nginx-image" contains "nginx") and substitutes
    // the real image, deterministically. No second model call,
    // no risk of generating a different placeholder
}
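
A rough sketch of how such a substitution could work follows. Only the three knownImages entries come from the snippet above. The placeholderHints list, the imageLine regular expression, and the sorted-keyword lookup are assumptions made for illustration, and the real project may detect placeholders differently.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

var knownImages = map[string]string{
	"nginx": "nginx:latest", "redis": "redis:7", "postgres": "postgres:16",
}

// placeholderHints is a hypothetical list of markers that suggest the model
// invented an image name instead of using a real one.
var placeholderHints = []string{"your-", "my-", "example", "placeholder", "<"}

// imageLine matches a YAML "image:" line, optionally starting a list item.
var imageLine = regexp.MustCompile(`(?m)^([ \t]*(?:-[ \t]+)?image:[ \t]*)["']?([^\s"']+)["']?[ \t]*$`)

func looksLikePlaceholder(value string) bool {
	for _, hint := range placeholderHints {
		if strings.Contains(value, hint) {
			return true
		}
	}
	return false
}

// fixKnownPlaceholderImage swaps a placeholder image for a known real one
// when the placeholder text contains a known technology keyword.
func fixKnownPlaceholderImage(yamlContent string) (string, bool) {
	// Sort keywords so the result never depends on map iteration order.
	keywords := make([]string, 0, len(knownImages))
	for k := range knownImages {
		keywords = append(keywords, k)
	}
	sort.Strings(keywords)

	changed := false
	fixed := imageLine.ReplaceAllStringFunc(yamlContent, func(line string) string {
		m := imageLine.FindStringSubmatch(line)
		value := strings.ToLower(m[2])
		if !looksLikePlaceholder(value) {
			return line // real-looking images are left alone
		}
		for _, k := range keywords {
			if strings.Contains(value, k) {
				changed = true
				return m[1] + knownImages[k]
			}
		}
		return line // placeholder, but no known keyword to map it to
	})
	return fixed, changed
}

func main() {
	out, changed := fixKnownPlaceholderImage("      - name: web\n        image: your-nginx-image:v1\n")
	fmt.Print(out)
	fmt.Println(changed)
}
```

Returning the changed flag lets the caller decide whether the manifest needs another dry-run pass. A placeholder with no recognizable keyword is left untouched, so it still has to be surfaced some other way.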

3. The repair that "fixed" the version by deleting it

This one was my favorite. A manifest used the deprecated
policy/v1beta1 for a PodDisruptionBudget. Dry-run rejected it
correctly. The repair loop kicked in, read the error, and
"corrected" it to... v1. Not policy/v1. Just v1, dropping the
API group entirely.

Technically a different string. Still completely wrong, since a
bare v1 means the core API group, and PodDisruptionBudget doesn't
exist there at all. The model recognized something needed to
change, defaulted to the most common generic apiVersion it knew,
and got it wrong in a new way.

The fix was explicit, not clever:

prompt := `IMPORTANT: If the error is about a deprecated or invalid
apiVersion, the API GROUP prefix (the part before the slash)
usually stays the same. Only the version number changes. For
example, "policy/v1beta1" should become "policy/v1", NOT just "v1".
Dropping the group prefix entirely produces a different, equally
invalid value.`

Re-validated after that, and it passed correctly.
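
The same rule can also be checked deterministically after each repair, as a backstop to the prompt. The sketch below is an illustration, not something the project is known to do, and apiGroup and droppedAPIGroup are hypothetical helper names. It only flags the case where a group prefix disappears entirely, because legitimate migrations sometimes do change the group (Deployment moved from extensions/v1beta1 to apps/v1, for example).

```go
package main

import (
	"fmt"
	"strings"
)

// apiGroup returns the part of an apiVersion before the slash.
// A bare version such as "v1" belongs to the core group, returned as "".
func apiGroup(apiVersion string) string {
	if i := strings.LastIndex(apiVersion, "/"); i >= 0 {
		return apiVersion[:i]
	}
	return ""
}

// droppedAPIGroup reports whether a repair turned a grouped apiVersion
// into a bare one, which is the failure mode described above.
func droppedAPIGroup(before, after string) bool {
	return apiGroup(before) != "" && apiGroup(after) == ""
}

func main() {
	fmt.Println(droppedAPIGroup("policy/v1beta1", "v1"))
	fmt.Println(droppedAPIGroup("policy/v1beta1", "policy/v1"))
}
```

Dry-run would reject the bare v1 anyway, so a check like this mostly saves one round trip and produces a more pointed correction message for the next attempt.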

"But couldn't jsonschema do this?"

Came up when I posted this, and it's a fair question. Mostly: no,
not for the bugs above.

A static schema is pinned to whatever Kubernetes version it was
bundled for. The policy/v1beta1 manifest has a real, structurally
valid shape. That API version has just been removed from newer
clusters (policy/v1beta1 PodDisruptionBudget stopped being served
in Kubernetes 1.25). A schema package might still call it "valid"
depending on how current it is. Dry-run against the real cluster
has no such lag: it's checking against exactly what this cluster,
right now, actually accepts.

It also covers CRDs for free. If your cluster has cert-manager,
Prometheus operators, anything custom: a generic Kubernetes schema
doesn't cover those, but dry-run validates against whatever's
actually installed with zero extra configuration. And it goes
through the real request path (RBAC authorization, then admission
control, including any webhooks), which schema validation never
sees, since it's only checking shape in isolation.

jsonschema is faster and works offline, so it's a reasonable
first-pass filter. It's just not sufficient on its own for what I
was trying to catch.

What it doesn't catch

Worth being honest about the ceiling here. Dry-run proves the API
server would accept the manifest, not that the architecture behind
it is sound. It will not catch:

- A ReadWriteOnce PersistentVolumeClaim shared across 3 replicas (perfectly valid YAML, breaks once the replicas land on different nodes and the volume can only be mounted from one of them)
- A model picking a real-but-wrong image for an ambiguous request (asked for "prometheus," once got handed Alertmanager's image, also real, also wrong)
- Container-internal startup requirements, like a Bitnami image refusing to start without ALLOW_EMPTY_PASSWORD=yes set. That's application logic baked into the image, completely invisible to the Kubernetes API

That last one is now caught, just not by dry-run. It's caught by a
Dashboard that polls live pod health, and on a crash, pulls the
actual logs and asks the local model to summarize the root cause
in plain English instead of a raw log dump:

🔴 redis-high-availability
0/3 replicas ready
Missing REDIS_PASSWORD leads to startup failure.

That line is genuinely the model's own summary of a much longer,
uglier log output. Different use of the same local model, not
generating infrastructure, just explaining a failure after the
fact.
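
The prompt side of that step might look something like the sketch below. Everything in it is an assumption made for illustration: tailLines, buildCrashSummaryPrompt, the prompt wording, and the idea of trimming the logs to their last few lines so they fit a small model's context. How the dashboard actually fetches logs and calls the model is not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// tailLines keeps only the last n lines of a log dump.
func tailLines(logs string, n int) string {
	if n <= 0 {
		return ""
	}
	lines := strings.Split(strings.TrimRight(logs, "\n"), "\n")
	if len(lines) > n {
		lines = lines[len(lines)-n:]
	}
	return strings.Join(lines, "\n")
}

// buildCrashSummaryPrompt asks the model to explain a failure, not to
// generate infrastructure. The wording is hypothetical.
func buildCrashSummaryPrompt(workload string, ready, desired int, logs string, maxLines int) string {
	return fmt.Sprintf(
		"The workload %q has %d/%d replicas ready and its pod is crashing.\n"+
			"Here are the last lines of its logs:\n\n%s\n\n"+
			"Summarize the root cause in one plain-English sentence.",
		workload, ready, desired, tailLines(logs, maxLines))
}

func main() {
	logs := "starting up\nchecking configuration\nfatal: required setting missing\n"
	fmt.Println(buildCrashSummaryPrompt("redis-high-availability", 0, 3, logs, 2))
}
```

Keeping only the tail of the log is a deliberate trade-off: the fatal line is usually near the end, and a 3B model handles a short, focused prompt better than a full dump.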

What's still unverified

I'd rather say this directly than let someone find out the hard
way: there's a third mode, "My Code," that's supposed to build your
own source into an image and deploy it directly into the cluster.
It's implemented. I have not run it end-to-end yet. The README says
exactly that, in those words, instead of pretending otherwise.

"Public Image" mode (describe it, AI drafts it) and "My YAML" mode
(paste your own, skip generation entirely) have both been tested
repeatedly and are confirmed working. The build-from-source path
hasn't earned that yet.

The actual point

This isn't really about Kubernetes, and it isn't really about this
one model. The pattern is just: don't try to make an unreliable
generator more reliable through better instructions alone. Give it
a real, external thing to be checked against, feed the real
failure back in when it's wrong, and let that loop run until
something true comes out the other side.

A 3B model doesn't need to be smart for this to work. It just needs
something honest to check it against.

Code: github.com/shouvik12/ship-happens
