Originally published on Podo Stack
The last few issues of this newsletter covered individual tools -- image pulling, autoscaling, eBPF networking. All useful on their own. But tools don't help much if your engineers can't find them, use them safely, or provision infrastructure without filing a ticket and waiting three days.
This week we zoom out to the platform layer. The boring stuff that makes everything else work.
The Pattern: Platform Engineering Guardrails
Here's something I see a lot. A team builds a shiny Internal Developer Platform. Self-service. Kubernetes. The works. Then they write a 50-page "Platform Usage Guide" and email it to all engineers.
Nobody reads it. Someone deploys a public S3 bucket. Chaos.
Documentation is not a guardrail. A guardrail is code.
Gates vs Guardrails
Think of a highway. The old model is a tollbooth -- you stop, show your papers, wait for approval. That's a Change Advisory Board. It works, but it kills velocity.
Guardrails are the barriers on the sides of the road. You drive at full speed. If you try to go off the edge, something stops you. No human in the loop.
In practice, this means automated policies that either warn or block -- but never require manual approval when rules are followed.
Three Layers of Defense
Good guardrails exist at every stage of the delivery pipeline:
Design time -- your IDE flags that you're using a banned instance type. Fix it before it even hits Git.
Deploy time -- OPA or Conftest checks your manifests in CI. No memory limits? Pipeline fails with a clear message. You don't find out in production at 2 AM.
Runtime -- Kyverno or Gatekeeper intercepts the API call. Pod running as root? Rejected. The cluster itself says no.
Each layer catches what the previous one missed. Defense in depth, but for platform safety.
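To make the deploy-time layer concrete, here's a minimal sketch of a CI job that runs Conftest against your manifests. The GitHub Actions syntax, the `manifests/` and `policy/` directories, and the assumption that conftest is available on the runner are all illustrative, not prescriptive:

```yaml
# Hedged sketch: a deploy-time guardrail as a CI job (GitHub Actions syntax).
# Assumes conftest is installed on the runner and that your Rego policies
# live in policy/ next to your Kubernetes manifests in manifests/.
name: policy-check
on: [pull_request]
jobs:
  conftest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Check manifests against OPA policies
        run: conftest test manifests/ --policy policy/
```

If any manifest violates a policy, `conftest test` exits non-zero and the pipeline fails with the policy's message -- the clear feedback loop described above.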
Start Soft
One mistake I've made (and seen others repeat): going full enforcement on day one. Engineers feel like a robot is slapping their hands every time they push code. Morale drops. People start looking for workarounds.
Better approach: start with 80% of guardrails in Audit mode. Let people see the warnings, understand the rules, ask questions. Give them a couple of weeks. Then gradually flip to Enforce -- starting with the policies that matter most (security, cost).
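In Kyverno, Audit mode is a one-line setting. A sketch with a hypothetical policy (the name and the team-label requirement are made up for illustration):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-team-label   # hypothetical example policy
spec:
  validationFailureAction: Audit   # report violations, don't block; flip to Enforce later
  rules:
    - name: check-team-label
      match:
        any:
          - resources:
              kinds:
                - Deployment
      validate:
        message: "Deployments should carry a team label."
        pattern:
          metadata:
            labels:
              team: "?*"   # any non-empty value
```

Violations show up in policy reports instead of blocking deploys, which gives teams that grace period to adapt.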
You'll get buy-in instead of resentment.
The Unsexy Tool: Backstage Software Catalog
Nobody gets excited about a catalog. There's no demo that makes the crowd gasp. But here's what happens without one: engineers Slack each other "who owns the payment service?" and nobody knows where the API docs live. Someone built a wiki page six months ago. It's already outdated.
Backstage is a CNCF Incubating project, originally built at Spotify. It's been around since 2020. Not new, not flashy. But it solves the "where is everything?" problem better than anything else I've seen.
The catalog-info.yaml Trick
The key idea is catalog-info.yaml -- a small file that lives next to your code. Developers own it. Backstage auto-discovers it from your Git repos. Here's what it looks like:
```yaml
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payment-service
  annotations:
    github.com/project-slug: acme/payment-service
spec:
  type: service
  owner: team-alpha
  lifecycle: production
  providesApis:
    - payments-api
```
That's it. Now Backstage knows this service exists, who owns it, what APIs it exposes, and what it depends on. No separate documentation to maintain. The catalog stays accurate because it lives with the code.
The Entity Model
Backstage organizes everything into entities: Components (services, libraries), APIs, Resources (databases, queues), Groups (teams), and Users. They connect to each other through ownership and dependency relationships.
A team owns a component. That component provides an API. It depends on a database resource. You can trace the full graph in the UI. When something breaks at 3 AM, you know exactly who to page.
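These relationships are declared in the same YAML format as the component above. A sketch, with hypothetical names, of a team and the database resource it owns:

```yaml
# Hypothetical Group and Resource entities; names are illustrative.
apiVersion: backstage.io/v1alpha1
kind: Group
metadata:
  name: team-alpha
spec:
  type: team
  children: []
---
apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
  name: payments-db
spec:
  type: database
  owner: team-alpha
```

A Component can then declare `dependsOn: [resource:payments-db]` in its spec, and Backstage draws the edge in the dependency graph for you.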
Golden Paths via the Scaffolder
Here's where it gets really useful. Backstage's Scaffolder lets you define templates for new services. Need a new microservice? Click a button, fill out a form, and get a repo with CI/CD pipeline, Dockerfile, monitoring dashboards, and catalog-info.yaml -- all pre-configured. Three minutes instead of three days.
The platform team controls the templates, not individual developers. Want to enforce a new security standard? Update the template. Every new service created from that point forward gets it automatically.
That's a golden path. You're not blocking engineers from doing things their own way. You're just making the right way the easiest way.
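A Scaffolder template is itself a YAML entity. A trimmed-down sketch -- the skeleton path, the repo host, and the single parameter are assumptions, and a real template would add more steps (registering the component, wiring CI):

```yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: microservice-template   # hypothetical template name
spec:
  parameters:
    - title: Service details
      required:
        - name
      properties:
        name:
          title: Service name
          type: string
  steps:
    - id: fetch
      action: fetch:template
      input:
        url: ./skeleton          # assumed skeleton with Dockerfile, CI, catalog-info.yaml
        values:
          name: ${{ parameters.name }}
    - id: publish
      action: publish:github
      input:
        repoUrl: github.com?owner=acme&repo=${{ parameters.name }}
```

The form a developer fills out is generated from `parameters`; the `steps` run server-side, so the platform team fully controls what gets created.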
The Showdown: Crossplane vs Terraform
Both manage your cloud infrastructure. Completely different philosophies.
Terraform: The Standard
You write HCL files. You run terraform plan. You review the diff. You run terraform apply. Done.
It's simple, well-understood, and has providers for everything. But it's a one-shot operation. Between applies, nothing watches your infrastructure. Someone deletes a resource manually? Terraform doesn't know until your next plan. That could be days. Or weeks.
Crossplane: The K8s-Native Approach
Crossplane runs inside your cluster. You define a custom resource -- say, PostgreSQLCluster -- and Crossplane's controllers continuously reconcile it against reality. Just like how Kubernetes reconciles Deployments.
```yaml
apiVersion: platform.acme.com/v1alpha1
kind: PostgreSQLCluster
metadata:
  name: orders-db
spec:
  version: "15"
  storageGB: 100
  environment: production
```
The developer doesn't know (or care) whether this creates an RDS instance, a Cloud SQL database, or something else. The platform team defines that mapping in a Composition. Developers get a simple API. Platform engineers keep control.
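A Composition that maps the PostgreSQLCluster claim above to an AWS RDS instance might look roughly like this. The provider API version, instance sizing, and region are assumptions, and the real setup also involves defining the composite type (XRD) -- elided here:

```yaml
# Hedged sketch: platform-team-owned mapping from the developer-facing
# PostgreSQLCluster API to a concrete RDS instance.
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: postgres-aws
spec:
  compositeTypeRef:
    apiVersion: platform.acme.com/v1alpha1
    kind: PostgreSQLCluster
  resources:
    - name: rds-instance
      base:
        apiVersion: rds.aws.upbound.io/v1beta1   # assumed Upbound AWS provider
        kind: Instance
        spec:
          forProvider:
            engine: postgres
            instanceClass: db.t3.medium           # platform team's default sizing
            region: us-east-1
      patches:
        - fromFieldPath: spec.storageGB            # developer's request...
          toFieldPath: spec.forProvider.allocatedStorage   # ...becomes the RDS setting
        - fromFieldPath: spec.version
          toFieldPath: spec.forProvider.engineVersion
```

Swap in a different Composition and the same PostgreSQLCluster claim could provision Cloud SQL instead -- the developer-facing API doesn't change.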
And if someone manually deletes the RDS instance? Crossplane notices and recreates it. Automatically.
When to Choose What
Terraform -- you have a small team, simple infrastructure, or you're early in your platform journey. It's proven and everyone knows it. Don't overcomplicate things if you don't need to.
Crossplane -- you're building a self-service platform. You want developers to request infrastructure through Kubernetes APIs without filing tickets. You need continuous reconciliation, not just plan-apply.
They're not competitors at the same maturity level. They're tools for different stages of the platform engineering journey. Plenty of teams use both -- Terraform for the foundational stuff, Crossplane for the self-service layer on top.
The Policy: Require PodDisruptionBudget
Node drain. Three replicas. No PDB. All pods evicted at once. Service down.
I've seen this happen in production more times than I'd like to admit. It's one of those things that doesn't matter until it really, really matters.
This Kyverno policy prevents exactly that. If your Deployment has more than one replica, it must have a matching PodDisruptionBudget. Otherwise, the API server rejects it.
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-pdb
spec:
  validationFailureAction: Enforce
  background: false
  rules:
    - name: check-for-pdb
      match:
        any:
          - resources:
              kinds:
                - Deployment
                - StatefulSet
      preconditions:
        all:
          - key: "{{ request.object.spec.replicas }}"
            operator: GreaterThan
            value: 1
      context:
        - name: matchingPDBs
          apiCall:
            urlPath: "/apis/policy/v1/namespaces/{{request.object.metadata.namespace}}/poddisruptionbudgets"
            jmesPath: "items[].spec.selector.matchLabels"
      validate:
        message: >-
          Deployment with {{ request.object.spec.replicas }} replicas
          requires a matching PodDisruptionBudget.
        deny:
          conditions:
            all:
              - key: "{{ request.object.spec.template.metadata.labels }}"
                operator: NotIn
                value: "{{ matchingPDBs }}"
```
A few things to notice:
- The preconditions block skips single-replica deployments. You don't need a PDB for a singleton -- there's nothing to disrupt gracefully.
- The apiCall context actually queries the cluster for existing PDBs in the namespace, then checks whether any of them match the deployment's labels.
- This is a runtime guardrail -- exactly what the first section of this article describes. No documentation needed. The cluster enforces it.
If you're starting with Kyverno, set validationFailureAction: Audit first. Let it report violations for a week. Then flip to Enforce once you've helped teams add their PDBs.
The One-Liner: Check Kubernetes EOL
```shell
curl -s https://endoflife.date/api/kubernetes.json | jq '.[0]'
```
Platform teams have to track version support. endoflife.date aggregates EOL data for hundreds of products -- it's a free API, no authentication needed. This command shows the latest Kubernetes release with its support dates, end-of-life timeline, and whether it's still getting patches.
Useful for audits, upgrade planning, or just settling the "should we upgrade yet?" debate in Slack.
Bookmark the API. It covers everything -- Node.js, PostgreSQL, Ubuntu, Go, Python, you name it.
What does your platform layer look like? Are you using Backstage, Crossplane, or something else entirely? I'd love to hear what's working (and what isn't) -- drop a comment below.
If you found this useful, consider subscribing to Podo Stack -- a weekly curation of Cloud Native tools ripe for production.