Samson Tanimawo

Posted on Apr 17

Platform Engineering: Building an Internal Developer Platform That Teams Actually Use

#platformengineering #devops #sre #devex

The "Build It and They Won't Come" Problem

Our platform team spent 6 months building an internal developer platform. Beautiful service catalog, automated provisioning, self-service databases. Nobody used it.

Here's what we learned.

Why Platforms Fail

Most internal platforms fail for the same reason: they're built top-down instead of bottom-up.

Top-down: "We decided every team should use this standardized deployment pipeline."
Bottom-up: "We noticed 8 teams solving the same problem differently, so we built a shared solution."

The Paved Road Approach

Instead of mandating tools, offer a paved road. Make the right thing the easy thing.

Paved road (easy)           Off-road (hard but allowed)
─────────────────           ───────────────────────────
Standard CI/CD template     Custom pipeline
Managed Postgres            Self-managed DB
Shared observability        Own monitoring stack
Pre-configured K8s          Custom infrastructure

The key: off-road is allowed but unsupported. You break it, you own it.
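As a rough illustration of "allowed but unsupported", the sketch below labels each of a service's choices as paved or off-road and assigns ownership accordingly. The catalog keys, the option names, and the `classify` helper are all hypothetical; the point is only that off-road choices are recorded, not rejected.

```python
from dataclasses import dataclass

# Hypothetical catalog: the paved-road option for each platform concern.
# Entries mirror the table above; the identifiers are illustrative names.
PAVED_ROAD = {
    "ci_cd": "standard-template",
    "database": "managed-postgres",
    "observability": "shared-stack",
    "runtime": "preconfigured-k8s",
}


@dataclass
class SupportStatus:
    concern: str
    choice: str
    paved: bool

    @property
    def owner(self) -> str:
        # Paved-road choices are supported by the platform team;
        # off-road choices are allowed but owned by the service team.
        return "platform-team" if self.paved else "service-team"


def classify(choices: dict[str, str]) -> list[SupportStatus]:
    """Label each choice as paved or off-road. Nothing is rejected."""
    return [
        SupportStatus(concern, choice, PAVED_ROAD.get(concern) == choice)
        for concern, choice in choices.items()
    ]


if __name__ == "__main__":
    statuses = classify({
        "ci_cd": "standard-template",
        "database": "self-managed-mysql",
    })
    for s in statuses:
        print(f"{s.concern}: {s.choice} -> owned by {s.owner}")
```

Keeping the check advisory rather than blocking is what separates a paved road from a mandate.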

What Our IDP Looks Like

# service.yaml — the only file developers need to create
apiVersion: platform/v1
kind: Service
metadata:
  name: checkout-api
  team: payments
  tier: critical
spec:
  language: python
  framework: fastapi

  dependencies:
    - postgres:14
    - redis:7
    - rabbitmq:3

  scaling:
    min: 3
    max: 20
    metric: cpu
    target: 70

  environments:
    staging:
      replicas: 1
    production:
      replicas: 3
      multi_az: true
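
A manifest like this is only useful if bad input is caught before anything is provisioned. Below is a minimal validation sketch, assuming the file has already been parsed into a dict. The specific rules (the allowed tiers, the `name:version` dependency format, the scaling bounds) are assumptions for illustration, not our platform's actual schema.

```python
# The manifest above, as the dict a YAML parser would produce.
MANIFEST = {
    "apiVersion": "platform/v1",
    "kind": "Service",
    "metadata": {"name": "checkout-api", "team": "payments", "tier": "critical"},
    "spec": {
        "language": "python",
        "framework": "fastapi",
        "dependencies": ["postgres:14", "redis:7", "rabbitmq:3"],
        "scaling": {"min": 3, "max": 20, "metric": "cpu", "target": 70},
        "environments": {
            "staging": {"replicas": 1},
            "production": {"replicas": 3, "multi_az": True},
        },
    },
}

# Hypothetical rule: the set of accepted tiers is an assumption.
ALLOWED_TIERS = {"critical", "standard", "experimental"}


def validate(manifest: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if manifest.get("kind") != "Service":
        errors.append("kind must be 'Service'")
    meta = manifest.get("metadata", {})
    for field in ("name", "team", "tier"):
        if not meta.get(field):
            errors.append(f"metadata.{field} is required")
    if meta.get("tier") and meta["tier"] not in ALLOWED_TIERS:
        errors.append(f"unknown tier: {meta['tier']}")
    scaling = manifest.get("spec", {}).get("scaling", {})
    lo, hi = scaling.get("min"), scaling.get("max")
    if lo is None or hi is None or lo < 1 or lo > hi:
        errors.append("scaling.min and scaling.max must satisfy 1 <= min <= max")
    target = scaling.get("target")
    if target is None or not 0 < target <= 100:
        errors.append("scaling.target must be a percentage in (0, 100]")
    for dep in manifest.get("spec", {}).get("dependencies", []):
        name, sep, version = dep.partition(":")
        if not (name and sep and version):
            errors.append(f"dependency '{dep}' must look like name:version")
    return errors


if __name__ == "__main__":
    print(validate(MANIFEST) or "manifest is valid")
```

Returning every problem at once, rather than failing on the first, saves developers a round trip per typo.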

From this single file, the platform provisions:

Git repo with CI/CD pipeline
Kubernetes namespace and RBAC
Database and connection secrets
Monitoring dashboards (golden signals)
Alerting rules
Log aggregation
Service mesh entry
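
One way to picture that expansion is as a function from manifest to an ordered plan. The sketch below is illustrative only: the resource naming scheme is invented, and it assumes every declared dependency gets its own instance and connection secret, which goes slightly beyond the list above.

```python
def provisioning_plan(manifest: dict) -> list[str]:
    """Expand a service manifest into an ordered list of resources to create."""
    name = manifest["metadata"]["name"]
    team = manifest["metadata"]["team"]
    spec = manifest["spec"]

    plan = [
        f"git-repo:{name}",
        f"ci-pipeline:{name}",
    ]
    # One namespace (with RBAC bound to the owning team) per environment.
    for env in spec.get("environments", {}):
        plan.append(f"k8s-namespace:{name}-{env}")
        plan.append(f"rbac:{team}@{name}-{env}")
    # Assumption: each declared dependency yields an instance plus a secret.
    for dep in spec.get("dependencies", []):
        engine, _, version = dep.partition(":")
        plan.append(f"{engine}:{name} (version {version})")
        plan.append(f"secret:{name}-{engine}-credentials")
    # Observability and mesh registration apply to every service.
    plan += [
        f"dashboard:{name}-golden-signals",
        f"alert-rules:{name}",
        f"log-pipeline:{name}",
        f"mesh-entry:{name}",
    ]
    return plan


if __name__ == "__main__":
    example = {
        "metadata": {"name": "checkout-api", "team": "payments"},
        "spec": {
            "dependencies": ["postgres:14", "redis:7", "rabbitmq:3"],
            "environments": {"staging": {}, "production": {}},
        },
    }
    for step in provisioning_plan(example):
        print(step)
```

Producing a plan before applying it also gives you a dry-run mode for free, which helps when teams want to see what the platform will create.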

The Developer Experience Metrics

We track these to know if the platform is working:

Metric                                    Before     After
Time from idea to production deploy       2 weeks    4 hours
Time to provision a new environment       3 days     12 minutes
Deploy frequency                          weekly     5x/day
Change failure rate                       18%        4%
Developer satisfaction (quarterly NPS)    -10        +52
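
For readers who want to compute similar numbers, here is a small sketch of three of these metrics using their standard definitions: change failure rate is failed deploys over total deploys, and NPS is the percentage of promoters (scores 9-10) minus the percentage of detractors (scores 0-6). The record shape (`day`, `failed`) is a hypothetical example, not our actual data model.

```python
from datetime import date


def change_failure_rate(deploys: list[dict]) -> float:
    """Fraction of deploys that caused a failure needing remediation."""
    if not deploys:
        return 0.0
    failed = sum(1 for d in deploys if d["failed"])
    return failed / len(deploys)


def deploys_per_day(deploys: list[dict]) -> float:
    """Average deploys per calendar day over the observed window (inclusive)."""
    if not deploys:
        return 0.0
    days = [d["day"] for d in deploys]
    window = (max(days) - min(days)).days + 1
    return len(deploys) / window


def nps(scores: list[int]) -> float:
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    if not scores:
        return 0.0
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


if __name__ == "__main__":
    # Made-up sample data, purely to exercise the functions.
    sample = [
        {"day": date(2024, 1, 1), "failed": False},
        {"day": date(2024, 1, 1), "failed": True},
        {"day": date(2024, 1, 2), "failed": False},
        {"day": date(2024, 1, 2), "failed": False},
    ]
    print(change_failure_rate(sample), deploys_per_day(sample), nps([10, 9, 8, 7, 3]))
```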

The Self-Service Portal

Our portal has exactly four actions:

Create Service — Generates everything from service.yaml
View My Services — Dashboard of health, deploys, costs
Request Resource — Database, queue, cache (auto-provisioned)
Get Help — Links to docs + Slack channel

That's it. Four buttons. If you need more than four buttons, your platform is too complex.

Adoption Strategy

We didn't mandate adoption. We seduced teams into it:

Weeks 1-4: Pilot with the friendliest team. Fix everything.
Weeks 5-8: Add two more teams. Fix more things.
Weeks 9-12: Share success stories at engineering all-hands.
Week 13+: Other teams start asking to join.

By month 6, 80% of teams had migrated voluntarily. The remaining 20% had legitimate edge cases we accommodated.

What Not to Build

Don't build a service mesh if you have fewer than 20 services
Don't build a custom scheduler if standard K8s scheduling works
Don't build a custom secret manager; use Vault or your cloud provider's secret manager
Don't build a custom CI system; use GitHub Actions or GitLab CI

Build the glue, not the tools.
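
To make "glue" concrete: rather than writing a CI system, the platform can render a workflow file for an existing one from the service manifest. The sketch below generates a minimal GitHub Actions workflow for a Python service. The template contents, the `render_workflow` helper, and the default Python version are assumptions for illustration, not what our platform actually emits.

```python
# A deliberately small workflow template; real ones would add caching,
# linting, image builds, and deploy steps.
WORKFLOW_TEMPLATE = """\
name: {name}-ci
on:
  push:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "{python_version}"
      - run: pip install -r requirements.txt
      - run: pytest
"""


def render_workflow(manifest: dict, python_version: str = "3.12") -> str:
    """Render a CI workflow for the service described by the manifest."""
    if manifest["spec"]["language"] != "python":
        # This sketch only carries one template; others would be added per language.
        raise ValueError("this sketch only has a template for python services")
    return WORKFLOW_TEMPLATE.format(
        name=manifest["metadata"]["name"],
        python_version=python_version,
    )


if __name__ == "__main__":
    manifest = {
        "metadata": {"name": "checkout-api"},
        "spec": {"language": "python"},
    }
    print(render_workflow(manifest))
```

The platform owns a few dozen lines of templating; GitHub owns the runner fleet, the scheduler, and the UI.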

If you want a platform that includes AI-powered operations from day one, check out what we're building at Nova AI Ops.

Written by Dr. Samson Tanimawo
BSc · MSc · MBA · PhD
Founder & CEO, Nova AI Ops. https://novaaiops.com
