The cost of onboarding a new engineer at a mid-sized cloud-infrastructure org never shows up on a finance dashboard. There's no line item for "hours spent searching for the right runbook" or "Slack threads asking how deployment works this quarter." The cost is real: it runs 14 to 22 hours per engineer per week for the first three weeks, tapers over the rest of an 8-week ramp, and compounds with every hire because the senior engineers who answer the questions lose 4 to 7 hours per week each at the same time.
Six months of IDP work changes the shape of this cost. Not eliminates it. Reshapes it. The 8-week onboarding becomes 4-week onboarding. The 14 hours per week of access requests, runbook hunts, and deployment questions drops to 30 minutes per week of looking things up in the IDP catalog. The senior engineers stop being the canonical source of truth for "how do we deploy this quarter" because the IDP is. The engineering org gets back about $400k per year of engineer time on a 100-engineer team for a year-one investment of $250-350k.
This piece walks through what an IDP actually changes, why deployment is the keystone golden path, why templates beat documentation, the 6-month investment shape, the math, and the single instrumentation that tells you whether the IDP is working.
The 14-hour onboarding tax nobody puts on a dashboard
A new engineer at a mid-sized cloud-infra org goes through a fairly predictable cost curve in the first 8 weeks.
| Week | Time spent on onboarding-friction work | Top sources |
|---|---|---|
| 1 | 16-22 hours/week | AWS access, GitHub access, VPN, k8s kubeconfig, on-call rotation join |
| 2-3 | 14-18 hours/week | First service deploy, learning CI conventions, finding the right Terraform repo |
| 4-5 | 10-14 hours/week | Observability setup (logs, metrics, traces, alerts), on-call shadowing |
| 6-7 | 6-10 hours/week | Cross-team integration patterns, edge cases in deploys, "where does this config live" |
| 8 | 3-6 hours/week | Settling in; mostly questions that surface only during real incidents |
Sum it: roughly 80 to 110 hours per engineer over 8 weeks lost to onboarding friction (79 to 112 if you add the table's ranges exactly). At a fully loaded $80/hour, that's roughly $6,300 to $9,000 per engineer in lost time, paid every time you hire.
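The total can be recomputed directly from the table's weekly ranges. This is a minimal sketch, assuming each multi-week row (for example "2-3") applies its range to every week it covers; the constant and function names are illustrative only.

```python
# Weekly friction ranges from the table above: (weeks covered, low hours, high hours).
FRICTION_BY_PERIOD = [
    (1, 16, 22),  # week 1
    (2, 14, 18),  # weeks 2-3
    (2, 10, 14),  # weeks 4-5
    (2, 6, 10),   # weeks 6-7
    (1, 3, 6),    # week 8
]
HOURLY_COST_USD = 80  # fully loaded rate used in the text


def onboarding_friction_hours(periods):
    """Return (low, high) total friction hours across all periods."""
    low = sum(weeks * lo for weeks, lo, _ in periods)
    high = sum(weeks * hi for weeks, _, hi in periods)
    return low, high


low_hours, high_hours = onboarding_friction_hours(FRICTION_BY_PERIOD)
print(f"hours per engineer: {low_hours}-{high_hours}")
print(f"cost per engineer:  ${low_hours * HOURLY_COST_USD:,}-${high_hours * HOURLY_COST_USD:,}")
```

Swapping in your own weekly estimates and hourly rate is the quickest way to get a per-hire number for your org.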
The cost on the senior engineer side is rarely measured. Each new engineer fires roughly 5-15 "how do I" messages per week into engineering Slack channels. Senior engineers context-switch to answer, draft a half-page response, sometimes screen-share for 20 minutes. The aggregate is 4-7 hours per senior engineer per week for as long as new hires are onboarding. With 3-4 new engineers in their first month, the senior who happens to know the answers loses a half-day or more per week to answering them.
This cost compounds. Year-over-year, the team grows. Each new hire fires the same questions because the answers live in senior engineers' heads, in Slack history, in three different wikis with conflicting versions, in a Confluence page somebody updated 18 months ago. The org pays the same onboarding tax for every new hire, plus the senior engineer time, plus the slow drift as conventions change and old answers become wrong.
Nobody dashboards this because nobody wants to. The engineers paying the cost don't want to flag it (they want to look productive). The senior engineers paying the cost don't want to flag it (they want to look helpful). The org chart doesn't include "onboarding friction" as a category. It's a tax that gets paid in invisible time and shows up as "engineering velocity feels slow" without a clear line item.
What an IDP actually changes
A working IDP — Backstage, Port, Cortex, or a homegrown equivalent — collapses the same 8-week onboarding into 4 weeks and the 14 hours per week of friction work into 30 minutes per week of catalog lookups.
| Activity | Pre-IDP | Post-IDP |
|---|---|---|
| Get AWS account access | 2-3 days, 4 Slack threads | 30 min: self-service via IDP request flow with auto-approval rules |
| Find the deployment runbook | 1-2 days, 5 Slack threads | 5 min: deployment golden path is the IDP front page |
| Set up observability for new service | 4-6 hours, 2 senior engineers consulted | 20 min: template generates the right Datadog/Grafana hooks |
| Add on-call rotation membership | 1-2 days, 3 Slack threads, often blocked on PagerDuty admin | 15 min: self-service via IDP rotation manager |
| Get secret manager access for service X | 2-3 days, 2 Slack threads, requires security team approval | 30 min: IDP routes the request with the right context, security approves in batch |
| Create a new service from scratch | 1-2 weeks, learn CI/CD conventions ad hoc | 2 hours: template scaffolds repo + CI + observability + secrets |
The activities don't disappear. The engineer still needs AWS access, still needs to find the deployment runbook, still needs to set up observability. The time per activity collapses because the IDP makes the answer findable and the action self-service. What used to take a week of Slack-thread-driven discovery takes 30 minutes of catalog navigation.
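The "auto-approval rules" behind a self-service access flow can be as small as a routing function. The sketch below is hypothetical: the `AccessRequest` fields, the rule itself, and the return values are assumptions for illustration, not the API of Backstage, Port, or Cortex.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical shape of a request submitted through the IDP form."""
    requester_team: str
    resource: str       # e.g. "aws:dev-account"
    environment: str    # "dev", "staging", or "prod"
    access_level: str   # "read" or "admin"


def route(request: AccessRequest) -> str:
    """Return "auto-approve" or "needs-review" for a request."""
    # Assumed rule: read-only access outside production is low risk.
    low_risk = request.environment in {"dev", "staging"} and request.access_level == "read"
    if low_risk:
        return "auto-approve"
    # Everything else is queued with its context so a reviewer can approve in batch.
    return "needs-review"


print(route(AccessRequest("payments", "aws:dev-account", "dev", "read")))
print(route(AccessRequest("payments", "aws:prod-account", "prod", "admin")))
```

The point of the design is that the request already carries the context a reviewer needs, so the slow path is a queue to work through, not a Slack thread to reconstruct.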
The senior engineer side is what makes the math work. The Slack questions don't go to a senior first; they go to the IDP. The IDP answers about 70 percent of them via templates, runbooks, and self-service flows. The remaining 30 percent, the genuinely novel questions, still go to senior engineers, so the volume of questions reaching seniors drops by 60-80 percent. Senior engineers get their week back; new engineers stop blocking on senior availability.
The deployment golden path is the keystone
Six months of IDP work is enough budget to ship deployment + observability + on-call + secrets, in that order. The order matters more than the budget. Deployment first is the keystone; the rest only get traction once engineers trust the IDP to handle deployment correctly.
Why deployment first: an engineer can survive bad observability for a week (the service runs, you'll fix the dashboards later). They cannot ship code without a clear deployment path. If the deployment golden path lives in the IDP and works, the engineer's first IDP interaction is a positive one. They go to the IDP next time something else needs doing. The IDP earns trust through use.
Trying to ship observability + secrets + deployment in parallel fails because there's no foothold for engineer trust. The engineer hits the IDP, finds a half-finished observability template that doesn't quite work, gives up, asks Slack. The IDP becomes the place engineers tried once and found broken. That perception is hard to recover from; better to ship one path well than three paths half-done.
The order after deployment is less critical, but observability and on-call go together because they share a workflow (alerts wake an on-call engineer; they consult the dashboards). Secret management can land last because it's a higher-friction problem (security review is involved) and engineers will tolerate the existing process longer. Environment provisioning is usually month 7-9 if scoped at all; many IDPs never ship it because it requires deeper cloud-account integration than the rest.
Templates beat documentation
The deepest mechanism in an IDP isn't the catalog or the docs. It's the template that generates the right thing instead of describing how to make the right thing.
| Mechanism | Drift | Enforcement | Time to first success | Maintenance cost |
|---|---|---|---|---|
| Documentation | Drifts within weeks; nobody updates | None — engineer can ignore | 2-6 hours of reading + iterating | Low to write, high to keep accurate |
| Template (golden path) | Doesn't drift; the template IS the convention | Strong — produces the right output | 15-30 min from template run to working service | Higher to write, near-zero to keep accurate |
A 5,000-word doc on "how to create a service" describes the right repo structure, the right CI config, the right observability hooks. The next engineer reads the doc, applies it imperfectly, ships a service that's 80 percent compliant with the conventions. Six months later that service has subtle differences from the canonical pattern. The doc gets updated by someone, the existing service doesn't. Drift sets in immediately.
A template that runs in the IDP produces the same artifacts as the doc would describe. The repo is created with the right structure. The CI config is generated from the same source as the docs. The observability hooks are wired by the same template that wires them in every other service. The engineer's "create service" interaction is a form they fill in (service name, owner, language) and a button they click. Two minutes later the service's repo exists, compliant by construction.
The templates also enforce things docs can't. A doc can say "always tag your resources with cost_center." A template adds the tag automatically. A doc can say "always emit the request_id in logs." A template wires the logger to do it by default. The conventions move from "things engineers should remember" to "things the template does for them." Compliance ratios go from the typical 30-60 percent for documented conventions to 95-99 percent for template-enforced ones.
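To make "compliant by construction" concrete, here is a minimal sketch of a scaffolding function. Everything in it is assumed for illustration: the file names, the CI step list, and the `cost_center` parameter (the form described above has only name, owner, and language, so a real template might derive the cost center from the owner). It is not how Backstage or any other IDP implements templates.

```python
import json
from string import Template

# Hypothetical CI config template; a real one would live in the IDP's template repo.
CI_TEMPLATE = Template(
    "service: $name\n"
    "owner: $owner\n"
    "language: $language\n"
    "steps: [lint, test, build, deploy]\n"
)


def scaffold_service(name: str, owner: str, language: str, cost_center: str) -> dict:
    """Return a mapping of file path -> file contents for a new service."""
    if not name.replace("-", "").isalnum():
        raise ValueError(f"invalid service name: {name!r}")
    # The tag is added by the template, so no engineer has to remember it.
    resource_tags = {"service": name, "owner": owner, "cost_center": cost_center}
    # request_id is a required log field by default, not a documented suggestion.
    logging_config = {"format": "json", "required_fields": ["timestamp", "level", "request_id"]}
    return {
        "ci.yaml": CI_TEMPLATE.substitute(name=name, owner=owner, language=language),
        "infra/tags.json": json.dumps(resource_tags, indent=2),
        "config/logging.json": json.dumps(logging_config, indent=2),
    }


files = scaffold_service("billing-api", "payments-team", "go", "cc-1234")
for path in files:
    print(path)
```

When a convention changes, the change happens once in the template, and every service created afterwards picks it up.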
The work to write a template is roughly 3-5x the work to write the equivalent doc. The maintenance cost is the inverse: docs need constant updating to stay accurate; templates only update when the underlying convention changes. Over a 2-year horizon, templates are cheaper than docs even before counting the engineer-time savings.
The 6-month investment shape
The typical IDP rollout for a 100-engineer org consumes roughly one platform engineer's full quarter for the deployment golden path, then a half-time engagement for the next quarter as the other paths land. Total team investment is roughly 1.5 engineer-quarters of platform time over the six months, plus rotating part-time involvement from two service teams whose flows the IDP is encoding.
| Month | Deliverable | Owner | Dependency |
|---|---|---|---|
| 1 | IDP catalog up; service inventory imported; access flows wired | Platform engineer (full-time) | Backstage/Port instance + GitHub integration |
| 2 | Deployment golden path: template + runbook for new service | Platform engineer + 1 service team part-time | Catalog + CI/CD integration |
| 3 | Deployment golden path: rollout to 5 services as pilots | Platform engineer + 5 pilot teams | Working template from month 2 |
| 4 | Observability golden path: template for Datadog/Grafana hooks | Platform engineer + observability team | Deployment template establishes pattern |
| 5 | On-call golden path: PagerDuty + runbook integration | Platform engineer + SRE team | Observability template for alerts |
| 6 | Secrets golden path: routing through Vault/AWS Secrets Manager | Platform engineer + security team | Trust established from prior 5 months |
Months 7-9 add environment provisioning if scoped. Most orgs don't get to it in the first year because the cloud-account integration is the deepest piece of work and the prior paths produce most of the time savings.
The platform engineer in months 1-3 is mostly heads-down on the deployment path. Months 4-6 the engineer becomes more of a coordinator, working with the observability/SRE/security teams who own the underlying systems. The IDP is the integration layer; it doesn't replace the underlying tools.
The pilot pattern in month 3 is critical. Five services going through the deployment template surfaces every edge case the template missed. Fix the edge cases, then roll out broadly in month 4. Skipping the pilot and rolling broadly in month 3 means the broad rollout hits all the edge cases at once, the template gets blamed, and the IDP loses trust.
The dollar math: $400k recovered, $300k invested
The math is straightforward but politically uncomfortable, because it requires putting a number on engineer time that nobody usually quantifies.
| Input | Value | Notes |
|---|---|---|
| Engineers in onboarding overlap (avg) | 12 | Includes new hires + recent transfers within 8 weeks |
| Hours/week recovered per onboarding engineer | 14 | From 14 hrs/wk of friction to 30 min/wk |
| Senior engineer hours/week recovered | 4 per senior × 5 affected seniors = 20 | Less context-switching to answer questions |
| Total hours/week recovered | 188 | (12 × 14) + 20 |
| Fully loaded hourly cost | $80 | Median for senior engineers in cloud infra |
| Annualized recovered value | $782k | 188 × $80 × 52 weeks |
| Adjustment for non-100% onboarding overlap | × 0.55 | Onboarding overlap isn't always 12 engineers |
| Realistic recovered value | ~$430k/year | Conservative |
| IDP investment year 1 (platform eng + tooling) | $250-350k | One platform engineer + Backstage hosting + integrations |
| Net year-1 ROI | +$80k to +$180k | Positive in year one |
| Year 2+ ROI | +$310-350k/year | ~$430k recovered minus ongoing maintenance ($80-120k/year) |
The investment side is more concrete than the recovery side. One platform engineer at fully-loaded $200k for the year, plus $30-50k for Backstage hosting + integrations + tooling, plus part-time involvement from the service teams (call it $70-100k of allocated time across 6 months). That totals $300-350k in year one, the upper half of the $250-350k range in the table.
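The table's arithmetic is easy to rerun with your own inputs. This is a minimal sketch using the table's values; the function names are illustrative, and the 0.55 overlap factor is the article's adjustment, not a derived quantity.

```python
WEEKS_PER_YEAR = 52


def recovered_value_usd(onboarders, hours_each, senior_hours, hourly_cost, overlap_factor):
    """Annual value of recovered time, scaled down by the overlap adjustment."""
    weekly_hours = onboarders * hours_each + senior_hours
    return weekly_hours * hourly_cost * WEEKS_PER_YEAR * overlap_factor


def net_roi_range_usd(recovered, investment_low, investment_high):
    """Return (worst case, best case) net value for an investment range."""
    return recovered - investment_high, recovered - investment_low


# Inputs from the table: 12 onboarders x 14 hrs, 20 senior hrs, $80/hr, x 0.55.
recovered = recovered_value_usd(12, 14, 20, 80, 0.55)
worst, best = net_roi_range_usd(recovered, 250_000, 350_000)
print(f"recovered per year: ${recovered:,.0f}")
print(f"year-1 net:         ${worst:,.0f} to ${best:,.0f}")
```

The overlap factor and the onboarding headcount are the inputs most worth varying, since they carry most of the uncertainty.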
The recovery side has the most uncertainty around the "onboarding overlap" number. A 100-engineer org with 20 percent annual hiring has roughly 20 hires per year; with a 4-week to 8-week onboarding overlap, that works out to only about 1.5 to 3 engineers in friction mode at any given time (20 hires × 8 weeks ÷ 52 weeks ≈ 3). The 12 number assumes a much higher hiring rate or many more transfers; adjust accordingly. The dollar value scales linearly.
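One way to sanity-check the overlap figure for your own org is a steady-state estimate via Little's law (average number in the system = arrival rate × time in the system). The sketch assumes hires arrive evenly through the year, which real hiring rarely does; the function names are illustrative.

```python
WEEKS_PER_YEAR = 52


def engineers_in_onboarding(hires_per_year: float, onboarding_weeks: float) -> float:
    """Average number of engineers in onboarding at any given time."""
    return hires_per_year / WEEKS_PER_YEAR * onboarding_weeks


def hires_needed_for_overlap(target_overlap: float, onboarding_weeks: float) -> float:
    """Annual hires (plus transfers) needed to sustain a given average overlap."""
    return target_overlap / onboarding_weeks * WEEKS_PER_YEAR


for weeks in (4, 8):
    overlap = engineers_in_onboarding(20, weeks)
    print(f"20 hires/year, {weeks}-week onboarding: {overlap:.1f} engineers in overlap")

needed = hires_needed_for_overlap(12, 8)
print(f"hires/year needed for an overlap of 12: {needed:.0f}")
```

Hiring in cohorts produces higher peaks and emptier troughs than this average suggests, so treat the result as a baseline.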
The argument that lands better than "save $400k" is "recover one half-engineer of capacity per onboarding." Engineering leaders intuitively understand "we get our senior engineers' Mondays back" better than they understand annualized dollar projections.
How to know it worked: the 'how do I' Slack metric
The single instrumentation that tells you the IDP is working is the count of "how do I" messages in engineering Slack channels.
Pre-IDP, a typical 100-engineer org sees 50-100 such messages per week in #engineering, #infrastructure, #platform-help, and similar channels. Each one is a question that should have an answer in the IDP but doesn't, or that's in the IDP but the asker didn't find it.
Post-IDP (month 6 onward), the same channels see 10-20 such messages per week. The 60-80 percent drop is the most reliable signal of golden-path adoption. It can be measured from Slack analytics or channel history; no instrumentation is needed beyond a regex search over the messages.
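A minimal sketch of that regex count, assuming channel history is available as a list of message dicts with a `ts` field (Unix seconds as a string) and a `text` field, as in Slack's JSON exports. The pattern and the function name are illustrative; tune the pattern to how your engineers actually phrase questions.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Assumed phrasing variants; extend for your own channels.
HOW_DO_I = re.compile(r"\bhow (do|can|would) (i|we)\b", re.IGNORECASE)


def weekly_how_do_i_counts(messages):
    """Count matching messages per (ISO year, ISO week)."""
    counts = Counter()
    for message in messages:
        if HOW_DO_I.search(message.get("text", "")):
            sent = datetime.fromtimestamp(float(message["ts"]), tz=timezone.utc)
            year, week, _ = sent.isocalendar()
            counts[(year, week)] += 1
    return counts


sample = [
    {"ts": "1704067200.000100", "text": "How do I get a kubeconfig?"},
    {"ts": "1704153600.000200", "text": "how do we deploy this quarter"},
    {"ts": "1704153700.000300", "text": "deploy finished"},
    {"ts": "1704672000.000400", "text": "How can I join the rotation?"},
]
for (year, week), count in sorted(weekly_how_do_i_counts(sample).items()):
    print(f"{year}-W{week:02d}: {count}")
```

Run weekly over the same set of channels, this gives the trend line; the matched messages themselves are the backlog of gaps to triage.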
The platform team uses the remaining 10-20 messages as the prioritization signal for IDP improvements. Each unanswered question is either a gap in the IDP (add a template or catalog entry) or a discoverability problem (add a search hint or restructure the catalog). The metric drives the work; the work drives the metric down further.
The pattern that fails is letting the IDP roll out without instrumenting. Six months in, the platform team thinks the IDP is great because they built it. The actual signal of success is "are engineers using it instead of asking Slack." Without the Slack metric, the platform team optimizes for things engineers don't actually need; with it, the platform team's roadmap is driven by real friction.
What happens if you don't build it
The opportunity cost of not building an IDP is bounded but real. A growing engineering org without an IDP eventually hires a "developer experience" team to do ad hoc the work an IDP does at scale.
| Year | Without IDP | With IDP |
|---|---|---|
| 1 | Onboarding takes 8 weeks; senior engineers spend 5 hrs/wk answering questions; 1 ad-hoc DX engineer hired | IDP investment + deployment golden path; 2 weeks shorter onboarding by year-end |
| 2 | DX team grows to 3 engineers maintaining scripts, runbooks, on-call docs ad hoc | IDP team is 1 platform engineer maintaining + extending; observability/secrets paths added |
| 3 | DX team is 5 engineers; "how do we deploy" still requires asking; same onboarding tax as year 1 | IDP team is 1-2 engineers; onboarding is 4 weeks; senior engineer time recovered |
| 4 | DX team is 6 engineers; documentation has drifted again; new attempts to "fix it" begin | IDP is the canonical surface; new conventions land as templates; org grows without proportional friction growth |
The DX team isn't a wasted investment — those 5 engineers are doing real work. The work is just less leveraged because it's documentation + scripts + ad-hoc processes instead of templates + catalog + self-service flows. Documentation drifts; templates don't. Scripts get forked; templates get versioned. Ad-hoc processes get replicated badly; self-service flows enforce consistency.
Year 3 is where the divergence becomes obvious. The IDP team is one or two engineers extending the platform; the DX team is five engineers reinventing the same flows for each new service. The IDP org's onboarding tax has stayed flat; the no-IDP org's onboarding tax has grown linearly with team size. Hiring more DX engineers doesn't fix the structural problem; it scales it.
The Backstage / Port / Cortex investment isn't free, and the six-month rollout is real work. But the alternative is paying the same cost as a recurring tax for as long as the org grows, and watching the senior engineers who could be building the next thing instead spend their Mondays answering "how do I." The 14 hours per week per onboarding engineer is the visible cost; the senior engineer time is the hidden one. The IDP recovers both, and the math works within the first year.

