Sahil Singh

Originally published at getglueapp.com

Tribal Knowledge: The $300K Problem Nobody Talks About

I watched a $4.2 million engineering hire fail because of something that never showed up in a single dashboard.

We had recruited a senior architect away from Stripe. Brilliant engineer. Perfect cultural fit. She started on a Monday. By Friday, she had asked four different people the same question about how our payment processing pipeline worked and gotten four different answers.

By week six, she was spending more time in Slack archaeology than writing code. By month three, she gave her notice. "I can't be effective here," she told me. "The system makes sense to people who built it. I'm not one of them."

The system she was describing had a name: tribal knowledge.

What Tribal Knowledge Actually Is

Tribal knowledge in software development is NOT "stuff we haven't documented yet." That framing makes it sound like a documentation problem with a documentation solution. It's deeper than that.

Tribal knowledge is the gap between what your code does and why it does it that way. It's the architectural decisions made in a meeting three years ago that nobody recorded. It's the workaround in the billing service that prevents a race condition but looks like a bug to anyone who wasn't there when the incident happened.

Every codebase has two layers of meaning:

  • Syntactic: what the code literally does (anyone can read this)
  • Semantic: why the code exists in this form (lives in people's heads)

Tribal knowledge is that second layer. And it's the layer that determines whether a team can move fast or gets stuck in interpretation loops.

Why It Compounds Silently

A product manager asks "can we add real-time notifications?" The engineering lead doesn't say "I don't know." They say "let me check with Marcus." Marcus built the event system two years ago. He spends 45 minutes explaining the constraints. The PM gets a qualified answer three days later.

Everyone treats this as normal. It's not normal. It's a three-day delay on a 45-minute question, and it happens dozens of times per quarter.

A new engineer gets assigned a bug in the checkout flow. There's a conditional branch that doesn't make sense. They ask on Slack. Someone responds: "Oh, that handles the edge case from the Acme migration. Don't touch it." No documentation. No comment. No test. The new engineer patches around it. Six months later, someone removes the branch. Production breaks on a Saturday night.
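Here is a minimal sketch of what "documentation, comment, test" could have looked like for a branch like that one. Everything in it is hypothetical: the function name, the cutoff constant, and the tax-inclusive pricing reason are invented for illustration, since the real edge case in a story like this lives in someone's head.

```python
from decimal import Decimal

# Hypothetical: account IDs below this cutoff were imported during the migration.
LEGACY_MIGRATION_CUTOFF = 10_000


def checkout_total(account_id: int, subtotal: Decimal, tax: Decimal) -> Decimal:
    """Return the amount to charge for an order."""
    # WHY THIS BRANCH EXISTS (invented reason, for illustration only):
    # accounts imported in the migration already store tax-inclusive prices,
    # so adding tax again would double-charge them. Do not remove this branch
    # until the legacy price data is retired; the test below fails if you do.
    if account_id < LEGACY_MIGRATION_CUTOFF:
        return subtotal
    return subtotal + tax


def test_migrated_accounts_are_not_taxed_twice() -> None:
    # Pins the edge case so that deleting the branch breaks the build,
    # not production.
    assert checkout_total(42, Decimal("100.00"), Decimal("8.00")) == Decimal("100.00")


def test_regular_accounts_pay_tax() -> None:
    assert checkout_total(50_000, Decimal("100.00"), Decimal("8.00")) == Decimal("108.00")
```

The comment carries the why, and the test makes the why enforceable. Someone who removes the branch six months later gets a failing test instead of a Saturday-night incident.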

The Real Cost

For a 40-person engineering team at a Series B SaaS company:

  • Senior onboarding: 12-16 weeks to full productivity (vs 4-6 weeks on well-documented teams). For a team making 6 senior hires a year, that's roughly $200K-300K in lost productivity annually.
  • Decision latency: 3-5 days for architectural questions requiring tribal knowledge consultation (vs 2-4 hours when codified).
  • Incident response: MTTR roughly doubles when the on-call engineer doesn't have tribal knowledge about the failing system.
  • Senior engineer time: 30-40% of the week spent answering questions instead of building, because they're the only translator between the code and everyone else.
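The onboarding figure above can be sanity-checked with some rough arithmetic. The sketch below is not the model behind the article's numbers. It assumes a fully loaded cost of $250K per senior engineer per year and treats every extra ramp-up week as fully lost, which is a crude upper bound. Both assumptions are mine; swap in your own.

```python
WEEKS_PER_YEAR = 52


def onboarding_loss(
    hires_per_year: int,
    slow_ramp_weeks: int,
    fast_ramp_weeks: int,
    loaded_annual_cost: float,
    lost_fraction: float = 1.0,
) -> float:
    """Estimate yearly cost of the extra ramp-up time across all new hires.

    lost_fraction is the share of a week's output lost during the extra
    ramp-up weeks (1.0 = the whole week, an assumed upper bound).
    """
    extra_weeks = slow_ramp_weeks - fast_ramp_weeks
    weekly_cost = loaded_annual_cost / WEEKS_PER_YEAR
    return hires_per_year * extra_weeks * weekly_cost * lost_fraction


# Assumed loaded cost: $250K/year. Ramp-up ranges come from the bullet above.
low = onboarding_loss(6, 12, 4, 250_000)   # 8 extra weeks per hire
high = onboarding_loss(6, 16, 6, 250_000)  # 10 extra weeks per hire
print(f"${low:,.0f} to ${high:,.0f}")  # prints "$230,769 to $288,462"
```

Under those assumptions the result lands inside the $200K-300K range. If you assume new hires are half-productive during the extra weeks rather than fully blocked, pass `lost_fraction=0.5` and the estimate halves.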

The Bus Factor Connection

The software industry calls this the bus factor. How many people can disappear before a system becomes unmaintainable?

For most teams, the honest answer for their most critical systems is one. Sometimes zero, because the person who understood it already left.

The bus factor problem creates a perverse incentive: the more tribal knowledge you accumulate, the more indispensable you become, and the less time you have to distribute that knowledge. The bottleneck reinforces itself.
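You can get a rough read on bus factor from commit history. The sketch below uses one common heuristic, which is my assumption rather than a standard definition: the smallest number of authors who together account for more than half the commits to a system. Commit counts are a crude proxy for understanding, so treat the result as a prompt for a conversation, not a verdict.

```python
from collections import Counter
from typing import Iterable


def bus_factor(commit_authors: Iterable[str], threshold: float = 0.5) -> int:
    """Smallest number of authors covering more than `threshold` of commits.

    commit_authors has one entry per commit, e.g. the output lines of
    `git log --format=%an -- <path>` for the system you care about.
    """
    counts = Counter(commit_authors)
    total = sum(counts.values())
    if total == 0:
        return 0  # nobody has touched it, so nobody currently knows it

    covered = 0
    # Walk authors from most to least active until the threshold is crossed.
    for people, (_, commits) in enumerate(counts.most_common(), start=1):
        covered += commits
        if covered / total > threshold:
            return people
    return len(counts)


# Hypothetical history for an event system: one person wrote most of it.
event_system = ["marcus"] * 8 + ["ana"] + ["li"]
print(bus_factor(event_system))  # prints 1
```

A result of 1 on a critical system is the signal to start pairing, rotating on-call, and spreading reviews before that one person goes on vacation.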

Why Documentation Doesn't Fix It

Knowledge silos don't form because engineers are bad at documentation. They form because the incentive structure makes documentation irrational.

Writing code is visible, measurable, and rewarded. It ships features. It closes tickets. Writing documentation is invisible, unmeasurable, and unrewarded. Nobody gets promoted for a great Architecture Decision Record.

There's a second structural cause: code evolves faster than documentation. You write a systems overview on Monday. By Thursday, two services have been refactored. The overview is now partially wrong. Partially wrong documentation is worse than no documentation because it creates false confidence.

What Actually Works

1. Make knowledge discoverable, not just written. The problem isn't that knowledge doesn't exist; it's that it can't be found. Tools that analyze your codebase and extract understanding automatically (codebase intelligence) can create a living knowledge layer that stays current as the code changes.

2. Pair programming rotations. One hour of pairing can transfer more knowledge than a week of writing documentation. Schedule regular pairing sessions where the knowledge holder works alongside someone learning the system.

3. Architecture Decision Records (ADRs). Document the why behind decisions. When the original author leaves, successors can understand the reasoning, not just the code.

4. Rotate on-call responsibility. If only one person can handle incidents for a system, that's a bus factor of 1. Add people to the rotation gradually, starting with shadow on-call.

5. Require multi-person code review. For critical systems, require at least one reviewer who is not the primary maintainer. This forces knowledge distribution through the review process.

How to Measure It

Track these signals:

  • Time-to-first-commit for new engineers
  • "Ask X" frequency in Slack (how often people defer to one person)
  • Cross-training coverage (how many people can independently debug each critical system)
  • PR review concentration (whether reviews always go to the same 2-3 people)
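Two of these signals are easy to approximate with a few lines of code. The sketch below is a starting point under loose assumptions: you have already exported reviewer names (one per completed review) and Slack message text by whatever means you use, and the phrase pattern for "Ask X" is a naive guess that will produce false positives such as "ask about".

```python
import re
from collections import Counter
from typing import Iterable


def review_concentration(reviewers: Iterable[str], top_n: int = 3) -> float:
    """Share of all reviews done by the top_n most active reviewers (0.0-1.0)."""
    counts = Counter(reviewers)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(n for _, n in counts.most_common(top_n))
    return top / total


# Naive pattern: "ask Marcus", "check with @marcus". An assumption, tune it.
ASK_PATTERN = re.compile(r"\b(?:ask|check with)\s+@?(\w+)", re.IGNORECASE)


def ask_frequency(messages: Iterable[str]) -> Counter:
    """Count how often each name follows an 'ask'/'check with' phrase."""
    names: Counter = Counter()
    for text in messages:
        for name in ASK_PATTERN.findall(text):
            names[name.lower()] += 1
    return names


# Hypothetical sample data.
reviews = ["marcus", "marcus", "marcus", "ana", "li", "sam"]
messages = [
    "Let me check with Marcus",
    "ask @marcus about the event system",
    "Ask Priya",
    "shipping this today",
]
print(f"{review_concentration(reviews, top_n=2):.0%}")
print(ask_frequency(messages).most_common(1))
```

Track both numbers over time rather than judging a single snapshot. A concentration that keeps climbing, or one name that dominates the "ask" counts, points at the same person the bus factor is warning you about.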

If these signals show knowledge concentrated in a few people and your bus factor is low, you have a tribal knowledge problem. The good news: it's fixable. The bad news: it won't fix itself.


Originally published on getglueapp.com/blog/tribal-knowledge-software-teams

Glue automatically detects knowledge silos and bus factor risks from your git history — no surveys, no manual tracking.
