DEV Community

Daniel Westgaard
Daniel Westgaard

Posted on • Originally published at riftmap.dev

Your senior engineer just gave notice. Most of what they knew was in the repos all along.

Tribal knowledge is two different things wearing one name. The half everyone panics about losing was declared in your Terraform, your Dockerfiles, and your CI config the whole time.

It usually starts with a calendar invite that has no agenda. Thirty minutes, your senior platform engineer, no subject line. You half know before you sit down. They have been here six years. They are leaving in a month.

The first day you feel it as a personal loss, because it is one. The operational version arrives later, usually in a standup. Someone proposes bumping the base image that half the services build from. Routine work. Then somebody asks who actually knows everything that pulls from it, and the room goes quiet, and every face turns very slightly towards the person who is leaving.

That quiet is the sound of a team discovering its bus factor in real time. The reflex that follows is always the same. Get it out of their head before they go. Book the knowledge-transfer sessions. Start a wiki page. Pair them with someone for the notice period and hope.

I want to argue that this reflex is half right, and that the half it gets wrong is the expensive half.

The word that hides two different problems

We call it tribal knowledge, and we say it as though it were one thing. It is not. Two very different kinds of knowledge shelter under that one phrase, and the panic about a departing engineer conflates them, which is why the panic so often spends its energy in the wrong place.

The first kind is genuinely tacit. It is the why. Why the payments service retries three times and not five. Which of the two cloud accounts the staging environment actually bills to, and the historical accident that explains it. Who to call at the vendor when a certificate renewal fails silently, because you have learned the hard way that the support queue will not help you. The incident two years ago whose scar tissue is the reason one config flag exists and must never be flipped. None of this is written down, and most of it cannot be derived from anything. It lives in one person. It will leave with them. This kind of knowledge is real, it is valuable, and getting it out before someone walks out the door is worth doing.

The tools built for this are good at it. Swimm, Confluence, Notion, a decent internal wiki, an afternoon of recorded walkthroughs. The whole category exists to move the contents of a person's head into a form the organisation can read later, and for tacit knowledge that is the right move. There is a reason it so rarely happens, and it is not that teams do not care. It is that the person holding the knowledge does not know they are holding anything unusual. To them, the field that two services name differently for the same value, so that mixing them produces output that is wrong but does not error, is not a secret worth recording. It is just how the thing works. You cannot ask someone to write down what they do not know is worth writing down.

So far, so familiar. Here is the part the panic misses.

Half of it was never tribal

The second kind of knowledge hiding under tribal knowledge is the structural map. Which repositories depend on which. What breaks if the base image moves. Where the shared Terraform module is consumed, and by whom. Which pipelines pull the CI template you are about to edit. Which services still pin the old tag and will fail their next rebuild the moment you ship.

This is what the standup was really asking for when the room went quiet. And it feels identical to the tacit kind, because it also lived in one person's head, and because losing the person feels like losing all of it at once. But it has a property the tacit kind does not, and the whole argument turns on this property.

It was already written down.

Not in a wiki. In the manifests. Every edge your departing engineer carried in their head was declared somewhere in the source, by someone, on purpose. The base-image relationship is a FROM line in a Dockerfile. The module relationship is a source block in Terraform. The chart dependency is a value reference in a Helm chart. The pipeline relationship is an include in GitLab CI or a reusable workflow in GitHub Actions. The library relationship is a require in a go.mod or a line in a lockfile. None of these are tacit. They are facts in plain text, in repositories you already own, waiting for someone to read them.

So why did it ever feel like tribal knowledge? Because nobody else had read all of it. Reading every FROM line and every source block and every include across two hundred repositories, and holding the result in your head as one connected graph, is most of a person's job for a very long time. Your senior did not do it in a sitting. They accreted it, one incident and one migration and one code review at a time, over six years, until they had quietly become the index. When they leave, the index leaves. But the thing the index pointed at, the actual declared structure, is sitting in the repos exactly where it was this morning, entirely unchanged by their resignation.

That is the difference that matters. You genuinely cannot regenerate the why from the source. You can absolutely regenerate the what-depends-on-what from the source, because it was never anywhere else to begin with. One is a memory problem. The other is a parsing problem. The panic treats them as one problem, reaches for a memory solution, the wiki and the handover session, and points it at the thing that was a parsing problem all along.

I want to be precise about the boundary, because this audience will catch me if I am not. Not every coupling between two systems is declared in a manifest. If one service calls another over an internal endpoint that appears in neither side's configuration, no parser will find that edge, and your senior may well have carried it too. That sort of runtime coupling belongs closer to the tacit pile, and it is worth getting onto a diagram while you still can. But the heavy, expensive structure, the build and deploy and infrastructure substrate that everything else stands on, is overwhelmingly declared. That is the part that looks lost when someone leaves and is not.

The handover is the wrong place to rebuild a map

Watch what most teams do with the weeks they have left. They put the departing engineer in a room and ask them to draw the dependency diagram. Map the services. List what depends on the shared module. Write the runbook for the base-image bump. It feels responsible. It is mostly waste, for three reasons.

The first we have already met. They do not know which edges are load-bearing, because to them every edge is just true. They will lovingly document the interesting parts, the clever bits they are proud of, and they will not think to mention the dull tag pin in a sleepy repository that has not changed in a year and will take production down the first time someone bumps the image. The boring edges are the ones that bite. The boring edges are exactly the ones a human brain-dump skips.

The second is that the diagram is stale the moment it is drawn. It is accurate on the day. Then the first migration after they leave moves something, the diagram does not move with it, and nothing tells you it has drifted. Platform teams have been rediscovering this for years under another name. It is the same reason service catalogs rot, and the same reason so many Backstage rollouts quietly stall, which I went through in detail when writing about developer portals. A hand-maintained model of how the system fits together is only ever as accurate as the last person who remembered to update it, and people stop remembering. Developer portals solve real problems and the teams that adopt them are not naive. The catalog rots anyway. A dependency map drawn by hand is a service catalog with a bus factor of one, drawn by the very person who is about to leave.

The third reason is the one that actually matters, and it is why this is not just a tooling preference. The notice period is the single most scarce resource you will have for a long time, and you are spending it on the one kind of knowledge a machine could have reconstructed for nothing, while short-changing the kind that genuinely needed a human. Every hour your senior spends drawing boxes and arrows a parser could have produced in a minute is an hour they are not spending on the why. The vendor contact. The incident scar tissue. The flag that must never flip. That is the knowledge that walks out for good, that the handover should exist to protect, and that gets crowded out because everyone is busy rebuilding a map which was in the repositories the entire time.

Keep them for what only they know

The fix is not a better wiki and it is not a more disciplined handover. It is to stop treating two different problems as one. Separate the piles.

The tacit pile, the why, is what the human's last weeks are for. Sit with them. Record it. Ask the awkward questions about the flag and the vendor and the account. That time is irreplaceable, you will not get it back, so protect it from being eaten by box-drawing.

The structural pile, the what-depends-on-what, does not need the human at all. It needs something to read the manifests across the whole organisation and assemble them into the graph your senior had been assembling by hand. The edges are declared. The only thing ever missing was someone, or something, that had read all of them at once, and kept reading after the person left.

This is the part of the problem I build for. Riftmap reads the declared dependencies across an entire GitHub or GitLab organisation, Terraform, Docker, Helm, CI, package manifests, and builds the cross-repo dependency graph from the source itself, with one read-only token and no catalog to maintain. It is the map your senior held, reconstructed deterministically, and kept current after they are gone, because it re-reads the repositories rather than trusting a diagram somebody drew in their final week. Ask it what breaks if you bump the base image and the answer comes from what the repositories declare today, not from what anyone remembered to write down in March. This is less a new idea than an obvious one once you see the split. Even Meta, with effectively unlimited engineers, landed in the same place on their own pipelines and generated a cross-repo dependency index rather than asking people to maintain a map by hand.

There is a second thing the graph gives you, and if you are the one who just received the resignation it is the thing I would lead with. Once the structure is parsed, you can ask a question the departing engineer could never have answered honestly about themselves. Which of the repositories that everything else depends on are maintained by exactly one person. The high-blast-radius, single-maintainer substrate. The build image, the shared CI template, the base module the whole organisation leans on, that it turns out precisely one human has touched in a year. That is your next resignation, visible before it arrives. An engineer I have a lot of respect for, Owen Zanzal, pushed me towards this framing, and it is worth a post of its own, which is coming. The short version is that the moment you have the dependency graph, ownership stops being a question of who wrote the code and becomes a question of who maintains the things everyone else is standing on.

The map did not leave

When the person who understood how everything fit together hands in their notice, it feels as though the map is leaving with them. It is not. The map was in your manifests the whole time. They were simply the only one who had read all of it. Keep their last weeks for the things only they know. The rest was never theirs to take.


Related reading

Top comments (0)