Opinion December 30, 2024

The Bus Factor Problem in Legacy Codebases

Your most important codebase is understood by one person. If they leave, you have a problem you cannot hire your way out of quickly.

[Image: a single developer at a desk surrounded by stacks of documentation and legacy code printouts, symbolising concentrated knowledge]

The bus factor is a blunt metric with a morbid name. It asks a simple question: how many people on your team can be removed – by resignation, illness, transfer, or the proverbial bus – before the project stalls? If the answer is one, your project has a bus factor of one, and you are one departure away from a crisis.

In theory, modern engineering practices should prevent this. Code review distributes knowledge. Documentation captures decisions. Shared ownership means multiple people understand each part of the system. In practice, legacy codebases violate all of these assumptions. And legacy codebases are exactly the systems where a bus factor of one is most dangerous, because they are the systems that are hardest to understand from scratch.

Why legacy codebases have a bus factor of one

Legacy codebases develop a bus factor of one through a process that nobody intends and everybody contributes to.

Tribal knowledge replaces documentation. The codebase was written over years, often by a small team or a single developer. Decisions were made in conversations, Slack messages, and mental models that were never written down. Why does the payment module use a different database connection than the rest of the application? Because there was a performance issue in 2019 and the lead developer added a second connection pool as a workaround. That decision is not documented anywhere. It lives in one person's memory.

Over time, the number of undocumented decisions grows. Each one is individually small. Collectively, they form a body of knowledge that is essential for maintaining the system and impossible to reconstruct from the code alone. The person who holds this knowledge becomes irreplaceable, not because they are uniquely talented, but because the knowledge they carry has no backup.

Specialisation deepens over time. In any team, people naturally gravitate towards areas of the codebase they know well. The developer who built the reporting module handles all reporting tickets. The developer who understands the authentication flow handles all security changes. This is efficient in the short term. Every ticket goes to the person who can resolve it fastest.

But efficiency today creates fragility tomorrow. After two years of specialisation, nobody else on the team has touched the reporting module. Nobody else knows its quirks, its edge cases, or the reasons behind its unusual architecture. If the reporting specialist leaves, the team inherits a module they have never worked on, do not understand, and cannot safely modify without extensive exploration.

No shared understanding of the whole system. PR-based code review creates knowledge about individual changes, not about the system as a whole. A developer who reviews a PR touching the payment module learns about that specific change. They do not learn how the payment module fits into the broader architecture, what its dependencies are, or why it was designed the way it was.

In a legacy codebase, the whole-system understanding is the critical knowledge. Individual changes make sense only in context. Without that context, developers can review a PR and approve it without understanding whether the change is consistent with the system's broader patterns. The knowledge remains concentrated in the one or two people who built or maintained the system over time.

What happens when that person leaves

When the person with the concentrated knowledge departs, the consequences unfold in stages.

Stage one: immediate questions go unanswered. Within the first week, the team encounters a question about the codebase that only the departed developer could have answered. Why does this module behave differently in production than in staging? What is the correct procedure for migrating the database schema? Why does this function have a seemingly redundant check? The answers are gone.

Stage two: velocity drops. Changes that would have taken the specialist an afternoon now take the remaining team days. They are not less capable. They simply lack the contextual knowledge that made the specialist fast. Every task involves exploration, experimentation, and caution. The fear of breaking something they do not understand slows every decision.

Stage three: workarounds accumulate. Faced with a module they do not fully understand, developers take the safe path. They add code around the existing module rather than modifying it. They duplicate functionality rather than extending what is already there. They write wrappers and adapters instead of integrating properly. The codebase grows more complex, not because the team lacks skill, but because they lack confidence in the existing code.

Stage four: incidents trace to knowledge gaps. A production incident occurs. The root cause is in the legacy module. The team investigates, but debugging takes three times longer than it would have with the specialist on hand, because nobody understands the module's intended behaviour well enough to distinguish a bug from a feature. A workaround applied in stage three interacts unexpectedly with the module's original logic. The workaround is now part of the problem.

Stage five: the rewrite temptation. Frustrated by the mounting complexity and the difficulty of maintaining code they did not write and do not understand, someone proposes a rewrite. The rewrite seems attractive because it promises a fresh start with shared ownership. But rewrites of legacy systems are notoriously risky. They take longer than expected, introduce their own bugs, and often fail to replicate the subtle behaviours that the legacy system had accumulated over years of real-world use.

You cannot hire your way out of this

The instinctive response to a bus factor problem is to hire. Bring in a senior developer who can learn the system. But this is slower than it appears.

A new developer joining a well-documented codebase with good test coverage, consistent patterns, and an active team that can answer questions might reach full productivity in three to six months. A new developer joining an undocumented legacy codebase with low test coverage, inconsistent patterns, and no one available to explain the design decisions might take twelve months or longer.

During that ramp-up period, the new developer is not just learning. They are also guessing. Without documentation or a knowledgeable team member to consult, they must infer intent from code. And code does not always communicate intent. A function that handles a rare edge case looks the same as a function that handles a common case. A workaround for a long-fixed bug looks the same as intentional logic. The new developer cannot tell the difference, so they treat everything as intentional, preserving complexity that should have been removed.

Solutions: reducing bus factor systematically

Reducing bus factor is not a single action. It is a set of practices that distribute knowledge over time.

Systematic documentation. Not documentation for documentation's sake. Targeted documentation that captures the decisions that would be lost if a key person left. Why was this architecture chosen? What alternatives were considered? What are the known limitations? What workarounds exist and why? These are architecture decision records (ADRs), not API docs. The goal is to capture the “why,” not the “what.”

Rotation and pairing. Deliberately rotate developers across modules. When the reporting specialist goes on holiday, someone else handles reporting tickets. When a complex change is needed in the authentication flow, pair the specialist with another developer. This is slower in the short term, but it dramatically reduces risk in the long term.

Code review as knowledge transfer. Code review should not be a gate. It should be a teaching moment. When a specialist submits a PR in their area, the reviewer should be someone who does not usually work on that code. The reviewer asks questions. The specialist explains. Both people learn. The knowledge transfers, incrementally, review by review.
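The reviewer choice described above can be sketched in a few lines. This is a minimal illustration, not a prescribed process: the function name pick_reviewer, the shape of commit_counts, and the alphabetical tie-break are all assumptions made for the example.

```python
def pick_reviewer(module, author, team, commit_counts):
    """Pick the team member, other than the author, least familiar with `module`.

    `commit_counts` is a hypothetical mapping of (person, module) -> number of
    commits; anyone missing from it is treated as having zero commits there.
    """
    candidates = [person for person in team if person != author]
    if not candidates:
        raise ValueError("no eligible reviewer besides the author")
    # Fewest commits first; ties are broken alphabetically so the result is stable.
    return min(candidates, key=lambda p: (commit_counts.get((p, module), 0), p))


# Example: ana owns reporting, ben has touched it a little, cy never has.
team = ["ana", "ben", "cy"]
counts = {("ana", "reporting"): 40, ("ben", "reporting"): 5}
reviewer = pick_reviewer("reporting", "ana", team, counts)
```

In practice a team would probably cap how often the same newcomer is chosen, or pair them with a second reviewer who knows the code, so that learning does not come at the cost of missed defects.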

Shared reports and codebase visibility. A full-codebase review produces a structured report that everyone on the team can read. The report describes the architecture, identifies patterns and inconsistencies, flags risks, and provides context for each finding. This shared artefact gives the entire team a common understanding of the codebase, even if they have not personally worked on every module.

A team that reads the same review report is a team that shares a baseline understanding. They may not know every detail, but they know the overall structure, the major risk areas, and the design patterns in use. That shared baseline is the antidote to tribal knowledge.

Measuring bus factor

Bus factor is difficult to measure precisely, but you can approximate it. Look at your git history. For each module, count how many developers have committed changes in the past twelve months. If a module has only one contributor, its bus factor is one. If it has five contributors, its bus factor is higher – though not necessarily five, because committing a change is not the same as understanding the module.
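The count described above can be approximated with a short script. The sketch below rests on two simplifications that are assumptions rather than anything stated here: a "module" is taken to be a top-level directory, and an author email is taken to identify one developer. The function names and the "@@" marker are hypothetical.

```python
import subprocess
from collections import defaultdict

# Prefix printed before each author email so commit lines can be told apart
# from file paths in the log output.
COMMIT_MARKER = "@@"


def read_git_log(repo_path="."):
    """Return the last twelve months of history as author lines plus file paths."""
    result = subprocess.run(
        ["git", "log", "--since=12 months ago", "--no-merges", "--name-only",
         f"--format={COMMIT_MARKER}%ae"],
        cwd=repo_path, capture_output=True, text=True, check=True,
    )
    return result.stdout


def contributors_per_module(log_text, depth=1):
    """Map each module (first `depth` directories of a path) to its set of authors."""
    authors = defaultdict(set)
    current_author = None
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(COMMIT_MARKER):
            current_author = line[len(COMMIT_MARKER):]
        elif current_author is not None:
            directories = line.split("/")[:-1]  # drop the file name
            # Files at the repository root are grouped under "(root)".
            module = "/".join(directories[:depth]) or "(root)"
            authors[module].add(current_author)
    return dict(authors)


# Usage against a real repository (requires git on the PATH):
#   modules = contributors_per_module(read_git_log("/path/to/repo"))
#   for name, people in sorted(modules.items()):
#       print(name, len(people))
```

A contributor count produced this way is an upper bound on the bus factor, for the reason given above: someone who made a one-line fix still counts as a contributor.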

A more meaningful measure combines commit history with review history. If a module has one primary author but three people have reviewed changes to it, the knowledge distribution is wider than the commit history alone would suggest.
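A rough way to combine the two sources is sketched below. Review history is not stored in git itself, so the reviewers_by_module input is assumed to come from whatever code-hosting platform the team uses; the function name and the output fields are hypothetical.

```python
def knowledge_spread(authors_by_module, reviewers_by_module):
    """Summarise, per module, who has written or reviewed changes to it.

    Both arguments map a module name to a collection of people. A person who
    both commits and reviews is counted once, as an author.
    """
    summary = {}
    for module in set(authors_by_module) | set(reviewers_by_module):
        authors = set(authors_by_module.get(module, ()))
        reviewers_only = set(reviewers_by_module.get(module, ())) - authors
        summary[module] = {
            "authors": len(authors),
            "reviewers_only": len(reviewers_only),
            "total": len(authors | reviewers_only),
        }
    return summary


# The example from the text: one primary author, three reviewers.
spread = knowledge_spread(
    {"payments": {"dana"}},
    {"payments": {"eli", "fay", "gus"}},
)
```

Keeping authors and reviewers as separate counts, rather than a single score, avoids pretending that reviewing a change conveys the same depth of knowledge as writing it.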

The goal is not a precise number. It is an awareness of where knowledge is concentrated. If your git history reveals that 40% of your modules have a single contributor, you know where your risk is. You can then prioritise those modules for documentation, rotation, and review.
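Turning per-module contributor sets into that kind of headline figure takes only a few lines. The sketch below assumes a mapping of module name to contributors, such as one derived from git history; the function name and the sample data are illustrative only.

```python
def single_contributor_modules(authors_by_module):
    """Return the modules with exactly one contributor and their share of all modules."""
    solo = sorted(
        module for module, people in authors_by_module.items() if len(people) == 1
    )
    share = len(solo) / len(authors_by_module) if authors_by_module else 0.0
    return solo, share


# Illustrative data: two of five modules have a single contributor.
sample = {
    "payments": {"dana"},
    "reporting": {"eli"},
    "auth": {"dana", "fay"},
    "api": {"eli", "fay", "gus"},
    "ui": {"fay", "gus"},
}
solo, share = single_contributor_modules(sample)
print(f"{share:.0%} of modules have a single contributor: {', '.join(solo)}")
```

The sorted list of single-contributor modules is the useful output: it is the shortlist for documentation, rotation, and review.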

VibeRails and shared codebase understanding

VibeRails performs full-codebase reviews that produce structured reports accessible to the entire team. The report covers architecture, patterns, inconsistencies, risks, and recommendations – the kind of information that typically lives in one person's head.

When everyone on the team can read a comprehensive analysis of the codebase, the knowledge monopoly breaks. The report does not replace deep expertise – nothing does – but it raises the baseline understanding for everyone. A developer who has never touched the payment module can read the review findings for that module and understand its structure, its known issues, and its relationship to the rest of the system.

That shared understanding is the difference between a team that stalls when someone leaves and a team that continues, perhaps more slowly, but without crisis. The bus factor does not have to be one. It is one because knowledge is concentrated. Distribute the knowledge, and the risk distributes with it.