Why Most Code Review Tools Focus on the Wrong Thing

The entire code review industry is built around pull requests. That means most tools only ever look at the lines that changed – and ignore the vast majority of the codebase where the real risk lives.


Open any list of code review tools and you will notice a pattern. Nearly every one of them is built around the pull request. They integrate with GitHub, GitLab, or Bitbucket. They trigger on PR events. They analyse diffs. They post inline comments on changed lines. The entire workflow assumes that the unit of review is the set of lines that changed between two commits.

This makes intuitive sense. The pull request is the moment when new code enters the codebase. It is the gate through which changes pass. Reviewing changes at the gate feels like the responsible thing to do. And it is – but it is also profoundly incomplete.

When a tool reviews only the diff, it is reviewing the 2% of the codebase that changed today. The other 98% – the code that was already there, the code that nobody has looked at in years, the code that was written by someone who left the company three jobs ago – remains entirely outside the scope of review.

That is the wrong thing to focus on. Not because PR review is bad, but because it creates a false sense of coverage that obscures where the real risk actually lives.


The streetlight effect

There is an old joke about a person searching for their keys under a streetlight. A passer-by asks where they lost them. The person points to the other side of the street. Then why search here? Because the light is better.

PR-scoped review suffers from the same problem. We review diffs because diffs are visible, bounded, and tractable. A pull request has a clear beginning and end. It fits into a workflow. It generates notifications. It has an author and a reviewer and a merge button. Everything about it is designed to be actionable.

The rest of the codebase has none of those properties. There is no event that triggers a review of code that already exists. There is no notification when a module that was written four years ago becomes a liability. There is no merge button for deciding to look at something that has been quietly accumulating problems since before half the team was hired.

So we do not look. We focus on the streetlight – the diff – because the tooling makes it easy and the workflow makes it natural. And we call that coverage.


What diff-scoped review actually catches

To be fair, PR review catches genuine problems. A reviewer looking at a diff can spot logic errors in new code, catch missing error handling in a new function, flag a security issue in a newly introduced endpoint, or notice that a change contradicts an existing convention. These are real, valuable catches.

But consider the categories of problems that diff-scoped review structurally cannot catch:

Systemic inconsistency. Your codebase has three different patterns for database access, two incompatible approaches to error handling, and four ways to validate user input. None of those inconsistencies were introduced in a single PR. They accumulated over months or years, one reasonable-looking change at a time. No diff will show you the systemic picture because no single diff created the problem.

Dormant security vulnerabilities. A SQL injection vulnerability that has existed in a utility function since 2019 will never appear in a diff unless someone happens to modify that function. The vulnerability is not new. It is not changing. It is just sitting there, waiting. PR review cannot find it because PR review only looks at things that move.

Architectural drift. The system was designed with clear boundaries between modules. Over time, those boundaries eroded. Services that should not know about each other now share types. A module that was supposed to be a thin adapter now contains business logic. This drift happened incrementally – each individual PR looked fine. Only the aggregate picture reveals the problem.

Dead code and abandoned paths. Features were built and then abandoned. Configuration options were added for experiments that ended years ago. Entire modules exist that are no longer called from anywhere. None of this dead code generates diffs. It just sits in the repository, confusing new developers and inflating complexity metrics.

Cross-cutting concerns. Logging is inconsistent across services. Authentication checks are applied in some controllers but not others. Rate limiting exists on half the API endpoints. These are not problems that live in one file or one diff. They are patterns – or the absence of patterns – that only become visible when you look at the codebase as a whole.


The false sense of coverage

The most insidious effect of diff-scoped review is not what it misses. It is the confidence it creates. Teams that review every PR believe they have thorough code review. They have metrics: review coverage percentage, time to first review, percentage of PRs with at least one approval. The numbers look good. The process is documented. The dashboards are green.

But those metrics measure process compliance, not codebase coverage. A team can review 100% of pull requests and still have reviewed less than 10% of the codebase in any given year. The remaining 90% – the stable, unchanged, load-bearing code that the entire system depends on – has not been looked at by anyone in months or years.

This creates a specific kind of organisational blind spot. When leadership asks whether the codebase is well-reviewed, the answer is yes. When an incident traces back to a vulnerability in code that was written three years ago, the question becomes: how did this get past review? The answer is that it did not get past review. Review never looked at it. Review was never designed to look at it. The tooling was not built for it.


Why the industry built it this way

It is worth understanding why the entire code review tool market converged on PR-scoped review. There are practical reasons.

First, the PR is the natural integration point. Git hosting platforms expose webhook events for pull requests. Building a tool that triggers on PR creation and reads the diff is straightforward. Building a tool that analyses an entire codebase is orders of magnitude harder.
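To illustrate how low that bar is, here is a minimal sketch of a PR-triggered tool. The event name, `action` values, and `diff_url` field follow GitHub's webhook payload schema; the `review_diff` callable is a placeholder for whatever analysis the tool runs.

```python
# Sketch: the entire integration surface of a diff-scoped review tool
# is roughly this one hook point.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def handle_event(event_type, payload, review_diff):
    # React only when a PR is opened or gets new commits.
    if event_type == "pull_request" and payload.get("action") in ("opened", "synchronize"):
        review_diff(payload["pull_request"]["diff_url"])

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        handle_event(self.headers.get("X-GitHub-Event"), json.loads(body),
                     review_diff=print)  # placeholder reviewer
        self.send_response(204)
        self.end_headers()

# Wiring it up is one line:
# HTTPServer(("", 8080), WebhookHandler).serve_forever()
```

There is no equivalent webhook for "this module has not been looked at in four years", which is why the tooling ecosystem grew around the event that exists.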

Second, incremental review scales. Reading a 200-line diff takes minutes. Reading 400,000 lines of existing code takes weeks – for a human reviewer. The economics of manual review make full-codebase review impractical. So the industry optimised for the scope that humans can handle.

Third, the feedback loop is tight. Review a PR, merge it, move on. Full-codebase review has no natural cadence, no obvious trigger, and no built-in workflow. It is harder to sell a tool that does not fit into an existing process.

These are all legitimate constraints. But they are constraints of the tooling, not constraints of what teams actually need. The fact that it is hard to review the full codebase does not mean that reviewing only the diff is sufficient. It means there has been a gap between what is possible and what is necessary.


What changes when you review the whole thing

When you look at the entire codebase – not just the latest diff – the picture changes dramatically.

You can see inconsistencies that span modules. You can identify code that has not been touched in years but sits on a critical path. You can find security vulnerabilities that predate your team. You can map the gap between your documented architecture and what the code actually does.

Full-codebase review does not replace PR review. It complements it. PR review gates new changes. Full-codebase review audits the existing system. Together, they provide actual coverage – not just coverage of things that happened to change this week.

The economics of full-codebase review have changed. What used to require weeks of senior engineer time can now be done by AI in hours. Large language models can read and reason about hundreds of thousands of lines across an entire repository. They can identify patterns, flag inconsistencies, and surface dormant issues that no diff-scoped tool would ever see.

This is not a theoretical improvement. It is a category of review that was previously impractical and is now feasible. The question for teams is whether they are willing to look at the part of the codebase they have been ignoring – or whether they will continue searching under the streetlight because that is where the tools happen to point.


The real risk is what you are not reviewing

Every team has code they are not reviewing. It is the code that was there before the current team arrived. It is the code that works, so nobody has a reason to change it. It is the code that is too complex to modify without a month of context-gathering.

That code is not inert. It is running in production. It is handling user data. It is processing transactions. And the only reason nobody has found the problems in it is that nobody has looked.

Most code review tools focus on the wrong thing. They focus on the change, not the system. They focus on the diff, not the codebase. They provide a narrow view of each increment, accurate as far as it goes, while systematically missing the large-scale, structural, pre-existing issues that cause the most expensive incidents.

The diff matters. But it is not the whole picture. And until your review process includes the code that is not changing, you do not have code review. You have change review. The distinction is not semantic. It is the difference between knowing that today's changes look fine and knowing that your system is sound.


Limits and tradeoffs

  • AI-driven full-codebase review can miss context. Treat findings as prompts for investigation, not verdicts.
  • False positives happen. Plan a quick triage pass before you schedule work.
  • Privacy depends on your model setup. If you use a cloud model, relevant code is sent to that provider; local models can keep inference on your own hardware.