Why PR Review Alone Isn't Enough

PR review has a structural limitation: it can only evaluate what changed. It can't evaluate what's already there.

Your team has a solid code review culture. Every pull request gets at least one reviewer. Linting runs in CI. Tests have to pass before merge. You're doing the right things.

And yet, the codebase has problems that none of those practices caught. There are three different logging patterns across modules. There's a utility class with 400 lines of code that nothing calls anymore. The service layer was supposed to be the only thing that talks to the database, but four controllers query it directly. The authentication middleware was last meaningfully updated eighteen months ago.

These problems didn't slip through code review. They're invisible to code review. PR review has a structural blind spot, and understanding that blind spot is the first step toward addressing it.


The diff is the scope

When a reviewer opens a pull request, they see a diff. Changed lines are highlighted. Surrounding context is shown. The reviewer evaluates the change: is it correct, is it clean, does it follow conventions, does it handle edge cases?

This scope is inherent to the process. PR review is designed to evaluate changes, not the codebase itself. The existing code is the backdrop, not the subject. No one opens a PR review and thinks: “Let me also check whether the 50 files that this PR didn't touch are still well-structured.”

That's not a failure of discipline. It's a structural property of the process. PR review evaluates deltas. Everything that isn't a delta is outside its scope.


Blind spot #1: Inconsistent patterns across modules

This is the most common problem that PR review can't catch. Here's how it develops.

In January, a developer introduces an API client that uses async/await with try/catch error handling. The PR is reviewed and approved. The pattern is clean, the code works, the tests pass.

In March, a different developer on a different module builds another API client. This one uses promises with .then()/.catch() chains. The PR is reviewed and approved. The pattern is clean, the code works, the tests pass.

In June, a third developer builds a data sync service. They use callbacks with an error-first convention. Reviewed, approved, shipped.

Each PR was internally consistent. Each reviewer evaluated the code against its immediate context and found it acceptable. But the codebase now has three different async error-handling patterns. No single PR introduced this inconsistency. It emerged from the aggregate of individually reasonable decisions, reviewed in isolation.
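Condensed, the three coexisting styles might look like this. This is an illustrative sketch, not code from any real project: the function names and the `fakeRequest` stub are made up, and each function stands in for a whole module.

```typescript
type Callback<T> = (err: Error | null, result?: T) => void;

// Stand-in for a real network call.
function fakeRequest(url: string): Promise<string> {
  return Promise.resolve(`response from ${url}`);
}

// January's module: async/await with try/catch.
async function fetchUserAwait(url: string): Promise<string | null> {
  try {
    return await fakeRequest(url);
  } catch (err) {
    console.error("fetch failed", err);
    return null;
  }
}

// March's module: promise chains with .then()/.catch().
function fetchUserThen(url: string): Promise<string | null> {
  return fakeRequest(url).catch((err) => {
    console.error("fetch failed", err);
    return null;
  });
}

// June's module: error-first callbacks.
function fetchUserCallback(url: string, cb: Callback<string>): void {
  fakeRequest(url)
    .then((body) => cb(null, body))
    .catch((err) => cb(err));
}
```

Each function is fine on its own, which is exactly what each reviewer saw. The problem only appears when the three are placed side by side, which no diff ever does.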

The only way to catch this is to look at the codebase across modules simultaneously. PR review, by definition, doesn't do that.


Blind spot #2: Dead code and unused abstractions

Dead code doesn't appear in diffs. That's what makes it invisible to PR review.

Consider a common scenario. A team builds a notification system with a plugin architecture: email, SMS, push, webhook. Over time, the product evolves. SMS notifications get deprecated. The feature flag is turned off. The UI entry point is removed.

But the SMS plugin code – 800 lines of implementation, configuration, and tests – stays in the repository. The PR that removed the UI entry point didn't touch those files. The reviewer of that PR had no reason to look at the SMS module. It just sits there, taking up space in the codebase, confusing new developers who wonder whether it's still active.

The same thing happens with abstractions. A team introduces a base class to handle common repository logic. Over two years, the codebase evolves. Three of the four subclasses are refactored to use a different approach. The base class now has exactly one consumer. The abstraction is no longer justified – it adds complexity without providing the generalization it was designed for.

No PR introduced this problem. The problem is the absence of a removal that nobody thought to make, because nobody was looking at the codebase from high enough to see it.


Blind spot #3: Architectural drift

Every codebase has an intended architecture. Maybe it's a layered architecture with controllers, services, and repositories. Maybe it's a modular monolith with clear domain boundaries. Maybe it's microservices with defined communication patterns.

Architectural drift happens when individual changes are small and pragmatic but their accumulated effect is a codebase that no longer matches its intended structure.

A controller needs a quick database query for a one-off feature. Rather than threading it through the service layer, the developer adds a direct database call. The PR reviewer sees a small, pragmatic change. It ships.

Six months later, twelve controllers have direct database calls. The service layer is still there, but it's now optional – sometimes the codebase uses it, sometimes it doesn't. The architecture diagram on the wiki shows clean layers. The actual code has shortcuts everywhere.

Each individual shortcut was reviewed and approved. None of them, in isolation, was a bad decision. But the aggregate effect is an architecture that no longer constrains anything. The team's mental model of the system and the system itself have diverged.

PR review can't detect this because each PR only shows one shortcut. The pattern – the fact that shortcuts are accumulating and eroding the architecture – is only visible when you look at the whole codebase.


Necessary but not sufficient

None of this is an argument against PR review. PR review is essential. It catches bugs, enforces conventions, spreads knowledge across the team, and maintains a baseline of quality for incoming changes. Keep doing it.

But recognize what it is: a quality gate for changes. It's not a quality assessment of the codebase. Those are different activities with different scopes, and one doesn't substitute for the other.

What fills the gap is periodic full-codebase review: reading every file, evaluating the project as a whole, and identifying the cross-cutting problems that no individual PR review could catch. This is what code audits are supposed to do – and what most teams skip because it used to require weeks of expensive senior time.

AI has changed that equation. A full-codebase review that would have taken a senior consultant two weeks can now be produced in hours. The output is a structured set of findings – inconsistencies, dead code, architectural violations, security gaps – that your team triages using their own judgement.

PR review catches problems at the point of introduction. Full-codebase review catches problems of accumulation. You need both. The first one, your team already does. The second one is now practical.


Limits and tradeoffs

  • An AI review can miss context. Treat findings as prompts for investigation, not verdicts.
  • False positives happen. Plan a quick triage pass before you schedule work.
  • Privacy depends on your model setup. If you use a cloud model, relevant code is sent to that provider; local models can keep inference on your own hardware.