The Architecture Review Nobody Does

Your team reviews every pull request. But when was the last time anyone sat down and reviewed whether the architecture still makes sense? For most teams, the answer is never.

[Image: a whiteboard architecture diagram with arrows and boxes, some sections faded and outdated while new annotations overlap the original design]

Most software teams have a code review process. Pull requests get opened, reviewed, and merged. Comments are left. Approvals are given. The mechanics work.

But ask those same teams when they last reviewed their architecture – not a single PR, not a module, but the system as a whole – and you will usually get silence. Architecture review is something everyone agrees is important and almost nobody actually does.

The result is architectural drift: a slow, invisible divergence between what the system was designed to be and what it has actually become. It happens in every codebase. The question is whether you catch it early enough to do something about it or whether you discover it during a production incident.


How architectural drift happens

Nobody designs architectural drift. It emerges from hundreds of individually reasonable decisions made under deadline pressure by people who may not have the full system context.

A developer needs to add a feature. The cleanest place to put it is in Module A, but Module A is owned by another team and getting a change through their review process takes a week. So the developer puts it in Module B, where they have commit access. The PR looks fine. The feature works. The reviewer approves it. And now Module B has a responsibility it was never designed to carry.

Multiply that by a hundred decisions across two years and you have a system where the actual boundaries between modules no longer match the intended boundaries. Services that were supposed to be independent now share database tables. A module labelled “utils” has grown into the most coupled component in the system. The API gateway contains business logic. The data layer makes HTTP calls.

Each individual change was small, reviewed, and approved. The drift was not introduced in any single PR. It accumulated, one reasonable shortcut at a time.


Why PR review cannot catch it

Pull request review is, by design, scoped to the change being proposed. A reviewer looks at the diff and asks: does this change look correct? Is the logic sound? Are there obvious bugs?

Those are the right questions for a PR review. They are the wrong questions for an architecture review. Architectural concerns are not about whether a single change is correct. They are about whether the accumulation of correct changes has moved the system in an unintended direction.

Consider dependency direction. Your architecture says that the presentation layer depends on the business layer, which depends on the data layer. No PR introduces a dependency inversion in isolation. But over time, a utility module in the data layer starts importing a type from the business layer. Then a helper in the presentation layer starts calling a data layer function directly. Each import looks harmless in the context of its PR. The dependency graph, viewed as a whole, has become a tangled mess.
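The layering rule above is mechanical enough to check automatically. Here is a minimal sketch: given an assumed layer ordering and a list of illustrative import edges (the module names are hypothetical, not from any real codebase), flag every dependency that points the wrong way up the stack.

```python
# Layers, ordered from top of the stack to bottom. Illustrative assumption.
LAYER_ORDER = {"presentation": 0, "business": 1, "data": 2}

# (importer, its layer) -> (imported module, its layer). Hypothetical edges.
imports = [
    (("presentation.views", "presentation"), ("business.orders", "business")),  # allowed
    (("business.orders", "business"), ("data.repo", "data")),                   # allowed
    (("data.utils", "data"), ("business.models", "business")),                  # inversion
]

def find_inversions(edges):
    """Return edges where a lower layer imports from a higher one."""
    bad = []
    for (src, src_layer), (dst, dst_layer) in edges:
        # Dependencies must point down the stack: a module may only import
        # from its own layer or from layers with a higher order number.
        if LAYER_ORDER[dst_layer] < LAYER_ORDER[src_layer]:
            bad.append((src, dst))
    return bad

print(find_inversions(imports))  # [('data.utils', 'business.models')]
```

Run against real import data, a check like this turns "the dependency graph has become a tangled mess" from a feeling into a list.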

PR review cannot catch this because PR review does not look at the dependency graph. It looks at the lines that changed. The graph is an emergent property of thousands of PRs, and no single reviewer ever sees the whole picture.


What architectural drift costs

The cost of architectural drift is not immediately visible. Systems with drifted architectures still work. Features still ship. Tests still pass. The costs manifest as friction, not failure.

Changes become harder to scope. When module boundaries are unclear, a change in one area has unexpected consequences in another. Developers learn to pad their estimates because they know from experience that what looks like a two-day task will uncover hidden dependencies that turn it into a week.

Onboarding slows down. New developers read the documentation and form a mental model of how the system works. Then they read the code and discover that the documentation describes a system that no longer exists. The time spent reconciling the documented architecture with the actual architecture is pure waste.

Incidents become harder to diagnose. When an error occurs, understanding how data flows through the system is essential for diagnosis. In a system with clear architecture, you can reason about the flow. In a drifted system, the flow is unpredictable. A request might pass through components in an order that nobody anticipated because the actual call graph diverged from the designed one.

Refactoring becomes a major project. Fixing architectural drift after it has accumulated for years is a large, risky undertaking. The longer drift goes unaddressed, the more entangled the system becomes, and the harder it is to restore clear boundaries without breaking things.


What an architecture review actually examines

An architecture review is different from a code review in scope, in the questions it asks, and in the kind of findings it produces. Here is what it should cover.

Dependency structure. Do the actual dependencies between modules match the intended architecture? Are there circular dependencies? Are there modules that depend on everything, creating a coupling bottleneck? Visualise the dependency graph and compare it to the architecture diagram. The gaps are your drift.
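Circular dependencies in particular are easy to detect once you have the graph as data. A sketch, using a toy adjacency list (the module names are made up for illustration) and a depth-first search that reports the first cycle it finds:

```python
# Illustrative module dependency graph: module -> modules it imports.
graph = {
    "api": ["orders", "utils"],
    "orders": ["billing", "utils"],
    "billing": ["orders"],   # orders -> billing -> orders is a cycle
    "utils": [],
}

def find_cycle(graph):
    """Return one dependency cycle as a list of modules, or None."""
    visiting, visited = set(), set()

    def dfs(node, path):
        visiting.add(node)
        path.append(node)
        for dep in graph.get(node, []):
            if dep in visiting:                       # back edge: cycle found
                return path[path.index(dep):] + [dep]
            if dep not in visited:
                cycle = dfs(dep, path)
                if cycle:
                    return cycle
        visiting.discard(node)
        visited.add(node)
        path.pop()
        return None

    for node in graph:
        if node not in visited:
            cycle = dfs(node, [])
            if cycle:
                return cycle
    return None

print(find_cycle(graph))  # ['orders', 'billing', 'orders']
```

Dedicated tools do this better, but the point is that cycle detection is cheap: there is no excuse for not knowing whether your modules form loops.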

Responsibility distribution. Does each module have a clear, coherent purpose? Or have modules accumulated responsibilities over time that make them hard to name, hard to test, and hard to reason about? A module named “helpers” or “utils” that has grown to thousands of lines is a symptom of unclear responsibility boundaries.

Data flow. How does data move through the system? Are there unexpected paths where data bypasses the layers it should pass through? Is the same data transformed in multiple places using different logic? Data flow inconsistencies are a common source of bugs that are nearly impossible to find by reviewing individual files.

Cross-cutting consistency. How are concerns like authentication, authorisation, logging, error handling, and input validation implemented across the system? Are they consistent, or does each module implement its own approach? Inconsistency in cross-cutting concerns is one of the most common findings in architecture reviews, and it is almost invisible at the PR level.

Interface contracts. Are the interfaces between modules well-defined and respected? Or have implementations leaked through abstractions, creating tight coupling that defeats the purpose of having interfaces in the first place? When a module reaches into another module's internals rather than going through the defined interface, the architecture has been bypassed.
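What "reaching into another module's internals" looks like is easiest to see in miniature. A hypothetical repository class, with one caller going through the defined interface and another bypassing it:

```python
class UserRepository:
    """Hypothetical example: `get` is the contract, `_cache` is a detail."""

    def __init__(self):
        self._cache = {}            # internal detail, free to change

    def get(self, user_id):
        """The defined interface: callers should only use this."""
        return self._cache.get(user_id, {"id": user_id})

repo = UserRepository()

good = repo.get(42)                 # respects the interface
bad = repo._cache.get(42)           # bypasses it: this caller now breaks
                                    # if the cache is renamed or removed
print(good, bad)  # {'id': 42} None
```

The bypassing call even returns a different answer, because it skips the fallback logic the interface provides. Multiply that by dozens of callers and the abstraction is no longer an abstraction.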


Why teams skip it

If architecture review is this important, why does almost nobody do it? There are several structural reasons.

There is no natural trigger. PR review happens automatically because PRs happen automatically. Architecture review has no built-in trigger in the development workflow. Nobody opens a ticket that says “review the architecture.” There is no webhook for it. It has to be deliberately scheduled, which means it has to compete with feature work for time and attention. It usually loses.

It requires broad context. A PR can be reviewed by anyone familiar with the module being changed. An architecture review requires someone who understands the system as a whole – the intended design, the current state, and the history of how the two diverged. In many teams, only one or two people have that context, and they are usually the busiest people on the team.

The output is uncomfortable. Architecture review findings tend to be large. They do not fit into a Jira ticket. Fixing them requires coordinated effort across modules and teams. The findings often implicitly criticise years of accumulated decisions. This makes architecture review politically and practically difficult in a way that PR review is not.

It was historically impractical at scale. Manually reviewing the architecture of a large codebase takes weeks of concentrated effort from a senior engineer or architect. For a 500,000-line codebase, a thorough architecture review can cost tens of thousands of dollars' worth of engineering time. Most teams cannot justify that cost on a regular cadence.


How to actually do it

The good news is that architecture review does not have to be an all-or-nothing proposition. There are practical approaches that make it feasible.

Start with the dependency graph. Generate a module dependency visualisation for your codebase. Most languages have tools that can produce this. Look for cycles, unexpected dependencies, and modules with unusually high fan-in or fan-out. This takes minutes, not weeks, and it reveals the most critical structural issues immediately.
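For Python codebases, the standard library alone gets you the raw edges. A minimal sketch using the `ast` module to pull the imported module names out of a piece of source (shown here on an inline string; in practice you would read each file from disk and build the graph from the results):

```python
import ast

# Inline source for demonstration; real usage reads files from disk.
SOURCE = """
import json
from collections import OrderedDict
from . import sibling
"""

def imported_modules(source):
    """Return the module names a piece of Python source imports."""
    tree = ast.parse(source)
    mods = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            mods.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            mods.append(node.module or ".")  # relative imports have module=None
    return mods

print(imported_modules(SOURCE))  # ['json', 'collections', '.']
```

Other ecosystems have equivalents: compiler APIs, build-tool dependency reports, or off-the-shelf visualisers. The extraction step is rarely the hard part.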

Compare to your documentation. If you have architecture documentation – even informal notes or an old whiteboard photo – compare it to what the code actually does. The discrepancies are your drift inventory. If you do not have documentation, the review itself becomes the documentation, which is valuable in its own right.

Audit cross-cutting concerns systematically. Pick one concern – error handling, for example – and trace how it is implemented across the entire codebase. Is it consistent? Are there modules that do something completely different? This kind of horizontal review across modules is something PR review never provides.
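Even a crude pattern count makes inconsistency visible. A sketch of a horizontal error-handling audit: the two file contents are invented for illustration (in practice you would read every module from disk), and the regexes are deliberately rough, just enough to separate typed exception handlers from silent bare ones.

```python
import re

# Hypothetical module contents; real usage reads these from disk.
modules = {
    "orders.py":  "try:\n    ship()\nexcept OrderError:\n    log.warning('failed')\n",
    "billing.py": "try:\n    charge()\nexcept:\n    pass\n",  # silent bare except
}

def audit_error_handling(files):
    """Count bare vs typed exception handlers per module."""
    findings = {}
    for name, text in files.items():
        bare = len(re.findall(r"except\s*:", text))    # `except:` swallows everything
        typed = len(re.findall(r"except\s+\w", text))  # `except SomeError:`
        findings[name] = {"bare": bare, "typed": typed}
    return findings

print(audit_error_handling(modules))
```

A table of counts per module will not tell you which approach is right, but it will show you in one glance that billing.py is quietly doing its own thing.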

Use AI to scale the review. Full-codebase AI review can analyse hundreds of thousands of lines and identify architectural patterns, inconsistencies, and structural issues that would take a human reviewer weeks to find. It is not a replacement for human architectural judgement, but it provides the raw analysis that makes human judgement feasible at scale.

Schedule it. Put architecture review on the calendar. Quarterly is a reasonable cadence for most teams. It does not need to be a week-long exercise. A focused half-day review that examines dependency structure, cross-cutting consistency, and module responsibilities will catch the majority of drift before it becomes entrenched.


The review nobody does is the one that matters most

Code review at the PR level is necessary but not sufficient. It ensures that individual changes are reasonable. It does not ensure that the system, as a whole, remains coherent.

Architecture is not defined by any single change. It is the emergent structure of thousands of changes over time. Reviewing it requires stepping back from the diff and looking at the system as a system. That is the review nobody does – and it is the one that determines whether your codebase remains maintainable or slowly becomes an expensive liability.

The architecture of your system is drifting right now. The only question is whether you will notice before it matters.


Limits and tradeoffs

  • AI-assisted review can miss context. Treat findings as prompts for investigation, not verdicts.
  • False positives happen. Plan a quick triage pass before you schedule work.
  • Privacy depends on your model setup. If you use a cloud model, relevant code is sent to that provider; local models can keep inference on your own hardware.