A thorough code review – especially a full-codebase review rather than a PR-level review – produces a lot of findings. Dozens, sometimes hundreds. Security issues, performance bottlenecks, inconsistent patterns, dead code, missing validation, tangled dependencies, architectural concerns.
The natural reaction is overwhelm. When everything is flagged, nothing feels urgent. Teams that receive a wall of findings often do one of two things: they try to fix everything at once and burn out, or they look at the volume, feel defeated, and fix nothing. Both responses waste the value of the review.
The solution is triage. Not every finding is equally important, equally urgent, or equally fixable. A structured prioritisation framework lets you extract maximum value from a review by focusing effort where it matters most.
The three dimensions of prioritisation
Every code review finding can be assessed along three independent dimensions: severity, likelihood, and fixability. The combination of these three dimensions determines where a finding should sit in your remediation plan.
Severity measures the worst-case impact if the issue causes a problem. A SQL injection vulnerability has high severity because exploitation could expose your entire database. An inconsistent variable naming convention has low severity because the worst case is developer confusion, not data loss.
Likelihood measures how probable it is that the issue will actually cause a problem. A race condition in a function that is called once during application startup has low likelihood. The same race condition in a function that handles concurrent user requests has high likelihood.
Fixability measures how easy or hard the fix is. Some findings can be resolved in an hour with a straightforward code change. Others require refactoring a core abstraction that touches dozens of files. Fixability matters because it determines the return on investment: a moderate-severity finding that takes thirty minutes to fix may be more worthwhile than a high-severity finding that requires two weeks of refactoring.
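The three dimensions can be captured in a small record per finding. This is a minimal sketch, not a prescribed schema; the class and field names are illustrative assumptions:

```python
from dataclasses import dataclass
from enum import IntEnum

class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

@dataclass
class Finding:
    title: str
    severity: Level    # worst-case impact if the issue causes a problem
    likelihood: Level  # how probable it is that the problem manifests
    fixability: Level  # HIGH = quick, localised fix; LOW = expensive refactor

# Example: a severe, likely, but easy-to-fix issue
finding = Finding("SQL injection in /search", Level.HIGH, Level.HIGH, Level.HIGH)
```

Using an ordered enum rather than free-text labels makes the later steps (bucketing, ranking) simple comparisons instead of string matching.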
The four-bucket system
Using the three dimensions, you can sort findings into four buckets. Each bucket has a different action plan.
Bucket 1: Fix now. High severity, high likelihood, any fixability. These are findings where the potential impact is severe and the problem is likely to manifest. A SQL injection vulnerability on an active endpoint. An authentication bypass in a publicly accessible controller. An unhandled exception that crashes the service under normal load. Regardless of how hard the fix is, these findings go to the top of the queue because the risk of inaction is too high.
Bucket 2: Quick wins. Low to moderate severity, any likelihood, high fixability. These are findings that are easy to fix and reduce overall noise in the codebase even if they are not critical individually. Removing dead code. Adding missing input validation on a non-critical endpoint. Fixing an inconsistent error message. Each fix takes minutes to hours and makes the codebase incrementally better. Batch these into a cleanup sprint or distribute them across developers as warm-up tasks.
Bucket 3: Schedule for later. Moderate to high severity, any likelihood, low fixability. These are real problems, but fixing them is expensive. Refactoring a tightly coupled module. Replacing an insecure-by-default library. Migrating from a deprecated API. These belong on the roadmap as planned work items with clear scoping. The key is to schedule them, not to deprioritise them indefinitely. Findings that stay in the backlog forever are the same as findings that were never reported.
Bucket 4: Accept as known risk. Low severity, low likelihood, low fixability. Some findings describe real imperfections that are not worth fixing given the cost. A minor inconsistency in a module that is scheduled for replacement next quarter. A theoretical edge case in a function that processes ten requests a day. Document these as known, accepted risks. This is not the same as ignoring them. It is making an explicit, recorded decision that the cost of fixing exceeds the expected cost of the issue.
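The bucket rules above can be expressed as a short classifier. This is one possible encoding of the rules, with the order of checks and the fall-through default chosen by the author of this sketch; combinations the four buckets do not explicitly cover (for example, medium severity with medium fixability) default to the accepted-risk bucket here and would need a judgement call in practice:

```python
from enum import IntEnum

class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

def bucket(severity: Level, likelihood: Level, fixability: Level) -> str:
    # Bucket 1: severe and likely, regardless of how hard the fix is.
    if severity == Level.HIGH and likelihood == Level.HIGH:
        return "fix now"
    # Bucket 2: easy fixes that reduce noise, even if not critical.
    if severity <= Level.MEDIUM and fixability == Level.HIGH:
        return "quick win"
    # Bucket 3: real problems whose fix is expensive -- schedule them.
    if severity >= Level.MEDIUM and fixability == Level.LOW:
        return "schedule"
    # Bucket 4: default for everything else, including uncovered
    # combinations; record the decision rather than silently dropping it.
    return "accept as known risk"

print(bucket(Level.HIGH, Level.HIGH, Level.LOW))    # fix now
print(bucket(Level.LOW, Level.MEDIUM, Level.HIGH))  # quick win
print(bucket(Level.HIGH, Level.LOW, Level.LOW))     # schedule
```

Note that Bucket 1 is checked first: a high-severity, high-likelihood finding goes to the top of the queue even when it is also cheap to fix.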
How to assess severity
Severity assessment requires thinking about the worst plausible outcome, not the most likely outcome. A finding is high severity if the worst case involves any of the following: data loss or corruption, unauthorised access to sensitive data, service outage affecting users, financial loss, or regulatory non-compliance.
Medium severity findings include: degraded performance under load, incorrect but non-critical functionality, poor error messages that hinder debugging, or inconsistent behaviour that confuses users without causing harm.
Low severity findings include: code style inconsistencies, dead code that does not affect functionality, naming that is unclear but not misleading, or minor inefficiencies that do not affect user experience.
The key is to be honest about severity and resist the temptation to inflate or deflate. Not everything is critical. Not everything is minor. The value of triage depends on accurate assessment.
How to assess likelihood
Likelihood depends on context. The same code pattern can be high likelihood in one situation and low likelihood in another. Consider these factors:
How often is the code executed? A vulnerability in a function that runs on every request is far more likely to be exploited than one in an admin-only migration script.
How exposed is the code? Code that handles external input (HTTP requests, file uploads, API payloads) has higher likelihood of triggering issues than code that processes internal, validated data.
How obvious is the issue to an attacker? A security vulnerability in a well-known framework pattern (like a default configuration that disables CSRF protection) is more likely to be exploited than a custom implementation with a subtle flaw, simply because attackers check for common patterns first.
What has the incident history been? If you have had production incidents traced to a specific category of issue (e.g., unhandled null values in a particular module), then similar findings in that module are high likelihood based on evidence.
How to assess fixability
Fixability is not just about the technical complexity of the change. It includes the blast radius, the testing burden, and the coordination required.
High fixability: the fix is localised to one file or module, requires no changes to interfaces or APIs, has clear test coverage, and can be reviewed and merged independently. These are changes a developer can make in one sitting.
Medium fixability: the fix touches multiple files, may require updating tests, and needs coordination with another developer or team. These are changes that fit into a sprint as a planned work item.
Low fixability: the fix requires changing a core abstraction, affects multiple consumers, may require database migration, involves cross-team coordination, or cannot be done without a feature flag or staged rollout. These are changes that need their own project plan.
Estimating fixability accurately is critical for practical prioritisation. A finding that looks moderate-severity but is highly fixable often delivers more value per hour of engineering time than a critical finding that requires months of coordinated refactoring.
Building the prioritised list
Once you have assessed every finding across the three dimensions, the prioritised list builds itself. Start with Bucket 1 – the must-fix items. Then list the Bucket 2 quick wins. Then scope and schedule the Bucket 3 items. Then document the Bucket 4 accepted risks.
Within each bucket, order findings by the expected value of fixing them. Expected value here means severity multiplied by likelihood, divided by the effort required. A moderate severity, high likelihood, easy-to-fix finding has higher expected value per engineering hour than a high severity, low likelihood, hard-to-fix finding.
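The within-bucket ordering can be sketched as a scoring function: severity multiplied by likelihood, divided by estimated effort. The effort figures and finding names below are illustrative, and the numeric mapping simply reuses ordinal levels as weights, which is a simplifying assumption rather than a calibrated model:

```python
from enum import IntEnum

class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

def expected_value(severity: Level, likelihood: Level, effort_hours: float) -> float:
    # Value per engineering hour: impact x probability / cost.
    return (severity * likelihood) / effort_hours

findings = [
    # (title, severity, likelihood, estimated effort in hours)
    ("unsanitised query in rarely used reports page", Level.HIGH, Level.LOW, 80.0),
    ("missing validation on signup form", Level.MEDIUM, Level.HIGH, 0.5),
]

ranked = sorted(findings,
                key=lambda f: expected_value(f[1], f[2], f[3]),
                reverse=True)
print(ranked[0][0])  # missing validation on signup form
```

This reproduces the point in the text: the moderate-severity, high-likelihood, half-hour fix (score 12.0 per hour) outranks the high-severity, low-likelihood finding that needs 80 hours of work (score ~0.04 per hour).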
Present this as a concrete plan, not a list of problems. Each item should include what the finding is, why it matters, what the fix involves, and an estimated effort. This transforms a wall of findings into a roadmap that engineering leadership can actually act on.
Common mistakes in prioritisation
Treating all findings as equal. A flat list of findings with no prioritisation is barely better than no findings at all. If everything is important, nothing is important. Triage is what turns data into decisions.
Prioritising by category instead of risk. Teams sometimes fix all security findings before looking at anything else. But a low-likelihood security finding in an unused admin endpoint is less urgent than a high-likelihood data integrity issue in the main transaction flow. Prioritise by risk, not by category.
Ignoring quick wins. Easy fixes have low individual impact but high cumulative value. Fixing twenty small issues in a day reduces codebase noise, improves developer confidence, and creates momentum. Do not skip easy wins because they are not individually impressive.
Never revisiting accepted risks. A finding that you accepted as a known risk six months ago may have changed in severity or likelihood. The module that used to get ten requests a day now gets ten thousand. The library that had a theoretical vulnerability now has a published exploit. Review your accepted risks periodically.
Letting the backlog grow indefinitely. If your Bucket 3 items never get scheduled, you do not have a prioritisation system. You have a list. Scheduled means assigned to a sprint or quarter with a named owner. Anything else is aspirational.
From findings to action
A code review that produces findings without prioritisation is an audit report that gathers dust. A code review with structured triage is a remediation plan that gets executed.
The framework is simple: assess severity, likelihood, and fixability. Sort into four buckets. Fix the critical items now, batch the quick wins, schedule the larger items, and document the accepted risks. Then repeat on a regular cadence as the codebase evolves.
The goal is not to fix everything. The goal is to fix the right things in the right order. That is the difference between a team that is overwhelmed by findings and a team that is systematically improving its codebase.
Limits and tradeoffs
- Any review can miss context. Treat findings as prompts for investigation, not verdicts.
- False positives happen. Plan a quick triage pass before you schedule work.