AI hallucination is a well-documented phenomenon in large language models. The model generates text that is fluent, confident, and wrong. In conversational contexts, this produces incorrect facts. In code review contexts, it produces something more dangerous: actionable-sounding findings that lead teams to fix problems that do not exist, while the real problems go unaddressed.
Understanding what hallucinations look like in code review, why they happen, and how to mitigate them is essential for any team that uses AI-assisted analysis. The answer is not to avoid AI code review. It is to build workflows that catch the errors before they waste engineering time.
What hallucinations look like in code review
AI hallucinations in code review take several distinct forms, and each one carries different risks.
False positive findings. The model flags code as problematic when it is actually correct. For example, it might report a SQL injection vulnerability in a query that already uses parameterised statements. Or it might flag a race condition in code that is single-threaded. The finding reads convincingly – it describes the vulnerability category accurately, explains the risk clearly, and even suggests a fix. But the underlying premise is wrong: the vulnerability does not exist in this code.
Invented vulnerabilities. The model describes a security issue that has no basis in the code at all. It might reference a function that does not exist, describe an interaction between modules that never communicate, or claim that user input reaches a database query through a code path that is not present. This is different from a false positive. A false positive misinterprets real code. An invented vulnerability fabricates a scenario entirely.
Phantom dependencies. The model references libraries, frameworks, or APIs that the project does not use. It might warn about a known vulnerability in a package that is not in the dependency tree, or suggest migrating away from a framework that was never adopted. This happens when the model pattern-matches from its training data rather than grounding its analysis in the actual project.
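Phantom-dependency findings are one of the few hallucination types that can be checked mechanically, because the project's real dependency list is machine-readable. The sketch below is illustrative, not any real tool's schema: it assumes findings are simple dicts with an optional "package" field and parses a requirements.txt-style dependency list.

```python
# Minimal sketch: drop findings that reference packages the project
# does not actually depend on. The finding shape and dependency file
# format are illustrative assumptions, not a real tool's schema.

def load_dependencies(requirements_text):
    """Parse package names from a requirements.txt-style string."""
    deps = set()
    for line in requirements_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Keep the name part before any version specifier.
        name = line.split("==")[0].split(">=")[0].split("<=")[0].strip()
        deps.add(name.lower())
    return deps

def filter_phantom_dependencies(findings, deps):
    """Separate findings whose referenced package is not a real dependency."""
    kept, phantoms = [], []
    for f in findings:
        pkg = f.get("package")
        if pkg is None or pkg.lower() in deps:
            kept.append(f)
        else:
            phantoms.append(f)
    return kept, phantoms

deps = load_dependencies("requests==2.31.0\nflask>=2.0\n")
findings = [
    {"id": 1, "package": "requests", "issue": "known CVE in pinned version"},
    {"id": 2, "package": "django", "issue": "ORM injection"},  # phantom: not a dependency
]
kept, phantoms = filter_phantom_dependencies(findings, deps)
```

A check like this cannot validate the substance of a finding, but it can automatically quarantine the subset that references packages the project demonstrably does not use.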
Misattributed severity. The model correctly identifies a real issue but assigns it the wrong severity. A low-risk code style inconsistency gets flagged as a critical security vulnerability. A genuine but minor performance concern gets described as a production-blocking bottleneck. The finding is directionally correct, but the severity is hallucinated.
Why hallucinations happen in code review
Hallucinations in code review are not random. They follow predictable patterns rooted in how large language models process information.
Context window limitations. A codebase is a connected system. A function in one file calls a function in another, which queries a database configured in a third. When the model cannot see the full chain – because the codebase exceeds its context window or because the analysis is scoped to individual files – it fills in the gaps with assumptions. Those assumptions come from patterns in its training data, not from the actual project. The result is findings that would be correct in a different codebase but are wrong in yours.
Pattern matching beyond evidence. Language models are fundamentally pattern-completion engines. When they see code that structurally resembles a known vulnerability pattern, they report the vulnerability even if the specific implementation has safeguards that break the pattern. A function that accepts user input and constructs a database query looks like SQL injection at a surface level, even if the input is validated, sanitised, and parameterised before it reaches the query.
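The gap between surface pattern and actual behaviour is easy to demonstrate. The following self-contained example uses Python's standard sqlite3 module: user input flows into a database query, which matches the SQL injection pattern at a glance, yet the placeholder binding means the input never becomes part of the SQL text.

```python
import sqlite3

# Surface pattern: user input flows into a database query. A pattern-
# matcher may flag this as SQL injection. In fact the "?" placeholder
# lets the driver bind the value safely; the input is never
# concatenated into the SQL string.
def find_user(conn, username):
    cur = conn.cursor()
    cur.execute("SELECT id, name FROM users WHERE name = ?", (username,))
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

# A classic injection payload is treated as a literal value, not SQL:
# it matches no username, so the query returns nothing.
rows = find_user(conn, "alice' OR '1'='1")
```

A model reasoning from pattern alone sees "input reaches query" and reports injection; a model grounded in the actual call sees the parameterised placeholder and should not.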
Training data bias. Models have seen millions of code review comments, vulnerability reports, and security advisories during training. This creates a prior: certain code patterns are strongly associated with certain findings. When the model encounters ambiguous code, it defaults to the most common finding associated with that pattern, regardless of whether the specific context warrants it.
Confidence without calibration. Language models do not express uncertainty naturally. They generate text with the same fluency whether the underlying analysis is solid or speculative. A finding based on thorough analysis of visible code reads identically to a finding based on assumptions about code the model never saw. There is no built-in signal that distinguishes high-confidence findings from hallucinated ones.
The cost of unmitigated hallucinations
When hallucinated findings are not caught, they impose real costs on engineering teams.
The most obvious cost is wasted time. A developer spends an hour investigating a reported vulnerability, reading the code, tracing the data flow, and eventually concluding that the finding is wrong. Multiply that by the number of false findings per review, and the time adds up quickly.
The less obvious but more damaging cost is trust erosion. If developers encounter several false positives in their first experience with an AI code review tool, they learn to distrust the tool's output. They start dismissing findings without investigation. When this happens, the tool becomes worse than useless – it creates a false sense of security. The team believes findings are being reviewed, but in practice developers are clicking through them without reading.
Trust erosion is the real risk of hallucinations. The technical cost is hours. The organisational cost is the loss of a capability that could be providing genuine value.
Mitigation 1: Human-in-the-loop triage
The most effective mitigation is also the simplest: never treat AI findings as final. Every finding should pass through human triage before it becomes a ticket, a fix, or a conversation.
Human triage does not mean a developer must investigate every finding in depth. It means a knowledgeable person reviews the finding, assesses whether it is plausible given their understanding of the codebase, and either confirms it, dismisses it, or flags it for deeper investigation.
The triage step is fast when the findings are well-structured. A finding that includes the specific file, the relevant code snippet, the claimed vulnerability, and the suggested fix can be triaged in under a minute by a developer who knows that part of the codebase. A finding that is vague or lacks context takes longer and is more likely to be dismissed without consideration.
The key insight is that human triage is a filter, not a bottleneck. The AI generates findings at a volume and speed that no human could match. The human filters for accuracy. Together, they achieve something neither could alone: comprehensive coverage with reliable accuracy.
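The confirm/dismiss/escalate flow described above can be sketched as a simple filter. Everything here is illustrative: the Finding fields mirror the structure a triageable finding needs (file, snippet, claim, suggested fix), and the decision function stands in for a human reviewer.

```python
# Minimal sketch of human-in-the-loop triage: the AI proposes
# findings, a human records one decision per finding, and only
# confirmed findings proceed. Names and fields are illustrative.
from dataclasses import dataclass

@dataclass
class Finding:
    file: str
    snippet: str
    claim: str
    suggested_fix: str

def triage(findings, decide):
    """Apply a decision ('confirm' | 'dismiss' | 'escalate') to each finding."""
    buckets = {"confirm": [], "dismiss": [], "escalate": []}
    for f in findings:
        buckets[decide(f)].append(f)
    return buckets["confirm"], buckets["dismiss"], buckets["escalate"]

findings = [
    Finding("auth.py", "query = ...", "SQL injection", "use placeholders"),
    Finding("jobs.py", "shared_counter += 1", "race condition", "add a lock"),
]
# Stand-in for the human reviewer; in practice this is a click in a UI.
decisions = {"auth.py": "dismiss", "jobs.py": "confirm"}
confirmed, dismissed, escalated = triage(findings, lambda f: decisions[f.file])
```

The point of the structure is that every finding exits triage in exactly one state, so nothing is silently ignored and nothing becomes a ticket without a human decision attached.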
Mitigation 2: Severity-based filtering
Not all hallucinations are equally harmful. A false positive on a low-severity style issue wastes a few minutes of developer time. A false positive on a critical security vulnerability can trigger an emergency response, divert an entire team, and erode trust when the alarm turns out to be false.
Severity-based filtering means applying different levels of scrutiny to different severity tiers. Critical and high-severity findings get mandatory human review before any action is taken. Medium-severity findings are reviewed in batch during normal triage. Low-severity findings are available for reference but do not generate notifications or tickets automatically.
This approach concentrates human attention where hallucinations are most costly and lets lower-severity findings flow through with less friction. It also reduces the volume of findings that developers see on any given day, which helps prevent the alert fatigue that leads to blanket dismissal.
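The routing policy above translates directly into code. This sketch assumes findings carry a "severity" field with the four tiers named earlier; the action names are illustrative labels for the policy, not a real tool's API.

```python
# Sketch: route findings to different actions by severity tier,
# mirroring the policy described above. Data shapes are illustrative.
def route_by_severity(findings):
    actions = {"notify": [], "batch_review": [], "reference": []}
    for f in findings:
        sev = f["severity"]
        if sev in ("critical", "high"):
            actions["notify"].append(f)        # mandatory human review
        elif sev == "medium":
            actions["batch_review"].append(f)  # reviewed in normal triage batch
        else:
            actions["reference"].append(f)     # no notifications or tickets
    return actions

findings = [
    {"id": 1, "severity": "critical"},
    {"id": 2, "severity": "medium"},
    {"id": 3, "severity": "low"},
]
routed = route_by_severity(findings)
```

The design choice worth noting is that low-severity findings are routed somewhere, not dropped: they remain queryable for reference without contributing to daily alert volume.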
Mitigation 3: Dual-model verification
A more technical mitigation is to run the same analysis through two independent models and compare the results. If both models independently identify the same finding, the probability that it is a hallucination drops significantly, because uncorrelated models are unlikely to fabricate the same specific claim about the same code. If only one model flags an issue, that finding is a candidate for closer scrutiny.
Dual-model verification is not a silver bullet. Two models trained on similar data can hallucinate in similar ways, producing correlated false positives. And running two models doubles the computational cost. But for high-stakes analyses – security audits, pre-acquisition due diligence, compliance reviews – the additional cost is justified by the increased reliability.
In practice, dual-model verification works best as a complement to human triage, not a replacement. The models provide an initial confidence signal. The human makes the final call.
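A minimal cross-check looks like the following. The keying on (file, category) is an assumption for illustration; real matching would need fuzzier comparison, since two models rarely describe the same issue in identical terms.

```python
# Sketch: cross-check findings from two independent models.
# Findings flagged by both are treated as higher confidence;
# single-model findings are queued for closer human scrutiny.
# Matching on (file, category) is a simplifying assumption.
def cross_check(model_a_findings, model_b_findings):
    key = lambda f: (f["file"], f["category"])
    agreed = {key(f) for f in model_a_findings} & {key(f) for f in model_b_findings}
    high_confidence = [f for f in model_a_findings if key(f) in agreed]
    needs_scrutiny = [f for f in model_a_findings + model_b_findings
                      if key(f) not in agreed]
    return high_confidence, needs_scrutiny

a = [{"file": "auth.py", "category": "sqli"},
     {"file": "api.py", "category": "xss"}]
b = [{"file": "auth.py", "category": "sqli"},
     {"file": "jobs.py", "category": "race"}]
high, scrutiny = cross_check(a, b)
```

Agreement here is a confidence signal, not a verdict: the single-model findings are not discarded, only flagged for the closer human look the section above describes.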
Mitigation 4: Structured triage workflows
The workflow around AI findings matters as much as the findings themselves. A tool that dumps 200 findings into a flat list without structure makes triage painful and error-prone. A tool that organises findings by severity, category, and module – and lets reviewers confirm, dismiss, or escalate each one – makes triage fast and reliable.
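Turning a flat list into the severity-and-category structure described above is straightforward. This is a generic grouping sketch, not any particular tool's implementation; the tier names and finding fields are illustrative.

```python
# Sketch: organise a flat list of findings into severity-then-
# category groups so reviewers see the costliest tiers first.
# Field names and tier labels are illustrative assumptions.
from collections import defaultdict

SEVERITY_ORDER = ["critical", "high", "medium", "low"]

def group_findings(findings):
    grouped = defaultdict(lambda: defaultdict(list))
    for f in findings:
        grouped[f["severity"]][f["category"]].append(f)
    # Emit groups in fixed severity order for a stable triage view.
    return [(sev, dict(grouped[sev]))
            for sev in SEVERITY_ORDER if sev in grouped]

findings = [
    {"severity": "low", "category": "style"},
    {"severity": "critical", "category": "security"},
    {"severity": "critical", "category": "security"},
]
ordered = group_findings(findings)
```

The fixed severity order matters more than it looks: reviewers always encounter critical findings first, which is exactly where hallucinations are most expensive to miss.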
VibeRails builds hallucination mitigation directly into its triage workflow. Findings are organised by severity and category. Each finding includes the specific code context, the claimed issue, and the suggested remediation. Reviewers can confirm findings as valid, dismiss them as false positives, or flag them for team discussion – and those decisions persist across reviews, building a feedback loop that improves the signal-to-noise ratio over time.
The triage interface is designed for the reality that some percentage of AI findings will be wrong. Rather than pretending the tool is infallible, it gives teams the structure to efficiently separate signal from noise.
Living with imperfect tools
AI code review tools hallucinate. So do human reviewers. A senior developer reviewing unfamiliar code will occasionally misread a pattern, flag a false positive, or miss context that changes the interpretation of a finding. The question is not whether the tool is perfect. The question is whether the tool, combined with a good workflow, produces better outcomes than the alternative.
For most teams, the alternative to AI code review is not expert human review of the entire codebase. The alternative is no review at all. The code that was written before the current team arrived has never been systematically reviewed. The modules that nobody owns do not get reviewed. The legacy systems that underpin the business are understood by fewer people each year.
An imperfect tool with a good triage workflow catches real issues that would otherwise go undetected indefinitely. The hallucinations are a cost. The findings that are real – the genuine vulnerabilities, the inconsistent patterns, the dead code, the missing tests – are the return on that cost. For most codebases, the return far exceeds the investment.