Why Your Linter Misses the Hard Bugs

Your CI pipeline is green. ESLint reports zero warnings. SonarQube shows no critical issues. And your application still has bugs that cost you hours of debugging, customer trust, and production stability. Here is why.

[Image: Terminal showing all linter checks passing alongside a production error log with a logic bug]

Linters are among the most useful tools in a developer's workflow. They catch unused variables, enforce consistent formatting, flag deprecated API usage, and prevent entire categories of common mistakes before code reaches a reviewer. If you are not using a linter, you should be.

But there is a persistent and dangerous assumption in many engineering teams: that if the linter passes, the code is correct. That a green CI pipeline means the code has been reviewed. That automated checks are a substitute for understanding what the code actually does.

They are not. Linters operate on syntax and structure. The hard bugs – the ones that cause production incidents, data corruption, and subtle misbehaviour that takes weeks to trace – live in a domain that linters cannot reach.


What linters are good at

Before discussing what linters miss, it is worth being precise about what they do well. Linters excel at enforcing rules that can be expressed as pattern matches against an abstract syntax tree. Unused imports. Inconsistent indentation. Variables declared but never referenced. Missing return statements in certain code paths. Deprecated method calls.

These are valuable catches. A linter that flags an unused variable might be pointing at a refactoring that was half-completed. A linter that enforces consistent naming conventions reduces cognitive load for everyone reading the code. A linter that catches a missing await in an async function prevents a specific category of runtime error.

The common thread is that these rules are syntactic. They concern the form of the code, not its meaning. A linter can tell you that a function returns inconsistently. It cannot tell you whether the function returns the right thing.
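The distinction can be made concrete with a deliberately small sketch (the function is invented for illustration). Both the form and the types below are flawless; a linter approves the code completely, because the bug lives entirely in the meaning:

```typescript
// Intent: return the larger of two numbers.
// No unused variables, consistent returns, clean types — lint-perfect.
function larger(a: number, b: number): number {
  return a > b ? b : a; // branches swapped: this returns the SMALLER number
}

console.log(larger(3, 5)); // 3 — well-formed, wrong answer
```

No rule can flag the swapped branches, because nothing about the code's structure says which value the author intended to return.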


Logic errors: the bugs that look correct

A logic error is code that is syntactically valid, structurally sound, and functionally wrong. The linter cannot see it because there is nothing wrong with the code's form. The problem is in its intent.

Consider a pricing calculation that applies a discount after tax instead of before tax. The code compiles. The tests pass (because the tests were written to match the implementation, not the specification). The linter reports no issues. But every invoice is slightly wrong, and the error compounds across thousands of transactions over months before anyone notices.

Or consider a sorting function that uses the wrong comparator. It sorts data, just not in the order the rest of the system expects. Downstream components consume the data, produce subtly incorrect results, and nobody connects the symptom to the cause because the sort function itself “works.”

Logic errors are the most expensive category of bug because they are the hardest to detect and the hardest to trace. They do not throw exceptions. They do not crash the application. They simply produce the wrong answer, quietly, consistently, and at scale.

No linter rule can catch these because no linter rule can encode the business requirement that this particular calculation must apply the discount before tax. That understanding requires context, domain knowledge, and the ability to reason about what the code should do, not just what it does do.


Race conditions and timing bugs

Race conditions are among the most insidious bugs in software. They occur when the correctness of a program depends on the relative timing of operations that are not guaranteed to execute in a particular order. They are intermittent, difficult to reproduce, and nearly impossible to detect through static analysis.

A linter can identify that you have not used await on a promise. It cannot identify that two asynchronous operations modify the same data structure and that the result depends on which one completes first. It cannot tell you that your database read and subsequent write are not atomic, and that another request might modify the record between them. It cannot see that your cache invalidation fires before the write it depends on has committed.
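The read-then-write hazard can be reproduced in a few deterministic lines (the names are invented, and `await Promise.resolve()` stands in for any I/O gap such as a database round trip). Every promise here is awaited, so rules like typescript-eslint's `no-floating-promises` stay silent:

```typescript
let balance = 100;

async function withdraw(amount: number): Promise<void> {
  const current = balance;    // 1. read
  await Promise.resolve();    // 2. yield — stands in for a database call
  balance = current - amount; // 3. write, based on a now-stale read
}

// Two concurrent withdrawals of 30 should leave 40.
await Promise.all([withdraw(30), withdraw(30)]);
console.log(balance); // 70 — one withdrawal was silently lost
```

Both calls read 100 before either writes, so the second write overwrites the first. Fixing it means changing the concurrency model — a transaction, a lock, or an atomic decrement — not the syntax.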

These bugs manifest as intermittent failures. A test suite that passes 99 times out of 100. A user report that cannot be reproduced in development. A production incident that resolves itself when the system is restarted and the timing changes.

Detecting race conditions requires understanding the concurrency model of the system, the ordering guarantees (or lack thereof) of the runtime, and the data flows between components. This is fundamentally a reasoning task, not a pattern-matching one.


Incorrect assumptions

Every piece of code embodies assumptions. A function that parses user input assumes a particular format. A database query assumes that a certain column is indexed. An API client assumes that the upstream service returns errors in a specific shape. A caching layer assumes that the underlying data changes infrequently.

When these assumptions are correct, the code works. When they are wrong – or when they become wrong as the system evolves – the code fails in ways that are difficult to diagnose because the assumption is implicit. It is not written down. It is not tested. It is baked into the logic as something the original author took for granted.
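A textbook instance of an implicit assumption: binary search assumes its input is sorted, yet nothing in the signature says so. When the assumption stops holding, the function does not crash — it returns a confidently wrong answer:

```typescript
// Implicit assumption: `xs` is sorted ascending. Nothing enforces it.
function binarySearch(xs: number[], target: number): number {
  let lo = 0, hi = xs.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (xs[mid] === target) return mid;
    if (xs[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return -1;
}

console.log(binarySearch([1, 3, 5, 7], 5)); //  2 — assumption holds
console.log(binarySearch([5, 1, 7, 3], 5)); // -1 — 5 is present, "not found"
```

The second call fails because a later change upstream stopped sorting the data; the search itself never changed, which is exactly why the symptom is so hard to trace to the cause.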

Linters cannot surface incorrect assumptions because assumptions are semantic, not syntactic. A linter sees a function call. It does not know whether the function is being called with parameters that the callee does not actually support, or whether the return value is being interpreted in a way that conflicts with the callee's contract.

This is why codebases accumulate assumption debt over time. Each new feature adds assumptions about how existing code behaves. If nobody reviews those assumptions holistically, they drift apart until something breaks. And by the time it breaks, the distance between the symptom and the cause can span dozens of files and multiple architectural layers.


Architectural inconsistencies

Large codebases almost always contain multiple approaches to the same problem. Two different authentication mechanisms. Three ways to handle errors. Four patterns for database access. These inconsistencies do not violate any linter rule. Each individual implementation may be perfectly well-formed. But in aggregate, they create a system that is harder to understand, harder to maintain, and more likely to harbour bugs.

A new developer joins the team and finds two different patterns for making HTTP requests. They pick one. It happens to be the older one, which does not include the retry logic that the newer pattern adds. Their feature works in development, where network reliability is not an issue. In production, it fails intermittently because the upstream service occasionally returns transient errors that the older pattern does not handle.
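A sketch of how those two coexisting patterns diverge (the helper names and retry policy are invented; the fetcher is injected so the example runs without a real network):

```typescript
type Fetcher = () => Promise<string>;

// Older pattern: a single attempt, any error propagates.
async function getOnce(fetch: Fetcher): Promise<string> {
  return fetch();
}

// Newer pattern: retry transient failures, up to `attempts` tries.
async function getWithRetry(fetch: Fetcher, attempts = 3): Promise<string> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fetch();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// Simulated upstream: one transient 503, then success.
function flakyUpstream(): Fetcher {
  let calls = 0;
  return async () => {
    calls += 1;
    if (calls === 1) throw new Error("503 transient");
    return "ok";
  };
}

const retried = await getWithRetry(flakyUpstream()); // succeeds on the second try
const once = await getOnce(flakyUpstream()).catch(() => "failed");
console.log(retried, once); // same upstream, different outcomes
```

Both helpers are individually lint-clean; the defect is that the codebase offers both, and nothing tells the new developer which one the team's reliability expectations actually require.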

No linter can detect that a codebase has two incompatible approaches to the same problem. No linter can flag that one module handles errors with result types while another uses thrown exceptions, and that a third module assumes one pattern when it should assume the other. These are systemic issues that only become visible when you examine the codebase as a whole.


The false confidence problem

The real danger of linters is not what they miss. It is the confidence they create. When a team sees a green pipeline, there is a psychological tendency to interpret that as validation. The code has been checked. The tools approve. It must be fine.

This creates a review culture where automated tools are treated as a substitute for human judgement rather than a complement to it. PR reviews become cursory because the linter already passed. Full codebase reviews are not conducted because the static analysis dashboard looks clean. Technical debt accumulates unnoticed because the metrics that are being tracked do not measure the things that matter most.

The teams that suffer least from hard bugs are not the ones with the most sophisticated linter configurations. They are the ones that understand the boundary between what automated tools can verify and what requires human reasoning – or AI-assisted reasoning that can understand context, intent, and cross-cutting concerns.


Bridging the gap

The answer is not to abandon linters. They are excellent at what they do, and every codebase should use them. The answer is to stop treating linting as code review. They are different activities with different scopes.

Linting verifies form. Code review verifies meaning. Linting catches the bugs that can be expressed as rules. Code review catches the bugs that can only be found through reasoning about what the code does, what it should do, and how it interacts with the rest of the system.

The hard bugs – the logic errors, the race conditions, the incorrect assumptions, the architectural inconsistencies – require something that understands the codebase holistically. That can be a senior engineer with deep context. It can also be an AI system that reads the entire codebase and reasons about cross-cutting concerns. What it cannot be is a set of pattern-matching rules, no matter how comprehensive.

Your linter is not broken. It is doing exactly what it was designed to do. The problem is expecting it to do something it was never designed for. The hard bugs are out there. They are just in the places your linter is not looking.


Limits and tradeoffs

  • AI-assisted review can miss context. Treat findings as prompts for investigation, not verdicts.
  • False positives happen. Plan a quick triage pass before you schedule work.
  • Privacy depends on your model setup. If you use a cloud model, relevant code is sent to that provider; local models can keep inference on your own hardware.