How to Do a Codebase Health Check

Your codebase is not healthy or unhealthy in general. It is healthy or unhealthy in specific, measurable ways. Here is how to find out which.
Most teams have an intuition about the state of their codebase. Developers say things like “the billing module is a mess” or “the API layer is solid.” These intuitions are often correct in direction but unreliable in magnitude. A codebase health check replaces gut feelings with structured evidence.

The goal is not to produce a single score or a pass/fail verdict. It is to build a detailed picture of where the codebase is strong, where it is weak, and where the risks are concentrated. That picture looks different depending on who needs to act on it – a developer needs file-level detail, a manager needs category-level summaries, and a CTO needs trend lines and risk exposure.

Here is the process, step by step.


Step 1: Audit dependency age and health

Dependencies are the foundation your application stands on, and foundations crack silently. Start by cataloguing every dependency and its current version relative to the latest available release.

The version gap matters, but it is not the only signal. A package that is two major versions behind but actively maintained is in a different category from one that is only one minor version behind but has not had a commit in three years. Check for maintenance status: when was the last release? Are issues being triaged? Has the maintainer signalled abandonment or a successor project?

Flag dependencies with known security vulnerabilities separately. These are not technical debt – they are active risk. Tools like npm audit, pip-audit, or bundle-audit provide the baseline, but manual review catches what automated tools miss: transitive dependencies with vulnerabilities, abandoned packages that will never receive patches, and forks that have diverged from their upstream.

The deliverable from this step is a dependency inventory: a table of every package, its current and latest version, its maintenance status, and any known vulnerabilities. Colour-code by urgency: red for security issues, amber for abandoned or significantly outdated packages, green for healthy dependencies.
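The triage rule above can be sketched in a few lines. This is a minimal illustration, not a real auditing tool: the two-year and two-major-version thresholds are illustrative assumptions, and the `Dependency` shape is hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Dependency:
    name: str
    installed: str        # version currently pinned
    latest: str           # latest published version
    last_release: date    # most recent upstream release
    vulnerabilities: int  # known CVEs, direct or transitive

def triage(dep: Dependency, today: date) -> str:
    """Colour-code a dependency: red > amber > green, in priority order."""
    if dep.vulnerabilities > 0:
        return "red"  # active risk, not just technical debt
    years_quiet = (today - dep.last_release).days / 365
    major_gap = int(dep.latest.split(".")[0]) - int(dep.installed.split(".")[0])
    if years_quiet >= 2 or major_gap >= 2:
        return "amber"  # abandoned or significantly outdated
    return "green"
```

Note that the security check comes first: a vulnerable package is red regardless of how fresh or well-maintained it looks.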


Step 2: Measure test coverage honestly

Test coverage percentages are the most commonly cited and most commonly misunderstood code quality metric. A codebase with 80% line coverage can still have critical untested paths if the coverage is concentrated in utility functions while business logic goes unchecked.

Measure coverage at the module level, not just the aggregate. Identify which modules have high coverage and which have none. Then cross-reference that with module importance: a utility for formatting dates can afford lower coverage than the payment processing pipeline.

Beyond line coverage, assess test quality. Are the tests actually asserting meaningful behaviour, or are they exercising code paths without checking results? Do the tests run reliably, or are there flaky tests that the team has learned to ignore? A test suite that fails intermittently teaches developers to distrust the suite, which is worse than having no tests at all.

Record the coverage percentage per module, the number of flaky tests, and the time since the test suite last ran green in CI without retries. These three numbers together paint an honest picture of test health.


Step 3: Identify dead code

Dead code is code that exists in the repository but is never executed. It includes unused functions, unreachable branches, commented-out blocks, feature flags that were never cleaned up, and entire modules that were replaced but never removed.

Dead code is more than a cosmetic problem. It increases cognitive load for every developer reading the codebase. It creates false positives in search results. It can harbour security vulnerabilities that never get patched because nobody realises the code is still present. And it inflates metrics, making the codebase appear larger and more complex than it functionally is.

Estimating the dead code ratio requires a combination of static analysis (which functions are never called?) and runtime analysis (which code paths are never hit in production?). Neither approach is perfect on its own. Static analysis misses dynamically invoked code; runtime analysis misses rarely-used but legitimate code paths.

A rough dead code ratio – even if it is only an estimate – is a valuable health indicator. Codebases where 20% or more of the code is dead have a maintenance problem that compounds over time.
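Combining the two imperfect signals can be sketched as a set operation: a function is counted as likely dead only if it is neither referenced statically nor observed running in production. This is a deliberately simplified sketch over function names; real tooling works on call graphs and code paths.

```python
def dead_code_ratio(defined: set[str],
                    statically_called: set[str],
                    hit_in_production: set[str]) -> float:
    """Estimate the dead-code ratio from two imperfect signals.
    Treat the result as an estimate, not a deletion list: static analysis
    misses dynamic dispatch, runtime data misses rare-but-legitimate paths."""
    live = statically_called | hit_in_production
    dead = defined - live
    return len(dead) / len(defined) if defined else 0.0
```

Taking the union of the two "live" sets before subtracting is what keeps rarely-run but statically-referenced code from being miscounted as dead.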


Step 4: Evaluate error handling consistency

Error handling is the dimension of code quality that matters most in production and gets the least attention during development. The health check should assess whether the codebase has a consistent error handling strategy or whether each module improvises its own.

Look for these common inconsistencies: some modules throw exceptions while others return error codes. Some errors are logged and re-thrown, others are logged and swallowed, and others are not logged at all. Some API endpoints return structured error responses while others return bare HTTP status codes. Some database operations have retry logic and others do not.

The question is not whether every module handles errors identically – different layers often warrant different strategies. The question is whether the strategy is deliberate and documented, or whether it evolved accidentally through copy-paste and individual developer preference.

Catalogue the error handling patterns in use across the codebase. Note where the inconsistencies are, and flag the ones that represent production risk: swallowed exceptions in critical paths, missing error responses on user-facing endpoints, and absent retry logic for operations that are known to fail transiently.
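One of the patterns above, the swallowed exception, is mechanically detectable. A minimal sketch using Python's standard `ast` module: it only catches handlers whose body is exactly `pass`, while a fuller check would also flag handlers that log without re-raising in critical paths.

```python
import ast

def swallowed_exceptions(source: str) -> list[int]:
    """Return line numbers of except handlers whose entire body is `pass`,
    i.e. places where errors are silently discarded."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ExceptHandler):
            if all(isinstance(stmt, ast.Pass) for stmt in node.body):
                lines.append(node.lineno)
    return lines
```

Running this across a repository gives a concrete, file-by-file list to feed into the catalogue, rather than a vague sense that "some errors get swallowed somewhere."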


Step 5: Check documentation staleness

Documentation that is wrong is worse than documentation that does not exist. Wrong documentation actively misleads developers, sending them down paths that no longer reflect the system's actual behaviour.

Assess documentation at three levels. First, inline comments: are they accurate, or do they describe what the code used to do rather than what it does now? Second, README files and setup guides: can a new developer follow them and get a working development environment, or do they require tribal knowledge to supplement the written steps? Third, architectural documentation: does it reflect the current system topology, or does it describe the architecture from two years ago?

A simple heuristic is to compare the last modification date of documentation files against the last modification date of the code they describe. If the code was substantially changed six months ago and the documentation was last updated eighteen months ago, the documentation is likely stale.

Record the staleness gap for each major documentation artefact. Flag any documentation that contradicts the current code.
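The modification-date heuristic reduces to a signed gap plus a threshold. The 90-day threshold here is an illustrative assumption; tune it to how fast your codebase actually moves.

```python
from datetime import date

def staleness_gap(code_modified: date, docs_modified: date) -> int:
    """Days the documentation lags the code it describes.
    Positive means the code changed after the docs were last touched."""
    return (code_modified - docs_modified).days

def likely_stale(code_modified: date, docs_modified: date,
                 threshold_days: int = 90) -> bool:
    # Threshold is an illustrative assumption, not a standard.
    return staleness_gap(code_modified, docs_modified) > threshold_days
```

In practice the dates would come from version control (last commit touching the doc versus the code it describes) rather than filesystem timestamps, which reset on every fresh clone.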


Step 6: Review security patterns

Security is not a single checklist item. It is a set of patterns that should be applied consistently across the codebase. The health check should assess whether security patterns are present, correct, and consistently applied.

Key patterns to examine include: input validation on every user-facing endpoint, parameterised queries for all database access, authentication checks on every protected route, authorisation checks that go beyond authentication, secrets stored in environment variables rather than code, HTTPS enforced for all external communication, and appropriate logging that captures enough for incident investigation without leaking sensitive data.

The most dangerous finding is not a missing security control – it is an inconsistently applied one. If 90% of your endpoints validate input but 10% do not, the 10% become the attack surface. Attackers do not need to breach your strongest defences; they only need to find the one endpoint you forgot.
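The "inconsistently applied" finding can be computed directly: for each endpoint, report which required controls are missing, rather than averaging across the fleet. The endpoint and control names below are hypothetical.

```python
def missing_controls(endpoints: dict[str, set[str]],
                     required: set[str]) -> dict[str, set[str]]:
    """For each endpoint, report which required security controls are absent.
    The gaps, not the average, define the attack surface."""
    return {
        name: required - controls
        for name, controls in endpoints.items()
        if required - controls
    }
```

A fleet that is 90% covered still produces a non-empty report here, which is exactly the point: the output is the list of endpoints an attacker would look for.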


Structuring the output for different audiences

A health check that sits in a spreadsheet and is never acted on is wasted effort. The output needs to be structured for the people who will make decisions based on it.

For developers, provide file-level and module-level detail. Which specific files have zero test coverage? Which functions contain the inconsistent error handling? Where exactly are the dead code blocks? Developers need actionable specifics so they can create tickets and write fixes.

For engineering managers, provide category-level summaries. What percentage of modules have adequate test coverage? How many dependencies have known vulnerabilities? How many distinct error handling patterns exist? Managers need to allocate sprint capacity and prioritise across competing concerns.

For CTOs and executives, provide risk exposure and trend data. What is the overall security posture? What is the estimated cost of accumulated debt? How does the current state compare to the last assessment? Executives need to make investment decisions and communicate risk to the board.

The same underlying data serves all three audiences. The difference is the level of aggregation and the framing: specifics for developers, summaries for managers, narratives for executives.
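The three views really are one dataset at three levels of aggregation, which a short roll-up makes concrete. The finding shape (`file`, `category`, `severity`) is an assumption for illustration.

```python
from collections import Counter

def summarise(findings: list[dict]) -> dict:
    """Roll file-level findings (developer view) up into category counts
    (manager view) and headline risk figures (executive view)."""
    by_category = Counter(f["category"] for f in findings)
    high_risk = sum(1 for f in findings if f["severity"] == "high")
    return {
        "developer": findings,               # full file-level detail
        "manager": dict(by_category),        # category-level summary
        "executive": {"total": len(findings), "high_risk": high_risk},
    }
```

Keeping a single source of truth and aggregating on demand also means the three audiences can never disagree about the underlying facts, only about priorities.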


Automating the health check

Running a codebase health check manually is valuable but expensive. It takes days of senior developer time, and the results start going stale the moment the check is complete.

The better approach is to automate the baseline and reserve human judgement for interpretation. Automated tools can measure dependency age, test coverage, dead code ratios, and security pattern consistency. Humans provide the context that turns measurements into decisions: which findings matter most given the team's roadmap, which risks are acceptable given the product stage, and which improvements will yield the highest return.

VibeRails automates the health check process by scanning your entire codebase using AI analysis and producing structured reports at all three levels – developer detail, manager summaries, and executive risk narratives. Because VibeRails runs locally using your own AI subscription, your code doesn't go through VibeRails servers – requests go directly to your AI provider. And because it analyses the full codebase rather than individual PRs, it catches the systemic issues that incremental review misses.

A codebase health check is not a one-time event. It is a recurring practice. Run it quarterly, track the trends, and use the results to guide where your team invests its limited time. The first health check tells you where you stand. The second one tells you whether you are getting better or worse.


Limits and tradeoffs

  • Automated analysis can miss context. Treat findings as prompts for investigation, not verdicts.
  • False positives happen. Plan a quick triage pass before you schedule work.
  • Privacy depends on your model setup. If you use a cloud model, relevant code is sent to that provider; local models can keep inference on your own hardware.