How to Audit an Acquired Codebase

You bought the company. Now you need to find out what's actually inside the code you just inherited – before integration costs blindside you.


When a company acquires another company, the deal usually focuses on revenue, customers, team, and product. The codebase is mentioned in passing, if at all. It's treated as an asset that comes along for the ride – something the engineering team will sort out during integration.

This is a mistake. The codebase is often the most expensive thing you inherit, not because of what it cost to build, but because of what it will cost to maintain, integrate, and extend. And unlike revenue or headcount, nobody puts a number on it until it's too late.

A proper code audit during the acquisition process can surface these costs before they become surprises. Here is what to look for and when to do it.


Why M&A code audits matter

The fundamental problem with acquiring a codebase is information asymmetry. The selling team knows every shortcut, every workaround, every system that barely works. The acquiring team sees a running product and assumes the internals match the exterior.

They rarely do. Startups optimise for shipping speed, not maintainability. Enterprise teams accumulate layers of abandoned migrations. Outsourced codebases often have inconsistent patterns across different vendor teams. None of this is visible from a product demo or a due diligence spreadsheet.

The consequences show up months after close. Integration takes three times longer than projected. Key features turn out to be held together by brittle workarounds. The team you acquired spends their first quarter explaining why things are the way they are instead of building new things.

A code audit won't prevent all of this. But it gives you a realistic picture of what you're inheriting, which means you can price it into the deal or plan for it in the integration timeline.


What to look for

A thorough code audit during acquisition should cover five areas. Each tells you something different about the health and cost of the codebase.

Dependencies and supply chain. How many third-party packages does the project rely on? How many are actively maintained? Are there known vulnerabilities in the dependency tree? Outdated or abandoned dependencies are one of the most common sources of post-acquisition remediation work. They're also one of the easiest things to check.
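A dependency check is mechanical enough to sketch. The manifest and advisory data below are hypothetical examples; a real audit would feed in the project's actual lockfile and a live advisory source such as the OSV database:

```python
# Minimal sketch: flag pinned dependencies with known advisories.
# MANIFEST and KNOWN_VULNERABLE are hypothetical stand-ins for a real
# lockfile and a real vulnerability feed.

MANIFEST = """\
requests==2.19.0
flask==2.3.2
leftpad==0.1.0
"""

# Hypothetical advisory data: package name -> affected versions.
KNOWN_VULNERABLE = {
    "requests": {"2.19.0", "2.19.1"},
    "leftpad": {"0.1.0"},
}

def audit(manifest: str) -> list[str]:
    findings = []
    for line in manifest.splitlines():
        if "==" not in line:
            continue  # skip unpinned or malformed entries
        name, version = line.strip().split("==", 1)
        if version in KNOWN_VULNERABLE.get(name, set()):
            findings.append(f"{name}=={version} has a known advisory")
    return findings

print(audit(MANIFEST))
```

The same shape works for any ecosystem: parse the lockfile, join against an advisory feed, report the matches.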

Security posture. Look at authentication mechanisms, data handling patterns, secrets management, and API security. Are credentials hardcoded? Is there a consistent approach to input validation? Security issues discovered after acquisition are expensive – both financially and reputationally.
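A first pass at hardcoded-credential detection can be simple pattern matching. This sketch carries two illustrative rules only; dedicated scanners ship far larger rule sets plus entropy heuristics:

```python
import re

# Illustrative patterns, not an exhaustive rule set.
SECRET_PATTERNS = [
    # assignment of a long quoted string to a secret-looking name
    re.compile(r"""(?i)(api[_-]?key|secret|password)\s*=\s*["'][^"']{8,}["']"""),
    # shape of an AWS access key ID
    re.compile(r"AKIA[0-9A-Z]{16}"),
]

def scan(source: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that match a secret pattern."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if any(p.search(line) for p in SECRET_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits

sample = 'db_password = "hunter2hunter2"\nuser = load_user()\n'
print(scan(sample))
```

Even this crude version answers a yes/no diligence question fast: are credentials sitting in the tree at all?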

Architecture and modularity. Is the system structured in a way that allows parts to be modified independently? Or is everything tightly coupled, meaning any change risks cascading effects? Architecture determines integration cost more than any other single factor.
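One cheap coupling signal is fan-in: how many modules import a given module. High fan-in marks the modules that are riskiest to change during integration. A minimal sketch over hypothetical sources (a real audit would walk the repository):

```python
import re
from collections import Counter

# Hypothetical module sources keyed by module name.
MODULES = {
    "billing": "import db\nimport utils\n",
    "auth": "import db\nimport utils\n",
    "reports": "import billing\nimport db\n",
    "db": "import utils\n",
    "utils": "",
}

def fan_in(modules: dict[str, str]) -> Counter:
    """Count how many times each module is imported across the codebase."""
    counts = Counter()
    for source in modules.values():
        for match in re.finditer(r"^import (\w+)", source, re.MULTILINE):
            counts[match.group(1)] += 1
    return counts

print(fan_in(MODULES).most_common(3))
```

In this toy graph, db and utils are imported everywhere: touching them during integration touches everything.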

Test coverage and quality signals. What percentage of the code has automated tests? More importantly, what kind of tests? A project with high coverage but only unit tests on utility functions has a different risk profile from one with integration tests covering critical paths. Low test coverage means every change you make during integration is a gamble.
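As a rough first filter, you can check which source files have no corresponding test file at all. The file listing below is hypothetical, and this is a proxy only; real coverage numbers require running the suite under a coverage tool:

```python
from pathlib import PurePosixPath

# Hypothetical file listing; in practice this would come from
# `git ls-files` plus the project's coverage report.
FILES = [
    "src/payments.py",
    "src/auth.py",
    "src/reports.py",
    "src/utils.py",
    "tests/test_utils.py",
]

def untested_modules(files: list[str]) -> list[str]:
    """Source files with no same-named test file (a crude signal only)."""
    covered = {PurePosixPath(f).name.removeprefix("test_")
               for f in files if PurePosixPath(f).name.startswith("test_")}
    return [f for f in files
            if f.startswith("src/") and PurePosixPath(f).name not in covered]

print(untested_modules(FILES))
```

Here only utils has any tests, which matches the failure mode described above: coverage concentrated on utility code while critical paths go untested.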

Code consistency and documentation. Does the codebase follow consistent patterns, or does every module look like it was written by a different team? Is there any documentation beyond auto-generated API docs? Inconsistency is a signal that the team grew quickly, had high turnover, or both. It tells you something about the cost of onboarding your own engineers onto this code.
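Naming-style drift is one consistency signal that can be measured mechanically. A sketch over two hypothetical modules, reporting which naming conventions each one uses:

```python
import re

# Hypothetical snippets from two modules. Mixed conventions across
# modules is a cheap proxy for the "different team per module" smell.
SNIPPETS = {
    "orders.py": "def create_order(user_id):\n    pass\n",
    "invoices.py": "def createInvoice(userId):\n    pass\n",
}

def naming_styles(snippets: dict[str, str]) -> dict[str, set[str]]:
    """Map each file to the identifier styles it contains."""
    styles = {}
    for name, src in snippets.items():
        found = set()
        for ident in re.findall(r"[A-Za-z_]\w*", src):
            if "_" in ident.strip("_"):
                found.add("snake_case")
            elif re.fullmatch(r"[a-z]+[A-Z]\w*", ident):
                found.add("camelCase")
        styles[name] = found
    return styles

print(naming_styles(SNIPPETS))
```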


Timing: do it during diligence, not after close

The most common mistake is treating the code audit as a post-acquisition activity. Teams close the deal, then ask their engineers to spend a few weeks understanding the new codebase. By then, the findings are academic – you already own it.

A code audit done during due diligence serves a fundamentally different purpose. It informs the deal. If the audit reveals significant technical debt, you can negotiate the price, extend the integration timeline, or require remediation as a condition of closing.

The objection is usually access: the selling team doesn't want to give full code access before the deal is signed. This is reasonable. But there are ways to work within that constraint – limited-scope audits, read-only access in a clean room environment, or automated scanning that doesn't require manual code browsing.


How AI makes this practical

Historically, a thorough code audit required a team of senior engineers spending weeks reading code. For a meaningful codebase, that meant tens of thousands of dollars in consultant fees or pulling your best people off their current work.

AI changes the economics dramatically. A frontier model can read every file in a project, analyse the architecture, identify inconsistencies, and produce structured findings in hours rather than weeks. It doesn't replace the judgement calls – you still need humans to decide what matters and what to do about it – but it eliminates the most time-consuming part of the process: the initial read-through.

This means code audits become practical at deal speeds. You don't need to choose between a thorough audit and a fast close. You can have both. The output is a structured report that your engineering leadership can review, share with the deal team, and use to inform integration planning.

If you're acquiring a company and haven't looked at the code, you're negotiating with incomplete information. The question isn't whether the code has problems – every codebase does. The question is whether you know what those problems are before you sign.


Limits and tradeoffs

  • AI-assisted review can miss context. Treat findings as prompts for investigation, not verdicts.
  • False positives happen. Plan a quick triage pass before you schedule work.
  • Privacy depends on your model setup. If you use a cloud model, relevant code is sent to that provider; local models can keep inference on your own hardware.