Local AI Code Review: The Complete Guide

Open models are now good enough to review code locally – which means organisations that cannot send source code to external APIs finally have access to AI code analysis.

[Image: A workstation with a GPU tower running a local AI model, terminal output visible on a monitor showing code review findings]

For years, AI code review meant one thing: send your source code to an API endpoint in someone else's data centre. The AI model runs on the provider's infrastructure, analyses your code, and returns findings over the network. For many teams, this works fine. For a significant number of organisations, it is not an option at all.

Defence contractors operating under ITAR cannot allow controlled technical data to leave authorised facilities. Government agencies processing CUI (Controlled Unclassified Information) face strict data handling requirements that most cloud AI providers cannot satisfy. Financial institutions with data loss prevention (DLP) policies prohibit source code from leaving the corporate network. Healthcare organisations handling PHI (Protected Health Information) need to control exactly where that data is processed.

For these organisations, the question was never whether AI code review would be useful. It was whether it could be done without exfiltrating source code. In February 2026, the answer is definitively yes. Open models have reached the point where local AI code review is not a compromise – it is a viable, production-quality approach to automated code analysis.


The problem: your code can't leave the building

The constraint is real and it is not going away. Export controls (including ITAR) can apply to certain defence-related technical data, and in some programmes source code may be treated as controlled technical data. Sending that code to an AI API can create export-control risk if unauthorised persons might access it during processing. Government agencies face similar constraints under NIST 800-171 and emerging CMMC requirements for CUI handling. Financial institutions enforce DLP at the network perimeter, blocking bulk source code transmission to external endpoints. Healthcare organisations must ensure that any system processing PHI meets HIPAA's technical safeguard requirements.

These are not theoretical concerns. ITAR violations carry criminal penalties. HIPAA breaches trigger mandatory notification and potential fines. DLP violations in financial services can result in regulatory action. The organisations subject to these constraints have security teams and compliance officers who will reject any tool that sends source code to a third-party API – and they are right to do so.

Until recently, this meant these organisations were locked out of AI code review entirely. Static analysers and linters could run locally, but they operate on rules, not understanding. They catch syntax violations and known anti-patterns. They do not catch architectural problems, security logic errors, or the kind of subtle bugs that emerge from understanding what the code is trying to do. That requires a language model. And language models required cloud APIs.


Open models are now good enough

Open-weight coding models have improved quickly. For some organisations, they are now "good enough" for the real goal of code review: identifying concrete issues and giving you a starting point for triage.

If you want a quantitative signal, use a public benchmark like SWE-bench Verified. But treat any single score as directional: results vary with evaluation scaffolds, context length, tool constraints, and how you chunk work across files.

The practical approach is to pick a model that fits your hardware, run VibeRails against a representative subset of your codebase, and compare the findings against what your team considers valuable. For legacy audits, consistency and long-context behaviour often matter as much as raw benchmark performance.


How VibeRails makes this work

VibeRails is a desktop Electron application that orchestrates Claude Code CLI for AI-powered code analysis. The key architectural detail is that VibeRails does not call the AI model directly. It invokes Claude Code CLI, which handles the model communication. This matters because Claude Code CLI supports the ANTHROPIC_BASE_URL environment variable, allowing API calls to be redirected to an Anthropic-compatible endpoint – including a local model server.

The simplest setup is to point the CLI at your local server:

export ANTHROPIC_BASE_URL="http://localhost:11434"
export ANTHROPIC_API_KEY="ollama"
export ANTHROPIC_AUTH_TOKEN="ollama"

With these set, every API call that Claude Code CLI makes goes to localhost:11434 instead of Anthropic's cloud API. In a correctly configured environment (local model endpoint, restricted egress), source code can be kept within your controlled boundary. The model runs on your GPU. The pipeline – from reading the codebase to generating structured findings – can run inside your local network.

A critical detail: VibeRails uses Claude Code CLI's stream-json output format to receive structured events from the CLI. As long as you're using the same CLI, the output contract VibeRails expects stays consistent even if the CLI is routing requests to a different endpoint.

Model-name compatibility note. VibeRails requests a model tier via Claude Code (for example --model opus). Depending on your Claude Code version, the CLI may translate that into a specific model identifier in the outbound request. Your local Anthropic-compatible server must either accept that identifier, or map it to the local model you want to run. If your local server is strict about model names, use a compatibility proxy (for example, LiteLLM) to map incoming model IDs to your local model.
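As a sketch of that mapping, a minimal LiteLLM proxy config might look like the following. Both model names here are illustrative assumptions, not the actual identifiers your CLI version sends – check your server logs for the real incoming ID before writing the mapping.

```shell
# Hypothetical LiteLLM proxy config: map the model ID Claude Code sends
# to a local Ollama model. Both model names below are illustrative.
cat > litellm_config.yaml <<'EOF'
model_list:
  - model_name: claude-opus-4        # ID the CLI is assumed to send
    litellm_params:
      model: ollama/qwen3-coder      # local model that should serve it
      api_base: http://localhost:11434
EOF

# Then run the proxy and point the CLI at it instead of Ollama directly:
#   litellm --config litellm_config.yaml --port 4000
#   export ANTHROPIC_BASE_URL="http://localhost:4000"
echo "wrote litellm_config.yaml"
```

With this in place, a strict local server never sees the cloud-style model ID at all; the proxy rewrites it before forwarding the request.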


Desktop setup with Ollama

The fastest path to local AI code review is Ollama on your workstation. The full step-by-step setup is documented in our Local AI Code Review technical guide. Here is the abbreviated version.

Install Ollama and pull a coding model:

curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen3-coder

Model names and packaging change frequently across local model servers. Treat the command above as an example, and verify the exact model name and size in your chosen runtime (Ollama, vLLM, llama.cpp) before standardising on it.
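One way to do that verification, assuming an Ollama server on its default port (the commands are guarded so they do not fail on a machine where Ollama is not running):

```shell
# Check which models your local server actually exposes before standardising.
# Assumes Ollama on its default port; guarded so it is safe to run anywhere.
OLLAMA_URL="http://localhost:11434"

command -v ollama >/dev/null 2>&1 && ollama list || true   # models pulled locally
curl -s "$OLLAMA_URL/api/tags" || true                     # same list via the HTTP API
echo "checked $OLLAMA_URL"
```

If the model name you planned to use is missing from the list, pull it first or adjust the name before wiring it into your review workflow.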

Configure the environment:

export ANTHROPIC_BASE_URL="http://localhost:11434"
export ANTHROPIC_API_KEY="ollama"
export ANTHROPIC_AUTH_TOKEN="ollama"

Launch VibeRails and run a review. From VibeRails's perspective, nothing changes. It invokes Claude Code CLI the same way it always does. The CLI routes the request to your local Ollama server instead of Anthropic's API. The review runs, findings are generated, and VibeRails displays them in the same interface you would see with a cloud model.

If you want better cross-file reasoning, consider a larger coding model (or a mixture-of-experts model) that still fits on your hardware at an acceptable quantization level. See our hardware guide for a practical hardware decision matrix.
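As a back-of-envelope check on whether a larger model fits, the usual rule of thumb is parameter count times bytes per weight, plus headroom. This is a rough sketch only – real usage varies with runtime, context length, and KV cache – and the numbers below are hypothetical:

```shell
# Rough VRAM estimate: billions of parameters x bytes per weight, plus headroom.
# Rule of thumb only; actual usage depends on runtime, context length, KV cache.
PARAMS_B=30        # hypothetical 30B-parameter model
BITS=4             # 4-bit quantization
OVERHEAD_GB=4      # headroom for runtime and KV cache (very rough)
WEIGHTS_GB=$(( PARAMS_B * BITS / 8 ))
echo "weights ~${WEIGHTS_GB} GB, plan for ~$(( WEIGHTS_GB + OVERHEAD_GB )) GB"
```

By this estimate a 30B model at 4-bit wants roughly 15 GB for weights alone, which is why quantization level is as much a part of the decision as parameter count.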


Cloud GPU on-demand

Not every organisation that needs air-gapped code review can justify purchasing $10,000 or more in GPU hardware. The alternative is on-demand cloud GPU rental within a controlled environment. The key is that the cloud environment must be configured to prevent source code from reaching the public internet – creating a virtual air gap.

AWS offers a straightforward path. Start with a GPU instance in a private VPC and run a local model server (Ollama, vLLM, llama.cpp, or a compatibility proxy) on that instance. Pricing varies by region and changes over time, so treat any specific $/hr figures you see online as estimates and validate against current provider pricing before standardising.

The setup creates a private VPC with no internet gateway. Model weights are loaded from S3 via a VPC endpoint – the data never traverses the public internet. Your source code is uploaded to the instance via a secure channel (SSM session or VPN), the review runs against the local model, and the findings are extracted. The instance can be terminated after the review completes.
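The VPC portion of that setup can be sketched with the AWS CLI. The commands below are written into a script rather than executed, since they require credentials and a real account; the region and CIDR block are placeholders to adapt:

```shell
# Sketch of the isolated-VPC setup. Saved as a script, not run here, because it
# needs AWS credentials; region and CIDR are placeholder values.
cat > provision_review_vpc.sh <<'EOF'
#!/bin/sh
set -eu
REGION=us-east-1

# Private VPC: note that no internet gateway is ever attached.
VPC_ID=$(aws ec2 create-vpc --cidr-block 10.0.0.0/16 \
  --query 'Vpc.VpcId' --output text --region "$REGION")

# Gateway endpoint so model weights load from S3 without touching the internet.
aws ec2 create-vpc-endpoint --vpc-id "$VPC_ID" \
  --service-name "com.amazonaws.${REGION}.s3" \
  --region "$REGION"
EOF
chmod +x provision_review_vpc.sh
echo "wrote provision_review_vpc.sh"
```

A full deployment also needs a subnet, security group, and SSM endpoints for shell access; the point of the sketch is simply that the VPC is created without an internet gateway and S3 traffic stays on the AWS backbone.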

The cost per review depends on codebase size, model choice, and how you batch work. For most teams doing periodic audits, the important operational point is simple: you can spin up a GPU instance, run the review, extract the report, and terminate the instance so there is no idle infrastructure cost between runs.

For detailed instructions on setting up a cloud GPU environment for code review, including VPC configuration and model deployment, see our cloud GPU on-demand guide.


What to expect from your first local review

Setting expectations honestly is important. Local AI code review involves real tradeoffs, and understanding them upfront prevents frustration.

Speed. Local inference is typically slower than cloud API calls, especially on consumer hardware. For whole-codebase audits, this is rarely a deal-breaker: local review is usually an overnight, unattended workflow rather than an interactive one.

VibeRails is designed for batch review of entire codebases – the kind of review you start at 6pm and come back to at 9am. A review that runs overnight and produces 47 structured findings with severity ratings, code references, and remediation suggestions is enormously valuable, regardless of whether it took 45 minutes or 4 hours to generate. The speed tradeoff is irrelevant when the alternative is not reviewing the codebase at all.

Quality. Review quality depends directly on model choice and on how you chunk your codebase. Larger models and longer context windows tend to improve cross-module reasoning. Smaller models can still surface concrete bugs, obvious security issues, and code quality problems, especially in a legacy codebase that has never had a systematic audit.

Context window. The context window determines how much code the model can analyse in a single pass. VibeRails and Claude Code CLI handle this through chunking, but findings that require cross-file understanding can be less reliable when the model cannot see enough related code at once. For critical cross-module analysis, choose a model with a large context window and run targeted reviews on related file groups.

Consistency. The same model and settings will usually produce similar findings run-to-run, but you should expect some variability in natural-language explanations and in which borderline issues are surfaced. For regulated workflows, treat AI findings as inputs to a documented human triage process.


The overnight review workflow

The most effective workflow for local AI code review treats it as an overnight batch process. The setup takes 10 minutes. The review runs unattended. The findings are ready in the morning.

Step 1: Configure your environment. Set the environment variables to point Claude Code CLI at your local model server. This is a one-time setup that persists across sessions.
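One simple way to make that setup persistent is a small env file sourced from your shell profile. A minimal sketch, using the same values as the Ollama setup above (the filename is arbitrary):

```shell
# One-time setup: keep the local-endpoint variables in a file you can source
# from your shell profile (e.g. add "source ./viberails-local.env" to ~/.bashrc).
cat > viberails-local.env <<'EOF'
export ANTHROPIC_BASE_URL="http://localhost:11434"
export ANTHROPIC_API_KEY="ollama"
export ANTHROPIC_AUTH_TOKEN="ollama"
EOF

. ./viberails-local.env
echo "Claude Code CLI will target: $ANTHROPIC_BASE_URL"
```

Keeping the variables in one file also makes it easy to switch between a direct Ollama endpoint and a compatibility proxy by editing a single line.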

Step 2: Open your project in VibeRails. Point VibeRails at the codebase you want reviewed. Select the review scope – full codebase, specific directories, or files matching a pattern.

Step 3: Start the review and walk away. VibeRails invokes Claude Code CLI, which begins feeding code to the local model. The progress bar updates as each file or file group is analysed. Runtime depends on your hardware, model choice, and codebase size. Treat the first run as a calibration: measure how long it takes, then decide whether to adjust batching, model size, or scope.

Step 4: Review findings in the morning. VibeRails presents structured findings categorised by severity (critical, high, medium, low) and type (security, performance, architecture, correctness, maintainability). Each finding includes the affected file and line range, a description of the issue, and a suggested remediation. You can triage findings, mark them as accepted or deferred, and export the results as a report.

This workflow is particularly valuable for legacy codebases that have never had a systematic review. Running a first-pass AI review over a 200,000-line legacy codebase that no human has time to read end-to-end is exactly the scenario where local AI code review delivers the highest return. The review surfaces issues that have accumulated over years – security vulnerabilities introduced when best practices were different, architectural decisions that no longer make sense, error handling gaps that have never been tested.


Getting started

If your organisation has been unable to adopt AI code review because source code cannot leave your network, local AI models have eliminated that barrier. The models are capable. The tooling is mature. The setup is straightforward.

VibeRails can work with local model backends by using Claude Code CLI and configuring where the CLI sends requests. Depending on your local endpoint, you may need to ensure model identifiers match (or use a compatibility proxy). The goal is to keep inference inside your boundary without changing the core VibeRails workflow.

For the full technical walkthrough – including GPU requirements, model comparisons, and step-by-step configuration – see our Local AI Code Review technical guide. For hardware purchasing recommendations, see the hardware guide. To start reviewing your codebase tonight, download VibeRails and follow the Ollama setup above.