The OWASP Top 10 is the most widely referenced classification of web application security risks. It represents the consensus view of the security community on the most critical threats to web applications. Every developer has heard of it. Fewer have systematically looked for these vulnerabilities in their own codebases.
Security code review is the practice of reading source code specifically to identify security vulnerabilities. It differs from general code review in scope and focus: instead of evaluating design, readability, and correctness, security review evaluates whether the code can be exploited. The OWASP Top 10 provides the framework for knowing what to look for.
What follows is a walkthrough of each category: what it looks like at the code level, and why some categories are harder to detect automatically than others.
A01: Broken access control
Broken access control is the number one risk on the OWASP list, and it is also the hardest to detect with automated tools. Access control vulnerabilities occur when a user can perform actions or access data beyond their intended permissions.
In code, this manifests as missing authorisation checks. An API endpoint retrieves a resource by ID but does not verify that the requesting user has permission to access that specific resource. The endpoint works correctly for legitimate requests, and no static analysis rule will flag it, because the logic is not wrong – it is incomplete.
What to look for: API endpoints that accept resource identifiers (user IDs, document IDs, account numbers) without verifying ownership. Administrative functions that check authentication but not authorisation. Horizontal privilege escalation where changing an ID in the URL exposes another user's data. Vertical privilege escalation where a regular user can access admin-only functions by navigating directly to the endpoint.
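The missing-ownership-check pattern can be sketched in a few lines. This is a minimal, framework-free illustration with a hypothetical in-memory document store; the names and data are invented for the example:

```python
# Hypothetical in-memory store standing in for a database table.
DOCUMENTS = {
    1: {"owner_id": 10, "body": "alice's notes"},
    2: {"owner_id": 20, "body": "bob's notes"},
}

def get_document_vulnerable(requesting_user_id: int, doc_id: int) -> dict:
    # Authentication happened upstream, but there is no authorisation:
    # any logged-in user can read any document by guessing its ID.
    return DOCUMENTS[doc_id]

def get_document_fixed(requesting_user_id: int, doc_id: int) -> dict:
    doc = DOCUMENTS[doc_id]
    # Ownership check: the resource must belong to the requester.
    if doc["owner_id"] != requesting_user_id:
        raise PermissionError("not the owner of this document")
    return doc
```

Both functions "work" for legitimate requests, which is exactly why the vulnerable version passes functional tests and rule-based scans.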
This category is inherently context-dependent. The tool needs to understand what the application is supposed to allow, not just what the code does. That is why rule-based scanners consistently miss access control issues – they lack the semantic understanding to evaluate whether a given access pattern is intentional or a vulnerability.
A02: Cryptographic failures
Cryptographic failures cover everything from using weak algorithms to storing sensitive data without encryption. In code, look for hardcoded encryption keys, use of deprecated algorithms like MD5 or SHA-1 for security purposes, passwords stored in plain text or with reversible encryption, and sensitive data transmitted over unencrypted connections.
Static analysis can catch some of these: detecting MD5 usage or finding hardcoded strings that look like keys. But it struggles with the contextual cases. Using SHA-256 to hash a password is better than MD5 but still wrong – passwords should use bcrypt, scrypt, or Argon2. Detecting that distinction requires understanding what the hash is being used for, not just which function is being called.
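A password-hashing sketch using Python's standard library `hashlib.scrypt` (one of the recommended memory-hard KDFs). The cost parameters below are illustrative, not a tuned recommendation for production:

```python
import hashlib
import hmac
import os

# Illustrative scrypt parameters; tune n/r/p for your hardware and latency budget.
SCRYPT_PARAMS = dict(n=2**14, r=8, p=1, maxmem=2**26, dklen=32)

def hash_password(password: str) -> dict:
    # A fresh random salt per password defeats precomputed rainbow tables.
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return {"salt": salt, "digest": digest}

def verify_password(password: str, record: dict) -> bool:
    candidate = hashlib.scrypt(
        password.encode(), salt=record["salt"], **SCRYPT_PARAMS
    )
    # Constant-time comparison avoids timing side channels.
    return hmac.compare_digest(candidate, record["digest"])
```

The point a reviewer must catch is that a bare `hashlib.sha256(password)` call is syntactically similar but has no salt and no work factor.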
A03: Injection
Injection vulnerabilities – SQL injection, command injection, LDAP injection, XSS – occur when user-supplied data is incorporated into commands or queries without proper sanitisation. This is the category where static analysis performs best, because the pattern is relatively mechanical: trace data from an input source to a dangerous sink, and check whether it is sanitised along the way.
However, modern applications often use ORMs and templating engines that provide built-in protection. The injection risks shift to the edge cases: raw SQL queries used for complex operations that the ORM cannot express, template rendering with explicitly disabled escaping, or shell commands constructed with user input for administrative features.
What to look for: any code that constructs SQL, HTML, shell commands, or LDAP queries by concatenating strings that include external input. Any use of raw query methods that bypass the ORM's parameterisation. Any template rendering with auto-escaping disabled.
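The source-to-sink pattern is easiest to see side by side. A minimal sketch using Python's built-in `sqlite3` module, with an invented `users` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin'), ('bob', 'user')")

def find_user_vulnerable(name: str) -> list:
    # String concatenation: input becomes part of the SQL itself,
    # so a payload like "' OR '1'='1" returns every row.
    return conn.execute(
        "SELECT name, role FROM users WHERE name = '" + name + "'"
    ).fetchall()

def find_user_safe(name: str) -> list:
    # Parameterised query: the driver treats the input as data, not SQL.
    return conn.execute(
        "SELECT name, role FROM users WHERE name = ?", (name,)
    ).fetchall()
```

The safe version is what an ORM generates for you; the vulnerable version is what tends to reappear in the raw-query escape hatches.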
A04: Insecure design
Insecure design is fundamentally different from the other categories. It is not about implementation bugs – it is about architectural decisions that make the application inherently vulnerable. A perfectly implemented feature can be insecure by design.
Examples include a password reset flow that relies on security questions (easily guessable), an API that returns more data than the client needs (information leakage by design), or a rate-limiting approach that can be bypassed by rotating IP addresses. These are not coding mistakes. They are design decisions that create vulnerabilities.
Static analysis cannot find insecure design because the code is doing exactly what it was written to do. Finding these issues requires understanding the application's purpose, its threat model, and the ways an attacker might interact with it. This is where human review – or AI review that can reason about intent – becomes essential.
A05: Security misconfiguration
Security misconfigurations are settings and defaults that leave the application unnecessarily exposed. In code, this includes default credentials left in configuration files, debug mode enabled in production, overly permissive CORS policies, missing security headers, unnecessary features enabled, and verbose error messages that leak internal details to users.
What makes misconfigurations tricky is that they are often correct for development but wrong for production. A CORS policy that allows all origins is convenient during local development but dangerous in production. Debug logging that includes request bodies is useful for troubleshooting but a data leak risk when deployed. The code is not wrong – it is wrong for the environment.
Review configuration files, environment variable handling, and framework setup code with production deployment in mind. Look for anything that assumes a trusted environment.
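One mitigation is to derive security-sensitive settings from the deployment environment in a single place, so the "convenient in development, dangerous in production" values cannot leak across. A minimal sketch; the setting names and origin URL are hypothetical:

```python
def build_config(env: str) -> dict:
    """Derive security-sensitive settings from the deployment environment."""
    is_prod = env == "production"
    return {
        # Verbose error pages leak stack traces and internals in production.
        "debug": not is_prod,
        # Wildcard CORS is fine locally, dangerous once deployed.
        "cors_allowed_origins": (
            ["https://app.example.com"] if is_prod else ["*"]
        ),
        # Request-body logging is a data-leak risk in production.
        "log_request_bodies": not is_prod,
    }
```

Centralising the decision also gives a reviewer one function to audit instead of scattered conditionals.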
A06 through A10: The remaining categories
A06: Vulnerable and outdated components. Dependencies with known CVEs. Check your dependency manifest against vulnerability databases. Automate this with npm audit, pip-audit, or equivalent tools. The challenge is not detection but prioritisation: which of the 47 flagged vulnerabilities actually affects your usage of the library?
A07: Identification and authentication failures. Weak password policies, missing multi-factor authentication, session tokens that do not expire, credentials transmitted without encryption. Look for session management code, authentication middleware, and password handling functions. Verify that sessions invalidate on logout and that tokens have reasonable expiration times.
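The session properties above can be sketched with an in-memory token store. The 30-minute TTL is illustrative, not a recommendation; real systems also need persistence and refresh logic:

```python
import secrets
import time

SESSION_TTL_SECONDS = 30 * 60  # illustrative idle timeout
_sessions: dict[str, float] = {}  # token -> expiry timestamp

def create_session() -> str:
    # secrets.token_urlsafe produces an unguessable token, unlike
    # predictable sources such as timestamps or sequential IDs.
    token = secrets.token_urlsafe(32)
    _sessions[token] = time.time() + SESSION_TTL_SECONDS
    return token

def session_is_valid(token: str) -> bool:
    expiry = _sessions.get(token)
    return expiry is not None and time.time() < expiry

def logout(token: str) -> None:
    # Invalidate server-side so a stolen token cannot be replayed.
    _sessions.pop(token, None)
```

The review questions map directly onto this code: is the token unguessable, does it expire, and does logout actually invalidate it on the server?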
A08: Software and data integrity failures. Code that trusts external data without verification: deserialising untrusted objects, auto-updating from unsigned sources, CI/CD pipelines that pull from unverified repositories. The recent wave of supply chain attacks falls squarely in this category.
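The deserialisation risk is concrete in Python: `pickle.loads` will execute code chosen by whoever produced the bytes, while a data-only format like JSON cannot. A minimal sketch:

```python
import json
import pickle

class Exploit:
    # pickle calls __reduce__ during deserialisation, so an attacker
    # can make it invoke any callable with any arguments.
    def __reduce__(self):
        return (print, ("arbitrary code ran during unpickling",))

payload = pickle.dumps(Exploit())
# pickle.loads(payload)  # would execute print(...) as a side effect

def parse_untrusted(data: bytes) -> dict:
    # json.loads only builds plain data structures; it cannot
    # instantiate arbitrary classes or call arbitrary functions.
    return json.loads(data)
```

The review rule of thumb: any `pickle.loads`, `yaml.load` without a safe loader, or equivalent on data that crosses a trust boundary is a finding.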
A09: Security logging and monitoring failures. Not a vulnerability in itself, but a failure to detect and respond to attacks. Look for authentication events that are not logged, failed access attempts that generate no alerts, and audit trails that can be tampered with. If an attacker breaches your system and the logs do not record it, you will not know until the damage is visible.
A10: Server-side request forgery (SSRF). Code that makes HTTP requests based on user-supplied URLs without validation. An attacker can use SSRF to access internal services, cloud metadata endpoints, or other resources that are not exposed to the public internet. Look for any code that fetches a URL provided by the user and ensure it validates the destination against an allowlist.
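An allowlist check can be sketched with the standard library's `urllib.parse`. The hostnames below are hypothetical, and a real implementation must also handle redirects and DNS rebinding, which this sketch does not:

```python
from urllib.parse import urlparse

# Hypothetical allowlist: the only hosts this service may fetch from.
ALLOWED_HOSTS = {"api.partner.example", "images.example.com"}

def is_safe_to_fetch(url: str) -> bool:
    parsed = urlparse(url)
    # Require https and an explicitly allowlisted hostname. This blocks
    # internal services and cloud metadata endpoints (e.g. 169.254.169.254),
    # which are the classic SSRF targets.
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS
```

Allowlisting destinations is the safe default; blocklisting known-bad addresses is easy to bypass with alternate encodings and redirects.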
Why static analysis is necessary but not sufficient
Static analysis tools are excellent at finding certain categories of vulnerabilities. Injection flaws, known vulnerable dependencies, and some cryptographic misuses are well within their capabilities. These tools apply deterministic rules to code patterns, and for the patterns they know about, they are fast and reliable.
But at least four of the OWASP Top 10 categories – broken access control, insecure design, security misconfiguration, and identification failures – require contextual understanding that rule-based tools cannot provide. Is this endpoint supposed to be publicly accessible? Is this CORS policy intentionally permissive? Is this session timeout appropriate for the risk level of the application?
These questions require understanding the application's purpose, its users, and its threat model. They are judgement calls, and making them correctly requires the kind of reasoning that static rules cannot encode.
How LLM-based review addresses the gap
Large language models can reason about code in ways that rule-based tools cannot. They can evaluate whether an authorisation check is missing by understanding what the endpoint does and who should be allowed to call it. They can assess whether a configuration is appropriate for production by understanding the implications of each setting. They can identify insecure design patterns by reasoning about how an attacker might interact with the system.
This does not mean LLM-based review replaces static analysis. Static analysis is faster, more deterministic, and better at exhaustive pattern matching. LLM-based review is better at the contextual, judgement-intensive categories that static analysis misses. The combination provides broader coverage than either approach alone.
VibeRails includes a dedicated security category in its analysis that maps directly to the OWASP framework. When it reviews your codebase, it identifies not just the pattern-matchable vulnerabilities but also the context-dependent issues: missing authorisation checks, insecure design patterns, configuration problems that only matter in production, and authentication flows with logical weaknesses. Each finding includes the reasoning behind it, so your team can evaluate whether the concern applies to your specific context.
Security code review is not optional. The OWASP Top 10 provides the map. The question is whether your team has the time and tooling to navigate it systematically across your entire codebase – not just the code that changed this week, but all of it.
Limits and tradeoffs
- It can miss context. Treat findings as prompts for investigation, not verdicts.
- False positives happen. Plan a quick triage pass before you schedule work.
- Privacy depends on your model setup. If you use a cloud model, relevant code is sent to that provider; local models can keep inference on your own hardware.