Error Handling Patterns That Scale

Every team has error handling. Few teams have an error handling strategy. Here are the patterns that work at scale – and the anti-patterns that silently make things worse.


In a codebase with ten files, error handling is a local problem. Each function handles its own errors, and the developer who wrote the function understands the failure modes. The approach does not need to be consistent because there is not enough code for inconsistency to cause confusion.

In a codebase with ten thousand files, error handling is a systemic problem. Dozens of developers write error handling code independently. Without a shared strategy, each module handles errors differently. Some throw exceptions. Some return error codes. Some log and continue. Some swallow errors silently. The result is a codebase where no one can predict what happens when something goes wrong.

Ad-hoc error handling is one of the most common and most damaging forms of technical debt. It does not announce itself. It does not cause obvious failures during development. It causes unpredictable failures in production, where the consequences are highest and the debugging context is lowest.


Why ad-hoc error handling breaks at scale

The problems with inconsistent error handling are not theoretical. They appear in specific, measurable ways.

Inconsistent user experience. When each module handles errors differently, users see different error messages and behaviours depending on which part of the system fails. One endpoint returns a structured JSON error with a helpful message. Another returns a 500 with a stack trace. A third returns a 200 with an empty body. The user experience is unpredictable, and the front-end team cannot build consistent error handling because the back-end does not provide consistent errors.

Lost context. When an error is caught and re-thrown without preserving the original context, the stack trace becomes useless. The production log shows where the error was re-thrown, not where it originated. Developers spend hours tracing the path from symptom to cause because the error handling code between them discarded the information they needed.

Silent failures. The most dangerous error handling pattern is the one that catches an error and does nothing. No log. No re-throw. No return value indicating failure. The caller does not know the operation failed and continues as if it succeeded. Data is corrupted. State is inconsistent. And the failure only becomes visible much later, when the corrupted state causes a different, seemingly unrelated error.

Log noise. Without structured error handling, teams compensate by logging everything. Every catch block writes a log message. The result is a log stream with thousands of entries per minute, most of them irrelevant, many of them redundant (the same error logged at every layer it passes through), and few of them containing the context needed to diagnose the problem. When everything is logged, nothing is findable.


Patterns that work

The following patterns have been proven across codebases of varying sizes and languages. They are not mutually exclusive – most mature codebases use several in combination.

Result types

Instead of throwing an exception or returning null, a function returns a result object that explicitly represents either success or failure. In TypeScript, this might be { ok: true, value: T } | { ok: false, error: E }. In Rust, it is the built-in Result<T, E> type. In Go, it is the convention of returning (value, error) pairs.

The advantage of result types is that the caller cannot accidentally ignore the error case. The return type forces them to handle both possibilities. This eliminates an entire category of bugs – the ones where a function fails but the caller proceeds as if it succeeded because the error was communicated through a side channel (an exception) that the caller did not catch.

Result types work best for domain-level errors – the kinds of failures that are expected and meaningful, like a failed validation, a missing record, or a denied permission. They are less suitable for truly exceptional conditions (out of memory, network partition) where the calling code cannot meaningfully recover.

Error boundaries

An error boundary is a defined layer in the application where unhandled errors are caught, processed, and translated into a consistent format. In a web application, the outermost error boundary might be Express middleware that catches any unhandled exception and returns a structured JSON error response. In React, it is a component that catches rendering errors and shows a fallback UI.

Error boundaries prevent unhandled errors from reaching the user in unpredictable forms. They also provide a single place to add logging, alerting, and error tracking. Instead of every module implementing its own error reporting, the boundary handles it once, consistently.

The key design decision is where to place the boundaries. Too few boundaries and errors propagate too far before being caught. Too many and the boundaries interfere with each other, catching errors that should have been handled by a more specific handler. A typical architecture has boundaries at the API layer, the service layer, and the data access layer, each translating errors into the appropriate format for its consumers.
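A framework-free sketch of the idea follows; in Express the same role is played by a four-argument error-handling middleware, but here the boundary is a plain wrapper (`withErrorBoundary` is a hypothetical helper) so the pattern stands alone:

```typescript
type ErrorBody = { code: string; message: string };

// The boundary: unhandled errors from any handler are caught once,
// logged once, and translated into one consistent response shape.
function withErrorBoundary<T>(
  handler: () => T
): { status: number; body: T | ErrorBody } {
  try {
    return { status: 200, body: handler() };
  } catch (err) {
    // The single place for logging, alerting, and error tracking.
    console.error("unhandled error:", err);
    return {
      status: 500,
      body: { code: "INTERNAL_ERROR", message: "Something went wrong" },
    };
  }
}

const ok = withErrorBoundary(() => ({ user: "ada" }));
const failed = withErrorBoundary(() => {
  throw new Error("db down"); // never reaches the caller in raw form
});
```

Whatever the handler throws, callers always receive the same structured shape, which is exactly what lets a front-end team build consistent error handling against it.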

Centralised error handlers

A centralised error handler is a shared module that all error boundaries delegate to. It receives an error object and decides what to do: log it, report it to an error tracking service, transform it into a user-facing message, or some combination. The logic for how errors are processed lives in one place, not scattered across every catch block in the codebase.

Centralised handlers also enforce consistency. When a new error category is added – say, a rate limit error from a third-party API – it is handled in one place. Without centralisation, every team has to independently discover and handle the new error type, and some will get it wrong.
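A minimal sketch of such a handler, with hypothetical error classes and codes standing in for real ones:

```typescript
// Hypothetical error classes a codebase might define.
class RateLimitError extends Error {}
class ValidationError extends Error {}

type Handled = { code: string; userMessage: string; report: boolean };

// Every boundary delegates here, so the policy for each error
// category lives in exactly one module.
function handleError(err: unknown): Handled {
  if (err instanceof RateLimitError) {
    // A new category (e.g. a third-party rate limit) is added once, here.
    return { code: "RATE_LIMITED", userMessage: "Please try again shortly", report: false };
  }
  if (err instanceof ValidationError) {
    return { code: "VALIDATION_FAILED", userMessage: err.message, report: false };
  }
  // Unknown errors: generic message to the user, full report to tracking.
  return { code: "INTERNAL_ERROR", userMessage: "Something went wrong", report: true };
}
```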

Structured error codes

Human-readable error messages are for users. Structured error codes are for machines. An error code like AUTH_TOKEN_EXPIRED or PAYMENT_INSUFFICIENT_FUNDS allows the front-end to map specific error conditions to specific user experiences without parsing error message strings. It also makes error tracking and alerting more reliable – you can set up an alert for DB_CONNECTION_TIMEOUT without worrying about message wording changes.

The best error code systems are hierarchical. AUTH is a category. AUTH_TOKEN is a subcategory. AUTH_TOKEN_EXPIRED is a specific error. This hierarchy allows both precise matching (alert on this specific error) and broad matching (show the auth troubleshooting page for any AUTH error).
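Both kinds of matching reduce to one small check. A sketch, with `matchesCode` as a hypothetical helper:

```typescript
// A code matches a prefix when it equals the prefix or sits
// beneath it in the underscore-delimited hierarchy.
function matchesCode(code: string, prefix: string): boolean {
  return code === prefix || code.startsWith(prefix + "_");
}

matchesCode("AUTH_TOKEN_EXPIRED", "AUTH_TOKEN_EXPIRED"); // precise: true
matchesCode("AUTH_TOKEN_EXPIRED", "AUTH");               // broad: true
matchesCode("AUTHORIZE_FAILED", "AUTH");                 // false
```

The trailing-underscore check in the last case matters: it stops the broad category "AUTH" from accidentally matching unrelated codes that merely share its first letters.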

Error context chains

When an error propagates through multiple layers, each layer should add context rather than replacing it. If a database query fails, the data access layer adds the query details. The service layer adds the business operation that required the query. The API layer adds the request details. The final error object contains a chain of context that tells the full story: which request triggered which operation, which required which query, which failed with which database error.

In JavaScript, the cause property on Error objects (introduced in ES2022) supports this pattern natively. In other languages, wrapper exceptions or structured error objects achieve the same result. The key principle is never to discard context. Every catch-and-rethrow should add information, not remove it.


Anti-patterns to avoid

Catch and ignore. The empty catch block – catch (e) {} – is the most dangerous line of code in any codebase. It converts a visible error into an invisible one. The system continues executing with incorrect state, and the eventual failure is impossible to trace back to the original cause. If you genuinely intend to ignore an error, add a comment explaining why. If you cannot explain why, you should not be ignoring it.
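The difference between a dangerous ignore and an acceptable one is a sketch away. Here `invalidateCache` is a hypothetical client that can fail at runtime:

```typescript
function invalidateCache(key: string): void {
  throw new Error("cache unreachable");
}

// Bad: `try { invalidateCache(key); } catch (e) {}` -- the failure vanishes.

// Acceptable: the ignore is deliberate, narrow, and explained in place.
function bestEffortInvalidate(key: string): void {
  try {
    invalidateCache(key);
  } catch (e) {
    // Deliberately ignored: invalidation is best-effort. A stale entry
    // expires via its TTL anyway, and failing the whole request over a
    // cache hiccup would be worse than serving slightly stale data.
  }
}

bestEffortInvalidate("user:42"); // completes despite the cache failure
```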

Generic catch-all. A single try/catch wrapped around an entire function body catches every possible error, including ones the developer did not anticipate. The handler cannot know what went wrong, so it logs a generic message and returns a generic error. This masks the specific failures that would have been easy to fix if they had been allowed to surface. Catch specific error types. Let unexpected errors propagate to the error boundary.

Error as flow control. Using exceptions for expected conditions – throwing a UserNotFoundException as the normal way to check if a user exists – makes the codebase harder to read and debug. Exceptions should represent exceptional conditions. Expected outcomes should be represented by return values. When exceptions are used for control flow, every function call becomes a potential branch point, and debuggers with break-on-exception enabled become unusable.
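The alternative is to make the expected outcome a value. A sketch, with a hypothetical in-memory user store:

```typescript
type User = { id: string; name: string };
const users = new Map<string, User>([["u1", { id: "u1", name: "Ada" }]]);

// "Not found" is an expected outcome, so it is a return value, not an
// exception: the caller branches on data instead of catching.
function findUser(id: string): User | null {
  return users.get(id) ?? null;
}

const user = findUser("u2");
if (user === null) {
  console.log("no such user"); // an ordinary branch, not a catch block
}
```

A `Result` type works equally well here; the point is that the existence check never travels through the exception channel.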


How to assess your current error handling

Most teams know their error handling is inconsistent but do not know where the inconsistencies are or how severe they are. A systematic assessment requires looking at every module, identifying the error handling patterns in use, and mapping the inconsistencies.

Manual assessment is possible but tedious. Searching for catch blocks, try statements, and error return patterns across hundreds of files produces a list of locations but not an analysis of consistency. The developer still has to read each one and determine whether it follows the team's intended pattern.

VibeRails detects inconsistent error handling across modules automatically. It identifies where different parts of the codebase use different strategies, where errors are caught and silently discarded, where context is lost during propagation, and where error handling is entirely absent. The result is a structured map of error handling patterns across the entire codebase – not a list of linting violations, but a semantic analysis of how errors actually flow through the system.