API Error Handling: What to Return and What to Swallow

Production Ready — Part 6 of 30

The Stack Trace That Cost $4.5 Million

A mid-sized fintech company — let's call them what they were, careless — had a payments API running in production. It worked fine for normal requests. But when a malformed payload arrived, their Python/Flask app threw an unhandled exception and returned this to the caller:

HTTP/1.1 500 Internal Server Error
Content-Type: text/html

Traceback (most recent call last):
  File "/app/services/payment_processor.py", line 147, in process_payment
    result = db.execute(f"SELECT * FROM payments WHERE user_id = {user_id}")
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1802
    ...
OperationalError: (psycopg2.OperationalError) could not connect to server: Connection refused
    Is the server running on host "prod-db-01.internal" (10.0.1.45) and accepting
    TCP/IP connections on port 5432?

In one error response, they handed an attacker:

The exact file path and line number of their payment processor
The raw SQL query pattern (and the fact it was injectable)
Their internal database hostname and IP address
The database port
Their Python version and dependency stack

This is not a horror story I made up. According to a 2026 analysis of API security costs, the average API security breach now costs $4.5 million — and 68% of organizations reported at least one API-related security incident in 2025. The Equixly 2025 API incident report documents case after case where excessive data exposure — including in error responses — provided the initial foothold for attackers.

Verbose errors are not a debugging feature in production. They're a reconnaissance service for attackers.

The Two-Layer Error Model

Every production API needs two completely separate error channels:

The external response — what you return to the API caller
The internal log — what you record for your own engineers

These two things should look nothing alike.

The external response exists to help the caller fix their request or understand what kind of failure occurred. It should never contain implementation details.

The internal log exists so your team can diagnose the problem. It should contain everything — stack traces, query parameters, database errors, the full context.

Here's the model visualized:

Incoming Request
       |
       v
  [Your API Handler]
       |
       |-- Error occurs --
       |
       |----> [Internal Logger] <-- Full stack trace, DB errors,
       |                            request context, user ID,
       |                            correlation ID, everything
       |
       v
  [Error Sanitizer]
       |
       v
  [Caller Response] <-- Clean, structured, no internals

The sanitizer is the key piece most vibe coders skip. Let's build it properly.

What to Return: The RFC 9457 Standard

Before I show you code, you need to know that there's an actual internet standard for this: RFC 9457, Problem Details for HTTP APIs. It was published in July 2023, obsoleting RFC 7807, and it defines an application/problem+json content type that gives you a consistent, machine-readable error format.

Here's the RFC 9457 shape:

{
  "type": "https://api.example.com/errors/insufficient-funds",
  "title": "Insufficient funds",
  "status": 403,
  "detail": "Your balance is $30.00 but this transaction requires $50.00.",
  "instance": "/transactions/abc-123"
}

Breaking down each field:

| Field | Purpose | Example |
| --- | --- | --- |
| `type` | A stable URI identifying the class of problem | `https://api.example.com/errors/rate-limited` |
| `title` | Short human-readable summary (same for all instances of this type) | `"Too many requests"` |
| `status` | The HTTP status code (mirrors the response code) | `429` |
| `detail` | Human-readable explanation specific to this occurrence | `"You've made 100 requests in the last 60 seconds"` |
| `instance` | URI identifying this specific occurrence (optional, but useful) | `"/requests/req-xyz-789"` |

The critical RFC 9457 philosophy: problem details are not a debugging tool for your implementation — they're a way to expose details about the HTTP interface itself. Internal implementation details have no place here.
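One way to keep every response on this shape is to build it from a small value type instead of ad hoc dicts. What follows is a minimal sketch, not a library API: the `ProblemDetail` class name, the `extensions` field, and the `retry_after_seconds` member in the usage note are all illustrative assumptions. The only thing taken from the RFC is the five standard members and the `about:blank` default for `type`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# The five members RFC 9457 defines; everything else is an extension member.
STANDARD_MEMBERS = ("type", "title", "status", "detail", "instance")


@dataclass(frozen=True)
class ProblemDetail:
    """Hypothetical helper type; the name and fields are illustrative."""
    status: int
    title: str
    type: str = "about:blank"  # RFC 9457 default when no specific type applies
    detail: Optional[str] = None
    instance: Optional[str] = None
    extensions: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        """Serialize to the JSON body, omitting optional members that are unset."""
        body: Dict[str, Any] = {
            "type": self.type,
            "title": self.title,
            "status": self.status,
        }
        if self.detail is not None:
            body["detail"] = self.detail
        if self.instance is not None:
            body["instance"] = self.instance
        # Extension members may add fields but never overwrite standard ones.
        for key, value in self.extensions.items():
            if key not in STANDARD_MEMBERS:
                body[key] = value
        return body
```

Calling `ProblemDetail(status=429, title="Too many requests", extensions={"retry_after_seconds": 60}).to_dict()` gives a body with `type`, `title`, `status`, and the extension member, and nothing else. Because the type has no field for exception text, leaking it takes deliberate effort.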

Python Implementation

Here's how to build this properly in a FastAPI service:

# errors.py: your error module
import logging
import traceback
import uuid
from http import HTTPStatus
from typing import Optional

from fastapi import Request, HTTPException
from fastapi.responses import JSONResponse
from fastapi.exceptions import RequestValidationError

logger = logging.getLogger(__name__)

# Map internal exception type names to safe external (status, title, detail) tuples
ERROR_MAP = {
    "ResourceNotFoundError": (404, "Resource not found", "The requested resource does not exist."),
    "AuthenticationError": (401, "Authentication required", "Valid credentials are required for this endpoint."),
    "AuthorizationError": (403, "Access denied", "You do not have permission to perform this action."),
    "RateLimitError": (429, "Too many requests", "You have exceeded the rate limit. Please retry after 60 seconds."),
    "ValidationError": (422, "Invalid request", "The request data failed validation."),
}

# Fallback for any exception type that has no explicit mapping
GENERIC_ERROR = (
    500,
    "Internal server error",
    "An unexpected error occurred. Use the instance ID to reference this error with support.",
)


def problem_response(status: int, title: str, detail: str,
                     instance: str, error_type: Optional[str] = None,
                     extra: Optional[dict] = None,
                     headers: Optional[dict] = None) -> JSONResponse:
    """Return an RFC 9457-compliant problem detail response."""
    body = {
        "type": error_type or f"https://api.example.com/errors/{status}",
        "title": title,
        "status": status,
        "detail": detail,
        "instance": instance,
    }
    # Extension members may add fields but never overwrite the standard ones
    for key, value in (extra or {}).items():
        body.setdefault(key, value)
    return JSONResponse(
        status_code=status,
        content=body,
        headers=headers,
        media_type="application/problem+json"
    )


async def global_exception_handler(request: Request, exc: Exception) -> JSONResponse:
    # Generate a correlation ID for this specific error occurrence
    correlation_id = str(uuid.uuid4())
    instance_uri = f"/errors/{correlation_id}"

    # LOG EVERYTHING internally: stack trace, request details, the works
    logger.error(
        "Unhandled exception",
        extra={
            "correlation_id": correlation_id,
            "path": request.url.path,
            "method": request.method,
            "exception_type": type(exc).__name__,
            # Format from the exception object itself rather than relying on
            # the handler being called inside an active `except` block
            "stack_trace": "".join(
                traceback.format_exception(type(exc), exc, exc.__traceback__)
            ),
            # Include user context if available
            "user_id": getattr(request.state, "user_id", None),
        }
    )

    # Known exception types get their mapped safe message; anything else gets
    # the generic 500. The exception's own message is never sent to the caller.
    status, title, detail = ERROR_MAP.get(type(exc).__name__, GENERIC_ERROR)
    return problem_response(
        status=status,
        title=title,
        detail=detail,
        instance=instance_uri,
    )


async def http_exception_handler(request: Request, exc: HTTPException) -> JSONResponse:
    # HTTP exceptions (404, 403, etc.) are safe to surface: they describe
    # the HTTP interface, not implementation internals
    try:
        # title stays the same for every occurrence of a given status code
        title = HTTPStatus(exc.status_code).phrase
    except ValueError:
        title = "HTTP error"
    return problem_response(
        status=exc.status_code,
        title=title,
        detail=str(exc.detail),
        instance=request.url.path,
        # Preserve headers such as WWW-Authenticate set by the raiser
        headers=getattr(exc, "headers", None),
    )


async def validation_exception_handler(request: Request, exc: RequestValidationError) -> JSONResponse:
    # Validation errors: safe to show field names and what was wrong
    # Do NOT echo back the submitted values (could contain sensitive data)
    errors = [
        {"field": ".".join(str(loc) for loc in err["loc"]), "message": err["msg"]}
        for err in exc.errors()
    ]
    return problem_response(
        status=422,
        title="Validation error",
        detail="One or more fields failed validation.",
        instance=request.url.path,
        error_type="https://api.example.com/errors/validation",
        extra={"errors": errors},
    )

Then wire up your handlers in main.py:

# main.py
from fastapi import FastAPI
from fastapi.exceptions import RequestValidationError
from starlette.exceptions import HTTPException
from .errors import global_exception_handler, http_exception_handler, validation_exception_handler

app = FastAPI()

app.add_exception_handler(Exception, global_exception_handler)
app.add_exception_handler(HTTPException, http_exception_handler)
app.add_exception_handler(RequestValidationError, validation_exception_handler)

JavaScript/TypeScript Implementation

Here's the Express.js equivalent with the same two-layer approach:

// errors.ts
import { Request, Response, NextFunction } from 'express';
import { v4 as uuidv4 } from 'uuid';
import { logger } from './logger';

interface ProblemDetail {
  type: string;
  title: string;
  status: number;
  detail: string;
  instance: string;
  [key: string]: unknown;
}

export function problemResponse(
  res: Response,
  status: number,
  title: string,
  detail: string,
  instance: string,
  extras?: Record<string, unknown>
): void {
  const body: ProblemDetail = {
    // Spread extras first so they can never overwrite the standard members
    ...extras,
    type: `https://api.example.com/errors/${status}`,
    title,
    status,
    detail,
    instance,
  };
  res.status(status)
     .type('application/problem+json')
     .json(body);
}

// Global error middleware: must declare 4 parameters for Express to treat it as an error handler
export function globalErrorHandler(
  err: Error,
  req: Request,
  res: Response,
  next: NextFunction
): void {
  // If the response has already started, we can't send a new body;
  // delegate to Express's default error handler instead
  if (res.headersSent) {
    next(err);
    return;
  }

  const correlationId = uuidv4();

  // Log everything internally
  logger.error('Unhandled error', {
    correlationId,
    path: req.path,
    method: req.method,
    errorType: err.constructor.name,
    errorMessage: err.message,
    stack: err.stack,
    userId: (req as any).userId ?? null,
  });

  // Return nothing useful to the attacker
  problemResponse(
    res,
    500,
    'Internal server error',
    `An unexpected error occurred. Reference ID: ${correlationId}`,
    req.path
  );
}

What to Swallow (And Why)

Here's a decision framework for every piece of error information:

Always swallow from responses (log internally only):

Stack traces
Database query text or error messages
Internal hostnames, IPs, or ports
File system paths
Third-party service error messages (e.g., raw AWS or Stripe SDK errors)
SQL state codes or ORM-level exceptions
Environment variable names
Dependency versions

Safe to return in responses:

HTTP status code and meaning
Field-level validation errors (field name + what rule failed)
Business rule violations ("balance insufficient", "account suspended")
Rate limit information with retry guidance
A correlation/request ID for support reference

The trap vibe coders fall into: passing error.message directly into the response. The message might be totally safe ("Email already registered") or catastrophically unsafe ("Column 'ssn' doesn't exist in table 'users_v1'") — you can't tell without reading it. Always map exception types to safe messages explicitly, as shown in the ERROR_MAP pattern above.
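To make the "map types, never forward messages" rule concrete, here is a framework-free sketch. The exception classes and the `to_safe_error` helper are hypothetical names made up for illustration. It keys the map on exception classes rather than class names, and walks the MRO so subclasses inherit their parent's mapping; that is one possible design, not the only one.

```python
from typing import Dict, Tuple, Type


# Hypothetical domain exceptions, defined here only so the sketch runs.
class ResourceNotFoundError(Exception):
    pass


class RateLimitError(Exception):
    pass


SafeError = Tuple[int, str, str]  # (status, title, detail)

# Every message a caller can ever see is written out here, by hand.
SAFE_ERRORS: Dict[Type[BaseException], SafeError] = {
    ResourceNotFoundError: (404, "Resource not found", "The requested resource does not exist."),
    RateLimitError: (429, "Too many requests", "You have exceeded the rate limit."),
}

GENERIC_ERROR: SafeError = (500, "Internal server error", "An unexpected error occurred.")


def to_safe_error(exc: BaseException) -> SafeError:
    """Pick a safe response by exception type; str(exc) is never consulted."""
    # Walk the MRO so a subclass falls back to its nearest mapped parent.
    for cls in type(exc).__mro__:
        if cls in SAFE_ERRORS:
            return SAFE_ERRORS[cls]
    return GENERIC_ERROR
```

An unmapped exception carrying "Column 'ssn' doesn't exist in table 'users_v1'" comes out as the generic 500, because nothing in this function reads the exception's message.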

The Volkswagen Lesson: Error Responses as Recon

The 2025 Volkswagen API incident is a textbook case. According to the Equixly API incident analysis, their APIs returned entire JSON objects to clients, relying on client-side filtering to hide sensitive fields. When a researcher bypassed the client-side interface and called the API directly, the unfiltered responses included plaintext credentials for backend services including Salesforce, payment processors, and internal systems.

This wasn't a stack trace leak — it was business-logic-level over-exposure. The same principle applies: never rely on your frontend to hide what your API returns.

The Levo.ai 2026 API security checklist makes it explicit: "How your API fails often reveals more than how it succeeds." Test your error responses as carefully as you test your success paths. Try intentionally malformed requests, inject SQL characters, pass wrong types, send oversized payloads — and check exactly what comes back.
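One way to automate that check is a small scanner you run over every error body your tests capture. The sketch below is an assumption-heavy starting point: `find_leaks` and the pattern names are hypothetical, and the regexes are rough heuristics tuned to the Python/PostgreSQL leak from the opening example. Expect to add patterns for your own stack, and expect the occasional false positive.

```python
import re
from typing import Dict, List, Pattern

# Heuristic patterns only; tune these for your own stack.
LEAK_PATTERNS: Dict[str, Pattern[str]] = {
    "python traceback": re.compile(r"Traceback \(most recent call last\)"),
    "python file path": re.compile(r"(?:/[\w.-]+)+\.py\b"),
    "private IPv4 address": re.compile(
        r"\b(?:10(?:\.\d{1,3}){3}"
        r"|192\.168(?:\.\d{1,3}){2}"
        r"|172\.(?:1[6-9]|2\d|3[01])(?:\.\d{1,3}){2})\b"
    ),
    "SQL statement": re.compile(
        r"\b(?:SELECT\b[^\n]*\bFROM|INSERT\s+INTO|UPDATE\b[^\n]*\bSET|DELETE\s+FROM)\b"
    ),
    "database driver name": re.compile(r"psycopg2|sqlalchemy|OperationalError", re.IGNORECASE),
}


def find_leaks(body: str) -> List[str]:
    """Return the name of every pattern that matches the response body."""
    return [name for name, pattern in LEAK_PATTERNS.items() if pattern.search(body)]
```

In a test suite, the assertion is simply `find_leaks(response_text) == []` for each malformed request you throw at the API. An empty list does not prove a response is safe; it only means none of these particular patterns fired.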

The Correlation ID Pattern

One thing your error responses should include is a correlation ID — a random UUID generated per-request that links the safe external error to the full internal log entry.

This solves the support problem cleanly:

# What the user sees:
{
  "type": "https://api.example.com/errors/500",
  "title": "Internal server error",
  "detail": "An unexpected error occurred. Reference ID: 7f3d2a1c-9e4b-4f8d-b1c2-0a3e5f7d9b12",
  "instance": "/v1/payments/process",
  "status": 500
}

# What you grep in your logs:
$ grep "7f3d2a1c-9e4b-4f8d-b1c2-0a3e5f7d9b12" /var/log/api.log

# What you find:
{
  "correlation_id": "7f3d2a1c-9e4b-4f8d-b1c2-0a3e5f7d9b12",
  "path": "/v1/payments/process",
  "user_id": "usr_abc123",
  "exception_type": "OperationalError",
  "stack_trace": "...",
  "db_error": "could not connect to host prod-db-01.internal:5432"
}

The user gets a reference they can give to support. You get everything you need to debug. Attackers get nothing useful.
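Stripped of any web framework, the pattern is one ID written to two places. Here is a stdlib-only sketch; `handle_failure`, `ListHandler`, and the `api.errors` logger name are hypothetical, and the in-memory handler stands in for whatever log sink you actually ship to.

```python
import logging
import uuid
from typing import Any, Dict, List


class ListHandler(logging.Handler):
    """Collects records in memory; a stand-in for a real log sink."""

    def __init__(self) -> None:
        super().__init__()
        self.records: List[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


logger = logging.getLogger("api.errors")
logger.setLevel(logging.ERROR)
logger.propagate = False  # keep the sketch's output out of the root logger
sink = ListHandler()
logger.addHandler(sink)


def handle_failure(exc: Exception, path: str) -> Dict[str, Any]:
    """Log the full exception internally and return only a safe body."""
    correlation_id = str(uuid.uuid4())
    # Internal channel: exception object, traceback, and request context.
    logger.error(
        "Unhandled exception",
        exc_info=exc,
        extra={"correlation_id": correlation_id, "path": path},
    )
    # External channel: the ID is the only link back to the log entry.
    return {
        "type": "https://api.example.com/errors/500",
        "title": "Internal server error",
        "status": 500,
        "detail": f"An unexpected error occurred. Reference ID: {correlation_id}",
        "instance": path,
    }
```

The response body is built without ever touching `exc`, so the database hostname in the exception message can only end up in the log record.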

Don't Forget: 404 vs. 403 Information Leakage

There's a subtle but critical error handling decision that catches vibe coders off guard: returning 403 when you should return 404.

Consider a user trying to access /api/users/456/profile when they're only authorized to access /api/users/123/profile.

Bad pattern:

GET /api/users/456/profile  →  403 Forbidden

This tells the attacker: "User 456 exists. I just can't see them yet."

Better pattern:

GET /api/users/456/profile  →  404 Not Found

Return 404 for any resource the caller isn't authorized to know exists. This is called authorization-aware 404 and it's the correct behavior for sensitive resources. The CybelAngel 2025 API threat report noted that 95% of API attacks in 2025 came from authenticated sessions — meaning attackers are already inside, probing what they can discover through response patterns. Don't help them enumerate.
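In code, this means the "doesn't exist" and "not yours" branches must produce the same response. A minimal sketch, assuming a hypothetical in-memory `PROFILES` store and a `get_profile` helper where a user may only read their own profile:

```python
from typing import Any, Dict, Tuple

# Hypothetical in-memory store standing in for a real database.
PROFILES: Dict[int, Dict[str, str]] = {
    123: {"name": "Ada"},
    456: {"name": "Grace"},
}

NOT_FOUND: Tuple[int, Dict[str, Any]] = (
    404,
    {"title": "Resource not found", "detail": "The requested resource does not exist."},
)


def get_profile(requester_id: int, target_id: int) -> Tuple[int, Dict[str, Any]]:
    """Return (status, body); missing and forbidden are indistinguishable."""
    profile = PROFILES.get(target_id)
    # Both failure cases share one branch, so the caller cannot tell
    # whether target_id exists. Log the real reason internally if needed.
    if profile is None or requester_id != target_id:
        return NOT_FOUND
    return 200, profile
```

Here `get_profile(123, 456)` and `get_profile(123, 999)` return the identical tuple. Matching status codes is the minimum; response bodies and headers should match too, or the difference becomes the new signal.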

Quick Checklist

Before shipping any API endpoint to production, verify:

Global exception handler exists — no uncaught exceptions ever reach the caller as raw errors
Stack traces are never in response bodies — only in internal logs
Database errors are swallowed — callers see "internal error", not SQL text
Third-party SDK errors are wrapped — raw Stripe/AWS/Twilio messages stay internal
All errors use a consistent JSON structure — ideally RFC 9457 application/problem+json
Correlation IDs link external errors to internal logs — support can look up any incident
Validation errors expose field names but not echoed values — show what was wrong, not what was submitted
404 used for unauthorized resource access — don't confirm existence of resources the caller can't see
Error responses are tested explicitly — malformed requests, auth failures, edge cases all checked
Dev/staging verbose mode is disabled in production — no DEBUG=True or NODE_ENV=development

Ask The Guild

Community prompt: Have you ever shipped a verbose error message to production — or found one in a codebase you inherited? What was leaking, and how did you fix it? Share your war story (sanitized, of course) in the comments. Bonus points if you've implemented RFC 9457 and want to show off your error registry setup.