Trust Nobody, Validate Everything: Runtime Data Validation | Production Ready

Production Ready — Part 19 of 30

In August 2025, a developer deployed a Shopify webhook handler with this line:

const { shopId } = payload.shop_id;

Looks reasonable. You've probably written something similar. The problem: payload.shop_id is a number (12345), not an object. JavaScript's destructuring doesn't throw when the right-hand side is a number; it looks up shopId on the number, finds nothing, and gives you undefined. (Only null and undefined throw.) The downstream code then passed undefined into the ORM's delete function. With no usable WHERE clause constraint, the delete applied to every row and wiped the entire database table.

One line. One missing validation check. Complete data loss.

Four months later, on December 5, 2025, Cloudflare's older FL1 proxy went down for 25 minutes, affecting 28% of all HTTP traffic they carried. The root cause? A Lua code path assumed a field called execute would exist on a rule result object. When configuration changes caused that field to be absent, the proxy tried to index into a nil value and threw an exception on every single request. Cloudflare's own post-mortem noted that "this type of code error is prevented by languages with strong type systems", and that their new Rust-based FL2 proxy didn't have the bug, precisely because Rust forces you to handle the case where a value might not exist.

This is the core problem we're solving today: the gap between what your types say at compile time and what your data actually looks like at runtime.
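To make that gap concrete, here is a minimal Python sketch. The RuleResult type and run_rule function are hypothetical names, loosely modeled on the missing-field failure described above rather than taken from any real codebase. The type hint promises an execute key, but nothing checks that promise when the JSON is loaded.

```python
import json
from typing import TypedDict, cast


class RuleResult(TypedDict):
    # What the type checker is told every rule result looks like.
    action: str
    execute: dict


def run_rule(raw_json: str) -> str:
    # cast() does nothing at runtime; it only informs the type checker.
    result = cast(RuleResult, json.loads(raw_json))
    # Type-checks cleanly, but raises KeyError if "execute" is absent.
    return result["execute"]["target"]


ok = run_rule('{"action": "execute", "execute": {"target": "ruleset-a"}}')

try:
    run_rule('{"action": "skip"}')  # same declared type, different runtime shape
    failed = False
except KeyError:
    failed = True
```

The second call passes every static check and still fails, because the declared type was never compared against the actual data.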

This Is Different From Input Validation

Back in Day 15, we covered input validation — sanitizing user data for security, preventing XSS and injection attacks. That's about keeping malicious input out of your system.

Today's topic is different. Runtime data validation is about operational reliability. It's about catching well-intentioned but incorrect data before it corrupts your system. The Shopify webhook was legitimate — properly signed, from a real Shopify event. The payload was just structured differently than the developer assumed.

Runtime validation failures happen at every boundary in your system:

An external API changes its response shape and suddenly a field you expected is missing or renamed
An environment variable is misconfigured and your app starts with invalid settings
A database query returns a null where your code expected a string
A webhook payload has a field as null in one event type but a string in another
A third-party library's types are out of date with the actual API

None of these involve attackers. All of them can bring down your production system.

The Tools: Zod and Pydantic

The JavaScript/TypeScript ecosystem has Zod. The Python ecosystem has Pydantic. Both solve the same problem: they let you define the shape of data you expect, then validate at runtime that the data actually matches. If it doesn't, you get a clear, structured error — not a cryptic TypeError: Cannot read properties of undefined three stack frames later.

As of 2026, Pydantic has reached version 2.12.5 with support for Python 3.14, strict mode, and partial validation. A real-world enterprise case study showed that adopting Pydantic V2's strict validation reduced data-related errors by 78% and improved API response times by 35% in a system handling over 10 million requests per day. And Zod's 2026 guide shows the library has matured to handle the full spectrum of TypeScript validation patterns. These are not experimental tools — they are production infrastructure.

Validating at Every System Boundary

1. API Inputs

This is where most developers start. In Python with FastAPI, Pydantic validation is built-in:

from pydantic import BaseModel, Field
from fastapi import FastAPI

class CreateOrderRequest(BaseModel):
    user_id: int
    product_id: int
    quantity: int = Field(gt=0, le=1000)  # must be 1-1000
    promo_code: str | None = None

app = FastAPI()

@app.post("/orders")
async def create_order(request: CreateOrderRequest):
    # If we get here, the data is valid. No additional checks needed.
    return process_order(request)

If quantity comes in as "lots" or -5, FastAPI rejects it with a clear 422 error before your code ever runs.

In TypeScript with Zod:

import { z } from 'zod';

const CreateOrderSchema = z.object({
  userId: z.number().int().positive(),
  productId: z.number().int().positive(),
  quantity: z.number().int().min(1).max(1000),
  promoCode: z.string().optional(),
});

type CreateOrderRequest = z.infer<typeof CreateOrderSchema>;

// In your route handler:
app.post('/orders', (req, res) => {
  const result = CreateOrderSchema.safeParse(req.body);
  if (!result.success) {
    return res.status(400).json({ errors: result.error.flatten() });
  }
  // result.data is fully typed and validated
  return processOrder(result.data);
});

Note the use of safeParse instead of parse. parse throws on failure; safeParse returns a result object. Prefer safeParse in production — throwing exceptions for expected failure cases (bad user input) is the wrong pattern.
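The two styles are easier to compare side by side. What follows is a dependency-free Python sketch of the same idea, not Zod's implementation; parse_quantity, safe_parse_quantity, and ParseResult are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


def parse_quantity(raw: object) -> int:
    """Throwing style: return a valid quantity or raise ValueError."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(raw, bool) or not isinstance(raw, int):
        raise ValueError(f"quantity must be an integer, got {type(raw).__name__}")
    if not 1 <= raw <= 1000:
        raise ValueError("quantity must be between 1 and 1000")
    return raw


@dataclass(frozen=True)
class ParseResult:
    success: bool
    data: Optional[int] = None
    error: Optional[str] = None


def safe_parse_quantity(raw: object) -> ParseResult:
    """Result style: does not raise on bad input; the caller checks .success."""
    try:
        return ParseResult(success=True, data=parse_quantity(raw))
    except ValueError as exc:
        return ParseResult(success=False, error=str(exc))


good_result = safe_parse_quantity(5)
bad_result = safe_parse_quantity("lots")
```

With the result style, the failure branch is an ordinary if statement in the handler, which suits input that is expected to be wrong some of the time.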

2. Webhook Payloads

This is where the database-deletion incident from August 2025 lived. Webhook payloads come from external systems you don't control. Their shape can change. Always validate.

import { z } from 'zod';

// Define what Shopify ACTUALLY sends for SHOP_REDACT
const ShopifyRedactWebhookSchema = z.object({
  shop_id: z.number(),             // it's a number, not an object!
  shop_domain: z.string(),
});

app.post('/webhooks/shopify/redact', async (req, res) => {
  const result = ShopifyRedactWebhookSchema.safeParse(req.body);
  
  if (!result.success) {
    console.error('Invalid Shopify webhook payload', {
      errors: result.error.flatten(),
      rawBody: req.body,
    });
    // Return 200 to prevent Shopify from retrying an invalid payload
    return res.status(200).json({ received: true, valid: false });
  }

  const { shop_id } = result.data; // now it's definitely a number
  await deleteShopData(shop_id);
  return res.status(200).json({ received: true });
});

The fix is four lines of schema definition. The validation in the original code was zero lines. The difference is a production database.

3. Environment Variables

Missing or malformed environment variables are one of the most common causes of production startup failures — and they're among the easiest to prevent.

# config.py — validate at startup, fail loudly if misconfigured
from pydantic_settings import BaseSettings, SettingsConfigDict
from pydantic import AnyHttpUrl, Field

class Settings(BaseSettings):
    # Pydantic v2 style configuration; the inner `class Config` is deprecated
    model_config = SettingsConfigDict(env_file=".env")

    database_url: str
    redis_url: str
    api_secret_key: str = Field(min_length=32)  # enforce key strength
    max_workers: int = Field(default=4, gt=0, le=64)
    payment_webhook_url: AnyHttpUrl
    debug: bool = False

# This line runs at import time. If DATABASE_URL is missing or
# PAYMENT_WEBHOOK_URL isn't a valid URL, the app refuses to start.
settings = Settings()

This pattern — validate at startup and crash loudly — is far better than discovering at 2 AM that MAX_WORKERS was set to "four" and your worker pool silently initialized to zero.
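If you want to see what that startup check does under the hood, here is a rough standard-library sketch. It is not how pydantic-settings is implemented; WorkerSettings and load_settings are hypothetical names, and only two of the settings above are covered to keep it short.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class WorkerSettings:
    database_url: str
    max_workers: int


def load_settings(env: Mapping[str, str] = os.environ) -> WorkerSettings:
    """Parse settings from an environment mapping, raising on any problem."""
    errors = []

    database_url = env.get("DATABASE_URL", "")
    if not database_url:
        errors.append("DATABASE_URL is required")

    raw_workers = env.get("MAX_WORKERS", "4")
    max_workers = 0
    try:
        max_workers = int(raw_workers)
        if not 0 < max_workers <= 64:
            errors.append(f"MAX_WORKERS must be between 1 and 64, got {max_workers}")
    except ValueError:
        errors.append(f"MAX_WORKERS must be an integer, got {raw_workers!r}")

    if errors:
        # Report every problem at once so a single deploy can fix them all.
        raise RuntimeError("Invalid configuration: " + "; ".join(errors))
    return WorkerSettings(database_url=database_url, max_workers=max_workers)


settings = load_settings({"DATABASE_URL": "postgres://localhost/app", "MAX_WORKERS": "8"})

try:
    load_settings({"DATABASE_URL": "postgres://localhost/app", "MAX_WORKERS": "four"})
    startup_error = ""
except RuntimeError as exc:
    startup_error = str(exc)
```

Passing the environment in as a mapping keeps the function easy to test; in a real service you would call load_settings() once at startup and let the exception stop the process.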

4. External API Responses

This is the validation most developers skip. You call an external API, you get back JSON, you start using it. What could go wrong? Plenty — APIs add fields, remove fields, change types between versions, and return different shapes on error paths.

import { z } from 'zod';

const StripeChargeSchema = z.object({
  id: z.string().startsWith('ch_'),
  amount: z.number().nonnegative(),
  currency: z.string().length(3),
  status: z.enum(['succeeded', 'pending', 'failed']),
  created: z.number(),
  metadata: z.record(z.string(), z.string()).optional(), // key schema, value schema
});

async function getCharge(chargeId: string) {
  const response = await stripe.charges.retrieve(chargeId);
  
  const result = StripeChargeSchema.safeParse(response);
  if (!result.success) {
    // Log and alert — the API shape we depend on has changed
    logger.error('Stripe API response failed schema validation', {
      errors: result.error.flatten(),
      chargeId,
    });
    throw new Error('Unexpected Stripe response format');
  }
  
  return result.data; // fully typed, validated
}

When Stripe (or any API you call) changes their response format, you want to know immediately — not after corrupt data has worked its way into your database.

5. Database Query Results

Your ORM types tell you what a record should look like. Your database will still happily return null for a field your type says is a string: for example, when older rows were written while the column was nullable and the NOT NULL constraint only ever made it into the model, or when a migration ran partially.

from pydantic import BaseModel
from typing import Optional

class Order(BaseModel):
    id: int
    user_id: int
    amount_cents: int
    status: str
    stripe_charge_id: Optional[str] = None

def get_order(order_id: int) -> Order:
    row = db.execute(
        "SELECT id, user_id, amount_cents, status, stripe_charge_id "
        "FROM orders WHERE id = %s", 
        (order_id,)
    ).fetchone()
    
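    if row is None:
        raise ValueError(f"Order {order_id} not found")
    
    # Validate before returning to catch data integrity issues.
    # Assumes the driver returns mapping-style rows (for example, a dict cursor);
    # plain tuple rows would need to be zipped with the column names first.
    return Order(**dict(row))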
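You can reproduce the problem with nothing but the standard library. The sketch below uses an in-memory SQLite table and a hand-written check in place of a Pydantic model; OrderRow and parse_order_row are hypothetical names, and the two-column table is a simplified stand-in for the orders table above.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderRow:
    id: int
    status: str


def parse_order_row(row: sqlite3.Row) -> OrderRow:
    """Turn a raw row into an OrderRow, rejecting NULLs and wrong types."""
    if not isinstance(row["id"], int):
        raise ValueError(f"orders.id is not an integer: {row['id']!r}")
    if not isinstance(row["status"], str):
        raise ValueError(f"order {row['id']} has invalid status: {row['status']!r}")
    return OrderRow(id=row["id"], status=row["status"])


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
# The status column has no NOT NULL constraint, so NULLs can be stored.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO orders (id, status) VALUES (1, 'paid'), (2, NULL)")

rows = conn.execute("SELECT id, status FROM orders ORDER BY id").fetchall()
valid_order = parse_order_row(rows[0])

try:
    parse_order_row(rows[1])
    legacy_error = ""
except ValueError as exc:
    legacy_error = str(exc)
```

The annotation on OrderRow.status says str for both rows; only the runtime check notices that the second one is None.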

The Key Principle: Parse, Don't Validate

There's a subtle but important distinction in how to think about this. The goal isn't to check data and then proceed cautiously — it's to transform untyped, untrusted data into typed, trusted data, and then only work with the trusted version.

This is why Zod's parse / safeParse returns a new typed object, and Pydantic's models return instances where you can trust the types. You're not annotating raw data with a note that says "probably fine." You're moving the data from the untrusted world into the trusted world.

The Cloudflare outage was caused by code that assumed a field existed because it usually did. The Argo CD vulnerability CVE-2025-59537, where a malformed webhook payload could crash the Argo CD API server, was caused by the same assumption. Neither system validated the data before accessing it. Both paid the price.

Don't check. Parse.
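Here is one way the idea can look without any library, sketched in plain Python. ShopRedactRequest, parse_shop_redact, and delete_shop_data are hypothetical names; the two fields mirror the webhook schema from earlier. The point is that the function doing the dangerous work accepts only the parsed type, never the raw dict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShopRedactRequest:
    """Trusted form of the payload; by convention only the parser builds it."""
    shop_id: int
    shop_domain: str


def parse_shop_redact(raw: object) -> ShopRedactRequest:
    """Convert untrusted input into a ShopRedactRequest or raise ValueError."""
    if not isinstance(raw, dict):
        raise ValueError("payload must be a JSON object")
    shop_id = raw.get("shop_id")
    shop_domain = raw.get("shop_domain")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(shop_id, bool) or not isinstance(shop_id, int):
        raise ValueError("shop_id must be an integer")
    if not isinstance(shop_domain, str) or not shop_domain:
        raise ValueError("shop_domain must be a non-empty string")
    return ShopRedactRequest(shop_id=shop_id, shop_domain=shop_domain)


def delete_shop_data(request: ShopRedactRequest) -> str:
    # The signature asks for the parsed type, so passing an unchecked dict
    # is flagged by a type checker instead of slipping through.
    return f"deleting data for shop {request.shop_id}"


message = delete_shop_data(
    parse_shop_redact({"shop_id": 12345, "shop_domain": "example.myshopify.com"})
)
```

A payload with a missing or non-integer shop_id never produces a ShopRedactRequest, so there is no undefined-like value left to hand to the delete call.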

Production Validation Checklist

All HTTP request bodies validated with Zod or Pydantic before touching application logic
All webhook handlers define and validate the expected payload schema
Environment variables validated at application startup (crash on invalid config, not at request time)
All external API responses parsed through a schema before use
Database query results validated for expected shape, especially for nullable fields
Using safeParse (Zod) or try/except with ValidationError (Pydantic) for expected failure cases — not raw throws
Validation errors logged with the raw input for debugging (but sanitized of PII)
Schema validation runs in CI against sample production payloads
No code path trusts a type annotation on external data without a runtime check to back it up
Zero-trust applied to your own database: validate returned rows, don't assume schema constraints held
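The checklist item about logging raw input without PII is the one people most often skip, so here is a small sketch of one possible approach. The redact function and the SENSITIVE_KEYS set are hypothetical; the key names are placeholders you would replace with the fields your own payloads carry.

```python
from typing import Any

# Hypothetical set of sensitive keys; adjust to your own payloads.
SENSITIVE_KEYS = frozenset({"email", "phone", "address", "card_number"})


def redact(value: Any) -> Any:
    """Return a copy of a JSON-like value with sensitive fields masked."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if key in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


raw_payload = {
    "shop_id": "not-a-number",
    "customer": {
        "email": "pat@example.com",
        "orders": [{"phone": "555-0100", "total": 42}],
    },
}
safe_to_log = redact(raw_payload)
```

The invalid shop_id survives so you can still debug the failure, while the customer's contact details never reach the log. Masking by key name is a blunt tool, so treat it as a starting point rather than a guarantee.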

Ask The Guild

Where did you discover you were missing runtime validation — before or after it bit you in production? Have you found a schema that broke when a third-party API silently changed their response format? Do you have a pattern for validating database results that you'd swear by, or a story about an unvalidated webhook payload that caused chaos? Share your experience in the Guild — and if you've got a Pydantic or Zod pattern you're particularly proud of, paste it in. The best examples might end up in a future lesson.

Tom Hundley is a software architect with 25 years of experience and the author of the Production Ready series. He has personally written, reviewed, and debugged enough unvalidated code to know that the ten minutes it takes to write a schema is always cheaper than the incident it prevents.
