Webhook Architecture: Receiving Events from Stripe & Clerk
Architecture Patterns — Part 11 of 30
The Friday Deploy That Charged Customers Twice
A SaaS founder I know shipped a routine backend update on a Friday afternoon. The deploy took about 45 seconds of downtime while the new container spun up. Nothing dramatic. The Stripe dashboard looked fine.
Saturday morning, three support emails arrived. Customers saying they'd been charged twice.
Here's what happened: When his server restarted, Stripe's retry mechanism kicked in. The checkout.session.completed events that had fired during the 45-second window got no 200 OK, so Stripe marked them as failed and requeued them. When the server came back up, Stripe delivered those events again — and his handler processed them again, creating duplicate orders and triggering duplicate fulfillment emails.
Stripe's retry behavior is well-documented: in live mode, it uses exponential backoff and retries for up to 3 days. After continuous failures, Stripe auto-disables your endpoint and sends you an email. But "retrying" is not the problem — the problem is that his handler had no protection against receiving the same event twice.
This is the webhook trap. Every vibe coder hits it. And it's entirely architectural, not a bug you debug with console.log.
This is Part 11 of the Architecture Patterns series. Today we're building the mental model and the implementation playbook for receiving webhooks from Stripe and Clerk — two of the most common event sources in modern SaaS stacks.
What a Webhook Actually Is (And Why It's Harder Than It Looks)
A webhook is just an HTTP POST that someone else sends to your server when something happens on their end. Conceptually simple. In production, it's a distributed systems problem.
The mental model that kills developers: "A webhook is like a function call — it fires once, I handle it, done."
The correct mental model: "A webhook is an at-least-once message delivery over an unreliable network from an external system I don't control."
The phrase "at-least-once" is doing a lot of work there. It means:
- The same event can arrive multiple times (retries after timeout, infrastructure hiccups)
- Events can arrive out of order (a `customer.subscription.updated` can beat the `customer.subscription.created` to your server)
- Events can be delayed by minutes or hours
- Your endpoint can be unavailable when an event fires
As Creative Software's engineering blog summarizes it bluntly: "Events get retried, duplicated, delayed, or reordered. If your webhook consumer assumes perfect conditions, it will eventually break."
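To make those failure modes concrete, here's a tiny hypothetical simulation, plain Python with no real webhooks, of the out-of-order case: a handler that assumes "created always arrives first" silently drops the update.

```python
# Hypothetical simulation: events delivered out of order, which an
# at-least-once system is allowed to do.
arrivals = [
    {"type": "customer.subscription.updated", "id": "sub_1", "status": "past_due"},
    {"type": "customer.subscription.created", "id": "sub_1", "status": "active"},
]

records = {}
for evt in arrivals:
    if evt["type"] == "customer.subscription.created":
        records[evt["id"]] = evt["status"]
    elif evt["type"] == "customer.subscription.updated" and evt["id"] in records:
        # The update arrived before the create, so this branch never fires
        # for it — the past_due status is silently lost.
        records[evt["id"]] = evt["status"]

print(records)  # the subscription ends up "active"; the update was dropped
```

This is exactly why the "fetch fresh state from the API" pattern, covered later in this article, matters: the payload in your queue can describe a state that no longer exists.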
Armed with that model, let's talk about the three architectural decisions that determine whether your webhook system survives production.
Decision 1: Verify the Signature Before Touching Anything
Anyone on the internet can POST to your webhook endpoint. If you process those requests without verifying they actually came from Stripe or Clerk, you're one malicious request away from unauthorized order fulfillment, fake user provisioning, or worse.
Both Stripe and Clerk use HMAC-SHA256 signatures. The pattern is identical: the provider signs the payload with a shared secret, sends the signature in a header, and you verify it before processing.
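Always use the SDK helpers for real verification (the production schemes also sign a timestamp, which the sketch below omits), but the core HMAC idea fits in a few lines of stdlib Python. The `sign`/`verify` helpers and the `whsec_example_only` secret are illustrative, not the actual Stripe or Clerk format:

```python
import hashlib
import hmac

SECRET = b"whsec_example_only"  # placeholder, not a real provider secret

def sign(payload: bytes, secret: bytes) -> str:
    # Provider side: HMAC-SHA256 over the raw payload bytes
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()

def verify(payload: bytes, signature: str, secret: bytes) -> bool:
    expected = sign(payload, secret)
    # Constant-time comparison avoids leaking information via timing
    return hmac.compare_digest(expected, signature)

payload = b'{"type":"checkout.session.completed"}'
sig = sign(payload, SECRET)
assert verify(payload, sig, SECRET)
assert not verify(payload + b" ", sig, SECRET)  # any byte change breaks the HMAC
```

The last assertion is the whole point of passing the raw body: even one mutated byte, such as re-serialized JSON whitespace, produces a different HMAC.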
Stripe Signature Verification
Stripe sends a `Stripe-Signature` header. Their SDK's `constructEvent()` handles the cryptographic verification:
```typescript
// TypeScript / Express
import Stripe from 'stripe';
import express from 'express';

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);
const endpointSecret = process.env.STRIPE_WEBHOOK_SECRET!; // starts with whsec_

const app = express();

// CRITICAL: The webhook route MUST come before express.json() middleware
app.post('/webhooks/stripe', express.raw({ type: 'application/json' }), (req, res) => {
  const sig = req.headers['stripe-signature'] as string;

  let event: Stripe.Event;
  try {
    event = stripe.webhooks.constructEvent(req.body, sig, endpointSecret);
  } catch (err) {
    console.error('Signature verification failed:', err);
    return res.status(400).send(`Webhook Error: ${(err as Error).message}`);
  }

  // Signature verified — safe to process
  handleStripeEvent(event);
  res.json({ received: true });
});
```
The most common failure mode: Express's `express.json()` middleware parses the body before signature verification, which mutates the byte stream and breaks the HMAC check. This exact issue surfaces constantly in production — and in Next.js App Router, the equivalent pitfall is calling `req.json()` instead of `req.text()` to get the raw body.
For Next.js App Router:
```typescript
// app/api/webhooks/stripe/route.ts
import Stripe from 'stripe';

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);

export async function POST(req: Request) {
  const body = await req.text(); // raw body — MUST come before any parsing
  const sig = req.headers.get('stripe-signature')!;

  let event: Stripe.Event;
  try {
    event = stripe.webhooks.constructEvent(body, sig, process.env.STRIPE_WEBHOOK_SECRET!);
  } catch (err) {
    return new Response(`Webhook Error: ${(err as Error).message}`, { status: 400 });
  }

  // Process event...
  return new Response(JSON.stringify({ received: true }), { status: 200 });
}
```
Clerk Signature Verification (Svix)
Clerk uses Svix as its webhook delivery infrastructure. The signature lives in three headers: `svix-id`, `svix-timestamp`, and `svix-signature`. Clerk provides a `verifyWebhook` helper:
```typescript
// app/api/webhooks/clerk/route.ts
import { verifyWebhook } from '@clerk/nextjs/webhooks';

export async function POST(req: Request) {
  try {
    const evt = await verifyWebhook(req);
    // evt is typed — evt.type, evt.data, etc.
    const { type, data } = evt;

    if (type === 'user.created') {
      await syncUserToDatabase(data);
    }

    return new Response('OK', { status: 200 });
  } catch (err) {
    console.error('Clerk webhook verification failed:', err);
    return new Response('Unauthorized', { status: 401 });
  }
}
```
The Svix timestamp header also provides replay attack protection — requests older than 5 minutes are automatically rejected, so an attacker can't capture a legitimate webhook and fire it again hours later.
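As a sketch of what that tolerance check looks like — the Svix SDK does this for you, `is_fresh` is a hypothetical helper, and the 300 seconds matches the 5-minute window described above:

```python
import time

TOLERANCE_SECONDS = 300  # 5 minutes, matching the window described above

def is_fresh(svix_timestamp, now=None):
    # The svix-timestamp header carries Unix seconds; reject anything
    # delivered outside the tolerance window to block replayed captures.
    now = time.time() if now is None else now
    return abs(now - int(svix_timestamp)) <= TOLERANCE_SECONDS

assert is_fresh("1000000000", now=1000000100)      # 100s old: accepted
assert not is_fresh("1000000000", now=1000009999)  # hours old: rejected as replay
```

Note the `abs()`: a timestamp too far in the *future* is just as suspicious as one from the past, since it usually means clock skew or tampering.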
The rule: return a 400 or 401 on signature failure, never a 200. Returning 200 on a bad signature tells the provider "I got it and processed it" — which is a lie that creates false confidence and hides security failures.
Decision 2: Acknowledge Fast, Process Async
Stripe's timeout window is short — a few seconds (the exact value isn't publicly documented, but community experience puts it at 5–10 seconds). If your handler takes longer than that to respond, Stripe marks the delivery as failed and retries.
Here's the trap: your handler might be working — querying the database, sending an email, calling a third-party API — but Stripe doesn't know that. From Stripe's perspective, silence equals failure. So it retries. Now you have two concurrent executions of the same event.
As a developer on r/stripe described it: deploying a new version during a peak payment window left him manually reconciling events that had entered Stripe's retry queue while his service was restarting.
The correct pattern is: verify, enqueue, respond 200 immediately. Do the actual work in a background worker.
```python
# Python / FastAPI example
import json
import os

import redis
import stripe
from fastapi import FastAPI, HTTPException, Request

app = FastAPI()
queue = redis.Redis()

@app.post("/webhooks/stripe")
async def stripe_webhook(request: Request):
    payload = await request.body()
    sig = request.headers.get("stripe-signature")

    try:
        event = stripe.Webhook.construct_event(
            payload, sig, os.environ["STRIPE_WEBHOOK_SECRET"]
        )
    except stripe.error.SignatureVerificationError:
        raise HTTPException(status_code=400, detail="Invalid signature")

    # Enqueue the raw event — worker handles the business logic
    queue.lpush("stripe_events", json.dumps({
        "id": event["id"],
        "type": event["type"],
        "data": event["data"],
    }))

    # Respond immediately — do NOT wait for processing
    return {"received": True}
```
With this pattern, your endpoint stays fast and predictable regardless of what the downstream processing involves. Stripe gets its 200 OK in milliseconds. The work happens on your timeline, with proper error handling and retries under your control.
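Here is the same verify-enqueue-respond shape as a self-contained sketch, with a stdlib `queue.Queue` and a thread standing in for Redis and a separate worker process (signature verification omitted for brevity; `webhook_handler` is a hypothetical stand-in for the HTTP endpoint):

```python
import json
import queue
import threading

events = queue.Queue()  # stands in for the Redis queue above
processed = []          # stands in for real fulfillment work

def worker():
    # Background worker: drains the queue on its own timeline,
    # with its own error handling, independent of the HTTP endpoint.
    while True:
        raw = events.get()
        if raw is None:  # shutdown sentinel for this demo
            break
        event = json.loads(raw)
        processed.append(event["id"])

def webhook_handler(raw_body):
    # The endpoint's only jobs: (verify, then) enqueue and return fast
    events.put(raw_body)
    return {"received": True}

t = threading.Thread(target=worker)
t.start()
resp = webhook_handler(json.dumps({"id": "evt_1", "type": "invoice.paid"}))
events.put(None)
t.join()
```

The handler returns before the worker has touched the event; the provider gets its acknowledgment regardless of how slow fulfillment is.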
Decision 3: Make Your Handlers Idempotent
This is the one most teams skip — and it's the one that caused the double-charge incident at the top of this article.
Idempotency means processing the same event multiple times produces the same result as processing it once. Every webhook handler must be idempotent. Not "should be." Must be.
The mechanism is simple: store the event ID when you first process it, and check for it before processing again.
```python
# Worker that processes enqueued events
def handle_stripe_event(event: dict):
    event_id = event["id"]

    # Idempotency check — have we already processed this?
    if db.query(
        "SELECT 1 FROM processed_webhook_events WHERE event_id = %s",
        (event_id,)
    ).fetchone():
        logger.info(f"Skipping duplicate event: {event_id}")
        return

    # Mark as processed inside a transaction; a UNIQUE constraint on
    # event_id closes the race between two workers checking at once
    with db.transaction():
        db.execute(
            "INSERT INTO processed_webhook_events (event_id, processed_at) VALUES (%s, NOW())",
            (event_id,)
        )

        event_type = event["type"]
        if event_type == "checkout.session.completed":
            fulfill_order(event["data"]["object"])
        elif event_type == "customer.subscription.deleted":
            revoke_access(event["data"]["object"])
        elif event_type == "invoice.payment_failed":
            trigger_dunning(event["data"]["object"])
```
The key architectural insight: idempotency belongs in the worker, not the endpoint. Your endpoint's job is to verify and enqueue. Your worker's job is to process safely. Mixing these concerns — trying to detect duplicates in the HTTP handler — creates race conditions and slows down your response time.
For Clerk events, the same pattern applies. Clerk uses Svix, which provides a unique svix-id per message (consistent across retries of the same event). Store that ID as your idempotency key.
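A minimal sketch of that Clerk-side idempotency, using an in-memory set keyed by the `svix-id`. Production code wants a durable database table with a unique constraint instead, and `handle_clerk_event` is a hypothetical helper:

```python
# Hypothetical in-memory store — real code needs durable storage,
# since an in-memory set is lost on restart.
processed_ids = set()
access_grants = []

def handle_clerk_event(svix_id, event):
    # svix-id is stable across retries of the same message,
    # which makes it the natural deduplication key.
    if svix_id in processed_ids:
        return False  # duplicate delivery: already handled
    processed_ids.add(svix_id)
    if event["type"] == "user.created":
        access_grants.append(event["data"]["id"])
    return True

evt = {"type": "user.created", "data": {"id": "user_abc"}}
assert handle_clerk_event("msg_1", evt) is True
assert handle_clerk_event("msg_1", evt) is False  # the retry is a no-op
```

Run twice with the same `svix-id`, the handler grants access exactly once, which is the definition of idempotent.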
The Stripe Events You Actually Need to Handle
Most tutorials handle `payment_intent.succeeded` and call it done. Production SaaS apps need more:
| Event | What It Means | What To Do |
|---|---|---|
| `checkout.session.completed` | One-time purchase complete | Provision access, fulfill order |
| `customer.subscription.created` | Subscription started | Activate account tier |
| `customer.subscription.updated` | Plan changed, trial ended | Sync entitlements |
| `customer.subscription.deleted` | Subscription cancelled/expired | Revoke access |
| `invoice.payment_failed` | Renewal charge failed | Start dunning sequence |
| `invoice.paid` | Renewal succeeded | Extend access, clear dunning |
A critical decision: fetch fresh from Stripe's API rather than trusting the payload's object data. Stripe events can arrive out of order. The customer.subscription.updated event in your queue might reflect state from 30 seconds ago, while the actual subscription has since been updated again. Pull the current object state when your worker processes the event:
```python
# Don't do this — stale data
subscription_data = event["data"]["object"]
status = subscription_data["status"]

# Do this — fresh state
subscription = stripe.Subscription.retrieve(event["data"]["object"]["id"])
status = subscription.status
```
The Clerk Events That Sync Your User Table
Clerk handles authentication, but your database handles business logic. The gap between them is filled with Clerk webhooks:
```typescript
// Syncing Clerk users to your database
switch (evt.type) {
  case 'user.created': {
    const { id, email_addresses, first_name, last_name } = evt.data;
    await db.users.upsert({
      where: { clerk_id: id },
      create: {
        clerk_id: id,
        email: email_addresses[0].email_address,
        name: `${first_name} ${last_name}`,
        created_at: new Date(),
      },
      update: {}, // If it somehow already exists, leave it
    });
    break;
  }
  case 'user.updated': {
    const { id, email_addresses, first_name, last_name } = evt.data;
    await db.users.update({
      where: { clerk_id: id },
      data: {
        email: email_addresses[0].email_address,
        name: `${first_name} ${last_name}`,
      },
    });
    break;
  }
  case 'user.deleted': {
    await db.users.update({
      where: { clerk_id: evt.data.id },
      data: { deleted_at: new Date() },
    });
    break;
  }
}
```
One architectural note from Clerk's own documentation: webhooks are asynchronous and not guaranteed to arrive immediately. Do not use Clerk webhooks in synchronous onboarding flows. If a user signs up and you immediately redirect them to a dashboard that queries your database for their record, the user.created webhook may not have arrived yet. Either query Clerk's API directly for fresh user data, or build your onboarding flow to handle the case where the local user record doesn't exist yet.
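One way to structure that fallback, sketched with injected lookup functions so the shape is clear. `get_user_for_onboarding` and both fetchers are hypothetical:

```python
# Hypothetical onboarding lookup: the local row may not exist yet
# because the user.created webhook is asynchronous.
def get_user_for_onboarding(clerk_id, fetch_local, fetch_from_clerk):
    user = fetch_local(clerk_id)
    if user is not None:
        return user
    # Webhook hasn't landed yet — read directly from Clerk instead of failing
    return fetch_from_clerk(clerk_id)

local_db = {}  # empty: simulates the webhook not having arrived yet
user = get_user_for_onboarding(
    "user_abc",
    fetch_local=local_db.get,
    fetch_from_clerk=lambda cid: {"clerk_id": cid, "source": "clerk_api"},
)
```

The same function serves both the happy path (webhook already synced the row) and the race (dashboard loaded first), so the onboarding flow never depends on webhook timing.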
Local Development: The Stripe CLI
You can't develop webhook handlers without receiving webhooks. The Stripe CLI solves this by creating a secure tunnel and forwarding events to localhost:
```bash
# Install Stripe CLI
brew install stripe/stripe-cli/stripe

# Authenticate
stripe login

# Forward all events to your local server
stripe listen --forward-to localhost:3000/api/webhooks/stripe

# In another terminal — trigger a test event
stripe trigger checkout.session.completed
```
The CLI prints a webhook signing secret (starting with `whsec_`) when it starts. Use that secret — not your Dashboard endpoint's secret — in your local environment. They're different keys for different endpoints, and mixing them up is the most common cause of signature verification failures in development.
For Clerk webhooks locally, use ngrok or a similar tunneling tool, and register your ngrok URL as a webhook endpoint in the Clerk Dashboard.
New in 2025: Stripe's Health Events
Stripe's v2 event API introduced `v2.core.health.event_generation_failure` events — a signal that Stripe itself failed to generate an event for a payment action. Stripe's documentation describes these as rare but real: if a `payment_intent.requires_action` event fails to generate, your system is now out of sync with Stripe's state without knowing it.
The recovery pattern: when you receive a health failure event, poll the relevant Stripe API resource directly to resync your state. This is another argument for the "fetch fresh from API" pattern — it also serves as your resync mechanism when event delivery breaks down.
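The recovery flow can be sketched abstractly. The payload field name below is a hypothetical stand-in, not the real v2 event schema, and `fetch_resource` is an injected function rather than an actual Stripe SDK call:

```python
# Hypothetical resync sketch: on a health failure signal, stop trusting
# local state and re-read the resource from the API.
def resync_on_health_failure(event, fetch_resource, local_cache):
    resource_id = event["related_resource_id"]  # illustrative field name
    fresh = fetch_resource(resource_id)         # e.g. a direct API read
    local_cache[resource_id] = fresh            # overwrite the stale copy
    return fresh

cache = {"pi_123": {"status": "requires_action"}}  # stale local state
event = {"related_resource_id": "pi_123"}
fresh = resync_on_health_failure(
    event,
    fetch_resource=lambda rid: {"status": "succeeded"},
    local_cache=cache,
)
```

Because the worker already fetches fresh objects for ordinary events, this resync path can reuse the exact same fetch-and-overwrite code.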
Production Checklist
Before shipping any webhook integration to production:
- Signature verification is implemented and returns `400`/`401` on failure — never `200`
- Raw body is passed to signature verification, before any JSON parsing middleware
- Endpoint responds 200 immediately and enqueues work — no blocking operations in the handler
- Idempotency checks are in the worker using the event ID as the key
- Processed event IDs are stored in the database (not just in memory or Redis — you need durability)
- Stripe CLI is used for local development with the correct local secret
- Fresh object fetching from the Stripe API on event processing, not trusting stale payload data
- Clerk `user.created`/`user.updated`/`user.deleted` handlers are idempotent and use upsert operations
- Dead letter queue or alerting exists for events that fail processing after max retries
- Manual replay tested in Stripe Dashboard and Clerk Dashboard (Svix) for recovery scenarios
- Separate secrets for test and production webhook endpoints
- Webhook endpoint not behind authentication middleware (it uses signature verification, not session auth)
The Decision Framework in One Sentence
A webhook endpoint has exactly three jobs: verify the signature, enqueue the event, and respond 200 — everything else is the worker's problem.
Every architectural failure in webhook systems comes from collapsing those three jobs into one unprotected, synchronous, non-idempotent handler. Don't do that.
Ask The Guild
Community Prompt: What's the worst webhook incident you've shipped to production? Was it a double-charge, a failed signature check that took down your auth flow, or something even more exotic? Share your war story — and the fix — in the Guild. The messiest incidents teach the most architecture.
Sources: Stripe Webhook Documentation | Stripe Handle Irrecoverable Events | Stripe Signature Verification | Clerk Webhooks Overview | Svix Signature Verification | Hookdeck: Guide to Stripe Webhooks 2026 | Creative Software: Webhooks in the Real World 2026