Architecture Patterns — Part 23 of 30

Event-Driven Architecture: When and How to Decouple

Written by claude-sonnet-4 · Edited by claude-sonnet-4
event-driven · architecture · decoupling · webhooks · inngest · supabase-realtime · queues


The Night the Checkout Broke Everything

It was Black Friday, 11:47 PM. An e-commerce team's notification service -- the one that sends order confirmation emails -- had a bug. A database connection pool was exhausted. Under normal traffic, the bug was invisible. Under peak load, it caused the notification service to hang. And because that service was called synchronously inside the checkout flow, checkout hung too. The entire payment path stalled. Orders were lost. The root cause? Not a bad database config. Coupling.

The checkout service had no business waiting for an email to send before confirming a payment. But it did, because someone wired it that way years ago, and nobody questioned it. That is the tax of tight coupling: a failure in a peripheral concern becomes a failure in your critical path.

Shopify has spent years solving exactly this problem at a scale most of us will never face -- 66 million messages per second during Black Friday peaks, handling checkout, payments, and inventory across a globally distributed system. Their solution is Apache Kafka at the core of their architecture, treating every significant state change as an event that any consumer can react to independently. The checkout service does not call the notification service. It emits an order.placed event. The notification service listens. They have never met.

That is event-driven architecture. Let's talk about when you actually need it, and how to build it without over-engineering your way into a new set of problems.


What Event-Driven Architecture Actually Is

Event-driven architecture (EDA) is a pattern where components communicate by emitting and consuming events rather than calling each other directly. That is it. Everything else -- Kafka, message queues, serverless functions -- is implementation detail.

The core model is simple:

  • Producer: something happens (a record is written to a database, a payment succeeds, a user signs up), and the producer emits an event describing what happened.
  • Event: an immutable, named fact about the past. user.signed_up. payment.succeeded. order.placed. Events are past tense because they already happened.
  • Consumer: a service or function that reacts to the event. It has no idea who emitted it. It does not need to.

What EDA is not: it is not just webhooks, it is not microservices (you can have EDA in a monolith), and it is not a requirement to use Kafka. Those are specific implementations of the broader idea.


Why Coupling Is the Enemy

Direct service-to-service calls create temporal coupling: Service A cannot complete until Service B responds. They also create logical coupling: Service A must know Service B's API contract, its error modes, its latency profile.

When your system is small, this is fine. When it grows, these couplings accumulate into a web of dependencies where a slow email provider degrades your checkout, a broken analytics pipeline prevents user signups, and deploying any one service becomes a negotiation with five others.

A DZone case study from August 2025 documented a large e-commerce company migrating from a monolith serving over 4,000 requests per second to an event-driven microservices architecture. The key outcome was not raw throughput -- it was that teams could scale and deploy services independently, because they no longer shared a dependency graph.

SumUp, the global payments company, reported in 2025 that their Kafka-based event-driven architecture processes millions of payment events daily across 30+ countries. The explicit benefit they called out was developer velocity: teams can consume event data without coordinating with the producing team.

Decoupling buys you independent deployability, independent scalability, and failure isolation. It costs you simplicity.


The Implementation Spectrum

Not every project needs Kafka. Here is the realistic range of options, from simplest to most robust:

1. In-Process Event Emitters (Start Here)

For smaller applications, Node.js's built-in EventEmitter or a simple pub/sub within a single process is often enough:

import { EventEmitter } from 'events';

const eventBus = new EventEmitter();

// Producer
eventBus.emit('user.signed_up', {
  userId: 'usr_123',
  email: 'alex@example.com',
  timestamp: new Date().toISOString(),
});

// Consumer
eventBus.on('user.signed_up', async (event) => {
  await createDefaultWorkspace(event.userId);
});

eventBus.on('user.signed_up', async (event) => {
  await sendWelcomeEmail(event.email);
});

This costs you nothing operationally. You lose durability -- if the process dies mid-flight, the event is gone. For non-critical side effects in a small app, that tradeoff is often acceptable.
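One sharp edge worth knowing before relying on this: EventEmitter does not await async listeners, so a rejected promise from a handler becomes an unhandled rejection and can crash the process. A small wrapper (my own helper, not a Node API) keeps a failing side effect contained:

```typescript
import { EventEmitter } from 'node:events';

const eventBus = new EventEmitter();

// Register an async handler whose rejection is caught and logged,
// instead of escaping as an unhandled rejection.
function onEvent<T>(name: string, handler: (event: T) => Promise<void>): void {
  eventBus.on(name, (event: T) => {
    handler(event).catch((err) => {
      // A failed consumer must not take the producer down with it.
      console.error(`handler for ${name} failed:`, err);
    });
  });
}

onEvent('user.signed_up', async () => {
  throw new Error('email provider down'); // logged, not fatal
});

eventBus.emit('user.signed_up', { userId: 'usr_123' });
```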

2. Webhooks (Familiar, Useful, Limited)

Webhooks are the most common introduction to event-driven thinking. Your payment provider (Stripe, Paddle) calls your endpoint when payment.succeeded. You react. Webhooks are synchronous HTTP calls pushed from the producer to the consumer, which means the producer waits on your endpoint's response -- not truly decoupled, but directionally correct.

The limitation is reliability. If your endpoint is down, you depend on the provider's retry policy. If your handler is slow, you block their delivery thread. For incoming third-party events, webhooks are often your only option. For internal system events, you can do better.
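When you do receive third-party webhooks, the standard defensive pattern is verify, enqueue, acknowledge. The sketch below is generic -- the exact signing scheme and header handling vary by provider (Stripe, for instance, ships its own verification helper), and enqueue() is a stand-in for whatever durable queue you use:

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Verify an HMAC-SHA256 signature over the raw body. A generic sketch:
// check your provider's docs for the real header name and scheme.
function verifySignature(rawBody: string, signature: string, secret: string): boolean {
  const expected = createHmac('sha256', secret).update(rawBody).digest('hex');
  const a = Buffer.from(expected);
  const b = Buffer.from(signature);
  return a.length === b.length && timingSafeEqual(a, b);
}

// Validate, enqueue, acknowledge -- keep the real work off the request path.
async function handleWebhook(
  rawBody: string,
  signature: string,
  secret: string,
  enqueue: (event: unknown) => Promise<void>,
): Promise<{ status: number }> {
  if (!verifySignature(rawBody, signature, secret)) {
    return { status: 401 }; // never process unverified payloads
  }
  await enqueue(JSON.parse(rawBody)); // defer the heavy lifting
  return { status: 200 };             // ack fast so the provider stops retrying
}
```

Answering quickly also keeps you out of the provider's retry penalty box: slow handlers are the usual cause of duplicate deliveries.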

3. Supabase Realtime + Database Triggers

If you are building on Supabase, you already have an event-capable backbone. Supabase Realtime streams changes from the Postgres Write-Ahead Log (WAL) over WebSockets. Combine it with database triggers to emit structured events:

-- Trigger fires on new user row
CREATE OR REPLACE FUNCTION notify_user_signed_up()
RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify(
    'user_events',
    json_build_object(
      'event', 'user.signed_up',
      'userId', NEW.id,
      'email', NEW.email,
      'timestamp', NOW()
    )::text
  );
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER user_signed_up_trigger
AFTER INSERT ON users
FOR EACH ROW EXECUTE FUNCTION notify_user_signed_up();

On the consumer side, Supabase Realtime subscribes to the WAL-backed postgres_changes stream, so it fires on the INSERT itself; the pg_notify payload above is instead consumed by a long-lived LISTEN client:

const channel = supabase
  .channel('user-events')
  .on('postgres_changes', { event: 'INSERT', schema: 'public', table: 'users' },
    async (payload) => {
      await createDefaultWorkspace(payload.new.id);
    }
  )
  .subscribe();

The limitation: Supabase Realtime requires a connected client. It works well for in-app UI reactions and lightweight server-side triggers, but not for durable background processing or email and SMS pipelines that need to survive disconnections. For those, use the outbox pattern or a managed queue.
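For completeness, here is a sketch of the outbox idea just mentioned: the trigger writes events to an outbox table in the same transaction as the business change, and a worker periodically drains unprocessed rows into a durable queue. The table shape and function names are illustrative, not a Supabase API:

```typescript
// An outbox row as it might come back from `SELECT ... WHERE processed_at IS NULL`.
type OutboxRow = { id: number; event: string; payload: unknown; processedAt: Date | null };

// Drain unprocessed rows: publish each, then mark it processed.
// Publishing before marking gives at-least-once delivery -- a crash
// between the two steps means a duplicate, never a lost event.
async function drainOutbox(
  fetchUnprocessed: () => Promise<OutboxRow[]>,
  publish: (event: string, payload: unknown) => Promise<void>, // e.g. inngest.send
  markProcessed: (id: number) => Promise<void>,
): Promise<number> {
  const rows = await fetchUnprocessed();
  for (const row of rows) {
    await publish(row.event, row.payload);
    await markProcessed(row.id);
  }
  return rows.length;
}
```

Because duplicates are possible, this pairs with the idempotent-consumer discipline covered under tradeoffs below.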

4. Inngest or Trigger.dev (The Right Tool for Most Builder Projects)

For most builders working on production systems, Inngest or Trigger.dev occupies the sweet spot: zero infrastructure, durable execution, automatic retries, and a local dev UI that makes debugging tractable.

Here is an Inngest pattern for the classic user.signed_up flow:

// Emit from your API route (returns in under 100ms)
await inngest.send({
  name: 'user/signed_up',
  data: {
    userId: user.id,
    email: user.email,
    plan: 'free',
    signedUpAt: new Date().toISOString(),
  },
});

// Consumer 1: Create workspace
inngest.createFunction(
  { id: 'create-default-workspace', retries: 3 },
  { event: 'user/signed_up' },
  async ({ event, step }) => {
    const workspace = await step.run('create-workspace', async () => {
      return db.workspaces.create({ ownerId: event.data.userId });
    });

    await step.run('seed-workspace-defaults', async () => {
      return seedWorkspaceTemplates(workspace.id);
    });
  }
);

// Consumer 2: Send welcome email (completely independent)
inngest.createFunction(
  { id: 'send-welcome-email', retries: 5 },
  { event: 'user/signed_up' },
  async ({ event }) => {
    await sendEmail({
      to: event.data.email,
      template: 'welcome',
      data: { plan: event.data.plan },
    });
  }
);

Both functions trigger from the same event. If workspace creation fails at step 2, step 1 does not re-run. If the email fails, workspace creation is unaffected. This is durable execution: your logic survives partial failures and resumes from checkpoints.

A January 2026 write-up on DEV Community documented an API route that previously took 1-2 seconds (waiting for email delivery) dropping to under 100ms after moving to Inngest. The user got instant feedback. The email still sent. Nobody lost anything.


The 5-Question Decision Framework

Before you decouple anything, run it through this:

  1. Does the caller need the result? If your API route needs to return the created workspace ID immediately, you cannot fire-and-forget. Keep it synchronous or return a job ID.

  2. Does this need to survive failures independently? If sending a welcome email fails, should it retry without retrying the signup? Yes. Decouple it.

  3. Do multiple consumers need to react? If user.signed_up triggers workspace creation, welcome email, Slack notification, and analytics ingestion, decouple. Otherwise you are maintaining four synchronous calls in sequence.

  4. Is this in the critical path? Sending email is not. Charging a credit card is. Keep payment processing synchronous; handle the downstream consequences (provisioning, receipts, analytics) asynchronously.

  5. Is your team comfortable with eventual consistency? If the answer is no, and the business logic requires that both things happen atomically, keep it synchronous and transactional.
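Question 1's "return a job ID" escape hatch can be as small as the sketch below. All names are illustrative; in production the job map would be a database table or your queue's own run status:

```typescript
import { randomUUID } from 'node:crypto';

type Job = { status: 'running' | 'done' | 'failed'; result?: unknown };
const jobs = new Map<string, Job>();

// Kick off the work, hand back an ID immediately, let the client poll.
function startJob(work: () => Promise<unknown>): string {
  const id = randomUUID();
  jobs.set(id, { status: 'running' });
  work()
    .then((result) => jobs.set(id, { status: 'done', result }))
    .catch(() => jobs.set(id, { status: 'failed' }));
  return id; // the API route returns this to the caller right away
}
```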


When to Use / When Not to Use

Situation                                        | Recommendation
Side effects with multiple consumers             | Use events
Non-critical-path processing (email, analytics)  | Use events
Long-running or retryable operations             | Use events
Independent team or service ownership            | Use events
Failure isolation is a hard requirement          | Use events
Single consumer, simple flow                     | Stay synchronous
User is waiting for the result directly          | Stay synchronous
Atomic transactions required                     | Stay synchronous
Small app, single deployment unit                | Stay synchronous
Debug simplicity is paramount                    | Stay synchronous

Event Schema Design: Version From Day One

Events are contracts. Once a consumer depends on user/signed_up containing email, removing that field is a breaking change -- and unlike API versioning, you may not know who is consuming your events.

Design schemas defensively from the start:

// Explicit versioning, required fields separated from optional
interface UserSignedUpEvent {
  version: '1.0';
  userId: string;
  email: string;
  plan: 'free' | 'pro' | 'enterprise';
  signedUpAt: string; // ISO 8601
  // Optional fields -- consumers must not assume these exist
  referralCode?: string;
  utmSource?: string;
}

When you need to change an event shape, emit both the old version and a new versioned event (user/signed_up.v2) until all consumers have migrated. Deleting event fields is always a breaking change. Adding optional fields is safe.
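On the consumer side, a normalizing dispatch keeps handlers version-free during a migration window. The v2 shape below (moving email into a contact object) is purely hypothetical, to show the pattern:

```typescript
type UserSignedUpV1 = { version: '1.0'; userId: string; email: string };
type UserSignedUpV2 = { version: '2.0'; userId: string; contact: { email: string } };
type UserSignedUp = UserSignedUpV1 | UserSignedUpV2;

// Normalize every version to one internal shape so downstream handlers
// never branch on the event version themselves.
function normalize(event: UserSignedUp): { userId: string; email: string } {
  switch (event.version) {
    case '1.0':
      return { userId: event.userId, email: event.email };
    case '2.0':
      return { userId: event.userId, email: event.contact.email };
  }
}
```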


The Real Tradeoffs

Event-driven architecture solves coupling. It introduces other problems you need to plan for:

Eventual consistency. When order.placed triggers fulfillment and notification independently, there is a window where the order is placed but the email has not sent. For most use cases this is fine. For some (financial ledgers, medical records) it is not.

Lost events. In-process emitters lose events on crash. Webhooks depend on provider retry policies. Use a managed queue (Inngest, Trigger.dev, SQS) if you cannot afford to lose events.

Duplicate processing. Managed queues often guarantee at-least-once delivery. Your consumers must be idempotent -- processing the same event twice should produce the same result. Use the event ID as an idempotency key.
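A minimal idempotency guard looks like the sketch below; the in-memory Set stands in for a durable store such as a unique-keyed database table:

```typescript
// Already-processed event IDs. In production: a table with a unique
// constraint on the event ID, checked inside a transaction.
const processed = new Set<string>();

// Run the handler at most once per event ID. Marking only after success
// means a crash mid-handler re-runs it -- at-least-once, never lost.
async function handleOnce(eventId: string, handler: () => Promise<void>): Promise<boolean> {
  if (processed.has(eventId)) return false; // duplicate delivery: no-op
  await handler();
  processed.add(eventId);
  return true;
}
```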

Ordering guarantees. Kafka partitions preserve order within a partition key. Most managed queues do not guarantee global ordering. If you need user.created processed before user.updated, partition by user ID or use a queue that supports FIFO semantics.
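If you are fanning events out yourself, per-key ordering can be approximated in-process by chaining promises per partition key -- a sketch of the idea, not a substitute for a FIFO queue:

```typescript
// One promise chain per key: tasks for the same key run in arrival order,
// while tasks for different keys run concurrently.
const chains = new Map<string, Promise<void>>();

function enqueueByKey(key: string, task: () => Promise<void>): Promise<void> {
  const prev = chains.get(key) ?? Promise.resolve();
  const next = prev.then(task, task); // run after predecessor, even if it failed
  chains.set(key, next);
  return next;
}
```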

Observability. Synchronous systems are easy to trace: one request, one stack trace. Event-driven systems require distributed tracing. Budget time for this before you are debugging a lost event at 2 AM. Inngest and Trigger.dev both provide first-class visibility into step execution and retry history -- a significant reason to prefer them over rolling your own queue.


Decision Checklist

Before you add an event to your system, confirm:

  • The emitting service does not need to know which consumers exist
  • Failure in a consumer must not fail the producer
  • The event name is past tense and describes a fact, not an instruction
  • The event schema is versioned and documented
  • Consumers are idempotent (safe to call twice)
  • You have a plan for observability: how will you know if an event was not processed?
  • You understand the consistency model: is eventual consistency acceptable here?
  • You have chosen the right delivery mechanism for your reliability requirements

If you cannot check all of these, do not decouple yet. Tight coupling with clear ownership is better than loose coupling with undefined contracts.


Ask The Guild

The hardest part of event-driven architecture is not the technology -- it is knowing when to stop. Where have you drawn the line in your own systems between "this should be async" and "this needs to be synchronous"? Share a specific example: what was the trigger, and did it turn out to be the right call?


Architecture Patterns is a 30-part series building from first principles to production-grade decisions. Part 24 covers CQRS and read model design.

About Tom Hundley

Tom Hundley writes for builders who need stronger technical judgment around AI-assisted software work. The Guild turns production experience into public articles, copy-paste prompts, and structured learning paths that help non-software developers supervise AI agents more safely.
