Skip to content
Architecture Patterns — Part 9 of 30

API Versioning: Change Without Breaking Clients

Written by claude-sonnet-4 · Edited by claude-sonnet-4
api-versioningbreaking-changesapi-designbackward-compatibilitystripedeprecationrest-apiarchitecture-patterns

Architecture Patterns — Part 9 of 30


The Monday Morning Everything Broke

The team had done everything right. Three months of planning. A clean /api/v2/ URL structure. A 90-day migration window with documentation. They deployed on Friday afternoon when traffic was low. Monitoring looked green all weekend.

Monday, 9:47 AM: 10,000 users couldn't log in. Mobile apps crashed on open. A payment processor that had been quietly running for two years — undocumented, handling $50,000 in daily transactions — started failing. Payment success rate dropped from 99% to 12%.

The war room lasted 6 hours. By the time it was over: $47,000 in lost payments, 400 support hours, and their best backend engineer had started sending out his resume. Three major clients opened evaluations with competitors. The business took six months to recover.

The brutal part? Their API versioning strategy was technically correct. They had URL versioning, migration guides, deprecation notices. They had done what every tutorial tells you to do.

What they had missed was the gap between the API they thought they had and the API that was actually running in production. They documented 12 integration patterns. Their logs showed 47. Nobody had gone looking.

This is Part 9 of the Architecture Patterns series. We're not going to talk about which versioning strategy is "best." We're going to build a decision framework for choosing the right approach before you ship — and for avoiding the failure modes that bite teams in production.


What You're Actually Solving

Before picking a strategy, you need to be precise about the problem you're solving. API versioning addresses a single core constraint: you need to change your API without breaking clients who depend on the current behavior.

That constraint sounds simple. It isn't. Three things make it hard:

1. You don't know all your clients. Internal apps, third-party integrations, mobile apps with old versions still installed, scripts written by contractors two years ago. Production traffic patterns are almost always more diverse than the documentation suggests.

2. Clients don't migrate on your schedule. You give 90 days. 8% migrate in the first 60. 23% by day 89. Then you have a choice: break the other 77%, or extend indefinitely. Both are bad.

3. Breaking changes aren't always obvious. Removing a field is obviously breaking. Changing a field from a required string to an optional object is obviously breaking. Changing an HTTP status code from 200 to 201 on success? Also breaking — if a client is checking the exact status code. Reordering fields in a JSON response? Breaking — if someone parsed it positionally.

These aren't hypotheticals. They're the failure modes that appear in post-mortems.


A Taxonomy of Breaking Changes

Before choosing a versioning strategy, you need a shared definition of what requires a version bump in the first place.

Definitely breaking:

  • Removing an endpoint
  • Renaming a field in a request or response
  • Changing a field type (string → integer, object → array)
  • Adding a required field to a request
  • Removing a field from a response
  • Changing HTTP status codes on existing paths
  • Changing authentication schemes
  • Changing error response structure

Probably not breaking (but verify):

  • Adding a new optional field to a response
  • Adding a new endpoint
  • Adding a new optional parameter to a request
  • Adding a new enum value to a response field (can break exhaustive switches)
  • Relaxing validation rules

It depends:

  • Changing behavior without changing the interface (what did clients expect?)
  • Changing pagination defaults
  • Changing sort order
  • Adding new enum values to request fields

The rule: if any existing client could receive an error or unexpected result from the change, it's breaking. When in doubt, treat it as breaking.


The Four Versioning Strategies

There are four main approaches. Here's an honest breakdown:

1. URL Path Versioning

GET /api/v1/users/42
GET /api/v2/users/42

The most common approach. Version lives in the URL — explicit, cache-friendly, easy to route, easy to test in a browser.

# FastAPI example
from fastapi import FastAPI

app = FastAPI()

@app.get("/api/v1/users/{user_id}")
async def get_user_v1(user_id: int):
    user = db.get_user(user_id)
    return {"id": user.id, "name": user.name}  # v1 shape

@app.get("/api/v2/users/{user_id}")
async def get_user_v2(user_id: int):
    user = db.get_user(user_id)
    return {  # v2 shape
        "id": user.id,
        "display_name": user.name,  # renamed field
        "email": user.email,        # new field
        "created_at": user.created_at.isoformat()
    }

When it's right: Public APIs, APIs consumed by many independent teams, APIs where discoverability matters.

The trap: URL versioning doesn't solve the migration problem — it just makes the old version visible. You still have to deprecate and shut down v1 eventually. And every endpoint gets duplicated.

2. Header Versioning

GET /api/users/42
Stripe-Version: 2024-11-20

Version travels in a request header, not the URL. Stripe uses this approach — they've maintained backward compatibility for every version since 2011. The current Stripe API version as of early 2026 is 2026-02-25.

from fastapi import FastAPI, Header
from typing import Optional

app = FastAPI()

@app.get("/api/users/{user_id}")
async def get_user(
    user_id: int,
    api_version: Optional[str] = Header(None, alias="X-API-Version")
):
    user = db.get_user(user_id)
    
    if api_version == "2023-01-01" or api_version is None:
        return {"id": user.id, "name": user.name}
    elif api_version >= "2024-01-01":
        return {
            "id": user.id,
            "display_name": user.name,
            "email": user.email,
            "created_at": user.created_at.isoformat()
        }
    else:
        return {"error": "Unsupported API version"}, 400

When it's right: When you want stable URLs. When you're building an API where developer experience is a competitive advantage. When you can invest in the transformation layer.

The trap: Harder to test in a browser. Invisible in logs unless you explicitly log headers. Requires more discipline from API clients.

3. Query Parameter Versioning

GET /api/users/42?version=2024-01-01

Version as a query parameter. Low friction to implement. Popular for internal APIs and APIs where the clients are controlled.

// Express.js example
app.get('/api/users/:id', (req, res) => {
  const version = req.query.version || '2023-01-01';
  const user = db.getUser(req.params.id);
  
  if (version >= '2024-01-01') {
    res.json({
      id: user.id,
      displayName: user.name,
      email: user.email
    });
  } else {
    res.json({
      id: user.id,
      name: user.name
    });
  }
});

When it's right: Internal APIs. APIs used by a small number of known clients. Exploratory or experimental APIs.

The trap: Query params show up in URLs, logs, and browser history — problematic for sensitive operations. Caching is messier. Clients often omit the parameter and expect your default to always match what they tested against.

4. Content Negotiation (Media Type Versioning)

GET /api/users/42
Accept: application/vnd.yourapi.v2+json

Version expressed through the HTTP Accept header using custom media types. The most RESTful approach, the least common in practice.

When it's right: APIs where fine-grained content negotiation is genuinely needed. Hypermedia APIs. Academic REST purists.

The trap: Almost no one does this correctly. The Stackademic case study above is instructive: they implemented Accept header versioning, and zero production clients used it. All of them relied on the URL. Don't build infrastructure nobody will use.


The Decision Framework

Choose your strategy against four axes:

Axis URL Path Header Query Param Content Negotiation
Visibility High Low Medium Very Low
Caching Simple Complex Moderate Complex
Client effort Low Medium Low High
Infra cost High (duplicate routes) Medium (transform layer) Low High

Use URL path versioning if:

  • Your API is public and consumed by many external teams
  • You need humans to easily identify which version they're hitting
  • You're doing big, structural changes (v1 → v2 is a genuine redesign)

Use header (date-based) versioning if:

  • Developer experience is a competitive moat for you
  • You're making frequent incremental changes
  • You can build and maintain a response transformation layer
  • You want to follow the Stripe model

Use query parameter versioning if:

  • You control all your clients (internal API)
  • You need something working fast
  • You're not serving public third-party integrations

How Stripe Actually Does It

Stripe's versioning approach has become the industry reference point for good reason: they've shipped almost 100 backwards-incompatible upgrades and maintained compatibility with every version since 2011. The seven lines of Ruby written to charge a card in 2011 still work today.

The architecture has three key pieces:

1. Account pinning. The first time you make an API request to Stripe, your account is automatically pinned to the latest available version. Every subsequent call uses that version — you never accidentally receive a breaking change.

2. Per-request override. You can override the version for any single request via the Stripe-Version header. This lets you test new versions before committing to an upgrade.

3. The transformation layer. This is the hard part. When Stripe generates a response, they first format the data using the current schema — the latest version. Then they walk backwards through time, applying transformation modules in reverse, until the response matches the shape expected by your pinned version.

Each transformation module encapsulates a single backwards-incompatible change: what it affects, how to transform data to match the older behavior, and which API resource types it applies to. They're declarative, composable, and isolated from each other.

# Conceptual illustration of Stripe's transformation approach
class VersionTransformer:
    def __init__(self):
        # Ordered from newest to oldest
        self.changes = [
            VersionChange(
                version="2024-11-20",
                description="event.request changed from string to object",
                transform=self._transform_event_request_to_object,
                affects=["Event"]
            ),
            VersionChange(
                version="2023-10-16",
                description="payment_method replaced source in PaymentIntent",
                transform=self._transform_payment_intent_source,
                affects=["PaymentIntent"]
            ),
            # ... many more
        ]
    
    def transform_response(self, data, resource_type, target_version):
        """Walk backwards from current version to target version"""
        result = data.copy()
        
        for change in self.changes:
            if change.version > target_version:
                if resource_type in change.affects:
                    result = change.transform(result)
            else:
                break  # We've gone back far enough
        
        return result

The trade-off Stripe makes explicit: they absorb the complexity so developers don't have to. The transformation layer is expensive to build and maintain. It requires discipline across every team adding new API changes to always write a corresponding transformation module. But for Stripe, developer trust is the product — breaking a client's payment integration isn't just a bug, it's an existential risk.


The Deprecation Strategy Nobody Talks About

The versioning mechanism is only half the problem. The other half is getting clients off old versions.

The Stackademic incident happened precisely because the team trusted the 90-day clock. 77% of clients didn't migrate voluntarily. This is normal. Developers don't migrate until they have to — and "have to" usually means "something broke."

Here's what actually works:

Deprecation headers in every response

Before you remove anything, add deprecation warnings to API responses while clients are still using the old version:

from fastapi import Response
import datetime

@app.get("/api/v1/users/{user_id}")
async def get_user_v1(user_id: int, response: Response):
    # Add deprecation headers to every v1 response
    sunset_date = "2026-06-01T00:00:00Z"
    response.headers["Deprecation"] = "true"
    response.headers["Sunset"] = sunset_date
    response.headers["Link"] = '</api/v2/users/{user_id}>; rel="successor-version"'
    
    user = db.get_user(user_id)
    return {"id": user.id, "name": user.name}

These are IETF-standardized headers (Sunset is RFC 8594). Good API clients can parse them automatically. Bad ones still get the information in their logs if someone goes looking.

Track actual usage before you deprecate

Before touching anything, run a query against your API logs:

-- Who is still hitting v1 endpoints in the last 30 days?
SELECT 
    client_id,
    endpoint,
    COUNT(*) as request_count,
    MAX(request_time) as last_seen
FROM api_access_logs
WHERE 
    api_version = 'v1'
    AND request_time > NOW() - INTERVAL '30 days'
GROUP BY client_id, endpoint
ORDER BY request_count DESC;

This query will find integrations you didn't know existed. The Stackademic team documented 12 integration patterns. Their logs showed 47. The gap between those numbers is where outages live.

Contact the stragglers directly

For any client making significant traffic to a deprecated endpoint, send a direct message — not a mass email, not a changelog entry, not a dashboard notification. A direct message to a specific engineer with the specific endpoint and specific date.

This feels manual. It is manual. It's also the only thing that works reliably for the last 15% who never read documentation.


The Non-Breaking Change Playbook

The best versioning strategy is the one you don't need. Many changes that feel breaking can be made backwards-compatible with the right technique.

Expand-Contract pattern for field renames:

Don't rename a field — add the new field alongside the old one, then remove the old one in a future version.

# Don't do this (breaking):
# Old: {"name": "Tom"}
# New: {"display_name": "Tom"}

# Do this instead:
# Phase 1: Add new field, keep old field
{"name": "Tom", "display_name": "Tom"}  # Both exist

# Phase 2: After all clients migrate to display_name, remove name
{"display_name": "Tom"}

Tolerate unknown fields on input:

class UserUpdateRequest(BaseModel):
    display_name: Optional[str] = None
    email: Optional[str] = None
    
    class Config:
        extra = "ignore"  # Don't reject fields we don't recognize

Return additive changes safely:

Adding new fields to responses is non-breaking — if clients are written to ignore unknown fields. The problem is clients that break on unexpected fields. Document this expectation explicitly in your API contract, and test it.

Use feature flags for behavior changes:

// Gate new behavior behind a flag during transition
const useNewPaginationDefault = featureFlags.isEnabled('new-pagination-default', clientId);
const pageSize = useNewPaginationDefault ? 20 : 10;

The OpenAI Deprecation Model

For contrast, consider how OpenAI handles model deprecations in their API. When they deprecated gpt-4.5-preview in April 2025, they gave 90 days notice, identified the specific organizations that had used the model in the prior 30 days, sent direct emails to those organizations, named a specific replacement (gpt-4.1), and provided a hard shutdown date (July 14, 2025).

The pattern is precise:

  1. Announce with a specific shutdown date (not "in the coming months")
  2. Identify every active user from logs, not from signups
  3. Provide a named replacement, not just "upgrade to the latest"
  4. Send targeted communication, not just a changelog

Notice what they don't do: they don't maintain the old model indefinitely. They pick a date and hold it. The communication overhead would be unsustainable if they tried to keep every model running forever.

This is the right tradeoff for a model API where the underlying compute cost is tied to the specific model being served. It's a different tradeoff than Stripe's date-based versioning, where the old behavior is just a transformation in software. Match your deprecation strategy to your operational costs.


Implementation Checklist

Before you ship a versioning strategy:

Discovery

  • Run a log analysis to find all active clients and integration patterns
  • Cross-reference documented integrations against actual traffic
  • Identify any undocumented internal consumers (cron jobs, scripts, legacy services)
  • Document the specific endpoints and response fields each client depends on

Design

  • Define what constitutes a breaking change for your API (write it down)
  • Choose a versioning strategy that matches your client distribution
  • Design your deprecation header strategy (Deprecation, Sunset, Link)
  • Plan your transformation/compatibility layer if using header versioning
  • Set a deprecation policy: minimum notice period, how you'll communicate, how you'll enforce

Implementation

  • Version routing is explicit — no guessing from defaults
  • Deprecation warnings appear in response headers on deprecated versions
  • Version information appears in API logs for every request
  • You have a rollback plan that doesn't require a full revert
  • Feature flags gate behavior changes during the transition window

Pre-deprecation

  • Query logs for active users of the deprecated endpoint in the last 30/60/90 days
  • Send direct outreach to every active integration owner, not just a mass announcement
  • Run a canary period with a small set of real clients before widening
  • Verify your compatibility layer against production data, not staging data

Monitoring

  • Dashboards show traffic split by API version
  • Alerts fire on unexpected spikes in old-version traffic
  • Error rates are tracked per version, not just aggregate
  • You measure user-level impact (auth failures, payment failures) not just server metrics

Ask The Guild

Here's the question I want you to sit with this week:

What does your API's actual usage look like versus your documented usage?

Pull your API access logs from the last 30 days. Group by client, endpoint, and any version headers. Compare that list against your documented integrations.

How big is the gap?

Share what you find in the comments. I'm specifically interested in: What's the most surprising undocumented integration you've discovered in your own production traffic? And what versioning approach have you found actually gets clients to migrate — not just in theory, but in practice?


Next in the series: Part 10 — Event-Driven Architecture: When to Break the Request-Response Model

Copy A Prompt Next

Think in systems

If this article changed how you think about the problem, copy a prompt that turns that judgment into one safe, reviewable next step.

Matching public prompts

7

Keep the task scoped, copy the prompt, then inspect one reviewable diff before the agent continues.

Need the safest first move instead? Open the curated sample prompts before you browse the broader library.

Foundations for AI-Assisted BuildersFoundations for AI-Assisted Builders

Choosing Your Tech Stack — A Decision Framework

A practical framework for choosing the right tools and technologies for your project — with sensible defaults for AI-assisted builders.

Preview
"Recommend a tech stack for this project.
Project type: [describe it]
Constraints: [budget, hosting, mobile, data, auth, payments, privacy]
My experience level: [describe it]
Give me:
Architecture

Translate this architecture idea into system-level judgment

Architecture articles sharpen judgment. The system-design paths give you the layered context behind the tradeoffs so you can reuse the pattern instead of memorizing a slogan.

Best Next Path

APIs and Integrations

Guild Member · $29/mo

Learn the contracts, webhooks, auth patterns, and integration disciplines that keep production systems composable instead of brittle.

20 lessonsIncluded with the full Guild Member library

Need the free route first?

Start with Start Here — Build Safely With AI if you want the workflow and vocabulary before you dive into the deeper path above.

T

About Tom Hundley

Tom Hundley writes for builders who need stronger technical judgment around AI-assisted software work. The Guild turns production experience into public articles, copy-paste prompts, and structured learning paths that help non-software developers supervise AI agents more safely.

Do this next

Leave this article with one concrete move. Copy the matching prompt, or start with the path that teaches the safest next skill in sequence.