Rate Limiting Your Own API: Protection from Yourself
Part 28 of 30: Production Ready
The call came on a Tuesday morning. A developer I mentor -- let's call him Marcus -- pinged me with a screenshot of his Vercel dashboard. His bill had gone from $40 to $1,100 in a single weekend. He hadn't launched anything new. He hadn't run any campaigns. He'd just gone to sleep Friday night and woken up to a disaster.
The culprit wasn't a hacker. It wasn't a competitor trying to take him down. It was a misconfigured retry loop in his own frontend code that had been silently hammering his own /api/recommendations endpoint for 72 hours straight. Every failed request triggered an immediate retry. No backoff. No limit. Just an infinite waterfall of serverless function invocations, each one billable.
This is what I call getting DDoS'd by yourself.
Marcus is not alone. In early 2025, a developer posted on Reddit about an unexpected $1,100 Vercel bill from a similar runaway process. Another case circulated around the same time where an AI bot -- specifically Claude -- drove a site's monthly requests from 3 million to 40 million, turning a manageable bill into tens of thousands of dollars. Neither of those developers had rate limiting in place. Both wished they had.
Meanwhile, the broader threat landscape keeps escalating: Cloudflare reported a 358% increase in DDoS attacks in Q1 2025 compared to the same period in 2024. Over 700 hyper-volumetric attacks exceeded 1 terabit per second. Even smaller attacks "can saturate a link or knock down unprotected services."
Rate limiting is not optional. It's table stakes.
The Three Algorithms You Need to Understand
Rate limiting algorithms are not interchangeable. Picking the wrong one for your use case creates gaps.
Fixed Window is the simplest. Count requests in a discrete time slot -- say, 100 requests per minute from 12:00 to 12:01. When the minute resets, the counter resets. The problem: a burst of 100 requests at 12:00:58 followed by another 100 at 12:01:02 gives an attacker 200 requests in four seconds. This is the "boundary problem." Fixed window is fine for coarse limits but leaks badly at the seams.
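To make the boundary problem concrete, here is a minimal fixed-window counter in plain TypeScript. This is an illustration of the algorithm, not production code (no persistence, single process):

```typescript
// Minimal fixed-window counter: `limit` requests per `windowMs` window.
function makeFixedWindow(limit: number, windowMs: number) {
  const counts = new Map<string, { window: number; count: number }>();
  return (key: string, now: number): boolean => {
    const window = Math.floor(now / windowMs); // which discrete window we're in
    const entry = counts.get(key);
    if (!entry || entry.window !== window) {
      counts.set(key, { window, count: 1 }); // new window: counter resets
      return true;
    }
    entry.count += 1;
    return entry.count <= limit;
  };
}

// The boundary problem: 100 requests at second 58 and 100 more at second 62
// all succeed, because the counter resets at the minute boundary.
const allow = makeFixedWindow(100, 60_000);
let passed = 0;
for (let i = 0; i < 100; i++) if (allow("ip", 58_000)) passed++; // end of window 0
for (let i = 0; i < 100; i++) if (allow("ip", 62_000)) passed++; // start of window 1
console.log(passed); // 200 -- double the nominal limit, in a four-second span
```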
Sliding Window solves this by computing the rate over a rolling window. If the window is 60 seconds and a request arrives at 12:01:30, the algorithm looks back at requests from 12:00:30 to 12:01:30 -- not from the top of the minute. Smoother protection, slightly more storage overhead, but worth it for auth endpoints.
Token Bucket is the most flexible. Imagine a bucket that holds 100 tokens. Every request consumes a token. The bucket refills at a steady rate -- say, 10 tokens per second. Bursting is allowed up to the bucket's capacity, but sustained abuse drains it dry. This is the model most CDNs and API gateways use under the hood, and it's what Vercel's Enterprise WAF exposes directly.
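A token bucket fits in a few lines. This sketch is illustrative and not tied to any library; the numbers match the example above (capacity 100, 10 tokens per second):

```typescript
// Minimal token bucket: bursts allowed up to `capacity`, refilled
// continuously at `refillPerSec` tokens per second.
class TokenBucket {
  private tokens: number;
  private last: number;

  constructor(private capacity: number, private refillPerSec: number, now = 0) {
    this.tokens = capacity; // start full, so bursts up to capacity succeed
    this.last = now;
  }

  take(now: number): boolean {
    const elapsedSec = (now - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.last = now;
    if (this.tokens >= 1) {
      this.tokens -= 1; // each request spends one token
      return true;
    }
    return false; // bucket drained: sustained abuse is rejected
  }
}

const bucket = new TokenBucket(100, 10);
let burst = 0;
for (let i = 0; i < 120; i++) if (bucket.take(0)) burst++;
console.log(burst); // 100: burst allowed up to capacity, then cut off
console.log(bucket.take(1000)); // true: ~10 tokens refilled after one second
```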
For most Next.js apps on Vercel, sliding window via Upstash is the practical default.
Your Three Implementation Options
Option 1: Vercel's Built-in WAF
Vercel's Web Application Firewall includes native rate limiting, generally available as of late 2024. You configure rules in the dashboard -- no code required, no redeployment. Rules take effect immediately. Pro plans get fixed-window limiting; Enterprise gets token bucket. The key advantage: rate-limited traffic is blocked before it reaches your functions, so you aren't billed for blocked requests.
The limitation: WAF rules operate on infrastructure-level signals like IP, path, and user agent. They can't see application-level context like a user ID or subscription tier. For that, you need code.
Option 2: Upstash Redis + @upstash/ratelimit
This is the workhorse solution for most production Next.js apps. Upstash is a serverless Redis built specifically for edge and serverless environments. Traditional Redis uses persistent TCP connections, which are incompatible with serverless functions that spin up and tear down per request. Upstash uses HTTP -- each request is stateless, no connection pool needed.
The @upstash/ratelimit library ships fixed window, sliding window, and token bucket algorithms. It tracks state in Upstash Redis, which means your limits are consistent across all function instances globally.
Install it:
```bash
npm install @upstash/ratelimit @upstash/redis
```
Option 3: In-Memory Middleware (Dev/Edge-Lite Only)
For local development or simple edge cases, you can implement a fixed-window counter in Next.js middleware using a Map. This works for single-instance scenarios but loses state on cold starts and doesn't scale across multiple instances. Do not rely on this in production.
A Complete Production Example
Here is a full TypeScript implementation using Upstash in a Next.js API route, with proper headers and graceful error handling:
```typescript
// lib/ratelimit.ts
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

const redis = Redis.fromEnv();

export const apiRatelimit = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(100, "1 h"),
  analytics: true,
  prefix: "rl:api",
});

export const authRatelimit = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(5, "15 m"),
  analytics: true,
  prefix: "rl:auth",
});
```
```typescript
// middleware.ts
import { NextRequest, NextResponse } from "next/server";
import { apiRatelimit, authRatelimit } from "@/lib/ratelimit";

export async function middleware(request: NextRequest) {
  // x-forwarded-for can be a comma-separated list of hops;
  // the original client is the first entry.
  const ip =
    request.ip ??
    request.headers.get("x-forwarded-for")?.split(",")[0]?.trim() ??
    "127.0.0.1";

  const isAuthRoute = request.nextUrl.pathname.startsWith("/api/auth");
  const limiter = isAuthRoute ? authRatelimit : apiRatelimit;

  const { success, limit, remaining, reset } = await limiter.limit(ip);

  const headers = new Headers();
  headers.set("X-RateLimit-Limit", limit.toString());
  headers.set("X-RateLimit-Remaining", remaining.toString());
  headers.set("X-RateLimit-Reset", new Date(reset).toISOString());

  if (!success) {
    const retryAfter = Math.ceil((reset - Date.now()) / 1000);
    headers.set("Retry-After", retryAfter.toString());
    return NextResponse.json(
      { error: "Too many requests", retryAfter },
      { status: 429, headers }
    );
  }

  const response = NextResponse.next();
  headers.forEach((value, key) => response.headers.set(key, value));
  return response;
}

export const config = {
  matcher: "/api/:path*",
};
```
Your environment variables:
```
UPSTASH_REDIS_REST_URL=https://your-db.upstash.io
UPSTASH_REDIS_REST_TOKEN=your_token_here
```
What to Rate Limit and What Limits to Set
Not everything needs the same limit. Here is a baseline:
| Endpoint | Limit | Window | Notes |
|---|---|---|---|
| Login / signup | 5 | 15 min | Brute force protection |
| Password reset | 3 | 1 hour | SMS/email cost protection |
| General API | 100 | 15 min | Adjust by user tier |
| File upload | 10 | 1 hour | Storage cost protection |
| AI/LLM calls | 20 | 1 hour | Per-call cost can be significant |
These are starting points, not laws. Monitor your actual traffic patterns for the first week after deployment. If legitimate users are hitting limits, raise them. If abuse is getting through, tighten them.
IP-Based vs. User-Based Limiting
IP-based limiting is the right default for unauthenticated endpoints. It requires no user context and stops most opportunistic abuse. The weakness: shared IP addresses (corporate NAT, IPv6 prefixes, mobile carriers) can penalize groups of legitimate users.
Once a user is authenticated, switch to user-based limiting. Use the user's ID as the rate limit key instead of the IP. This is more accurate, harder to bypass by rotating IPs, and lets you offer tiered limits -- free tier gets 100 requests per hour, paid tier gets 1,000.
```typescript
// After authentication, use user ID as the key
const identifier = session?.user?.id ?? ip;
const { success } = await apiRatelimit.limit(identifier);
```
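Once you key on the user, you can also vary the quota by tier. Here is a hypothetical helper that builds the rate limit key and quota from a session; the `Session` shape, function name, and tier names are illustrative, not a real library API:

```typescript
// Hypothetical session shape -- adapt to your auth library.
type Session = { user?: { id: string; tier?: "free" | "pro" } } | null;

// Authenticated users are keyed by ID with tier-based quotas;
// anonymous traffic falls back to the IP with the free-tier quota.
function rateLimitKey(session: Session, ip: string) {
  if (session?.user) {
    const tier = session.user.tier ?? "free";
    return {
      identifier: `user:${session.user.id}`,
      maxPerHour: tier === "pro" ? 1000 : 100, // tiered quotas from the text
    };
  }
  return { identifier: `ip:${ip}`, maxPerHour: 100 };
}

console.log(rateLimitKey({ user: { id: "u_42", tier: "pro" } }, "1.2.3.4"));
// identifier "user:u_42", maxPerHour 1000
```

The returned `identifier` is what you pass to `limiter.limit()`; keying by user ID means rotating IPs no longer resets the attacker's budget.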
Rate Limit Headers: Tell Clients What Happened
A 429 without context is user-hostile. Always return these headers:
- `X-RateLimit-Limit`: the maximum allowed in the window
- `X-RateLimit-Remaining`: how many are left
- `X-RateLimit-Reset`: ISO timestamp when the window resets
- `Retry-After`: seconds until the client should retry
Good clients will read Retry-After and back off automatically. The code example above sets all four. Make this a habit.
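On the client side, a sketch of what "read Retry-After and back off" looks like, which is also the fix for the runaway retry loop from the opening story. The fetch and sleep functions are injectable here purely so the logic is easy to test; names are illustrative:

```typescript
// Minimal shape of a fetch-like function -- just what this wrapper needs.
type FetchLike = (url: string) => Promise<{
  status: number;
  headers: { get(name: string): string | null };
}>;

// Retry on 429 with a hard attempt cap: honor the server's Retry-After
// when present, otherwise fall back to exponential backoff.
async function fetchWithBackoff(
  url: string,
  fetchFn: FetchLike,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms))
) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchFn(url);
    if (res.status !== 429) return res;
    const retryAfter = Number(res.headers.get("Retry-After"));
    const delayMs =
      Number.isFinite(retryAfter) && retryAfter > 0
        ? retryAfter * 1000 // server told us exactly how long to wait
        : 2 ** attempt * 500; // otherwise: 500ms, 1s, 2s, 4s...
    await sleep(delayMs);
  }
  // Give up instead of hammering the API forever.
  throw new Error(`Gave up after ${maxAttempts} attempts (still rate limited)`);
}
```

Three properties matter: it waits, the wait grows, and it eventually gives up. Marcus's retry loop had none of them.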
Checklist: Rate Limiting for Production
- Install `@upstash/ratelimit` and `@upstash/redis`
- Create a free Upstash Redis database (first 500K commands/month free as of 2025)
- Add environment variables to Vercel project settings
- Set strict limits on auth endpoints: login, signup, password reset, OTP
- Set general API limits scaled to your user tiers
- Add explicit limits to any endpoint that calls an external paid API (OpenAI, Stripe, etc.)
- Return `X-RateLimit-*` and `Retry-After` headers on every response
- Switch from IP-based to user-ID-based limiting once users are authenticated
- Enable Vercel WAF rate limiting rules for infrastructure-level protection
- Set a Vercel spend limit so a future spike has a hard ceiling
- Review Upstash analytics after 7 days and calibrate limits
Ask The Guild
What is the most unexpected source of API abuse you have encountered in production -- was it bots, a competitor, or your own code? And what did you put in place after? Share your war story in the Discord.