
Prompt of the Day: Add Rate Limiting to Any API Route

Written by claude-sonnet-4 · Edited by claude-sonnet-4
Tags: rate limiting · API security · Next.js · Upstash · Redis · middleware · 429 · prompt engineering · Express · serverless

Series: Prompt of the Day — Part 9 of 30


The $67 Bill That Arrived on a Monday Morning

A developer posted on the OpenAI community forums in March 2025: their API spend went from a normal $0.10–$1.00 per day to $67 in 48 hours — 5.2 million tokens from models they didn't even use. They rotated keys, filed a report, and spent the rest of the week rebuilding trust with their boss. The culprit? A compromised API key hitting an endpoint with no rate limiting and no budget guard.

That's a relatively cheap lesson. In 2025, a security researcher found that Volkswagen's MyVW app OTP verification endpoint had no rate limiting at all — a four-digit one-time passcode could be brute-forced in seconds with a multithreaded Python script. Same missing control, much bigger blast radius.

And it's not just malicious actors. The Levo.ai 2025 API Security Guide documents a common, boring scenario: a partner integration ships with a bug that fires tens of thousands of requests per minute against your backend. No malice, just a misconfigured retry loop — and without rate limiting in place, your database is on its knees and your SLA is in flames.

OWASP has ranked "Lack of Resources and Rate Limiting" in the API Security Top 10 for years (the 2023 edition broadened it to "Unrestricted Resource Consumption"). In 2025, API attacks increased 230% year-over-year, with more than 80% of breaches now occurring at the API layer. Rate limiting is not optional plumbing — it is load-bearing.

The good news: it takes one well-crafted prompt to add it properly.


The Prompt

You are a backend security engineer adding production-grade rate limiting to an API route.

Here is my existing API route:

[PASTE YOUR ROUTE CODE HERE]

Tech stack: [Next.js App Router / Express / Fastify / other]
Deployment target: [Vercel Edge / Node.js server / serverless / other]
Auth: [JWT / session / API key / unauthenticated]

Please:
1. Add rate limiting to this route using the best tool for my stack.
   - For Next.js on Vercel: use @upstash/ratelimit with Upstash Redis (sliding window algorithm).
   - For Node.js/Express: use the `express-rate-limit` package with Redis store for multi-instance safety.
   - For unauthenticated routes: identify the client by IP address from x-forwarded-for headers.
   - For authenticated routes: identify by user ID or API key (preferred over IP).
2. Return proper HTTP 429 responses with these headers:
   - X-RateLimit-Limit
   - X-RateLimit-Remaining
   - X-RateLimit-Reset
   - Retry-After
3. Set a sensible default limit for this type of endpoint:
   - Auth/login endpoints: 5 requests per 15 minutes per IP
   - General API: 100 requests per hour per user
   - Expensive/AI endpoints: 10 requests per minute per user
4. Add a comment explaining the limit choice and how to tune it.
5. Show the environment variables I need to set.

Do not change the existing business logic. Only add rate limiting.

Why It Works

This prompt succeeds because it makes three key decisions concrete before the AI writes a single line of code.

It names the algorithm. The HashBuilds Next.js rate limiting guide explains the difference well: a fixed window counter is easy to implement but has a "burst at boundary" problem — a client can fire 100 requests at 11:59, and another 100 at 12:01, for an effective burst of 200. The sliding window algorithm eliminates this by tracking a rolling time window. By specifying slidingWindow for Upstash, you get the better algorithm without needing to explain why.
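To make the boundary problem concrete, here is a minimal in-memory sliding-window limiter. This is a teaching sketch only (a single-process timestamp log, with a hypothetical `SlidingWindowLimiter` name): production code should keep this state in a shared store like Upstash Redis, as the article's example does.

```typescript
// Minimal sliding-window limiter: keeps a timestamp log per client.
// In-memory only, so it is NOT multi-instance safe -- a sketch, not production code.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>();
  constructor(private limit: number, private windowMs: number) {}

  allow(id: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Drop timestamps that have slid out of the rolling window.
    const recent = (this.hits.get(id) ?? []).filter((t) => t > cutoff);
    if (recent.length >= this.limit) {
      this.hits.set(id, recent);
      return false; // over the rolling limit -> the route would return 429
    }
    recent.push(now);
    this.hits.set(id, recent);
    return true;
  }
}
```

A fixed window resets at each boundary, so 100 requests at 11:59 plus 100 at 12:01 both pass; the rolling window above counts them together and rejects the second burst.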

It uses the right identifier. Identifying clients by IP sounds simple, but it breaks for corporate NATs (thousands of employees, one IP) and is trivially bypassed by rotating proxies. The prompt prioritizes user ID or API key for authenticated routes, falling back to IP only for public endpoints. This is the pattern recommended by both the NextJS Weekly Server Actions guide and production deployments on Vercel.
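That fallback order can be isolated into a small helper. A sketch, with an illustrative `clientIdentifier` name (not part of the article's generated output); note the `.trim()`, since hops after the first in `x-forwarded-for` carry a leading space.

```typescript
// Prefer a stable user ID for authenticated requests; fall back to the
// first hop of x-forwarded-for only for public, unauthenticated routes.
function clientIdentifier(
  userId: string | null,
  forwardedFor: string | null,
): string {
  if (userId) return `user:${userId}`;
  const ip = (forwardedFor ?? '127.0.0.1').split(',')[0].trim();
  return `ip:${ip}`;
}
```

Prefixing `user:` and `ip:` keeps the two identifier namespaces from colliding in the shared Redis keyspace.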

It specifies different limits per endpoint type. One global limit for all routes is a blunt instrument — too strict and you frustrate real users on your homepage, too loose and your /api/auth/login is a credential-stuffing buffet. The prompt bakes in endpoint-category-aware limits from the start.

Here is what the output looks like for a Next.js App Router route:

// lib/rate-limit.ts
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

const redis = new Redis({
  url: process.env.UPSTASH_REDIS_REST_URL!,
  token: process.env.UPSTASH_REDIS_REST_TOKEN!,
});

// Sliding window: smoother enforcement, no burst-at-boundary issues
export const rateLimitConfigs = {
  auth: new Ratelimit({ redis, limiter: Ratelimit.slidingWindow(5, '15 m') }),
  api:  new Ratelimit({ redis, limiter: Ratelimit.slidingWindow(100, '1 h') }),
  ai:   new Ratelimit({ redis, limiter: Ratelimit.slidingWindow(10, '1 m') }),
};

// app/api/chat/route.ts
import { rateLimitConfigs } from '@/lib/rate-limit';
import { auth } from '@/lib/auth';
import { NextRequest, NextResponse } from 'next/server';

export async function POST(req: NextRequest) {
  const session = await auth();
  // Prefer user ID for authenticated routes; fall back to IP
  const identifier = session?.user?.id
    ?? (req.headers.get('x-forwarded-for') ?? '127.0.0.1').split(',')[0];

  const { success, limit, remaining, reset } = await rateLimitConfigs.ai.limit(identifier);

  const headers = {
    'X-RateLimit-Limit':     limit.toString(),
    'X-RateLimit-Remaining': remaining.toString(),
    'X-RateLimit-Reset':     new Date(reset).toISOString(),
    'Retry-After':           Math.ceil((reset - Date.now()) / 1000).toString(),
  };

  if (!success) {
    return NextResponse.json(
      { error: 'Too many requests. Please slow down.' },
      { status: 429, headers }
    );
  }

  // ... existing business logic unchanged
}

Environment setup:

# .env.local
UPSTASH_REDIS_REST_URL=https://your-db.upstash.io
UPSTASH_REDIS_REST_TOKEN=your_token_here

# Install dependencies
npm install @upstash/ratelimit @upstash/redis

Upstash Redis is particularly well-suited for Next.js on Vercel and other serverless platforms because it uses HTTP rather than TCP connections — no connection pooling, no cold-start penalties, and per-request pricing that scales down to zero when traffic is idle.


The Anti-Prompt

Here is what vibe coders typically ask instead:

❌ Add rate limiting to my API.

This fails in every dimension:

  • No stack context. The AI will pick a library at random — probably express-rate-limit even if you're on Next.js App Router, where it doesn't apply.
  • No algorithm specified. You'll get a fixed window counter that's trivially bypassed with a burst at the minute boundary.
  • In-memory storage by default. Your rate limits reset on every cold start and are completely ineffective across multiple serverless instances. One restart and your attacker gets a fresh window.
  • No response headers. Your client has no idea how long to wait before retrying — leading to retry storms that make the problem worse.
  • One limit for all routes. Your /healthz endpoint will 429 before your /api/admin/delete-account ever gets throttled.

The Stripe legacy API attack documented by Equixly in 2025 was possible precisely because an older endpoint lacked "advanced rate limiting and fraud detection." The attackers flooded it with card-testing requests — the exact attack pattern that proper per-IP rate limiting on an auth-adjacent endpoint would have made economically unviable.


Variations

Once the base prompt is working, use these follow-up prompts:

Middleware-level protection (recommended for Next.js):

Refactor this route-level rate limiting into Next.js middleware so it 
applies automatically to all routes matching /api/*. Apply the 'auth' 
config to routes matching /api/auth/* and the 'api' config to everything 
else. Make it easy to add per-route overrides.
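The part of that middleware worth getting right is the path-to-config mapping. A sketch of just that piece, assuming the `auth`/`api` config keys from the article; the `pickConfig` name and the override path are illustrative:

```typescript
// Map a request pathname to a rate-limit config key.
// Next.js middleware would call this first, then run the matching limiter.
type ConfigKey = 'auth' | 'api';

// Per-route overrides win over the prefix rules below.
const overrides: Record<string, ConfigKey> = {
  '/api/auth/magic-link': 'auth', // illustrative example entry
};

function pickConfig(pathname: string): ConfigKey | null {
  if (!pathname.startsWith('/api/')) return null; // outside the /api/* matcher
  if (pathname in overrides) return overrides[pathname];
  if (pathname.startsWith('/api/auth/')) return 'auth';
  return 'api';
}
```

Returning `null` for non-API paths lets the middleware skip rate limiting entirely rather than 429-ing page loads.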

Tiered limits by subscription:

Add a check that reads the user's subscription tier from the JWT claims 
(field: 'tier', values: 'free' | 'pro' | 'enterprise'). Apply these limits:
- free: 20 requests/hour
- pro: 500 requests/hour  
- enterprise: 5000 requests/hour
Fall back to 'free' limits if the tier claim is missing.
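The tier-to-limit lookup that prompt should produce is small enough to sketch directly. Assuming the limits above; the `limitForTier` name is illustrative:

```typescript
// Map the JWT 'tier' claim to a requests-per-hour budget.
// Unknown or missing tiers fall back to the free limit.
type Tier = 'free' | 'pro' | 'enterprise';

const tierLimits: Record<Tier, number> = {
  free: 20,
  pro: 500,
  enterprise: 5000,
};

function limitForTier(claim: unknown): number {
  const tier = claim as Tier;
  return tierLimits[tier] ?? tierLimits.free;
}
```

Treating the claim as `unknown` and falling back on any unrecognized value means a malformed or tampered token never grants more than the free budget.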

Express with Redis (non-serverless):

Replace the Upstash implementation with express-rate-limit v7 using 
rate-limit-redis as the store. Use ioredis for the Redis connection. 
Keep the same 429 response format and headers.

Graceful degradation:

Wrap the rate limit check in a try/catch. If the Redis connection fails, 
log the error and allow the request through (fail-open strategy). Add a 
warn-level log entry so ops can see when rate limiting is degraded.
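That fail-open wrapper can be sketched as a standalone function. The limiter is injected as a parameter so the pattern is visible without the Upstash client; `checkLimitFailOpen` and the `degraded` flag are illustrative names, not from the article's output:

```typescript
// Fail-open wrapper: if the limiter's backing store is unreachable,
// log a warning and allow the request rather than blocking all traffic.
type LimitResult = { success: boolean; degraded?: boolean };

async function checkLimitFailOpen(
  limit: (id: string) => Promise<{ success: boolean }>,
  id: string,
): Promise<LimitResult> {
  try {
    const { success } = await limit(id);
    return { success };
  } catch (err) {
    console.warn('rate limiter degraded, failing open:', err);
    return { success: true, degraded: true }; // request is allowed through
  }
}
```

The `degraded` flag gives ops something to alert on: a spike in degraded results means your rate limiting is silently off.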

Quick Wins Checklist

Before shipping any API route to production:

  • Rate limiting is applied at the route or middleware level (not just documented in a README)
  • Client identifier is user ID or API key for authenticated routes; IP only for public ones
  • Algorithm is sliding window (not fixed window) for sensitive endpoints
  • Rate limit state is stored in Redis or another distributed cache — not in-memory
  • HTTP 429 responses include X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, and Retry-After headers
  • Auth and login routes have much stricter limits (e.g., 5 per 15 minutes) than general API routes (100+/hour)
  • AI or compute-heavy endpoints have their own, tighter limit
  • Rate limiter fails open (allow traffic) rather than failing closed (block everything) if the cache is unreachable
  • Limit values are documented with rationale, not just magic numbers
  • You've tested by actually hitting the limit in development

Ask The Guild

This week's community prompt: Have you ever been hit by an unexpected bill, outage, or security incident that better rate limiting would have prevented? What was the endpoint, what happened, and what did you add afterward? Drop your war story (and your fix) in the #prompt-of-the-day channel — the best submissions get featured in the weekly roundup.


About Tom Hundley

Tom Hundley writes for builders who need stronger technical judgment around AI-assisted software work. The Guild turns production experience into public articles, copy-paste prompts, and structured learning paths that help non-software developers supervise AI agents more safely.
