Security First — Part 12 of 30

Rate Limiting: Preventing Abuse Before It Starts

Written by claude-sonnet-4 · Edited by claude-sonnet-4
Tags: rate limiting, API security, brute force prevention, credential stuffing, authentication, FastAPI, Express, Next.js, slowapi, express-rate-limit



The Volkswagen Door That Wouldn't Close

May 2025. A security researcher is poking around the Volkswagen mobile app — the one owners use to remotely lock, unlock, and manage their vehicles. He notices something interesting: when you transfer ownership of a vehicle, the app asks the new owner to verify using a four-digit OTP sent to the previous owner's phone.

Standard enough. But then he wonders: what happens if I just... try all the codes?

He writes a Python script. Multithreaded. He points it at the OTP verification endpoint. There's no rate limiting. No lockout after failed attempts. No anomaly detection. Nothing. The API will happily accept as many guesses as he can throw at it.

There are 10,000 possible four-digit combinations. He brute-forces all of them in seconds. The correct OTP pops out, and suddenly he has access to the vehicle's digital profile — including, as he digs deeper, plaintext internal credentials for backend services like Salesforce CRM and payment processors sitting right there in the API responses.

Volkswagen acknowledged the vulnerabilities and patched them by May 6, 2025. But the damage potential was real: a researcher with a 40-line Python script had the keys to the kingdom because nobody thought to add a request limit to a verification endpoint.

This wasn't a sophisticated nation-state attack. It was a for-loop.

And here's the thing that keeps me up at night as someone who reviews a lot of vibe-coded apps: AI will generate that vulnerable endpoint for you without blinking. It doesn't know your login page shouldn't accept 10,000 password guesses a minute. You have to tell it. This article is about how.


Why Rate Limiting Matters More Than You Think

Before we get into code, let me give you a sense of the scale of the problem.

According to the Traceable AI 2025 State of API Security Report, brute force attacks have entered the top three methods used to breach APIs — right alongside DDoS (37% of incidents) and fraud/abuse (31%), with brute force at 27%. That's up from prior years, not down.

The Salt Security 2025 State of API Security Report reports API attacks increased 230% year-over-year, with more than 80% of breaches now occurring at the API layer — not the traditional web or app surface.

And credential stuffing — where attackers take lists of leaked passwords and spray them at your login endpoint — is the single largest initial access vector in 2025. Verizon's 2025 DBIR found that 22% of all breaches began with stolen or compromised credentials, higher than any other category. Attackers can buy 2 billion leaked credential pairs. They run them against your unprotected login endpoint at 500 requests per second.

The average cost of a US data breach in 2025? $10.22 million.

Rate limiting won't stop everything. But it makes the automated, low-effort attacks — the ones that account for most real-world incidents — computationally expensive enough that attackers move on to easier targets.


What Rate Limiting Actually Is

Simple concept: you limit how many times someone can hit an endpoint in a given time window.

  • Login endpoint: 5 attempts per IP per 15 minutes
  • Password reset: 3 requests per email per hour
  • API endpoint: 100 requests per user per minute

When the limit is exceeded, you return HTTP 429 Too Many Requests and optionally add a Retry-After header telling the client when to try again.

That's it. The Volkswagen attack would have been stopped cold with a limit of 10 OTP attempts per IP per hour.
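To make the mechanics concrete, here's a minimal in-memory fixed-window counter in Python. This is an illustrative sketch only (the class name and structure are mine, not from any library); real apps should use the libraries covered below:

```python
import time

class FixedWindowLimiter:
    """Allow at most `limit` hits per key in each `window_seconds` window."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        # Counter per (key, window bucket); a real system stores this in Redis
        self.counters: dict[tuple[str, int], int] = {}

    def allow(self, key: str) -> bool:
        window_id = int(time.time() // self.window)  # which window we're in now
        bucket = (key, window_id)
        count = self.counters.get(bucket, 0) + 1
        self.counters[bucket] = count
        return count <= self.limit

# 5 attempts per 15 minutes, keyed by client IP
limiter = FixedWindowLimiter(limit=5, window_seconds=900)
results = [limiter.allow("203.0.113.7") for _ in range(6)]
# first five allowed; the sixth is rejected and would get HTTP 429 + Retry-After
```

The whole idea fits in twenty lines: count hits per key per window, reject past the threshold. Everything else the libraries add is persistence, distribution, and better window math.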


The Stripe Lesson: Don't Forget Your Old Endpoints

Before we get to implementation, one more real-world story worth knowing.

Earlier in 2025, The Hacker News reported a sophisticated card-testing campaign targeting at least 49 online merchants. Attackers found Stripe's deprecated /v1/sources endpoint — superseded in May 2024 — and used it to validate stolen credit card numbers at scale.

Why did it work? The legacy endpoint lacked the rate limiting and fraud detection of Stripe's modern APIs. Attackers flooded it with small transaction requests. Each response confirmed whether a card was valid. They filtered out the invalid ones and sold the rest.

The campaign had been running since August 2024 before being widely flagged in February 2025. Six months of undetected card testing because one old endpoint didn't have a request throttle.

The lesson: rate limiting needs to be on every endpoint, including the old ones you forgot about.


Implementing Rate Limiting: The Code

Let's look at how to actually add this to your apps. I'll cover the most common scenarios for vibe coders.

Python/FastAPI — Using slowapi

If you're building with FastAPI (very common for AI-generated backends), slowapi is your friend. It wraps the same limits library used in production systems.

pip install slowapi

from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.util import get_remote_address
from slowapi.errors import RateLimitExceeded

limiter = Limiter(key_func=get_remote_address)
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

# Login endpoint — strict limit to stop credential stuffing
@app.post("/auth/login")
@limiter.limit("5/15minutes")  # 5 attempts per IP per 15 minutes
async def login(request: Request, credentials: LoginCredentials):  # LoginCredentials: your Pydantic request model
    # your login logic here
    ...

# Password reset — limit by IP
@app.post("/auth/reset-password")
@limiter.limit("3/hour")
async def reset_password(request: Request, body: ResetRequest):
    ...

# General API endpoint — more permissive for legitimate use
@app.get("/api/data")
@limiter.limit("100/minute")
async def get_data(request: Request):
    ...

For production, swap the in-memory storage for Redis so limits persist across server restarts and multiple instances:

from slowapi import Limiter
from slowapi.util import get_remote_address

limiter = Limiter(
    key_func=get_remote_address,
    storage_uri="redis://localhost:6379"
)

Node.js/Express — Using express-rate-limit

For Express-based backends (common in Next.js API routes and standalone Node servers):

npm install express-rate-limit

import rateLimit from 'express-rate-limit';

// Strict limiter for auth endpoints
const authLimiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 5,                    // 5 requests per window
  message: {
    error: 'Too many login attempts. Please try again in 15 minutes.'
  },
  standardHeaders: true,     // Return rate limit info in RateLimit-* headers
  legacyHeaders: false,
});

// General API limiter
const apiLimiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 100,
  message: { error: 'Rate limit exceeded. Slow down.' },
  standardHeaders: true,
  legacyHeaders: false,
});

// Apply to specific routes
app.post('/auth/login', authLimiter, loginHandler);
app.post('/auth/register', authLimiter, registerHandler);
app.post('/auth/reset-password', authLimiter, resetHandler);
app.use('/api/', apiLimiter); // Apply to all /api/ routes

For Redis-backed persistence across multiple instances:

npm install rate-limit-redis

import { RedisStore } from 'rate-limit-redis';
import { createClient } from 'redis';

const redisClient = createClient({ url: process.env.REDIS_URL });
await redisClient.connect();

const authLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 5,
  store: new RedisStore({
    sendCommand: (...args) => redisClient.sendCommand(args),
  }),
});

Next.js API Routes — Edge-Native Rate Limiting

If you're on Vercel with Next.js, the @upstash/ratelimit library works beautifully with Vercel's Edge Runtime and Upstash Redis (there's a generous free tier):

npm install @upstash/ratelimit @upstash/redis

// app/api/auth/login/route.ts
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';
import { NextRequest, NextResponse } from 'next/server';

const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(5, '15 m'),
  analytics: true,
});

export async function POST(request: NextRequest) {
  // request.ip was Vercel-specific and is gone in newer Next.js versions;
  // read the forwarded header instead
  const ip = request.headers.get('x-forwarded-for')?.split(',')[0] ?? '127.0.0.1';
  const { success, limit, reset, remaining } = await ratelimit.limit(ip);

  if (!success) {
    return NextResponse.json(
      { error: 'Too many requests. Please try again later.' },
      {
        status: 429,
        headers: {
          'X-RateLimit-Limit': limit.toString(),
          'X-RateLimit-Remaining': remaining.toString(),
          'X-RateLimit-Reset': new Date(reset).toISOString(),
        },
      }
    );
  }

  // Your login logic here
  return NextResponse.json({ message: 'Login successful' });
}

The Right Limits for the Right Endpoints

Not all endpoints are equal. Here's my cheat sheet:

| Endpoint Type | Suggested Limit | Why |
|---|---|---|
| Login / sign-in | 5 per IP per 15 min | Stop credential stuffing cold |
| OTP / verification | 5–10 per IP per hour | Volkswagen-proof your app |
| Password reset request | 3 per email per hour | Prevent email flooding |
| Account registration | 10 per IP per hour | Slow down bot signups |
| Authenticated API calls | 100–1000 per user per min | Fair use without blocking legit users |
| Unauthenticated public API | 20–50 per IP per min | More cautious |
| Webhook endpoints | 1000+ per min | High volume, trust the caller |

The key insight: sensitive endpoints deserve tighter limits than general API endpoints. Your login route and your OTP verification route are not the same as your /api/posts endpoint.


Beyond IP-Based Limits

IP-based rate limiting is table stakes, but sophisticated attackers rotate IPs. Here are the next layers:

Rate limit by user account too. After login, also limit by user ID, not just IP. An attacker with 10,000 IPs can still only try 5 times per account.

Progressive delays. Instead of a hard block, add exponential backoff: 1st failure = immediate retry allowed, 5th failure = 30-second wait, 10th failure = 15-minute lockout.

Account lockout (with care). After N failed attempts, lock the account and email the owner. Be careful here — an attacker can weaponize this to lock out legitimate users. A temporary lockout (15–30 minutes) is safer than a permanent one.

CAPTCHA at the threshold. Trigger a CAPTCHA challenge after 3 failed attempts rather than blocking outright. Legitimate users who made typos can continue. Bots can't.
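The progressive-delay schedule described above can be expressed as a small lookup. The thresholds below mirror the example in the text (5th failure = 30 seconds, 10th = 15 minutes); tune them for your own app:

```python
def lockout_seconds(failed_attempts: int) -> int:
    """How long a client must wait before the next attempt is accepted."""
    if failed_attempts < 5:
        return 0          # early failures: immediate retry allowed
    if failed_attempts < 10:
        return 30         # 5th-9th failure: short cool-down
    return 15 * 60        # 10th failure onward: 15-minute lockout

# The returned value is a natural fit for the Retry-After header on a 429
wait = lockout_seconds(10)
```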

# Example: combine IP-based and account-based rate limiting
# (sketch; assumes an async Redis client bound to `redis`)
from fastapi import HTTPException

@app.post("/auth/login")
async def login(request: Request, credentials: LoginCredentials):
    ip_key = f"login:ip:{get_remote_address(request)}"
    account_key = f"login:account:{credentials.email}"

    # Increment first, then start the window on the first hit. Setting the
    # expiry only after the limit is exceeded would leave counters alive
    # forever and never reset legitimate users.
    ip_attempts = await redis.incr(ip_key)
    if ip_attempts == 1:
        await redis.expire(ip_key, 900)  # 15 min window
    if ip_attempts > 5:
        raise HTTPException(status_code=429, detail="Too many attempts from this IP")

    account_attempts = await redis.incr(account_key)
    if account_attempts == 1:
        await redis.expire(account_key, 1800)  # 30 min window
    if account_attempts > 10:
        raise HTTPException(status_code=429, detail="Account temporarily locked")

    # Proceed with login logic...

What To Tell Your AI Code Assistant

When you ask an AI to build a login system, authentication endpoint, or any user-facing form, include rate limiting in your prompt. The AI won't add it by default.

Weak prompt:

"Build me a login endpoint with FastAPI"

Security-aware prompt:

"Build me a login endpoint with FastAPI. Include rate limiting of 5 attempts per IP per 15 minutes using slowapi with Redis backing. Return HTTP 429 with a Retry-After header on limit exceeded. Also add per-account rate limiting of 10 attempts per 30 minutes."

The difference between these two prompts is the difference between the Volkswagen vulnerability and a properly hardened endpoint.


Quick Audit: Check Your Own App

If you have a running app, here's a 5-minute check. Open your terminal:

# Test if your login endpoint has rate limiting
# Run this from your terminal (not in production against real accounts)
# Replace with your actual endpoint and test credentials

for i in {1..10}; do
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
    -X POST https://your-app.com/api/auth/login \
    -H "Content-Type: application/json" \
    -d '{"email":"test@test.com","password":"wrongpassword"}')
  echo "Attempt $i: HTTP $STATUS"
  sleep 0.5
done

What you want to see: HTTP 429 appearing around attempt 5–6.

What most unprotected apps return: HTTP 401 every single time, all the way to attempt 10 and beyond. The door is open.


Checklist: Rate Limiting for Your App

Use this before you ship anything with user authentication or public API endpoints:

  • Login endpoint — Maximum 5 attempts per IP per 15 minutes
  • OTP / verification codes — Maximum 10 attempts per IP per hour
  • Password reset requests — Maximum 3 per email per hour
  • Registration / signup — Maximum 10 per IP per hour
  • All authenticated API endpoints — Maximum 100–500 requests per user per minute
  • All unauthenticated endpoints — Maximum 20–50 requests per IP per minute
  • HTTP 429 responses — Return proper status code, not 200 or 401
  • Retry-After header — Tell clients when they can try again
  • Redis backing — If running multiple instances, use Redis so limits are shared
  • Account-level limiting — Rate limit by user ID in addition to IP
  • Legacy/old endpoints — Audit and limit every endpoint, not just new ones
  • Tested it — Actually run the curl loop above against your own app

Ask The Guild

Community prompt: Have you ever discovered a rate limiting vulnerability in your own app — or in an app you were using? What was the endpoint, and how did you fix it? Drop your story (with details anonymized if needed) in the discussion. Bonus points if you share the before/after code diff. The Guild learns best from real war stories.


Tom Hundley is a software architect with 25 years of experience. He writes the Security First series for the AI Coding Guild to help vibe coders build applications that don't make the headlines for the wrong reasons.
