Rate Limiting: Preventing Abuse Before It Starts
Security First — Part 12 of 30
The Volkswagen Door That Wouldn't Close
May 2025. A security researcher is poking around the Volkswagen mobile app — the one owners use to remotely lock, unlock, and manage their vehicles. He notices something interesting: when you transfer ownership of a vehicle, the app asks the new owner to verify using a four-digit OTP sent to the previous owner's phone.
Standard enough. But then he wonders: what happens if I just... try all the codes?
He writes a Python script. Multithreaded. He points it at the OTP verification endpoint. There's no rate limiting. No lockout after failed attempts. No anomaly detection. Nothing. The API will happily accept as many guesses as he can throw at it.
There are 10,000 possible four-digit combinations. He brute-forces all of them in seconds. The correct OTP pops out, and suddenly he has access to the vehicle's digital profile — including, as he digs deeper, plaintext internal credentials for backend services like Salesforce CRM and payment processors sitting right there in the API responses.
Volkswagen acknowledged the vulnerabilities and patched them by May 6, 2025. But the damage potential was real: a researcher with a 40-line Python script had the keys to the kingdom because nobody thought to add a request limit to a verification endpoint.
This wasn't a sophisticated nation-state attack. It was a for-loop.
And here's the thing that keeps me up at night as someone who reviews a lot of vibe-coded apps: AI will generate that vulnerable endpoint for you without blinking. It doesn't know your login page shouldn't accept 10,000 password guesses a minute. You have to tell it. This article is about how.
Why Rate Limiting Matters More Than You Think
Before we get into code, let me give you a sense of the scale of the problem.
According to the Traceable AI 2025 State of API Security Report, brute force attacks have entered the top three methods used to breach APIs — right alongside DDoS (37% of incidents) and fraud/abuse (31%), with brute force at 27%. That's up from prior years, not down.
The Salt Security 2025 State of API Security Report reports API attacks increased 230% year-over-year, with more than 80% of breaches now occurring at the API layer — not the traditional web or app surface.
And credential stuffing — where attackers take lists of leaked passwords and spray them at your login endpoint — is the single largest initial access vector in 2025. Verizon's 2025 DBIR found that 22% of all breaches began with stolen or compromised credentials, higher than any other category. Attackers can buy 2 billion leaked credential pairs. They run them against your unprotected login endpoint at 500 requests per second.
The average cost of a US data breach in 2025? $10.22 million.
Rate limiting won't stop everything. But it makes the automated, low-effort attacks — the ones that account for most real-world incidents — computationally expensive enough that attackers move on to easier targets.
What Rate Limiting Actually Is
Simple concept: you limit how many times someone can hit an endpoint in a given time window.
- Login endpoint: 5 attempts per IP per 15 minutes
- Password reset: 3 requests per email per hour
- API endpoint: 100 requests per user per minute
When the limit is exceeded, you return HTTP 429 Too Many Requests and optionally add a Retry-After header telling the client when to try again.
That's it. The Volkswagen attack would have been stopped cold with a limit of 10 OTP attempts per IP per hour.
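The mechanics are simple enough to sketch in a few lines. Here's a minimal in-memory fixed-window counter — purely illustrative (no expiry cleanup, not thread-safe, single-process only); the libraries below handle all of that for you:

```python
import time

class FixedWindowLimiter:
    """Minimal fixed-window rate limiter: at most `limit` hits
    per `window` seconds, tracked per key (e.g. per IP)."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.counters = {}  # key -> (window_start, count)

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        start, count = self.counters.get(key, (now, 0))
        if now - start >= self.window:      # window expired: start a fresh one
            start, count = now, 0
        if count >= self.limit:             # over the limit: reject -> HTTP 429
            self.counters[key] = (start, count)
            return False
        self.counters[key] = (start, count + 1)
        return True

# 5 login attempts per IP per 15 minutes
limiter = FixedWindowLimiter(limit=5, window=15 * 60)
results = [limiter.allow("203.0.113.7", now=100.0) for _ in range(6)]
# first five allowed, sixth rejected
```

When `allow` returns `False`, your handler returns 429 with a `Retry-After` header computed from the remaining window time.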
The Stripe Lesson: Don't Forget Your Old Endpoints
Before we get to implementation, one more real-world story worth knowing.
Earlier in 2025, The Hacker News reported a sophisticated card-testing campaign targeting at least 49 online merchants. Attackers found Stripe's deprecated /v1/sources endpoint — superseded in May 2024 — and used it to validate stolen credit card numbers at scale.
Why did it work? The legacy endpoint lacked the rate limiting and fraud detection of Stripe's modern APIs. Attackers flooded it with small transaction requests. Each response confirmed whether a card was valid. They filtered out the invalid ones and sold the rest.
The campaign had been running since August 2024 before being widely flagged in February 2025. Six months of undetected card testing because one old endpoint didn't have a request throttle.
The lesson: rate limiting needs to be on every endpoint, including the old ones you forgot about.
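One way to guarantee that coverage is to enforce a baseline limit at the middleware layer, so even endpoints nobody remembers inherit some protection. Here's a framework-agnostic sketch — the `(ip, path)` handler signature and the no-expiry counter are simplifications for illustration, not a real framework API:

```python
from collections import defaultdict

def make_counter_limiter(limit):
    """Simplest possible per-key counter (no window expiry) —
    just enough to show the wiring."""
    counts = defaultdict(int)
    def allow(key):
        counts[key] += 1
        return counts[key] <= limit
    return allow

class DefaultLimitMiddleware:
    """Wraps every request with a baseline per-IP, per-path limit,
    so forgotten or legacy endpoints are never completely unprotected."""

    def __init__(self, handler, allow):
        self.handler = handler  # downstream app: (ip, path) -> (status, body)
        self.allow = allow

    def __call__(self, ip, path):
        if not self.allow(f"{ip}:{path}"):
            return 429, {"error": "Too many requests"}
        return self.handler(ip, path)

# Even a deprecated endpoint gets the default limit
app = DefaultLimitMiddleware(lambda ip, path: (200, "ok"),
                             make_counter_limiter(3))
statuses = [app("198.51.100.1", "/v1/sources")[0] for _ in range(4)]
# -> [200, 200, 200, 429]
```

Route-level limits then tighten the defaults for sensitive endpoints; the middleware is the safety net.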
Implementing Rate Limiting: The Code
Let's look at how to actually add this to your apps. I'll cover the most common scenarios for vibe coders.
Python/FastAPI — Using slowapi
If you're building with FastAPI (very common for AI-generated backends), slowapi is your friend. It wraps the same limits library used in production systems.
```bash
pip install slowapi
```
```python
from fastapi import FastAPI, Request
from pydantic import BaseModel
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.util import get_remote_address
from slowapi.errors import RateLimitExceeded

limiter = Limiter(key_func=get_remote_address)
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

class LoginCredentials(BaseModel):
    email: str
    password: str

class ResetRequest(BaseModel):
    email: str

# Login endpoint — strict limit to stop credential stuffing
@app.post("/auth/login")
@limiter.limit("5/15minutes")  # 5 attempts per IP per 15 minutes
async def login(request: Request, credentials: LoginCredentials):
    # your login logic here
    ...

# Password reset — limit by IP
@app.post("/auth/reset-password")
@limiter.limit("3/hour")
async def reset_password(request: Request, body: ResetRequest):
    ...

# General API endpoint — more permissive for legitimate use
@app.get("/api/data")
@limiter.limit("100/minute")
async def get_data(request: Request):
    ...
```
For production, swap the in-memory storage for Redis so limits persist across server restarts and multiple instances:
```python
from slowapi import Limiter
from slowapi.util import get_remote_address

limiter = Limiter(
    key_func=get_remote_address,
    storage_uri="redis://localhost:6379",
)
```
Node.js/Express — Using express-rate-limit
For Express-based backends (common in Next.js API routes and standalone Node servers):
```bash
npm install express-rate-limit
```
```javascript
import rateLimit from 'express-rate-limit';

// Strict limiter for auth endpoints
const authLimiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 5, // 5 requests per window
  message: {
    error: 'Too many login attempts. Please try again in 15 minutes.',
  },
  standardHeaders: true, // Return rate limit info in RateLimit-* headers
  legacyHeaders: false,
});

// General API limiter
const apiLimiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 100,
  message: { error: 'Rate limit exceeded. Slow down.' },
  standardHeaders: true,
  legacyHeaders: false,
});

// Apply to specific routes
app.post('/auth/login', authLimiter, loginHandler);
app.post('/auth/register', authLimiter, registerHandler);
app.post('/auth/reset-password', authLimiter, resetHandler);
app.use('/api/', apiLimiter); // Apply to all /api/ routes
```
For Redis-backed persistence across multiple instances:
```bash
npm install rate-limit-redis
```
```javascript
import { RedisStore } from 'rate-limit-redis';
import { createClient } from 'redis';

const redisClient = createClient({ url: process.env.REDIS_URL });
await redisClient.connect();

const authLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 5,
  store: new RedisStore({
    sendCommand: (...args) => redisClient.sendCommand(args),
  }),
});
```
Next.js API Routes — Edge-Native Rate Limiting
If you're on Vercel with Next.js, the @upstash/ratelimit library works beautifully with Vercel's Edge Runtime and Upstash Redis (there's a generous free tier):
```bash
npm install @upstash/ratelimit @upstash/redis
```
```typescript
// app/api/auth/login/route.ts
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';
import { NextRequest, NextResponse } from 'next/server';

const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(5, '15 m'),
  analytics: true,
});

export async function POST(request: NextRequest) {
  // request.ip was removed in recent Next.js versions —
  // read the client IP from the forwarded header instead
  const ip =
    request.headers.get('x-forwarded-for')?.split(',')[0]?.trim() ?? '127.0.0.1';
  const { success, limit, reset, remaining } = await ratelimit.limit(ip);

  if (!success) {
    return NextResponse.json(
      { error: 'Too many requests. Please try again later.' },
      {
        status: 429,
        headers: {
          'X-RateLimit-Limit': limit.toString(),
          'X-RateLimit-Remaining': remaining.toString(),
          'X-RateLimit-Reset': new Date(reset).toISOString(),
        },
      }
    );
  }

  // Your login logic here
  return NextResponse.json({ message: 'Login successful' });
}
```
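The `slidingWindow` strategy in that config is worth understanding: unlike a fixed window, it tracks individual request timestamps, so an attacker can't burst twice the limit by straddling a window boundary. A minimal pure-Python equivalent (illustrative only — real implementations keep this state in Redis):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Sliding-window log: remembers each request's timestamp per key,
    so the limit holds over *any* window-sized span, not just aligned ones."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.hits = {}  # key -> deque of timestamps

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits.setdefault(key, deque())
        while q and now - q[0] >= self.window:  # drop hits older than the window
            q.popleft()
        if len(q) >= self.limit:                # window still full: reject
            return False
        q.append(now)
        return True

# 5 requests per rolling 15 minutes, mirroring slidingWindow(5, '15 m')
sw = SlidingWindowLimiter(limit=5, window=15 * 60)
```

The trade-off versus a fixed window is memory: one timestamp per recent request instead of one counter per key.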
The Right Limits for the Right Endpoints
Not all endpoints are equal. Here's my cheat sheet:
| Endpoint Type | Suggested Limit | Why |
|---|---|---|
| Login / sign-in | 5 per IP per 15 min | Stop credential stuffing cold |
| OTP / verification | 5–10 per IP per hour | Volkswagen-proof your app |
| Password reset request | 3 per email per hour | Prevent email flooding |
| Account registration | 10 per IP per hour | Slow down bot signups |
| Authenticated API calls | 100–1000 per user per min | Fair use without blocking legit users |
| Unauthenticated public API | 20–50 per IP per min | More cautious |
| Webhook endpoints | 1000+ per min | High volume, trust the caller |
The key insight: sensitive endpoints deserve tighter limits than general API endpoints. Your login route and your OTP verification route are not the same as your /api/posts endpoint.
Beyond IP-Based Limits
IP-based rate limiting is table stakes, but sophisticated attackers rotate IPs. Here are the next layers:
Rate limit by user account too. After login, also limit by user ID, not just IP. An attacker with 10,000 IPs can still only try 5 times per account.
Progressive delays. Instead of a hard block, add exponential backoff: 1st failure = immediate retry allowed, 5th failure = 30-second wait, 10th failure = 15-minute lockout.
Account lockout (with care). After N failed attempts, lock the account and email the owner. Be careful here — an attacker can weaponize this to lock out legitimate users. A temporary lockout (15–30 minutes) is safer than a permanent one.
CAPTCHA at the threshold. Trigger a CAPTCHA challenge after 3 failed attempts rather than blocking outright. Legitimate users who made typos can continue. Bots can't.
```python
# Example: layered limits — check both IP-based and account-based counters
@app.post("/auth/login")
async def login(request: Request, credentials: LoginCredentials):
    ip_key = f"login:ip:{get_remote_address(request)}"
    account_key = f"login:account:{credentials.email}"

    # Increment first, then set the expiry when the key is new —
    # otherwise the counter never gets a TTL and never resets
    ip_count = await redis.incr(ip_key)
    if ip_count == 1:
        await redis.expire(ip_key, 900)        # 15-minute window
    if ip_count > 5:
        raise HTTPException(status_code=429, detail="Too many attempts from this IP")

    account_count = await redis.incr(account_key)
    if account_count == 1:
        await redis.expire(account_key, 1800)  # 30-minute window
    if account_count > 10:
        raise HTTPException(status_code=429, detail="Account temporarily locked")

    # Proceed with login logic...
```
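The progressive-delay policy described above fits in a single function. The thresholds below mirror the numbers in this section and are tunable to taste:

```python
def lockout_seconds(failed_attempts):
    """Progressive delay: free retries at first, then exponentially
    longer waits, then a hard temporary lockout."""
    if failed_attempts < 5:
        return 0                                  # immediate retry allowed
    if failed_attempts < 10:
        return 30 * 2 ** (failed_attempts - 5)    # 30s, 60s, 120s, 240s, 480s
    return 15 * 60                                # 15-minute lockout
```

On each failed login you'd look up the account's failure count, and reject with 429 plus a `Retry-After: lockout_seconds(n)` header until the wait has elapsed.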
What To Tell Your AI Code Assistant
When you ask an AI to build a login system, authentication endpoint, or any user-facing form, include rate limiting in your prompt. The AI won't add it by default.
Weak prompt:
"Build me a login endpoint with FastAPI"
Security-aware prompt:
"Build me a login endpoint with FastAPI. Include rate limiting of 5 attempts per IP per 15 minutes using slowapi with Redis backing. Return HTTP 429 with a Retry-After header on limit exceeded. Also add per-account rate limiting of 10 attempts per 30 minutes."
The difference between these two prompts is the difference between the Volkswagen vulnerability and a properly hardened endpoint.
Quick Audit: Check Your Own App
If you have a running app, here's a 5-minute check. Open your terminal:
```bash
# Test if your login endpoint has rate limiting
# Run this from your terminal (not in production against real accounts)
# Replace with your actual endpoint and test credentials
for i in {1..10}; do
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
    -X POST https://your-app.com/api/auth/login \
    -H "Content-Type: application/json" \
    -d '{"email":"test@test.com","password":"wrongpassword"}')
  echo "Attempt $i: HTTP $STATUS"
  sleep 0.5
done
```
What you want to see: HTTP 429 appearing around attempt 5–6.
What most unprotected apps return: HTTP 401 every single time, all the way to attempt 10 and beyond. The door is open.
Checklist: Rate Limiting for Your App
Use this before you ship anything with user authentication or public API endpoints:
- Login endpoint — Maximum 5 attempts per IP per 15 minutes
- OTP / verification codes — Maximum 10 attempts per IP per hour
- Password reset requests — Maximum 3 per email per hour
- Registration / signup — Maximum 10 per IP per hour
- All authenticated API endpoints — A ceiling in the 100–500 requests per user per minute range
- All unauthenticated endpoints — A tighter ceiling of 20–50 requests per IP per minute
- HTTP 429 responses — Return proper status code, not 200 or 401
- Retry-After header — Tell clients when they can try again
- Redis backing — If running multiple instances, use Redis so limits are shared
- Account-level limiting — Rate limit by user ID in addition to IP
- Legacy/old endpoints — Audit and limit every endpoint, not just new ones
- Tested it — Actually run the curl loop above against your own app
Ask The Guild
Community prompt: Have you ever discovered a rate limiting vulnerability in your own app — or in an app you were using? What was the endpoint, and how did you fix it? Drop your story (with details anonymized if needed) in the discussion. Bonus points if you share the before/after code diff. The Guild learns best from real war stories.
Tom Hundley is a software architect with 25 years of experience. He writes the Security First series for the AI Coding Guild to help vibe coders build applications that don't make the headlines for the wrong reasons.