Rate Limiting: Preventing Abuse Before It Starts
Security First — Part 12 of 30
The Volkswagen Door That Wouldn't Close
May 2025. A security researcher is poking around the Volkswagen mobile app — the one owners use to remotely lock, unlock, and manage their vehicles. He notices something interesting: when you transfer ownership of a vehicle, the app asks the new owner to verify using a four-digit OTP sent to the previous owner's phone.
Standard enough. But then he wonders: what happens if I just... try all the codes?
He writes a Python script. Multithreaded. He points it at the OTP verification endpoint. There's no rate limiting. No lockout after failed attempts. No anomaly detection. Nothing. The API will happily accept as many guesses as he can throw at it.
There are 10,000 possible four-digit combinations. He brute-forces all of them in seconds. The correct OTP pops out, and suddenly he has access to the vehicle's digital profile — including, as he digs deeper, plaintext internal credentials for backend services like Salesforce CRM and payment processors sitting right there in the API responses.
Volkswagen acknowledged the vulnerabilities and patched them by May 6, 2025. But the damage potential was real: a researcher with a 40-line Python script had the keys to the kingdom because nobody thought to add a request limit to a verification endpoint.
This wasn't a sophisticated nation-state attack. It was a for-loop.
And here's the thing that keeps me up at night as someone who reviews a lot of vibe-coded apps: AI will generate that vulnerable endpoint for you without blinking. It doesn't know your login page shouldn't accept 10,000 password guesses a minute. You have to tell it. This article is about how.
Why Rate Limiting Matters More Than You Think
Before we get into code, let me give you a sense of the scale of the problem.
According to the Traceable AI 2025 State of API Security Report, brute force attacks have entered the top three methods used to breach APIs — right alongside DDoS (37% of incidents) and fraud/abuse (31%), with brute force at 27%. That's up from prior years, not down.
The Salt Security 2025 State of API Security Report reports API attacks increased 230% year-over-year, with more than 80% of breaches now occurring at the API layer — not the traditional web or app surface.
And credential stuffing — where attackers take lists of leaked passwords and spray them at your login endpoint — is the single largest initial access vector in 2025. Verizon's 2025 DBIR found that 22% of all breaches began with stolen or compromised credentials, higher than any other category. Attackers can buy 2 billion leaked credential pairs. They run them against your unprotected login endpoint at 500 requests per second.
The average cost of a US data breach in 2025? $10.22 million.
Rate limiting won't stop everything. But it makes the automated, low-effort attacks — the ones that account for most real-world incidents — computationally expensive enough that attackers move on to easier targets.
What Rate Limiting Actually Is
Simple concept: you limit how many times someone can hit an endpoint in a given time window.
- Login endpoint: 5 attempts per IP per 15 minutes
- Password reset: 3 requests per email per hour
- API endpoint: 100 requests per user per minute
When the limit is exceeded, you return HTTP 429 Too Many Requests and optionally add a Retry-After header telling the client when to try again.
That's it. The Volkswagen attack would have been stopped cold with a limit of 10 OTP attempts per IP per hour.
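The mechanics are simple enough to sketch in a few lines. Here's a minimal in-memory fixed-window counter — purely illustrative (no expiry cleanup, not thread-safe, single-process only); the libraries below handle all of that for you:

```python
import time

class FixedWindowLimiter:
    """Minimal fixed-window rate limiter: at most `limit` hits
    per `window` seconds, tracked per key (e.g. per IP)."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.counters = {}  # key -> (window_start, count)

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        start, count = self.counters.get(key, (now, 0))
        if now - start >= self.window:      # window expired: start a fresh one
            start, count = now, 0
        if count >= self.limit:             # over the limit: reject -> HTTP 429
            self.counters[key] = (start, count)
            return False
        self.counters[key] = (start, count + 1)
        return True

# 5 login attempts per IP per 15 minutes
limiter = FixedWindowLimiter(limit=5, window=15 * 60)
results = [limiter.allow("203.0.113.7", now=100.0) for _ in range(6)]
# first five allowed, sixth rejected
```

When `allow` returns `False`, your handler returns 429 with a `Retry-After` header computed from the remaining window time.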
The Stripe Lesson: Don't Forget Your Old Endpoints
Before we get to implementation, one more real-world story worth knowing.
Earlier in 2025, The Hacker News reported a sophisticated card-testing campaign targeting at least 49 online merchants. Attackers found Stripe's deprecated /v1/sources endpoint — superseded in May 2024 — and used it to validate stolen credit card numbers at scale.
Why did it work? The legacy endpoint lacked the rate limiting and fraud detection of Stripe's modern APIs. Attackers flooded it with small transaction requests. Each response confirmed whether a card was valid. They filtered out the invalid ones and sold the rest.
The campaign had been running since August 2024 before being widely flagged in February 2025. Six months of undetected card testing because one old endpoint didn't have a request throttle.
The lesson: rate limiting needs to be on every endpoint, including the old ones you forgot about.
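One way to guarantee that coverage is to enforce a baseline limit at the middleware layer, so even endpoints nobody remembers inherit some protection. Here's a framework-agnostic sketch — the `(ip, path)` handler signature and the no-expiry counter are simplifications for illustration, not a real framework API:

```python
from collections import defaultdict

def make_counter_limiter(limit):
    """Simplest possible per-key counter (no window expiry) —
    just enough to show the wiring."""
    counts = defaultdict(int)
    def allow(key):
        counts[key] += 1
        return counts[key] <= limit
    return allow

class DefaultLimitMiddleware:
    """Wraps every request with a baseline per-IP, per-path limit,
    so forgotten or legacy endpoints are never completely unprotected."""

    def __init__(self, handler, allow):
        self.handler = handler  # downstream app: (ip, path) -> (status, body)
        self.allow = allow

    def __call__(self, ip, path):
        if not self.allow(f"{ip}:{path}"):
            return 429, {"error": "Too many requests"}
        return self.handler(ip, path)

# Even a deprecated endpoint gets the default limit
app = DefaultLimitMiddleware(lambda ip, path: (200, "ok"),
                             make_counter_limiter(3))
statuses = [app("198.51.100.1", "/v1/sources")[0] for _ in range(4)]
# -> [200, 200, 200, 429]
```

Route-level limits then tighten the defaults for sensitive endpoints; the middleware is the safety net.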
Implementing Rate Limiting: The Code
Let's look at how to actually add this to your apps. I'll cover the most common scenarios for vibe coders.
Python/FastAPI — Using slowapi
If you're building with FastAPI (very common for AI-generated backends), slowapi is your friend. It wraps the same limits library used in production systems.
```bash
pip install slowapi
```
```python
from fastapi import FastAPI, Request
from pydantic import BaseModel
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.util import get_remote_address
from slowapi.errors import RateLimitExceeded

limiter = Limiter(key_func=get_remote_address)
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

class LoginCredentials(BaseModel):
    email: str
    password: str

class ResetRequest(BaseModel):
    email: str

# Login endpoint — strict limit to stop credential stuffing
@app.post("/auth/login")
@limiter.limit("5/15minutes")  # 5 attempts per IP per 15 minutes
async def login(request: Request, credentials: LoginCredentials):
    # your login logic here
    ...

# Password reset — limit by IP
@app.post("/auth/reset-password")
@limiter.limit("3/hour")
async def reset_password(request: Request, body: ResetRequest):
    ...

# General API endpoint — more permissive for legitimate use
@app.get("/api/data")
@limiter.limit("100/minute")
async def get_data(request: Request):
    ...
```
For production, swap the in-memory storage for Redis so limits persist across server restarts and multiple instances:
```python
from slowapi import Limiter
from slowapi.util import get_remote_address

limiter = Limiter(
    key_func=get_remote_address,
    storage_uri="redis://localhost:6379",
)
```
Node.js/Express — Using express-rate-limit
For Express-based backends (common in Next.js API routes and standalone Node servers):
```bash
npm install express-rate-limit
```
```javascript
import rateLimit from 'express-rate-limit';

// Strict limiter for auth endpoints
const authLimiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 5, // 5 requests per window
  message: {
    error: 'Too many login attempts. Please try again in 15 minutes.',
  },
  standardHeaders: true, // Return rate limit info in RateLimit-* headers
  legacyHeaders: false,
});

// General API limiter
const apiLimiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 100,
  message: { error: 'Rate limit exceeded. Slow down.' },
  standardHeaders: true,
  legacyHeaders: false,
});

// Apply to specific routes
app.post('/auth/login', authLimiter, loginHandler);
app.post('/auth/register', authLimiter, registerHandler);
app.post('/auth/reset-password', authLimiter, resetHandler);
app.use('/api/', apiLimiter); // Apply to all /api/ routes
```
For Redis-backed persistence across multiple instances:
```bash
npm install rate-limit-redis
```
```javascript
import { RedisStore } from 'rate-limit-redis';
import { createClient } from 'redis';

const redisClient = createClient({ url: process.env.REDIS_URL });
await redisClient.connect();

const authLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 5,
  store: new RedisStore({
    sendCommand: (...args) => redisClient.sendCommand(args),
  }),
});
```
Next.js API Routes — Edge-Native Rate Limiting
If you're on Vercel with Next.js, the @upstash/ratelimit library works beautifully with Vercel's Edge Runtime and Upstash Redis (there's a generous free tier):
```bash
npm install @upstash/ratelimit @upstash/redis
```
```typescript
// app/api/auth/login/route.ts
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';
import { NextRequest, NextResponse } from 'next/server';

const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(5, '15 m'),
  analytics: true,
});

export async function POST(request: NextRequest) {
  // request.ip was removed in recent Next.js versions —
  // read the client IP from the forwarded header instead
  const ip =
    request.headers.get('x-forwarded-for')?.split(',')[0]?.trim() ?? '127.0.0.1';
  const { success, limit, reset, remaining } = await ratelimit.limit(ip);

  if (!success) {
    return NextResponse.json(
      { error: 'Too many requests. Please try again later.' },
      {
        status: 429,
        headers: {
          'X-RateLimit-Limit': limit.toString(),
          'X-RateLimit-Remaining': remaining.toString(),
          'X-RateLimit-Reset': new Date(reset).toISOString(),
        },
      }
    );
  }

  // Your login logic here
  return NextResponse.json({ message: 'Login successful' });
}
```
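The `slidingWindow` strategy in that config is worth understanding: unlike a fixed window, it tracks individual request timestamps, so an attacker can't burst twice the limit by straddling a window boundary. A minimal pure-Python equivalent (illustrative only — real implementations keep this state in Redis):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Sliding-window log: remembers each request's timestamp per key,
    so the limit holds over *any* window-sized span, not just aligned ones."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.hits = {}  # key -> deque of timestamps

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits.setdefault(key, deque())
        while q and now - q[0] >= self.window:  # drop hits older than the window
            q.popleft()
        if len(q) >= self.limit:                # window still full: reject
            return False
        q.append(now)
        return True

# 5 requests per rolling 15 minutes, mirroring slidingWindow(5, '15 m')
sw = SlidingWindowLimiter(limit=5, window=15 * 60)
```

The trade-off versus a fixed window is memory: one timestamp per recent request instead of one counter per key.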
The Right Limits for the Right Endpoints
Not all endpoints are equal. Here's my cheat sheet:
| Endpoint Type | Suggested Limit | Why |
|---|---|---|
| Login / sign-in | 5 per IP per 15 min | Stop credential stuffing cold |
| OTP / verification | 5–10 per IP per hour | Volkswagen-proof your app |
| Password reset request | 3 per email per hour | Prevent email flooding |
| Account registration | 10 per IP per hour | Slow down bot signups |
| Authenticated API calls | 100–1000 per user per min | Fair use without blocking legit users |
| Unauthenticated public API | 20–50 per IP per min | More cautious |
| Webhook endpoints | 1000+ per min | High volume, trust the caller |
The key insight: sensitive endpoints deserve tighter limits than general API endpoints. Your login route and your OTP verification route are not the same as your /api/posts endpoint.
Beyond IP-Based Limits
IP-based rate limiting is table stakes, but sophisticated attackers rotate IPs. Here are the next layers:
Rate limit by user account too. After login, also limit by user ID, not just IP. An attacker with 10,000 IPs can still only try 5 times per account.
Progressive delays. Instead of a hard block, add exponential backoff: 1st failure = immediate retry allowed, 5th failure = 30-second wait, 10th failure = 15-minute lockout.
Account lockout (with care). After N failed attempts, lock the account and email the owner. Be careful here — an attacker can weaponize this to lock out legitimate users. A temporary lockout (15–30 minutes) is safer than a permanent one.
CAPTCHA at the threshold. Trigger a CAPTCHA challenge after 3 failed attempts rather than blocking outright. Legitimate users who made typos can continue. Bots can't.
```python
# Example: layered limits — check both IP-based and account-based counters
@app.post("/auth/login")
async def login(request: Request, credentials: LoginCredentials):
    ip_key = f"login:ip:{get_remote_address(request)}"
    account_key = f"login:account:{credentials.email}"

    # Increment first, then set the expiry when the key is new —
    # otherwise the counter never gets a TTL and never resets
    ip_count = await redis.incr(ip_key)
    if ip_count == 1:
        await redis.expire(ip_key, 900)        # 15-minute window
    if ip_count > 5:
        raise HTTPException(status_code=429, detail="Too many attempts from this IP")

    account_count = await redis.incr(account_key)
    if account_count == 1:
        await redis.expire(account_key, 1800)  # 30-minute window
    if account_count > 10:
        raise HTTPException(status_code=429, detail="Account temporarily locked")

    # Proceed with login logic...
```
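The progressive-delay policy described above fits in a single function. The thresholds below mirror the numbers in this section and are tunable to taste:

```python
def lockout_seconds(failed_attempts):
    """Progressive delay: free retries at first, then exponentially
    longer waits, then a hard temporary lockout."""
    if failed_attempts < 5:
        return 0                                  # immediate retry allowed
    if failed_attempts < 10:
        return 30 * 2 ** (failed_attempts - 5)    # 30s, 60s, 120s, 240s, 480s
    return 15 * 60                                # 15-minute lockout
```

On each failed login you'd look up the account's failure count, and reject with 429 plus a `Retry-After: lockout_seconds(n)` header until the wait has elapsed.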
What To Tell Your AI Code Assistant
When you ask an AI to build a login system, authentication endpoint, or any user-facing form, include rate limiting in your prompt. The AI won't add it by default.
Weak prompt:
"Build me a login endpoint with FastAPI"
Security-aware prompt:
"Build me a login endpoint with FastAPI. Include rate limiting of 5 attempts per IP per 15 minutes using slowapi with Redis backing. Return HTTP 429 with a Retry-After header on limit exceeded. Also add per-account rate limiting of 10 attempts per 30 minutes."
The difference between these two prompts is the difference between the Volkswagen vulnerability and a properly hardened endpoint.
Quick Audit: Check Your Own App
If you have a running app, here's a 5-minute check. Open your terminal:
```bash
# Test if your login endpoint has rate limiting
# Run this from your terminal (not in production against real accounts)
# Replace with your actual endpoint and test credentials
for i in {1..10}; do
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
    -X POST https://your-app.com/api/auth/login \
    -H "Content-Type: application/json" \
    -d '{"email":"test@test.com","password":"wrongpassword"}')
  echo "Attempt $i: HTTP $STATUS"
  sleep 0.5
done
```
What you want to see: HTTP 429 appearing around attempt 5–6.
What most unprotected apps return: HTTP 401 every single time, all the way to attempt 10 and beyond. The door is open.
Checklist: Rate Limiting for Your App
Use this before you ship anything with user authentication or public API endpoints:
- Login endpoint — Maximum 5 attempts per IP per 15 minutes
- OTP / verification codes — Maximum 10 attempts per IP per hour
- Password reset requests — Maximum 3 per email per hour
- Registration / signup — Maximum 10 per IP per hour
- All authenticated API endpoints — A ceiling in the 100–500 requests per user per minute range
- All unauthenticated endpoints — A tighter ceiling of 20–50 requests per IP per minute
- HTTP 429 responses — Return proper status code, not 200 or 401
- Retry-After header — Tell clients when they can try again
- Redis backing — If running multiple instances, use Redis so limits are shared
- Account-level limiting — Rate limit by user ID in addition to IP
- Legacy/old endpoints — Audit and limit every endpoint, not just new ones
- Tested it — Actually run the curl loop above against your own app
Ask The Guild
Community prompt: Have you ever discovered a rate limiting vulnerability in your own app — or in an app you were using? What was the endpoint, and how did you fix it? Drop your story (with details anonymized if needed) in the discussion. Bonus points if you share the before/after code diff. The Guild learns best from real war stories.
Tom Hundley is a software architect with 25 years of experience. He writes the Security First series for the AI Coding Guild to help vibe coders build applications that don't make the headlines for the wrong reasons.