Edge vs Serverless vs Server: Where to Run Your Code
Architecture Patterns — Part 12 of 30
The Bill That Woke Everyone Up
In July 2025, a developer posted on X that they'd moved about 20 client-side API calls server-side on a single page of their app. Seemed like the right thing to do — tighten up the architecture, protect API keys, reduce client-side churn. A few weeks later, their Vercel bill jumped from $300 a month to $3,550.
The culprit: serverless function duration. Those 20 calls were hitting Anthropic's Claude API and a database — functions that spent most of their time waiting on responses, not computing. And Lambda (the engine underneath Vercel's serverless functions) billed for every millisecond of that wait. At scale, idle time isn't free.
This isn't an isolated story. At AWS re:Invent 2025, a FinTech startup in South Africa presented a case study where a single Lambda function — handling sentiment analysis on call center recordings — was costing them $86.67 per day. Over a year: $31,000. For one function. The root cause was architectural: the function called Amazon Transcribe, waited, called Amazon Comprehend, waited again, and so on. Serverless bills for the wall clock, not the CPU clock.
Both teams made the same mistake: they reached for serverless because it felt like the safe default. It often is. But "safe default" and "right choice" are different things, and this article is about building the decision framework to tell them apart.
This is Part 12 of the Architecture Patterns series. Today we're building a mental model for one of the most consequential deployment decisions in modern software: where does your code actually run?
Three Compute Models, Three Different Bets
Let's establish clear definitions before the framework, because the industry uses these terms loosely.
Traditional Servers (VMs / Containers) — You provision compute, your code runs continuously. Whether you're using a $5 Hetzner VPS, an EC2 instance, a Fly.io machine, or a Kubernetes pod, the model is the same: a process stays alive, handles requests, and you pay for uptime whether you're serving traffic or not.
Serverless Functions — You deploy a function. The cloud provider runs it on-demand, scales it automatically, and bills per invocation and duration. AWS Lambda, Vercel Functions (Node.js runtime), Netlify Functions, Google Cloud Functions. No servers to manage. No idle cost when traffic drops to zero — but potentially surprising cost when functions wait on external services.
Edge Functions — You deploy a function that runs at CDN points of presence, geographically close to your users. Cloudflare Workers, Vercel Edge Runtime, Netlify Edge Functions. These use V8 isolates instead of containers, which means cold starts measured in microseconds rather than seconds. The trade-off: heavily restricted runtimes with no full Node.js access, limited memory (128MB), and no TCP connections.
These aren't a progression from worse to better. They're different tools with different shapes.
The Performance Landscape in 2026
Let's talk hard numbers, because the gap between models has widened significantly.
Cold start latency is where edge wins decisively. Cloudflare Workers initialize in under 5 milliseconds. AWS Lambda with Node.js or Python typically takes 200–400ms. Java or C# on Lambda can take 500ms to 2+ seconds. Benchmark data from late 2025 shows edge functions are roughly 9x faster during cold starts, and 2x faster on warm invocations — real-world Vercel tests clocked 167ms for edge vs 287ms for serverless.
The reason for the gap is architectural. Lambda runs your code in isolated containers. Cloudflare Workers runs your code in V8 isolates — the same engine that powers Chrome's JavaScript execution. V8 isolates spin up in microseconds because there's no OS-level container boot sequence. Cloudflare reports a 99.99% warm start rate across their network through what they call "Shard and Conquer" traffic coalescing.
Geographic latency is the other edge advantage. A user in Tokyo hitting a Lambda function deployed to us-east-1 is sending a request 13,000 km across the Pacific and back. That's 200–300ms of latency before your code even executes. Edge functions eliminate this — Cloudflare's 300+ data centers put your code within 50ms of virtually every internet user on Earth.
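A quick sanity check on that number — assuming light in fiber travels at roughly two-thirds of c, about 200 km per millisecond:

```typescript
// Theoretical minimum round trip, Tokyo -> us-east-1 -> Tokyo.
// Assumptions: ~13,000 km one way, signal speed ~200 km/ms in fiber.
const oneWayKm = 13_000;
const fiberKmPerMs = 200;
const minRttMs = (2 * oneWayKm) / fiberKmPerMs;
console.log(minRttMs); // 130 — before routing hops, TCP/TLS handshakes, and queuing
```

The physics floor alone is about 130ms; real paths, with routing detours and handshake round trips stacked on top, land in the 200–300ms range.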
But cold starts got more expensive in August 2025. AWS began billing for the Lambda INIT phase — the container initialization time that previously was free. For a 512MB function with a 2-second INIT phase, cost per million cold starts jumped from roughly $0.80 to $17.80. Java and C# functions, which commonly initialize in 1–3 seconds, felt this most. If your workload is bursty (say, 10 requests per hour), you're looking at 30–50% cold start rates. At very low traffic, it can exceed 90%.
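The arithmetic behind that jump, assuming Lambda's published x86 rate of roughly $0.0000166667 per GB-second:

```typescript
// Cost added by billing the INIT phase, per million cold starts
const gbSecondRate = 0.0000166667; // assumed Lambda x86 rate, us-east-1
const memoryGb = 0.5;              // 512 MB function
const initSeconds = 2;             // 2-second INIT phase
const coldStarts = 1_000_000;

const initCostUsd = memoryGb * initSeconds * gbSecondRate * coldStarts;
console.log(initCostUsd.toFixed(2)); // "16.67" — stacked on top of the old ~$0.80
```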
The Decision Framework
Stop asking "which is best?" and start asking these four questions:
1. Where are your users, and how much does latency matter?
If you're serving a global audience and your response time is user-visible, edge wins. Authentication middleware, A/B testing, personalization headers, rate limiting, geo-routing — these are all sub-millisecond decisions that execute on every request. Running them at the edge keeps your origin server from seeing unnecessary load and keeps your users from feeling geography.
If your latency SLA is measured in hundreds of milliseconds, not tens, the geographic advantage of edge matters less. A background job processing uploaded files doesn't care if it runs in Virginia or Oregon.
2. What does your function actually do while it's running?
This is the question that burns people. There are two kinds of compute work:
- CPU-bound: Image manipulation, data transformation, encryption, compression, algorithm execution. The function is working the whole time it's running.
- I/O-bound: Waiting for a database query, waiting for an LLM to respond, waiting for an external API call. The function is mostly blocking.
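One way to find out which kind you have, sketched for Node: compare CPU time against wall-clock time around a handler. `profile` here is a hypothetical wrapper, not a library API.

```typescript
// Measure a handler's CPU-vs-wall-clock ratio in Node
async function profile<T>(fn: () => Promise<T>) {
  const wallStart = process.hrtime.bigint();
  const cpuStart = process.cpuUsage();
  const result = await fn();
  const wallMs = Number(process.hrtime.bigint() - wallStart) / 1e6;
  const cpu = process.cpuUsage(cpuStart);        // delta since cpuStart
  const cpuMs = (cpu.user + cpu.system) / 1000;  // microseconds -> ms
  return { result, wallMs, cpuMs, cpuRatio: cpuMs / wallMs };
}

// A stand-in for an I/O-bound handler: 200ms of pure waiting
profile(() => new Promise(res => setTimeout(res, 200))).then(({ wallMs, cpuMs, cpuRatio }) => {
  console.log(`wall=${wallMs.toFixed(0)}ms cpu=${cpuMs.toFixed(1)}ms ratio=${cpuRatio.toFixed(2)}`);
});
```

A ratio near zero means the function mostly waits — exactly the profile that makes wall-clock billing expensive.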
Serverless bills for wall-clock time. If your function is 90% I/O wait, you're paying full price for idle time. This is exactly what killed the South African fintech's budget. The fix was to split the single function into smaller, focused functions with async handoffs via Step Functions — cutting their bill by 97.7%.
Vercel's Fluid Compute, launched in mid-2025, attacks this problem by reusing idle Lambda instances across concurrent requests and billing for active CPU only. The developer who jumped from $300 to $3,550 turned on Fluid Compute and immediately saw improvement. But Fluid Compute is a mitigation — the architecture decision still matters.
For long-running I/O-heavy work: traditional servers win on cost.
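The shape of that split, sketched with stand-ins: `startJob` below simulates a managed service (think Transcribe or Comprehend) that owns the slow work and triggers the next function when done — the role Step Functions or EventBridge would play in a real deployment.

```typescript
type Callback = (output: string) => void;

// Stand-in for a managed async service: it does the slow work itself and
// calls the next function when finished — no Lambda is billed while it runs.
function startJob(input: string, work: (s: string) => string, done: Callback) {
  setTimeout(() => done(work(input)), 10);
}

// Second short-lived function, triggered when transcription completes
const onTranscriptionDone = (report: Callback) => (transcript: string) =>
  startJob(transcript, (t) => `sentiment(positive) for "${t}"`, report);

// First short-lived function: kicks off the job and returns in milliseconds
function startAnalysis(recordingUrl: string, report: Callback) {
  startJob(recordingUrl, (url) => `transcript of ${url}`, onTranscriptionDone(report));
}

startAnalysis('s3://calls/rec-001.wav', (result) => console.log(result));
```

Each function now runs for milliseconds; the waiting lives inside the managed services, where it's free.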
3. What are your runtime requirements?
Edge runtimes are lean by design. Cloudflare Workers gives you 128MB of memory, 30 seconds of CPU time on the paid plan, and no TCP connections. No fs module. No native Node.js addons. No access to most npm packages that depend on OS-level APIs.
This means:
- No ORMs that use TCP to connect to Postgres (use HTTP-based databases like PlanetScale, Neon, or Cloudflare D1)
- No image processing libraries that use native binaries
- No heavy ML inference (128MB won't hold most models)
- No file system operations
If your code needs any of these, edge is not available to you without significant refactoring.
4. What is your traffic pattern?
Serverless was invented for variable, unpredictable traffic. If you have an app that gets 50 requests per day with occasional spikes to 5,000, serverless is a perfect fit. You pay near-zero at idle, auto-scale during spikes, and never provision for peak.
If you have a steady, predictable workload — 1,000 requests per minute, 24/7 — a reserved server almost certainly wins on cost. DataBank's 2026 analysis puts it plainly: serverless TCO rises sharply with sustained traffic because billing scales linearly with duration.
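A rough crossover calculation makes the point. The prices here are assumptions for illustration: Lambda's ~$0.0000166667 per GB-second plus $0.20 per million requests, against a $20/month always-on box (ignoring redundancy and ops overhead):

```typescript
const gbSecondRate = 0.0000166667;         // assumed Lambda x86 rate
const perRequestRate = 0.20 / 1_000_000;   // assumed per-request charge

const reqPerMonth = 1000 * 60 * 24 * 30;   // 1,000 req/min, 24/7 -> 43.2M requests
const gbSeconds = reqPerMonth * 0.5 * 0.2; // 512 MB memory, 200 ms average duration
const lambdaMonthly = gbSeconds * gbSecondRate + reqPerMonth * perRequestRate;
const serverMonthly = 20;                  // one always-on instance

console.log(lambdaMonthly.toFixed(2), serverMonthly); // "80.64" vs 20
```

Roughly 4x the cost at steady load, and the gap widens as average duration grows — duration is exactly the term that scales linearly.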
Code: The Three Patterns Side by Side
Here's the same "authenticate and proxy" logic implemented in each model:
Edge (Cloudflare Workers) — runs globally, sub-5ms cold start:
```typescript
// cloudflare-worker.ts
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const authHeader = request.headers.get('Authorization');
    if (!authHeader?.startsWith('Bearer ')) {
      return new Response('Unauthorized', { status: 401 });
    }
    const token = authHeader.slice(7);
    // JWT verification using the Web Crypto API (available at the edge);
    // verifyJWT is a small helper built on crypto.subtle.verify
    const isValid = await verifyJWT(token, env.JWT_SECRET);
    if (!isValid) {
      return new Response('Invalid token', { status: 403 });
    }
    // Attach user context and proxy to origin
    const originRequest = new Request(request);
    originRequest.headers.set('X-User-Verified', 'true');
    originRequest.headers.set('X-Edge-Region', request.cf?.colo || 'unknown');
    return fetch(originRequest);
  },
};
```
Serverless (AWS Lambda / Vercel Function) — runs in a region, 200–400ms cold start:
```python
# lambda_handler.py
import json
import os

import boto3
from jose import jwt

SECRET_KEY = os.environ['JWT_SECRET']  # read once, during INIT

def handler(event, context):
    # Has the full Python runtime — can use any library
    headers = event.get('headers', {})
    auth_header = headers.get('authorization', '')
    if not auth_header.startswith('Bearer '):
        return {'statusCode': 401, 'body': 'Unauthorized'}
    token = auth_header[7:]
    try:
        # Can use python-jose, cryptography, anything pip-installable
        payload = jwt.decode(token, SECRET_KEY, algorithms=['HS256'])
    except Exception:
        return {'statusCode': 403, 'body': 'Invalid token'}
    # Can connect to RDS, DynamoDB, anything in your VPC
    user_id = payload.get('sub')
    db_client = boto3.client('dynamodb')  # real TCP connection to an AWS service
    return {
        'statusCode': 200,
        'body': json.dumps({'user_id': user_id, 'verified': True}),
    }
```
Traditional Server (Express/Node.js on a VPS) — always warm, no cold starts:
```typescript
// server.ts — runs on Fly.io, Railway, or a $6/mo VPS
import express from 'express';
import jwt from 'jsonwebtoken';
import { Pool } from 'pg'; // direct Postgres connection, kept warm

const app = express();
const pool = new Pool({ connectionString: process.env.DATABASE_URL });
// Connection pool stays warm between requests — no reconnect overhead

app.use('/api', (req, res, next) => {
  const token = req.headers.authorization?.split(' ')[1];
  if (!token) return res.status(401).json({ error: 'Unauthorized' });
  try {
    const payload = jwt.verify(token, process.env.JWT_SECRET!);
    (req as any).user = payload; // or augment Express.Request via declaration merging
    next();
  } catch {
    res.status(403).json({ error: 'Invalid token' });
  }
});

app.get('/api/profile', async (req, res) => {
  // Connection pool reuse — ~2ms instead of ~50ms per query
  const result = await pool.query(
    'SELECT * FROM users WHERE id = $1',
    [(req as any).user.sub]
  );
  res.json(result.rows[0]);
});

app.listen(3000);
```
Note the database connection in the Express example. On a traditional server, your Pool instance persists between requests — connections stay warm and reuse is free. On Lambda, every cold start creates a new pool and a new TCP handshake to your database. This is why AWS RDS Proxy exists: to manage connection pooling on behalf of Lambda functions that can't maintain persistent connections themselves.
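The standard Lambda-side mitigation is to hoist the pool to module scope, so warm invocations of the same container reuse it and only cold starts pay the handshake. The sketch below fakes the pool with a counter to make the lifecycle visible — in real code `createPool` would be `new Pool({ connectionString, max: 1 })` from `pg`:

```typescript
// Module scope runs once per container, during INIT — not once per request
let poolsCreated = 0;

function createPool() {
  poolsCreated++; // stand-in for the TCP handshake cost of new Pool(...)
  return { query: async (sql: string, params: unknown[]) => ({ rows: [{ sql, params }] }) };
}

const pool = createPool(); // paid once per cold start

const handler = async (event: { userId: string }) => {
  // Every warm invocation of this container reuses the same pool
  const { rows } = await pool.query('SELECT * FROM users WHERE id = $1', [event.userId]);
  return { statusCode: 200, body: JSON.stringify(rows[0]) };
};
```

Keep the pool small — each concurrent container holds its own connections, which is exactly the fan-out problem RDS Proxy exists to multiplex away.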
The Architecture I Actually Recommend
For most applications I work with — SaaS products, API-heavy apps, Next.js frontends — the answer is a layered hybrid:
Edge Layer (Cloudflare Workers or Vercel Middleware): JWT validation, rate limiting, geo-routing, A/B test assignment, feature flags. These are stateless, latency-sensitive, globally needed. Keep them under 1ms of CPU time.
Serverless Layer (Lambda or Vercel Functions with Node.js runtime): Business logic, third-party API orchestration, background task triggers. Use this for anything that needs full Node.js or Python capabilities but handles variable/unpredictable traffic. Watch for I/O-heavy functions — break them into smaller async units.
Traditional Server Layer (Fly.io, Railway, Render, or a VPS): Long-running processes, persistent WebSocket connections, heavy computation (video processing, ML inference), database connection-heavy workloads. If you're making 1,000+ database queries per minute, a connection pool on a persistent server will beat Lambda + RDS Proxy on both cost and latency.
The terminal commands to deploy and preview a Cloudflare Worker:

```bash
# Deploy to 300+ global locations in ~30 seconds
npx wrangler deploy

# Preview locally with a simulation of the real Cloudflare runtime
npx wrangler dev
```
The Checklist
Before your next deployment decision, run through this:
- Map your traffic pattern. Steady and predictable → lean toward servers. Spiky and variable → lean toward serverless.
- Profile your function's CPU vs. I/O ratio. More than 50% I/O wait? Cost-model it carefully before going serverless.
- Check your runtime requirements. Need TCP connections, native modules, or large packages? Edge is probably off the table.
- Identify your latency-sensitive code paths. Authentication, routing, and personalization belong at the edge. Heavy computation doesn't.
- Set cost alerts before you deploy. AWS Budgets and Cost Anomaly Detection should be the first thing you configure, not an afterthought.
- Test cold start behavior with your real dependencies. A function with 45MB of dependencies has a different cold start profile than a hello-world.
- Don't mix compute models randomly. Document why each service runs where it runs. "It was easy to deploy" is not an architecture decision.
Ask The Guild
Community prompt: Where has your intuition about compute placement been wrong? Have you moved something from serverless to a traditional server (or vice versa) and been surprised by the result — cost, performance, or operational complexity? Share the before/after and what drove the decision. The best architectures in this space come from people who've felt the bill shock, traced the cold start, or watched latency spike from the wrong region. Tell us what you learned.