
API Keys, Rate Limits, and Cost Management

Understand AI API pricing, rate limits, budgets, and how to monitor and control your AI spending.

13 min read · api-keys, rate-limits, costs, pricing, budget, ai-operations

You've signed up for an AI API, gotten your key, and started making calls. Then the bill arrives and it's three times what you expected. Or worse — your key leaks, someone uses it to run thousands of calls, and you're on the hook for the cost.

AI APIs are powerful, but they come with operational realities that can surprise you. This lesson covers the practical infrastructure of working with AI APIs: how keys work, what rate limits mean, and how to keep costs from spiraling.

API Keys — Your Identity and Your Liability

An API key is a secret string that identifies you to the AI provider. Every request you make includes this key, and every request gets billed to the account that owns it.

# This key is how the provider knows it's you (and who to bill)
ANTHROPIC_API_KEY=sk-ant-api03-xxxxxxxxxxxx
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxx

Key Security Basics

Never commit keys to git. This is the most common mistake. If your key ends up in a public repository, bots will find it within minutes and start making calls on your account.

// WRONG — key hardcoded in source
const client = new Anthropic({ apiKey: "sk-ant-api03-real-key-here" });

// RIGHT — key read from the environment
const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

Use .env files locally. Store keys in .env and make sure .env is in your .gitignore.

# .env (never committed)
ANTHROPIC_API_KEY=sk-ant-api03-xxxxxxxxxxxx
 
# .gitignore
.env
.env.local
.env.*.local
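Most languages have a package for loading `.env` files (like python-dotenv), but the mechanism is simple enough to sketch with the standard library alone. This is a minimal loader, not a full `.env` parser — it skips quoting and multiline values:

```python
import os
from pathlib import Path

def load_env(path=".env"):
    """Minimal .env loader: KEY=VALUE lines; blank lines and '#' comments ignored."""
    env_file = Path(path)
    if not env_file.exists():
        return
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # setdefault: a real environment variable wins over the .env file
        os.environ.setdefault(key.strip(), value.strip())

load_env()
api_key = os.environ.get("ANTHROPIC_API_KEY")
```

In practice, prefer a maintained library (python-dotenv, dotenv for Node) — this sketch just shows there's no magic involved.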

Use a secrets manager for production. Environment variables on your laptop are fine for development. For production applications, servers, and CI/CD, use a proper secrets manager — Doppler, AWS Secrets Manager, HashiCorp Vault, or your platform's built-in secrets.

Rotate keys regularly. If you suspect a key has been exposed, revoke it immediately and generate a new one. Most providers let you have multiple active keys, so you can rotate without downtime.

Key Scoping

Most providers offer ways to scope keys:

  • Project-level keys that can only access specific resources
  • Read-only keys for applications that only need to fetch data
  • Usage-limited keys with spending caps

Use the most restrictive key that works for your use case. Your CI/CD pipeline doesn't need the same key as your development environment.

Rate Limits — The Throttle

Rate limits control how many requests you can make per minute (RPM) and how many tokens you can process per minute (TPM). They exist to prevent any single user from overwhelming the service.

How Rate Limits Work

Typical rate limits (varies by provider and tier):
- Requests per minute (RPM):   60-4,000
- Tokens per minute (TPM):     40,000-1,000,000
- Tokens per day (TPD):        Varies by tier

When you hit a rate limit, the API returns a 429 Too Many Requests error. Your application should handle this gracefully.

import time
import anthropic
 
client = anthropic.Anthropic()
 
def call_with_retry(prompt, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=1024,
                messages=[{"role": "user", "content": prompt}]
            )
        except anthropic.RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error to the caller
            time.sleep(2 ** attempt)  # exponential backoff: 1s, then 2s

Rate Limit Tiers

Providers increase your rate limits as you spend more. The tiers typically look like:

| Tier | Monthly Spend | RPM | TPM |
|------|--------------|-----|-----|
| Free | $0 | 5-20 | 20K-40K |
| Tier 1 | $5+ | 60-500 | 80K-200K |
| Tier 2 | $50+ | 1,000-2,000 | 400K-800K |
| Tier 3 | $200+ | 2,000-4,000 | 800K-2M |

If you're hitting rate limits regularly, you might need to move to a higher tier, optimize your usage, or spread calls across time.

Multi-Agent Rate Limits

Running multiple agents simultaneously multiplies your API calls. Three agents each making 20 requests per minute is 60 RPM — potentially enough to hit rate limits on lower tiers.

Plan for this when using multi-agent workflows. Either ensure your rate limit tier is high enough, or stagger agent work to avoid bursts.
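One way to stagger agent work is a shared client-side throttle, so the combined request rate stays under your tier's cap no matter how many agents are running. A minimal sketch (the `rpm` value is illustrative — use your actual tier limit):

```python
import threading
import time

class RateLimiter:
    """Client-side throttle: spaces calls at least 60/rpm seconds apart.
    Share one instance across all agents/threads to cap the combined rate."""

    def __init__(self, rpm):
        self.interval = 60.0 / rpm   # minimum seconds between calls
        self.lock = threading.Lock()
        self.next_allowed = 0.0

    def acquire(self):
        """Block until this caller is allowed to make its next request."""
        with self.lock:
            now = time.monotonic()
            wait = max(0.0, self.next_allowed - now)
            self.next_allowed = max(now, self.next_allowed) + self.interval
        if wait > 0:
            time.sleep(wait)

# One shared limiter; each agent calls limiter.acquire() before every request
limiter = RateLimiter(rpm=60)
```

This doesn't replace 429 retry handling — the server is still the authority — but it keeps bursts from triggering 429s in the first place.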

Cost Management

AI API costs follow a simple formula:

Cost = (Input tokens × Input price) + (Output tokens × Output price)

But the effective cost depends on how you use the API.
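The formula is easy to turn into a quick estimator. The prices below are placeholders — providers quote per-million-token rates that change over time, so check the current pricing page:

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_mtok, output_price_per_mtok):
    """Cost in dollars; prices are per million tokens (illustrative values)."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000

# Example: 2,000 input tokens, 500 output tokens at $3/$15 per MTok
cost = estimate_cost(2_000, 500, 3.0, 15.0)
print(f"${cost:.4f}")  # $0.0135
```

Running numbers like this before a batch job (tokens per call × calls per day) is the fastest way to spot a workload that will blow past your budget.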

Understanding Your Usage

Most providers offer usage dashboards. Check them regularly — at least weekly when you're actively developing. Look for:

  • Daily spending trends — Is it stable or growing?
  • Which models you're using — Are you using expensive models for simple tasks?
  • Token volumes — Are you sending more context than necessary?

Cost Optimization Strategies

Match the model to the task. This is the single biggest cost lever. Don't use Claude Opus for tasks that Claude Haiku handles fine.

Task: "What's the TypeScript type for a Promise that returns a string?"
Model needed: Haiku (fast, cheap)
Cost: ~$0.001
 
Task: "Refactor this 500-line file to use the repository pattern"
Model needed: Sonnet or Opus (reasoning required)
Cost: ~$0.10-$0.50
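In application code, this can become a small routing function. The keyword heuristic and the Haiku model name below are illustrative placeholders, not a real policy:

```python
# Illustrative routing: default to a cheap model, escalate when the task
# looks reasoning-heavy. The hints and model IDs are placeholders.
COMPLEX_HINTS = ("refactor", "architect", "design", "debug", "migrate")

def pick_model(task: str) -> str:
    text = task.lower()
    if any(hint in text for hint in COMPLEX_HINTS):
        return "claude-sonnet-4-20250514"   # reasoning-heavy work
    return "claude-3-5-haiku-latest"        # quick lookups, simple edits
```

Even a crude router like this captures most of the savings, because the cost gap between model tiers is often 10x or more per token.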

Minimize context. Only include files and context that are relevant to the current task. Loading your entire codebase into context for a simple question is expensive and doesn't improve the answer.

Set spending limits. Every major provider lets you set monthly spending caps. Set one. A runaway script or leaked key without a spending limit can generate an unpleasant surprise.

Anthropic Console → Settings → Billing → Set monthly limit
OpenAI Dashboard → Settings → Limits → Set monthly budget

Cache when possible. If you're making the same or similar API calls repeatedly (like in a CI pipeline), consider caching results. Anthropic offers prompt caching that reduces costs for repeated context.
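For byte-identical repeated calls, even an in-process cache keyed on the prompt avoids duplicate spend. A minimal sketch — `call_model` is a stand-in for your real API call, and this keeps results only for the life of the process:

```python
import hashlib

_cache: dict[str, str] = {}

def cached_call(prompt: str, call_model) -> str:
    """Return the cached response for an identical prompt; hit the API otherwise."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(prompt)
    return _cache[key]
```

Note this is different from provider-side prompt caching, which discounts repeated context on the server; the two can be combined.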

Use streaming wisely. Streaming (getting tokens as they're generated) is great for interactive use but adds overhead for batch processing. For automated pipelines, non-streaming calls are more efficient.

Setting a Budget

For individual developers, a practical monthly budget:

| Usage Level | Budget | What It Gets You |
|------------|--------|-----------------|
| Exploring | $20/month | 10-20 complex queries/day |
| Active development | $100/month | Heavy daily use with one agent |
| Power user | $300/month | Multi-agent workflows, CI integration |
| Team/production | $500+/month | Multiple developers, automated pipelines |

Start with a lower budget and increase as you understand your usage patterns. It's better to hit a limit and consciously raise it than to discover an unexpected $500 bill.

Try this now

  • Move every AI API key into environment variables and verify none are committed or hardcoded.
  • Set a monthly spending cap before you scale usage.
  • Separate your common tasks by model tier so you know which work deserves expensive models.

Prompt to give your agent

"Help me operationalize AI API usage for this app. I need:

  1. a secure API key handling plan for local, CI, and production
  2. a rate-limit and retry strategy for 429s
  3. a budget recommendation
  4. guidance on which model tier to use for which task
  5. a monitoring checklist so costs do not surprise me"

What you must review yourself

  • Whether keys are stored and rotated like real production secrets
  • Whether rate-limit handling exists before concurrency increases
  • Whether spending caps and dashboards are in place before usage grows
  • Whether expensive models are reserved for tasks that actually need them

Common Mistakes to Avoid

  • Committing API keys to version control. Exposure starts the moment the key lands in history.
  • Not setting spending limits. Budgets are cheaper than surprises.
  • Using the most powerful model for everything. Model-task mismatch is a waste pattern.
  • Ignoring rate limits in application code. 429s are normal enough to deserve design attention.
  • Not monitoring usage. Silent growth becomes expensive growth.

Key takeaways

  • AI APIs are part of your operational surface, not just a neat feature
  • Key handling, retry logic, and cost controls belong in the initial setup
  • Rate-limit strategy becomes important as soon as you add concurrency or automation
  • Budget discipline starts with matching model cost to task value

What's Next

You know how models work, who makes them, and how to manage the infrastructure. Next, we'll explore the three approaches to customizing AI for your specific needs: fine-tuning, RAG, and prompt engineering — and when each approach makes sense.