When AI Writes Your Infrastructure Code: A terraform destroy Story
The Claude Code / terraform destroy incident I described in the database article is worth examining more carefully, because the failure mode wasn't the AI making a mistake. The AI did exactly what it was asked. The mistake was in how the human structured the task.
"Clean up the dev environment" is a perfectly reasonable instruction for a human who knows the difference between dev and prod infrastructure. For an AI agent operating with broad infrastructure permissions, it's an instruction that can be interpreted as "destroy everything tagged as dev" — which, depending on how your tagging is set up, might include things you very much did not want destroyed.
This is the blast radius problem, and it's one of the most important concepts for anyone using AI agents to manage infrastructure.
The Blast Radius Concept
Blast radius is borrowed from incident response terminology. It's the answer to: "If this goes wrong in the worst possible way, how much damage can it do?"
When you give an AI agent credentials with broad permissions, the blast radius of any mistake — including misunderstood instructions — is very large. The agent can affect everything those credentials can reach. When you constrain the agent's permissions to the minimum necessary for the task, the blast radius shrinks dramatically.
In infrastructure terms:
- An agent with AdministratorAccess on AWS has unlimited blast radius
- An agent with AmazonS3ReadOnlyAccess on a single bucket has near-zero blast radius
- An agent with ec2:DescribeInstances plus ec2:StartInstances on tagged dev instances has bounded, predictable blast radius
The principle is identical to the principle of least privilege, which security engineers have been preaching for decades. AI agents make it more urgent because their actions can be fast, confident, and irreversible.
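The tagged-dev-instances case above can be sketched as an IAM policy. This is a minimal sketch: the file name, role name, and the env=dev tag scheme are assumptions you would adapt to your own tagging.

```shell
# Write a least-privilege policy for a dev-only agent.
# The env=dev tag condition is what bounds the blast radius.
cat > dev-agent-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["ec2:DescribeInstances"],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": ["ec2:StartInstances"],
      "Resource": "*",
      "Condition": {
        "StringEquals": { "aws:ResourceTag/env": "dev" }
      }
    }
  ]
}
EOF
# Attach it to the agent's role (run manually, after review):
# aws iam put-role-policy --role-name ai-agent-dev \
#   --policy-name dev-agent-policy --policy-document file://dev-agent-policy.json
```

Note that ec2:DescribeInstances does not support resource-level restrictions, which is why its Resource stays "*"; the tag condition on ec2:StartInstances is what keeps the agent inside dev.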
Environment Variable Mistakes AI Makes
The second category of AI infrastructure mistakes is environment variable handling. AI models generate code that works against whatever environment they assume you're running, and that assumption is usually development. When the code hits production, it either breaks loudly (if the variable is missing) or behaves dangerously (if the wrong value is used).
Three specific patterns to watch for in AI-generated code:
1. Hardcoded development values as fallbacks
// AI generates this — looks reasonable, is dangerous
const dbUrl = process.env.DATABASE_URL || 'postgres://localhost:5432/mydb_dev';
If DATABASE_URL is misconfigured in production, this silently falls back to the dev URL: connecting to nothing, or worse, to a dev database that happened to be accessible. The right pattern is to fail loudly:
const dbUrl = process.env.DATABASE_URL;
if (!dbUrl) throw new Error('DATABASE_URL is required');
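Shell deploy scripts deserve the same treatment. A sketch of the fail-loudly pattern in bash (the require_env helper and the example URL are illustrative, not from the original project):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Abort with a clear error if a required variable is unset or empty --
# never substitute a development fallback.
require_env() {
  local name="$1"
  if [ -z "${!name:-}" ]; then
    echo "error: $name is required" >&2
    return 1
  fi
}

DATABASE_URL="postgres://prod-host:5432/app"  # in real use, set by the environment
require_env DATABASE_URL && echo "DATABASE_URL ok"
```

For a one-liner, bash's built-in `: "${DATABASE_URL:?DATABASE_URL is required}"` expansion does the same job.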
2. Secrets in code, not in environment
AI assistants will sometimes inline API keys or credentials when they can't resolve a reference. Always scan AI-generated code for anything that looks like a credential before committing.
# Quick scan for common patterns
grep -rE "(api_key|apikey|secret|password|token)\s*=\s*['\"][^'\"]{8,}" .
3. Missing environment differentiation
AI often generates a single configuration that works in dev and assumes the same config works in prod. For infrastructure code, there should be explicit prod/dev separation — not inference from a flag or a variable name.
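One way to make that separation explicit is a wrapper that refuses to run unless the environment is named outright. A minimal sketch, assuming a select_env helper and an env/ directory of per-environment var files, neither of which is a standard:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Refuse to proceed unless the target environment is named explicitly --
# no inference from defaults, flags, or variable names.
select_env() {
  case "$1" in
    dev|prod) echo "$1" ;;
    *) echo "error: environment must be 'dev' or 'prod' (got '$1')" >&2; return 1 ;;
  esac
}

# Each environment then gets its own explicit var file, e.g.:
#   terraform apply -var-file="env/$(select_env "$APP_ENV").tfvars"
echo "deploying to $(select_env dev)"
```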
Doppler and 1Password: The Right Patterns
Using a secrets manager (Doppler, 1Password, AWS Secrets Manager) is not just a security practice — it's an AI supervision practice. When your secrets live in a managed store and your AI agent has no direct access to them, the agent can't accidentally expose them in generated code, logs, or error messages.
The Doppler pattern for this project:
# Dev: inject secrets from Doppler dev environment
doppler run --config dev --project my-project -- node server.js
# The agent sees env vars at runtime, never the raw secrets
# The agent cannot commit secrets to code
For production Terraform, require that all sensitive values come from a secrets manager via data sources — never from .tfvars files that could be committed:
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "prod/myapp/db_password"
}

resource "aws_db_instance" "main" {
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
  # Never: password = var.db_password (which comes from a .tfvars file)
}
AI Agents Must Not Touch Production Infrastructure Without Human Approval
The policy is simple. The implementation requires discipline.
For Terraform: never run terraform apply based on AI output without reviewing the plan yourself. The -auto-approve flag should not exist in any CI pipeline that an AI agent can trigger without a human gate.
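A minimal human-gated flow looks like this (the plan file name is illustrative):

```shell
# 1. AI (or CI) produces a plan, but never applies it
terraform plan -out=tfplan

# 2. A human reads exactly what will change
terraform show tfplan

# 3. Only after review: applying the saved plan executes exactly
#    what was reviewed, nothing else -- and no -auto-approve anywhere
terraform apply tfplan
```

The saved-plan step matters: applying a plan file guarantees the reviewed changes are the applied changes, even if the underlying code moved in the meantime.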
For cloud CLIs: scope AI agent credentials to the minimum necessary. Use IAM roles with condition keys that restrict to specific resources or regions. If the agent is working on dev, it should not be able to reach prod.
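A region restriction is one concrete way to fence a dev-scoped agent, sketched here under the assumption that dev lives in us-west-2 (substitute your own layout):

```shell
# Deny every action outside the dev region. An explicit Deny
# always wins in IAM policy evaluation, regardless of other allows.
cat > deny-outside-dev-region.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringNotEquals": { "aws:RequestedRegion": "us-west-2" }
      }
    }
  ]
}
EOF
```

In practice you would carve out global services such as IAM and STS (for example with NotAction), since their requests don't resolve to the dev region.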
For deployment scripts: treat any script that touches production the same way you'd treat a database migration — review it, stage it, have a rollback plan.
What to Do Next
- Audit your AI agent credentials. List every permission your AI tools have on your cloud accounts. For each one, ask: what's the blast radius if this agent misunderstands an instruction?
- Move all secrets to a secrets manager and ensure no AI-generated code contains hardcoded credentials.
- Require explicit human approval before any AI-generated Terraform is applied to a production environment. Make this a written rule, not an informal expectation.
The terraform destroy incident wasn't a bug. It was a supervision gap. Close the gap before it costs you.
🤖 Ghostwritten by Claude Opus 4.6 · Curated by Tom Hundley