File Upload Architecture: Presigned URLs and Processing
Architecture Patterns — Part 14 of 30
The 273,000-File Problem Nobody Meant to Create
In late August 2025, security researchers at UpGuard discovered something that became one of the year's most instructive cloud architecture failures. A publicly accessible Amazon S3 bucket was serving up 273,000 PDF documents containing live Indian banking transaction data — account numbers, transaction amounts, names, phone numbers, email addresses. Not archived data. Active transactions. About 3,000 new files were landing in the bucket every day while researchers monitored it.
The company was eventually identified as Nupay, an Indian fintech. Its explanation: a "configuration gap." But here's what actually happened at the architecture level: someone built a file processing pipeline, needed temporary write access for a service, set the bucket to public, and that setting was never corrected. The pipeline kept running. The files kept accumulating. Nobody noticed.
This is not a story about a careless developer. This is a story about the default failure mode of file upload architectures that aren't deliberately designed: public access creeps in for convenience, there's no post-upload processing gate, and a live production system quietly becomes a data exfiltration service.
The same architecture failure underlies a different category of incident. In March 2025, Yale New Haven Health had 5.5 million patient records exfiltrated — a breach that led to an $18 million class action settlement — traced to a vulnerability in a third-party file transfer service. Not a sophisticated zero-day attack. A file transfer system that lacked adequate architectural controls.
File uploads are load-bearing infrastructure. Today we build them correctly.
The Core Architecture Decision: Why Presigned URLs
When your application needs to accept file uploads, you have three broad options:
- Server-proxied uploads — client uploads to your server, your server uploads to S3
- Direct-to-S3 with temporary credentials — client uses AWS STS tokens
- Presigned URL uploads — client uploads directly to S3 using a time-limited signed URL
Option 1 is what most vibe-coded apps do by default. It's also wrong for anything beyond trivial file sizes, because every byte passes through your application server, consuming bandwidth, compute, and memory. A 100 MB video upload means 100 MB in and 100 MB out, on your server, synchronously. This doesn't scale and it's expensive.
Option 2 exposes AWS credentials (even temporary ones) to clients and is harder to scope precisely.
Option 3 — presigned URLs — is the right answer for production systems, and understanding why informs all the design decisions that follow.
A presigned URL is a time-limited, scope-limited signature that authorizes a single S3 operation. Your server generates it using your IAM credentials, but the client never sees those credentials. The URL itself encodes the authorization. When it expires, it's useless. When it's used, it only allows the exact operation (PUT to a specific key in a specific bucket) that was signed.
This is the architecture the AWS team, Stripe, Cloudflare, and every major file-handling platform has converged on. As Forward Networks documented in their August 2025 guide, presigned URLs provide security-first sharing (only necessary access, time-limited), reduced risk (no public buckets, no over-permissioned roles), and a full audit trail of who generated what.
But presigned URLs are one piece of a pipeline. The mistake most builders make is treating URL generation as the entire solution. It isn't. A presigned URL tells you a file arrived. It tells you nothing about what the file is, whether it's safe, or whether it should ever be served to anyone.
The Pipeline Architecture: Three Buckets, One Flow
Production file upload architectures don't use one S3 bucket. They use three — and the separation between them is the core security mechanism.
┌──────────┐   presigned PUT    ┌──────────────────┐
│  Client  │ ─────────────────► │ incoming-bucket  │
└──────────┘                    └──────────────────┘
                                         │
                                 S3 Event → Lambda
                                         │
                              ┌──────────┴──────────┐
                              ▼                     ▼
                      ┌──────────────┐    ┌───────────────────┐
                      │ clean-bucket │    │ quarantine-bucket │
                      └──────────────┘    └───────────────────┘
                              │                     │
                         serve files           alert + hold
This is the Incoming / Clean / Quarantine Pattern. Here's why it works:
- Incoming bucket: private, write-only from presigned URLs. No reads. No public access. This is where files land before any processing.
- Clean bucket: files that passed all validation. Private, with controlled read access. Only files that have been verified arrive here.
- Quarantine bucket: infected or invalid files. Locked down, accessible only for forensic investigation.
The critical invariant: nothing in the clean bucket has ever been directly accessible without processing. There's no code path where an unvalidated file ends up in a servable state. That invariant is what makes this architecture defensible — you can't forget to scan a file because scanning is required to move the file at all.
Generating Presigned URLs: The Right Way
URL generation seems straightforward but has several decision points that matter.
The Python Implementation
import boto3
from botocore.exceptions import ClientError
import uuid
import os

s3_client = boto3.client(
    's3',
    region_name=os.environ['AWS_REGION']
)

def generate_upload_url(
    user_id: str,
    filename: str,
    content_type: str,
    file_size_bytes: int,
    max_size_bytes: int = 10 * 1024 * 1024,  # 10 MB default
    expiry_seconds: int = 300  # 5 minutes
) -> dict:
    """
    Generate a presigned PUT URL for direct S3 upload.
    The client declares its exact file size up front; that length is
    signed into the URL, so a body of any other size is rejected.
    Returns the URL and the object key for tracking.
    """
    # Enforce the size cap here — a presigned PUT can only sign an
    # exact Content-Length, not a maximum
    if file_size_bytes <= 0 or file_size_bytes > max_size_bytes:
        raise ValueError(f"File size out of range: {file_size_bytes}")
    # Sanitize filename — never trust user input for S3 keys
    safe_filename = os.path.basename(filename).replace(' ', '_')
    # Namespace by user to prevent key collisions and enable per-user cleanup
    object_key = f"uploads/{user_id}/{uuid.uuid4()}/{safe_filename}"
    # Validate content type against allowlist
    allowed_types = {
        'image/jpeg', 'image/png', 'image/gif', 'image/webp',
        'application/pdf',
        'text/plain', 'text/csv',
    }
    if content_type not in allowed_types:
        raise ValueError(f"Content type not allowed: {content_type}")
    try:
        presigned_url = s3_client.generate_presigned_url(
            'put_object',
            Params={
                'Bucket': os.environ['INCOMING_BUCKET'],
                'Key': object_key,
                'ContentType': content_type,
                # Sign the declared content length — an oversized body
                # fails the signature check
                'ContentLength': file_size_bytes,
            },
            ExpiresIn=expiry_seconds
        )
        return {
            'upload_url': presigned_url,
            'object_key': object_key,
            'expires_in': expiry_seconds,
        }
    except ClientError as e:
        raise RuntimeError(f"Failed to generate presigned URL: {e}")
Several decisions embedded in this code deserve explanation:
The 5-minute expiry is intentional. Long-lived presigned URLs are effectively public access with extra steps. If a URL is leaked in logs, cached in a proxy, or accidentally shared, a 5-minute window limits the blast radius. For large file uploads that take longer, generate a fresh URL — don't extend the expiry.
The UUID prefix is non-negotiable. Without it, a user who knows the naming pattern can overwrite another user's file by predicting the key. The UUID makes every key unique and unpredictable.
Content type is validated before URL generation, not after. Presigned URLs can be locked to a specific Content-Type header value. If the client sends a different content type header than what was signed, S3 rejects the upload. This is not a complete security check — a client can upload any bytes while claiming the allowed content type — but it eliminates the most obvious abuse.
ContentLength signing caps upload size. A presigned PUT can only lock an exact Content-Length, not a range, so the pattern is: the client declares its file size, the server validates that size against the cap, and the server signs that exact value. A 4 GB body then fails the signature check on your "accepts images" endpoint. (If you need a true min/max range instead of an exact length, use presigned POST with a content-length-range policy condition.)
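On the client side, consuming the URL is a plain HTTP PUT — no AWS SDK required. A stdlib-only sketch; the function names here are ours, and the URL is assumed to come from an endpoint like the one above:

```python
import urllib.request
from pathlib import Path

def build_upload_headers(content_type: str, size: int) -> dict:
    """Headers must match what was signed — a different Content-Type or
    Content-Length makes S3 reject the request with a 403."""
    return {"Content-Type": content_type, "Content-Length": str(size)}

def upload_via_presigned_url(upload_url: str, file_path: str,
                             content_type: str) -> int:
    """PUT the file's raw bytes straight to S3; returns the HTTP status."""
    data = Path(file_path).read_bytes()
    req = urllib.request.Request(
        upload_url,
        data=data,
        method="PUT",
        headers=build_upload_headers(content_type, len(data)),
    )
    with urllib.request.urlopen(req) as resp:  # raises on non-2xx
        return resp.status
```

In a browser the equivalent is a `fetch(uploadUrl, { method: "PUT", body: file })` with the same headers; either way, the bytes never transit your application server.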
The TypeScript/Node.js Version
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";
import { randomUUID } from "crypto";
import path from "path";

const s3 = new S3Client({ region: process.env.AWS_REGION! });

const ALLOWED_CONTENT_TYPES = new Set([
  'image/jpeg',
  'image/png',
  'image/gif',
  'image/webp',
  'application/pdf',
]);

export async function generateUploadUrl(
  userId: string,
  filename: string,
  contentType: string,
  expiresIn = 300 // 5 minutes
): Promise<{ uploadUrl: string; objectKey: string }> {
  if (!ALLOWED_CONTENT_TYPES.has(contentType)) {
    throw new Error(`Content type not permitted: ${contentType}`);
  }

  const safeFilename = path.basename(filename).replace(/[^a-zA-Z0-9._-]/g, '_');
  const objectKey = `uploads/${userId}/${randomUUID()}/${safeFilename}`;

  const command = new PutObjectCommand({
    Bucket: process.env.INCOMING_BUCKET!,
    Key: objectKey,
    ContentType: contentType,
  });

  const uploadUrl = await getSignedUrl(s3, command, { expiresIn });
  return { uploadUrl, objectKey };
}
The Processing Lambda: Validate, Scan, Route
When a file lands in the incoming bucket, an S3 event notification fires a Lambda function. This function is the gatekeeper — nothing reaches the clean bucket without passing through it.
Why Content-Type Headers Aren't Enough
This is the question that came up prominently in a December 2025 AWS community thread on presigned URL security: can you just enforce content type on the presigned URL and call it done?
The answer is definitively no. A content type header is metadata. It's what the client says the file is. Nothing prevents a client from uploading a PHP script with a Content-Type: image/jpeg header. The bytes land in your bucket labeled as a JPEG. Your application downloads and processes it thinking it's an image. Depending on what your processing does, you've just executed a PHP file.
The correct approach layers three checks:
- Magic byte validation — check the actual file signature, not the declared type
- MIME type re-detection — use python-magic or similar to detect the true type from content
- Malware scanning — check the actual content for known threats
import boto3
import magic  # python-magic library
import hashlib
import os
import tempfile
import urllib.parse
from typing import Optional

s3 = boto3.client('s3')

# Magic byte signatures for allowed types
MAGIC_SIGNATURES = {
    b'\xff\xd8\xff': 'image/jpeg',
    b'\x89PNG\r\n\x1a\n': 'image/png',
    b'GIF87a': 'image/gif',
    b'GIF89a': 'image/gif',
    b'RIFF': 'image/webp',  # partial — WebP starts with RIFF....WEBP
    b'%PDF': 'application/pdf',
}

def validate_file_type(file_bytes: bytes, declared_type: str) -> Optional[str]:
    """
    Returns the detected MIME type if valid, raises ValueError if suspicious.
    """
    # Check magic bytes
    for signature, mime_type in MAGIC_SIGNATURES.items():
        if file_bytes[:len(signature)] == signature:
            if mime_type != declared_type:
                raise ValueError(
                    f"File signature mismatch: declared {declared_type}, "
                    f"actual {mime_type}"
                )
            return mime_type
    # Fallback: python-magic full detection
    detected = magic.from_buffer(file_bytes[:2048], mime=True)
    if detected != declared_type:
        raise ValueError(
            f"MIME detection mismatch: declared {declared_type}, detected {detected}"
        )
    return detected

def lambda_handler(event, context):
    record = event['Records'][0]
    bucket = record['s3']['bucket']['name']
    # S3 event keys are URL-encoded (a space arrives as '+') — decode first
    key = urllib.parse.unquote_plus(record['s3']['object']['key'])
    clean_bucket = os.environ['CLEAN_BUCKET']
    quarantine_bucket = os.environ['QUARANTINE_BUCKET']

    # Download to /tmp (Lambda ephemeral storage)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        s3.download_fileobj(bucket, key, tmp)
        tmp_path = tmp.name

    try:
        with open(tmp_path, 'rb') as f:
            file_bytes = f.read()

        # Step 1: Size check (defense in depth against ContentLength bypass attempts)
        max_bytes = int(os.environ.get('MAX_FILE_BYTES', 10 * 1024 * 1024))
        if len(file_bytes) > max_bytes:
            _quarantine(s3, bucket, key, quarantine_bucket, reason='size_exceeded')
            return

        # Step 2: Magic byte / MIME validation
        # Extract declared type from object metadata
        head = s3.head_object(Bucket=bucket, Key=key)
        declared_type = head['ContentType']
        try:
            validate_file_type(file_bytes, declared_type)
        except ValueError as e:
            # Tag values have a restricted character set — keep the
            # reason short and log the detail instead
            print(f"Validation failed for {key}: {e}")
            _quarantine(s3, bucket, key, quarantine_bucket, reason='type_mismatch')
            return

        # Step 3: Compute SHA-256 for deduplication / audit trail
        file_hash = hashlib.sha256(file_bytes).hexdigest()

        # Step 4: Move to clean bucket with scan metadata tag
        # Note: GuardDuty Malware Protection for S3 can scan asynchronously
        # and tag objects automatically — enable it at the bucket level.
        # This Lambda handles sync validation; GuardDuty handles deep scanning.
        s3.copy_object(
            Bucket=clean_bucket,
            Key=key,
            CopySource={'Bucket': bucket, 'Key': key},
            Tagging=(
                f"validation-status=passed"
                f"&sha256={file_hash}"
                f"&declared-type={declared_type}"
            ),
            TaggingDirective='REPLACE'
        )
        # Clean up incoming object
        s3.delete_object(Bucket=bucket, Key=key)
        print(f"File validated and moved to clean bucket: {key}")
    finally:
        os.unlink(tmp_path)

def _quarantine(s3_client, source_bucket: str, key: str,
                quarantine_bucket: str, reason: str):
    """Move rejected file to quarantine with reason tag."""
    s3_client.copy_object(
        Bucket=quarantine_bucket,
        Key=key,
        CopySource={'Bucket': source_bucket, 'Key': key},
        Tagging=f"quarantine-reason={reason}",
        TaggingDirective='REPLACE'
    )
    s3_client.delete_object(Bucket=source_bucket, Key=key)
    print(f"File quarantined: {key} — reason: {reason}")
Integrating GuardDuty Malware Protection for S3
In September 2025, AWS expanded GuardDuty Malware Protection for S3 to support files up to 100 GB (up from 5 GB) and archives up to 10,000 files. This is now production-viable for most real-world workloads.
The integration model is event-driven: GuardDuty monitors your incoming bucket, scans objects on upload, and publishes results to EventBridge. You then route those events to SQS → Lambda for remediation. This keeps the hot path (presigned URL generation, user-facing response) fully synchronous while malware scanning happens asynchronously.
# Enable GuardDuty Malware Protection for a specific bucket
# (requires GuardDuty to already be enabled in the account)
aws guardduty create-malware-protection-plan \
  --role arn:aws:iam::ACCOUNT_ID:role/GuardDutyMalwareProtectionRole \
  --protected-resource 'S3Bucket={BucketName=incoming-upload-bucket,ObjectPrefixes=[uploads/]}' \
  --actions 'Tagging={Status=ENABLED}'

# Create EventBridge rule to catch malware findings
aws events put-rule \
  --name "guardduty-malware-findings" \
  --event-pattern '{
    "source": ["aws.guardduty"],
    "detail-type": ["GuardDuty Malware Protection Object Scan Result"],
    "detail": {
      "scanResultDetails": {
        "scanResultStatus": ["THREATS_FOUND"]
      }
    }
  }'
The Lambda that handles GuardDuty findings is simpler than your validation Lambda — the hard work is already done:
import os
import boto3

def handle_guardduty_finding(event, context):
    detail = event['detail']
    bucket = detail['s3ObjectDetails']['bucketName']
    key = detail['s3ObjectDetails']['objectKey']
    threats = detail['scanResultDetails']['threats']
    threat_names = [t['name'] for t in threats]

    # Move to quarantine
    quarantine_bucket = os.environ['QUARANTINE_BUCKET']
    s3 = boto3.client('s3')
    s3.copy_object(
        Bucket=quarantine_bucket,
        Key=key,
        CopySource={'Bucket': bucket, 'Key': key},
        # Tag values can't contain commas — record the count, log the names
        Tagging=f"quarantine-reason=malware&threat-count={len(threat_names)}",
        TaggingDirective='REPLACE'
    )
    s3.delete_object(Bucket=bucket, Key=key)
    print(f"Threats found in {key}: {threat_names}")

    # Alert via SNS
    boto3.client('sns').publish(
        TopicArn=os.environ['SECURITY_ALERTS_TOPIC'],
        Subject=f"Malware detected: {key}",
        Message=f"Threats found: {threat_names}\nBucket: {bucket}\nKey: {key}"
    )
IAM: The Permissions Architecture
The three-bucket architecture only works if the IAM permissions enforce the invariants. The role that generates presigned URLs needs nothing beyond PutObject on the incoming prefix. Note that there is no IAM condition key for S3 object size — the size cap lives in the signed Content-Length and in the processing Lambda's own check, not in policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowPresignedUrlGeneration",
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": "arn:aws:s3:::incoming-upload-bucket/uploads/*"
    }
  ]
}
The bucket policy for the incoming bucket must block reads for everyone except the processing Lambda. Be careful with the mechanics: an explicit Deny with Principal "*" overrides every Allow, including the Lambda's, so the carve-out has to live inside the Deny's condition:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyReadsExceptProcessingLambda",
      "Effect": "Deny",
      "Principal": "*",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::incoming-upload-bucket/*",
      "Condition": {
        "ArnNotEquals": {
          "aws:PrincipalArn": "arn:aws:iam::ACCOUNT_ID:role/FileProcessingLambdaRole"
        }
      }
    }
  ]
}
The Lambda role's own identity policy grants its s3:GetObject and s3:DeleteObject. (There is no separate s3:HeadObject action — HEAD requests are authorized by s3:GetObject.)
The clean bucket policy should require that objects have been tagged with validation-status=passed before GetObject is allowed:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RequireValidationTag",
      "Effect": "Deny",
      "Principal": "*",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::clean-upload-bucket/*",
      "Condition": {
        "StringNotEquals": {
          "s3:ExistingObjectTag/validation-status": "passed"
        }
      }
    }
  ]
}
This policy means that even if an object somehow ends up in the clean bucket without the tag — through a bug in your Lambda, a direct S3 CLI copy, anything — it cannot be served. The policy is the safety net that survives code bugs.
The Multipart Upload Decision
For files above ~100 MB, single presigned PUT uploads become unreliable. Network interruptions fail the entire upload. The correct architecture switches to multipart uploads — but the implementation is more complex.
As Bright Inventions documented in May 2025, multipart uploads with presigned URLs require three backend endpoints: initialize upload (returns an UploadId), generate presigned URLs per part, and complete the upload (assembles parts into the final object).
// Backend: Initialize multipart upload
// (uses CreateMultipartUploadCommand, UploadPartCommand, and
// CompleteMultipartUploadCommand from "@aws-sdk/client-s3")
export async function initMultipartUpload(
  userId: string,
  filename: string,
  contentType: string
): Promise<{ uploadId: string; key: string }> {
  const key = `uploads/${userId}/${randomUUID()}/${path.basename(filename)}`;
  const { UploadId } = await s3.send(new CreateMultipartUploadCommand({
    Bucket: process.env.INCOMING_BUCKET!,
    Key: key,
    ContentType: contentType,
  }));
  return { uploadId: UploadId!, key };
}

// Backend: Generate presigned URLs for each part
export async function generatePartUrls(
  key: string,
  uploadId: string,
  partCount: number
): Promise<string[]> {
  // Each part URL expires in 15 minutes — plenty for one chunk
  return Promise.all(
    Array.from({ length: partCount }, (_, i) =>
      getSignedUrl(s3, new UploadPartCommand({
        Bucket: process.env.INCOMING_BUCKET!,
        Key: key,
        UploadId: uploadId,
        PartNumber: i + 1,
      }), { expiresIn: 900 })
    )
  );
}

// Backend: Complete multipart upload
export async function completeMultipartUpload(
  key: string,
  uploadId: string,
  parts: Array<{ PartNumber: number; ETag: string }>
): Promise<void> {
  await s3.send(new CompleteMultipartUploadCommand({
    Bucket: process.env.INCOMING_BUCKET!,
    Key: key,
    UploadId: uploadId,
    MultipartUpload: { Parts: parts },
  }));
}
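The client-side counterpart to those three endpoints can be sketched in a few lines of Python: chunk the file, PUT each chunk to its part URL, and collect the ETags that CompleteMultipartUpload needs. The part size and helper names here are our choices, not prescribed by S3:

```python
import math
import urllib.request

PART_SIZE = 8 * 1024 * 1024  # 8 MB chunks; S3's minimum part size is 5 MB

def part_count(total_size: int, part_size: int = PART_SIZE) -> int:
    """Number of parts needed for a file of total_size bytes (at least 1)."""
    return max(1, math.ceil(total_size / part_size))

def upload_parts(part_urls: list[str], data: bytes,
                 part_size: int = PART_SIZE) -> list[dict]:
    """PUT each chunk to its presigned URL; collect ETags for completion."""
    parts = []
    for i, url in enumerate(part_urls):
        chunk = data[i * part_size:(i + 1) * part_size]
        req = urllib.request.Request(url, data=chunk, method="PUT")
        with urllib.request.urlopen(req) as resp:
            # S3 returns each part's ETag in a response header; it must be
            # echoed back verbatim in the CompleteMultipartUpload call
            parts.append({"PartNumber": i + 1, "ETag": resp.headers["ETag"]})
    return parts
```

The `{PartNumber, ETag}` list this produces maps directly onto the `parts` argument of the completion endpoint above — lose an ETag and the upload cannot be assembled.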
Don't forget lifecycle cleanup. Incomplete multipart uploads persist in S3 and accrue storage costs indefinitely unless you configure a lifecycle policy:
aws s3api put-bucket-lifecycle-configuration \
  --bucket incoming-upload-bucket \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "abort-incomplete-multipart",
      "Status": "Enabled",
      "Filter": {"Prefix": "uploads/"},
      "AbortIncompleteMultipartUpload": {
        "DaysAfterInitiation": 1
      }
    }]
  }'
This single policy eliminates a class of surprise bills that shows up in AWS cost anomaly reports months after launch.
The Architecture Decision Framework
When you're designing a file upload system, work through these questions:
1. What's the maximum file size?
- Under 100 MB → single presigned PUT, 5-minute expiry
- 100 MB to 5 GB → multipart presigned upload, add lifecycle cleanup
- Over 5 GB → multipart required; also evaluate whether Lambda is the right processing runtime or if ECS containers are needed
2. What are the files used for?
- User-generated content served to others → malware scanning is mandatory
- Internal documents, analytics → validation still required, scanning depends on threat model
- Files that trigger processing (imports, data uploads) → extra scrutiny on content; path traversal in filenames is a real attack vector
3. What's the latency tolerance for post-upload availability?
- Immediate availability required → synchronous validation in Lambda + GuardDuty async scan with a quarantine fallback
- Slight delay acceptable → fully async pipeline; respond with "processing" status and notify when ready
- Background processing fine → decouple entirely via SQS queue
4. What's the compliance environment?
- HIPAA, PCI, SOC 2 → S3 object versioning required, server-side encryption required, access logs required
- GDPR → know exactly where files land, region constraints, right-to-deletion path must be designed in
- No specific compliance → still use SSE-S3 or SSE-KMS; it's table stakes in 2026
5. Do you need to support resumable uploads?
- Yes → multipart is the only viable option; build a state management layer to track UploadId and completed parts across sessions
- No → single presigned PUT, simpler
What Actually Goes Wrong in Production
After running file upload pipelines at scale, here are the failure modes that aren't covered in tutorials:
Presigned URL expiry during large file uploads. A user starts a 4 GB upload on a slow connection. The 15-minute presigned URL expires while the upload is 70% complete. Now what? Build a URL refresh endpoint that regenerates presigned URLs for in-progress multipart uploads, keyed by UploadId.
CORS configuration blocking uploads from the correct domain. Your presigned URL works perfectly in Postman. It fails silently in the browser. Add the CORS configuration to the incoming bucket before launch:
aws s3api put-bucket-cors \
  --bucket incoming-upload-bucket \
  --cors-configuration '{
    "CORSRules": [{
      "AllowedHeaders": ["Content-Type", "Content-Length", "x-amz-content-sha256"],
      "AllowedMethods": ["PUT"],
      "AllowedOrigins": ["https://yourdomain.com"],
      "MaxAgeSeconds": 3000
    }]
  }'
Lambda /tmp storage exhaustion. Lambda functions have 512 MB of ephemeral /tmp storage by default (configurable to 10 GB). A 400 MB file processed concurrently across 10 Lambda invocations doesn't create 10 × 400 MB of state — each invocation has its own /tmp. But if your function isn't cleaning up temp files after processing, storage accumulates within a warm Lambda container and eventually throws ENOSPC errors. Always os.unlink() in a finally block.
The zombie multipart problem. A user starts an upload, the network drops, they never complete it. No lifecycle policy means that partial upload sits in S3 forever, invisible in the console, accumulating storage costs. One e-commerce platform I worked with discovered 340 GB of incomplete multipart uploads from a feature that had been deprecated 18 months earlier. Set the lifecycle policy before you accept your first upload.
Double-processing from S3 event retries. S3 event notifications retry on Lambda failure. If your Lambda fails midway through processing — after copying to clean but before deleting from incoming — you'll get a second invocation trying to process a key that no longer exists. Make your Lambda idempotent: check whether the destination key already exists before copying.
Checklist
- Use presigned PUTs, never proxy uploads through your application server. Every byte through your app is bandwidth and compute you're paying for unnecessarily.
- Implement the three-bucket pattern. Incoming (write-only), clean (validated files only), quarantine (rejected files). Never serve directly from incoming.
- Set content type allowlist at URL generation. Reject unsupported types before generating the URL, not after the upload completes.
- Validate magic bytes, not just declared content type. A signature check reads only the first few bytes of the file. Skip it and you're trusting clients.
- Enable GuardDuty Malware Protection for S3 on your incoming bucket. Since September 2025 it handles files up to 100 GB and 10,000-file archives.
- Set 5-minute URL expiry for standard uploads. Longer-lived URLs are security debt.
- Set AbortIncompleteMultipartUpload lifecycle policy on the incoming bucket. 1 day is reasonable. Do this before accepting your first upload.
- Configure CORS on the incoming bucket with explicit allowed origins. Test in a browser, not just via curl or Postman.
- Use IAM bucket policies that require the validation-status=passed tag for GetObject on the clean bucket. Policy-level enforcement survives code bugs.
- Never trust the filename from the user. Always sanitize, strip path separators, and prepend a UUID. Path traversal via filenames is a real attack.
- Make your processing Lambda idempotent. S3 event notifications can retry. Check for existing destination objects before copying.
- Enable S3 versioning and access logging on all three buckets for compliance and forensics.
- For GDPR/right-to-deletion: make sure your file key schema lets you find and delete all objects belonging to a user (uploads/{user_id}/... is the pattern).
- Test your expiry handling. What does the user see when a presigned URL expires mid-upload? It should not be a silent failure.
Ask The Guild
Here's an architectural tension that plays out differently depending on your product:
The synchronous vs. asynchronous processing decision. If your validation Lambda runs synchronously after upload — the client polls for result, gets "processing" → "ready" — you have a clean user experience but you've coupled upload completion to scan completion. If GuardDuty takes 30 seconds on a large file, the user waits 30 seconds before they can use their upload.
The async alternative: immediately return "upload received" and notify the user when processing completes (email, webhook, websocket event). Users get faster feedback that their upload arrived, but they can't immediately use it. For a profile photo, sync is fine. For a 500 MB data import, async is the only viable approach.
The real question is what happens when scanning fails. If your processing Lambda throws an error, does the file stay in incoming forever? Or does it get quarantined? Or does it get a "pending" tag and retry? And who is responsible for noticing that a file has been stuck in "pending" for 48 hours?
Share in the Guild Discord: How do you handle the failure case in your upload pipelines? Have you been surprised by S3 storage costs from orphaned files or incomplete multiparts? And what's your approach to sync vs. async processing — does it match your product's actual latency requirements, or was it the path of least resistance?