Structured Logging: Making Your Logs Searchable
Production Ready — Part 4 of 30
It was 2:47 AM on a Tuesday. I was on a bridge call with six engineers staring at a wall of logs that looked like this:
[INFO] 2025-09-14 02:47:12 - Processing request
[ERROR] 2025-09-14 02:47:13 - Something went wrong
[INFO] 2025-09-14 02:47:13 - Done
[ERROR] 2025-09-14 02:47:14 - Something went wrong
Payments were failing for a subset of users. Revenue was bleeding. The on-call engineer kept repeating the phrase I dread most: "I'm just grepping through logs."
We spent 90 minutes in that call before someone thought to check a different service. Turned out the issue was a card processor returning a new error code we'd never seen. That information was in the logs — buried in a formatted string, invisible to any search tool, impossible to aggregate.
If those logs had been structured, we'd have found the payment_processor_code: DECLINED_NEW_REASON field in under two minutes.
That night cost the company roughly $40,000 in lost transactions. Structured logging would have cost about four hours of developer time to implement.
What Structured Logging Actually Is
Plain-text logging is what most of us start with. It's natural — you're a human writing messages for humans to read:
[2025-09-14 14:23:01] ERROR Payment failed for user 12345, amount $99.99, reason: card declined
The problem is that machines can't reliably parse this. If you want to find all failures for users with amounts over $50, you're writing a regex. If the format changes slightly next sprint, your regex breaks. If you're running 50 microservices, every team has a slightly different format, and now your grep scripts are a mess of special cases.
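To make that concrete, here is roughly the regex you'd end up writing to parse the line above (a hypothetical sketch; your format will differ):

```python
import re

# A brittle regex matching the plain-text line above. Every field needs
# its own capture group, and any wording change breaks the whole thing.
LINE_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] ERROR Payment failed for user (?P<user_id>\d+), "
    r"amount \$(?P<amount>[\d.]+), reason: (?P<reason>.+)"
)

line = ("[2025-09-14 14:23:01] ERROR Payment failed for user 12345, "
        "amount $99.99, reason: card declined")
match = LINE_RE.match(line)
# Works today. But if next sprint the message becomes "Payment declined",
# the regex silently matches nothing and your alerting goes dark.
```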
Structured logging means your log entries are data, not sentences. Each field is a named key with a typed value:
{
  "timestamp": "2025-09-14T14:23:01.412Z",
  "level": "error",
  "service": "payment-service",
  "version": "1.4.2",
  "message": "Transaction failed",
  "user_id": 12345,
  "payment": {
    "amount": 99.99,
    "currency": "USD",
    "processor_code": "CARD_DECLINED",
    "reason": "insufficient_funds"
  },
  "request_id": "req_8f3b2e1a",
  "duration_ms": 234
}
Now you can query: "Show me all payment failures in the last hour where payment.amount > 50 and payment.processor_code is not CARD_DECLINED." That query runs in two seconds instead of ninety minutes.
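That query is trivial once every entry is JSON. A minimal Python sketch of the same filter over newline-delimited log lines (field names taken from the example entry above):

```python
import json

def payment_failures_over(lines, min_amount=50.0):
    """Yield error entries where payment.amount exceeds min_amount and the
    processor code is something other than the routine CARD_DECLINED."""
    for line in lines:
        entry = json.loads(line)
        payment = entry.get("payment", {})
        if (entry.get("level") == "error"
                and payment.get("amount", 0) > min_amount
                and payment.get("processor_code") != "CARD_DECLINED"):
            yield entry
```

A log aggregator runs the equivalent of this over an index instead of a scan, which is where the speedup comes from.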
The Real Cost of Unstructured Logs
A 2025 case study on DEV Community documented a Next.js application that switched from console.log() strings to structured JSON. The before/after numbers are striking:
| Metric | Before | After |
|---|---|---|
| Monthly observability cost | $10,400 | $3,800 |
| Query response time | 30–60 seconds | 2–5 seconds |
| Data ingest volume | 924 GB/month | 340 GB/month |
| Alert false positives | baseline | reduced 40% |
| Log retention window | 15 days | 30 days |
That's a 63% cost reduction — not from switching vendors, not from optimizing infrastructure, just from making logs machine-readable so they could be intelligently sampled and filtered.
The key insight: unstructured logs force you to keep everything because you can't easily tell what's noise. Structured logs let you keep 100% of errors and security events, sample routine API requests at 10%, and drop debug logs in production entirely. Same visibility, a fraction of the volume.
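As a sketch of what that policy looks like in code: a stdlib `logging.Filter` that keeps everything at WARNING and above, samples INFO, and drops DEBUG. The thresholds are illustrative; a real pipeline might also whitelist security events.

```python
import logging
import random

class SamplingFilter(logging.Filter):
    """Keep 100% of WARNING and above, sample INFO, drop DEBUG."""

    def __init__(self, info_sample_rate: float = 0.10):
        super().__init__()
        self.info_sample_rate = info_sample_rate

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno >= logging.WARNING:
            return True  # errors and warnings are always kept
        if record.levelno == logging.INFO:
            return random.random() < self.info_sample_rate
        return False  # DEBUG never reaches production sinks
```

Attach it to a handler with `handler.addFilter(SamplingFilter())`. Sampling at the edge like this is cruder than what a vendor pipeline can do, but it is free and it happens before you pay for ingest.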
Setting Up Structured Logging in Python
Python's built-in logging module can output JSON with the right handler. The most common library is python-json-logger.
Security note: In early 2025, `python-json-logger` had CVE-2025-27607, a dependency confusion vulnerability affecting dev installs on Python 3.13. It's patched in version 3.3.0+. Always pin your dependencies, and run `pip install --upgrade 'python-json-logger>=3.3.0'` (the quotes keep your shell from treating `>=` as a redirect).
import logging
import sys

from pythonjsonlogger import jsonlogger

class ContextFilter(logging.Filter):
    """Attach base context fields to every record passing through."""

    def __init__(self, context: dict):
        super().__init__()
        self.context = context

    def filter(self, record: logging.LogRecord) -> bool:
        for key, value in self.context.items():
            setattr(record, key, value)
        return True

def setup_logging(service_name: str, version: str, environment: str):
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    handler = logging.StreamHandler(sys.stdout)
    formatter = jsonlogger.JsonFormatter(
        fmt='%(asctime)s %(levelname)s %(name)s %(message)s',
        datefmt='%Y-%m-%dT%H:%M:%S',
        rename_fields={'levelname': 'level', 'asctime': 'timestamp'}
    )
    handler.setFormatter(formatter)

    # Bind base context to every log entry. A filter (rather than a bare
    # LoggerAdapter) keeps per-call `extra` fields intact.
    handler.addFilter(ContextFilter({
        'service': service_name,
        'version': version,
        'environment': environment,
    }))
    logger.addHandler(handler)
    return logger
logger = setup_logging('payment-service', '1.4.2', 'production')
# Usage — pass structured fields via the `extra` dict
logger.info('Transaction completed', extra={
    'event': 'payment_success',
    'user_id': 12345,
    'amount': 99.99,
    'currency': 'USD',
    'duration_ms': 234,
    'request_id': 'req_8f3b2e1a'
})

logger.error('Transaction failed', extra={
    'event': 'payment_failure',
    'user_id': 12345,
    'amount': 99.99,
    'processor_code': 'CARD_DECLINED',
    'request_id': 'req_8f3b2e1a'
})
Output (one line per event, machine-readable):
{"timestamp": "2025-09-14T14:23:01", "level": "INFO", "name": "root", "message": "Transaction completed", "service": "payment-service", "version": "1.4.2", "environment": "production", "event": "payment_success", "user_id": 12345, "amount": 99.99, "currency": "USD", "duration_ms": 234, "request_id": "req_8f3b2e1a"}
Setting Up Structured Logging in Node.js / TypeScript
For Node.js, Pino is the go-to library — it's fast, outputs JSON by default, and has a great ecosystem.
import pino from 'pino';

const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
  base: {
    service: 'payment-service',
    version: process.env.npm_package_version,
    environment: process.env.NODE_ENV,
  },
  timestamp: pino.stdTimeFunctions.isoTime,
  // Emit the level as a label ("error") instead of Pino's numeric
  // default (50), so queries like `.level == "error"` work everywhere
  formatters: {
    level: (label) => ({ level: label }),
  },
  // In production, don't pretty-print — raw JSON for log aggregators
  transport: process.env.NODE_ENV === 'development'
    ? { target: 'pino-pretty' }
    : undefined,
});

// Create child loggers with shared context for a request
export function createRequestLogger(requestId: string, userId?: number) {
  return logger.child({ request_id: requestId, user_id: userId });
}

// Usage in an Express route
app.post('/payments', async (req, res) => {
  const log = createRequestLogger(req.headers['x-request-id'] as string, req.user?.id);
  log.info({ event: 'payment_initiated', amount: req.body.amount, currency: req.body.currency });
  try {
    const result = await processPayment(req.body);
    log.info({ event: 'payment_success', transaction_id: result.id, duration_ms: result.durationMs });
    res.json(result);
  } catch (err: any) {
    log.error({
      event: 'payment_failure',
      error_type: err.constructor.name,
      processor_code: err.processorCode,
      err, // Pino serializes Error objects correctly
    }, 'Transaction failed');
    res.status(500).json({ error: 'Payment failed' });
  }
});
Critical detail: Pino serializes Error objects properly — message, stack trace, and any custom properties all appear as structured fields. With console.error(err), a stack trace becomes multi-line text that most log aggregators split into separate events, losing the connection to the original request.
The Three Fields You Cannot Skip
Every log entry needs these, no exceptions:
1. request_id (or trace_id)
This is what lets you follow a single request across 10 microservices. Generate it at the entry point (API gateway, load balancer, first service) and thread it through every downstream call via HTTP headers.
// Middleware: attach request ID to every log in this request's lifecycle
app.use((req, res, next) => {
  req.requestId = req.headers['x-request-id'] || crypto.randomUUID();
  res.setHeader('x-request-id', req.requestId);
  req.log = logger.child({ request_id: req.requestId });
  next();
});
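The middleware above is Express-specific; in Python the same request-scoped binding is usually done with `contextvars`. A rough sketch, with names that are illustrative rather than from any particular framework:

```python
import contextvars
import logging

# Holds the current request's ID for the duration of that request,
# correctly isolated across threads and async tasks
request_id_var = contextvars.ContextVar("request_id", default=None)

class RequestIdFilter(logging.Filter):
    """Copy the contextvar onto every record so a JSON formatter emits it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True

def outgoing_headers() -> dict:
    """Headers to attach to downstream HTTP calls, propagating the ID."""
    rid = request_id_var.get()
    return {"x-request-id": rid} if rid else {}
```

Set the var in your request middleware (`request_id_var.set(incoming_or_new_id)`), add the filter to your handler, and every log line plus every downstream call carries the same ID.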
2. event (a machine-readable event name)
Don't just write message: "User logged in". Add event: "user_login_success". This is the field you build dashboards and alerts on, and it never changes even when the message text is reworded.
3. duration_ms
Log how long things take. Always. A payment that normally takes 200ms taking 4000ms is a signal — even if it succeeds. You'll only notice the trend if you can aggregate duration_ms across thousands of requests.
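One low-effort way to make this automatic in Python is a context manager that logs `duration_ms` on exit, whether the block succeeds or raises. A sketch (the logger name is illustrative):

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("payment-service")

@contextmanager
def timed(event: str, **fields):
    """Log `event` with a duration_ms field when the block exits,
    on success or failure alike."""
    start = time.monotonic()
    try:
        yield
    finally:
        elapsed_ms = round((time.monotonic() - start) * 1000, 1)
        logger.info(event, extra={"event": event, "duration_ms": elapsed_ms, **fields})

# Usage:
# with timed("payment_processed", user_id=12345):
#     process_payment(order)
```

Wrap every external call (DB, payment processor, queue) and the trend data accumulates for free.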
Verifying Your Setup
Before you ship, test that your logs are actually valid JSON:
# Start your app and pipe logs through jq
npm start 2>&1 | jq .

# If jq parses it cleanly, you're good. If you see parse errors,
# you have multi-line output or something that isn't JSON.

# Spot-check for the fields you care about
npm start 2>&1 | jq 'select(.level == "error") | {event, user_id, error_type}'

# For Python: --json-lines treats each line as its own JSON document
python app.py 2>&1 | python3 -m json.tool --json-lines
If jq returns errors, you almost certainly have a multi-line stack trace leaking through. Fix it before deploying — a broken log format is worse than no log format because it silently corrupts your aggregation pipeline.
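On the Python side, the usual fix for a leaking multi-line traceback is to pass the exception through `exc_info` instead of printing it: the traceback then travels with the record as structured data, which a JSON formatter like python-json-logger serializes as a single escaped string field. A sketch:

```python
import logging

log = logging.getLogger("payment-service")

def safe_divide(a: float, b: float):
    try:
        return a / b
    except ZeroDivisionError:
        # exc_info=True attaches the traceback to the record itself;
        # a JSON formatter emits it as one escaped field, not raw lines
        log.error("Division failed", exc_info=True,
                  extra={"event": "calc_failure"})
        return None
```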
What NOT to Log
Structured logging makes it dangerously easy to log too much. JSON is flexible — it's tempting to serialize entire request bodies or user objects.
Don't.
- Never log passwords, tokens, or API keys — obvious, but easy to miss in a hurried debug session
- Never log full credit card numbers (GDPR and PCI DSS will ruin your day)
- Redact PII at the source, not downstream — a pipeline filter is a second line of defense, not the first
- Don't log entire request/response bodies unless you have a specific, scoped reason — that's where the leaks live
# Bad — logs the full user object, which might contain PII
logger.info('User updated', extra={'user': user.__dict__})

# Good — log only the fields you need
logger.info('User updated', extra={
    'event': 'user_profile_updated',
    'user_id': user.id,
    'fields_changed': list(changed_fields.keys())  # field names, not values
})
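If you do want that second line of defense at the logging boundary, a small recursive redaction helper is cheap insurance. The key list here is illustrative; extend it for your domain:

```python
# Keys whose values must never reach a log sink (illustrative list)
SENSITIVE_KEYS = {"password", "token", "api_key", "card_number", "ssn"}

def redact(payload: dict) -> dict:
    """Return a copy of payload with sensitive values replaced,
    recursing into nested dicts."""
    out = {}
    for key, value in payload.items():
        if key.lower() in SENSITIVE_KEYS:
            out[key] = "[REDACTED]"
        elif isinstance(value, dict):
            out[key] = redact(value)
        else:
            out[key] = value
    return out
```

Call it on anything you pass to `extra` that originated outside your code. It is a backstop, not a substitute for logging only what you need.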
Log Levels: Use Them Deliberately
Structured logs with wrong levels are still a mess. Here's the map I use:
| Level | When to use | Production volume |
|---|---|---|
| `DEBUG` | Variable values, step tracing | Off by default |
| `INFO` | Normal business events (login, purchase, export) | Sampled (10–30%) |
| `WARN` | Expected problems, graceful degradation | 100% |
| `ERROR` | Failures requiring attention | 100% |
| `FATAL` / `CRITICAL` | Service is going down | 100% |
One point current observability guidance keeps stressing: log levels must be configurable at runtime, without a redeploy. When an incident happens, you need to temporarily crank up verbosity on one service without touching the others. If that requires a deployment, you've added 10+ minutes to every incident response.
The Structured Logging Checklist
Before you consider a service production-ready, verify every item:
- All log output is valid JSON (test with `jq .`)
- Every entry has: `timestamp` (ISO 8601, UTC), `level`, `service`, `version`, `environment`
- Every request-scoped entry has a `request_id` or `trace_id`
- Business events use a consistent `event` field with machine-readable names
- `duration_ms` is logged for all external calls (DB, APIs, queues)
- Errors include `error_type` and the full stack trace as a single escaped string (not multi-line)
- No passwords, tokens, or full PII in any log entry
- Log levels are configurable at runtime without redeployment
- `DEBUG` logs are off by default in production
- Multi-line stack traces are serialized as single escaped strings
- You've run the output through `jq` and confirmed it parses cleanly
- `python-json-logger` is pinned to >= 3.3.0 if used in Python
Ask The Guild
Community prompt: What's the worst debugging experience you've had because of poor logging? Share your story — the chaos of the 3 AM incident, what the logs said (or didn't say), and how you'd write them differently now. Bonus points if you have a before/after of a log line that changed how you debug. Drop it in the comments and let's build a hall of shame (and glory) for log formats.
Tom Hundley is a software architect with 25 years of experience. He has been on more 3 AM bridge calls than he'd like to count, and believes that good observability is an act of kindness toward your future on-call self.