
Logging & Observability

"Where are my logs?" - the most common question when debugging containers. Docker handles logging differently than traditional deployments. This article covers logging drivers, structured logging, metrics, health checks, and building observable containerized applications.

📋 At a Glance

Aspect | Details
Topic | Logging drivers, structured logging, health checks, metrics
Complexity | Intermediate
Prerequisites | Basic Docker usage, Part 1 (Container Internals)
Key Insight | Containers should log to stdout/stderr; Docker handles the rest
Time to Master | 2-3 hours

🎯 What You'll Learn

  • Logging drivers - json-file, syslog, fluentd, and when to use each
  • Structured logging - JSON logs for machine parsing
  • Health checks - liveness vs readiness, and how to implement them correctly
  • Metrics collection - Prometheus patterns for containers
  • Debugging without logs - when logs aren't enough

🔥 Production Story: The Silent Failure

An application ran fine for weeks, then started dropping requests. No errors in logs. The team added more replicas, but problems persisted.

Investigation:

Everything looked fine. But checking deeper:

Root cause: The health check only verified that the HTTP server was responding. The database connection pool was exhausted: the app could still answer /health but could not process actual requests.
The fix:
Lesson: Health checks must verify actual functionality, not just "is the process running."
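The following is a minimal Python sketch of a health check in the spirit of that fix. Instead of confirming only that the HTTP server answers, it borrows a connection from the pool and uses it. `ConnectionPool` is a hypothetical stand-in for a real driver's pool, and `ping` stands in for a trivial query such as `SELECT 1`. In the story, logic like this would sit behind the endpoint that HEALTHCHECK calls.

```python
import queue
from typing import Callable


class ConnectionPool:
    """Hypothetical fixed-size pool; a real app would use its DB driver's pool."""

    def __init__(self, connections: list) -> None:
        self._idle: queue.Queue = queue.Queue()
        for conn in connections:
            self._idle.put(conn)

    def acquire(self, timeout: float):
        # Raises queue.Empty if every connection is already checked out.
        return self._idle.get(timeout=timeout)

    def release(self, conn) -> None:
        self._idle.put(conn)


def deep_health_check(pool: ConnectionPool,
                      ping: Callable[[object], None],
                      timeout: float = 0.5) -> bool:
    """Return True only if a connection can be borrowed AND used."""
    try:
        conn = pool.acquire(timeout=timeout)
    except queue.Empty:
        return False  # pool exhausted: the failure mode from the story
    try:
        ping(conn)  # hypothetical stand-in for running "SELECT 1"
        return True
    except Exception:
        return False
    finally:
        pool.release(conn)  # always hand the connection back
```

The borrow uses a short timeout, so an exhausted pool makes the check fail quickly instead of hanging until Docker's `--timeout` expires.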

🧠 Mental Model: Container Observability Stack

┌────────────────────────────────────────────────────────────────────────┐
│                        CONTAINER OBSERVABILITY                         │
│                                                                        │
│  ┌──────────────────────────────────────────────────────────────────┐  │
│  │                           APPLICATION                            │  │
│  │                                                                  │  │
│  │  Expose /metrics endpoint ─────────────────────────────────┐     │  │
│  │  Implement /health endpoints ──────────────────────────┐   │     │  │
│  │  Log to stdout/stderr ─────────────────────────────┐   │   │     │  │
│  └────────────────────────────────────────────────────┼───┼───┼─────┘  │
│                                                       │   │   │        │
│  ┌────────────────────────────────────────────────────┼───┼───┼─────┐  │
│  │                DOCKER DAEMON                       │   │   │     │  │
│  │                                                    │   │   │     │  │
│  │  Logging Driver ◄──────────────────────────────────┘   │   │     │  │
│  │  (json-file, fluentd, etc.)                            │   │     │  │
│  │    │                                                   │   │     │  │
│  │    ├── json-file ──► /var/lib/docker/containers/*/     │   │     │  │
│  │    ├── syslog ─────► syslog server                     │   │     │  │
│  │    ├── fluentd ────► Fluentd/Fluent Bit                │   │     │  │
│  │    └── awslogs ────► CloudWatch                        │   │     │  │
│  │                                                        │   │     │  │
│  │  Health Check ◄────────────────────────────────────────┘   │     │  │
│  │  (HEALTHCHECK instruction)                                 │     │  │
│  │    │                                                       │     │  │
│  │    └── Updates container status                            │     │  │
│  │                                                            │     │  │
│  └────────────────────────────────────────────────────────────┼─────┘  │
│                                                               │        │
│  ┌────────────────────────────────────────────────────────────┼─────┐  │
│  │                MONITORING SYSTEM                           │     │  │
│  │                                                            │     │  │
│  │  Prometheus ◄──────────────────────────────────────────────┘     │  │
│  │  (scrapes /metrics)                                              │  │
│  │    │                                                             │  │
│  │    └── Grafana (visualization)                                   │  │
│  │                                                                  │  │
│  │  Log Aggregator (ELK, Loki, etc.)                                │  │
│  │    │                                                             │  │
│  │    └── Receives logs from driver                                 │  │
│  │                                                                  │  │
│  └──────────────────────────────────────────────────────────────────┘  │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘

🔬 Deep Dive

The Twelve-Factor App Logging Principle

Containers follow the twelve-factor app methodology for logs:

"Treat logs as event streams. A twelve-factor app never concerns itself with routing or storage of its output stream."

Translation: Write to stdout/stderr. Let the platform (Docker) handle the rest.
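The same principle can be shown in Python using only the standard `logging` module. In this minimal sketch, records below WARNING go to stdout and records at WARNING and above go to stderr. Docker captures both streams, and the app never opens a log file. Splitting by level is one common convention, not a requirement; sending everything to stdout is equally valid.

```python
import logging
import sys


class MaxLevelFilter(logging.Filter):
    """Pass only records strictly below a given level."""

    def __init__(self, max_level: int) -> None:
        super().__init__()
        self.max_level = max_level

    def filter(self, record: logging.LogRecord) -> bool:
        return record.levelno < self.max_level


def configure_stream_logging(level: int = logging.INFO) -> logging.Logger:
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")

    out = logging.StreamHandler(sys.stdout)  # DEBUG/INFO -> stdout
    out.addFilter(MaxLevelFilter(logging.WARNING))
    out.setFormatter(fmt)

    err = logging.StreamHandler(sys.stderr)  # WARNING and above -> stderr
    err.setLevel(logging.WARNING)
    err.setFormatter(fmt)

    root = logging.getLogger()
    root.handlers.clear()  # drop any file handlers configured elsewhere
    root.setLevel(level)
    root.addHandler(out)
    root.addHandler(err)
    return root
```

`StreamHandler` flushes after every record. Plain `print()` output, however, is block-buffered when stdout is not a TTY, so containers commonly set `PYTHONUNBUFFERED=1` (or run `python -u`) to make such lines reach `docker logs` promptly.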

Logging Drivers

Docker captures each container's stdout/stderr and sends it to a logging driver:

Available drivers:
Driver | Description | Use Case
json-file | JSON files on disk (default) | Development, simple deployments
syslog | System syslog | Traditional infrastructure
journald | systemd journal | Linux with systemd
fluentd | Fluentd collector | Centralized logging via Fluentd/Fluent Bit
awslogs | AWS CloudWatch Logs | AWS deployments
gcplogs | Google Cloud Logging | GCP deployments
none | Disable logging | When logs are handled elsewhere
Configure the default driver (in /etc/docker/daemon.json on Linux):

JSON-File Driver (Default)

Most common for development and simple deployments:

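The json-file driver writes one JSON object per line to a file named `<container-id>-json.log`. Each object has `log`, `stream`, and `time` keys, and `log` keeps the trailing newline the application wrote. The Python sketch below reads that format directly, which can help when you only have the file and not a running daemon. The sample lines in the usage are illustrative, not captured output.

```python
import json
from typing import Iterable, Iterator, NamedTuple, Optional


class LogEntry(NamedTuple):
    time: str     # RFC 3339 timestamp (nanosecond precision), kept as a string
    stream: str   # "stdout" or "stderr"
    message: str  # the line the application wrote, trailing newline removed


def read_json_file_log(lines: Iterable[str],
                       stream: Optional[str] = None) -> Iterator[LogEntry]:
    """Parse json-file driver lines, optionally keeping a single stream."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines
        record = json.loads(raw)
        if stream is not None and record.get("stream") != stream:
            continue
        yield LogEntry(record["time"], record["stream"],
                       record["log"].rstrip("\n"))
```

Usage against a real file would look like `for e in read_json_file_log(open(path), stream="stderr"): print(e.time, e.message)`. Reading files under `/var/lib/docker/containers/` usually requires root.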
Configure log rotation:
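With rotation enabled, the json-file driver keeps at most `max-file` files of up to `max-size` each. The per-container ceiling is therefore roughly their product. The small calculator below makes that budget explicit. It assumes `max-size` is written with a plain k/m/g suffix and that those suffixes are 1024-based; treat the exact multiplier as an assumption. It also ignores compression.

```python
import re

# Assumed binary multipliers for the k/m/g suffixes used in max-size.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(value: str) -> int:
    """Parse a max-size value such as '10m' into bytes."""
    match = re.fullmatch(r"(\d+)([kmg]?)", value.strip().lower())
    if not match:
        raise ValueError(f"unsupported size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def worst_case_log_bytes(max_size: str, max_file: str = "1") -> int:
    """Upper bound on json-file log usage for one container."""
    files = int(max_file)
    if files < 1:
        raise ValueError("max-file must be at least 1")
    return parse_size(max_size) * files


def fleet_budget_mib(max_size: str, max_file: str, containers: int) -> float:
    """Disk budget in MiB for a host running `containers` containers."""
    return worst_case_log_bytes(max_size, max_file) * containers / 1024 ** 2
```

For example, `max-size: "10m"` with `max-file: "5"` gives about 50 MiB per container. Twenty containers at `10m` × 3 need about 600 MiB.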

Structured Logging

JSON logs are machine-parseable:

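Here is the same idea using Python's standard library: a `logging.Formatter` subclass that emits one JSON object per line to stdout. The field names (`timestamp`, `level`, `logger`, `message`) and the use of `extra={...}` to attach context are choices made for this sketch, not a standard schema.

```python
import json
import logging
import sys
from datetime import datetime, timezone

# Attributes every LogRecord carries; anything else arrived via `extra=`.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", None, None))) | {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy context passed as logger.info("...", extra={...}).
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS and key not in payload:
                payload[key] = value
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, default=str)  # one line per record


def get_json_logger(name: str) -> logging.Logger:
    handler = logging.StreamHandler(sys.stdout)  # stdout, so Docker captures it
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers[:] = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as `get_json_logger("orders").info("order placed", extra={"order_id": 42})` produces a single JSON line. Loki or Elasticsearch can filter that line on `order_id` directly.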
Benefits of structured logging:
  • Machine-parseable (Elasticsearch, Loki, etc.)
  • Filterable by fields
  • Aggregatable for metrics
  • Consistent format

Health Checks

Health checks tell Docker (and orchestrators that honor the HEALTHCHECK instruction, such as Docker Swarm) whether a container is working:

Health check options:
Option | Default | Description
--interval | 30s | Time between checks
--timeout | 30s | Max time for a check to complete
--start-period | 0s | Initialization grace period
--retries | 3 | Consecutive failures before unhealthy
Container health states:
  • starting - Initial state; failures during the start period are not counted
  • healthy - The most recent check passed
  • unhealthy - The last N consecutive checks failed (N = --retries)
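To make the options concrete, here is a simplified Python model of how the status evolves, based on Docker's documented behavior. One passing probe marks the container healthy, and `retries` consecutive failures mark it unhealthy. Failures inside `start-period` are ignored while the container is still `starting`. The model ignores probe timeouts and scheduling, so it is an illustration rather than the daemon's exact logic.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class HealthModel:
    retries: int = 3
    start_period: float = 0.0  # seconds
    status: str = "starting"
    failing_streak: int = 0

    def record(self, elapsed: float, passed: bool) -> str:
        """Apply one probe result taken `elapsed` seconds after container start."""
        if passed:
            self.failing_streak = 0
            self.status = "healthy"
        elif self.status == "starting" and elapsed < self.start_period:
            pass  # grace period: this failure does not count toward retries
        else:
            self.failing_streak += 1
            if self.failing_streak >= self.retries:
                self.status = "unhealthy"
        return self.status


def simulate(probes: Iterable[Tuple[float, bool]], **options) -> List[str]:
    """Return the status after each (elapsed_seconds, passed) probe."""
    model = HealthModel(**options)
    return [model.record(t, ok) for t, ok in probes]
```

For a slow starter, `simulate([(10, False), (40, False), (70, True)], start_period=60)` stays `starting` through the early failures and then becomes `healthy`. Without the start period, those same early failures would already count toward `retries`.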
Comprehensive health check pattern:
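The following is a framework-free Python sketch of the liveness/readiness split. It assumes each dependency check is a zero-argument callable that raises on failure; the `db` and `cache` checks in the test are hypothetical. Liveness deliberately checks nothing external. Readiness reports every dependency and returns 503 if any of them fails.

```python
import time
from typing import Callable, Dict, Tuple

Check = Callable[[], None]  # assumed contract: raises when the dependency is unhealthy


def liveness() -> Tuple[int, Dict]:
    # The process is running and can answer; no dependency checks here.
    return 200, {"status": "alive"}


def readiness(checks: Dict[str, Check]) -> Tuple[int, Dict]:
    results: Dict[str, Dict] = {}
    for name, check in checks.items():
        started = time.monotonic()
        try:
            check()
            results[name] = {"status": "ok"}
        except Exception as exc:  # report the failure, don't crash the endpoint
            results[name] = {"status": "error", "error": str(exc)}
        results[name]["duration_ms"] = round((time.monotonic() - started) * 1000, 1)
    ready = all(r["status"] == "ok" for r in results.values())
    body = {"status": "ready" if ready else "not_ready", "checks": results}
    return (200 if ready else 503), body
```

Wiring these functions to `/health/live` and `/health/ready` is left to the service's HTTP framework. If a dependency call can hang, wrap it with its own timeout; this sketch does not do that.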

Metrics with Prometheus

Standard pattern for container metrics:

Prometheus scrape config:
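Client libraries normally generate the `/metrics` output for you, but knowing the text exposition format helps when a scrape looks wrong. This stdlib-only Python sketch renders a labeled counter and a histogram in that format. The metric names `http_requests_total` and `http_request_duration_seconds` follow Prometheus naming conventions but are chosen here for illustration. A real service would use the official `prometheus_client` library instead.

```python
import bisect
from collections import Counter
from typing import List, Tuple


class Metrics:
    """Minimal in-memory counter + histogram rendered as Prometheus text."""

    def __init__(self, buckets: Tuple[float, ...] = (0.05, 0.1, 0.25, 0.5, 1.0)) -> None:
        self.buckets = sorted(buckets)
        self.requests: Counter = Counter()            # (method, status) -> count
        self.bucket_counts = [0] * len(self.buckets)  # per-bucket, not cumulative
        self.duration_sum = 0.0
        self.duration_count = 0

    def observe(self, method: str, status: int, seconds: float) -> None:
        self.requests[(method, str(status))] += 1
        idx = bisect.bisect_left(self.buckets, seconds)  # first bucket with le >= seconds
        if idx < len(self.buckets):
            self.bucket_counts[idx] += 1  # larger values land only in +Inf
        self.duration_sum += seconds
        self.duration_count += 1

    def render(self) -> str:
        lines: List[str] = [
            "# HELP http_requests_total Total HTTP requests.",
            "# TYPE http_requests_total counter",
        ]
        for (method, status), n in sorted(self.requests.items()):
            lines.append(f'http_requests_total{{method="{method}",status="{status}"}} {n}')
        lines += [
            "# HELP http_request_duration_seconds Request duration in seconds.",
            "# TYPE http_request_duration_seconds histogram",
        ]
        cumulative = 0
        for bound, n in zip(self.buckets, self.bucket_counts):
            cumulative += n  # Prometheus histogram buckets are cumulative
            lines.append(f'http_request_duration_seconds_bucket{{le="{bound}"}} {cumulative}')
        lines.append(f'http_request_duration_seconds_bucket{{le="+Inf"}} {self.duration_count}')
        lines.append(f"http_request_duration_seconds_sum {self.duration_sum}")
        lines.append(f"http_request_duration_seconds_count {self.duration_count}")
        return "\n".join(lines) + "\n"
```

Served at `/metrics` with content type `text/plain; version=0.0.4`, this output is what a Prometheus scrape job reads.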

Log Aggregation Patterns

Pattern 1: Sidecar container:
Pattern 2: Logging driver to aggregator:
Pattern 3: stdout to Loki:

⚠️ Common Mistakes

Mistake 1: Logging to Files Instead of stdout


Mistake 2: No Log Rotation


Mistake 3: Shallow Health Checks


🐛 Debug This: The Disappearing Logs

A developer reports: "My container logs show nothing! But I know it's writing logs!"

Why doesn't docker logs show anything?

✅ Solution:
The application is logging to a file instead of stdout/stderr. Docker only captures stdout/stderr streams.
Fixes:
1. Configure application to log to stdout:
2. Redirect file to stdout in Dockerfile:
3. Use tail in entrypoint:
4. Change application logging config:
The correct fix is usually option 1 or 4: configure the application properly. Options 2 and 3 are workarounds.
12-factor principle: Applications should never manage log files. Write to stdout, let the platform handle routing and storage.
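For fix 4 in a Python service, the change is usually a single handler swap in the logging configuration. This sketch uses `logging.config.dictConfig`. The logger name `app` and the format string are placeholders. The comment shows the kind of `FileHandler` entry that leaves `docker logs` empty.

```python
import logging
import logging.config

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "plain": {"format": "%(asctime)s %(levelname)s %(name)s %(message)s"},
    },
    "handlers": {
        # Before: {"class": "logging.FileHandler", "filename": "/var/log/app.log"}
        # wrote inside the container's filesystem, invisible to `docker logs`.
        "console": {
            "class": "logging.StreamHandler",
            "stream": "ext://sys.stdout",  # the stream Docker captures
            "formatter": "plain",
        },
    },
    "loggers": {
        # "app" is a placeholder logger name for this sketch.
        "app": {"handlers": ["console"], "level": "INFO", "propagate": False},
    },
}

logging.config.dictConfig(LOGGING)
logging.getLogger("app").info("now visible in docker logs")
```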

💻 Exercises

Exercise 1: Configure Log Rotation

⭐ Difficulty: Easy | ⏱️ Time: 15 minutes


Exercise 2: Implement Health Check

⭐⭐ Difficulty: Medium | ⏱️ Time: 20 minutes


Exercise 3: Structured Logging

⭐⭐ Difficulty: Medium | ⏱️ Time: 20 minutes


Exercise 4: Prometheus Metrics

⭐⭐⭐ Difficulty: Hard | ⏱️ Time: 30 minutes

Create a complete metrics setup:


Tasks:

  1. Create a simple app that exposes /metrics endpoint
  2. Include request count and duration metrics
  3. Set up Prometheus to scrape the metrics
  4. Create a simple Grafana dashboard

Exercise 5: Complete Observability Stack

⭐⭐⭐⭐ Difficulty: Expert | ⏱️ Time: 45 minutes

Build a complete observability setup:

  1. Application with structured JSON logging
  2. Health checks (liveness + readiness)
  3. Prometheus metrics
  4. Log aggregation with Loki
  5. Grafana dashboards for both metrics and logs

🎤 Senior-Level Interview Questions

Q1: Why should containers log to stdout instead of files?

Strong Answer:

"This follows the twelve-factor app methodology and has practical benefits:

1. Separation of concerns:
  • Application produces logs
  • Platform handles routing, storage, rotation
  • App doesn't need to know about log infrastructure
2. Portability:
  • Same container works with any logging backend
  • json-file for dev, fluentd for prod, CloudWatch for AWS
  • No code changes needed
3. No log file management:
  • No rotation logic in app
  • No disk full issues from unrotated logs
  • No volume mounts for log persistence
4. Container lifecycle:
  • Logs available via docker logs immediately
  • Survive container restarts (with json-file driver)
  • Aggregatable across containers
5. Debugging:
  • docker logs just works
  • No need to exec into container
  • Consistent interface across all containers
The exception is when you need file-based logging for specific tools. In that case, symlink the file to /dev/stdout: ln -sf /dev/stdout /var/log/app.log"

Q2: Explain the difference between liveness and readiness health checks.

Strong Answer:

"They serve different purposes in container orchestration:

Liveness:
  • Question: 'Is the process alive and not deadlocked?'
  • Failure action: Restart the container
  • Should be: Fast, minimal dependencies
  • Example: Can the process respond to a simple request?
GET /health/live
Response: 200 OK (process is alive)
Readiness:
  • Question: 'Can this instance serve traffic?'
  • Failure action: Remove from load balancer, don't restart
  • Should check: Dependencies (DB, cache, upstream services)
  • Example: Are all required connections healthy?
GET /health/ready
Response: 200 if DB + cache + upstream OK
         503 if any dependency is down
Why separate them:

Scenario: Database goes down temporarily.

  • Readiness fails: Container removed from LB, no traffic
  • Liveness passes: Container stays running
  • When DB recovers: Readiness passes, traffic resumes
  • No unnecessary container restarts

If we only had liveness including DB check:

  • DB down → liveness fails → container restarts
  • Restarting doesn't fix DB
  • Container keeps crash-looping
  • Worse than just waiting for DB
In Docker:
In Kubernetes:
  • JSON format for machine parsing
  • Include service name in every log
  • Include trace/correlation ID
2. Distributed tracing:
  • Propagate trace ID across service calls
  • Use OpenTelemetry or similar
  • Enables request flow visualization
3. Centralized aggregation:
Container → Logging driver → Collector → Storage   → Query UI
            (fluentd)        (Fluentd)   (ES/Loki)   (Kibana/Grafana)
4. Log levels strategically:
  • ERROR: Requires attention
  • WARN: Concerning but handled
  • INFO: Business events
  • DEBUG: Only in dev/troubleshooting
5. Include context:
6. Alerting on patterns:
  • Alert on error rate increase
  • Alert on specific error types
  • Don't alert on every error

The goal is: from any error, I can trace the entire request flow across all services."
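The answer above calls for a trace/correlation ID in every log line. One way to get that in Python without passing the ID through every function call is a `contextvars` variable plus a `logging.Filter` that stamps each record. The variable `trace_id_var` and the function `handle_request` are hypothetical. In practice the ID would come from an incoming header or from OpenTelemetry's context rather than `uuid4()`.

```python
import contextvars
import logging
import uuid
from typing import Optional

# Hypothetical per-request trace ID; "-" when no request is active.
trace_id_var = contextvars.ContextVar("trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Copy the current trace ID onto every record passing through a handler."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = trace_id_var.get()
        return True


def handle_request(incoming_trace_id: Optional[str], logger: logging.Logger) -> str:
    # Reuse the caller's ID so one request can be followed across services.
    token = trace_id_var.set(incoming_trace_id or uuid.uuid4().hex)
    try:
        logger.info("processing order")
        return trace_id_var.get()
    finally:
        trace_id_var.reset(token)  # don't leak the ID into the next request
```

The filter is attached to the handler rather than a logger, so records from every child logger pick up the ID. A format string such as `"%(trace_id)s %(message)s"`, or a JSON formatter, then includes it in each line.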

Q4: How do you configure Docker logging to prevent disk space issues?

Strong Answer:

"This is a common production issue. The solution involves multiple layers:

1. Configure log rotation (container level):

Each container: max 50MB logs (5 × 10MB)

2. Set daemon defaults (host level):

Applies to all containers without explicit config.

3. Monitor disk usage:
4. Consider alternative drivers:
  • Production: fluentd/syslog to external storage
  • Development: json-file with rotation
5. Regular cleanup:
6. Log level management:
  • Production: INFO and above
  • Debug logs only when troubleshooting

The key is proactive configuration. Default json-file with no limits will eventually fill any disk."
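For step 3 (monitoring disk usage), a small Python check can total the json-file logs per container. It assumes the default data root `/var/lib/docker` and the `<container-id>/<container-id>-json.log` layout, including rotated files with numeric or `.gz` suffixes. Reading that directory normally requires root. Plain `du` over the same path is a simpler alternative.

```python
from pathlib import Path
from typing import Dict

# Default data root; differs if the daemon uses a custom data-root (assumption).
DOCKER_CONTAINERS = Path("/var/lib/docker/containers")


def json_log_usage(root: Path = DOCKER_CONTAINERS) -> Dict[str, int]:
    """Return bytes used by json-file logs, keyed by container ID."""
    usage: Dict[str, int] = {}
    if not root.is_dir():
        return usage
    for container_dir in root.iterdir():
        if not container_dir.is_dir():
            continue
        # Matches the active file and rotated ones (-json.log.1, -json.log.2.gz, ...).
        total = sum(f.stat().st_size for f in container_dir.glob("*-json.log*"))
        if total:
            usage[container_dir.name] = total
    return usage


def report(threshold_bytes: int = 100 * 1024 ** 2, root: Path = DOCKER_CONTAINERS) -> None:
    """Print usage per container, largest first, flagging anything over the threshold."""
    for cid, size in sorted(json_log_usage(root).items(), key=lambda kv: -kv[1]):
        flag = "  <-- over threshold" if size > threshold_bytes else ""
        print(f"{cid[:12]}  {size / 1024 ** 2:8.1f} MiB{flag}")
```

Any container near or above the threshold is a sign that it was started without `max-size`/`max-file`.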

Q5: How would you implement metrics collection for Docker containers?

Strong Answer:

"I use the Prometheus pull model as the standard approach:

Application metrics:
Container metrics (cAdvisor):

Exposes CPU, memory, network, disk metrics per container.

Prometheus configuration:
Key metrics to collect:
  • Request rate, errors, duration (RED)
  • Resource usage (CPU, memory, network)
  • Business metrics (orders, users, etc.)
  • Dependency health (DB connections, queue depth)
Visualization: Grafana dashboards with:
  • Per-service panels
  • Container resource usage
  • Error rates and latencies
  • Alerting thresholds
For Kubernetes: Use kube-state-metrics and node-exporter in addition to cAdvisor."
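The RED metrics listed above reduce to simple arithmetic over request samples. The Python sketch below computes them for one time window from an in-memory list, purely to pin down the definitions. In production, Prometheus derives the same numbers from counters and histograms (for example with PromQL's `rate()` and `histogram_quantile()`). The `Request` record is a hypothetical shape, and counting only 5xx responses as errors is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    status: int        # HTTP status code
    duration_s: float  # request latency in seconds


def red_summary(requests: List[Request], window_s: float) -> Dict[str, float]:
    """Rate (req/s), error ratio (share of 5xx), and p95 duration for one window."""
    if window_s <= 0:
        raise ValueError("window must be positive")
    total = len(requests)
    errors = sum(1 for r in requests if r.status >= 500)
    durations = sorted(r.duration_s for r in requests)
    # Nearest-rank p95: rank = ceil(0.95 * total), computed in integers to avoid float error.
    rank = (95 * total + 99) // 100
    p95 = durations[rank - 1] if durations else 0.0
    return {
        "rate_per_s": total / window_s,
        "error_ratio": errors / total if total else 0.0,
        "p95_duration_s": p95,
    }
```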

📝 Summary & Key Takeaways

Logging Best Practices

Practice | Implementation
Log to stdout | App writes to console, Docker captures
Structured format | JSON for machine parsing
Include context | trace_id, service, request details
Configure rotation | max-size and max-file options
Centralize logs | Fluentd, Loki, or cloud service

Health Check Types

Type | Checks | On Failure
Liveness | Process alive? | Restart container
Readiness | Can serve traffic? | Remove from LB
Startup | Initialized? | Delay other checks

Observability Stack

Application
├── Logs → stdout → Logging driver → Aggregator
├── Metrics → /metrics → Prometheus → Grafana
└── Health → /health/* → Docker/K8s → Orchestrator

📋 Quick Reference

Logging Commands


Health Check Dockerfile


Compose Logging


📅 Review Schedule

Day | Task | Time
Day 1 | Review logging driver options | 10 min
Day 3 | Configure log rotation in a project | 15 min
Day 7 | Implement health check | 20 min
Day 14 | Set up structured logging | 25 min
Day 30 | Deploy complete monitoring stack | 45 min

📚 Series Navigation

Previous | Current | Next
Part 10: Volumes & Storage | Part 11: Logging & Observability | Part 12: Container Security
Docker Compendium Series: