Build Process Deep Dive

You run docker build and wait. Sometimes it's fast, sometimes painfully slow. Sometimes cache works, sometimes it doesn't. This article explains exactly what happens during a build - from parsing your Dockerfile to producing the final image - so you can optimize builds with precision.

📋 At a Glance

Aspect         | Details
Topic          | Build process, BuildKit, build context, cache mechanics
Complexity     | Advanced
Prerequisites  | Part 2 (Image Anatomy), Dockerfile basics
Key Insight    | Understanding build context and cache is 80% of optimization
Time to Master | 3-4 hours

🎯 What You'll Learn

  • Build context - what gets sent to Docker daemon and why it matters
  • BuildKit internals - the modern build engine and its advantages
  • Cache mechanics - exactly when cache hits or misses
  • Build stages - how multi-stage builds work internally
  • Parallel building - how BuildKit parallelizes where possible

🔥 Production Story: The 30-Minute Build

A team's CI builds took 30 minutes. They blamed slow runners, added more resources, saw no improvement. The real problem was simpler.

Investigation:
[Bash code block; not preserved in the source]

That line appeared at the start of every build. They were sending 2.4GB of files to the Docker daemon before the build even started.
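With the legacy builder, the line in question is typically the `Sending build context to Docker daemon` progress line; BuildKit reports `transferring context` instead. As a rough sketch, the size can also be estimated before building. The helper name `context_size_kb` and the demo directory are illustrative assumptions, and the estimate ignores any .dockerignore filtering.

```shell
# Estimate how much data a build would send as context: the total size of
# the directory in KB, before any .dockerignore filtering is applied.
context_size_kb() {
  du -sk "${1:-.}" | cut -f1
}

# Demo on a throwaway directory (a stand-in for a real project).
demo_dir=$(mktemp -d)
head -c 1048576 /dev/zero > "$demo_dir/fixture.bin"   # 1 MiB placeholder file
echo "context is roughly $(context_size_kb "$demo_dir") KB"
```

Running `context_size_kb .` in a project root before and after adding a .dockerignore will not show a difference, because `du` does not read that file; it is only a quick upper bound.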

Root cause:
  • No .dockerignore file
  • Build directory included:
    • node_modules/ (800MB)
    • .git/ (1.2GB of history)
    • Test fixtures (400MB of images/videos)
  • Context transfer: 2-3 minutes
  • Context parsing: 1-2 minutes
  • Actual build: 5 minutes
The fix:
[Bash code block, 7 lines; not preserved in the source]

Build time: 30 minutes → 6 minutes. The build itself was always 5 minutes - they were wasting 25 minutes on context.
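A minimal sketch of such a fix, assuming the three directories named in the root-cause list were the main offenders. The fixture path `test/fixtures` is a hypothetical name, and the file is written to a scratch directory rather than a real project.

```shell
# Write a .dockerignore that excludes the large directories from the story.
project_dir=$(mktemp -d)   # stand-in for the real project root
cat > "$project_dir/.dockerignore" <<'EOF'
# Dependencies are reinstalled inside the image
node_modules
# Version-control history is not needed at build time
.git
# Large binary fixtures (hypothetical path)
test/fixtures
EOF
cat "$project_dir/.dockerignore"
```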


🧠 Mental Model: Build Pipeline

┌─────────────────────────────────────────────────────────────────┐
│                    docker build -t myapp .                      │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│ STAGE 1: CONTEXT PREPARATION                                    │
│                                                                 │
│  Current directory (.)                                          │
│  ├── Dockerfile          ✓ Always included                      │
│  ├── src/               ✓ Included unless in .dockerignore      │
│  ├── node_modules/      ✗ Excluded by .dockerignore             │
│  ├── .git/              ✗ Excluded by .dockerignore             │
│  └── test/              ✓ Included (maybe shouldn't be?)        │
│                                                                 │
│  → Creates tarball                                              │
│  → Sends to Docker daemon (or BuildKit)                         │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│ STAGE 2: DOCKERFILE PARSING                                     │
│                                                                 │
│  Parse Dockerfile → Abstract Syntax Tree                        │
│  Validate syntax, resolve ARGs                                  │
│  Plan execution (what depends on what?)                         │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│ STAGE 3: LAYER EXECUTION                                        │
│                                                                 │
│  For each instruction:                                          │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │ 1. Check cache (can we reuse existing layer?)              │ │
│  │    └─ Cache key = parent layer + instruction + context     │ │
│  │                                                            │ │
│  │ 2. If cache miss:                                          │ │
│  │    ├─ Create temporary container from parent layer         │ │
│  │    ├─ Execute instruction (RUN, COPY, etc.)                │ │
│  │    ├─ Commit container as new layer                        │ │
│  │    └─ Store layer with cache key                           │ │
│  │                                                            │ │
│  │ 3. If cache hit:                                           │ │
│  │    └─ Use existing layer, skip execution                   │ │
│  └────────────────────────────────────────────────────────────┘ │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│ STAGE 4: IMAGE ASSEMBLY                                         │
│                                                                 │
│  Stack all layers                                               │
│  Create image config (ENV, CMD, etc.)                           │
│  Generate manifest                                              │
│  Tag image (if -t provided)                                     │
└─────────────────────────────────────────────────────────────────┘

🔬 Deep Dive

Build Context: The Foundation

Build context is the set of files the builder can access during a build. It's critical to understand because:

  1. Large context = slow builds (network transfer)
  2. Wrong context = security risks (secrets included)
  3. Missing context = failed COPY commands
[Bash code block, 6 lines; not preserved in the source]
Visualizing context transfer:
┌─────────────────────────────────────────────────────────────────┐
│                        YOUR MACHINE                             │
│                                                                 │
│  project/                                                       │
│  ├── Dockerfile                                                 │
│  ├── src/                    ┐                                  │
│  │   ├── app.py              │                                  │
│  │   └── utils.py            │                                  │
│  ├── requirements.txt        │ Included in context              │
│  ├── README.md               │ (unless .dockerignore)           │
│  ├── tests/                  │                                  │
│  │   └── test_app.py         ┘                                  │
│  ├── .git/                   ┐                                  │
│  ├── node_modules/           │ Should be in .dockerignore!      │
│  ├── __pycache__/            │                                  │
│  └── .env                    ┘                                  │
│                                                                 │
│  docker build creates tarball ─────────────────────────────────┐│
└──────────────────────────────┬────────────────────────────────┬┘│
                               │                                │ │
                               ▼                                ▼ │
┌─────────────────────────────────────────────────────────────────┐
│                     DOCKER DAEMON                               │
│                                                                 │
│  Receives tarball                                               │
│  Extracts to temporary directory                                │
│  COPY commands reference files from here                        │
└─────────────────────────────────────────────────────────────────┘
A thorough .dockerignore:
[Bash code block, 51 lines; not preserved in the source]
Debugging context issues:
[Bash code block, 8 lines; not preserved in the source]
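One common way to see exactly which files reach the builder is to build a throwaway image that copies the whole context and lists it. The sketch below assumes Docker with BuildKit and access to the `busybox` image; `list_build_context` is an illustrative helper name, not a Docker command.

```shell
# Print every file the builder receives for a given context directory.
# The Dockerfile is read from stdin (-f -), so nothing is written to disk.
list_build_context() {
  printf 'FROM busybox\nCOPY . /ctx\nRUN find /ctx -type f | sort\n' |
    docker build --no-cache --progress=plain -f - "${1:-.}"
}

# Usage (requires a running Docker daemon):
#   list_build_context .
```

Anything that shows up in the listing but should not be in the image is a candidate for .dockerignore.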

BuildKit: The Modern Build Engine

BuildKit is Docker's next-generation build engine. It has been the default builder since Docker Engine 23.0.

Enable BuildKit (if not default):
[Bash code block, 10 lines; not preserved in the source]
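A short sketch of the usual ways to opt in on older engines. The image tag `myapp` in the comments is a placeholder.

```shell
# Opt in to BuildKit for this shell session on engines older than 23.0.
export DOCKER_BUILDKIT=1

# Alternatively, call the BuildKit-based builder explicitly (needs the buildx plugin):
#   docker buildx build -t myapp .
# Or enable it daemon-wide by adding this to /etc/docker/daemon.json and restarting:
#   { "features": { "buildkit": true } }
echo "DOCKER_BUILDKIT=$DOCKER_BUILDKIT"
```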
BuildKit vs Legacy Builder:
Feature         | Legacy Builder | BuildKit
Parallel builds | No             | Yes
Better caching  | Basic          | Advanced mount caches
Build secrets   | No             | --mount=type=secret
SSH forwarding  | No             | --mount=type=ssh
Output formats  | Image only     | Image, tar, OCI, local
Progress output | Sequential     | Parallel with dependencies
BuildKit parallel execution:
[Dockerfile code block, 18 lines; not preserved in the source]
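A sketch of a Dockerfile with two independent stages, reconstructed from the stage names and steps in the diagram that follows. The base images, directory layout (`frontend/`, `backend/`), and copy destinations are all assumptions, and what the final image needs at runtime is outside the scope of this sketch.

```shell
demo_dir=$(mktemp -d)
cat > "$demo_dir/Dockerfile" <<'EOF'
# syntax=docker/dockerfile:1

# Stage "frontend": no dependency on "backend", so BuildKit can run both at once
FROM node:20 AS frontend
WORKDIR /frontend
COPY frontend/package*.json ./
RUN npm install
COPY frontend/ ./
RUN npm run build

# Stage "backend": independent of "frontend" (assumes pytest is in requirements.txt)
FROM python:3.12-slim AS backend
WORKDIR /backend
COPY backend/requirements.txt ./
RUN pip install -r requirements.txt
COPY backend/ ./
RUN pytest

# Final stage: waits for both stages, then copies their results
FROM python:3.12-slim
COPY --from=backend /backend /app
COPY --from=frontend /frontend/dist /app/static
EOF
grep -c '^FROM' "$demo_dir/Dockerfile"   # number of stages
```

Because neither named stage references the other, the builder is free to schedule them concurrently; only the final stage has to wait for both.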
BuildKit execution:
┌─────────────────────────────────────────────────────────────────┐
│                    BUILDKIT PARALLEL EXECUTION                  │
│                                                                 │
│  Time ──────────────────────────────────────────────────────►   │
│                                                                 │
│  frontend: [npm install]────[build]─────────┐                   │
│                                              │                  │
│  backend:  [pip install]────[pytest]────────┼──┐                │
│                                              │  │               │
│  final:                                      └──┴─[copy both]   │
│                                                                 │
│  Total time = max(frontend, backend) + final                    │
│  Not: frontend + backend + final                                │
└─────────────────────────────────────────────────────────────────┘

Cache Mechanics: Deep Dive

Cache is the key to fast builds. Let's understand exactly how it works.

Cache key components:
Instruction | Cache Key Includes
FROM        | Base image digest
RUN         | Parent layer + command string (exact match)
COPY        | Parent layer + source file contents (hash)
ADD         | Parent layer + source contents + URL content
ARG         | Parent layer + arg name (a changed value causes a miss at the first instruction that uses it, including later RUN steps)
ENV         | Parent layer + key=value
Cache invalidation cascade:
[Dockerfile code block, 6 lines; not preserved in the source]
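A six-instruction Dockerfile consistent with the two scenarios that follow, as a sketch. The base image tag and the `npm run build` step are assumptions; the ordering (manifest, install, then source) is what the scenarios describe.

```shell
demo_dir=$(mktemp -d)
cat > "$demo_dir/Dockerfile" <<'EOF'
FROM node:20
WORKDIR /app
COPY package.json ./
RUN npm install
COPY . .
RUN npm run build
EOF
# Number the instructions so they line up with "Layer 1" to "Layer 6".
nl -ba "$demo_dir/Dockerfile"
```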
Scenario: package.json changed

Layer 1: HIT  (base image unchanged)
Layer 2: HIT  (WORKDIR unchanged, parent hit)
Layer 3: MISS (file content changed!)
Layer 4: MISS (parent missed) ← npm install runs again
Layer 5: MISS (parent missed)
Layer 6: MISS (parent missed)

Scenario: Only src/app.js changed

Layer 1: HIT
Layer 2: HIT
Layer 3: HIT  (package.json unchanged)
Layer 4: HIT  (parent hit, command unchanged)
Layer 5: MISS (file content in "." changed)
Layer 6: MISS (parent missed)

npm install stays cached! Only the last two layers rebuild.
Cache busting techniques:
[Dockerfile code block, 9 lines; not preserved in the source]
BuildKit cache mounts (game changer):
[Dockerfile code block, 20 lines; not preserved in the source]

Cache mounts persist across builds but don't become part of the image layer.
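A minimal sketch of a cache mount for npm, assuming a Node project with a lockfile. The base image tag is an assumption; `/root/.npm` is npm's default cache directory when running as root.

```shell
demo_dir=$(mktemp -d)
cat > "$demo_dir/Dockerfile" <<'EOF'
# syntax=docker/dockerfile:1
FROM node:20
WORKDIR /app
COPY package.json package-lock.json ./
# The mount survives between builds but is not written into the layer.
RUN --mount=type=cache,target=/root/.npm npm ci
COPY . .
EOF
grep -n 'type=cache' "$demo_dir/Dockerfile"
```

When package.json changes, the RUN step still re-executes, but packages already present in the mounted cache do not need to be downloaded again.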

Build Arguments and Variables

Understanding ARG vs ENV:

[Dockerfile code block, 15 lines; not preserved in the source]
ARG scope rules:
[Dockerfile code block, 10 lines; not preserved in the source]
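A small sketch of the scope rules, with hypothetical variable names (`NODE_VERSION`, `APP_VERSION`). An ARG declared before FROM can be used in the FROM line but must be redeclared inside the stage, and an ARG value only reaches the running container if it is copied into an ENV.

```shell
demo_dir=$(mktemp -d)
cat > "$demo_dir/Dockerfile" <<'EOF'
# Declared before FROM: usable in FROM lines only
ARG NODE_VERSION=20
FROM node:${NODE_VERSION}
# Redeclare (without a value) to bring it into this stage's scope
ARG NODE_VERSION
# Build-time only; override with: docker build --build-arg APP_VERSION=1.2.3 .
ARG APP_VERSION=dev
# ENV persists into the running container; ARG does not
ENV APP_VERSION=${APP_VERSION}
RUN echo "node ${NODE_VERSION}, app ${APP_VERSION}"
EOF
grep -n '^ARG\|^ENV' "$demo_dir/Dockerfile"
```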

Build Secrets (BuildKit)

Never put secrets in build args (their values are visible in the image history via docker history):

[Bash code block, 6 lines; not preserved in the source]
[Dockerfile code block, 10 lines; not preserved in the source]
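A sketch of the secret-mount approach, assuming an npm token stored in a local file and a project whose .npmrc reads `NPM_TOKEN` from the environment. The secret id `npm_token`, the image tag `myapp`, and the helper name `build_with_secret` are illustrative.

```shell
demo_dir=$(mktemp -d)
cat > "$demo_dir/Dockerfile" <<'EOF'
# syntax=docker/dockerfile:1
FROM node:20
WORKDIR /app
COPY package.json package-lock.json ./
# The secret is mounted at /run/secrets/<id> for this RUN step only
RUN --mount=type=secret,id=npm_token \
    NPM_TOKEN="$(cat /run/secrets/npm_token)" npm ci
EOF

# $1: path to a file holding the token, $2: build context (default: current directory)
build_with_secret() {
  docker build --secret id=npm_token,src="$1" \
    -f "$demo_dir/Dockerfile" -t myapp "${2:-.}"
}

# Usage (requires Docker with BuildKit):
#   build_with_secret "$HOME/.npm_token" .
```

Unlike a value passed with --build-arg, the token never appears in the image config or in the output of docker history.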

Multi-Stage Build Internals

How stages actually work:

[Dockerfile code block, 12 lines; not preserved in the source]
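A sketch of the two-stage Go build that the diagram below walks through. The instructions are taken from the diagram; `CGO_ENABLED=0` and the ENTRYPOINT are additions assumed here so that the binary runs on `scratch`, which has no C library.

```shell
demo_dir=$(mktemp -d)
cat > "$demo_dir/Dockerfile" <<'EOF'
# Stage "builder": full Go toolchain
FROM golang:1.21 AS builder
WORKDIR /src
COPY go.* ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /app

# Final stage: only the compiled binary
FROM scratch
COPY --from=builder /app /app
ENTRYPOINT ["/app"]
EOF
grep -n '^FROM\|--from' "$demo_dir/Dockerfile"
```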
What happens internally:
┌─────────────────────────────────────────────────────────────────┐
│                     MULTI-STAGE BUILD                           │
│                                                                 │
│  Stage: builder                                                 │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │ FROM golang:1.21 (800MB)                                   │ │
│  │ WORKDIR /src                                               │ │
│  │ COPY go.* ./                                               │ │
│  │ RUN go mod download                                        │ │
│  │ COPY . .                                                   │ │
│  │ RUN go build -o /app                                       │ │
│  │                                                            │ │
│  │ Final: ~850MB with all Go toolchain                        │ │
│  │ But only /app binary needed!                               │ │
│  └────────────────────────────────────────────────────────────┘ │
│                            │                                    │
│                            │ COPY --from=builder /app           │
│                            ▼                                    │
│  Stage: final                                                   │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │ FROM scratch (0MB)                                         │ │
│  │ COPY --from=builder /app /app                              │ │
│  │                                                            │ │
│  │ Final image: ~10MB (just the binary!)                      │ │
│  └────────────────────────────────────────────────────────────┘ │
│                                                                 │
│  builder stage layers: NOT in final image                       │
│  They remain only in the build cache, for reuse by later builds │
└─────────────────────────────────────────────────────────────────┘
Targeting specific stages:
[Bash code block, 8 lines; not preserved in the source]
Copying from external images:
[Dockerfile code block, 3 lines; not preserved in the source]

Build Output Formats (BuildKit)

[Bash code block, 14 lines; not preserved in the source]
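A sketch of three non-image outputs using the `--output` flag, assuming BuildKit is the active builder. The helper names and destination paths are illustrative.

```shell
# Each helper takes the build context directory as $1 (default: current directory).

# Write the final stage's filesystem to ./out as plain files.
export_files() { docker build --output type=local,dest=./out "${1:-.}"; }

# Write the same filesystem as a single tarball.
export_tar() { docker build --output type=tar,dest=./rootfs.tar "${1:-.}"; }

# Write the image as an OCI image layout tarball.
export_oci() { docker build --output type=oci,dest=./image.tar "${1:-.}"; }

# Usage (requires a running Docker daemon):
#   export_files .
```

The local output pairs well with a final `FROM scratch` stage that contains only build artifacts, since everything in that stage ends up in the destination directory.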

Build Performance Analysis

[Bash code block, 31 lines; not preserved in the source]

⚠️ Common Mistakes

Mistake 1: Ignoring Build Context Size

[Bash code block, 7 lines; not preserved in the source]
Quick check:
[Bash code block, 2 lines; not preserved in the source]

Mistake 2: Cache-Busting Instruction Order

[Dockerfile code block, 8 lines; not preserved in the source]

Mistake 3: Not Using BuildKit Cache Mounts

[Dockerfile code block, 6 lines; not preserved in the source]

🐛 Debug This: The Phantom Cache Miss

A developer reports: "My build should use cache but it keeps rebuilding everything. The Dockerfile hasn't changed!"

[Dockerfile code block, 6 lines; not preserved in the source]
[Bash code block, 12 lines; not preserved in the source]
Why does COPY package.json miss cache when only index.js changed?

✅ Solution:

There are several possible causes:

Cause 1: File permissions/timestamps changed
[Bash code block, 7 lines; not preserved in the source]
Cause 2: Different build context
[Bash code block, 4 lines; not preserved in the source]
Cause 3: .dockerignore not excluding generated files
[Bash code block, 3 lines; not preserved in the source]
Cause 4: Build args changing
[Dockerfile code block, 6 lines; not preserved in the source]
Cause 5: Multi-stage target confusion
[Bash code block, 3 lines; not preserved in the source]
Debug steps:
[Bash code block, 11 lines; not preserved in the source]

💻 Exercises

Exercise 1: Measure Build Context Impact

⭐ Difficulty: Easy | ⏱️ Time: 15 minutes

[Bash code block, 27 lines; not preserved in the source]

Exercise 2: Optimize a Slow Dockerfile

⭐⭐ Difficulty: Medium | ⏱️ Time: 25 minutes

[Dockerfile code block, 15 lines; not preserved in the source]
[Bash code block, 14 lines; not preserved in the source]

Exercise 3: Use BuildKit Cache Mounts

⭐⭐ Difficulty: Medium | ⏱️ Time: 20 minutes

[Bash code block, 35 lines; not preserved in the source]

Exercise 4: Multi-Stage Build Analysis

⭐⭐⭐ Difficulty: Hard | ⏱️ Time: 30 minutes

[Bash code block, 61 lines; not preserved in the source]

Exercise 5: Build Secrets Deep Dive

⭐⭐⭐⭐ Difficulty: Expert | ⏱️ Time: 30 minutes

[Bash code block, 48 lines; not preserved in the source]

🎤 Senior-Level Interview Questions

Q1: Explain how Docker build cache works and how to optimize for it.

Strong Answer:

"Docker's build cache stores layers keyed by parent layer hash plus instruction details. For RUN, the key is the command string. For COPY/ADD, it's the file content hashes.

Cache invalidation cascades: if layer N misses, all subsequent layers must rebuild. This is why instruction ordering is crucial:

  1. Base image first - rarely changes
  2. System dependencies - apt-get, apk add
  3. Application dependencies - package.json/requirements.txt, then install
  4. Application code - most frequently changing, last

For example:

[Dockerfile code block, 3 lines; not preserved in the source]

If only source code changes, npm install stays cached.

With BuildKit, I also use cache mounts:

[Dockerfile code block; not preserved in the source]

This persists the npm cache across builds without including it in layers. Even if the layer rebuilds, cached packages remain.

Other optimizations: proper .dockerignore to avoid context changes, avoiding build args that change frequently, and pinning base image versions."

Q2: What is build context and why does it matter?

Strong Answer:
"Build context is the set of files Docker can access during build. When you run docker build ., the entire directory (minus .dockerignore entries) is tarred and sent to the daemon.

Why it matters:

  1. Performance: Large context = slow builds. I've seen 2GB contexts that took 5 minutes just to transfer. A proper .dockerignore got it to 15MB.
  2. Security: Context might include secrets, .env files, or credentials. Without .dockerignore, these get sent to daemon and potentially into layers via COPY.
  3. Reproducibility: Different contexts can produce different images even with the same Dockerfile.

Best practices:

  • Always have a .dockerignore
  • Exclude: .git, node_modules, build outputs, test files, IDE configs
  • Use specific COPY statements, not COPY . . when possible
  • Consider using docker build -f Dockerfile path/to/context to control context location
Estimate context size with: tar -cf - --exclude-from=.dockerignore . | wc -c (approximate, because tar and .dockerignore pattern syntax differ)"

Q3: How does BuildKit differ from the legacy builder?

Strong Answer:

"BuildKit is Docker's next-gen builder with major improvements:

Parallel execution: BuildKit builds independent stages concurrently. If you have frontend and backend stages, they build in parallel, merging at the final stage.
Better caching:
  • Cache mounts persist across builds: --mount=type=cache
  • External cache import/export for CI: --cache-from, --cache-to
  • More intelligent cache invalidation
Secrets handling:
  • --mount=type=secret exposes secrets during build only
  • Never written to layers
  • --mount=type=ssh for SSH agent forwarding
Frontend extensibility: BuildKit loads its frontend from the # syntax= directive, so it can support build definitions other than Dockerfiles.
Output flexibility: Can output to local directory, tar, OCI format, or push directly to registry.
Progress display: Shows parallel operations and dependencies clearly.

Practical impact: CI builds that took 10 minutes can drop to 3 minutes with proper parallelization and cache mounts."

Q4: How would you debug a build that's not caching as expected?

Strong Answer:

"My debugging process:

  1. Enable verbose output:
    [Bash code block; not preserved in the source]

    This shows exactly which steps are cached and why.

  2. Check for invisible changes:
    • File permissions (git checkout can change these)
    • Timestamps (some builds are timestamp-sensitive)
    • Generated files not in .dockerignore
  3. Verify context:
    [Bash code block; not preserved in the source]

    Make sure only expected files are included.

  4. Check build args: ARGs whose values change between builds invalidate cache from the first instruction that uses them.
  5. Compare layer hashes:
    [Bash code block; not preserved in the source]
  6. Test incrementally: Build with --target to isolate which stage has the problem.

Common culprits:

  • Missing .dockerignore
  • COPY before dependency install
  • Build args with timestamps/hashes
  • Different Docker versions having different cache behavior"

Q5: Explain multi-stage builds and when you'd use them.

Strong Answer:

"Multi-stage builds let you use multiple FROM statements, with each creating an isolated stage. Only the final stage becomes the output image, but you can COPY artifacts from previous stages.

Key use cases:

Compiled languages: Build with full toolchain, copy only binary to minimal runtime:
[Dockerfile code block, 5 lines; not preserved in the source]

This reduces an 800MB Go build image to a final image of about 10MB.

Separation of concerns:
[Dockerfile code block, 9 lines; not preserved in the source]
Build-time secrets: Put secret-requiring steps in non-final stage.
Conditional builds: Use --target to build different outputs from same Dockerfile.

What doesn't transfer between stages:

  • Environment variables
  • Build arguments (must redeclare)
  • Layer cache state
  • Everything except explicitly COPYed files
Stages can be named (AS name) or numbered (0, 1, 2). Named is clearer. You can also COPY from external images: COPY --from=nginx:alpine /etc/nginx/..."

📝 Summary & Key Takeaways

Core Concepts

Concept         | Key Point
Build context   | Everything Docker can access - use .dockerignore
Cache mechanics | Parent hash + instruction = cache key, cascade on miss
BuildKit        | Parallel builds, cache mounts, secrets, better output
Multi-stage     | Separate build from runtime, copy only what's needed
Layer ordering  | Put rarely-changing before frequently-changing

The Build Equation

Build Time = Context Transfer + Sum of (Execution Time of Each Cache-Missed Step) + Final Assembly

Optimize by:
1. Minimize context (.dockerignore)
2. Maximize cache hits (layer ordering)
3. Parallelize independent stages (multi-stage)
4. Use cache mounts (BuildKit)

What You Can Do Now

  1. Audit your .dockerignore: Is everything excluded that should be?
  2. Reorder your Dockerfile: Dependencies before code
  3. Add cache mounts: Use BuildKit --mount=type=cache
  4. Profile builds: Use --progress=plain to find slow steps

📋 Quick Reference

Build Commands

[Bash code block, 23 lines; not preserved in the source]

.dockerignore Essentials

[Bash code block, 26 lines; not preserved in the source]

BuildKit Features

[Dockerfile code block, 11 lines; not preserved in the source]

📅 Review Schedule

Day    | Task                                     | Time
Day 1  | Review cache invalidation rules          | 10 min
Day 3  | Add/improve .dockerignore in a project   | 15 min
Day 7  | Do Exercise 2 (optimize slow Dockerfile) | 25 min
Day 14 | Implement BuildKit cache mounts          | 20 min
Day 30 | Profile a real project's build, optimize | 30 min

📚 Series Navigation

Previous              | Current               | Next
Part 2: Image Anatomy | Part 3: Build Process | Part 4: Networking Internals
Docker Compendium Series: