Build Process Deep Dive
You run docker build and wait. Sometimes it's fast, sometimes painfully slow. Sometimes the cache works, sometimes it doesn't. This article explains exactly what happens during a build - from parsing your Dockerfile to producing the final image - so you can optimize builds with precision.
📋 At a Glance
| Aspect | Details |
|---|---|
| Topic | Build process, BuildKit, build context, cache mechanics |
| Complexity | Advanced |
| Prerequisites | Part 2 (Image Anatomy), Dockerfile basics |
| Key Insight | Understanding build context and cache is 80% of optimization |
| Time to Master | 3-4 hours |
🎯 What You'll Learn
- Build context - what gets sent to Docker daemon and why it matters
- BuildKit internals - the modern build engine and its advantages
- Cache mechanics - exactly when cache hits or misses
- Build stages - how multi-stage builds work internally
- Parallel building - how BuildKit parallelizes where possible
🔥 Production Story: The 30-Minute Build
A team's CI builds took 30 minutes. They blamed slow runners, added more resources, saw no improvement. The real problem was simpler.
One line appeared at the start of every build: the builder reported sending 2.4GB of build context to the Docker daemon. All of that was transferred before the build even started.
- No .dockerignore file
- Build directory included:
  - node_modules/ (800MB)
  - .git/ (1.2GB of history)
  - Test fixtures (400MB of images/videos)
- Context transfer: 2-3 minutes
- Context parsing: 1-2 minutes
- Actual build: 5 minutes
The fix was a .dockerignore file that excluded node_modules/, .git/, and the test fixtures.
Build time: 30 minutes → 6 minutes. The build itself was always 5 minutes - they were wasting 25 minutes on context.
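A minimal sketch of what such a .dockerignore could look like, written from the shell. The first two entries come from the story above; test/fixtures is a hypothetical path standing in for wherever the team kept its fixtures.

```shell
# Write a .dockerignore covering the three offenders from the story.
# test/fixtures is an assumed path; adjust it to your project layout.
cat > .dockerignore <<'EOF'
node_modules
.git
test/fixtures
EOF

# Show what will now be excluded from the build context.
cat .dockerignore
```

The file lives at the root of the build context, next to the Dockerfile in the common case.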
🧠 Mental Model: Build Pipeline
docker build -t myapp .
        │
        ▼
STAGE 1: CONTEXT PREPARATION
  Current directory (.)
  ├── Dockerfile        ✓ Always included
  ├── src/              ✓ Included unless in .dockerignore
  ├── node_modules/     ✗ Excluded by .dockerignore
  ├── .git/             ✗ Excluded by .dockerignore
  └── test/             ✓ Included (maybe shouldn't be?)
  → Creates tarball
  → Sends to Docker daemon (or BuildKit)
        │
        ▼
STAGE 2: DOCKERFILE PARSING
  Parse Dockerfile → Abstract Syntax Tree
  Validate syntax, resolve ARGs
  Plan execution (what depends on what?)
        │
        ▼
STAGE 3: LAYER EXECUTION
  For each instruction:
    1. Check cache (can we reuse existing layer?)
       └─ Cache key = parent layer + instruction + context
    2. If cache miss:
       ├─ Create temporary container from parent layer
       ├─ Execute instruction (RUN, COPY, etc.)
       ├─ Commit container as new layer
       └─ Store layer with cache key
    3. If cache hit:
       └─ Use existing layer, skip execution
        │
        ▼
STAGE 4: IMAGE ASSEMBLY
  Stack all layers
  Create image config (ENV, CMD, etc.)
  Generate manifest
  Tag image (if -t provided)
🔬 Deep Dive
Build Context: The Foundation
Build context is everything Docker has access to during build. It's critical to understand because:
- Large context = slow builds (network transfer)
- Wrong context = security risks (secrets included)
- Missing context = failed COPY commands
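To get a feel for how large a context is before building, the tarball can be approximated locally. This is a rough sketch, not what Docker does internally: context_size is a hypothetical helper, and tar's exclude patterns are not identical to .dockerignore semantics (negation with ! and ** globs behave differently), so treat the number as an estimate.

```shell
# Approximate the size in bytes of the build context for a directory.
# context_size is a hypothetical helper; tar's --exclude-from only
# approximates .dockerignore matching rules.
context_size() {
  dir="${1:-.}"
  if [ -f "$dir/.dockerignore" ]; then
    tar -C "$dir" -cf - --exclude-from="$dir/.dockerignore" . | wc -c | tr -d ' '
  else
    tar -C "$dir" -cf - . | wc -c | tr -d ' '
  fi
}

echo "Approximate context size: $(context_size .) bytes"
```

Running it before and after adding a .dockerignore gives a quick sanity check that the exclusions are having an effect.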
YOUR MACHINE
  project/
  ├── Dockerfile
  ├── src/                 ┐
  │   ├── app.py           │
  │   └── utils.py         │
  ├── requirements.txt     │ Included in context
  ├── README.md            │ (unless .dockerignore)
  ├── tests/               │
  │   └── test_app.py      ┘
  ├── .git/                ┐
  ├── node_modules/        │ Should be in .dockerignore!
  ├── __pycache__/         │
  └── .env                 ┘
  docker build creates tarball
        │
        ▼
DOCKER DAEMON
  Receives tarball
  Extracts to temporary directory
  COPY commands reference files from here
BuildKit: The Modern Build Engine
BuildKit is Docker's next-generation build engine (default since Docker 23.0).
| Feature | Legacy Builder | BuildKit |
|---|---|---|
| Parallel builds | No | Yes |
| Better caching | Basic | Advanced mount caches |
| Build secrets | No | --mount=type=secret |
| SSH forwarding | No | --mount=type=ssh |
| Output formats | Image only | Image, tar, OCI, local |
| Progress output | Sequential | Parallel with dependencies |
BUILDKIT PARALLEL EXECUTION
Time ──────────────────────────────────────────────►
frontend: [npm install]────[build]─────────┐
backend:  [pip install]────[pytest]────────┼──┐
final:                                     └──┴─[copy both]
Total time = max(frontend, backend) + final
Not: frontend + backend + final
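A Dockerfile shaped like the timeline above might look like the following sketch. The stage names follow the diagram; the directory names (frontend/, backend/), the dist output folder, main.py, and the assumption that pytest is listed in requirements.txt are all hypothetical. Because frontend and backend do not depend on each other, BuildKit is free to run them concurrently.

```shell
# Write a three-stage Dockerfile whose first two stages are independent.
# Paths, file names, and image tags are illustrative assumptions.
cat > Dockerfile.parallel <<'EOF'
# syntax=docker/dockerfile:1
FROM node:20-alpine AS frontend
WORKDIR /web
COPY frontend/package.json frontend/package-lock.json ./
RUN npm ci
COPY frontend/ ./
RUN npm run build

FROM python:3.12-slim AS backend
WORKDIR /api
COPY backend/requirements.txt ./
RUN python -m venv /opt/venv && /opt/venv/bin/pip install -r requirements.txt
COPY backend/ ./
RUN /opt/venv/bin/python -m pytest

FROM python:3.12-slim AS final
COPY --from=backend /opt/venv /opt/venv
COPY --from=backend /api /app
COPY --from=frontend /web/dist /app/static
WORKDIR /app
CMD ["/opt/venv/bin/python", "main.py"]
EOF
# Build with: docker build -f Dockerfile.parallel -t myapp .
```

The final stage only waits on the two COPY --from sources, which is what makes the total time roughly max(frontend, backend) + final.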
Cache Mechanics: Deep Dive
Cache is the key to fast builds. Let's understand exactly how it works.
| Instruction | Cache Key Includes |
|---|---|
| FROM | Base image digest |
| RUN | Parent layer + command string (exact match) |
| COPY | Parent layer + source file contents (hash) |
| ADD | Parent layer + source contents + URL content |
| ARG | Parent layer + arg name (a changed value causes a miss at the first instruction that uses it, not at the ARG line) |
| ENV | Parent layer + key=value |
Scenario: package.json changed
Layer 1: HIT (base image unchanged)
Layer 2: HIT (WORKDIR unchanged, parent hit)
Layer 3: MISS (file content changed!)
Layer 4: MISS (parent missed) ← npm install runs again
Layer 5: MISS (parent missed)
Layer 6: MISS (parent missed)

Scenario: Only src/app.js changed
Layer 1: HIT
Layer 2: HIT
Layer 3: HIT (package.json unchanged)
Layer 4: HIT (parent hit, command unchanged)
Layer 5: MISS (file content in "." changed)
Layer 6: MISS (parent missed)

npm install cached! Only the final layers rebuild.
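The cascade in these scenarios can be illustrated with a toy model. This is not how Docker or BuildKit actually compute cache keys; digest, layer_key, and chain_keys are hypothetical helpers that only show why a change to an early layer changes the key of every layer after it.

```shell
# Toy model of cascading cache keys. Real builders differ in detail;
# this only shows why an early change invalidates everything after it.
digest() {
  # Hash stdin with whichever SHA-256 tool is available.
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum | cut -d' ' -f1
  else
    shasum -a 256 | cut -d' ' -f1
  fi
}

# key = hash(parent key, instruction text, digest of copied files if any)
layer_key() {
  printf '%s\n%s\n%s\n' "$1" "$2" "${3:-}" | digest
}

# Keys for the chain: FROM -> COPY package.json -> RUN npm install
chain_keys() {
  pkg_file="$1"
  base="base-image-digest"   # placeholder standing in for the FROM digest
  k_copy=$(layer_key "$base" "COPY package.json ./" "$(digest < "$pkg_file")")
  k_run=$(layer_key "$k_copy" "RUN npm install")
  printf '%s %s\n' "$k_copy" "$k_run"
}

work=$(mktemp -d)
printf '{"name":"demo","version":"1.0.0"}\n' > "$work/package.json"
before=$(chain_keys "$work/package.json")

printf '{"name":"demo","version":"1.0.1"}\n' > "$work/package.json"
after=$(chain_keys "$work/package.json")

# The RUN instruction text never changed, but its key did,
# because its parent (the COPY layer) changed.
echo "before: $before"
echo "after:  $after"
```

The second key on each line belongs to RUN npm install; it differs between the two runs even though the command string is identical.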
Cache mounts persist across builds but don't become part of the image layer.
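As a sketch of a cache mount in practice, assuming a Node.js project built as root (so npm's cache lives in /root/.npm) and a hypothetical server.js entry point:

```shell
# Write a Dockerfile that keeps npm's download cache in a BuildKit cache mount.
# server.js and the image tag are illustrative assumptions.
cat > Dockerfile.cachemount <<'EOF'
# syntax=docker/dockerfile:1
FROM node:20-alpine
WORKDIR /app
COPY package.json package-lock.json ./
# /root/.npm is reused across builds but is not stored in the image layer.
RUN --mount=type=cache,target=/root/.npm npm ci
COPY . .
CMD ["node", "server.js"]
EOF
# Build with: docker build -f Dockerfile.cachemount -t myapp .
```

Even when package.json changes and the RUN layer has to re-execute, packages already in the mounted cache do not need to be downloaded again.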
Build Arguments and Variables
Understanding ARG vs ENV: ARG values exist only while the image is being built, while ENV values are stored in the image and are visible to running containers.
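The difference is easiest to see in a small Dockerfile. The following is an illustrative sketch; APP_VERSION, APP_ENV, and the file names are made-up names for the example.

```shell
# Write a Dockerfile contrasting a build-time ARG with a persisted ENV.
# Variable names and values are illustrative assumptions.
cat > Dockerfile.args <<'EOF'
FROM alpine:3.19
# Available to instructions during the build only.
ARG APP_VERSION=dev
# Stored in the image config and present in running containers.
ENV APP_ENV=production
# Both are visible here, at build time.
RUN echo "building $APP_VERSION for $APP_ENV" > /build-info.txt
# At runtime APP_VERSION is no longer set, so the fallback "unset" is used.
CMD ["sh", "-c", "echo version=${APP_VERSION:-unset} env=$APP_ENV"]
EOF
# Build with: docker build -f Dockerfile.args --build-arg APP_VERSION=1.2.3 -t argdemo .
```

A common pattern when a build-time value is needed at runtime is to copy it into an ENV explicitly (ENV APP_VERSION=$APP_VERSION after the ARG line).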
Build Secrets (BuildKit)
Never put secrets in build args: their values are visible in the image history. BuildKit secret mounts expose a secret to a single RUN step without writing it to a layer.
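A minimal sketch of a secret mount, assuming the secret is an .npmrc file containing a registry token. The secret id npmrc and the file locations are choices made for this example.

```shell
# Write a Dockerfile that reads a registry credential from a secret mount.
# The id "npmrc" and the target path are illustrative assumptions.
cat > Dockerfile.secret <<'EOF'
# syntax=docker/dockerfile:1
FROM node:20-alpine
WORKDIR /app
COPY package.json package-lock.json ./
# The secret file exists only for the duration of this RUN step.
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc npm ci
COPY . .
EOF
# Build with: docker build -f Dockerfile.secret --secret id=npmrc,src="$HOME/.npmrc" -t myapp .
```

The id passed to --secret on the command line has to match the id in the RUN --mount instruction.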
Multi-Stage Build Internals
How stages actually work:
MULTI-STAGE BUILD
Stage: builder
  FROM golang:1.21 (800MB)
  WORKDIR /src
  COPY go.* ./
  RUN go mod download
  COPY . .
  RUN go build -o /app
  Final: ~850MB with all Go toolchain
  But only /app binary needed!
        │
        │ COPY --from=builder /app
        ▼
Stage: final
  FROM scratch (0MB)
  COPY --from=builder /app /app
  Final image: ~10MB (just the binary!)
builder stage layers: NOT in final image
They're discarded after COPY extracts what's needed
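Written out as a Dockerfile, the two stages in the diagram could look like this. The instructions mirror the diagram; CGO_ENABLED=0 and the ENTRYPOINT line are additions assumed here so that the binary can run on scratch, which has no libc.

```shell
# Write the two-stage Go build from the diagram to a file.
# CGO_ENABLED=0 and ENTRYPOINT are assumptions added for a runnable result.
cat > Dockerfile.multistage <<'EOF'
# Stage 1: full Go toolchain, used only to compile.
FROM golang:1.21 AS builder
WORKDIR /src
COPY go.* ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /app

# Stage 2: empty base image containing only the binary.
FROM scratch
COPY --from=builder /app /app
ENTRYPOINT ["/app"]
EOF
# Build the final image:      docker build -f Dockerfile.multistage -t myapp .
# Build only the first stage: docker build -f Dockerfile.multistage --target builder -t myapp:builder .
```

Building with --target builder stops after the first stage, which is handy for inspecting the build environment.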
Build Output Formats (BuildKit)
Build Performance Analysis
⚠️ Common Mistakes
Mistake 1: Ignoring Build Context Size
Mistake 2: Cache-Busting Instruction Order
Mistake 3: Not Using BuildKit Cache Mounts
🐛 Debug This: The Phantom Cache Miss
A developer reports: "My build should use cache but it keeps rebuilding everything. The Dockerfile hasn't changed!"
There are several possible causes, and they are the usual suspects in a cache investigation: file permissions or timestamps that change between checkouts, generated files that are not in .dockerignore, and build args whose values change between builds.
💻 Exercises
Exercise 1: Measure Build Context Impact
⭐ Difficulty: Easy | ⏱️ Time: 15 minutes
Exercise 2: Optimize a Slow Dockerfile
⭐⭐ Difficulty: Medium | ⏱️ Time: 25 minutes
Exercise 3: Use BuildKit Cache Mounts
⭐⭐ Difficulty: Medium | ⏱️ Time: 20 minutes
Exercise 4: Multi-Stage Build Analysis
⭐⭐⭐ Difficulty: Hard | ⏱️ Time: 30 minutes
Exercise 5: Build Secrets Deep Dive
⭐⭐⭐⭐ Difficulty: Expert | ⏱️ Time: 30 minutes
🎤 Senior-Level Interview Questions
Q1: Explain how Docker build cache works and how to optimize for it.
"Docker's build cache stores layers keyed by parent layer hash plus instruction details. For RUN, the key is the command string. For COPY/ADD, it's the file content hashes.
Cache invalidation cascades: if layer N misses, all subsequent layers must rebuild. This is why instruction ordering is crucial:
- Base image first - rarely changes
- System dependencies - apt-get, apk add
- Application dependencies - package.json/requirements.txt, then install
- Application code - most frequently changing, last
For example, copying package.json and running npm install before copying the rest of the source means that if only source code changes, npm install stays cached.
With BuildKit, I also use cache mounts (RUN --mount=type=cache pointed at the npm cache directory). This persists the npm cache across builds without including it in layers. Even if the layer rebuilds, cached packages remain.
Other optimizations: proper .dockerignore to avoid context changes, avoiding build args that change frequently, and pinning base image versions."
Q2: What is build context and why does it matter?
"Build context is the set of files Docker has access to during a build. When you run docker build ., the entire directory (minus .dockerignore entries) is tarred and sent to the daemon.
Why it matters:
- Performance: Large context = slow builds. I've seen 2GB contexts that took 5 minutes just to transfer. A proper .dockerignore got it to 15MB.
- Security: Context might include secrets, .env files, or credentials. Without .dockerignore, these get sent to the daemon and potentially into layers via COPY.
- Reproducibility: Different contexts can produce different images even with the same Dockerfile.
Best practices:
- Always have a .dockerignore
- Exclude: .git, node_modules, build outputs, test files, IDE configs
- Use specific COPY statements, not COPY . . when possible
- Consider using docker build -f Dockerfile path/to/context to control context location
To estimate context size: tar -cvf - --exclude-from=.dockerignore . | wc -c"
Q3: How does BuildKit differ from the legacy builder?
"BuildKit is Docker's next-gen builder with major improvements:
- Cache mounts persist across builds: --mount=type=cache
- External cache import/export for CI: --cache-from, --cache-to
- More intelligent cache invalidation
- --mount=type=secret exposes secrets during build only; they are never written to layers
- --mount=type=ssh for SSH agent forwarding
- Pluggable frontends selected with the # syntax= directive, which can support non-Dockerfile syntaxes
Practical impact: CI builds that took 10 minutes can drop to 3 minutes with proper parallelization and cache mounts."
Q4: How would you debug a build that's not caching as expected?
"My debugging process:
- Enable verbose output with --progress=plain. This shows which steps are cached and which are re-executed.
- Check for invisible changes:
  - File permissions (git checkout can change these)
  - Timestamps (some builds are timestamp-sensitive)
  - Generated files not in .dockerignore
- Verify context: make sure only expected files are included.
- Check build args: ARGs that change between builds invalidate cache from that point.
- Compare layer hashes between builds to find the first layer that differs.
- Test incrementally: build with --target to isolate which stage has the problem.
Common culprits:
- Missing .dockerignore
- COPY before dependency install
- Build args with timestamps/hashes
- Different Docker versions having different cache behavior"
Q5: Explain multi-stage builds and when you'd use them.
"Multi-stage builds let you use multiple FROM statements, with each creating an isolated stage. Only the final stage becomes the output image, but you can COPY artifacts from previous stages.
Key use cases:
- Separating build from runtime: compile in a full toolchain image and copy only the binary into a minimal one. This reduces an 800MB Go image to a 10MB binary.
- Building different outputs from the same Dockerfile with --target.
What doesn't transfer between stages:
- Environment variables
- Build arguments (must redeclare)
- Layer cache state
- Everything except explicitly COPYed files
You can also copy from an external image: COPY --from=nginx:alpine /etc/nginx/..."
📝 Summary & Key Takeaways
Core Concepts
| Concept | Key Point |
|---|---|
| Build context | Everything Docker can access - use .dockerignore |
| Cache mechanics | Parent hash + instruction = cache key, cascade on miss |
| BuildKit | Parallel builds, cache mounts, secrets, better output |
| Multi-stage | Separate build from runtime, copy only what's needed |
| Layer ordering | Put rarely-changing before frequently-changing |
The Build Equation
Build Time = Context Transfer + (Cache Misses × Time per Missed Step) + Final Assembly
Optimize by:
1. Minimize context (.dockerignore)
2. Maximize cache hits (layer ordering)
3. Parallelize independent stages (multi-stage)
4. Use cache mounts (BuildKit)
What You Can Do Now
- Audit your .dockerignore: Is everything excluded that should be?
- Reorder your Dockerfile: Dependencies before code
- Add cache mounts: Use BuildKit --mount=type=cache
- Profile builds: Use --progress=plain to find slow steps
📋 Quick Reference
Build Commands
.dockerignore Essentials
BuildKit Features
📅 Review Schedule
| Day | Task | Time |
|---|---|---|
| Day 1 | Review cache invalidation rules | 10 min |
| Day 3 | Add/improve .dockerignore in a project | 15 min |
| Day 7 | Do Exercise 2 (optimize slow Dockerfile) | 25 min |
| Day 14 | Implement BuildKit cache mounts | 20 min |
| Day 30 | Profile a real project's build, optimize | 30 min |