Image Anatomy: Layers, Manifests & Registry
Run docker pull on an image and it downloads in layers. But what are layers, really? How does Docker know which parts to download? This article dissects Docker images, from the overlay filesystem to multi-architecture manifests to content-addressable storage.
📋 At a Glance
| Aspect | Details |
|---|---|
| Topic | Image layers, content-addressable storage, registries, multi-arch |
| Complexity | Advanced |
| Prerequisites | Part 1 (Container Internals), basic Docker usage |
| Key Insight | Images are just tarballs with metadata; layers are deduplicated filesystem diffs |
| Time to Master | 3-4 hours |
🎯 What You'll Learn
- Layer mechanics - how images are built from filesystem diffs
- Content-addressable storage - why layer hashes matter
- Image manifests - the metadata that ties layers together
- Multi-architecture images - how one tag serves ARM and AMD64
- Registry protocol - what happens during push/pull
🔥 Production Story: 50GB of Dangling Layers
A CI server's disk filled up every week. The team blamed "too many builds" and added cron jobs to clean up. But the real problem was worse.
- 847 dangling images consuming 47GB
- Each build pulled base image, built, but never cleaned intermediates
- Layer deduplication wasn't working - same base layers stored multiple times
The immediate cause was running docker build without --rm and never running docker image prune. The deeper cause: the team didn't realize that every filesystem-changing instruction in a build creates a layer, and that layers persist until explicitly removed.
🧠 Mental Model: Images as Layer Stacks
┌─────────────────────────────────────────────────────────────────┐
│                         IMAGE STRUCTURE                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Image: nginx:1.25                                              │
│                                                                 │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │ Manifest (JSON)                                            │ │
│  │ - Config blob reference                                    │ │
│  │ - Layer references (in order)                              │ │
│  │ - Media types                                              │ │
│  └────────────────────────────────────────────────────────────┘ │
│                                │                                │
│                                ▼                                │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │ Config (JSON)                                              │ │
│  │ - Environment variables                                    │ │
│  │ - CMD, ENTRYPOINT                                          │ │
│  │ - Exposed ports                                            │ │
│  │ - History (build steps)                                    │ │
│  └────────────────────────────────────────────────────────────┘ │
│                                │                                │
│                                ▼                                │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │ LAYER 4: sha256:a1b2... (12MB) ← nginx config              │ │
│  │ Changes: /etc/nginx/nginx.conf, /usr/share/nginx/html/     │ │
│  ├────────────────────────────────────────────────────────────┤ │
│  │ LAYER 3: sha256:c3d4... (25MB) ← nginx binary              │ │
│  │ Changes: /usr/sbin/nginx, /usr/lib/nginx/                  │ │
│  ├────────────────────────────────────────────────────────────┤ │
│  │ LAYER 2: sha256:e5f6... (45MB) ← apt packages              │ │
│  │ Changes: /usr/bin/*, /usr/lib/*                            │ │
│  ├────────────────────────────────────────────────────────────┤ │
│  │ LAYER 1: sha256:7890... (80MB) ← debian:bookworm-slim      │ │
│  │ Changes: /bin/*, /lib/*, /etc/*                            │ │
│  └────────────────────────────────────────────────────────────┘ │
│                                                                 │
│  Total size: 162MB (but layers shared with other images!)       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
🔬 Deep Dive
How Layers Work: Union Filesystem
┌─────────────────────────────────────────────────────────────────┐
│                    CONTAINER FILESYSTEM VIEW                    │
│                                                                 │
│  What container sees: /bin /etc /usr /var /app ...              │
│                                                                 │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────┐  Container layer (read-write)                 │
│  │    UPPER     │  - Container's changes go here                │
│  │    (rw)      │  - Created fresh for each container           │
│  └──────────────┘                                               │
│         │                                                       │
│         │ overlay merge                                         │
│         ▼                                                       │
│  ┌──────────────┐                                               │
│  │   LAYER 4    │  Image layers (read-only)                     │
│  │    (ro)      │                                               │
│  ├──────────────┤                                               │
│  │   LAYER 3    │                                               │
│  │    (ro)      │                                               │
│  ├──────────────┤                                               │
│  │   LAYER 2    │                                               │
│  │    (ro)      │                                               │
│  ├──────────────┤                                               │
│  │   LAYER 1    │                                               │
│  │    (ro)      │                                               │
│  └──────────────┘                                               │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

File resolution (top-down):
1. Check UPPER - if file exists, use it
2. Check LAYER 4 - if file exists, use it
3. Check LAYER 3...
4. Continue down until found

Write operations:
- New file → write to UPPER
- Modify existing → copy to UPPER, modify (copy-on-write)
- Delete → create "whiteout" marker in UPPER
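The lookup and write rules above can be modelled in a few lines of Python. This is a toy sketch, not how overlayfs is implemented: plain dicts stand in for layer directories, and the names OverlayView and WHITEOUT are hypothetical.

```python
WHITEOUT = object()  # sentinel standing in for an overlay whiteout marker


class OverlayView:
    """Toy union filesystem: read-only image layers plus one writable UPPER layer."""

    def __init__(self, image_layers):
        # image_layers is ordered bottom (LAYER 1) to top; each maps path -> content.
        self.lowers = [dict(layer) for layer in image_layers]
        self.upper = {}

    def read(self, path):
        # Resolve top-down: UPPER first, then the image layers from top to bottom.
        for layer in [self.upper, *reversed(self.lowers)]:
            if path in layer:
                if layer[path] is WHITEOUT:
                    raise FileNotFoundError(path)
                return layer[path]
        raise FileNotFoundError(path)

    def write(self, path, content):
        # New files go straight to UPPER; the image layers are never touched.
        self.upper[path] = content

    def append(self, path, extra):
        # Copy-on-write: copy the visible content up into UPPER, then modify the copy.
        self.upper[path] = self.read(path) + extra

    def delete(self, path):
        self.read(path)  # raises FileNotFoundError if the file is not visible
        self.upper[path] = WHITEOUT  # hide the lower copy instead of removing it


view = OverlayView([
    {"/etc/os-release": "debian", "/bin/sh": "shell"},   # LAYER 1
    {"/etc/nginx/nginx.conf": "worker_processes 1;"},    # LAYER 2
])
view.write("/app/run.log", "started")
view.append("/etc/nginx/nginx.conf", " # tuned")
view.delete("/bin/sh")

print(view.read("/etc/nginx/nginx.conf"))       # modified copy from UPPER
print(view.lowers[1]["/etc/nginx/nginx.conf"])  # image layer is unchanged
```

Note that the delete never shrinks anything: the lower layer still holds /bin/sh, and the whiteout only hides it from the merged view.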
Content-Addressable Storage
- Deduplication: Same layer content = same hash = store once
- Integrity: Downloaded data must match expected hash
- Caching: Already have this hash? Don't download again
- Immutability: Can't modify a layer without changing its hash
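These four properties all fall out of one rule: the key is the hash of the bytes. A minimal sketch, assuming an in-memory dict as the blob store (the BlobStore class and the sample byte strings are hypothetical):

```python
import hashlib


class BlobStore:
    """Toy content-addressable store: the SHA-256 of the bytes is the key."""

    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        digest = "sha256:" + hashlib.sha256(data).hexdigest()
        # Deduplication: identical content maps to the same key, so it is stored once.
        self.blobs.setdefault(digest, data)
        return digest

    def has(self, digest: str) -> bool:
        # Caching: a client can skip the download when this returns True.
        return digest in self.blobs

    def get_verified(self, digest: str) -> bytes:
        data = self.blobs[digest]
        # Integrity: re-hash the bytes and compare with the address that was requested.
        actual = "sha256:" + hashlib.sha256(data).hexdigest()
        if actual != digest:
            raise ValueError(f"digest mismatch: expected {digest}, got {actual}")
        return data


store = BlobStore()
base_a = store.put(b"debian base layer bytes")
base_b = store.put(b"debian base layer bytes")    # same content from another image
changed = store.put(b"debian base layer bytes!")  # one byte differs

print(base_a == base_b)   # True: same content, same address
print(len(store.blobs))   # 2: the duplicate was stored only once
```

Immutability shows up in the last put: changing a single byte produces a different digest, so the original address can never silently refer to new content.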
Image Manifest Structure
The manifest ties everything together:
| Field | Purpose |
|---|---|
| config.digest | Points to image config JSON |
| layers[].digest | SHA256 of compressed layer tarball |
| layers[].size | Size in bytes (for progress display) |
| mediaType | Format identifier |
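To make the fields in this table concrete, here is a small sketch that reads them out of a manifest. The manifest below is hypothetical: the media types are the standard Docker v2 ones, but the digests and sizes are invented for illustration, and summarize is not part of any Docker tooling.

```python
import json

# Hypothetical manifest: digests and sizes are made up for illustration.
MANIFEST_JSON = """
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "size": 7023,
    "digest": "sha256:configdigest"
  },
  "layers": [
    {"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
     "size": 31357624, "digest": "sha256:layer1digest"},
    {"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
     "size": 41345677, "digest": "sha256:layer2digest"}
  ]
}
"""


def summarize(manifest: dict) -> dict:
    layers = manifest["layers"]
    return {
        "config": manifest["config"]["digest"],
        # Order matters: layers are listed from the bottom of the stack upward.
        "layers": [layer["digest"] for layer in layers],
        # Sizes refer to the compressed blobs, i.e. what travels over the network.
        "compressed_bytes": sum(layer["size"] for layer in layers),
    }


summary = summarize(json.loads(MANIFEST_JSON))
print(summary["config"])            # sha256:configdigest
print(summary["layers"])            # ['sha256:layer1digest', 'sha256:layer2digest']
print(summary["compressed_bytes"])  # 72703301
```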
Image Config Deep Dive
Multi-Architecture Images (Manifest Lists)
One tag can serve multiple architectures:
┌─────────────────────────────────────────────────────────────────┐
│                     docker pull nginx:1.25                      │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│                        Registry Response                        │
│                                                                 │
│  Manifest List (nginx:1.25)                                     │
│  ├─ amd64/linux  → sha256:amd64manifest                         │
│  ├─ arm64/linux  → sha256:arm64manifest                         │
│  └─ arm/v7/linux → sha256:armv7manifest                         │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           │ Client selects based on
                           │ local architecture
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│  On M1 Mac: Pull sha256:arm64manifest                           │
│  On x86 PC: Pull sha256:amd64manifest                           │
└─────────────────────────────────────────────────────────────────┘
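The selection step in the diagram amounts to a lookup over the platform entries. The sketch below assumes a simplified exact-match rule and reuses the placeholder digests from the diagram; real clients apply more nuanced matching (variant fallbacks, for instance), and select_manifest is a hypothetical helper name.

```python
# Hypothetical manifest list; the digests are the placeholders used in the diagram.
MANIFEST_LIST = {
    "manifests": [
        {"digest": "sha256:amd64manifest",
         "platform": {"architecture": "amd64", "os": "linux"}},
        {"digest": "sha256:arm64manifest",
         "platform": {"architecture": "arm64", "os": "linux"}},
        {"digest": "sha256:armv7manifest",
         "platform": {"architecture": "arm", "os": "linux", "variant": "v7"}},
    ]
}


def select_manifest(manifest_list, os_name, arch, variant=None):
    """Return the digest of the first entry matching the local platform."""
    for entry in manifest_list["manifests"]:
        platform = entry["platform"]
        if platform["os"] != os_name or platform["architecture"] != arch:
            continue
        # Only compare the variant when the caller asked for a specific one.
        if variant is None or platform.get("variant") == variant:
            return entry["digest"]
    raise LookupError(f"no manifest for {os_name}/{arch}")


print(select_manifest(MANIFEST_LIST, "linux", "arm64"))  # sha256:arm64manifest
print(select_manifest(MANIFEST_LIST, "linux", "amd64"))  # sha256:amd64manifest
```

The important point is that the tag resolves to the list, and only the selected per-architecture manifest decides which layers get downloaded.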
Registry Protocol (OCI Distribution)
docker pull:
┌─────────────────────────────────────────────────────────────────┐
│                     docker pull nginx:1.25                      │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│  Step 1: Resolve tag to manifest                                │
│                                                                 │
│  GET /v2/library/nginx/manifests/1.25                           │
│  Accept: application/vnd.docker.distribution.manifest.v2+json   │
│                                                                 │
│  Response: Manifest JSON with config + layer digests            │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│  Step 2: Download config blob                                   │
│                                                                 │
│  GET /v2/library/nginx/blobs/sha256:configdigest                │
│                                                                 │
│  Response: Image config JSON                                    │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│  Step 3: Download layers (parallel, if not cached)              │
│                                                                 │
│  For each layer:                                                │
│  - Check if sha256:layerdigest exists locally                   │
│  - If not: GET /v2/library/nginx/blobs/sha256:layerdigest       │
│  - Verify downloaded hash matches                               │
│  - Decompress and store                                         │
└─────────────────────────────────────────────────────────────────┘
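The three steps can be expressed as a function that lists the requests a client would send. This is a sketch only: it takes an already-fetched manifest as input, makes no network calls, skips authentication, and plan_pull is a hypothetical name.

```python
def plan_pull(name, reference, manifest, local_digests):
    """Return the HTTP requests for a pull as (method, path) tuples."""
    # Step 1: resolve the tag (or digest) to a manifest.
    requests = [("GET", f"/v2/{name}/manifests/{reference}")]
    # Step 2: fetch the config blob named by the manifest.
    requests.append(("GET", f"/v2/{name}/blobs/{manifest['config']['digest']}"))
    # Step 3: fetch only the layers that are not already in local storage.
    for layer in manifest["layers"]:
        if layer["digest"] not in local_digests:
            requests.append(("GET", f"/v2/{name}/blobs/{layer['digest']}"))
    return requests


# Hypothetical manifest with placeholder digests.
manifest = {
    "config": {"digest": "sha256:configdigest"},
    "layers": [{"digest": "sha256:layer1"}, {"digest": "sha256:layer2"}],
}

for method, path in plan_pull("library/nginx", "1.25", manifest, {"sha256:layer1"}):
    print(method, path)
```

With sha256:layer1 already cached, the plan contains three requests instead of four, which is the whole benefit of checking digests before downloading.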
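docker push runs the same exchange in reverse: the client checks which blobs the registry already has (HEAD /v2/<name>/blobs/<digest>), uploads the missing ones, then uploads the manifest.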
Analyzing Image Layers
Layer Caching Mechanics
Why does a rebuild sometimes use the cache and sometimes not?
| Instruction | Cache Key |
|---|---|
| FROM | Base image digest |
| WORKDIR | Previous layer + instruction |
| COPY | Previous layer + file content hashes |
| RUN | Previous layer + command string |
| CMD | Previous layer + command |
If package.json changes:
├─ COPY package.json . → MISS (file changed)
├─ RUN npm install → MISS (parent changed)
├─ COPY . . → MISS (parent changed)
└─ All subsequent layers rebuild

If only src/index.js changes:
├─ COPY package.json . → HIT
├─ RUN npm install → HIT
├─ COPY . . → MISS (file changed)
└─ Only this and following rebuild
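The cascade above follows from chaining: each step's cache key includes its parent's key. The sketch below is a simplified model, not the builder's actual algorithm; the helper names, the base-image placeholder, and the file contents are all hypothetical.

```python
import hashlib


def h(*parts: str) -> str:
    return hashlib.sha256("\n".join(parts).encode()).hexdigest()


def cache_keys(steps, files):
    """steps: list of (instruction, [source paths]); files: path -> content."""
    keys = []
    parent = h("base-image-digest")  # stands in for the FROM image digest
    for instruction, sources in steps:
        content_hashes = [h(files[path]) for path in sorted(sources)]
        # Key = parent key + instruction text + hashes of any copied files.
        parent = h(parent, instruction, *content_hashes)
        keys.append(parent)
    return keys


def hits(old_keys, new_keys):
    return ["HIT" if a == b else "MISS" for a, b in zip(old_keys, new_keys)]


STEPS = [
    ("COPY package.json .", ["package.json"]),
    ("RUN npm install", []),
    ("COPY . .", ["package.json", "src/index.js"]),
]

before = {"package.json": '{"deps": 1}', "src/index.js": "v1"}
code_change = {"package.json": '{"deps": 1}', "src/index.js": "v2"}
deps_change = {"package.json": '{"deps": 2}', "src/index.js": "v1"}

baseline = cache_keys(STEPS, before)
print(hits(baseline, cache_keys(STEPS, code_change)))  # ['HIT', 'HIT', 'MISS']
print(hits(baseline, cache_keys(STEPS, deps_change)))  # ['MISS', 'MISS', 'MISS']
```

RUN npm install has no files of its own to hash, yet it still misses when package.json changes, because its parent key changed.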
Where Images Live on Disk
⚠️ Common Mistakes
Mistake 1: Not Understanding Layer Accumulation
Mistake 2: Ignoring Image Provenance
Mistake 3: Using :latest in Production
🐛 Debug This: The Missing Layer Mystery
A developer reports: "I pushed my image, then pulled on another machine, but it's different. Some files are missing!"
Possible causes:
- The second machine had an old version of a base layer cached.
- The build machine used cached layers from a previous build that weren't pushed.
- Files were in the build stage but not copied to the final stage.
💻 Exercises
Exercise 1: Dissect an Image
⭐ Difficulty: Easy | ⏱️ Time: 15 minutes
Exercise 2: Measure Layer Deduplication
⭐⭐ Difficulty: Medium | ⏱️ Time: 20 minutes
Exercise 3: Build Multi-Architecture Image
⭐⭐ Difficulty: Medium | ⏱️ Time: 25 minutes
Exercise 4: Registry Protocol Deep Dive
⭐⭐⭐ Difficulty: Hard | ⏱️ Time: 30 minutes
Exercise 5: Analyze Image Efficiency
⭐⭐⭐⭐ Difficulty: Expert | ⏱️ Time: 30 minutes
🎤 Senior-Level Interview Questions
Q1: Explain the difference between image ID, digest, and tag.
"These are three different ways to reference images:
- Tag: a human-readable name such as nginx:1.25. It's mutable - pushing a new image with the same tag overwrites the reference. Never use :latest in production because it can change.
- Digest: the SHA256 hash of the manifest, such as sha256:abc123.... It's immutable - this exact hash always refers to exactly this image. Format: nginx@sha256:abc123...
- Image ID: the hash of the image config, shown locally by docker images. Two images with different tags can have the same ID if they're identical.

In practice:
- Use tags for development convenience
- Use digests for production deployments (immutability)
- Image IDs are mainly for local identification
The relationship: Tag → Manifest (has digest) → Config (has ID) + Layers"
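The relationship in that answer can be sketched with plain hashing. The config and manifest documents here are hypothetical and heavily trimmed, and the sketch assumes the image ID is the hash of the config, as the answer describes; some storage backends report a different identifier.

```python
import hashlib
import json


def sha256_ref(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def build_image(cmd):
    # Hypothetical, heavily trimmed config and manifest documents.
    config = json.dumps({"config": {"Cmd": cmd}}, sort_keys=True).encode()
    image_id = sha256_ref(config)            # image ID: hash of the config
    manifest = json.dumps(
        {"config": {"digest": image_id}, "layers": []}, sort_keys=True
    ).encode()
    return image_id, sha256_ref(manifest)    # digest: hash of the manifest


tags = {}  # tag -> manifest digest; the only mutable piece

old_id, old_digest = build_image(["nginx"])
tags["nginx:1.25"] = old_digest

new_id, new_digest = build_image(["nginx", "-g", "daemon off;"])
tags["nginx:1.25"] = new_digest      # same tag now points at a different image
tags["nginx:stable"] = new_digest    # two tags, one image

print(tags["nginx:1.25"] == old_digest)            # False: the tag moved
print(tags["nginx:1.25"] == tags["nginx:stable"])  # True: both name one manifest
```

Anything that pinned old_digest is unaffected by the retag, which is why digest references suit production deployments.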
Q2: How does layer caching work and why does COPY order matter?
"Layer caching is Docker's optimization to avoid rebuilding unchanged layers.
For each instruction, Docker checks if it can reuse a cached layer:
- FROM: Cache hit if base image digest matches
- RUN: Cache hit if parent layer AND command string match
- COPY/ADD: Cache hit if parent layer AND all source file contents match
This is why COPY order matters. When the Dockerfile copies package.json and runs npm install before COPY . ., npm install only reruns if package.json changes. Code changes picked up by COPY . . don't affect it because that step comes after.
The same principle applies to any slow step: put dependencies before code, and rarely-changing files before frequently-changing ones."
Q3: What happens during docker pull at the network level?
"Docker pull follows the OCI Distribution protocol:
1. Resolve tag to manifest: GET to /v2/<name>/manifests/<tag>. Registry returns manifest JSON with config and layer digests.
2. Check for multi-arch: If it's a manifest list, Docker selects the manifest matching local architecture.
3. Download config: GET to /v2/<name>/blobs/<config-digest>. This is the image configuration JSON.
4. Download layers (parallel):
   - For each layer, check local storage for matching digest
   - If missing: GET to /v2/<name>/blobs/<layer-digest>
   - Response is gzipped tarball
   - Verify SHA256 matches expected digest
   - Extract to local storage
5. Assemble: Create local image metadata linking config to layers.
Key optimizations:
- Layers download in parallel
- Already-cached layers skip network entirely
- Registries support range requests for resumable downloads
- CDN acceleration for popular images"
Q4: How would you debug an image that's larger than expected?
"My debugging process:
1. Quick size check.
2. Layer analysis with dive. This shows exactly which files are in each layer and flags wasted space.
3. Common issues I look for:
   - Build artifacts in final image (node_modules dev deps, .git, test files)
   - Package manager cache not cleaned
   - Multiple RUN statements that could combine
   - Missing .dockerignore
   - Wrong base image (ubuntu vs alpine vs distroless)
4. Multi-stage check:
   - Are we copying only needed artifacts?
   - Any unnecessary COPY commands?
5. Concrete fixes:
   - Add .dockerignore for build context
   - Combine RUN commands with cleanup
   - Use smaller base image
   - Multi-stage build for compiled languages
   - Remove dev dependencies
I'd also check if the team has image size targets in CI to prevent regression."
Q5: Explain content-addressable storage and why it matters.
"Content-addressable storage means every blob is identified by the SHA256 hash of its contents. The hash IS the address.
Why this matters:
1. Deduplication: If two images share a layer (same content = same hash), it's stored once. A host running 100 containers might only have 10 unique layers on disk.
2. Integrity: When downloading sha256:abc123, you hash the received data. If it doesn't match, something went wrong (corruption, MITM). Immutable verification.
3. Immutability: You can't modify a layer without changing its hash. This enables safe caching - if you have sha256:abc123, you know exactly what it contains, forever.
4. Efficient distribution: Registries and clients can skip transferring layers they already have. docker pull is essentially 'sync these hashes'.

One caveat: tags are not content-addressed. nginx:latest resolves to a content-addressed digest under the hood, but the tag can point to different digests over time. Production should use digest references: nginx@sha256:abc..."
📝 Summary & Key Takeaways
Core Concepts
| Concept | Key Point |
|---|---|
| Layers | Filesystem diffs stacked with overlay FS |
| Content-addressable | SHA256 hash = identity, enables deduplication |
| Manifest | JSON listing config + layers, ties image together |
| Multi-arch | Manifest list points to arch-specific manifests |
| Registry protocol | Standard HTTP API for push/pull |
The Image Equation
Image = Manifest + Config + Layers

Manifest: "Here's what this image contains"
  → Config digest (how to run)
  → Layer digests (filesystem content)

Config: "Here's how to run this image"
  → CMD, ENV, EXPOSE, etc.
  → Build history

Layers: "Here's the filesystem"
  → Ordered tarballs
  → Each is diff from previous
  → Stacked by overlay filesystem
What You Can Do Now
- Analyze images: Use docker history and dive to understand composition
- Debug size issues: Identify which layers contribute most
- Understand caching: Know why builds are slow or fast
- Use digests: Reference immutable images in production
📋 Quick Reference
Image Inspection Commands
Registry API Endpoints
| Endpoint | Purpose |
|---|---|
| GET /v2/_catalog | List repositories |
| GET /v2/<name>/tags/list | List tags |
| GET /v2/<name>/manifests/<ref> | Get manifest |
| GET /v2/<name>/blobs/<digest> | Get layer/config |
| HEAD /v2/<name>/blobs/<digest> | Check if blob exists |
Disk Usage Commands
📅 Review Schedule
| Day | Task | Time |
|---|---|---|
| Day 1 | Draw layer stacking diagram from memory | 10 min |
| Day 3 | Do Exercise 1 (dissect an image) | 15 min |
| Day 7 | Explain content-addressable storage to colleague | 5 min |
| Day 14 | Do Exercise 4 (registry protocol) | 30 min |
| Day 30 | Analyze a production image with dive | 20 min |
📚 Series Navigation
| Previous | Current | Next |
|---|---|---|
| Part 1: Container Internals | Part 2: Image Anatomy | Part 3: Build Process |