Devops

Schema Registry & Evolution


At a Glance

AspectDetails
TopicSchema Registry, Avro/Protobuf/JSON Schema, compatibility modes
ComplexityIntermediate
PrerequisitesParts 5-7 (Producers), Parts 8-11 (Consumers)
Time90 minutes
Spring KafkaSchema Registry serializers, @KafkaListener with Avro

What You'll Learn

After completing this article, you will be able to:

  1. Understand why schemas matter at scale
  2. Choose between Avro, Protobuf, and JSON Schema
  3. Configure compatibility modes for safe evolution
  4. Implement Spring Kafka with Schema Registry
  5. Handle schema evolution without breaking consumers

Production Story: The Breaking Change That Broke Everything

The Incident

Our order service team was releasing a "simple" change: renaming a field from customer_id to customerId for consistency. They deployed the producer update on Tuesday afternoon. Within minutes, all downstream consumers started failing with deserialization errors.

The Investigation

JAVA(17 lines)
Code
Loading syntax highlighter...

The impact was immediate:

┌─────────────────────────────────────────────────────────────────────┐
│                    THE BREAKING SCHEMA CHANGE                       │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  14:00 - Producer v2 deployed (customer_id → customerId)            │
│                                                                     │
│  Topic: orders                                                      │
│  ┌────────┬────────┬────────┬────────┬────────┬────────┬────────┐   │
│  │ v1     │ v1     │ v1     │ v2     │ v2     │ v2     │ v2     │   │
│  │ msg    │ msg    │ msg    │ msg    │ msg    │ msg    │ msg    │   │
│  └────────┴────────┴────────┴────────┴────────┴────────┴────────┘   │
│                              ▲                                      │
│                              │ Schema change here                   │
│                              │                                      │
│  14:01 - Consumer A (v1 schema)                                     │
│          ├── Reads v1 messages: OK                                  │
│          └── Reads v2 messages: FAIL! Missing field!                │
│                                                                     │
│  14:02 - Consumer B (v1 schema)                                     │
│          └── Same failure                                           │
│                                                                     │
│  14:03 - Consumer C (v1 schema)                                     │
│          └── Same failure                                           │
│                                                                     │
│  14:05 - All order processing stopped                               │
│          - 5 consumer groups affected                               │
│          - 15 services down                                         │
│          - Orders piling up                                         │
│                                                                     │
│  14:30 - Rollback producer to v1                                    │
│          - But v2 messages still in topic!                          │
│          - Consumers still failing on v2 messages                   │
│                                                                     │
│  15:00 - Manual intervention to skip bad messages                   │
│          - Lost 500 orders in the gap                               │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

The Root Cause

No schema registry, no compatibility checks:

  1. Producer changed schema without compatibility validation
  2. Consumer had no way to handle old vs new format
  3. No schema versioning - just "whatever JSON we send"

The Fix

JAVA(45 lines)
Code
Loading syntax highlighter...

After the fix: Zero breaking changes, safe schema evolution.


Mental Model: Schema Registry Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                    SCHEMA REGISTRY ARCHITECTURE                         │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌───────────────────┐                    ┌───────────────────┐         │
│  │    Producer       │                    │    Consumer       │         │
│  │                   │                    │                   │         │
│  │  OrderEvent obj   │                    │  Deserialize      │         │
│  │       │           │                    │       ▲           │         │
│  │       ▼           │                    │       │           │         │
│  │  KafkaAvro        │                    │  KafkaAvro        │         │
│  │  Serializer       │                    │  Deserializer     │         │
│  └────────┬──────────┘                    └───────┬───────────┘         │
│           │                                       │                     │
│           │ 1. Get/Register Schema                │ 4. Get Schema       │
│           │                                       │    by ID            │
│           ▼                                       ▼                     │
│  ┌──────────────────────────────────────────────────────────────┐       │
│  │                     SCHEMA REGISTRY                          │       │
│  │                                                              │       │
│  │  Subjects:                                                   │       │
│  │  ┌─────────────────────────────────────────────────────────┐ │       │
│  │  │ orders-value                                            │ │       │
│  │  │   ├── Version 1: Schema ID 1 (original)                 │ │       │
│  │  │   ├── Version 2: Schema ID 5 (added field)              │ │       │
│  │  │   └── Version 3: Schema ID 12 (current)                 │ │       │
│  │  ├─────────────────────────────────────────────────────────┤ │       │
│  │  │ orders-key                                              │ │       │
│  │  │   └── Version 1: Schema ID 2 (string key)               │ │       │
│  │  ├─────────────────────────────────────────────────────────┤ │       │
│  │  │ users-value                                             │ │       │
│  │  │   └── Version 1: Schema ID 3                            │ │       │
│  │  └─────────────────────────────────────────────────────────┘ │       │
│  │                                                              │       │
│  │  Compatibility Settings:                                     │       │
│  │  orders-value: BACKWARD                                      │       │
│  │  users-value: FULL                                           │       │
│  │                                                              │       │
│  └──────────────────────────────────────────────────────────────┘       │
│           │                                       │                     │
│           │ 2. Schema ID                          │                     │
│           ▼                                       │                     │
│  ┌──────────────────────────────────────────────────────────────┐       │
│  │                        KAFKA                                 │       │
│  │                                                              │       │
│  │  Message Format:                                             │       │
│  │  ┌──────┬──────────┬────────────────────────────────────┐    │       │
│  │  │ 0x0  │ Schema   │           Avro Payload             │    │       │
│  │  │      │ ID (4B)  │                                    │    │       │
│  │  └──────┴──────────┴────────────────────────────────────┘    │       │
│  │   Magic  Schema ID  Binary Avro data (no schema embedded)    │       │
│  │   byte   (int)                                               │       │
│  │                                                              │       │
│  └──────────────────────────────────────────────────────────────┘       │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Message on wire (example):
┌──────┬──────────┬──────────────────────────────────────┐
│ 0x00 │ 00000005 │ <binary avro: {customerId:"C123"...}>│
└──────┴──────────┴──────────────────────────────────────┘
       Schema ID=5

Consumer flow:
1. Read message from Kafka
2. Extract Schema ID (bytes 1-4)
3. Fetch schema from Registry (cached locally)
4. Deserialize payload using schema
5. Return typed object

Deep Dive

1. Schema Formats Comparison

┌─────────────────┬──────────────────┬──────────────────┬──────────────────┐
│                 │      Avro        │    Protobuf      │   JSON Schema    │
├─────────────────┼──────────────────┼──────────────────┼──────────────────┤
│ Format          │ Binary           │ Binary           │ Text (JSON)      │
│ Size            │ Compact          │ Very compact     │ Large            │
│ Speed           │ Fast             │ Fastest          │ Slow             │
├─────────────────┼──────────────────┼──────────────────┼──────────────────┤
│ Schema in msg   │ No (ID only)     │ No (ID only)     │ No (ID only)     │
│ Schema language │ JSON             │ Proto files      │ JSON             │
│ Code gen        │ Optional         │ Required         │ Optional         │
├─────────────────┼──────────────────┼──────────────────┼──────────────────┤
│ Evolution       │ Excellent        │ Good             │ Good             │
│ Default values  │ Yes              │ Yes              │ Yes              │
│ Aliases         │ Yes              │ No               │ No               │
├─────────────────┼──────────────────┼──────────────────┼──────────────────┤
│ Ecosystem       │ Hadoop/Kafka     │ gRPC/Google      │ Web APIs         │
│ Learning curve  │ Medium           │ Medium           │ Low              │
├─────────────────┼──────────────────┼──────────────────┼──────────────────┤
│ Best for        │ Kafka events     │ High-perf RPC    │ Existing JSON    │
│                 │ Data pipelines   │ Cross-language   │ Web services     │
└─────────────────┴──────────────────┴──────────────────┴──────────────────┘

Avro Schema Example

JSON(62 lines)
Code
Loading syntax highlighter...

Protobuf Schema Example

PROTOBUF(31 lines)
Code
Loading syntax highlighter...

JSON Schema Example

JSON(40 lines)
Code
Loading syntax highlighter...

2. Compatibility Modes

┌─────────────────────────────────────────────────────────────────────────┐
│                      COMPATIBILITY MODES                                │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  BACKWARD (Recommended for most cases)                                  │
│  ─────────────────────────────────────                                  │
│  New schema can READ data written with OLD schema                       │
│                                                                         │
│  Allowed changes:                                                       │
│  ✓ Delete fields (if they have defaults in old schema)                  │
│  ✓ Add optional fields (with defaults)                                  │
│                                                                         │
│  Use case: Upgrade consumers first, then producers                      │
│                                                                         │
│  Example:                                                               │
│  Schema v1: {name: string, age: int}                                    │
│  Schema v2: {name: string, age: int, email: string = ""}  ✓ BACKWARD    │
│             (v2 consumer can read v1 data, uses default for email)      │
│                                                                         │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  FORWARD                                                                │
│  ───────                                                                │
│  Old schema can READ data written with NEW schema                       │
│                                                                         │
│  Allowed changes:                                                       │
│  ✓ Add fields (old readers ignore)                                      │
│  ✓ Delete optional fields                                               │
│                                                                         │
│  Use case: Upgrade producers first, consumers later                     │
│                                                                         │
│  Example:                                                               │
│  Schema v1: {name: string, age: int}                                    │
│  Schema v2: {name: string}  ✓ FORWARD                                   │
│             (v1 consumer can read v2 data, age missing but optional)    │
│                                                                         │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  FULL (Most restrictive)                                                │
│  ─────                                                                  │
│  Both BACKWARD and FORWARD compatible                                   │
│                                                                         │
│  Allowed changes:                                                       │
│  ✓ Add optional fields with defaults                                    │
│  ✓ Delete optional fields                                               │
│                                                                         │
│  Use case: When you can't control deployment order                      │
│                                                                         │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  BACKWARD_TRANSITIVE / FORWARD_TRANSITIVE / FULL_TRANSITIVE             │
│  ─────────────────────────────────────────────────────────              │
│  Same as above, but checked against ALL previous versions               │
│  (not just the immediately previous version)                            │
│                                                                         │
│  Use case: Long-lived topics, strict compatibility                      │
│                                                                         │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  NONE                                                                   │
│  ────                                                                   │
│  No compatibility checking                                              │
│  ⚠️ DANGEROUS - Use only for development                                │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Compatible Evolution Examples

AVRO SCHEMA EVOLUTION PATTERNS:

1. ADDING AN OPTIONAL FIELD (Backward compatible)
   ─────────────────────────────────────────────
   v1: {"name": "OrderEvent", "fields": [
         {"name": "orderId", "type": "string"}
       ]}

   v2: {"name": "OrderEvent", "fields": [
         {"name": "orderId", "type": "string"},
         {"name": "priority", "type": "string", "default": "NORMAL"}  ← NEW
       ]}

   ✓ v2 reader + v1 data: Uses default "NORMAL" for priority
   ✓ v1 reader + v2 data: Ignores priority field (if FORWARD too)


2. REMOVING A FIELD (Forward compatible)
   ──────────────────────────────────────
   v1: {"name": "OrderEvent", "fields": [
         {"name": "orderId", "type": "string"},
         {"name": "legacyField", "type": "string", "default": ""}  ← Has default
       ]}

   v2: {"name": "OrderEvent", "fields": [
         {"name": "orderId", "type": "string"}
         // legacyField removed
       ]}

   ✓ v1 reader + v2 data: Uses default for missing legacyField


3. RENAMING A FIELD (Use aliases)
   ────────────────────────────────
   v1: {"name": "OrderEvent", "fields": [
         {"name": "customer_id", "type": "string"}
       ]}

   v2: {"name": "OrderEvent", "fields": [
         {"name": "customerId", "type": "string", "aliases": ["customer_id"]}
       ]}

   ✓ v2 reader reads customer_id as customerId


4. WIDENING A TYPE (Limited support)
   ───────────────────────────────────
   v1: {"name": "OrderEvent", "fields": [
         {"name": "amount", "type": "int"}
       ]}

   v2: {"name": "OrderEvent", "fields": [
         {"name": "amount", "type": "long"}  ← Widened
       ]}

   ✓ int can be read as long (promotion)


5. ADDING ENUM VALUE (Careful!)
   ─────────────────────────────
   v1: {"type": "enum", "name": "Status",
        "symbols": ["PENDING", "COMPLETE"]}

   v2: {"type": "enum", "name": "Status",
        "symbols": ["PENDING", "COMPLETE", "CANCELLED"],  ← NEW VALUE
        "default": "PENDING"}  ← Required for backward compat!

   ✓ v1 reader gets default for unknown CANCELLED

3. Spring Kafka with Schema Registry

JAVA(61 lines)
Code
Loading syntax highlighter...

Using with @KafkaListener

JAVA(32 lines)
Code
Loading syntax highlighter...

4. Subject Naming Strategies

JAVA(33 lines)
Code
Loading syntax highlighter...

5. Schema Registry Operations

BASH(32 lines)
Code
Loading syntax highlighter...
JAVA(26 lines)
Code
Loading syntax highlighter...

Common Mistakes

Mistake 1: Not Setting Compatibility Mode

BASH(10 lines)
Code
Loading syntax highlighter...

Mistake 2: Adding Required Fields Without Defaults

JSON(14 lines)
Code
Loading syntax highlighter...

Mistake 3: Removing Required Fields

JSON(18 lines)
Code
Loading syntax highlighter...

Mistake 4: Using specific.avro.reader=false by Mistake

JAVA(10 lines)
Code
Loading syntax highlighter...

Mistake 5: Not Validating Schemas in CI/CD

YAML(16 lines)
Code
Loading syntax highlighter...

Debug This

Scenario: Deserialization Fails Intermittently

Symptoms:
  • Some messages deserialize fine
  • Others fail with AvroTypeException
  • Pattern seems random
Investigation:
JAVA(19 lines)
Code
Loading syntax highlighter...
BASH(15 lines)
Code
Loading syntax highlighter...
Common Causes:
  1. Multiple schema versions in topic: Old messages with old schema
  2. Schema ID mismatch: Registry restored from backup, IDs changed
  3. Wrong subject naming strategy: Looking up wrong subject
  4. Corrupted message: Magic byte or schema ID corrupted
Resolution:
JAVA(10 lines)
Code
Loading syntax highlighter...

Exercises

Exercise 1: Design an Evolving Schema

Design an Avro schema for a UserEvent that:
  1. Starts with basic fields (id, name, email)
  2. Evolves to add optional phone
  3. Evolves to add address (nested type)
  4. Maintains BACKWARD compatibility throughout

Exercise 2: Set Up Schema Validation CI

Create a CI pipeline that:

  1. Reads Avro schemas from src/main/avro/
  2. Validates compatibility with Schema Registry
  3. Fails build on incompatible changes
  4. Provides clear error messages

Exercise 3: Implement Schema Migration

Build a tool that:

  1. Reads all messages from a topic (old schema)
  2. Transforms to new schema
  3. Writes to new topic
  4. Verifies data integrity

Exercise 4: Multi-Schema Topic Handler

Create a consumer that:

  1. Handles multiple message types on one topic
  2. Uses record name strategy
  3. Routes to different handlers based on type
  4. Gracefully handles unknown types

Exercise 5: Schema Registry HA Setup

Deploy Schema Registry in HA mode:

  1. Multiple Schema Registry instances
  2. Shared Kafka backend for schemas
  3. Load balancer in front
  4. Test failover behavior

Interview Questions

Q1: Why use Schema Registry instead of including schema in each message?

A: Including schema in each message is wasteful and problematic:
Size impact:
  • Schema can be 1KB-10KB
  • Message payload might be 100 bytes
  • 10x-100x overhead per message
  • At 1M messages/day, that's 1-10GB of wasted bandwidth
Schema Registry benefits:
  • Schema stored once, referenced by 4-byte ID
  • ID embedded in message header
  • Consumers cache schemas locally
  • Central schema management and evolution
Additional benefits:
  • Compatibility enforcement
  • Schema versioning and history
  • UI for schema browsing
  • Integration with data governance tools

Q2: Explain BACKWARD vs FORWARD compatibility.

A: They differ in which direction compatibility is guaranteed:
BACKWARD (Recommended):
  • New schema can read old data
  • Allows: Add optional fields (with defaults), delete fields
  • Deploy: Consumers first, then producers
  • Example: Add email field with default - old messages work
FORWARD:
  • Old schema can read new data
  • Allows: Add fields (old readers ignore), delete optional fields
  • Deploy: Producers first, then consumers
  • Example: Remove legacyField - old consumers still work
FULL:
  • Both backward and forward
  • Most restrictive changes
  • Safe for any deployment order
Transitive variants (BACKWARD_TRANSITIVE, etc.):
  • Check compatibility against ALL versions, not just previous
  • Required for long-lived topics

Q3: How do you handle breaking schema changes?

A: True breaking changes require special handling:
Option 1: New topic
orders-v1 → orders-v2 (new topic)
Run consumers for both during migration
Migrate producers to v2
Drain v1, then retire
Option 2: Dual-write
Producer writes to both schemas temporarily
Consumers migrate gradually
Stop dual-write when all migrated
Option 3: Schema transformation
Introduce transform service
Reads old format, writes new format
Consumers use new format only
Option 4: Consumer-side transformation
Consumer handles multiple schema versions
Transform old format to new in memory
Store only new format
Best practice: Avoid breaking changes through good initial design and use of optional fields/defaults.

Q4: What's the difference between subject naming strategies?

A: Subject naming determines how schemas are organized in the registry:
TopicNameStrategy (Default):
  • Subject: <topic>-value or <topic>-key
  • One schema per topic
  • Simple, most common
  • Limitation: Can't reuse schema across topics
RecordNameStrategy:
  • Subject: <fully.qualified.record.name>
  • Schema shared across topics
  • Good for: Same event type on multiple topics
  • Example: com.example.OrderEvent used in orders, order-updates
TopicRecordNameStrategy:
  • Subject: <topic>-<fully.qualified.record.name>
  • Topic-specific versions of same type
  • Good for: When same type evolves differently per topic
JAVA(3 lines)
Code
Loading syntax highlighter...

Q5: How do you test schema compatibility in a CI/CD pipeline?

A: Schema validation should be part of CI/CD:
Approach 1: Schema Registry test endpoint
BASH(7 lines)
Code
Loading syntax highlighter...
Approach 2: Maven/Gradle plugin
XML(15 lines)
Code
Loading syntax highlighter...
Approach 3: Git hooks
BASH(5 lines)
Code
Loading syntax highlighter...
Best practice: Fail the build on incompatible changes, requiring explicit override for intentional breaking changes.

Summary

Key Takeaways

  1. Schema Registry centralizes schema management, storing schemas once and referencing by ID
  2. Choose format based on needs: Avro for Kafka, Protobuf for performance, JSON Schema for web APIs
  3. BACKWARD compatibility is recommended - new consumers can read old data
  4. Always add defaults when adding new fields for backward compatibility
  5. Use aliases in Avro to rename fields safely
  6. Subject naming strategy determines how schemas are organized - TopicNameStrategy is default
  7. Validate schemas in CI/CD to catch breaking changes before deployment
  8. Configure specific.avro.reader=true for typed objects in Spring Kafka

Quick Reference

Spring Kafka + Avro Configuration

YAML(14 lines)
Code
Loading syntax highlighter...

Schema Registry REST API

OperationEndpoint
List subjectsGET /subjects
Get versionsGET /subjects/{subject}/versions
Get schemaGET /subjects/{subject}/versions/{version}
RegisterPOST /subjects/{subject}/versions
Test compatibility
POST /compatibility/subjects/{subject}/versions/{version}
Set compatibilityPUT /config/{subject}

Series Navigation

PreviousCurrentNext
Part 11: Exactly-OncePart 12: Schema RegistryPart 13: Security

Series Overview

  • Part 0: How to Use This Series
  • Parts 1-4: Fundamentals
  • Parts 5-7: Producers
  • Parts 8-11: Consumers
  • Parts 12-14: Operations (Schema Registry, Security, Monitoring)
  • Parts 15-17: Kafka Streams
  • Parts 18-20: Patterns & Practices
  • Part 21: Cheatsheet & Decision Guide