Devops
Schema Registry & Evolution
At a Glance
| Aspect | Details |
|---|---|
| Topic | Schema Registry, Avro/Protobuf/JSON Schema, compatibility modes |
| Complexity | Intermediate |
| Prerequisites | Parts 5-7 (Producers), Parts 8-11 (Consumers) |
| Time | 90 minutes |
| Spring Kafka | Schema Registry serializers, @KafkaListener with Avro |
What You'll Learn
After completing this article, you will be able to:
- Understand why schemas matter at scale
- Choose between Avro, Protobuf, and JSON Schema
- Configure compatibility modes for safe evolution
- Implement Spring Kafka with Schema Registry
- Handle schema evolution without breaking consumers
Production Story: The Breaking Change That Broke Everything
The Incident
Our order service team was releasing a "simple" change: renaming a field from
customer_id to customerId for consistency. They deployed the producer update on Tuesday afternoon. Within minutes, all downstream consumers started failing with deserialization errors.The Investigation
JAVA(17 lines)CodeLoading syntax highlighter...
The impact was immediate:
┌─────────────────────────────────────────────────────────────────────┐ │ THE BREAKING SCHEMA CHANGE │ ├─────────────────────────────────────────────────────────────────────┤ │ │ │ 14:00 - Producer v2 deployed (customer_id → customerId) │ │ │ │ Topic: orders │ │ ┌────────┬────────┬────────┬────────┬────────┬────────┬────────┐ │ │ │ v1 │ v1 │ v1 │ v2 │ v2 │ v2 │ v2 │ │ │ │ msg │ msg │ msg │ msg │ msg │ msg │ msg │ │ │ └────────┴────────┴────────┴────────┴────────┴────────┴────────┘ │ │ ▲ │ │ │ Schema change here │ │ │ │ │ 14:01 - Consumer A (v1 schema) │ │ ├── Reads v1 messages: OK │ │ └── Reads v2 messages: FAIL! Missing field! │ │ │ │ 14:02 - Consumer B (v1 schema) │ │ └── Same failure │ │ │ │ 14:03 - Consumer C (v1 schema) │ │ └── Same failure │ │ │ │ 14:05 - All order processing stopped │ │ - 5 consumer groups affected │ │ - 15 services down │ │ - Orders piling up │ │ │ │ 14:30 - Rollback producer to v1 │ │ - But v2 messages still in topic! │ │ - Consumers still failing on v2 messages │ │ │ │ 15:00 - Manual intervention to skip bad messages │ │ - Lost 500 orders in the gap │ │ │ └─────────────────────────────────────────────────────────────────────┘
The Root Cause
No schema registry, no compatibility checks:
- Producer changed schema without compatibility validation
- Consumer had no way to handle old vs new format
- No schema versioning - just "whatever JSON we send"
The Fix
JAVA(45 lines)CodeLoading syntax highlighter...
After the fix: Zero breaking changes, safe schema evolution.
Mental Model: Schema Registry Architecture
┌─────────────────────────────────────────────────────────────────────────┐ │ SCHEMA REGISTRY ARCHITECTURE │ ├─────────────────────────────────────────────────────────────────────────┤ │ │ │ ┌───────────────────┐ ┌───────────────────┐ │ │ │ Producer │ │ Consumer │ │ │ │ │ │ │ │ │ │ OrderEvent obj │ │ Deserialize │ │ │ │ │ │ │ ▲ │ │ │ │ ▼ │ │ │ │ │ │ │ KafkaAvro │ │ KafkaAvro │ │ │ │ Serializer │ │ Deserializer │ │ │ └────────┬──────────┘ └───────┬───────────┘ │ │ │ │ │ │ │ 1. Get/Register Schema │ 4. Get Schema │ │ │ │ by ID │ │ ▼ ▼ │ │ ┌──────────────────────────────────────────────────────────────┐ │ │ │ SCHEMA REGISTRY │ │ │ │ │ │ │ │ Subjects: │ │ │ │ ┌─────────────────────────────────────────────────────────┐ │ │ │ │ │ orders-value │ │ │ │ │ │ ├── Version 1: Schema ID 1 (original) │ │ │ │ │ │ ├── Version 2: Schema ID 5 (added field) │ │ │ │ │ │ └── Version 3: Schema ID 12 (current) │ │ │ │ │ ├─────────────────────────────────────────────────────────┤ │ │ │ │ │ orders-key │ │ │ │ │ │ └── Version 1: Schema ID 2 (string key) │ │ │ │ │ ├─────────────────────────────────────────────────────────┤ │ │ │ │ │ users-value │ │ │ │ │ │ └── Version 1: Schema ID 3 │ │ │ │ │ └─────────────────────────────────────────────────────────┘ │ │ │ │ │ │ │ │ Compatibility Settings: │ │ │ │ orders-value: BACKWARD │ │ │ │ users-value: FULL │ │ │ │ │ │ │ └──────────────────────────────────────────────────────────────┘ │ │ │ │ │ │ │ 2. Schema ID │ │ │ ▼ │ │ │ ┌──────────────────────────────────────────────────────────────┐ │ │ │ KAFKA │ │ │ │ │ │ │ │ Message Format: │ │ │ │ ┌──────┬──────────┬────────────────────────────────────┐ │ │ │ │ │ 0x0 │ Schema │ Avro Payload │ │ │ │ │ │ │ ID (4B) │ │ │ │ │ │ └──────┴──────────┴────────────────────────────────────┘ │ │ │ │ Magic Schema ID Binary Avro data (no schema embedded) │ │ │ │ byte (int) │ │ │ │ │ │ │ └──────────────────────────────────────────────────────────────┘ │ │ │ └─────────────────────────────────────────────────────────────────────────┘ Message on wire (example): ┌──────┬──────────┬──────────────────────────────────────┐ │ 0x00 │ 00000005 │ <binary avro: {customerId:"C123"...}>│ └──────┴──────────┴──────────────────────────────────────┘ Schema ID=5 Consumer flow: 1. Read message from Kafka 2. Extract Schema ID (bytes 1-4) 3. Fetch schema from Registry (cached locally) 4. Deserialize payload using schema 5. Return typed object
Deep Dive
1. Schema Formats Comparison
┌─────────────────┬──────────────────┬──────────────────┬──────────────────┐ │ │ Avro │ Protobuf │ JSON Schema │ ├─────────────────┼──────────────────┼──────────────────┼──────────────────┤ │ Format │ Binary │ Binary │ Text (JSON) │ │ Size │ Compact │ Very compact │ Large │ │ Speed │ Fast │ Fastest │ Slow │ ├─────────────────┼──────────────────┼──────────────────┼──────────────────┤ │ Schema in msg │ No (ID only) │ No (ID only) │ No (ID only) │ │ Schema language │ JSON │ Proto files │ JSON │ │ Code gen │ Optional │ Required │ Optional │ ├─────────────────┼──────────────────┼──────────────────┼──────────────────┤ │ Evolution │ Excellent │ Good │ Good │ │ Default values │ Yes │ Yes │ Yes │ │ Aliases │ Yes │ No │ No │ ├─────────────────┼──────────────────┼──────────────────┼──────────────────┤ │ Ecosystem │ Hadoop/Kafka │ gRPC/Google │ Web APIs │ │ Learning curve │ Medium │ Medium │ Low │ ├─────────────────┼──────────────────┼──────────────────┼──────────────────┤ │ Best for │ Kafka events │ High-perf RPC │ Existing JSON │ │ │ Data pipelines │ Cross-language │ Web services │ └─────────────────┴──────────────────┴──────────────────┴──────────────────┘
Avro Schema Example
JSON(62 lines)CodeLoading syntax highlighter...
Protobuf Schema Example
PROTOBUF(31 lines)CodeLoading syntax highlighter...
JSON Schema Example
JSON(40 lines)CodeLoading syntax highlighter...
2. Compatibility Modes
┌─────────────────────────────────────────────────────────────────────────┐ │ COMPATIBILITY MODES │ ├─────────────────────────────────────────────────────────────────────────┤ │ │ │ BACKWARD (Recommended for most cases) │ │ ───────────────────────────────────── │ │ New schema can READ data written with OLD schema │ │ │ │ Allowed changes: │ │ ✓ Delete fields (if they have defaults in old schema) │ │ ✓ Add optional fields (with defaults) │ │ │ │ Use case: Upgrade consumers first, then producers │ │ │ │ Example: │ │ Schema v1: {name: string, age: int} │ │ Schema v2: {name: string, age: int, email: string = ""} ✓ BACKWARD │ │ (v2 consumer can read v1 data, uses default for email) │ │ │ ├─────────────────────────────────────────────────────────────────────────┤ │ │ │ FORWARD │ │ ─────── │ │ Old schema can READ data written with NEW schema │ │ │ │ Allowed changes: │ │ ✓ Add fields (old readers ignore) │ │ ✓ Delete optional fields │ │ │ │ Use case: Upgrade producers first, consumers later │ │ │ │ Example: │ │ Schema v1: {name: string, age: int} │ │ Schema v2: {name: string} ✓ FORWARD │ │ (v1 consumer can read v2 data, age missing but optional) │ │ │ ├─────────────────────────────────────────────────────────────────────────┤ │ │ │ FULL (Most restrictive) │ │ ───── │ │ Both BACKWARD and FORWARD compatible │ │ │ │ Allowed changes: │ │ ✓ Add optional fields with defaults │ │ ✓ Delete optional fields │ │ │ │ Use case: When you can't control deployment order │ │ │ ├─────────────────────────────────────────────────────────────────────────┤ │ │ │ BACKWARD_TRANSITIVE / FORWARD_TRANSITIVE / FULL_TRANSITIVE │ │ ───────────────────────────────────────────────────────── │ │ Same as above, but checked against ALL previous versions │ │ (not just the immediately previous version) │ │ │ │ Use case: Long-lived topics, strict compatibility │ │ │ ├─────────────────────────────────────────────────────────────────────────┤ │ │ │ NONE │ │ ──── │ │ No compatibility checking │ │ ⚠️ DANGEROUS - Use only for development │ │ │ └─────────────────────────────────────────────────────────────────────────┘
Compatible Evolution Examples
AVRO SCHEMA EVOLUTION PATTERNS: 1. ADDING AN OPTIONAL FIELD (Backward compatible) ───────────────────────────────────────────── v1: {"name": "OrderEvent", "fields": [ {"name": "orderId", "type": "string"} ]} v2: {"name": "OrderEvent", "fields": [ {"name": "orderId", "type": "string"}, {"name": "priority", "type": "string", "default": "NORMAL"} ← NEW ]} ✓ v2 reader + v1 data: Uses default "NORMAL" for priority ✓ v1 reader + v2 data: Ignores priority field (if FORWARD too) 2. REMOVING A FIELD (Forward compatible) ────────────────────────────────────── v1: {"name": "OrderEvent", "fields": [ {"name": "orderId", "type": "string"}, {"name": "legacyField", "type": "string", "default": ""} ← Has default ]} v2: {"name": "OrderEvent", "fields": [ {"name": "orderId", "type": "string"} // legacyField removed ]} ✓ v1 reader + v2 data: Uses default for missing legacyField 3. RENAMING A FIELD (Use aliases) ──────────────────────────────── v1: {"name": "OrderEvent", "fields": [ {"name": "customer_id", "type": "string"} ]} v2: {"name": "OrderEvent", "fields": [ {"name": "customerId", "type": "string", "aliases": ["customer_id"]} ]} ✓ v2 reader reads customer_id as customerId 4. WIDENING A TYPE (Limited support) ─────────────────────────────────── v1: {"name": "OrderEvent", "fields": [ {"name": "amount", "type": "int"} ]} v2: {"name": "OrderEvent", "fields": [ {"name": "amount", "type": "long"} ← Widened ]} ✓ int can be read as long (promotion) 5. ADDING ENUM VALUE (Careful!) ───────────────────────────── v1: {"type": "enum", "name": "Status", "symbols": ["PENDING", "COMPLETE"]} v2: {"type": "enum", "name": "Status", "symbols": ["PENDING", "COMPLETE", "CANCELLED"], ← NEW VALUE "default": "PENDING"} ← Required for backward compat! ✓ v1 reader gets default for unknown CANCELLED
3. Spring Kafka with Schema Registry
JAVA(61 lines)CodeLoading syntax highlighter...
Using with @KafkaListener
JAVA(32 lines)CodeLoading syntax highlighter...
4. Subject Naming Strategies
JAVA(33 lines)CodeLoading syntax highlighter...
5. Schema Registry Operations
BASH(32 lines)CodeLoading syntax highlighter...
JAVA(26 lines)CodeLoading syntax highlighter...
Common Mistakes
Mistake 1: Not Setting Compatibility Mode
BASH(10 lines)CodeLoading syntax highlighter...
Mistake 2: Adding Required Fields Without Defaults
JSON(14 lines)CodeLoading syntax highlighter...
Mistake 3: Removing Required Fields
JSON(18 lines)CodeLoading syntax highlighter...
Mistake 4: Using specific.avro.reader=false by Mistake
JAVA(10 lines)CodeLoading syntax highlighter...
Mistake 5: Not Validating Schemas in CI/CD
YAML(16 lines)CodeLoading syntax highlighter...
Debug This
Scenario: Deserialization Fails Intermittently
Symptoms:
- Some messages deserialize fine
- Others fail with
AvroTypeException - Pattern seems random
Investigation:
JAVA(19 lines)CodeLoading syntax highlighter...
BASH(15 lines)CodeLoading syntax highlighter...
Common Causes:
- Multiple schema versions in topic: Old messages with old schema
- Schema ID mismatch: Registry restored from backup, IDs changed
- Wrong subject naming strategy: Looking up wrong subject
- Corrupted message: Magic byte or schema ID corrupted
Resolution:
JAVA(10 lines)CodeLoading syntax highlighter...
Exercises
Exercise 1: Design an Evolving Schema
Design an Avro schema for a
UserEvent that:- Starts with basic fields (id, name, email)
- Evolves to add optional phone
- Evolves to add address (nested type)
- Maintains BACKWARD compatibility throughout
Exercise 2: Set Up Schema Validation CI
Create a CI pipeline that:
- Reads Avro schemas from
src/main/avro/ - Validates compatibility with Schema Registry
- Fails build on incompatible changes
- Provides clear error messages
Exercise 3: Implement Schema Migration
Build a tool that:
- Reads all messages from a topic (old schema)
- Transforms to new schema
- Writes to new topic
- Verifies data integrity
Exercise 4: Multi-Schema Topic Handler
Create a consumer that:
- Handles multiple message types on one topic
- Uses record name strategy
- Routes to different handlers based on type
- Gracefully handles unknown types
Exercise 5: Schema Registry HA Setup
Deploy Schema Registry in HA mode:
- Multiple Schema Registry instances
- Shared Kafka backend for schemas
- Load balancer in front
- Test failover behavior
Interview Questions
Q1: Why use Schema Registry instead of including schema in each message?
A: Including schema in each message is wasteful and problematic:
Size impact:
- Schema can be 1KB-10KB
- Message payload might be 100 bytes
- 10x-100x overhead per message
- At 1M messages/day, that's 1-10GB of wasted bandwidth
Schema Registry benefits:
- Schema stored once, referenced by 4-byte ID
- ID embedded in message header
- Consumers cache schemas locally
- Central schema management and evolution
Additional benefits:
- Compatibility enforcement
- Schema versioning and history
- UI for schema browsing
- Integration with data governance tools
Q2: Explain BACKWARD vs FORWARD compatibility.
A: They differ in which direction compatibility is guaranteed:
BACKWARD (Recommended):
- New schema can read old data
- Allows: Add optional fields (with defaults), delete fields
- Deploy: Consumers first, then producers
- Example: Add
emailfield with default - old messages work
FORWARD:
- Old schema can read new data
- Allows: Add fields (old readers ignore), delete optional fields
- Deploy: Producers first, then consumers
- Example: Remove
legacyField- old consumers still work
FULL:
- Both backward and forward
- Most restrictive changes
- Safe for any deployment order
Transitive variants (BACKWARD_TRANSITIVE, etc.):
- Check compatibility against ALL versions, not just previous
- Required for long-lived topics
Q3: How do you handle breaking schema changes?
A: True breaking changes require special handling:
Option 1: New topic
orders-v1 → orders-v2 (new topic) Run consumers for both during migration Migrate producers to v2 Drain v1, then retire
Option 2: Dual-write
Producer writes to both schemas temporarily Consumers migrate gradually Stop dual-write when all migrated
Option 3: Schema transformation
Introduce transform service Reads old format, writes new format Consumers use new format only
Option 4: Consumer-side transformation
Consumer handles multiple schema versions Transform old format to new in memory Store only new format
Best practice: Avoid breaking changes through good initial design and use of optional fields/defaults.
Q4: What's the difference between subject naming strategies?
A: Subject naming determines how schemas are organized in the registry:
TopicNameStrategy (Default):
- Subject:
<topic>-valueor<topic>-key - One schema per topic
- Simple, most common
- Limitation: Can't reuse schema across topics
RecordNameStrategy:
- Subject:
<fully.qualified.record.name> - Schema shared across topics
- Good for: Same event type on multiple topics
- Example:
com.example.OrderEventused inorders,order-updates
TopicRecordNameStrategy:
- Subject:
<topic>-<fully.qualified.record.name> - Topic-specific versions of same type
- Good for: When same type evolves differently per topic
JAVA(3 lines)CodeLoading syntax highlighter...
Q5: How do you test schema compatibility in a CI/CD pipeline?
A: Schema validation should be part of CI/CD:
Approach 1: Schema Registry test endpoint
BASH(7 lines)CodeLoading syntax highlighter...
Approach 2: Maven/Gradle plugin
XML(15 lines)CodeLoading syntax highlighter...
Approach 3: Git hooks
BASH(5 lines)CodeLoading syntax highlighter...
Best practice: Fail the build on incompatible changes, requiring explicit override for intentional breaking changes.
Summary
Key Takeaways
-
Schema Registry centralizes schema management, storing schemas once and referencing by ID
-
Choose format based on needs: Avro for Kafka, Protobuf for performance, JSON Schema for web APIs
-
BACKWARD compatibility is recommended - new consumers can read old data
-
Always add defaults when adding new fields for backward compatibility
-
Use aliases in Avro to rename fields safely
-
Subject naming strategy determines how schemas are organized - TopicNameStrategy is default
-
Validate schemas in CI/CD to catch breaking changes before deployment
-
Configure
specific.avro.reader=truefor typed objects in Spring Kafka
Quick Reference
Spring Kafka + Avro Configuration
YAML(14 lines)CodeLoading syntax highlighter...
Schema Registry REST API
| Operation | Endpoint |
|---|---|
| List subjects | GET /subjects |
| Get versions | GET /subjects/{subject}/versions |
| Get schema | GET /subjects/{subject}/versions/{version} |
| Register | POST /subjects/{subject}/versions |
| Test compatibility | |
| Set compatibility | PUT /config/{subject} |
Series Navigation
| Previous | Current | Next |
|---|---|---|
| Part 11: Exactly-Once | Part 12: Schema Registry | Part 13: Security |
Series Overview
- Part 0: How to Use This Series
- Parts 1-4: Fundamentals
- Parts 5-7: Producers
- Parts 8-11: Consumers
- Parts 12-14: Operations (Schema Registry, Security, Monitoring)
- Parts 15-17: Kafka Streams
- Parts 18-20: Patterns & Practices
- Part 21: Cheatsheet & Decision Guide