How to Use This Series

📋 Who This Series Is For

This series is for mid-senior developers who:
  • ✅ Already use Kafka in production
  • ✅ Can write producers and consumers
  • ✅ Know what topics, partitions, and consumer groups are
  • ✅ Want to understand why things work the way they do
  • ✅ Need to debug production issues confidently
  • ✅ Are preparing for senior-level interviews
This series is NOT for:
  • ❌ Complete beginners (start with Kafka Quickstart first)
  • ❌ People who just need to send a message (use the docs)
  • ❌ Those looking for copy-paste solutions without understanding

🎯 What You'll Master

By the end of this series, you'll be able to:

| Skill | Parts |
|---|---|
| Explain why Kafka is fast | 1-2 |
| Configure producers for your exact guarantees | 5-7 |
| Debug consumer lag and rebalancing issues | 8-10 |
| Implement exactly-once processing | 11 |
| Design schemas that evolve safely | 12 |
| Secure a multi-tenant cluster | 13 |
| Build streaming applications | 15-17 |
| Design event-driven architectures | 18 |
| Handle errors gracefully | 19 |
| Pass senior Kafka interviews | All |

🗺️ Series Roadmap

┌─────────────────────────────────────────────────────────────────┐
│                    KAFKA COMPENDIUM ROADMAP                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   FUNDAMENTALS (Parts 1-4)          ← Start here                │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │ 1. Architecture & Storage    Why Kafka is fast          │   │
│   │ 2. Partitions & Replication  Data distribution          │   │
│   │ 3. Leaders, ISR & Faults     Reliability guarantees     │   │
│   │ 4. KRaft vs ZooKeeper        Cluster coordination       │   │
│   └─────────────────────────────────────────────────────────┘   │
│                           ↓                                     │
│   PRODUCERS (Parts 5-7)                                         │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │ 5. Producer Internals        How sending works          │   │
│   │ 6. Delivery Guarantees       acks, idempotence          │   │
│   │ 7. Advanced Patterns         Transactions, ordering     │   │
│   └─────────────────────────────────────────────────────────┘   │
│                           ↓                                     │
│   CONSUMERS (Parts 8-11)                                        │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │ 8. Consumer Internals        How fetching works         │   │
│   │ 9. Groups & Rebalancing      Coordination patterns      │   │
│   │ 10. Offset Management        Commit strategies          │   │
│   │ 11. Exactly-Once             End-to-end guarantees      │   │
│   └─────────────────────────────────────────────────────────┘   │
│                           ↓                                     │
│   OPERATIONS (Parts 12-14)                                      │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │ 12. Schema Registry          Schema evolution           │   │
│   │ 13. Security                 AuthN, AuthZ, encryption   │   │
│   │ 14. Monitoring               Metrics, alerting, ops     │   │
│   └─────────────────────────────────────────────────────────┘   │
│                           ↓                                     │
│   KAFKA STREAMS (Parts 15-17)                                   │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │ 15. Streams Fundamentals     KStream, KTable, topology  │   │
│   │ 16. State Stores             Stateful processing        │   │
│   │ 17. Windowing & Joins        Time-based operations      │   │
│   └─────────────────────────────────────────────────────────┘   │
│                           ↓                                     │
│   PATTERNS (Parts 18-20)                                        │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │ 18. Event-Driven Patterns    Sourcing, CQRS, Saga       │   │
│   │ 19. Error Handling           DLQ, retries, recovery     │   │
│   │ 20. Testing                  Unit, integration, E2E     │   │
│   └─────────────────────────────────────────────────────────┘   │
│                           ↓                                     │
│   REFERENCE (Part 21)                                           │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │ 21. Cheatsheet               Commands, configs, trees   │   │
│   └─────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

📚 Learning Paths

Path 1: Full Series

Time: ~35 hours | For: Comprehensive understanding

Read sequentially: Parts 0 → 21

Path 2: Producer Focus

Time: ~8 hours | For: Backend developers sending events
  1. Part 1: Architecture (why it matters)
  2. Parts 5-7: Producer deep dive
  3. Part 12: Schema Registry
  4. Part 19: Error handling

Path 3: Consumer Focus

Time: ~10 hours | For: Event processors, workers
  1. Part 1: Architecture
  2. Parts 8-11: Consumer deep dive
  3. Part 14: Monitoring (lag!)
  4. Part 19: Error handling

Path 4: Kafka Streams

Time: ~12 hours | For: Stream processing engineers
  1. Parts 1-2: Fundamentals
  2. Part 10: Offset management
  3. Parts 15-17: Kafka Streams
  4. Part 20: Testing

Path 5: Interview Prep

Time: ~6 hours | For: Senior role interviews

Focus on Interview Questions sections in:

  • Part 3: Fault tolerance
  • Part 6: Delivery guarantees
  • Part 9: Consumer groups
  • Part 11: Exactly-once
  • Part 21: Cheatsheet

🛠️ Prerequisites Check

Before starting, you should be able to answer:

| Question | If No, Study |
|---|---|
| What is a Kafka topic? | Kafka Quickstart |
| What is a partition? | Kafka Quickstart |
| What is a consumer group? | Kafka Basics |
| Can you write a Spring Boot app? | Spring Boot guides |
| What is a distributed system? | Distributed Systems basics |

Quick self-test: If you can explain "consumer group rebalancing" in one sentence, you're ready.
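
As a reference point for the self-test, the core idea of rebalancing can be sketched in a few lines of plain Java. This is a deliberately simplified illustration, not Kafka's actual implementation: real consumer groups use pluggable assignors and a broker-side group coordinator, while the hypothetical `RebalanceSketch.assign` helper below just deals partitions out round-robin whenever group membership changes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration of rebalancing; the class and method names are hypothetical.
class RebalanceSketch {

    // Spread partitions 0..partitionCount-1 across the given consumers in round-robin order.
    static Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        if (consumers.isEmpty()) {
            return assignment; // nobody in the group, nothing to assign
        }
        for (int partition = 0; partition < partitionCount; partition++) {
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Three consumers share six partitions.
        System.out.println(assign(List.of("consumer-a", "consumer-b", "consumer-c"), 6));
        // consumer-c leaves the group: its partitions are redistributed to the survivors.
        System.out.println(assign(List.of("consumer-a", "consumer-b"), 6));
    }
}
```

With three consumers each one owns two partitions; once `consumer-c` leaves, the same six partitions are split three and three between the remaining two. That redistribution step is the "rebalance".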

💻 Local Environment Setup

You'll need a local Kafka cluster to practice. Here's a production-like setup using KRaft (no ZooKeeper):

Docker Compose Setup

[Code listing not preserved in this copy: YAML, 81 lines]

Starting the Environment

[Code listing not preserved in this copy: Bash, 14 lines]
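
Once the containers are running, one quick sanity check is whether the broker's listener port accepts TCP connections. The sketch below is an illustration rather than part of the original setup: the class name `BrokerPortCheck` is hypothetical, and `localhost:9092` is an assumed listener address (the conventional Kafka default), so adjust it to whatever your compose file exposes.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks only that a TCP port is open, not that Kafka is healthy.
class BrokerPortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or unresolvable host
        }
    }

    public static void main(String[] args) {
        // Assumption: the broker listens on localhost:9092 unless overridden by arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9092;
        boolean up = isReachable(host, port, 2000);
        System.out.println(host + ":" + port + (up ? " is accepting connections" : " is not reachable"));
    }
}
```

A successful connection only shows that something is listening on the port. For a real health check, list topics with the Kafka CLI or inspect the cluster in Kafka UI.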

Spring Boot Dependencies

[Code listing not preserved in this copy: XML, 40 lines]
[Code listing not preserved in this copy: YAML, 14 lines]

📖 Article Structure

Every article in this series follows the same structure:

1. At a Glance

Quick reference table with difficulty, prerequisites, and time investment.

2. What You'll Learn

4-5 concrete learning objectives.

3. Production Story 🔥

A real-world scenario where things went wrong. Each story includes:

  • The setup (what the team was trying to do)
  • The symptoms (what went wrong)
  • The investigation (how they found the problem)
  • The root cause (the actual issue)
  • The fix (how they solved it)
Why? These stories make abstract concepts memorable.

4. Mental Model 🧠

ASCII diagram showing how the concept works visually. These diagrams are designed to be the "picture in your head" when thinking about the concept.

5. Deep Dive 🔬

The main technical content, typically 5-8 sections with code examples.

6. Common Mistakes ⚠️

5 mistakes with "wrong" and "right" examples. Learn from others' errors.

7. Debug This 🐛

An interactive scenario: given symptoms, figure out what's wrong. Answers are in a collapsible section.

8. Exercises 💻

5 hands-on exercises ranging from basic to advanced.

9. Interview Questions 🎀

5 senior-level questions with detailed answers. These are questions you might face in staff/principal engineer interviews.

10. Summary & Quick Reference

Key takeaways and copy-paste reference snippets.


🎯 How to Study

For Maximum Retention

  1. Read the Production Story first - It gives context for why the concept matters
  2. Draw the Mental Model yourself - Don't just look at it, recreate it
  3. Run the code examples - Type them, don't copy-paste
  4. Do at least 2 exercises - Application beats passive reading
  5. Explain Interview Questions out loud - Teaching solidifies learning

Spaced Repetition Schedule

For each part:

  • Day 1: Read and do exercises
  • Day 3: Review mental model, redo Debug This
  • Day 7: Answer interview questions without looking
  • Day 14: Quick review of summary
  • Day 30: Full review if preparing for interviews

⚑ Quick Start Verification

Let's verify your environment works. Create a simple producer/consumer:

[Code listing not preserved in this copy: Java, 24 lines]
[Code listing not preserved in this copy: Java, 9 lines]
[Code listing not preserved in this copy: Java, 18 lines]

If you see messages being sent and received, you're ready!


🔑 Key Conventions

Configuration Notation

[Code listing not preserved in this copy: Java, 4 lines]

Code Style

  • All examples use Java 21+ features
  • All Spring examples use Spring Boot 3.x and Spring Kafka 3.x
  • Kafka version is 3.6+ (KRaft mode)
  • Examples are complete and runnable (not snippets)

Terminology

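| Term | Meaning |
|---|---|
| Broker | Kafka server instance |
| Leader | Broker responsible for a partition's reads and writes |
| Follower | Broker that replicates a partition from its leader |
| ISR | In-Sync Replicas: the leader plus the followers that are caught up with it |
| Consumer Group | Set of consumers sharing the consumption of a topic |
| Offset | Position of a message in a partition |
| Lag | How far a consumer's committed offset trails the end of the partition log |
| Rebalance | Redistributing partitions among the consumers in a group |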
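
To make "offset" and "lag" concrete, here is a small arithmetic sketch in plain Java. It assumes the usual definition of lag as the partition's log end offset minus the group's committed offset. The `LagSketch` class and its sample numbers are hypothetical, and treating a partition with no committed offset as starting from 0 is a simplification (in a real consumer the starting point depends on its offset reset policy).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of how consumer lag is derived from offsets.
class LagSketch {

    // Lag for one partition: records written to the log but not yet committed as processed.
    static long partitionLag(long logEndOffset, long committedOffset) {
        return Math.max(0L, logEndOffset - committedOffset);
    }

    // Total lag of one consumer group across all partitions of a topic.
    static long totalLag(Map<Integer, Long> logEndOffsets, Map<Integer, Long> committedOffsets) {
        long total = 0;
        for (Map.Entry<Integer, Long> entry : logEndOffsets.entrySet()) {
            // Simplification: no committed offset is treated as "nothing consumed yet".
            long committed = committedOffsets.getOrDefault(entry.getKey(), 0L);
            total += partitionLag(entry.getValue(), committed);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<Integer, Long> logEnd = new LinkedHashMap<>();
        logEnd.put(0, 1_500L); // partition 0 has 1500 records
        logEnd.put(1, 900L);   // partition 1 has 900 records

        Map<Integer, Long> committed = new LinkedHashMap<>();
        committed.put(0, 1_200L); // group is 300 records behind on partition 0
        committed.put(1, 900L);   // group is fully caught up on partition 1

        System.out.println("partition 0 lag = " + partitionLag(1_500L, 1_200L)); // 300
        System.out.println("total lag = " + totalLag(logEnd, committed));        // 300
    }
}
```

A lag that stays near zero means the group keeps up with producers; a lag that grows steadily means consumers are processing more slowly than records arrive.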

🆘 Getting Help

If you get stuck:

  1. Check the Debug This section - Your issue might be a common one
  2. Review the Common Mistakes - You might be hitting a known pitfall
  3. Use Kafka UI - Visual inspection often reveals the problem
  4. Check broker logs - docker compose logs kafka

📚 What's Next?

Ready to dive in? Start with Part 1: Architecture & Storage Engine to understand why Kafka is fundamentally different from other message brokers.

You'll learn:

  • Why Kafka uses a log-structured storage
  • How zero-copy transfer makes it fast
  • What the page cache does for performance
  • How messages are actually stored on disk

📅 Review Schedule

  • Now: Set up your local environment
  • Part 1: Understand why Kafka is fast
  • Part 5: Send your first production-ready message
  • Part 8: Consume with confidence
  • Part 21: Have a complete mental model of Kafka
