Cluster Coordination (KRaft vs ZooKeeper)

At a Glance

Aspect	Details
Topic	Cluster coordination, metadata management, controller election
Complexity	Intermediate
Prerequisites	Parts 1-3 (Architecture, Partitions, Fault Tolerance)
Time	90 minutes
Kafka Version	3.6+ (KRaft production-ready)

What You'll Learn

After completing this article, you will be able to:

Explain why ZooKeeper is being removed from Kafka's architecture
Describe KRaft's controller quorum and how it handles metadata
Configure a KRaft-based Kafka cluster for production
Plan migration from ZooKeeper to KRaft mode
Troubleshoot controller election and metadata propagation issues

Production Story: The ZooKeeper Session Timeout Storm

The Incident

It was Black Friday, and our e-commerce platform was handling 5x normal traffic. At 2:47 PM, alerts started firing: "Consumer lag increasing across all topics." Within minutes, the entire Kafka cluster became unresponsive.

The Investigation

[Code listing: Bash, 5 lines]

The cluster had 15 brokers, 200+ consumers, and 50+ producers - all maintaining ZooKeeper sessions. Under extreme load:

1. GC pauses on ZooKeeper nodes exceeded the session timeout
2. Session expirations triggered mass reconnections
3. The reconnection storm overwhelmed ZooKeeper
4. Broker disconnections caused controller failover
5. Failures cascaded across the entire cluster

Timeline of Chaos:
14:47:00 - ZK node 1: Long GC pause (8 seconds)
14:47:08 - 500+ sessions expire simultaneously
14:47:09 - Reconnection storm begins
14:47:15 - ZK node 2 overwhelmed, stops responding
14:47:20 - Controller broker loses ZK session
14:47:21 - Controller election starts
14:47:45 - New controller elected, but ZK still struggling
14:48:00 - Brokers can't update metadata
14:48:30 - Producers start timing out
14:49:00 - Full cluster unavailability

The Root Cause

ZooKeeper's architecture wasn't designed for Kafka's scale:

┌─────────────────────────────────────────────────────────┐
│                   ZooKeeper Cluster                     │
│  ┌─────────┐    ┌─────────┐    ┌─────────┐              │
│  │  ZK-1   │    │  ZK-2   │    │  ZK-3   │              │
│  │ (Leader)│◄──►│(Follower│◄──►│(Follower│              │
│  └────┬────┘    └────┬────┘    └────┬────┘              │
│       │              │              │                   │
└───────┼──────────────┼──────────────┼───────────────────┘
        │              │              │
        ▼              ▼              ▼
   ┌─────────────────────────────────────────────┐
   │        ALL connections go to ZK             │
   │                                             │
   │  15 Brokers × 1 connection = 15             │
   │  200 Consumers × 1 connection = 200         │
   │  50 Producers (old clients) = 50            │
   │  Controller = 1                             │
   │  ─────────────────────────────              │
   │  Total: 266+ persistent connections         │
   │  + All their watches and ephemeral nodes    │
   └─────────────────────────────────────────────┘

The Fix (Short-term)

[Code listing: Properties, 11 lines]

The Real Solution: KRaft Migration

We migrated to KRaft mode, eliminating ZooKeeper entirely. Result:

No more session storms - clients don't connect to controllers
Faster failover - controller election in milliseconds, not seconds
Simplified operations - one system instead of two
Better scalability - tested to millions of partitions

Mental Model: ZooKeeper vs KRaft Architecture

ZooKeeper Mode (Legacy)

┌─────────────────────────────────────────────────────────────┐
│                    ZOOKEEPER MODE                           │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────────────┐     ┌──────────────────────┐      │
│  │   ZooKeeper Cluster  │     │    Kafka Cluster     │      │
│  │  ┌────┐ ┌────┐ ┌────┐│     │ ┌────┐ ┌────┐ ┌────┐ │      │
│  │  │ZK-1│ │ZK-2│ │ZK-3││     │ │ B1 │ │ B2 │ │ B3 │ │      │
│  │  └──┬─┘ └──┬─┘ └──┬─┘│     │ │    │ │CTRL│ │    │ │      │
│  │     │      │      │  │     │ └──┬─┘ └──┬─┘ └──┬─┘ │      │
│  │     └──────┼──────┘  │     │    │      │      │   │      │
│  │            │         │     │    └──────┼──────┘   │      │
│  └────────────┼─────────┘     └───────────┼──────────┘      │
│               │                           │                 │
│               └───────────┬───────────────┘                 │
│                           │                                 │
│                    ZK Connection                            │
│              (All brokers connect to ZK)                    │
│                                                             │
│  Metadata stored in: ZooKeeper znodes                       │
│  Controller election: Via ZK ephemeral node                 │
│  Broker registration: ZK ephemeral nodes                    │
│  Config changes: Written to ZK, brokers watch               │
│                                                             │
└─────────────────────────────────────────────────────────────┘

KRaft Mode (Modern)

┌─────────────────────────────────────────────────────────────┐
│                      KRAFT MODE                             │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │              Kafka Cluster (Self-Managed)           │    │
│  │                                                     │    │
│  │   Controllers (Quorum)         Brokers              │    │
│  │  ┌─────────────────────┐    ┌──────────────────┐    │    │
│  │  │ ┌────┐ ┌────┐ ┌────┐│    │ ┌────┐    ┌────┐ │    │    │
│  │  │ │ C1 │ │ C2 │ │ C3 ││    │ │ B1 │    │ B2 │ │    │    │
│  │  │ │ACT │ │FLWR│ │FLWR││    │ │    │    │    │ │    │    │
│  │  │ └──┬─┘ └──┬─┘ └──┬─┘│    │ └──┬─┘    └──┬─┘ │    │    │
│  │  │    │      │      │  │    │    │         │   │    │    │
│  │  │    └──────┼──────┘  │    │    └────┬────┘   │    │    │
│  │  │           │         │    │         │        │    │    │
│  │  └───────────┼─────────┘    └─────────┼────────┘    │    │
│  │              │                        │             │    │
│  │              └────────────────────────┘             │    │
│  │                    Metadata Fetch                   │    │
│  │           (Brokers fetch the metadata log)          │    │
│  │                                                     │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                             │
│  Metadata stored in: __cluster_metadata topic (Raft log)    │
│  Controller election: Raft consensus                        │
│  Broker registration: Metadata records                      │
│  Config changes: Replicated via Raft                        │
│                                                             │
│  NO ZOOKEEPER NEEDED!                                       │
└─────────────────────────────────────────────────────────────┘

Key Architectural Differences

┌────────────────────┬─────────────────────┬─────────────────────┐
│      Aspect        │     ZooKeeper       │       KRaft         │
├────────────────────┼─────────────────────┼─────────────────────┤
│ Metadata Storage   │ ZK znodes           │ __cluster_metadata  │
│                    │ (external system)   │ (internal topic)    │
├────────────────────┼─────────────────────┼─────────────────────┤
│ Controller         │ One active          │ Quorum (3-5 nodes)  │
│ Architecture       │ (others standby)    │ (Raft consensus)    │
├────────────────────┼─────────────────────┼─────────────────────┤
│ Failover Time      │ Seconds to minutes  │ Milliseconds        │
│                    │ (ZK session timeout)│ (Raft heartbeat)    │
├────────────────────┼─────────────────────┼─────────────────────┤
│ Scalability        │ ~200K partitions    │ Millions of         │
│                    │ (ZK is bottleneck)  │ partitions          │
├────────────────────┼─────────────────────┼─────────────────────┤
│ Client Connections │ Clients → ZK        │ Clients → Brokers   │
│                    │ (for old clients)   │ (no ZK contact)     │
├────────────────────┼─────────────────────┼─────────────────────┤
│ Operational        │ Two systems         │ One system          │
│ Complexity         │ (ZK + Kafka)        │ (Kafka only)        │
└────────────────────┴─────────────────────┴─────────────────────┘

Deep Dive

1. What ZooKeeper Did for Kafka

Before understanding KRaft, let's appreciate what ZooKeeper handled:

ZooKeeper's Responsibilities in Kafka:

1. CONTROLLER ELECTION
   /controller → {"brokerid": 2, "timestamp": ...}
   (Ephemeral node - disappears when broker dies)

2. BROKER REGISTRATION
   /brokers/ids/1 → {"host": "broker1", "port": 9092, ...}
   /brokers/ids/2 → {"host": "broker2", "port": 9092, ...}
   (Ephemeral nodes for liveness detection)

3. TOPIC CONFIGURATION
   /brokers/topics/orders → {"partitions": {"0": [1,2,3], ...}}
   /config/topics/orders → {"retention.ms": "604800000"}

4. PARTITION LEADERSHIP
   /brokers/topics/orders/partitions/0/state →
   {"leader": 1, "isr": [1,2,3], "controller_epoch": 5}

5. ACLs AND QUOTAS
   /kafka-acl/Topic/orders → [acl entries]
   /config/users/alice → {"producer_byte_rate": "1000000"}

6. CONSUMER GROUP OFFSETS (Legacy)
   /consumers/my-group/offsets/orders/0 → "12345"
   (Modern Kafka uses __consumer_offsets topic instead)

Problems with ZooKeeper Dependency

[Code listing: Java, 29 lines]

2. KRaft Architecture Deep Dive

KRaft (Kafka Raft) replaces ZooKeeper with a built-in consensus protocol:

┌───────────────────────────────────────────────────────────────┐
│                    KRAFT CONTROLLER QUORUM                    │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐        │
│  │ Controller 1│    │ Controller 2│    │ Controller 3│        │
│  │   (ACTIVE)  │    │  (FOLLOWER) │    │  (FOLLOWER) │        │
│  │             │    │             │    │             │        │
│  │  Raft Log:  │    │  Raft Log:  │    │  Raft Log:  │        │
│  │  ┌────────┐ │    │  ┌────────┐ │    │  ┌────────┐ │        │
│  │  │Record 1│ │    │  │Record 1│ │    │  │Record 1│ │        │
│  │  │Record 2│ │    │  │Record 2│ │    │  │Record 2│ │        │
│  │  │Record 3│ │    │  │Record 3│ │    │  │Record 3│ │        │
│  │  │   ...  │ │    │  │   ...  │ │    │  │   ...  │ │        │
│  │  └────────┘ │    │  └────────┘ │    │  └────────┘ │        │
│  │             │    │             │    │             │        │
│  │  In-Memory  │    │  In-Memory  │    │  In-Memory  │        │
│  │  Metadata   │    │  Metadata   │    │  Metadata   │        │
│  │  Cache      │    │  Cache      │    │  Cache      │        │
│  └─────┬───────┘    └──────┬──────┘    └──────┬──────┘        │
│        │                   │                  │               │
│        │         Raft Replication             │               │
│        └───────────────────┼──────────────────┘               │
│                            │                                  │
│                            ▼                                  │
│              __cluster_metadata topic                         │
│              (The Raft log, single partition)                 │
│                                                               │
└───────────────────────────────────────────────────────────────┘

Metadata Records in KRaft

[Code listing: Java, 20 lines]
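
To make the event-sourced model concrete, here is a minimal sketch of replaying a metadata log into an in-memory view. The record shape (`MetaRecord` with the kinds `TOPIC`, `PARTITION_LEADER`, and `CONFIG`) is a simplified stand-in invented for illustration; it is not Kafka's actual record schema.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified metadata record (not Kafka's real record classes).
final class MetaRecord {
    final String kind;  // "TOPIC", "PARTITION_LEADER", or "CONFIG"
    final String key;   // e.g. a topic name or "topic-partition"
    final String value; // e.g. a leader id or a config value

    MetaRecord(String kind, String key, String value) {
        this.kind = kind;
        this.key = key;
        this.value = value;
    }
}

// In-memory view that is rebuilt purely by applying records in log order.
final class MetadataImage {
    final Map<String, String> topics = new TreeMap<>();
    final Map<String, String> leaders = new TreeMap<>();
    final Map<String, String> configs = new TreeMap<>();
    long lastAppliedOffset = -1;

    void apply(long offset, MetaRecord r) {
        switch (r.kind) {
            case "TOPIC": topics.put(r.key, r.value); break;
            case "PARTITION_LEADER": leaders.put(r.key, r.value); break;
            case "CONFIG": configs.put(r.key, r.value); break;
            default: throw new IllegalArgumentException("unknown record kind: " + r.kind);
        }
        lastAppliedOffset = offset;
    }

    // Replaying the same log from offset 0 always produces the same state.
    static MetadataImage replay(List<MetaRecord> log) {
        MetadataImage image = new MetadataImage();
        for (int offset = 0; offset < log.size(); offset++) {
            image.apply(offset, log.get(offset));
        }
        return image;
    }
}

final class MetadataReplayDemo {
    static List<MetaRecord> sampleLog() {
        List<MetaRecord> log = new ArrayList<>();
        log.add(new MetaRecord("TOPIC", "orders", "3 partitions"));
        log.add(new MetaRecord("PARTITION_LEADER", "orders-0", "1"));
        log.add(new MetaRecord("CONFIG", "orders/retention.ms", "604800000"));
        log.add(new MetaRecord("PARTITION_LEADER", "orders-0", "2")); // later leader change
        return log;
    }

    public static void main(String[] args) {
        MetadataImage image = MetadataImage.replay(sampleLog());
        System.out.println("leader of orders-0 = " + image.leaders.get("orders-0"));
        System.out.println("last applied offset = " + image.lastAppliedOffset);
    }
}
```

Because state is derived only from the ordered log, a node that has applied up to some offset can catch up by applying just the records after it, which is the property the controller and broker metadata caches rely on.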

3. Controller Quorum Mechanics

RAFT CONSENSUS IN KRAFT:

┌─────────────────────────────────────────────────────────────┐
│                    LEADER ELECTION                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. Initial state: No leader                                │
│     ┌────┐  ┌────┐  ┌────┐                                  │
│     │ C1 │  │ C2 │  │ C3 │  All followers                   │
│     └────┘  └────┘  └────┘                                  │
│                                                             │
│  2. Election timeout triggers (randomized)                  │
│     ┌────┐  ┌────┐  ┌────┐                                  │
│     │ C1 │──┼──┼──►│ C2 │  C1 times out first               │
│     │CAND│  │  │   │    │  Requests votes                   │
│     └────┘  │  │   └────┘                                   │
│             │  ▼                                            │
│             │ ┌────┐                                        │
│             └►│ C3 │                                        │
│               └────┘                                        │
│                                                             │
│  3. Votes granted (majority needed)                         │
│     ┌────┐  ┌────┐  ┌────┐                                  │
│     │ C1 │◄─┤VOTE├──│ C2 │  C1 gets 2 votes                 │
│     │    │  └────┘  │    │  (self + C2)                     │
│     │    │◄─┤VOTE├──│    │                                  │
│     └────┘  └────┘  └────┘                                  │
│       ▲               │                                     │
│       └───────────────┘                                     │
│                                                             │
│  4. Leader established                                      │
│     ┌────┐  ┌────┐  ┌────┐                                  │
│     │ C1 │  │ C2 │  │ C3 │                                  │
│     │LEAD│──►FLWR│  │FLWR│  C1 is leader                    │
│     └────┘  └────┘  └────┘  Sends heartbeats                │
│                                                             │
└─────────────────────────────────────────────────────────────┘
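
The election steps can be sketched as a small in-process simulation. `Voter`, `requestVote`, and `collectVotes` are hypothetical names for this example; a real Raft implementation also compares log positions before granting a vote and runs over RPCs with timers, all of which is omitted here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical voter state: the latest term seen and the vote cast in that term.
final class Voter {
    final int id;
    int currentTerm = 0;
    Integer votedFor = null; // null means no vote cast in currentTerm

    Voter(int id) {
        this.id = id;
    }

    // Grants at most one vote per term, first come, first served.
    boolean requestVote(int term, int candidateId) {
        if (term < currentTerm) {
            return false; // request from a stale term
        }
        if (term > currentTerm) {
            currentTerm = term; // a newer term clears the previous vote
            votedFor = null;
        }
        if (votedFor == null || votedFor == candidateId) {
            votedFor = candidateId;
            return true;
        }
        return false;
    }
}

final class ElectionSketch {
    // Asks every voter (the candidate included) for a vote in the given term.
    static int collectVotes(int term, int candidateId, List<Voter> quorum) {
        int votes = 0;
        for (Voter v : quorum) {
            if (v.requestVote(term, candidateId)) {
                votes++;
            }
        }
        return votes;
    }

    // A candidate wins only with a strict majority of the quorum.
    static boolean isMajority(int votes, int quorumSize) {
        return votes > quorumSize / 2;
    }

    public static void main(String[] args) {
        Voter c1 = new Voter(1), c2 = new Voter(2), c3 = new Voter(3);
        List<Voter> quorum = Arrays.asList(c1, c2, c3);
        int votesForC1 = collectVotes(1, c1.id, quorum); // C1 times out first
        int votesForC2 = collectVotes(1, c2.id, quorum); // C2 asks later in the same term
        System.out.println("C1 votes=" + votesForC1 + " leader=" + isMajority(votesForC1, 3));
        System.out.println("C2 votes=" + votesForC2 + " leader=" + isMajority(votesForC2, 3));
    }
}
```

The one-vote-per-term rule is what guarantees that two candidates cannot both reach a majority in the same term.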

LOG REPLICATION:

┌─────────────────────────────────────────────────────────────┐
│                                                             │
│  Leader (C1)              Followers (C2, C3)                │
│  ┌──────────────┐         ┌──────────────┐                  │
│  │ Log:         │         │ Log:         │                  │
│  │ [1] TopicA   │ ──────► │ [1] TopicA   │                  │
│  │ [2] Partition│ Append  │ [2] Partition│                  │
│  │ [3] Config   │ Entries │ [3] Config   │                  │
│  │ [4] Leader   │ ──────► │ [4] Leader   │                  │
│  └──────────────┘         └──────────────┘                  │
│                                                             │
│  Commit: Entry committed when majority acknowledges         │
│  [1] ✓ (3/3)  [2] ✓ (3/3)  [3] ✓ (2/3)  [4] ○ (1/3)         │
│                                                             │
└─────────────────────────────────────────────────────────────┘
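
The commit rule in the diagram (an entry is committed once a majority has it) reduces to a small calculation over each voter's last replicated index. This is a sketch of that arithmetic only; `committedIndex` is a hypothetical helper, and real Raft implementations add further conditions (for example, around entries from earlier terms).

```java
import java.util.Arrays;

final class CommitIndexSketch {
    // lastReplicated[i] is the highest log index stored on voter i (leader included).
    // Returns the highest index that a strict majority of voters has stored.
    static long committedIndex(long[] lastReplicated) {
        long[] sorted = lastReplicated.clone();
        Arrays.sort(sorted); // ascending
        int majority = sorted.length / 2 + 1;
        // The majority-th largest value is present on at least `majority` voters.
        return sorted[sorted.length - majority];
    }

    public static void main(String[] args) {
        // Matches the diagram: the leader holds [1..4], one follower [1..3], the other [1..2].
        long committed = committedIndex(new long[] {4, 3, 2});
        System.out.println("committed up to entry " + committed);
    }
}
```

With the diagram's state, entry 3 is the highest one stored on two of three voters, so entries 1 to 3 are committed and entry 4 is still pending.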

4. KRaft Configuration

Controller-Only Nodes

[Code listing: Properties, 21 lines]
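
The `controller.quorum.voters` setting takes a comma-separated list of `id@host:port` entries. As a rough illustration of that format, the sketch below parses such a string; `QuorumVotersParser` and the host names used with it are assumptions made for this example, not part of Kafka.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class QuorumVotersParser {
    // Parses "1@host1:9093,2@host2:9093" into nodeId -> "host:port".
    static Map<Integer, String> parse(String voters) {
        Map<Integer, String> result = new LinkedHashMap<>();
        for (String entry : voters.split(",")) {
            String trimmed = entry.trim();
            int at = trimmed.indexOf('@');
            int colon = trimmed.lastIndexOf(':');
            if (at <= 0 || colon <= at + 1 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException("expected id@host:port but got: " + trimmed);
            }
            int nodeId = Integer.parseInt(trimmed.substring(0, at));
            Integer.parseInt(trimmed.substring(colon + 1)); // the port must be numeric
            if (result.put(nodeId, trimmed.substring(at + 1)) != null) {
                throw new IllegalArgumentException("duplicate voter id: " + nodeId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical host names.
        String voters = "1@controller1:9093,2@controller2:9093,3@controller3:9093";
        System.out.println(parse(voters));
    }
}
```

A check like this is handy in deployment tooling, since a malformed or duplicated entry is easier to catch before the controllers start than from their logs afterwards.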

Broker-Only Nodes

[Code listing: Properties, 20 lines]

Combined Mode (Development)

[Code listing: Properties, 16 lines]

5. Spring Kafka with KRaft

[Code listing: Java, 43 lines]

6. Admin Operations in KRaft Mode

[Code listing: Java, 86 lines]

7. Migration Path: ZooKeeper to KRaft

MIGRATION PHASES:

┌─────────────────────────────────────────────────────────────┐
│ Phase 1: PREPARATION                                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  • Upgrade to Kafka 3.6+ (migration GA)                     │
│  • Ensure inter.broker.protocol.version = 3.6+              │
│  • Audit custom tooling for ZK dependencies                 │
│  • Plan controller node placement                           │
│                                                             │
└─────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│ Phase 2: DEPLOY CONTROLLERS                                 │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────┐    ┌─────────┐    ┌─────────┐                  │
│  │   ZK1   │    │   ZK2   │    │   ZK3   │  (Still active)  │
│  └─────────┘    └─────────┘    └─────────┘                  │
│                                                             │
│  ┌─────────┐    ┌─────────┐    ┌─────────┐                  │
│  │   C1    │    │   C2    │    │   C3    │  (New KRaft      │
│  │(standby)│    │(standby)│    │(standby)│   controllers)   │
│  └─────────┘    └─────────┘    └─────────┘                  │
│                                                             │
│  ┌─────────┐    ┌─────────┐    ┌─────────┐                  │
│  │ Broker1 │    │ Broker2 │    │ Broker3 │  (Using ZK)      │
│  └─────────┘    └─────────┘    └─────────┘                  │
│                                                             │
└─────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│ Phase 3: MIGRATION MODE                                     │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Set zookeeper.metadata.migration.enable=true               │
│                                                             │
│  ┌─────────┐         ┌─────────────────────┐                │
│  │   ZK    │ ──────► │  __cluster_metadata │                │
│  │ znodes  │  Copy   │       (KRaft)       │                │
│  └─────────┘         └─────────────────────┘                │
│                                                             │
│  Metadata migrated, both systems active temporarily         │
│                                                             │
└─────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│ Phase 4: DUAL-WRITE                                         │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  KRaft controller writes to both the KRaft log and ZK       │
│                                                             │
│             ┌─────────┐                                     │
│             │  CTRL   │                                     │
│             └────┬────┘                                     │
│                  │                                          │
│         ┌───────┴───────┐                                   │
│         ▼               ▼                                   │
│    ┌─────────┐    ┌─────────┐                               │
│    │   ZK    │    │  KRaft  │                               │
│    │         │    │         │                               │
│    └─────────┘    └─────────┘                               │
│                                                             │
│  Validate: Both have consistent state                       │
│                                                             │
└─────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│ Phase 5: KRAFT ONLY                                         │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Remove ZK/migration configs; roll controllers              │
│                                                             │
│  ┌─────────┐    ┌─────────┐    ┌─────────┐                  │
│  │   ZK1   │    │   ZK2   │    │   ZK3   │  (Shutdown)      │
│  │  STOP   │    │  STOP   │    │  STOP   │                  │
│  └─────────┘    └─────────┘    └─────────┘                  │
│                                                             │
│  ┌─────────┐    ┌─────────┐    ┌─────────┐                  │
│  │   C1    │    │   C2    │    │   C3    │  (Active)        │
│  │ ACTIVE  │    │ FOLLWR  │    │ FOLLWR  │                  │
│  └─────────┘    └─────────┘    └─────────┘                  │
│                                                             │
│  ZooKeeper decommissioned!                                  │
│                                                             │
└─────────────────────────────────────────────────────────────┘
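
When planning rollback points, it can help to treat the five phases as an ordered state machine. The sketch below is only a planning aid: the `Phase` names mirror the diagram, and the rule that rollback is possible until the final phase is an assumption based on ZooKeeper still receiving metadata writes up to that point.

```java
final class MigrationPlan {
    enum Phase {
        PREPARATION, DEPLOY_CONTROLLERS, MIGRATION_MODE, DUAL_WRITE, KRAFT_ONLY;

        // Phases are strictly sequential; there is nothing after KRAFT_ONLY.
        Phase next() {
            if (this == KRAFT_ONLY) {
                throw new IllegalStateException("migration already finalized");
            }
            return values()[ordinal() + 1];
        }

        // Assumption: ZooKeeper stays current until finalization, so a
        // rollback to ZooKeeper mode is possible in every earlier phase.
        boolean canRollBackToZooKeeper() {
            return this != KRAFT_ONLY;
        }
    }

    public static void main(String[] args) {
        for (Phase p : Phase.values()) {
            System.out.println(p + " rollback possible: " + p.canRollBackToZooKeeper());
        }
    }
}
```

Encoding the point of no return explicitly makes it harder for automation to finalize before the validation steps for the dual-write phase have passed.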

Migration Commands

[Code listing: Bash, 31 lines]

8. Monitoring KRaft Controllers

[Code listing: Java, 42 lines]

[Code listing: YAML, 30 lines]
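
One signal worth alerting on is a voter whose metadata log falls behind the leader's. The sketch below shows only the comparison logic, assuming you already have each voter's log end offset (for instance from the output of `kafka-metadata-quorum.sh describe --replication`); `QuorumLagCheck` and the threshold are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class QuorumLagCheck {
    // Returns the ids of voters whose log end offset trails the leader's by more than maxLag.
    static List<Integer> laggingVoters(int leaderId, Map<Integer, Long> logEndOffsets, long maxLag) {
        Long leaderOffset = logEndOffsets.get(leaderId);
        if (leaderOffset == null) {
            throw new IllegalArgumentException("no offset reported for leader " + leaderId);
        }
        List<Integer> lagging = new ArrayList<>();
        // TreeMap gives a stable, id-ordered result.
        for (Map.Entry<Integer, Long> e : new TreeMap<>(logEndOffsets).entrySet()) {
            if (e.getKey() != leaderId && leaderOffset - e.getValue() > maxLag) {
                lagging.add(e.getKey());
            }
        }
        return lagging;
    }

    public static void main(String[] args) {
        Map<Integer, Long> offsets = new TreeMap<>();
        offsets.put(1, 1000L); // leader
        offsets.put(2, 998L);
        offsets.put(3, 400L);
        System.out.println("lagging voters: " + laggingVoters(1, offsets, 100));
    }
}
```

A small, steady lag is normal because followers replicate asynchronously; the interesting condition is a lag that keeps growing or exceeds a threshold you have chosen for your cluster.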

Common Mistakes

Mistake 1: Running Insufficient Controllers

[Code listing: Properties, 12 lines]
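
The reason small or even-sized quorums are a mistake follows from majority arithmetic: a quorum of n voters needs n/2 + 1 of them alive, so it survives only the remainder failing. A short sketch of that calculation (`QuorumSizing` is a hypothetical helper):

```java
final class QuorumSizing {
    // Smallest number of voters that forms a strict majority.
    static int majority(int voters) {
        if (voters < 1) {
            throw new IllegalArgumentException("need at least one voter");
        }
        return voters / 2 + 1;
    }

    // How many voters can fail while a majority is still reachable.
    static int toleratedFailures(int voters) {
        return voters - majority(voters);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " controllers: majority=" + majority(n)
                    + ", tolerated failures=" + toleratedFailures(n));
        }
    }
}
```

Two controllers tolerate no failures at all, and four tolerate no more than three do, which is why odd sizes of three or five are the usual choice.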

Mistake 2: Same node.id Across Nodes

[Code listing: Properties, 15 lines]
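
In KRaft mode, `node.id` must be unique across controllers and brokers alike. A pre-deployment check can catch collisions early; the sketch below assumes you can collect a host-to-`node.id` inventory from your configuration management, and both `NodeIdCheck` and the host names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class NodeIdCheck {
    // Groups hosts by configured node.id and keeps only the ids used more than once.
    static Map<Integer, List<String>> duplicates(Map<String, Integer> nodeIdByHost) {
        Map<Integer, List<String>> hostsById = new TreeMap<>();
        // Iterate in host-name order so the reported lists are deterministic.
        for (Map.Entry<String, Integer> e : new TreeMap<>(nodeIdByHost).entrySet()) {
            hostsById.computeIfAbsent(e.getValue(), id -> new ArrayList<>()).add(e.getKey());
        }
        hostsById.values().removeIf(hosts -> hosts.size() < 2);
        return hostsById;
    }

    public static void main(String[] args) {
        Map<String, Integer> inventory = new TreeMap<>();
        inventory.put("controller1", 1);
        inventory.put("controller2", 2);
        inventory.put("broker1", 1); // collides with controller1
        System.out.println("duplicate node ids: " + duplicates(inventory));
    }
}
```

A common way to avoid collisions by construction is to reserve separate id ranges for controllers and brokers.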

Mistake 3: Mixing ZK and KRaft Configurations

[Code listing: Properties, 14 lines]
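
A simple lint over a node's properties can flag a mixed configuration before the process starts. This is a sketch under stated assumptions: `ModeMixCheck` is hypothetical, it looks at only a few keys, and it treats `zookeeper.metadata.migration.enable=true` as the one legitimate case where ZooKeeper and KRaft settings coexist.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

final class ModeMixCheck {
    static List<String> findProblems(Properties config) {
        List<String> problems = new ArrayList<>();
        boolean kraft = config.containsKey("process.roles");
        boolean zooKeeper = config.containsKey("zookeeper.connect");
        boolean migrating = "true".equals(config.getProperty("zookeeper.metadata.migration.enable"));

        // Outside a migration, a node is either ZooKeeper-based or KRaft-based.
        if (kraft && zooKeeper && !migrating) {
            problems.add("process.roles and zookeeper.connect are both set outside a migration");
        }
        // KRaft nodes are identified by node.id.
        if (kraft && !config.containsKey("node.id")) {
            problems.add("process.roles is set but node.id is missing");
        }
        return problems;
    }

    public static void main(String[] args) {
        Properties mixed = new Properties();
        mixed.setProperty("process.roles", "broker");
        mixed.setProperty("node.id", "101");
        mixed.setProperty("zookeeper.connect", "zk1:2181"); // leftover from ZooKeeper mode
        System.out.println(findProblems(mixed));
    }
}
```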

Mistake 4: Not Formatting Storage Before First Start

[Code listing: Bash, 11 lines]

Mistake 5: Different Cluster IDs Across Nodes

[Code listing: Bash, 16 lines]
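
Formatting a storage directory writes a `meta.properties` file that records the `cluster.id`. One way to detect a mismatch is to compare that value across nodes. The sketch below works on the file contents as strings, so how you collect them is left open; `ClusterIdCheck` and the sample ID values are illustrative only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Collection;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

final class ClusterIdCheck {
    // Extracts cluster.id from the text of one meta.properties file.
    static String clusterIdOf(String metaPropertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(metaPropertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String id = props.getProperty("cluster.id");
        if (id == null) {
            throw new IllegalArgumentException("no cluster.id entry found");
        }
        return id;
    }

    // A healthy cluster yields exactly one distinct id.
    static Set<String> distinctClusterIds(Collection<String> metaPropertiesTexts) {
        Set<String> ids = new TreeSet<>();
        for (String text : metaPropertiesTexts) {
            ids.add(clusterIdOf(text));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Sample values for illustration only.
        String node1 = "version=1\ncluster.id=MkU3OEVBNTcwNTJENDM2Qk\nnode.id=1\n";
        String node2 = "version=1\ncluster.id=Zm9vYmFyYmF6cXV4MTIzNA\nnode.id=2\n";
        System.out.println("distinct ids: " + distinctClusterIds(java.util.Arrays.asList(node1, node2)));
    }
}
```

If the result has more than one element, at least one node was formatted with a different ID and will refuse to join the others.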

Debug This

Scenario: Controller Not Becoming Active

Symptoms:

All controllers show "FOLLOWER" state
No active controller in cluster
Brokers cannot register
Admin operations timeout

Investigation:

[Code listing: Bash, 21 lines]

[Code listing: Java, 35 lines]

Resolution Steps:

1. Verify network connectivity between all controllers
2. Ensure all controllers have the same cluster.id
3. Check that controller.quorum.voters is identical on all nodes
4. Verify each node.id matches its ID in controller.quorum.voters
5. Check for port conflicts on the controller listener port
6. Review controller logs for specific error messages
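
Two of these checks (identical voter lists, and each `node.id` appearing in its own voter list) are mechanical enough to automate. The following is a minimal sketch, assuming you have gathered each controller's `node.id` and `controller.quorum.voters` value; `QuorumConfigCheck` and the host names in the example are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class QuorumConfigCheck {
    // votersByNodeId: each controller's node.id mapped to the
    // controller.quorum.voters string found in that node's config.
    static List<String> check(Map<Integer, String> votersByNodeId) {
        List<String> findings = new ArrayList<>();
        Set<String> distinct = new TreeSet<>(votersByNodeId.values());
        if (distinct.size() > 1) {
            findings.add("controller.quorum.voters differs across nodes");
        }
        for (Map.Entry<Integer, String> e : new TreeMap<>(votersByNodeId).entrySet()) {
            String prefix = e.getKey() + "@"; // voter entries look like id@host:port
            boolean listed = false;
            for (String voter : e.getValue().split(",")) {
                if (voter.trim().startsWith(prefix)) {
                    listed = true;
                }
            }
            if (!listed) {
                findings.add("node.id " + e.getKey() + " is missing from its own voter list");
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        Map<Integer, String> configs = new TreeMap<>();
        configs.put(1, "1@c1:9093,2@c2:9093,3@c3:9093");
        configs.put(2, "1@c1:9093,2@c2:9093,3@c3:9093");
        configs.put(3, "1@c1:9093,2@c2:9093"); // node 3 was left out
        check(configs).forEach(System.out::println);
    }
}
```

The comparison is a plain string match, so two lists with the same voters in a different order are reported as different; that is deliberately strict for a sketch.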

Exercises

Exercise 1: Local KRaft Cluster

Set up a 3-controller, 3-broker KRaft cluster using Docker Compose:

[Code listing: YAML, 56 lines]

Task: Start the cluster, verify all controllers are in the quorum, and create a topic.

Exercise 2: Controller Failover Test

With the cluster from Exercise 1:

1. Identify the active controller
2. Stop the active controller container
3. Observe failover in logs
4. Verify a new leader is elected
5. Restart the stopped controller
6. Verify it rejoins as a follower

Exercise 3: Quorum Monitoring

Write a Spring Boot application that:

Connects to the KRaft cluster
Periodically checks quorum status
Alerts when:
- No active controller
- A voter is lagging
- Less than 3 voters available

Exercise 4: Metadata Inspection

Using the kafka-metadata-shell.sh and kafka-dump-log.sh tools:

Dump the current metadata log
Identify different record types
Find the record for a specific topic
Analyze metadata for partition assignments

[Code listing: Bash, 4 lines]

Exercise 5: Migration Planning

Given a ZooKeeper-based cluster with:

5 brokers
3 ZooKeeper nodes
500 topics, 10,000 partitions

Create a detailed migration plan including:

Hardware requirements for KRaft controllers
Migration timeline with rollback points
Validation steps at each phase
Monitoring during migration

Interview Questions

Q1: Why is Kafka moving from ZooKeeper to KRaft?

A: Kafka is moving to KRaft for several compelling reasons:

Operational Simplicity:

One distributed system instead of two
Single security model, monitoring stack, deployment process
Fewer moving parts = fewer failure modes

Scalability:

ZooKeeper becomes a bottleneck around 200K partitions (all metadata in memory)
KRaft can handle millions of partitions
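Metadata changes propagate faster (brokers fetch incremental updates from the metadata log instead of waiting for per-broker update RPCs from a single controller)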

Faster Recovery:

ZK-based controller failover takes seconds (session timeout)
KRaft failover takes milliseconds (Raft heartbeat)
Brokers recover faster because they fetch only the metadata records they are missing

Consistency:

ZK mode had inconsistency windows during metadata propagation
KRaft provides stronger consistency guarantees
Single source of truth in __cluster_metadata topic

Modern Architecture:

Built-in consensus protocol designed for Kafka's needs
Event-sourced metadata (can replay log to recover)
Better support for metadata snapshots and compaction

Q2: How does controller election work in KRaft?

A: KRaft uses Raft consensus for controller election:

Election Trigger:

Leader heartbeat timeout (followers don't hear from leader)
Initial cluster startup (no leader exists)

Election Process:

1. A follower increments its term and transitions to candidate
2. The candidate votes for itself and requests votes from the other voters
3. Each voter grants its vote to the first candidate that asks in a new term (first come, first served), provided that candidate's log is at least as up to date as its own
4. The candidate becomes leader once it receives votes from a majority
5. The new leader starts sending heartbeats to maintain leadership

Key Properties:

Randomized election timeout: Makes split votes unlikely (candidates rarely start elections at the same time)
Term numbers: Prevent stale leaders from causing confusion
Majority requirement: Ensures only one leader per term
Persistent vote: Voters remember who they voted for (survives restarts)

Failover Characteristics:

Typical election time: 100-500ms
Requires majority of voters (2/3, 3/5, etc.)
No split-brain because only one candidate can get majority

Q3: What happens to clients during a controller failover in KRaft?

A: The impact on clients is minimal in KRaft mode:

Producers:

Continue producing normally (producers talk to brokers, not controllers)
May see brief retry if producing to partition that needs leader update
Typically transparent (retries happen automatically)

Consumers:

Continue consuming normally (consumers talk to brokers, not controllers)
May see brief pause if fetching from partition needing leader update
Offset commits are unaffected (they go to __consumer_offsets on brokers)

Admin Operations:

Topic creation/deletion temporarily blocked during failover
Config changes temporarily blocked
Resume automatically once new controller is active

Why Minimal Impact:

Clients only interact with brokers, never directly with controllers
Brokers cache metadata locally (serve clients from cache)
Controller failover is fast (milliseconds)
Brokers automatically refresh metadata from new controller

Q4: What's the `__cluster_metadata` topic and how is it different from regular topics?

A: __cluster_metadata is a special internal topic that stores all cluster metadata in KRaft mode:

Structure:

Single partition (partition 0)
Replicated across the controller quorum (brokers fetch it as observers, not as regular ISR replicas)
Uses Raft consensus for replication (not standard Kafka replication)
Not accessible via normal producer/consumer APIs

Contents:

Broker registrations and fencing
Topic and partition metadata
Configuration changes
ACLs and quotas
Producer ID allocations
Feature flags

How It Differs from Regular Topics:

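┌─────────────┬──────────────────┬──────────────────────────────┐
│ Aspect      │ Regular Topics   │ __cluster_metadata           │
├─────────────┼──────────────────┼──────────────────────────────┤
│ Replication │ ISR-based        │ Raft consensus               │
│ Producers   │ Any client       │ Only the active controller   │
│ Consumers   │ Any client       │ Controllers and brokers only │
│ Storage     │ Broker data dirs │ Metadata log directory       │
│ Compaction  │ Optional         │ Snapshots instead            │
│ Access      │ Public API       │ Internal only                │
└─────────────┴──────────────────┴──────────────────────────────┘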

Event Sourcing:

All changes are appended as records
State can be reconstructed by replaying log
Periodic snapshots for faster recovery
Similar to event sourcing pattern in applications

Q5: How do you choose between combined mode and separate controller/broker roles?

A: The choice depends on cluster size and operational requirements:

Combined Mode (process.roles=broker,controller):

Best for:

Development and testing environments
Small clusters (3-5 nodes)
Resource-constrained deployments
Simpler operations

Drawbacks:

Controller and broker compete for resources
GC pauses on broker affect controller
Harder to scale controllers independently

Separate Roles: