Exactly-Once Semantics


At a Glance

Aspect          Details
Topic           At-least/at-most/exactly-once, transactions, isolation levels
Complexity      Advanced
Prerequisites   Parts 8-10 (Consumer sections)
Time            90 minutes
Spring Kafka    Transactional listeners, ChainedTransactionManager

What You'll Learn

After completing this article, you will be able to:

  1. Distinguish between at-most-once, at-least-once, and exactly-once semantics
  2. Implement consume-transform-produce patterns with transactions
  3. Configure isolation levels for reading committed data only
  4. Build end-to-end exactly-once pipelines with Spring Kafka
  5. Understand when exactly-once is necessary vs overkill

Production Story: The Double-Charged Customers

The Incident

Our payment processing system had a subtle but devastating bug. Customers were being charged twice for single orders. The pattern was random - maybe 0.1% of transactions - but with 100,000 daily transactions, that meant 100 double charges per day. Customer trust was eroding fast.

The Investigation

[Java code listing (16 lines) not captured]

The timeline of a double-charge:

┌─────────────────────────────────────────────────────────────────────┐
│                    THE DOUBLE-CHARGE SCENARIO                       │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  payment-requests topic     Payment Processor     payment-results   │
│                                                                     │
│  T=0:  Read payment request                                         │
│        ┌──────────────┐                                             │
│        │ Order-123    │ ──────────────────►  Process payment        │
│        │ $100         │                                             │
│        └──────────────┘                                             │
│                                                                     │
│  T=1:  Charge customer's card                                       │
│        paymentGateway.charge() → SUCCESS ($100 charged)             │
│                                                                     │
│  T=2:  Send result to payment-results                               │
│        ┌──────────────┐                                             │
│        │ Order-123    │                      ┌──────────────┐       │
│        │ SUCCESS      │ ────────────────────►│ Order-123    │       │
│        └──────────────┘                      │ SUCCESS      │       │
│                                              └──────────────┘       │
│                                                                     │
│  T=3:  NETWORK GLITCH! Producer send times out                      │
│        (but message actually reached broker)                        │
│                                                                     │
│  T=4:  Exception thrown, no acknowledgment                          │
│        Consumer offset NOT committed                                │
│                                                                     │
│  T=5:  Consumer restarts from last committed offset                 │
│        ┌──────────────┐                                             │
│        │ Order-123    │ ──────────────────►  Process payment        │
│        │ $100         │     AGAIN!                                  │
│        └──────────────┘                                             │
│                                                                     │
│  T=6:  paymentGateway.charge() → SUCCESS ($100 charged AGAIN!)      │
│                                                                     │
│  RESULT: Customer charged $200 for $100 order                       │
│          Two SUCCESS messages in payment-results                    │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

The Root Cause

The consume-transform-produce pattern has THREE operations that must be atomic:

  1. Process the input (charge customer)
  2. Write the output (payment result)
  3. Commit the input offset

Without transactions, these can fail independently, causing duplicates.
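The failure window can be reproduced in a toy model with no Kafka dependency. The sketch below is an illustrative simulation, not the production code from the incident: `DoubleChargeDemo`, its in-memory lists, and the injected send timeout are all hypothetical stand-ins for the gateway, the payment-results topic, and the network glitch.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the double-charge timeline; every name here is hypothetical. */
final class DoubleChargeDemo {
    final List<String> charges = new ArrayList<>();  // stands in for the payment gateway
    final List<String> results = new ArrayList<>();  // stands in for payment-results
    int committedOffset = 0;                         // next input offset to read

    /** Runs the three steps independently, so any one of them can fail alone. */
    void handle(String orderId, boolean sendTimesOut) {
        charges.add(orderId);                // step 1: the side effect happens
        results.add(orderId + ":SUCCESS");   // step 2: the broker receives the record...
        if (sendTimesOut) {
            // ...but the client sees a timeout, so step 3 never runs
            throw new IllegalStateException("send timed out");
        }
        committedOffset++;                   // step 3: commit the input offset
    }

    /** Consumes the input, injecting one timeout and resuming from the committed offset. */
    static DoubleChargeDemo replay(List<String> input) {
        DoubleChargeDemo demo = new DoubleChargeDemo();
        boolean failNext = true;             // inject exactly one failure
        while (demo.committedOffset < input.size()) {
            String order = input.get(demo.committedOffset);
            try {
                demo.handle(order, failNext);
            } catch (IllegalStateException e) {
                // simulated restart: the loop re-reads from the last committed offset
            }
            failNext = false;
        }
        return demo;
    }

    public static void main(String[] args) {
        DoubleChargeDemo demo = replay(List.of("Order-123"));
        System.out.println("charges=" + demo.charges.size()
                + " results=" + demo.results.size()
                + " committedOffset=" + demo.committedOffset);
    }
}
```

One input record ends up with two charges and two result records, because the offset commit is the only step that did not happen on the first attempt.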

The Fix

[Java code listing (75 lines) not captured]

After the fix: Zero double charges.


Mental Model: Delivery Semantics

┌─────────────────────────────────────────────────────────────────────────┐
│                    DELIVERY SEMANTICS COMPARISON                        │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  AT-MOST-ONCE (Fire and Forget)                                         │
│  ───────────────────────────────                                        │
│                                                                         │
│  commit() → process()                                                   │
│                                                                         │
│  Producer: acks=0 (no acknowledgment)                                   │
│  Consumer: Commit BEFORE processing                                     │
│                                                                         │
│  Messages:  [A] [B] [C] [D] [E]                                         │
│  Delivered:  A   B   -   D   E    (C lost during failure)               │
│                                                                         │
│  ✓ No duplicates ever                                                   │
│  ✗ May lose messages                                                    │
│  Use case: Metrics, logs where some loss acceptable                     │
│                                                                         │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  AT-LEAST-ONCE (Standard)                                               │
│  ─────────────────────────                                              │
│                                                                         │
│  process() → commit()                                                   │
│                                                                         │
│  Producer: acks=all, retries=MAX                                        │
│  Consumer: Commit AFTER processing                                      │
│                                                                         │
│  Messages:  [A] [B] [C] [D] [E]                                         │
│  Delivered:  A   B   C   C   D   E  (C delivered twice on retry)        │
│                                                                         │
│  ✗ May have duplicates                                                  │
│  ✓ Never loses messages                                                 │
│  Use case: Most applications (with idempotent processing)               │
│                                                                         │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  EXACTLY-ONCE                                                           │
│  ────────────                                                           │
│                                                                         │
│  Transaction { process() + commit() }                                   │
│                                                                         │
│  Producer: transactional.id, enable.idempotence=true                    │
│  Consumer: isolation.level=read_committed                               │
│                                                                         │
│  Messages:  [A] [B] [C] [D] [E]                                         │
│  Delivered:  A   B   C   D   E    (Each exactly once)                   │
│                                                                         │
│  ✓ No duplicates                                                        │
│  ✓ No message loss                                                      │
│  Use case: Financial, critical data pipelines                           │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
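The settings named in the three boxes can be collected as plain client properties. This is a minimal sketch using only `java.util.Properties`: the configuration keys are standard Kafka client settings, while the `DeliveryConfigs` class and the `payment-processor-0` transactional id are hypothetical. Connection settings and serializers are left out.

```java
import java.util.Properties;

/** Client settings for the three semantics; class name and id value are hypothetical. */
final class DeliveryConfigs {
    enum Semantics { AT_MOST_ONCE, AT_LEAST_ONCE, EXACTLY_ONCE }

    static Properties producer(Semantics s) {
        Properties p = new Properties();
        switch (s) {
            case AT_MOST_ONCE:
                p.setProperty("acks", "0");                    // fire and forget
                p.setProperty("enable.idempotence", "false");  // idempotence needs acks=all
                break;
            case AT_LEAST_ONCE:
                p.setProperty("acks", "all");
                p.setProperty("retries", Integer.toString(Integer.MAX_VALUE));
                break;
            case EXACTLY_ONCE:
                p.setProperty("acks", "all");
                p.setProperty("enable.idempotence", "true");
                p.setProperty("transactional.id", "payment-processor-0"); // hypothetical id
                break;
        }
        return p;
    }

    static Properties consumer(Semantics s) {
        Properties p = new Properties();
        // Offsets are committed explicitly in all three variants; only the timing differs.
        p.setProperty("enable.auto.commit", "false");
        p.setProperty("isolation.level",
                s == Semantics.EXACTLY_ONCE ? "read_committed" : "read_uncommitted");
        return p;
    }

    public static void main(String[] args) {
        for (Semantics s : Semantics.values()) {
            System.out.println(s + " producer=" + producer(s) + " consumer=" + consumer(s));
        }
    }
}
```

The properties alone do not decide between at-most-once and at-least-once on the consumer side. That difference comes from whether the application commits before or after processing.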

Transaction Flow

EXACTLY-ONCE TRANSACTION FLOW:

┌─────────────────────────────────────────────────────────────────────┐
│  Consumer                    Kafka                     Producer     │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  1. Poll messages                                                   │
│  ┌──────────┐   fetch    ┌─────────────┐                            │
│  │ Consumer │ ◄───────── │ input-topic │                            │
│  └──────────┘            └─────────────┘                            │
│       │                                                             │
│       │ records                                                     │
│       ▼                                                             │
│  2. Begin transaction                                               │
│  ┌──────────┐                                    ┌──────────┐       │
│  │ Consumer │                                    │ Producer │       │
│  │          │                                    │ beginTx()│       │
│  └──────────┘                                    └──────────┘       │
│       │                                               │             │
│       │ process                                       │             │
│       ▼                                               │             │
│  3. Process and produce output                        │             │
│  ┌──────────┐                                    ┌──────────┐       │
│  │ Process  │                                    │ send()   │       │
│  │ message  │ ─────────────────────────────────► │ (in tx)  │       │
│  └──────────┘                                    └────┬─────┘       │
│                                                       │             │
│                              ┌──────────────┐         │             │
│                              │ output-topic │ ◄───────┘             │
│                              │ (uncommitted)│                       │
│                              └──────────────┘                       │
│                                                                     │
│  4. Send offsets to transaction                                     │
│  ┌──────────┐                                    ┌──────────┐       │
│  │ Consumer │                                    │ sendOff- │       │
│  │ offsets  │ ─────────────────────────────────► │ sets     │       │
│  └──────────┘                                    │ ToTx()   │       │
│                                                  └────┬─────┘       │
│                                                       │             │
│                              ┌────────────────┐       │             │
│                              │__consumer_     │ ◄─────┘             │
│                              │offsets         │                     │
│                              │(uncommitted)   │                     │
│                              └────────────────┘                     │
│                                                                     │
│  5. Commit transaction (atomic)                                     │
│                                                  ┌──────────┐       │
│                                                  │ commit() │       │
│                                                  └────┬─────┘       │
│                                                       │             │
│       ┌───────────────────────────────────────────────┘             │
│       │                                                             │
│       ▼ Atomic commit of:                                           │
│  ┌──────────────┐  ┌────────────────┐                               │
│  │ output-topic │  │__consumer_     │                               │
│  │  COMMITTED   │  │offsets         │                               │
│  └──────────────┘  │ COMMITTED      │                               │
│                    └────────────────┘                               │
│                                                                     │
│  All visible to downstream consumers simultaneously                 │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
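Steps 2 through 5 of the flow can be modelled in memory to show why output records and offsets become visible together. The sketch below is a toy model, not the Kafka client API: `ToyTransaction` and its method names are hypothetical, loosely mirroring the real producer calls `beginTransaction()`, `send()`, `sendOffsetsToTransaction()`, `commitTransaction()`, and `abortTransaction()`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory model of an atomic output + offset commit; not the Kafka API. */
final class ToyTransaction {
    // State visible to downstream read_committed consumers
    final List<String> committedOutput = new ArrayList<>();
    final Map<String, Long> committedOffsets = new HashMap<>();

    // State staged inside the open transaction
    private final List<String> pendingOutput = new ArrayList<>();
    private final Map<String, Long> pendingOffsets = new HashMap<>();
    private boolean open = false;

    void begin() {
        if (open) throw new IllegalStateException("transaction already open");
        open = true;
    }

    void send(String record) {
        requireOpen();
        pendingOutput.add(record);            // written, but not yet visible
    }

    void sendOffsets(String partition, long nextOffset) {
        requireOpen();
        pendingOffsets.put(partition, nextOffset);
    }

    void commit() {
        requireOpen();
        committedOutput.addAll(pendingOutput);    // both become visible
        committedOffsets.putAll(pendingOffsets);  // in the same step
        clear();
    }

    void abort() {
        requireOpen();
        clear();                              // neither output nor offsets survive
    }

    private void clear() {
        pendingOutput.clear();
        pendingOffsets.clear();
        open = false;
    }

    private void requireOpen() {
        if (!open) throw new IllegalStateException("no open transaction");
    }

    public static void main(String[] args) {
        ToyTransaction tx = new ToyTransaction();
        tx.begin();
        tx.send("Order-123:SUCCESS");
        tx.sendOffsets("input-topic-0", 1L);
        tx.abort();
        System.out.println("after abort:  " + tx.committedOutput + " " + tx.committedOffsets);
        tx.begin();
        tx.send("Order-123:SUCCESS");
        tx.sendOffsets("input-topic-0", 1L);
        tx.commit();
        System.out.println("after commit: " + tx.committedOutput + " " + tx.committedOffsets);
    }
}
```

After an abort the input offset has not moved, so the record is reprocessed, and the aborted output is never visible downstream. That combination is what removes the duplicate.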

Deep Dive

1. Understanding Isolation Levels

[Java code listing (28 lines) not captured]

Isolation Level Behavior

ISOLATION LEVEL IMPACT:

Partition state:
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│ 0  │ 1  │ 2  │ 3  │ 4  │ 5  │ 6  │ 7  │ 8  │ 9  │
│ C  │ C  │ T1 │ T1 │ C  │ T2 │ T2 │ C  │ C  │ N  │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

C = Committed (non-transactional or committed tx)
T1 = Transaction 1 (uncommitted)
T2 = Transaction 2 (uncommitted)
N = Non-transactional

Last Stable Offset (LSO) = 2 (first uncommitted tx)
High Water Mark (HWM) = 10


read_uncommitted consumer sees:
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│ 0  │ 1  │ 2  │ 3  │ 4  │ 5  │ 6  │ 7  │ 8  │ 9  │
│ ✓  │ ✓  │ ✓  │ ✓  │ ✓  │ ✓  │ ✓  │ ✓  │ ✓  │ ✓  │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘
Sees everything including uncommitted transactions
Problem: If T1 aborts, consumer already processed those messages!


read_committed consumer sees:
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│ 0  │ 1  │ 2  │ 3  │ 4  │ 5  │ 6  │ 7  │ 8  │ 9  │
│ ✓  │ ✓  │ -  │ -  │ -  │ -  │ -  │ -  │ -  │ -  │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘
Stops at LSO, waits for T1 to commit or abort
After T1 commits: LSO moves to 5, so it can read 2, 3, and 4, but is still blocked by T2 at 5

Key point: read_committed may have higher latency
(waits for transactions to complete)
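The partition diagram can be checked with a few lines of code. This is a simplified sketch under stated assumptions: `IsolationDemo` and its `State` enum are hypothetical, each offset is tagged with a single state, and the high water mark is taken to be the log length.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of the LSO and read_committed visibility; names are hypothetical. */
final class IsolationDemo {
    enum State { COMMITTED, OPEN_TX, ABORTED, NON_TX }

    /** LSO: offset of the first record in a still-open transaction, else the log end. */
    static int lastStableOffset(State[] log) {
        for (int i = 0; i < log.length; i++) {
            if (log[i] == State.OPEN_TX) return i;
        }
        return log.length;
    }

    /** Offsets a read_committed consumer can see: below the LSO, minus aborted records. */
    static List<Integer> readCommitted(State[] log) {
        List<Integer> visible = new ArrayList<>();
        int lso = lastStableOffset(log);
        for (int i = 0; i < lso; i++) {
            if (log[i] != State.ABORTED) visible.add(i);
        }
        return visible;
    }

    /** The ten-offset partition from the diagram: C C T1 T1 C T2 T2 C C N. */
    static State[] diagramLog() {
        return new State[] {
            State.COMMITTED, State.COMMITTED, State.OPEN_TX, State.OPEN_TX, State.COMMITTED,
            State.OPEN_TX, State.OPEN_TX, State.COMMITTED, State.COMMITTED, State.NON_TX
        };
    }

    public static void main(String[] args) {
        State[] log = diagramLog();
        System.out.println("LSO=" + lastStableOffset(log) + " visible=" + readCommitted(log));
        log[2] = State.COMMITTED;   // T1 commits
        log[3] = State.COMMITTED;
        System.out.println("LSO=" + lastStableOffset(log) + " visible=" + readCommitted(log));
    }
}
```

Note that offsets 7 through 9 stay hidden even though they are already committed or non-transactional: a single open transaction holds back everything after it in the partition.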

2. Consume-Transform-Produce Pattern

[Java code listing (82 lines) not captured]

3. External Side Effects

[Java code listing (44 lines) not captured]

4. Chained Transaction Managers

[Java code listing (42 lines) not captured]

Chained Transaction Caveats

CHAINED TRANSACTION BEHAVIOR:

Start:
  1. Begin DB transaction
  2. Begin Kafka transaction

Commit:
  1. Commit Kafka transaction FIRST
  2. Commit DB transaction

Rollback:
  1. Rollback DB transaction
  2. Rollback Kafka transaction

PROBLEM SCENARIO:

1. Begin DB tx + Kafka tx
2. Do DB work
3. Do Kafka work
4. Commit Kafka tx - SUCCESS
5. Commit DB tx - FAILS (constraint violation)
6. Rollback DB tx
7. Kafka already committed! ← INCONSISTENT

This is "pseudo-transactional" - not true two-phase commit
For true consistency, use:
- Outbox pattern (reliable)
- Saga pattern (eventual consistency)
- Accept occasional inconsistency with reconciliation
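The problem scenario comes down to sequential commits with no shared prepare phase. The sketch below is a toy model of that sequencing, not Spring's ChainedTransactionManager: `ChainedCommitDemo` and `Resource` are hypothetical names, and the failure is injected by a flag.

```java
import java.util.List;

/** Toy model of sequential ("chained") commits; not Spring's implementation. */
final class ChainedCommitDemo {

    /** A transactional resource whose commit can be made to fail. */
    static final class Resource {
        final String name;
        final boolean failOnCommit;
        String state = "ACTIVE";

        Resource(String name, boolean failOnCommit) {
            this.name = name;
            this.failOnCommit = failOnCommit;
        }

        void commit() {
            if (failOnCommit) throw new IllegalStateException(name + " commit failed");
            state = "COMMITTED";
        }

        void rollback() {
            // A commit that already happened cannot be undone.
            if (!state.equals("COMMITTED")) state = "ROLLED_BACK";
        }
    }

    /** Commits in list order; on the first failure, rolls back whatever still can be. */
    static boolean commitInOrder(List<Resource> resources) {
        for (Resource r : resources) {
            try {
                r.commit();
            } catch (IllegalStateException e) {
                for (Resource other : resources) other.rollback();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Resource kafka = new Resource("kafka", false);
        Resource db = new Resource("db", true);      // simulated constraint violation
        boolean ok = commitInOrder(List.of(kafka, db));
        System.out.println("allCommitted=" + ok + " kafka=" + kafka.state + " db=" + db.state);
    }
}
```

The run ends with the Kafka side committed and the DB side rolled back, which is exactly the inconsistent state in step 7.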

5. Performance Considerations

[Java code listing (45 lines) not captured]

6. When NOT to Use Exactly-Once

[Java code listing (37 lines) not captured]

Decision Framework

DO YOU NEED EXACTLY-ONCE?

                              ┌──────────────────┐
                              │ Is data critical │
                              │ (financial,      │
                              │ regulatory)?     │
                              └────────┬─────────┘
                                      │
                    ┌─────────────────┴─────────────────┐
                    │ YES                               │ NO
                    ▼                                   ▼
        ┌──────────────────────┐            ┌──────────────────────┐
        │ Can processing be    │            │ Can duplicates be    │
        │ made idempotent?     │            │ tolerated?           │
        └──────────┬───────────┘            └──────────┬───────────┘
                   │                                   │
         ┌─────────┴─────────┐               ┌─────────┴─────────┐
         │ YES               │ NO            │ YES               │ NO
         ▼                   ▼               ▼                   ▼
    ┌──────────┐      ┌──────────┐     ┌──────────┐      ┌──────────┐
    │At-least- │      │Exactly-  │     │At-least- │      │Exactly-  │
    │once +    │      │once      │     │once      │      │once      │
    │idempotent│      │          │     │(simple)  │      │          │
    └──────────┘      └──────────┘     └──────────┘      └──────────┘

Most applications: At-least-once + idempotent processing
Kafka Streams: Exactly-once built in (processing.guarantee=exactly_once_v2)
Financial/critical: Consider exactly-once
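The decision tree is small enough to encode directly, which is handy for design reviews or checklists. A minimal sketch, assuming the three questions in the diagram are answered as booleans; `SemanticsChooser` and its enum are hypothetical names.

```java
/** Encodes the decision diagram above; class and enum names are hypothetical. */
final class SemanticsChooser {
    enum Choice { AT_LEAST_ONCE_IDEMPOTENT, AT_LEAST_ONCE_SIMPLE, EXACTLY_ONCE }

    /**
     * @param critical            is the data financial or regulatory?
     * @param canBeIdempotent     can processing be made idempotent? (asked only if critical)
     * @param duplicatesTolerated can duplicates be tolerated? (asked only if not critical)
     */
    static Choice choose(boolean critical, boolean canBeIdempotent, boolean duplicatesTolerated) {
        if (critical) {
            return canBeIdempotent ? Choice.AT_LEAST_ONCE_IDEMPOTENT : Choice.EXACTLY_ONCE;
        }
        return duplicatesTolerated ? Choice.AT_LEAST_ONCE_SIMPLE : Choice.EXACTLY_ONCE;
    }

    public static void main(String[] args) {
        System.out.println("payments, idempotent gateway: " + choose(true, true, false));
        System.out.println("payments, no idempotency:     " + choose(true, false, false));
        System.out.println("click metrics:                " + choose(false, false, true));
    }
}
```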

Common Mistakes

Mistake 1: Mixing Isolation Levels

[Java code listing (14 lines) not captured]

Mistake 2: Non-Unique Transaction IDs in Cluster

[Java code listing (13 lines) not captured]
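Two live producers that share a transactional.id fence each other: the broker lets the newer one proceed and rejects the older one. One common way to avoid this is to derive the id from a stable per-instance value. The sketch below assumes each instance knows a stable index (for example, an ordinal assigned by the deployment platform); `TransactionalIds` and the naming scheme are hypothetical, not a Kafka or Spring convention.

```java
import java.util.HashSet;
import java.util.List;

/** Derives one transactional.id per application instance; the scheme is hypothetical. */
final class TransactionalIds {

    /** Builds an id that is stable across restarts of the same instance. */
    static String forInstance(String prefix, int instanceIndex) {
        if (prefix == null || prefix.isEmpty()) {
            throw new IllegalArgumentException("prefix must not be empty");
        }
        if (instanceIndex < 0) {
            throw new IllegalArgumentException("instanceIndex must be >= 0");
        }
        return prefix + "-" + instanceIndex;
    }

    /** Sanity check for a deployment: no two instances may share an id. */
    static boolean allUnique(List<String> ids) {
        return new HashSet<>(ids).size() == ids.size();
    }

    public static void main(String[] args) {
        List<String> good = List.of(forInstance("payment-tx", 0), forInstance("payment-tx", 1));
        List<String> bad = List.of("payment-tx", "payment-tx");  // same id on two instances
        System.out.println(good + " unique=" + allUnique(good));
        System.out.println(bad + " unique=" + allUnique(bad));
    }
}
```

A random id per start would also be unique, but it would not be stable across restarts, so a restarted instance could not fence its own zombie predecessor.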

Mistake 3: External Side Effects in Transaction

[Java code listing (19 lines) not captured]

Mistake 4: Long Transaction Duration

[Java code listing (27 lines) not captured]

Mistake 5: Assuming Exactly-Once Means No Duplicates Ever

[Java code listing (17 lines) not captured]

Debug This

Scenario: Transactions Timing Out

Symptoms:
  • TransactionAbortedException in logs
  • Messages not appearing in output topic
  • Consumer offset not advancing
Investigation:
[Bash code listing (8 lines) not captured]
[Java code listing (15 lines) not captured]
Common Causes:
  1. Processing too slow: Operations exceed transaction.timeout.ms
  2. Producer fenced: Another producer with same transactional.id
  3. Broker unavailable: Transaction coordinator can't be reached
  4. Memory pressure: GC pauses during transaction
Resolution:
[Java code listing (16 lines) not captured]
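For the first cause (processing exceeds transaction.timeout.ms), the usual levers are a longer producer timeout and smaller batches per transaction. The sketch below is one possible way to size them, not a definitive recipe: `TransactionTimeoutTuning`, the per-record cost estimate, and the safety fraction are hypothetical inputs. The property names are standard client settings, and the broker-side cap is `transaction.max.timeout.ms` (15 minutes by default).

```java
import java.util.Properties;

/** Sizes transaction timeout and batch size together; helper names are hypothetical. */
final class TransactionTimeoutTuning {
    /** Default of the broker setting transaction.max.timeout.ms (15 minutes). */
    static final long BROKER_DEFAULT_MAX_TIMEOUT_MS = 900_000L;

    /** Producer properties with a timeout the broker will accept. */
    static Properties producerProps(long txTimeoutMs, long brokerMaxTimeoutMs) {
        if (txTimeoutMs <= 0 || txTimeoutMs > brokerMaxTimeoutMs) {
            throw new IllegalArgumentException(
                    "transaction.timeout.ms must be in (0, " + brokerMaxTimeoutMs + "]");
        }
        Properties p = new Properties();
        p.setProperty("transaction.timeout.ms", Long.toString(txTimeoutMs));
        return p;
    }

    /** Largest batch whose estimated processing time fits a fraction of the timeout. */
    static int maxPollRecords(long txTimeoutMs, long perRecordMs, double safetyFraction) {
        if (perRecordMs <= 0) throw new IllegalArgumentException("perRecordMs must be > 0");
        long budgetMs = (long) (txTimeoutMs * safetyFraction);
        return (int) Math.max(1, budgetMs / perRecordMs);
    }

    public static void main(String[] args) {
        long timeoutMs = 60_000L;                    // producer default
        Properties producer = producerProps(timeoutMs, BROKER_DEFAULT_MAX_TIMEOUT_MS);
        Properties consumer = new Properties();
        // Assume roughly 200 ms per record and spend at most half the timeout on processing.
        consumer.setProperty("max.poll.records",
                Integer.toString(maxPollRecords(timeoutMs, 200, 0.5)));
        System.out.println(producer + " " + consumer);
    }
}
```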

Exercises

Exercise 1: Implement Word Count with Exactly-Once

Create a Kafka Streams-style word count:

  1. Read sentences from input topic
  2. Split into words
  3. Aggregate counts
  4. Write to output topic
  5. Use exactly-once semantics

Exercise 2: Payment Processor with Idempotency

Build a payment processor that:

  1. Consumes payment requests
  2. Calls external payment API (mock)
  3. Produces payment results
  4. Handles retries without double-charging
  5. Uses exactly-once for Kafka side

Exercise 3: Transaction Monitoring Dashboard

Create monitoring for transactional producers:

  1. Track transaction durations
  2. Count commits vs aborts
  3. Alert on approaching timeout
  4. Visualize transaction state

Exercise 4: Compare Delivery Semantics

Build three versions of the same processor:

  1. At-most-once (commit before process)
  2. At-least-once (commit after process)
  3. Exactly-once (transactional)
  4. Inject failures and compare behavior

Exercise 5: Outbox Pattern Implementation

Implement the outbox pattern:

  1. Write to DB and outbox table atomically
  2. Separate process reads outbox
  3. Publishes to Kafka
  4. Marks outbox entries as published
  5. Compare with direct transactional approach

Interview Questions

Q1: Explain the three delivery semantics in Kafka.

A: The three delivery semantics differ in their guarantees:
At-most-once:
  • Messages may be lost but never duplicated
  • Commit offset before processing
  • If processing fails after commit, message lost
  • Use case: Logs, metrics where some loss acceptable
At-least-once:
  • Messages never lost but may be duplicated
  • Commit offset after processing
  • If crash after processing but before commit, reprocess
  • Use case: Most applications (with idempotent handling)
Exactly-once:
  • Each message processed exactly once
  • Transactional produce + offset commit atomic
  • Requires: transactional.id, read_committed isolation
  • Use case: Financial, critical data pipelines

Most applications use at-least-once with idempotent processing because it's simpler and performs better than exactly-once.

Q2: How does Kafka achieve exactly-once semantics?

A: Kafka achieves exactly-once through several mechanisms:
1. Idempotent Producer:
  • Assigns Producer ID (PID) and sequence numbers
  • Broker deduplicates retried messages
  • Prevents duplicates from producer retries
2. Transactions:
  • Groups multiple operations atomically
  • Producer begins transaction, sends messages, commits/aborts
  • All messages become visible at commit, or none at abort
3. Consumer Offset in Transaction:
  • Consumer offset sent as part of producer transaction
  • sendOffsetsToTransaction() includes offset commit
  • Atomic: messages + offsets committed together
4. Isolation Level:
  • read_committed consumers only see committed transactions
  • Uncommitted/aborted transaction messages invisible
  • Prevents processing messages that will be rolled back
Together: Consume → Process → Produce → Commit Offset all succeed or all fail.
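The first mechanism, broker-side deduplication by producer ID and sequence number, can be shown with a toy partition. This is a deliberately simplified sketch: `ToyBrokerPartition` is hypothetical, it tracks only the last sequence per producer, and a real broker works on batches and keeps more state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy partition that deduplicates by (producer id, sequence); heavily simplified. */
final class ToyBrokerPartition {
    final List<String> log = new ArrayList<>();
    private final Map<Long, Integer> lastSeqByPid = new HashMap<>();

    /** Appends the record unless this (pid, seq) was already written. */
    boolean append(long pid, int seq, String value) {
        int last = lastSeqByPid.getOrDefault(pid, -1);
        if (seq <= last) {
            return false;                     // retry of a record already in the log
        }
        if (seq != last + 1) {
            throw new IllegalStateException("out-of-order sequence " + seq + " for pid " + pid);
        }
        log.add(value);
        lastSeqByPid.put(pid, seq);
        return true;
    }

    public static void main(String[] args) {
        ToyBrokerPartition partition = new ToyBrokerPartition();
        partition.append(7L, 0, "Order-123:SUCCESS");
        // The acknowledgment was lost, so the producer retries with the same sequence number.
        boolean appendedAgain = partition.append(7L, 0, "Order-123:SUCCESS");
        System.out.println("appendedAgain=" + appendedAgain + " logSize=" + partition.log.size());
    }
}
```

This covers only retries from the same producer session. Duplicates caused by reprocessing after a consumer restart still need the transactional offset commit described in points 2 and 3.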

Q3: What's the performance impact of exactly-once?

A: Exactly-once has measurable overhead:
Transaction overhead:
  • ~5-10ms per transaction for begin + commit
  • Involves transaction coordinator communication
  • Two-phase commit protocol
Mitigation through batching (assuming 5 ms of overhead per transaction):
1 message/tx:     5 ms overhead/msg
100 messages/tx:  0.05 ms overhead/msg
1000 messages/tx: 0.005 ms overhead/msg
read_committed latency:
  • Consumer blocked until transactions commit
  • Long-running transactions delay consumption
  • May see higher end-to-end latency
Throughput impact:
  • Non-transactional: ~1M messages/sec possible
  • Transactional: ~100K-500K messages/sec typical
  • Depends heavily on batch size and tx duration
When impact is acceptable:
  • Financial/compliance data
  • Low-to-medium throughput
  • Strong consistency requirement
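The batching figures in this answer are a straight division of the fixed per-transaction cost by the batch size. A minimal sketch of that arithmetic, with the 5 ms figure taken from the low end of the estimate above and `TxOverhead` as a hypothetical helper:

```java
/** Amortized transaction overhead per message; helper name is hypothetical. */
final class TxOverhead {

    /** Fixed cost of one transaction spread evenly over the messages it contains. */
    static double perMessageMs(double perTxMs, int messagesPerTx) {
        if (messagesPerTx <= 0) {
            throw new IllegalArgumentException("messagesPerTx must be > 0");
        }
        return perTxMs / messagesPerTx;
    }

    public static void main(String[] args) {
        double perTxMs = 5.0;  // assumed begin + commit cost per transaction
        for (int batch : new int[] {1, 100, 1000}) {
            System.out.println(batch + " messages/tx -> "
                    + perMessageMs(perTxMs, batch) + " ms overhead/msg");
        }
    }
}
```

Larger batches cut the per-message cost but lengthen each transaction, which delays read_committed consumers and moves the transaction closer to its timeout.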

Q4: How do you handle external side effects with exactly-once?

A: External systems (databases, APIs) can't be part of Kafka transactions. Strategies:
1. Idempotent external calls:
[Java code listing (3 lines) not captured]
2. Outbox pattern:
[Java code listing (6 lines) not captured]
3. Saga pattern (eventual consistency):
[Java code listing (6 lines) not captured]
4. Accept inconsistency + reconciliation:
  • Process at-least-once
  • Run periodic reconciliation job
  • Compare Kafka and external system state
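Strategy 1 is the one that addresses the double-charge story directly, so it is worth a concrete sketch. The example below assumes the external call can be keyed by an idempotency key such as the order ID. `IdempotentGateway` is a hypothetical in-memory stand-in, and a real deployment would need the key store to be durable and shared across instances.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** In-memory stand-in for a gateway that honors idempotency keys; hypothetical. */
final class IdempotentGateway {
    private final Map<String, String> resultsByKey = new ConcurrentHashMap<>();
    private final AtomicInteger chargeCount = new AtomicInteger();

    /** Charges at most once per key; a repeated call returns the stored result. */
    String charge(String idempotencyKey, long amountCents) {
        return resultsByKey.computeIfAbsent(idempotencyKey, key -> {
            chargeCount.incrementAndGet();    // the real side effect would happen here
            return "SUCCESS:" + amountCents;
        });
    }

    int chargeCount() {
        return chargeCount.get();
    }

    public static void main(String[] args) {
        IdempotentGateway gateway = new IdempotentGateway();
        String first = gateway.charge("Order-123", 10_000L);
        String retry = gateway.charge("Order-123", 10_000L);  // redelivery after a failure
        System.out.println(first + " / " + retry + " / charges=" + gateway.chargeCount());
    }
}
```

With this in place, the consumer can be redelivered the same request any number of times and the customer is still charged once, regardless of what the Kafka transaction does.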

Q5: When should you NOT use exactly-once semantics?

A: Exactly-once isn't always necessary or beneficial:
Don't use when:
  1. Processing is naturally idempotent:
    • Cache updates (put(key, value))
    • Status updates (setStatus(COMPLETED))
    • Aggregations with unique keys
  2. Data is non-critical:
    • Metrics and monitoring
    • Analytics events
    • Log aggregation
  3. High throughput required:
    • >500K messages/sec
    • Sub-millisecond latency requirements
  4. External reconciliation exists:
    • Daily batch reconciliation
    • Source of truth elsewhere
    • Eventual consistency acceptable
  5. Duplicates handled downstream:
    • Database unique constraints
    • Deduplication service
    • Consumer-side filtering
Rule of thumb: Use at-least-once + idempotent processing by default. Add exactly-once only when duplicates are truly unacceptable.

Summary

Key Takeaways

  1. Three semantics: at-most-once (may lose), at-least-once (may duplicate), exactly-once (neither)
  2. Exactly-once requires: transactional producer + read_committed consumer + offset in transaction
  3. Consume-transform-produce is the classic exactly-once pattern
  4. External side effects can't be in Kafka transaction - use idempotency or outbox pattern
  5. Transaction overhead is significant - batch messages to amortize
  6. Isolation level must be read_committed for consumers in exactly-once pipelines
  7. Most applications should use at-least-once + idempotent processing
  8. Chained transactions are not true two-phase commit - understand the limitations

Quick Reference

Exactly-Once Configuration

[Properties listing (8 lines) not captured]

Spring Kafka Exactly-Once

[Java code listing (9 lines) not captured]

Delivery Semantics Summary

Semantic        Producer Config     Consumer Config          Guarantee
At-most-once    acks=0              Commit before process    May lose
At-least-once   acks=all, retries   Commit after process     May duplicate
Exactly-once    transactional.id    read_committed           Neither

Series Navigation

Previous                     Current                  Next
Part 10: Offset Management   Part 11: Exactly-Once    Part 12: Schema Registry

Series Overview

  • Part 0: How to Use This Series
  • Parts 1-4: Fundamentals
  • Parts 5-7: Producers
  • Parts 8-11: Consumers (Internals, Groups, Offset Management, Exactly-Once)
  • Parts 12-14: Operations
  • Parts 15-17: Kafka Streams
  • Parts 18-20: Patterns & Practices
  • Part 21: Cheatsheet & Decision Guide