Bloom Filter

📋 Quick Reference

Property	Value
Type	Probabilistic set membership
Add	O(k) where k = number of hash functions
Query	O(k) - may return false positives
Delete	Not supported (standard Bloom filter)
Space	O(m) bits; about 10 bits per element at a 1% false positive rate, regardless of element size
Best For	Fast membership checks with acceptable false positives

One-liner: Space-efficient probabilistic data structure that can tell you "definitely not in set" or "possibly in set".

🎮 Interactive Visualizer

Watch how a Bloom filter uses multiple hash functions:

Initialize Bloom Filter with 16 bits and 3 hash functions. All bits start as 0.
Used by: Chrome (malicious URLs), Medium (read articles), Cassandra/HBase (disk reads)
Trade-off: Space-efficient but allows false positives
🎲 Probabilistic Data Structure
False Positives: May say "probably yes" when element was never added
No False Negatives: If it says "no", element is DEFINITELY not in set
No Deletion: Cannot remove elements (would cause false negatives)
Bit array (16 bits, 3 hash functions): indices [0] to [15], all bits initially 0 (0 of 16 set).
Hash Functions (k=3)
hash_1(x) = (x * 31) % 16
hash_2(x) = (djb2 hash) % 16
hash_3(x) = (x * 17 * pos) % 16
Added elements: apple, banana, cherry
Elements checked: apple, grape, banana, mango
Pseudocode
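import hashlib

class BloomFilter:
    def __init__(self, size, k):
        self.size = size
        self.bits = [0] * size
        self.k = k  # number of hash functions

    def _index(self, element, i):
        # Assumption: a SHA-256 digest salted with i stands in for hash_1..hash_3 above
        digest = hashlib.sha256(f"{i}:{element}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.size

    def add(self, element):
        for i in range(self.k):
            self.bits[self._index(element, i)] = 1

    def contains(self, element):
        for i in range(self.k):
            if self.bits[self._index(element, i)] == 0:
                return False  # definitely not in the set
        return True  # possibly in the set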

Try these operations:

Add elements and watch multiple bits get set
Query existing elements (always found)
Query non-existent elements (possible false positives)
Observe how more elements increase false positive rate

🔧 How It Works

Core Concept

Bloom Filter = bit array + k hash functions

Adding "hello":
  h1("hello") = 3   → set bit[3] = 1
  h2("hello") = 7   → set bit[7] = 1
  h3("hello") = 12  → set bit[12] = 1

Querying "hello":
  Check bit[3] AND bit[7] AND bit[12]
  All 1? → "Possibly in set"
  Any 0? → "Definitely not in set"
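
The add and query steps above can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not a production implementation: the class name SimpleBloomFilter is hypothetical, and the way the k indices are derived (seeding the element's hashCode() with the function number and mixing the bits) is just one convenient choice.

```java
import java.util.BitSet;

/** Minimal Bloom filter sketch: a bit array plus k derived hash functions. */
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int m; // number of bits in the array
    private final int k; // number of hash functions

    SimpleBloomFilter(int m, int k) {
        this.bits = new BitSet(m);
        this.m = m;
        this.k = k;
    }

    /** Index for hash function i. Assumption: seed hashCode() with i, then mix. */
    private int index(String element, int i) {
        int h = element.hashCode() + i * 0x9E3779B9;
        h ^= h >>> 16;
        h *= 0x85EBCA6B;
        h ^= h >>> 13;
        return Math.floorMod(h, m);
    }

    /** Sets the k bits selected by the hash functions. */
    void add(String element) {
        for (int i = 0; i < k; i++) {
            bits.set(index(element, i));
        }
    }

    /** Returns false for "definitely not in set", true for "possibly in set". */
    boolean mightContain(String element) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(index(element, i))) {
                return false;
            }
        }
        return true;
    }
}
```

After add("hello"), mightContain("hello") always returns true; for any other string it returns false unless all k of its bits happen to be set already.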

Key Properties

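No false negatives: if any of the k checked bits is 0, the element was never added
False positives possible: all k bits may have been set by other elements
No deletion: clearing a bit could erase evidence of other elements
Fixed size: the filter occupies m bits regardless of how many elements are added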

Optimal Parameters

Optimal number of hash functions:
k = (m/n) * ln(2) ≈ 0.693 * (m/n)

False positive probability:
p ≈ (1 - e^(-kn/m))^k

For 1% false positive rate with n elements:
m ≈ 9.6n bits (about 1.2 bytes per element)
k ≈ 7 hash functions
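
The formulas above translate directly into a small helper. The sketch below is illustrative only: the class and method names are hypothetical, and the sizing formula m = -n ln(p) / (ln 2)^2 is the one quoted in the interview answer later in this article.

```java
/** Sizing helpers for a Bloom filter (names are illustrative). */
final class BloomParameters {
    /** Bits needed for n elements at false positive rate p: m = -n ln(p) / (ln 2)^2. */
    static long optimalBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    /** Optimal number of hash functions: k = (m/n) ln 2, rounded, at least 1. */
    static int optimalHashes(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    /** Approximate false positive probability: p = (1 - e^(-kn/m))^k. */
    static double falsePositiveRate(long n, long m, int k) {
        return Math.pow(1 - Math.exp(-(double) k * n / m), k);
    }
}
```

For n = 1,000,000 and p = 0.01 this yields roughly 9.6 million bits and k = 7, and plugging those values back into falsePositiveRate gives approximately 0.01.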

📊 Complexity Analysis

Operation	Time	Space
`add(element)`	O(k)	-
`contains(element)`	O(k)	-
Space (total)	-	O(m) bits

Space Comparison

Structure	Space for 1M elements
HashSet	~48 MB (avg 48 bytes/entry)
Bloom Filter (1% FP)	~1.2 MB
Bloom Filter (0.1% FP)	~1.8 MB
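
The two Bloom filter rows can be checked by hand from the sizing formula, taking n = 10^6, ln(0.01) ≈ -4.605, ln(0.001) ≈ -6.908 and (ln 2)^2 ≈ 0.4805, with 1 MB taken as 8 × 10^6 bits:

```latex
m = -\frac{n \ln p}{(\ln 2)^2}
\quad\Rightarrow\quad
m_{1\%} \approx \frac{10^6 \times 4.605}{0.4805} \approx 9.59 \times 10^6 \text{ bits} \approx 1.2 \text{ MB},
\qquad
m_{0.1\%} \approx \frac{10^6 \times 6.908}{0.4805} \approx 1.44 \times 10^7 \text{ bits} \approx 1.8 \text{ MB}
```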

✅ When to Use Bloom Filter

Good Use Cases

Cache filtering - avoid expensive lookups for non-existent items
Spell checkers - quick "definitely not in the dictionary" check
Network routers - packet filtering
Database queries - skip disk reads for absent keys
Web crawlers - avoid revisiting URLs
Distributed systems - reduce unnecessary network calls

Avoid When

Need exact membership - a positive answer only ever means "maybe", never "definitely"
Need deletion - standard Bloom filter doesn't support delete
Zero false positive tolerance - the rate can never reach 0, and every 10x reduction costs about 4.8 more bits per element
Need to enumerate elements - Bloom filter only answers "is X in set?"

🔄 Bloom Filter Variants

Variant	Feature	Trade-off
Standard	Simple, fast	No delete, false positives
Counting	Supports delete	About 4x more space (typically 4-bit counters instead of single bits)
Scalable	Grows dynamically	Multiple filters, slightly slower
Cuckoo Filter	Delete support, better space	More complex

Counting Bloom Filter

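One way to picture the counting variant: each position holds a counter instead of a bit, so an add increments and a remove decrements. The sketch below is a hedged illustration, not a reference implementation. The class name is hypothetical, the index derivation is an assumed mixing scheme, and plain int counters are used for clarity where a space-conscious version would pack small (for example 4-bit) counters.

```java
/** Counting Bloom filter sketch: counters instead of bits, so removal is possible. */
final class CountingBloomFilter {
    private final int[] counters; // int counters for clarity; real ones are usually packed
    private final int k;          // number of hash functions

    CountingBloomFilter(int m, int k) {
        this.counters = new int[m];
        this.k = k;
    }

    /** Index for hash function i. Assumption: seed hashCode() with i, then mix. */
    private int index(String element, int i) {
        int h = element.hashCode() + i * 0x9E3779B9;
        h ^= h >>> 16;
        h *= 0x85EBCA6B;
        h ^= h >>> 13;
        return Math.floorMod(h, counters.length);
    }

    void add(String element) {
        for (int i = 0; i < k; i++) {
            counters[index(element, i)]++;
        }
    }

    /** Decrements the element's counters; skipped if the element is definitely absent. */
    void remove(String element) {
        if (!mightContain(element)) {
            return; // avoids driving counters below zero
        }
        for (int i = 0; i < k; i++) {
            counters[index(element, i)]--;
        }
    }

    boolean mightContain(String element) {
        for (int i = 0; i < k; i++) {
            if (counters[index(element, i)] == 0) {
                return false; // definitely not in set
            }
        }
        return true; // possibly in set
    }
}
```

Removing an element that was never added but happens to be a false positive would still corrupt the counters, so callers should only remove elements they know were inserted.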

🧩 Implementation Patterns

Guava BloomFilter (Java)

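Guava provides com.google.common.hash.BloomFilter. A filter is created with BloomFilter.create(funnel, expectedInsertions, fpp), filled with put(element), and queried with mightContain(element). When fpp is omitted, Guava defaults to a 3% false positive rate.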

Redis Bloom Filter

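The RedisBloom module adds Bloom filter commands to Redis: BF.RESERVE key error_rate capacity creates a filter, BF.ADD key item inserts an item, and BF.EXISTS key item queries it (1 means possibly present, 0 means definitely absent). BF.MADD and BF.MEXISTS are the multi-item forms.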

Cache Pattern

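The idea of the cache pattern is to consult the filter first and only pay for the slow lookup when the answer is "maybe". The following is a self-contained sketch under assumptions: FilteredStore is a hypothetical class, a HashMap stands in for the slow database, and the filter size and hash count are arbitrary.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Guards a slow key-value store with a Bloom filter (illustrative sketch). */
final class FilteredStore {
    private static final int M = 1 << 16; // bits in the filter (assumed size)
    private static final int K = 4;       // hash functions (assumed)

    private final BitSet bits = new BitSet(M);
    private final Map<String, String> slowStore = new HashMap<>(); // stands in for a database
    private int slowLookups = 0;

    /** Index for hash function i. Assumption: seed hashCode() with i, then mix. */
    private static int index(String key, int i) {
        int h = key.hashCode() + i * 0x9E3779B9;
        h ^= h >>> 16;
        h *= 0x85EBCA6B;
        h ^= h >>> 13;
        return Math.floorMod(h, M);
    }

    void put(String key, String value) {
        slowStore.put(key, value);
        for (int i = 0; i < K; i++) {
            bits.set(index(key, i));
        }
    }

    Optional<String> get(String key) {
        for (int i = 0; i < K; i++) {
            if (!bits.get(index(key, i))) {
                return Optional.empty(); // definitely absent: skip the slow lookup
            }
        }
        slowLookups++; // "maybe": only now pay for the slow lookup
        return Optional.ofNullable(slowStore.get(key));
    }

    int slowLookups() {
        return slowLookups;
    }
}
```

Note that get still returns the correct answer on a false positive, because the slow store has the final word; the filter only saves work, it never decides a hit.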

⚠️ Common Pitfalls

1. Ignoring False Positive Rate

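Choosing the false positive rate is a cost trade-off: a looser rate saves memory but sends more queries for absent keys through to the backend. The helper below is a rough sketch with hypothetical names that puts numbers on both sides of that trade-off.

```java
/** Rough cost model for a chosen false positive rate (illustrative names). */
final class FalsePositiveCost {
    /** Expected number of needless backend lookups among queries for absent keys. */
    static long wastedLookups(long absentKeyQueries, double fpRate) {
        return Math.round(absentKeyQueries * fpRate);
    }

    /** Bits per element required for the target rate: -ln(p) / (ln 2)^2. */
    static double bitsPerElement(double fpRate) {
        return -Math.log(fpRate) / (Math.log(2) * Math.log(2));
    }
}
```

With one million queries for absent keys, a 3% rate lets about 30,000 of them through, while a 0.1% rate lets about 1,000 through at the price of more bits per element.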

2. Underestimating Element Count

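A filter sized for n elements degrades quickly once more than n are inserted. The sketch below, with a hypothetical class and method name, sizes a filter for a design load and then evaluates the false positive formula p ≈ (1 - e^(-kn/m))^k at the actual load.

```java
/** Shows how the false positive rate grows when a filter is overfilled. */
final class OverfillDemo {
    /** False positive rate of a filter sized for (designN, designP) after actualN insertions. */
    static double rateAfter(long designN, double designP, long actualN) {
        double ln2 = Math.log(2);
        long m = (long) Math.ceil(-designN * Math.log(designP) / (ln2 * ln2)); // bits
        int k = Math.max(1, (int) Math.round((double) m / designN * ln2));     // hash functions
        return Math.pow(1 - Math.exp(-(double) k * actualN / m), k);
    }
}
```

For a filter designed for 1,000 elements at 1%, the formula gives about 1% at the design load, roughly 16% at twice the load, and above 99% at ten times the load.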

3. Treating "Maybe" as "Yes"

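A positive answer from the filter has to be confirmed against the authoritative data before acting on it. The sketch below is illustrative: the filter is represented as a plain Predicate so the example stays short, and the class and method names are hypothetical.

```java
import java.util.Set;
import java.util.function.Predicate;

/** Contrasts trusting a "maybe" with confirming it against the real data. */
final class MaybeIsNotYes {
    /** Wrong: treats the filter's "maybe" as a definite hit. */
    static boolean isTakenWrong(Predicate<String> mightContain, String name) {
        return mightContain.test(name);
    }

    /** Better: a "no" is final, a "maybe" is resolved by the authoritative store. */
    static boolean isTaken(Predicate<String> mightContain, Set<String> store, String name) {
        if (!mightContain.test(name)) {
            return false; // definitely not present
        }
        return store.contains(name); // confirm the "maybe"
    }
}
```

With a saturated filter (one that answers "maybe" for everything), isTakenWrong reports every name as taken, while isTaken still returns the correct answer.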

4. Expecting Deletion Support

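To see why clearing bits is unsafe, the sketch below reuses the bit positions of "hello" from the Core Concept example (3, 7, 12) and pairs them with a second element whose positions (7, 9, 14) are hypothetical and chosen to share bit 7. Hash functions are replaced by fixed index arrays so the outcome is deterministic.

```java
import java.util.BitSet;

/** Demonstrates that clearing an element's bits can cause a false negative. */
final class NaiveDeleteDemo {
    static boolean allSet(BitSet bits, int[] indices) {
        for (int i : indices) {
            if (!bits.get(i)) {
                return false;
            }
        }
        return true;
    }

    /** Returns whether "world" is still reported after naively deleting "hello". */
    static boolean worldSurvivesNaiveDelete() {
        BitSet bits = new BitSet(16);
        int[] hello = {3, 7, 12}; // positions from the Core Concept example
        int[] world = {7, 9, 14}; // hypothetical positions that share bit 7
        for (int i : hello) bits.set(i);
        for (int i : world) bits.set(i);
        for (int i : hello) bits.clear(i); // naive "delete" of hello
        return allSet(bits, world);        // false: bit 7 is gone
    }
}
```

The call returns false: "world" was added and never removed, yet the filter now reports it as absent, which is exactly the false negative a Bloom filter is supposed to rule out.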

🎤 Interview Tips

Q: What is a Bloom filter and what's it used for?

A: A probabilistic data structure for set membership testing. It can tell you "definitely not in set" or "possibly in set". Used for caching, spell-checking, and avoiding expensive lookups when an element doesn't exist.

Q: What's the difference between false positive and false negative?

A: False positive: filter says "maybe yes" but element was never added. False negative: filter says "no" but element WAS added. Bloom filters guarantee no false negatives - if it says "no", the element is definitely not in the set.

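Q: Why use multiple hash functions?

A: Multiple hash functions reduce the false positive rate, up to a point. With k hash functions, an element sets k bits. For a false positive, ALL k bits must have been set by other elements, which is less likely with good hash distribution. Too many hash functions fill the bit array faster and push the rate back up, which is why the optimum is k ≈ (m/n)*ln(2).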

Q: How do you choose optimal parameters?

A: Given expected elements (n) and desired false positive rate (p): bits m ≈ -n*ln(p)/(ln(2)²), hash functions k ≈ (m/n)*ln(2). For 1% FP rate, use about 10 bits per element and 7 hash functions.

Previous: Part 19: SQL Joins

Next: Part 21: B-Tree

Visualizer: BloomFilterVisualizer from @tomaszjarosz/react-visualizers
