HashSet Internals
This article explores how HashSet enforces uniqueness, why iteration order is unpredictable, and the common pitfalls that cause "duplicates" to appear in what should be unique collections.
📋 At a Glance
| Aspect | Details |
|---|---|
| Internal Implementation | Backed by HashMap<E, Object> |
| Uniqueness Mechanism | Uses HashMap's key uniqueness |
| Time Complexity | O(1) average for add/remove/contains |
| Iteration Order | Undefined (hash-based, not predictable) |
| Null Elements | Allows one null element |
| Thread Safety | Not thread-safe |
🎯 What You'll Learn
After reading this article, you will be able to:
- Explain the HashMap backing: How HashSet uses HashMap internally
- Understand uniqueness: Why equals() and hashCode() are critical
- Perform set operations: Union, intersection, difference efficiently
- Avoid common traps: Mutable elements, incorrect equals/hashCode
- Choose the right Set: HashSet vs TreeSet vs LinkedHashSet
🔥 Production Story: The Duplicate Detection Failure
An e-commerce platform needed to prevent duplicate orders from being processed. The solution seemed simple:
JAVA(14 lines)CodeLoading syntax highlighter...
JAVA(23 lines)CodeLoading syntax highlighter...
JAVA(9 lines)CodeLoading syntax highlighter...
JAVA(17 lines)CodeLoading syntax highlighter...
🧠 Mental Model: HashSet = HashMap with Dummy Values
The simplest way to understand HashSet is to see what it actually is:
TEXT(31 lines)CodeLoading syntax highlighter...
🔬 Deep Dive
1. The Internal Structure
JAVA(28 lines)CodeLoading syntax highlighter...
2. Core Operations
Every HashSet operation is a HashMap operation:
JAVA(39 lines)CodeLoading syntax highlighter...
3. Why PRESENT and Not Null?
PRESENT object instead of null?JAVA(10 lines)CodeLoading syntax highlighter...
JAVA(7 lines)CodeLoading syntax highlighter...
4. Iteration Order: Why It's Unpredictable
HashSet iteration follows HashMap's bucket order:
JAVA(12 lines)CodeLoading syntax highlighter...
- Hash codes of elements
- Current capacity of the map
- Hash function's bit mixing
TEXT(12 lines)CodeLoading syntax highlighter...
- Insertion order: Use
LinkedHashSet - Sorted order: Use
TreeSet
5. Set Algebra Operations
HashSet supports standard set operations through Collection methods:
JAVA(25 lines)CodeLoading syntax highlighter...
6. Initial Capacity Planning
Like HashMap, HashSet benefits from proper initial capacity:
JAVA(10 lines)CodeLoading syntax highlighter...
capacity = (int)(expectedSize / loadFactor) + 1With default load factor 0.75:
- 1,000 elements → capacity ~1,334
- 10,000 elements → capacity ~13,334
- 1,000,000 elements → capacity ~1,333,334
⚠️ Common Mistakes
Mistake 1: Mutable Elements in HashSet
JAVA(12 lines)CodeLoading syntax highlighter...
JAVA(16 lines)CodeLoading syntax highlighter...
Mistake 2: Incorrect equals/hashCode Leading to "Duplicates"
JAVA(19 lines)CodeLoading syntax highlighter...
JAVA(22 lines)CodeLoading syntax highlighter...
Mistake 3: Expecting Iteration Order
JAVA(8 lines)CodeLoading syntax highlighter...
JAVA(15 lines)CodeLoading syntax highlighter...
Mistake 4: Inefficient Duplicate Detection
JAVA(10 lines)CodeLoading syntax highlighter...
JAVA(11 lines)CodeLoading syntax highlighter...
Mistake 5: Using HashSet with Non-Hashable Types
JAVA(6 lines)CodeLoading syntax highlighter...
JAVA(19 lines)CodeLoading syntax highlighter...
🐛 Debug This: The Mystery Duplicates
A developer reports: "I'm using HashSet but still getting duplicates!"
JAVA(31 lines)CodeLoading syntax highlighter...
equals() but not hashCode(). The toString() method is irrelevant for Set behavior.hashCode():- Each
CacheKeyusesObject.hashCode()(memory address) - Two equal CacheKeys have different hash codes
- They land in different buckets
- HashSet never calls
equals()because they're in different buckets
JAVA(36 lines)CodeLoading syntax highlighter...
JAVA(3 lines)CodeLoading syntax highlighter...
💻 Exercises
Exercise 1: Efficient Deduplication
⭐⭐ Difficulty: Medium | ⏱️ Time: 10 minutes
JAVA(10 lines)CodeLoading syntax highlighter...
JAVA(36 lines)CodeLoading syntax highlighter...
Exercise 2: Find Duplicates Between Lists
⭐⭐ Difficulty: Medium | ⏱️ Time: 15 minutes
JAVA(11 lines)CodeLoading syntax highlighter...
JAVA(46 lines)CodeLoading syntax highlighter...
Exercise 3: Set Algebra Operations
⭐⭐⭐ Difficulty: Medium | ⏱️ Time: 15 minutes
JAVA(9 lines)CodeLoading syntax highlighter...
JAVA(78 lines)CodeLoading syntax highlighter...
Exercise 4: Custom Hash Distribution Analyzer
⭐⭐⭐ Difficulty: Medium-Hard | ⏱️ Time: 20 minutes
JAVA(9 lines)CodeLoading syntax highlighter...
JAVA(83 lines)CodeLoading syntax highlighter...
Exercise 5: EnumSet Comparison
⭐⭐ Difficulty: Medium | ⏱️ Time: 10 minutes
JAVA(11 lines)CodeLoading syntax highlighter...
JAVA(62 lines)CodeLoading syntax highlighter...
📝 Summary & Key Takeaways
HashSet Internals
TEXT(12 lines)CodeLoading syntax highlighter...
Performance Characteristics
| Operation | Average | Worst Case | Notes |
|---|---|---|---|
| add(E) | O(1) | O(n) | Worst with many collisions |
| remove(E) | O(1) | O(n) | Worst with many collisions |
| contains(E) | O(1) | O(n) | Worst with many collisions |
| size() | O(1) | O(1) | Just returns field |
| isEmpty() | O(1) | O(1) | Checks size == 0 |
| iterator.next() | O(1)* | O(n) | *Amortized across full iteration |
When to Use Which Set
| Use Case | Best Set | Why |
|---|---|---|
| Default, no ordering | HashSet | O(1) operations |
| Maintain insertion order | LinkedHashSet | O(1) + predictable iteration |
| Sorted iteration | TreeSet | O(log n) but sorted |
| Enum values | EnumSet | Bit vector, extremely fast |
| Thread-safe | ConcurrentHashMap.newKeySet() | Concurrent access |
| Immutable | Set.of() | No modification possible |
Critical Rules
- Always override both equals() AND hashCode() for elements
- Use immutable elements or never modify after adding
- Pre-size for large sets to avoid resize overhead
- Don't rely on iteration order - use LinkedHashSet if needed
- Use EnumSet for enum values - much more efficient
🎤 Senior-Level Interview Questions
Question #1: How does HashSet ensure uniqueness?
add(element), it actually calls map.put(element, PRESENT). HashMap ensures key uniqueness - if the key already exists, put() returns the old value (PRESENT), and HashSet's add() returns false. If the key is new, put() returns null, and add() returns true. The uniqueness check uses hashCode() to find the bucket, then equals() to check for duplicates in that bucket.Question #2: What is the time complexity of add/remove/contains in HashSet?
Question #3: Why doesn't HashSet maintain insertion order?
Question #4: What happens if you modify an object after adding it to a HashSet?
Question #5: What's the difference between HashSet and TreeSet?
Question #6: When should you use EnumSet instead of HashSet?
EnumSet.allOf(), complementOf(), and range(). The only reason to use HashSet🏁 Conclusion
HashSet is deceptively simple - it's just a HashMap wrapper. But understanding this relationship unlocks deeper knowledge:
- Every HashSet operation is a HashMap operation
- Uniqueness depends on equals() AND hashCode()
- Mutable elements are dangerous
- Iteration order is unpredictable
- HashSet = HashMap with dummy values - understand one, understand both
- Override both equals() AND hashCode() - missing either breaks uniqueness
- Use immutable elements - mutation causes "lost" objects
- Pre-size large sets - inherits HashMap's resize behavior
- Choose the right Set - LinkedHashSet for order, TreeSet for sorting, EnumSet for enums
📅 Review Schedule for This Article
| Day | Task | Time |
|---|---|---|
| Day 1 | Review HashSet = HashMap relationship | 5 min |
| Day 3 | Redo Exercise 2 (Find Duplicates) | 15 min |
| Day 7 | Answer interview questions without looking | 10 min |
| Day 14 | Redo Debug This exercise | 10 min |
| Day 30 | Review Set comparison table | 5 min |