Comparison-Based Sorting
Sorting is the most studied problem in computer science, and for good reason - it appears everywhere, from database indexing to rendering graphics. This article takes you deep into the four most important comparison-based sorting algorithms: QuickSort, MergeSort, HeapSort, and TimSort (Java's default).
Understanding these algorithms at the implementation level transforms how you think about data processing and gives you the tools to make informed decisions in production systems.
๐ At a Glance
| Algorithm | Time (Avg) | Time (Worst) | Space | Stable | In-Place |
|---|---|---|---|---|---|
| QuickSort | O(n log n) | O(nยฒ) | O(log n) | No | Yes |
| MergeSort | O(n log n) | O(n log n) | O(n) | Yes | No |
| HeapSort | O(n log n) | O(n log n) | O(1) | No | Yes |
| TimSort | O(n log n) | O(n log n) | O(n) | Yes | No |
๐ฏ What You'll Learn
After reading this article, you will be able to:
- Implement QuickSort with multiple pivot strategies and 3-way partitioning
- Understand MergeSort's stability and why it matters for complex objects
- Build HeapSort from scratch using the heapify operation
- Explain TimSort's hybrid approach and why Java chose it
- Prove the ฮฉ(n log n) lower bound for comparison-based sorting
- Choose the right algorithm based on data characteristics
๐ฅ Production Story: The Sorted Data Disaster
An e-commerce platform had a product listing service that sorted items by price. Performance was excellent in testing with random data. In production, disaster struck.
JAVA(6 lines)CodeLoading syntax highlighter...
The product database was already mostly sorted by price (products added in price tiers). When users filtered by category, results came back nearly sorted.
TEXT(2 lines)CodeLoading syntax highlighter...
Arrays.sort() for objects used MergeSort. But for primitives, it used QuickSort with a fixed pivot strategy. A developer had converted Product prices to a primitive array for "optimization":JAVA(11 lines)CodeLoading syntax highlighter...
With first-element pivot on sorted data, QuickSort degrades to O(nยฒ).
JAVA(6 lines)CodeLoading syntax highlighter...
๐ง Mental Model: The Sorting Spectrum
Think of sorting algorithms on a spectrum of trade-offs:
TEXT(33 lines)CodeLoading syntax highlighter...
- Sorting objects by multiple fields
- Preserving previous sort order
- User expectations (stable sorts feel "less random")
๐ฎ Try It Yourself
Compare all sorting algorithms interactively. Use the dropdown to switch between algorithms and observe how each one approaches the problem differently:
Loading visualizer...
๐ฌ Deep Dive
1. QuickSort: The Speed Champion
QuickSort is often the fastest in practice due to excellent cache locality and low constant factors.
Loading visualizer...
JAVA(61 lines)CodeLoading syntax highlighter...
When there are many duplicates, standard QuickSort degrades. 3-way partitioning handles this:
JAVA(23 lines)CodeLoading syntax highlighter...
- Cache locality: Works on contiguous memory
- In-place: No extra array allocation
- Few swaps: Only swaps when necessary
- Branch prediction: Comparison loop is predictable
2. MergeSort: The Stable Workhorse
MergeSort guarantees O(n log n) and stability, making it ideal for objects.
Loading visualizer...
JAVA(70 lines)CodeLoading syntax highlighter...
JAVA(11 lines)CodeLoading syntax highlighter...
3. HeapSort: The Memory Champion
HeapSort achieves O(n log n) worst case with O(1) extra space.
JAVA(46 lines)CodeLoading syntax highlighter...
TEXT(14 lines)CodeLoading syntax highlighter...
- Poor cache locality (jumps around the array)
- More comparisons than QuickSort
- Not stable
- But: guaranteed O(n log n) and O(1) space
4. TimSort: Java's Default
TimSort is a hybrid algorithm combining MergeSort and InsertionSort, optimized for real-world data.
JAVA(31 lines)CodeLoading syntax highlighter...
- Exploits existing order: Finds natural runs in data
- Adaptive: Nearly-sorted data is O(n)
- Stable: Critical for object sorting
- Galloping mode: Optimizes unbalanced merges
JAVA(8 lines)CodeLoading syntax highlighter...
5. The Comparison Lower Bound: ฮฉ(n log n)
Why can't we sort faster than O(n log n) using comparisons?
TEXT(17 lines)CodeLoading syntax highlighter...
- No comparison-based sort can beat O(n log n)
- QuickSort, MergeSort, HeapSort are asymptotically optimal
- To go faster, we need non-comparison sorts (Part 5)
โ ๏ธ Common Mistakes
Mistake 1: QuickSort with Bad Pivot on Sorted Data
JAVA(18 lines)CodeLoading syntax highlighter...
Mistake 2: Unstable Sort for Objects
JAVA(14 lines)CodeLoading syntax highlighter...
Mistake 3: Not Using Insertion Sort for Small Arrays
JAVA(14 lines)CodeLoading syntax highlighter...
๐ Debug This
The following QuickSort has a subtle bug. Find it:
JAVA(39 lines)CodeLoading syntax highlighter...
Click to reveal the bug and solution
quickSort(arr, low, pivotIndex) should be quickSort(arr, low, pivotIndex - 1).pivotIndex included in the left partition, when pivot ends up at high, we recursively call:quickSort(arr, low, high)- the same range!
TEXT(5 lines)CodeLoading syntax highlighter...
JAVA(8 lines)CodeLoading syntax highlighter...
The pivot is already in its final position after partition, so we exclude it from both recursive calls.
๐ป Exercises
Exercise 1: Implement Median-of-Three Pivot โญ
Implement median-of-three pivot selection for QuickSort:
JAVA(5 lines)CodeLoading syntax highlighter...
Solution
JAVA(19 lines)CodeLoading syntax highlighter...
Median-of-three avoids worst case on sorted data while being cheaper than random.
Exercise 2: Count Comparisons โญโญ
Modify MergeSort to count the number of comparisons:
JAVA(3 lines)CodeLoading syntax highlighter...
Solution
JAVA(39 lines)CodeLoading syntax highlighter...
Exercise 3: In-Place MergeSort โญโญโญ
Implement MergeSort using O(1) extra space (in-place merge):
JAVA(4 lines)CodeLoading syntax highlighter...
Solution
JAVA(37 lines)CodeLoading syntax highlighter...
Exercise 4: Iterative QuickSort โญโญโญ
Implement QuickSort without recursion (using explicit stack):
JAVA(3 lines)CodeLoading syntax highlighter...
Solution
JAVA(38 lines)CodeLoading syntax highlighter...
By always processing the smaller partition first and pushing the larger, we guarantee O(log n) stack space.
Exercise 5: Detect Natural Runs (TimSort Style) โญโญโญโญ
Implement run detection like TimSort:
JAVA(6 lines)CodeLoading syntax highlighter...
Solution
JAVA(41 lines)CodeLoading syntax highlighter...
๐ค Interview Questions
Question #1: When would you choose HeapSort over QuickSort?
-
Strict memory constraints: HeapSort uses O(1) extra space vs O(log n) for QuickSort's stack
-
Worst-case guarantee required: HeapSort is always O(n log n), QuickSort can degrade to O(nยฒ)
-
Embedded systems: Predictable performance matters more than average speed
-
Partial sorting: To find top-k elements, build heap in O(n), extract k in O(k log n)
However, QuickSort is usually preferred because:
- Better cache locality (sequential access pattern)
- Lower constant factors
- In practice, O(nยฒ) is avoidable with good pivot selection
Question #2: Explain why MergeSort is stable but QuickSort is not.
JAVA(3 lines)CodeLoading syntax highlighter...
Equal elements from the left half (which came earlier in the original array) are placed first.
JAVA(2 lines)CodeLoading syntax highlighter...
Elements equal to the pivot can end up in either partition depending on their position.
Question #3: Why does Java use different sorting algorithms for primitives vs objects?
- Primitives: QuickSort (faster, no stability needed)
- Objects: MergeSort (stable, needed for Comparable contract)
- Primitives: Dual-Pivot QuickSort (even faster)
- Objects: TimSort (stable, adaptive)
-
Stability requirement: Objects sorted by Comparator should maintain relative order for equals. MergeSort/TimSort guarantee this.
-
Comparison cost: Object comparison (calling compareTo()) is expensive. TimSort minimizes comparisons through runs.
-
Primitive comparison is cheap: QuickSort's fewer swaps and better cache performance win for primitives.
-
Real-world data patterns: TimSort exploits partially sorted data common in objects (timestamps, IDs).
Question #4: How would you sort a nearly-sorted array efficiently?
-
Insertion Sort: O(n) for nearly sorted because elements are close to their final position
-
TimSort: Detects natural runs, nearly-sorted input is O(n)
-
Adaptive ShellSort: Exploits partial order
- Insertion sort moves elements short distances efficiently
- TimSort identifies long runs and minimizes merge work
- Already-sorted subsequences are preserved
- HeapSort: Always O(n log n), doesn't exploit order
- Standard QuickSort: Sorted data can trigger O(nยฒ)
JAVA(6 lines)CodeLoading syntax highlighter...
Question #5: Prove that any comparison-based sort requires ฮฉ(n log n) comparisons.
-
Sorting n distinct elements means determining which of n! permutations is correct
-
Each comparison answers a yes/no question, eliminating at most half of remaining permutations
-
A comparison-based sort is a decision tree where:
- Internal nodes are comparisons
- Leaves are permutations (sorted outputs)
- Tree must have at least n! leaves
-
A binary tree with L leaves has height at least logโ(L)
-
Therefore: height โฅ logโ(n!)
-
Using Stirling's approximation:
logโ(n!) โ logโ((n/e)^n ร โ(2ฯn)) = n logโ(n) - n logโ(e) + O(log n) โ n logโ(n) -
Conclusion: Any comparison sort needs at least ฮฉ(n log n) comparisons
This is a lower bound - no comparison-based algorithm can beat it. Non-comparison sorts (counting, radix) bypass this by not using comparisons.
๐ Quick Reference
Algorithm Selection Guide
TEXT(22 lines)CodeLoading syntax highlighter...
Complexity Summary
| Sort | Best | Average | Worst | Space | Stable |
|---|---|---|---|---|---|
| QuickSort | O(n log n) | O(n log n) | O(nยฒ) | O(log n) | No |
| MergeSort | O(n log n) | O(n log n) | O(n log n) | O(n) | Yes |
| HeapSort | O(n log n) | O(n log n) | O(n log n) | O(1) | No |
| TimSort | O(n) | O(n log n) | O(n log n) | O(n) | Yes |
| Insertion | O(n) | O(nยฒ) | O(nยฒ) | O(1) | Yes |
Java Sorting Methods
JAVA(13 lines)CodeLoading syntax highlighter...
๐ Summary & Key Takeaways
Core Principles
- QuickSort is fastest on average but has O(nยฒ) worst case - use random pivot
- MergeSort guarantees O(n log n) and stability - essential for objects
- HeapSort uses O(1) space but has poor cache locality
- TimSort adapts to data patterns - Java's smart default choice
- ฮฉ(n log n) is the lower bound for comparison-based sorting
What You Can Do Now
- Implement all four algorithms from scratch
- Choose the right algorithm based on requirements
- Explain why Java uses different sorts for primitives vs objects
- Prove the comparison sort lower bound
- Optimize QuickSort with pivot strategies and 3-way partitioning
๐ Review Schedule
| Day | Task | Time |
|---|---|---|
| Day 1 | Review complexity table | 5 min |
| Day 3 | Implement QuickSort with random pivot | 15 min |
| Day 7 | Implement HeapSort from memory | 15 min |
| Day 14 | Redo Debug This exercise | 10 min |
| Day 30 | Explain all four algorithms aloud | 15 min |