Algorithms

B-Tree

๐Ÿ“‹ Quick Reference

PropertyValue
TypeSelf-balancing multi-way search tree
SearchO(log n)
InsertO(log n)
DeleteO(log n)
Disk ReadsO(log_B n) where B = branching factor
Best ForDatabase indexes, file systems, disk-based storage
One-liner: Self-balancing tree optimized for systems that read/write large blocks of data, like databases and file systems.

๐ŸŽฎ Interactive Visualizer

Watch how B-Tree maintains balance through splits and merges:

Loading visualizer...
Try these operations:
  1. Insert elements and watch node splits
  2. Observe how tree grows from root
  3. Search for elements (see path traversal)
  4. Notice all leaves at same depth

๐Ÿ”ง How It Works

B-Tree Properties (Order m)

1. Every node has at most m children
2. Every non-leaf node (except root) has at least โŒˆm/2โŒ‰ children
3. Root has at least 2 children (if not a leaf)
4. All leaves appear at the same level
5. A node with k children contains k-1 keys

Example B-Tree of order 3 (2-3 tree):
           [17]
          /    \
      [8]        [25, 32]
     /   \       /   |   \
   [3,5] [12]  [20] [27] [35,40]

Node Structure

JAVA(7 lines)
Code
Loading syntax highlighter...

Search Operation

Search for key 20 in tree:
1. Start at root [17]
2. 20 > 17 โ†’ go right child [25, 32]
3. 20 < 25 โ†’ go left child [20]
4. Found!

Time: O(log n) comparisons ร— O(m) per node = O(m log n)
With m constant: O(log n)

Insert with Split

Insert 15 into node [10, 12, 14] (order 4, max 3 keys):

Before split:     [10, 12, 14, 15]  โ† overflow!
                        โ†“
After split:          [12]          โ† middle key goes up
                     /    \
                [10]      [14, 15]

If parent overflows โ†’ split propagates up (may create new root)

๐Ÿ“Š Complexity Analysis

OperationTimeDisk I/O
SearchO(log n)O(log_B n)
InsertO(log n)O(log_B n)
DeleteO(log n)O(log_B n)
Range queryO(log n + k)O(log_B n + k/B)

Why B-Trees for Databases?

Binary tree depth for 1M records: logโ‚‚(1,000,000) โ‰ˆ 20 levels
B-tree depth (order 100):         logโ‚โ‚€โ‚€(1,000,000) โ‰ˆ 3 levels

Each level = 1 disk read
B-tree: 3 disk reads vs BST: 20 disk reads

๐Ÿ”„ B-Tree vs B+ Tree

FeatureB-TreeB+ Tree
Data locationAll nodesLeaves only
Leaf linkingNoLinked list
Range queriesSlowerFaster (follow links)
Space usageKeys duplicateBetter utilization
Used inGeneralMost databases (MySQL, PostgreSQL)

B+ Tree Structure

B+ Tree (keys in internal nodes are guides only):

        [30]                    โ† internal nodes have keys only
       /    \
    [10,20]   [40,50]           โ† internal nodes
    / | \      / | \
[5,8][12,15][25,28][35,38][45,48][55,60]  โ† leaves have all data
  โ†”    โ†”     โ†”     โ†”     โ†”               โ† linked for range scan

โœ… When to Use B-Tree

Good Use Cases

  • Database indexes - optimized for disk access patterns
  • File systems - NTFS, HFS+, ext4 use B-trees
  • Key-value stores - efficient ordered storage
  • Range queries - find all keys between A and B
  • Sorted data access - in-order traversal

Avoid When

  • All data fits in memory - BST or hash table may be faster
  • Only point queries - hash table is O(1)
  • Frequent updates, rare reads - LSM tree may be better
  • Small datasets - overhead not worth it

๐Ÿงฉ Real-World Implementations

Database Indexes (SQL)

SQL(10 lines)
Code
Loading syntax highlighter...

Java TreeMap (Red-Black Tree, not B-Tree)

JAVA(11 lines)
Code
Loading syntax highlighter...

File System (conceptual)

Directory structure as B-tree:
        [etc, home, var]
       /       |       \
[apt,dpkg] [user1,user2] [log,tmp]
    |           |           |
 [...]       [docs,...]   [...]

Each node = disk block
Minimize disk reads to find file

โš ๏ธ Common Pitfalls

1. Confusing B-Tree Order Definitions

JAVA(7 lines)
Code
Loading syntax highlighter...

2. Wrong Data Structure Choice

JAVA(8 lines)
Code
Loading syntax highlighter...

3. Ignoring Node Size Optimization

JAVA(6 lines)
Code
Loading syntax highlighter...

4. Not Considering B+ Tree for Databases

JAVA(8 lines)
Code
Loading syntax highlighter...

๐ŸŽค Interview Tips

Q: Why are B-trees used in databases instead of binary search trees?
"

B-trees are optimized for disk I/O. They have high branching factor (many children per node), which minimizes tree height and thus disk reads. A B-tree with order 100 might have depth 3 for millions of records, while a BST would need depth 20+.

Q: What's the difference between B-tree and B+ tree?
"

In B-tree, data is stored in all nodes. In B+ tree, data is only in leaf nodes, and leaves are linked together. B+ trees are better for range queries and are used by most databases (MySQL InnoDB, PostgreSQL).

Q: How does B-tree maintain balance?
"

B-trees grow from the root, not leaves. When a node overflows during insert, it splits and pushes the middle key up. If the root splits, a new root is created. This ensures all leaves stay at the same level.

Q: What is the minimum fill factor of a B-tree node?
"

Non-root nodes must be at least half full. For order m B-tree, each node has at least โŒˆm/2โŒ‰ children (except root). This ensures good space utilization and bounds tree height.

Q: How do B-tree deletions work?
"

If removing a key causes underflow (< minimum keys), the node borrows from a sibling or merges with a sibling. Merging may propagate up the tree, potentially reducing tree height.


๐Ÿ“š Series Navigation


Visualizer: BTreeVisualizer from @tomaszjarosz/react-visualizers