B Trees

Motivation for B-Trees

Index structures for large datasets cannot fit in main memory.
Disk and SSD storage require a different approach to efficiency.
Modern SSDs have random access times of approximately 100--200 microseconds.
One SSD access takes more time than 500,000--2,000,000 CPU instructions (on modern multi-GHz processors).
One SSD access is approximately 1,000--10,000 times slower than reading from main memory (50--100 ns).
Suppose an AVL tree is used to store 20 million
records on an SSD.
This very deep binary tree requires lots of SSD accesses; accesses
microseconds per access milliseconds.
We know we can't improve on the lower bound on the search for a binary tree.
The solution is to use more branches and, thus, reduce the tree height!
- As branching increases, depth decreases, so fewer expensive storage accesses are needed.

Realistic Worst-Case Scan

You need to scan all records in a 1-billion row table.
Assume each random access is microseconds.

AVL/BST lookup per key: random reads (height ).
AVL I/O time: days
(or minutes at parallelism).
B-tree (order ) height: ; therefore, reads/key.
B-tree I/O time: days
(or minutes at parallelism).

Takeaway: Even with parallelism, reducing tree height from 30 to 5 saves I/O work. This is why B-Trees dominate for large datasets.

Web Search Scale

Live system: queries/sec, each needs one key lookup in B pages.

AVL tree depth: .
SSD accesses/query = ms/query at /access.
B-tree height (order ): .
SSD accesses/query = ms/query.
10k q/s: AVL = seconds of SSD work per second (queueing backlog).
B-tree = seconds of SSD work per second.

AVL-style access causes user-visible stalls under load, while a B-tree design maintains low-latency response.

Example B-Tree, Order-5 with 26 Elements

Definition of a B-tree

A B-tree of order is an -way tree (i.e., a tree where each node may have up to children) in which:
1. All leaves are on the same level.
2. The number of children of a non-leaf node is always one more than the number of keys.
3. All keys of a node are sorted in increasing order. These keys partition the child keys as a search tree.
4. The root is either a leaf node or has two to children.
5. Other non-leaf nodes have at least children.
6. A node contains no more than keys.
The number should always be odd.

Insert

Constructing a B-Tree

Starting with an empty Order-5 B-tree, let’s insert the
following sequence of keys:
1, 12, 8, 2, 25, 6, 14, 28, 17, 7, 52, 16, 48, 68, 3, 26, 29, 55, 45.

The first four items (1, 12, 8, 2) go into the root:
1 / 1
Putting the fifth item (25) in the root would violate condition 5.
Therefore, when 25 arrives, pick the middle key to make a new root.
1 / 1
6, 14, and 28 get added to the leaf nodes.
Adding 17 to the right leaf node would over-fill it.
1 / 1
So, the middle key is promoted (to the root), splitting the leaves.
We can now add 7, 52, 16, 48 to the leaf nodes.
Adding 68 to the right leaf node would over-fill it.
So we split the rightmost leaf, promoting 48 to the root.
Adding 3 to the left leaf node would over-fill it
Adding 3 causes us to split the leftmost leaf, promoting 3 to the root.
26, 29, 55 then go into the leaves.
Adding 45 causes a split of the 4th leaf.
Promote 28 to the root, which causes the root to split.
Promote 28 to the root; doesn’t fit.
Promote the 17 as the new root.
Attempt to insert the new key into a leaf.
If this would result in that leaf becoming too big, split the leaf into two, promoting the middle key to the leaf’s parent.
If this would result in the parent becoming too big, split the parent into two, promoting the middle key.
This strategy may need to repeat to the root.
If necessary, the root is split in two and the middle key is promoted to a new root, making the tree one level higher.

Exercise in Inserting a B-Tree

Insert the following keys to a 5-way B-tree:
3, 7, 9, 23, 45, 1, 5, 14, 25, 24, 13, 11, 8, 19, 4, 31, 35, 56, 2, 6, 12.

Deletion from a B-Tree

There are four removal cases to consider in order.

Key is in a leaf: Delete the key.
Key is not in a leaf: Its predecessor and successor must be in leaves. Delete the key and promote either the predecessor or successor (implementor's choice) to replace it in the non-leaf position.

After performing Case 1 or 2, a leaf node may now have too few keys (underflow). We will fix this underflow with Case 3 or 4.

When removing a key would cause underflow (too few keys), handle it as follows:

Sibling has spare keys — Rotate:
- Pull a key from a sibling up to the parent.
- Demote the parent key down into the lacking leaf.
Neither sibling has spare keys — Merge:
- Combine the lacking node, sibling, and parent key into one node.
- If the parent now underflows, recursively apply Cases 3 or 4 to it.

Type 1: Simple Removal from Leaf

Assuming a 5-way B-Tree (as before), delete 2.

Because there are enough keys in the node, just delete it.

Type 2: Simple Removal from Non-Leaf

Assuming a 5-way B-Tree (as before), delete 52.

Replace with the closest value from a leaf with enough values.

Type 4: Too few keys in node and its siblings

Delete 72

Type 3: Enough siblings

Delete 22

1 / 1

Exercise in Removal a B-Tree

Given a 5-way B-tree created by these data (last exercise):
3, 7, 9, 23, 45, 1, 5, 14, 25, 24, 13, 11, 8, 19, 4, 31, 35, 56, 2, 6, 12
Delete these keys: 4, 5, 7, 3, 14, 45

Analysis

Maximum Keys in an Order- B-Tree

Max keys per level:

Level	Key Count
root

So, the maximum number of items is:

When and , this gives .

Big- of B-Trees

Operation	Balanced Binary Tree	-Way B Tree
Insert
Find
Delete

The benefits are:

Reduced hard-drive access (but the same number of instructions) and
Reduced memory usage (fewer pointers).

Reasons for Using B-Trees

When searching tables held on a hard drive, the cost of each drive transfer is high but doesn’t depend much on the amount of data transferred, especially if consecutive items are transferred.

For example, an order 101 B-tree can transfer each node in one drive read operation.
An order 101 B-tree and height 3 can hold items (approximately 100 million) and any item can be accessed with 3 drive reads (assuming we hold the root in memory).

If we take , we get a 2–3 Tree, in which non-leaf nodes have two or three children (i.e., one or two keys).

B-Trees are always balanced (since the leaves are all at the same level), so 2-3 Trees make a good type of balanced tree.

Comparing Trees

Binary trees
- Can become unbalanced and lose their good time complexity (big ).
- AVL Trees are strict binary trees that overcome the balance problem.
- Heaps remain balanced but are unsorted.
Multi-way trees
- B-Trees are -way, with any (odd) number of children.
- The 3-Way B-Tree (a.k.a 2–3 Tree) approximates a self-balancing binary tree, exchanging the AVL tree’s balancing operations for insertion and (more complex) deletion operations.

What does the “B” in B-Tree stand for?

No one knows.
The inventors of the B-Tree will not say what it means.
It could mean: Boeing (where it was invented), balanced, between, broad, bushy, and Bayer (the inventors are Bayer and McCreight).
“The more you think about what the ‘B’ in B-trees means, the better you understand B-trees.”
— Edward M. McCreight (2013)

Variations

Introduction to B*-Trees

B*-Trees are a B-Tree variant that increases node occupancy for better space utilization.

B*-Trees require non-root nodes to be at least full (vs. full in B-Trees).
Insertions may redistribute keys among siblings before splitting, reducing splits overall.
This typically yields denser nodes and fewer I/O operations in large storage systems.
B*-Trees preserve B-Tree search costs while improving storage efficiency.

Introduction to B+Trees

B+Trees are a variant of B-Trees that further optimize for disk-based storage and range queries.

All data (or pointers to data) are stored only in leaf nodes.
Internal nodes store only keys (no records); therefore, they can hold more keys.
Leaf nodes are linked sequentially, enabling fast full-range scans and ordered iteration.
Higher fan-out and fewer disk reads per access make B+Trees popular in databases and file systems.
Insertion and deletion keep leaves at the same depth, preserving strong balance and performance.

Advanced B-Tree Variations

Beyond B+Trees and B*-Trees, researchers and systems developers have created specialized variants:

Bε-Trees: Write-optimized variants with buffered internal nodes to batch insertions, useful for heavy write workloads in key-value stores.
B-Link Trees: Concurrency-friendly variants with sibling pointers and latch-free search paths, common in database index implementations.
Cache-aware/Cache-oblivious B-Trees: Variants optimized for modern memory hierarchies (L1/L2/L3 caches) rather than just disk I/O.

Lab 20: B-Trees

Let’s look at the lab based on today’s lecture.

B Trees ​

Motivation for B-Trees ​

Realistic Worst-Case Scan ​

Web Search Scale ​

Definition of a B-tree ​

Insert ​

Constructing a B-Tree ​

Exercise in Inserting a B-Tree ​

Deletion from a B-Tree ​

Type 1: Simple Removal from Leaf ​

Type 2: Simple Removal from Non-Leaf ​

Type 4: Too few keys in node and its siblings ​

Type 3: Enough siblings ​

Exercise in Removal a B-Tree ​

Analysis ​

Maximum Keys in an Order- B-Tree ​

Big- of B-Trees ​

Reasons for Using B-Trees ​

Comparing Trees ​

What does the “B” in B-Tree stand for? ​

Variations ​

Introduction to B*-Trees ​

Introduction to B+Trees ​

Advanced B-Tree Variations ​

Lab 20: B-Trees ​