Priority Queues using Binary Heaps
Suppose you want to write a task scheduler with the following properties: each task has a priority, tasks can be added in any order, and tasks should be removed according to their priority. A simple solution is to use an array and re-sort the elements every time a task is added. However, sorting is a lot of work, and it's—perhaps surprisingly—more work than needs to be done in this case. As we'll see, we still require some order, just not the total order of a sorted array. What we need is a binary heap.
Binary Heaps
“Heap” is unfortunately a rather ambiguous term, and doubly so in computing. For our purposes, a heap is nothing more than a binary tree with a special property. Specifically, a binary tree is a heap if every parent node is greater than or equal to its children.
Here's a heap:
And another:
Notice that in both cases, the greatest element (9) appears at the root. This is always the case: the greatest element in a heap can't have a parent, because each parent must be greater than or equal to its children. So it must be at the root. Furthermore, each subtree in the examples above has the heap property. This is also a necessary property of heaps.
Here are some trees that don't have the heap property:
In the first tree, 6 is a parent of 9, and 2 is a parent of 5. These are two violations of the heap property. What about the second tree, in which every parent is greater than or equal to its children? This tree fails to satisfy an additional property that heaps must have that we haven't yet mentioned. For reasons which will soon be made clear, we want heaps to be nice and compact, so in addition to the “heap property” outlined above, we also require that every generation be “filled”, except for possibly the last. Furthermore, if the last generation is not filled, we require that the nodes in it be “packed” to the left.
Adding and Removing Elements
At this point, heaps appear to be little more than binary trees with some rather arbitrary qualities. However, recall that the largest element in a heap is always at the root. To return to our motivating example (a scheduler), if we have a collection of tasks arranged in a heap, then the highest priority task is always located at the root. To return the highest priority task then, we simple remove and return the root.
But now we no longer have a heap. We'd like to combine the leftovers into a new heap, so that the next time we want to remove the highest priority task, we can simply pluck off the root as we did here.
This is where the (perhaps arbitrary-seeming) heap property shines. We first move the bottom-most, left-most element into the root position.
Now we've reconnected our tree, but the tree is not a heap. To restore the heap property, we first compare the new root element with its two children.
If it's smaller than either (and it's smaller than both in this case), we swap it with its larger child (6).
We then continue, with our focus now on the subtree into which we swapped the root. We compare the subtree's root (2) with its children (1 and 1).
But since the root is greater than or equal to both children, we stop. We've successfully restored the heap property.
This process of repeatedly comparing the tree's root with its children and swapping if necessary is called “sifting down”, and it converts any “compact” (in the sense described above) binary tree into a heap.
How does it work? After removing the root, we are left with two disconnected subtrees, but these subtrees are themselves both heaps. We then replaced the root with the bottom-most, left-most element (the only reason we choose this element is because it's the only one we can remove without destroying the tree's “compactness”). If this new root is greater than or equal to both its children, the new tree is a heap (since the root is greater than its children, which are greater than their children, and so on). If it's smaller than one of its children, we swap it with the greater of its two children. After this swap, the greatest element in the tree now sits at the root, and every element in the “unswapped” subtree is smaller than the new root element. Now only the “swapped” subtree needs to be re-heaped. We stop when either the root is greater than or equal to its children, or we reach the bottom of the tree.
In the worst case, how many iterations does this require? At most
log(n)
, where n
is the number of nodes in
the tree. This is (potentially) significantly faster than
re-sorting all of the elements, which might take
n * log(n)
steps.
How about adding new elements to a heap? We'll again need to re-heap the tree, but this time starting with the added element. This process is called “sifting up” and ought to have a familiar feel to it.
We add new elements in the bottom-most, left-most position (starting a new generation if necessary).
We then compare the new element with its parent.
If it's greater (as it is here), we swap the two; otherwise, we stop.
We then repeat this process.
This time, however, the element is not greater than its parent, so we're done. The heap property has been restored.
Sifting up works for more-or-less the same reason that sifting-down does. In particular, it relies heavily on the fact that subtrees of heaps are themselves heaps. There's also a subtle point to be made here. Notice how the first two example heaps contain the exact same elements; they both have the heap property, and yet they're different trees. Unlike sorted arrays, which are uniquely determined by their elements, heaps permit a looser, more fluid organization. In a sense, the heap property requires just the right amount of order. As a result, re-heaping is quick.
A Compact Representation
Now that we understand the essential operations on heaps, we're ready to think about representation. “Compact” binary trees, like the ones we're concerned with here, permit a particular efficient implicit representation. While a more general recursive representation will do, the one we'll show here is far superior in terms of both memory usage, cache-locality, and more.
Specifically, a “compact” binary tree—in which every generation except for possibly the last is completely filled—can be represented using nothing more than an array with one element for each item in the tree. To encode a tree, its elements are stored in a breadth-first order: first the root, then its two children, then the nodes in the next generation, etc.
Here's the first example heap from above, along with its implicit array representation:
Using this represention, the left child of the node stored at index
i
is located at index 2i + 1
, and the right
child at 2i + 2
. The parent of the node at index
i
is located at index
⌊(i - 1) / 2⌋
, where
⌊n⌋
is the greatest integer less than or
equal to n
(the “floor” function).
Consider the node whose value is 6. It's stored at index
1
, so its children can be found at indexes
2 * 1 + 1 = 3
and 2 * 1 + 2 = 4
. Likewise,
the node whose value is 2 is stored at index 5
, and its
parent—5—can be found at index
⌊(5 - 1) / 2⌋ = 2
.
An Implementation in TypeScript
We'll now work through an implementation in TypeScript (although any
language with mutable arrays will do). To start, let's create a file
called
binary-heap.ts
and add a skeleton:
class BinaryHeap {
private elements: number[] = [];
isEmpty(): boolean {
throw new Error('unimplemented');
}
push(x: number) {
throw new Error('unimplemented');
}
peek(): number | null {
throw new Error('unimplemented');
}
pop(): number | null {
throw new Error('unimplemented');
}
}
Note that we've decided to return
null
in the event that a user peek
s or
pop
s an empty heap. Alternatively, we could throw an
error, or return something like Rust's Option
type.
We'll store our heap using the elements
array. Thus, to
check if the heap is empty, we simply need to check if the elements
array has length 0
:
isEmpty(): boolean {
return this.elements.length === 0;
}
Let's tackle pop
next. If the heap is empty, we'll simply
return null
. If it's not, we need to store the element at
index 0
(this is the root), move the element at the last
index (this is the bottom-most, left-most node) to index
0
, and “sift down”. Finally, we'll return the
previous root element.
class BinaryHeap {
// ...
pop(): number | null {
if (this.elements.length === 0) {
return null;
}
// Save the current root
const root = this.elements[0];
const last = this.elements.pop() as number;
if (root !== last) {
// Move the bottom-most, left-most node into the root's position
this.elements[0] = last;
// Sift down
this.siftDown();
}
return root;
}
private siftDown() {
throw new Error('unimplemented');
}
// ...
}
Recall how siftDown
works: we compare a parent node to
its children. If either of the children is greater than the parent, we
swap the parent with the greater of the two. We stop whenever
the current parent node is greater than or equal to its children (or
has no children). Here's what that looks like:
private siftDown() {
// Start at the root
let parentIndex = 0;
while (true) {
const lChildIndex = 2 * parentIndex + 1;
const rChildIndex = 2 * parentIndex + 2;
const parentValue = this.elements[parentIndex];
const lChildValue = this.elements[lChildIndex];
const rChildValue = this.elements[rChildIndex];
if (
(lChildValue && lChildValue > parentValue) ||
(rChildValue && rChildValue > parentValue)
) {
// A swap is required
if (rChildValue && rChildValue > lChildValue) {
// Swap with right child
this.elements[rChildIndex] = parentValue;
this.elements[parentIndex] = rChildValue;
parentIndex = rChildIndex;
} else {
// Swap with left child
this.elements[lChildIndex] = parentValue;
this.elements[parentIndex] = lChildValue;
parentIndex = lChildIndex;
}
} else {
// The parent is greater than its children (or has none)
break;
}
}
}
By way of a break, let's implement peek
next. Since the
root is stored at index 0
, peek
just needs
to return this value (or null
if the heap is empty).
peek(): number | null {
if (this.elements.length === 0) {
return null;
}
return this.elements[0];
}
All that now remains is push
:
class BinaryHeap {
// ...
push(x: number) {
// Add the new element at the bottom-most, left-most position in the tree
this.elements.push(x);
// Then sift up
this.siftUp();
}
private siftUp() {
throw new Error('unimplemented');
}
// ...
}
Compared to siftDown
, siftUp
is easy:
siftUp() {
// Start at the most recently added node
let focusedIndex = this.elements.length - 1;
while (focusedIndex > 0) {
const parentIndex = Math.floor((focusedIndex - 1) / 2);
const focusedValue = this.elements[focusedIndex];
const parentValue = this.elements[parentIndex];
if (focusedValue > parentValue) {
// A swap is required
this.elements[focusedIndex] = parentValue;
this.elements[parentIndex] = focusedValue;
focusedIndex = parentIndex;
} else {
// The focused node is not greater than its parent
break;
}
}
}
Here's the complete implementation:
class BinaryHeap {
private elements: number[] = [];
isEmpty(): boolean {
return this.elements.length === 0;
}
push(x: number) {
this.elements.push(x);
this.siftUp();
}
private siftUp() {
let focusedIndex = this.elements.length - 1;
while (focusedIndex > 0) {
const parentIndex = Math.floor((focusedIndex - 1) / 2);
const focusedValue = this.elements[focusedIndex];
const parentValue = this.elements[parentIndex];
if (focusedValue > parentValue) {
this.elements[focusedIndex] = parentValue;
this.elements[parentIndex] = focusedValue;
focusedIndex = parentIndex;
} else {
break;
}
}
}
peek(): number | null {
if (this.elements.length === 0) {
return null;
}
return this.elements[0];
}
pop(): number | null {
if (this.elements.length === 0) {
return null;
}
const root = this.elements[0];
const last = this.elements.pop() as number;
if (root !== last) {
this.elements[0] = last;
this.siftDown();
}
return root;
}
private siftDown() {
let parentIndex = 0;
while (true) {
const lChildIndex = 2 * parentIndex + 1;
const rChildIndex = 2 * parentIndex + 2;
const parentValue = this.elements[parentIndex];
const lChildValue = this.elements[lChildIndex];
const rChildValue = this.elements[rChildIndex];
if (
(lChildValue && lChildValue > parentValue) ||
(rChildValue && rChildValue > parentValue)
) {
if (rChildValue && rChildValue > lChildValue) {
this.elements[rChildIndex] = parentValue;
this.elements[parentIndex] = rChildValue;
parentIndex = rChildIndex;
} else {
this.elements[lChildIndex] = parentValue;
this.elements[parentIndex] = lChildValue;
parentIndex = lChildIndex;
}
} else {
break;
}
}
}
}
Generalizing the Element Type
At the moment, our BinaryHeap
is not very useful. It can
only store numbers, whereas in many cases we'd like to store richer
types of things. Fortunately, it's not difficult to generalize our
current class to handle any type of element as long as it can be
compared.
First, let's parameterize our class by some type variable
A
:
class BinaryHeap<A> {
// ...
}
We need to update the type of the elements
array: where
it previously stored only numbers, we'll now store
<A>
s. We'll also have the user provide a comparator
function when constructing a heap, which we'll store for later use in
siftUp
and siftDown
.
class BinaryHeap<A> {
private elements: A[] = [];
constructor(private greaterThan: (x: A, y: A) => boolean) {}
// ...
}
Now all that remains is to replace each appropriate occurrence of
>
with the user-provided
greaterThan
function. We also need to update the
signatures of push
, pop
, and
peek
to expect and return elements of type
A
, instead of number
s:
class BinaryHeap<A> {
private elements: A[] = [];
constructor(private greaterThan: (x: A, y: A) => boolean) {}
isEmpty(): boolean {
return this.elements.length === 0;
}
push(x: A) {
this.elements.push(x);
this.siftUp();
}
private siftUp() {
let focusedIndex = this.elements.length - 1;
while (focusedIndex > 0) {
const parentIndex = Math.floor((focusedIndex - 1) / 2);
const focusedValue = this.elements[focusedIndex];
const parentValue = this.elements[parentIndex];
if (this.greaterThan(focusedValue, parentValue)) {
this.elements[focusedIndex] = parentValue;
this.elements[parentIndex] = focusedValue;
focusedIndex = parentIndex;
} else {
break;
}
}
}
peek(): A | null {
if (this.elements.length === 0) {
return null;
}
return this.elements[0];
}
pop(): A | null {
if (this.elements.length === 0) {
return null;
}
const root = this.elements[0];
const last = this.elements.pop() as A;
if (root !== last) {
this.elements[0] = last;
this.siftDown();
}
return root;
}
private siftDown() {
let parentIndex = 0;
while (true) {
const lChildIndex = 2 * parentIndex + 1;
const rChildIndex = 2 * parentIndex + 2;
const parentValue = this.elements[parentIndex];
const lChildValue = this.elements[lChildIndex];
const rChildValue = this.elements[rChildIndex];
if (
(lChildValue && this.greaterThan(lChildValue, parentValue)) ||
(rChildValue && this.greaterThan(rChildValue, parentValue))
) {
if (rChildValue && this.greaterThan(rChildValue, lChildValue)) {
this.elements[rChildIndex] = parentValue;
this.elements[parentIndex] = rChildValue;
parentIndex = rChildIndex;
} else {
this.elements[lChildIndex] = parentValue;
this.elements[parentIndex] = lChildValue;
parentIndex = lChildIndex;
}
} else {
break;
}
}
}
}
Now, given some Task
type, perhaps defined like so:
interface Task {
priority: number;
name: string;
}
we can create a BinaryHeap
capable of storing tasks.
const heap = new BinaryHeap<Task>((t1, t2) => t1.priority > t2.priority);
heap.push({ priority: 4, name: 'warning' });
heap.push({ priority: 8, name: 'meltdown' });
heap.push({ priority: 1, name: 'todo' });
heap.push({ priority: 9, name: '!' });
console.log(heap.pop()); // => { priority: 9, name: '!' }
console.log(heap.pop()); // => { priority: 8, name: 'meltdown' }
A Real World Example
Binary heaps often appear in “real-time” systems in which
tasks are constantly created and need to be handled according to their
priority. As an example, React uses a
binary heap
in its scheduler
package. The scheduler determines when
various UI-related updates should be performed.